AWS S3: Persistence of Memory

Leo Toff

A summary of Amazon S3 with code in Terraform and Node, intended as a cheatsheet for the AWS Solutions Architect certification exam.

S3 (Simple Storage Service) was the first AWS product, released in March 2006. It stores objects organized into buckets, and that is the extent of its hierarchy: buckets are flat, with no folders. Each object is assigned a unique key within its bucket.
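
A key like "reports/2024/january.csv" only looks like a folder path; the "folder" is just a shared key prefix that list operations can filter on. A minimal sketch with the AWS SDK for Node (the bucket name and prefix are placeholders):

const AWS = require('aws-sdk');

const s3 = new AWS.S3({ apiVersion: '2006-03-01' });

// List objects whose keys start with a folder-like prefix; the keys themselves
// are flat strings, there is no real directory tree behind them.
async function listReportKeys() {
  const result = await s3.listObjectsV2({
    Bucket: 'bucket-name',   // placeholder bucket
    Prefix: 'reports/2024/', // filter by key prefix
    Delimiter: '/',          // group deeper prefixes as if they were subfolders
  }).promise();

  result.Contents.forEach((object) => console.log(object.Key));
}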

Resiliency

S3 automatically stores data redundantly across multiple Availability Zones (AZs) within a region to ensure high durability and availability of the stored objects. Amazon S3 is designed for 99.999999999% (11 nines) durability and 99.99% availability. By distributing data across multiple AZs, S3 can withstand the loss of an entire AZ without impacting data durability or availability.

However, S3 does not automatically replicate objects across regions. This has to be set up manually and paid for separately (see Cross-Region Replication).

Consistency model

S3 provides strong read-after-write consistency for all PUT and DELETE operations on objects, including overwrites of existing objects. After a successful write of a new object, an overwrite, or a delete, any subsequent read immediately returns the latest version of the object (or correctly reports that it no longer exists), and LIST operations reflect the change as well.

Bucket configurations (for example, enabling versioning or updating a bucket policy), on the other hand, follow an eventual consistency model, so such changes may take a short time to propagate across the system.
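
A quick way to see this behavior is to write an object and read it back immediately; with strong read-after-write consistency the read returns the data that was just written. A minimal sketch with the AWS SDK (bucket name and key are placeholders):

const AWS = require('aws-sdk');

const s3 = new AWS.S3({ apiVersion: '2006-03-01' });

// A GET issued right after a successful PUT returns the new data.
async function writeThenRead() {
  const params = { Bucket: 'bucket-name', Key: 'notes.txt' };

  await s3.putObject({ ...params, Body: 'first version' }).promise();

  const response = await s3.getObject(params).promise();
  console.log(response.Body.toString()); // "first version"
}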

Security

Simple Storage Service is designed with multiple layers of security. See S3 security for more info.

Cost

Amazon S3 Standard:

  • Storage cost: Around $0.023 per GB per month.
  • Retrieval cost: Free.

Amazon S3 Intelligent-Tiering:

  • Storage cost: Consists of two access tiers - Frequent and Infrequent.
    • Frequent access tier: Around $0.023 per GB per month.
    • Infrequent access tier: Around $0.0125 per GB per month.
  • Retrieval cost: Free.
  • Monitoring and automation fee: Around $0.0025 per 1,000 objects per month.

Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA):

  • Storage cost: Around $0.01 per GB per month.
  • Retrieval cost: Around $0.01 per GB.

Amazon S3 Glacier:

  • Storage cost: Around $0.004 per GB per month.
  • Retrieval cost:
    • Expedited retrieval: Around $0.03 per GB (within 1-5 minutes).
    • Standard retrieval: Around $0.01 per GB (within 3-5 hours).
    • Bulk retrieval: Around $0.0025 per GB (within 5-12 hours).

Amazon S3 Glacier Deep Archive:

  • Storage cost: Around $0.00099 per GB per month.
  • Retrieval cost:
    • Standard retrieval: Around $0.02 per GB (within 12 hours).
    • Bulk retrieval: Around $0.0025 per GB (within 48 hours).
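
To put these rates in perspective, here is a small back-of-the-envelope calculation in Node using the approximate per-GB storage prices listed above (actual prices vary by region and change over time):

// Rough monthly storage cost comparison for 1 TB of data.
const ratesPerGbMonth = {
  standard: 0.023,
  oneZoneIa: 0.01,
  glacier: 0.004,
  deepArchive: 0.00099,
};

const sizeGb = 1024;

for (const [tier, rate] of Object.entries(ratesPerGbMonth)) {
  console.log(`${tier}: ~$${(sizeGb * rate).toFixed(2)} per month`);
}
// Prints roughly $23.55 for Standard down to $1.01 for Deep Archive;
// the archival tiers trade cheap storage for retrieval fees and delays.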

Basic usage

One of the most basic use cases for S3 is uploading objects to a bucket and downloading them using the AWS SDK.

Resource initialization

locals {
  bucket_name = "bucket-name"
}

resource "aws_s3_bucket" "bucket" {
  bucket = local.bucket_name
}

Uploading and downloading in Node

const AWS = require('aws-sdk');
const fs = require('fs');

AWS.config.update({ region: 'us-west-2' });
const s3 = new AWS.S3({ apiVersion: '2006-03-01' });

const BUCKET_NAME = 'bucket-name';
const OBJECT_KEY = 'file.ext';
const LOCAL_FILE_PATH = './path/to/file.ext';
const LOCAL_OUTPUT_FILE_PATH = './path/to/output/file.ext';

async function uploadFile(filePath, key) {
  const fileStream = fs.createReadStream(filePath);
  const params = {
    Bucket: BUCKET_NAME,
    Key: key,
    Body: fileStream,
  };

  try {
    const result = await s3.upload(params).promise();
    console.log(`File uploaded successfully: ${result.Location}`);
  } catch (error) {
    console.error(`File upload failed: ${error.message}`);
  }
}

function downloadFile(key, outputPath) {
  return new Promise((resolve, reject) => {
    const params = {
      Bucket: BUCKET_NAME,
      Key: key,
    };

    const readStream = s3.getObject(params).createReadStream();
    const fileStream = fs.createWriteStream(outputPath);

    readStream.on('error', (error) => {
      console.error(`File download failed: ${error.message}`);
      reject(error);
    });

    fileStream.on('error', (error) => {
      console.error(`File download failed: ${error.message}`);
      reject(error);
    });

    fileStream.on('finish', () => {
      console.log(`File downloaded successfully to ${outputPath}`);
      resolve();
    });

    readStream.pipe(fileStream);
  });
}

// Example usage
(async () => {
  await uploadFile(LOCAL_FILE_PATH, OBJECT_KEY);
  await downloadFile(OBJECT_KEY, LOCAL_OUTPUT_FILE_PATH);
})();

Feature: pre-signing

Pre-signing provides temporary and secure access to objects stored in Amazon S3 without requiring AWS access and secret keys. These URLs are generated using AWS credentials that have permissions to access the specified S3 object. Pre-signed URLs can be used to perform actions like uploading, downloading, or deleting an object in an S3 bucket, and they are typically used to grant temporary access to a private resource.

Resource initialization

There is no specific configuration required on the S3 bucket itself to use pre-signed URLs. However, the AWS credentials used to generate them need permission to perform the desired operations (e.g., GetObject, PutObject, DeleteObject) on the bucket. To grant these permissions, create an AWS Identity and Access Management (IAM) policy that allows the necessary actions on the bucket and attach it to the IAM user or role whose credentials are used to generate the pre-signed URLs.

An IAM policy that allows the GetObject and PutObject actions on a specific S3 bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:getObject",
        "s3:putObject"
      ],
      "Resource": "arn:aws:s3:::bucket-name/*"
    }
  ]
}

To generate pre-signed URLs, the IAM user or role whose credentials sign them must have this policy attached. The Terraform below does the following:

  1. Creates an S3 bucket.
  2. Creates an IAM policy that allows the GetObject and PutObject actions on the S3 bucket.
  3. Creates an IAM user.
  4. Attaches the IAM policy to the IAM user.

resource "aws_s3_bucket" "this" {
  bucket = "bucket-name"
}

resource "aws_iam_policy" "this" {
  name        = "example-policy"
  description = "An example policy for accessing the S3 bucket"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:getObject", "s3:putObject"]
        Resource = "arn:aws:s3:::${aws_s3_bucket.this.bucket}/*"
      }
    ]
  })
}

resource "aws_iam_user" "this" {
  name = "example-user"
}

resource "aws_iam_user_policy_attachment" "this" {
  user       = aws_iam_user.this.name
  policy_arn = aws_iam_policy.this.arn
}

Pre-signing and uploading in Node

Once an S3 bucket and a user with PutObject/GetObject permissions on it are initialized, pre-signed URLs can be generated and used for uploading and downloading.

A typical flow of interacting with an S3 bucket using pre-signing:

  1. The Client requests a pre-signed URL from the Backend.
  2. The Backend generates a pre-signed URL by communicating with the AWS S3 service.
  3. AWS S3 returns the generated pre-signed URL to the Backend.
  4. The Backend sends the pre-signed URL to the Client.
  5. The Client uses the pre-signed URL to either upload or download a file directly to/from the AWS S3 service.
  6. AWS S3 confirms the transfer is complete and sends a response back to the Client.

The same flow as a sequence diagram:

sequenceDiagram
    participant Client
    participant Backend
    participant S3

    Client->>Backend: Request pre-signed URL
    activate Backend
    Backend->>S3: Generate pre-signed URL
    activate S3
    S3-->>Backend: Pre-signed URL
    deactivate S3
    Backend-->>Client: Pre-signed URL
    deactivate Backend
    Client->>S3: Upload/Download using pre-signed URL
    activate S3
    S3-->>Client: Transfer complete
    deactivate S3

Node code for generating a pre-signed URL for uploading an object to S3, and then using the URL for uploading data:

const AWS = require('aws-sdk');
const axios = require('axios');
const fs = require('fs');

AWS.config.update({ region: 'us-west-2' });
const s3 = new AWS.S3({ apiVersion: '2006-03-01' });

const bucketName = 'example-bucket';
const objectKey = 'file.ext'; // Desired object key
const localFilePath = './path/to/file.ext'; // File path

// Generate a pre-signed URL for uploading
async function generateUploadUrl() {
  return new Promise((resolve, reject) => {
    const params = {
      Bucket: bucketName,
      Key: objectKey,
      Expires: 60 * 60, // URL expires in 1 hour
    };

    // 'putObject' for uploading
    // 'getObject' for downloading
    s3.getSignedUrl('putObject', params, (error, url) => {
      if (error) {
        console.error('Error generating pre-signed URL:', error);
        reject(error);
      } else {
        console.log('Pre-signed URL for uploading:', url);
        resolve(url);
      }
    });
  });
}

async function uploadFile(uploadUrl, localFilePath) {
  try {
    const fileStream = fs.createReadStream(localFilePath);
    const response = await axios.put(uploadUrl, fileStream, {
      headers: {
        'Content-Type': 'binary/octet-stream',
      },
    });
    console.log('File uploaded successfully:', response.status);
  } catch (error) {
    console.error('Error uploading file:', error);
  }
}

(async () => {
  const uploadUrl = await generateUploadUrl();
  await uploadFile(uploadUrl, localFilePath);
})();

Reasons to use pre-signed S3 URLs:

  1. Security: Pre-signed URLs provide a secure way to grant temporary access to private resources in an S3 bucket. By using a pre-signed URL, one doesn't need to share their AWS access and secret keys with clients or users, reducing the risk of exposing sensitive credentials.
  2. Temporary Access: Pre-signed URLs have an expiration time, which means the access to the resource is only available for a limited period. Once the URL expires, it can no longer be used to access the object, providing an extra layer of security.
  3. Ease of use: Pre-signed URLs can be easily generated and shared with clients, users, or other systems that need temporary access to a private S3 object. They can be used with standard HTTP clients or libraries without requiring the AWS SDK or additional authentication mechanisms (see the sketch after this list).
  4. Offloading to clients: By using pre-signed URLs, one can offload the work of uploading or downloading files to the client, reducing the load on their application server. This can improve the application's performance and reduce costs, as the data transfer happens directly between the client and S3, bypassing the server.
  5. Fine-grained permissions: One can generate pre-signed URLs for specific operations (like upload, download, or delete) and specific objects in their S3 bucket. This allows them to grant fine-grained, temporary permissions to clients or users, without giving them access to the entire S3 bucket or AWS account.
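
As an illustration of point 3, a client that holds only a pre-signed getObject URL can download the object with a plain HTTP client and no AWS credentials. A minimal sketch using axios (the URL below is a placeholder for one generated as shown earlier):

const axios = require('axios');
const fs = require('fs');

// A pre-signed getObject URL received from the backend (placeholder value)
const downloadUrl = 'https://example-bucket.s3.us-west-2.amazonaws.com/file.ext?X-Amz-Signature=EXAMPLE';

async function downloadWithPresignedUrl(url, outputPath) {
  // Plain HTTP GET; the signature in the query string authorizes the request
  const response = await axios.get(url, { responseType: 'stream' });

  await new Promise((resolve, reject) => {
    const fileStream = fs.createWriteStream(outputPath);
    response.data.pipe(fileStream);
    fileStream.on('finish', resolve);
    fileStream.on('error', reject);
  });

  console.log(`File downloaded successfully to ${outputPath}`);
}

(async () => {
  await downloadWithPresignedUrl(downloadUrl, './path/to/output/file.ext');
})();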

Feature: multipart upload

Multipart upload enables users to upload large objects (files) to an S3 bucket in smaller parts rather than as a single, monolithic object. This approach offers several advantages, particularly when dealing with large files or transferring data over unreliable networks. S3 doesn't require any specific configuration to enable multipart uploads; the feature is available by default for all buckets.

A general outline of the multipart upload process:

  1. Initiate the multipart upload: A request is sent to Amazon S3 to initiate a new multipart upload. S3 returns a unique upload ID, which will be used to associate the parts and complete the upload process.
  2. Upload the parts: Upload each part of the file separately, specifying the upload ID and the part number. Parts can be uploaded in any order and can even be uploaded in parallel to improve performance.
  3. Complete the multipart upload: After successfully uploading all parts, send a request to Amazon S3 to complete the upload. This request includes the upload ID and information about each uploaded part (e.g., part number and ETag). S3 then combines the parts to create the final object.

The process as a sequence diagram:

sequenceDiagram
    participant User
    participant S3

    User->>S3: Initiate multipart upload
    S3->>User: Return upload ID
    Note over User,S3: Parallel uploads begin

    User->>S3: Upload Part 1 (with upload ID & part number)
    S3->>User: Return ETag for Part 1
    User->>S3: Upload Part 2 (with upload ID & part number)
    S3->>User: Return ETag for Part 2
    User->>S3: Upload Part 3 (with upload ID & part number)
    S3->>User: Return ETag for Part 3

    Note over User,S3: Parallel uploads end
    User->>S3: Complete multipart upload (with upload ID, part numbers & ETags)
    S3->>User: Confirm upload completion

Code in Node:

const AWS = require('aws-sdk');
const fs = require('fs');
const path = require('path');

AWS.config.update({ region: 'us-west-2' });
const s3 = new AWS.S3({ apiVersion: '2006-03-01' });

const bucketName = 'bucket-name'; // S3 bucket name
const fileName = './path/to/file.ext'; // File path

async function uploadFileMultipart(fileName, bucketName) {
  try {
    const fileSize = fs.statSync(fileName).size;
    const partSize = Math.ceil(fileSize / 3); // Divide the file into 3 equal parts
    // Note: S3 requires every part except the last to be at least 5 MB,
    // so this fixed three-way split assumes a sufficiently large file.

    // Step 1: Initiate the multipart upload
    const initUpload = await s3.createMultipartUpload({
      Bucket: bucketName,
      Key: path.basename(fileName),
    }).promise();

    const uploadId = initUpload.UploadId;
    const partPromises = [];

    // Step 2: Upload the parts
    for (let i = 0; i < 3; i++) {
      const start = i * partSize;
      const end = Math.min((i + 1) * partSize, fileSize);
      const filePart = fs.createReadStream(fileName, { start, end: end - 1 });

      const uploadPart = s3.uploadPart({
        Bucket: bucketName,
        Key: path.basename(fileName),
        PartNumber: i + 1,
        UploadId: uploadId,
        Body: filePart,
      }).promise();

      partPromises.push(uploadPart);
    }

    const uploadedParts = await Promise.all(partPromises);
    const parts = uploadedParts.map((part, index) => ({
      ETag: part.ETag,
      PartNumber: index + 1,
    }));

    // Step 3: Complete the multipart upload
    const completeUpload = await s3.completeMultipartUpload({
      Bucket: bucketName,
      Key: path.basename(fileName),
      UploadId: uploadId,
      MultipartUpload: {
        Parts: parts,
      },
    }).promise();

    console.log('File uploaded successfully:', completeUpload.Location);
  } catch (error) {
    console.error('Error uploading the file:', error);
  }
}

(async () => {
  await uploadFileMultipart(fileName, bucketName);
})();

Why use multipart upload:

  1. Improved performance: Uploading large files in parts allows for parallel uploads, which can significantly improve upload speeds. This is particularly useful when dealing with massive files, as the overall time required for the transfer is reduced.
  2. Network reliability: If a network connection is unstable or unreliable, uploading a large file as a single object might lead to failed uploads or the need to restart the entire process in case of an interruption. With multipart uploads, if a part fails to upload, it can be retried individually without affecting the other parts, saving time and bandwidth.
  3. Resume capability: Multipart uploads enable users to resume an upload after a failure or interruption. As each part is uploaded independently, the process can be resumed from the last successful part, rather than starting over from the beginning.
  4. Error handling: Uploading a large object as a single entity can lead to timeout errors or request failures. With multipart uploads, the likelihood of encountering such issues is reduced, as smaller parts are less likely to trigger timeouts or fail.
  5. Flexibility: Multipart uploads allow one to adjust the part size to suit their specific requirements, such as maximizing upload speed, minimizing the risk of failure, or optimizing the use of available resources.

Feature: transfer acceleration (S3TA)

S3TA improves transfer performance by routing traffic through Amazon CloudFront's globally distributed edge locations and over the AWS backbone network, and by using network protocol optimizations. S3TA is enabled on the target bucket and requires virtual-hosted-style requests such as <bucketname>.s3-accelerate.amazonaws.com or the dual-stack <bucketname>.s3-accelerate.dualstack.amazonaws.com.

Here's how to configure an S3 bucket with Transfer Acceleration enabled using Terraform:

resource "aws_s3_bucket" "this" {
  bucket = "example-bucket"
}

resource "aws_s3_bucket_acceleration_configuration" "this" {
  bucket = aws_s3_bucket.this.id
  enabled = true
}

output "s3_bucket_transfer_acceleration_endpoint" {
  value = aws_s3_bucket.this.bucket_regional_domain_name
}

The "s3_bucket_transfer_acceleration_endpoint" output is the accelerated endpoint that should be used when interacting with the bucket.

Code in Node with AWS SDK that uploads a file to the bucket:

const AWS = require('aws-sdk');
const fs = require('fs');

AWS.config.update({ region: 'us-west-2' });

const bucketName = 'example-bucket';
const filePath = './path/to/file.ext'; // File path
const key = 'file.ext'; // Desired object key

// With useAccelerateEndpoint enabled, the SDK sends requests to the accelerated
// endpoint (the "s3_bucket_transfer_acceleration_endpoint" output from Terraform,
// i.e. <bucketname>.s3-accelerate.amazonaws.com)
const s3 = new AWS.S3({
  apiVersion: '2006-03-01',
  useAccelerateEndpoint: true
});

fs.readFile(filePath, (err, data) => {
  if (err) {
    console.error('Error reading the file:', err);
    return;
  }

  const params = {
    Bucket: bucketName,
    Key: key,
    Body: data
  };

  s3.putObject(params, (err, data) => {
    if (err) {
      console.error('Error uploading the file:', err);
      return;
    }
    console.log('File uploaded successfully:', data);
  });
});

Feature: cross-region replication (CRR)

S3 provides a feature called Cross-Region Replication (CRR), which allows one to configure replication of objects to different regions. Replication among AZs within the target region is done automatically.

CRR can be useful in scenarios such as compliance requirements, minimizing latency, or providing higher durability and availability. When using CRR, a replication configuration must be set up, specifying the source and destination buckets in different regions.

TODO: add a full CRR setup example in Terraform and its usage in Node
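
In the meantime, here is a minimal sketch of a CRR setup in Terraform. The bucket names, regions, and role name are placeholders; versioning must be enabled on both buckets, and the replication role also needs an S3 permissions policy (read on the source, replicate on the destination), which is omitted here for brevity:

provider "aws" {
  region = "us-west-2"
}

provider "aws" {
  alias  = "replica"
  region = "us-east-1"
}

resource "aws_s3_bucket" "source" {
  bucket = "example-source-bucket"
}

resource "aws_s3_bucket" "destination" {
  provider = aws.replica
  bucket   = "example-destination-bucket"
}

# Versioning is required on both buckets for replication
resource "aws_s3_bucket_versioning" "source" {
  bucket = aws_s3_bucket.source.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_versioning" "destination" {
  provider = aws.replica
  bucket   = aws_s3_bucket.destination.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Role that S3 assumes to copy objects into the destination bucket
resource "aws_iam_role" "replication" {
  name = "s3-crr-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "s3.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_s3_bucket_replication_configuration" "this" {
  depends_on = [aws_s3_bucket_versioning.source]

  bucket = aws_s3_bucket.source.id
  role   = aws_iam_role.replication.arn

  rule {
    id     = "replicate-everything"
    status = "Enabled"

    destination {
      bucket = aws_s3_bucket.destination.arn
    }
  }
}

Once replication is configured, Node code that writes to the source bucket needs no changes; S3 replicates new objects to the destination bucket asynchronously.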

Feature: file gateway

S3 File Gateway (a gateway type of the AWS Storage Gateway service) enables on-premises applications and workloads to access, store, and manage data in S3 using NFS (Network File System) and SMB (Server Message Block). It helps integrate cloud storage with on-premises environments.

To expose an S3 bucket through an NFS file share, one needs a Storage Gateway, the S3 bucket itself, and an IAM role with the necessary permissions. Here's a reference setup:

locals {
  bucket_name            = "bucket-name"
  gateway_name           = "nfs-file-gateway"
  gateway_activation_key = "activation-key" # Must use secrets management service
}

resource "aws_s3_bucket" "bucket" {
  bucket = local.bucket_name
}

resource "aws_storagegateway_gateway" "nfs_gateway" {
  gateway_name     = locals.gateway_name
  gateway_timezone = "GMT"
  gateway_type     = "FILE_S3"
  activation_key   = locals.gateway_activation_key
}

resource "aws_storagegateway_nfs_file_share" "nfs_file_share" {
  client_list  = ["0.0.0.0/0"]
  gateway_arn  = aws_storagegateway_gateway.nfs_gateway.arn
  location_arn = aws_s3_bucket.bucket.arn
}
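
Once the NFS share is mounted on an on-premises host, applications use ordinary file I/O and the gateway takes care of storing the data as objects in the S3 bucket. A minimal Node sketch (the mount path is a placeholder):

const fs = require('fs');
const path = require('path');

// Path where the NFS file share is mounted on the on-premises host (placeholder)
const mountPath = '/mnt/nfs-file-gateway';

// Writing a file to the mount is all it takes; the File Gateway
// uploads it as an object to the backing S3 bucket.
fs.writeFileSync(path.join(mountPath, 'report.txt'), 'hello from on-prem');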