Tag Archives: Amazon Elastic File System (EFS)

Using Amazon EFS for AWS Lambda in your serverless applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/

Serverless applications are event-driven, using ephemeral compute functions to integrate services and transform data. While AWS Lambda includes a 512-MB temporary file system for your code, this is an ephemeral scratch resource not intended for durable storage.

Amazon EFS is a fully managed, elastic, shared file system designed to be consumed by other AWS services, such as Lambda. With the release of Amazon EFS for Lambda, you can now easily share data across function invocations. You can also read large reference data files, and write function output to a persistent and shared store. There is no additional charge for using file systems from your Lambda function within the same VPC.

EFS for Lambda makes it simpler to use a serverless architecture to implement many common workloads. It opens new capabilities, such as building and importing large code libraries directly into your Lambda functions. Since the code is loaded dynamically, you can also ensure that the latest version of these libraries is always used by every new execution environment. For appending to existing files, EFS is also a preferred option to using Amazon S3.

This blog post shows how to enable EFS for Lambda in your AWS account, and walks through some common use-cases.

Capabilities and behaviors of Lambda with EFS

EFS is built to scale on demand to petabytes of data, growing and shrinking automatically as files are written and deleted. When used with Lambda, your code has low-latency access to a file system where data is persisted after the function terminates.

EFS is a highly reliable NFS-based regional service, with all data stored durably across multiple Availability Zones. It is cost-optimized, due to no provisioning requirements, and no purchase commitments. It uses built-in lifecycle management to optimize between SSD-performance class and an infrequent access class that offer 92% lower cost.

EFS offers two performance modes – general purpose and MaxIO. General purpose is suitable for most Lambda workloads, providing lower operational latency and higher performance for individual files.

You also choose between two throughput modes – bursting and provisioned. The bursting mode uses a credit system to determine when a file system can burst. With bursting, your throughput is calculated based upon the amount of data you are storing. Provisioned throughput is useful when you need more throughout than provided by the bursting mode. Total throughput available is divided across the number of concurrent Lambda invocations.

The Lambda service mounts EFS file systems when the execution environment is prepared. This adds minimal latency when the function is invoked for the first time, often within hundreds of milliseconds. When the execution environment is already warm from previous invocations, the EFS mount is already available.

EFS can be used with Provisioned Concurrency for Lambda. When the reserved capacity is prepared, the Lambda service also configures and mounts EFS file system. Since Provisioned Concurrency executes any initialization code, any libraries or packages consumed from EFS at this point are downloaded. In this use-case, it’s recommended to use provisioned throughout when configuring EFS.

The EFS file system is shared across Lambda functions as it scales up the number of concurrent executions. As files are written by one instance of a Lambda function, all other instances can access and modify this data, depending upon the access point permissions. The EFS file system scales with your Lambda functions, supporting up to 25,000 concurrent connections.

Creating an EFS file system

Configuring EFS for Lambda is straight-forward. I show how to do this in the AWS Management Console but you can also use the AWS CLI, AWS SDK, AWS Serverless Application Model (AWS SAM), and AWS CloudFormation. EFS file systems are always created within a customer VPC, so Lambda functions using the EFS file system must all reside in the same VPC.

To create an EFS file system:

  1. Navigate to the EFS console.
  2. Choose Create File System.
    EFS: Create File System
  3. On the Configure network access page, select your preferred VPC. Only resources within this VPC can access this EFS file system. Accept the default mount targets, and choose Next Step.
  4. On Configure file system settings, you can choose to enable encryption of data at rest. Review this setting, then accept the other defaults and choose Next Step. This uses bursting mode instead of provisioned throughout.
  5. On the Configure client access page, choose Add access point.
    EFS: Add access point
  6. Enter the following parameters. This configuration creates a file system with open read/write permissions – read more about settings to secure your access points. Choose Next Step.EFS: Access points
  7. On the Review and create page, check your settings and choose Create File System.
  8. In the EFS console, you see the new file system and its configuration. Wait until the Mount target state changes to Available before proceeding to the next steps.

Alternatively, you can use CloudFormation to create the EFS access point. With the AWS::EFS::AccessPoint resource, the preceding configuration is defined as follows:

  AccessPointResource:
    Type: 'AWS::EFS::AccessPoint'
    Properties:
      FileSystemId: !Ref FileSystemResource
      PosixUser:
        Uid: "1000"
        Gid: "1000"
      RootDirectory:
        CreationInfo:
          OwnerGid: "1000"
          OwnerUid: "1000"
          Permissions: "0777"
        Path: "/lambda"

For more information, see the example setup template in the code repository.

Working with AWS Cloud9 and Amazon EC2

You can mount EFS access points on Amazon EC2 instances. This can be useful for browsing file systems contents and downloading files from other locations. The EFS console shows customized mount instructions directly under each created file system:

EFS customized mount instructions

The instance must have access to the same security group and reside in the same VPC as the EFS file system. After connecting via SSH to the EC2 instance, you mount the EFS mount target to a directory. You can also mount EFS in AWS Cloud9 instances using the terminal window.

Any files you write into the EFS file system are available to any Lambda functions using the same EFS file system. Similarly, any files written by Lambda functions are available to the EC2 instance.

Sharing large code packages with Lambda

EFS is useful for sharing software packages or binaries that are otherwise too large for Lambda layers. You can copy these to EFS and have Lambda use these packages as if there are installed in the Lambda deployment package.

For example, on EFS you can install Puppeteer, which runs a headless Chromium browser, using the following script run on an EC2 instance or AWS Cloud9 terminal:

  mkdir node && cd node
  npm init -y
  npm i puppeteer --save

Building packages in EC2 for EFS

You can then use this package from a Lambda function connected to this folder in the EFS file system. You include the Puppeteer package with the mount path in the require declaration:

const puppeteer = require ('/mnt/efs/node/node_modules/puppeteer')

In Node.js, to avoid changing declarations manually, you can add the EFS mount path to the Node.js module search path by using app-module-path. Lambda functions support a range of other runtimes, including Python, Java, and Go. Many other runtimes offer similar ways to add the EFS path to the list of default package locations.

There is an important difference between using packages in EFS compared with Lambda layers. When you use Lambda layers to include packages, these are downloaded to an immutable code package. Any changes to the underlying layer do not affect existing functions published using that layer.

Since EFS is a dynamic binding, any changes or upgrades to packages are available immediately to the Lambda function when the execution environment is prepared. This means you can output a build process to an EFS mount, and immediately consume any new versions of the build from a Lambda function.

Configuring AWS Lambda to use EFS

Lambda functions that access EFS must run from within a VPC. Read this guide to learn more about setting up Lambda functions to access resources from a VPC. There are also sample CloudFormation templates you can use to configure private and public VPC access.

The execution role for Lambda function must provide access to the VPC and EFS. For development and testing purposes, this post uses the AWSLambdaVPCAccessExecutionRole and AmazonElasticFileSystemClientFullAccess managed policies in IAM. For production systems, you should use more restrictive policies to control access to EFS resources.

Once your Lambda function is configured to use a VPC, next configure EFS in Lambda:

  1. Navigate to the Lambda console and select your function from the list.
  2. Scroll down to the File system panel, and choose Add file system.
    EFS: Add file system
  3. In the File system configuration:
  • From the EFS file system dropdown, select the required file system. From the Access point dropdown, choose the required EFS access point.
  • In the Local mount path, enter the path your Lambda function uses to access this resource. Enter an absolute path.
  • Choose Save.
    EFS: Add file system

The File system panel now shows the configuration of the EFS mount, and the function is ready to use EFS. Alternatively, you can use an AWS Serverless Application Model (SAM) template to add the EFS configuration to a function resource:

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  MyLambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
	...
      FileSystemConfigs:
      - Arn: arn:aws:elasticfilesystem:us-east-1:xxxxxx:accesspoint/
fsap-123abcdef12abcdef
        LocalMountPath: /mnt/efs

To learn more, see the SAM documentation on this feature.

Example applications

You can view and download these examples from this GitHub repository. To deploy, follow the instructions in the repo’s README.md file.

1. Processing large video files

The first example uses EFS to process a 60-minute MP4 video and create screenshots for each second of the recording. This uses to the FFmpeg Linux package to process the video. After copying the MP4 to the EFS file location, invoke the Lambda function to create a series of JPG frames. This uses the following code to execute FFmpeg and pass the EFS mount path and input file parameters:

const os = require('os')

const inputFile = process.env.INPUT_FILE
const efsPath = process.env.EFS_PATH

const { exec } = require('child_process')

const execPromise = async (command) => {
	console.log(command)
	return new Promise((resolve, reject) => {
		const ls = exec(command, function (error, stdout, stderr) {
		  if (error) {
		    console.log('Error: ', error)
		    reject(error)
		  }
		  console.log('stdout: ', stdout);
		  console.log('stderr: ' ,stderr);
		})
		
		ls.on('exit', function (code) {
		  console.log('Finished: ', code);
		  resolve()
		})
	})
}

// The Lambda handler
exports.handler = async function (eventObject, context) {
	await execPromise(`/opt/bin/ffmpeg -loglevel error -i ${efsPath}/${inputFile} -s 240x135 -vf fps=1 ${efsPath}/%d.jpg`)
}

In this example, the process writes more than 2000 individual JPG files back to the EFS file system during a single invocation:

Console output from sample application

2. Archiving large numbers of files

Using the output from the first application, the second example creates a single archive file from the JPG files. The code uses the Node.js archiver package for processing:

const outputFile = process.env.OUTPUT_FILE
const efsPath = process.env.EFS_PATH

const fs = require('fs')
const archiver = require('archiver')

// The Lambda handler
exports.handler = function (event) {

  const output = fs.createWriteStream(`${efsPath}/${outputFile}`)
  const archive = archiver('zip', {
    zlib: { level: 9 } // Sets the compression level.
  })
  
  output.on('close', function() {
    console.log(archive.pointer() + ' total bytes')
  })
  
  output.on('end', function() {
    console.log('Data has been drained')
  })
  
  archive.pipe(output)  

  // append files from a glob pattern
  archive.glob(`${efsPath}/*.jpg`)
  archive.finalize()
}

After executing this Lambda function, the resulting ZIP file is written back to the EFS file system:

Console output from second sample application.

3. Unzipping archives with a large number of files

The last example shows how to unzip an archive containing many files. This uses the Node.js unzipper package for processing:

const inputFile = process.env.INPUT_FILE
const efsPath = process.env.EFS_PATH
const destinationDir = process.env.DESTINATION_DIR

const fs = require('fs')
const unzipper = require('unzipper')

// The Lambda handler
exports.handler = function (event) {

  fs.createReadStream(`${efsPath}/${inputFile}`)
    .pipe(unzipper.Extract({ path: `${efsPath}/${destinationDir}` }))

}

Once this Lambda function is executed, the archive is unzipped into a destination direction in the EFS file system. This example shows the screenshots unzipped into the frames subdirectory:

Console output from third sample application.

Conclusion

EFS for Lambda allows you to share data across function invocations, read large reference data files, and write function output to a persistent and shared store. After configuring EFS, you provide the Lambda function with an access point ARN, allowing you to read and write to this file system. Lambda securely connects the function instances to the EFS mount targets in the same Availability Zone and subnet.

EFS opens a range of potential new use-cases for Lambda. In this post, I show how this enables you to access large code packages and binaries, and process large numbers of files. You can interact with the file system via EC2 or AWS Cloud9 and pass information to and from your Lambda functions.

EFS for Lambda is supported at launch in APN Partner solutions, including Epsagon, Lumigo, Datadog, HashiCorp Terraform, and Pulumi. To learn more about how to use EFS for Lambda, see the AWS News Blog post and read the documentation.

New – A Shared File System for Your Lambda Functions

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/

I am very happy to announce that AWS Lambda functions can now mount an Amazon Elastic File System (EFS), a scalable and elastic NFS file system storing data within and across multiple availability zones (AZ) for high availability and durability. In this way, you can use a familiar file system interface to store and share data across all concurrent execution environments of one, or more, Lambda functions. EFS supports full file system access semantics, such as strong consistency and file locking.

To connect an EFS file system with a Lambda function, you use an EFS access point, an application-specific entry point into an EFS file system that includes the operating system user and group to use when accessing the file system, file system permissions, and can limit access to a specific path in the file system. This helps keeping file system configuration decoupled from the application code.

You can access the same EFS file system from multiple functions, using the same or different access points. For example, using different EFS access points, each Lambda function can access different paths in a file system, or use different file system permissions.

You can share the same EFS file system with Amazon Elastic Compute Cloud (EC2) instances, containerized applications using Amazon ECS and AWS Fargate, and on-premises servers. Following this approach, you can use different computing architectures (functions, containers, virtual servers) to process the same files. For example, a Lambda function reacting to an event can update a configuration file that is read by an application running on containers. Or you can use a Lambda function to process files uploaded by a web application running on EC2.

In this way, some use cases are much easier to implement with Lambda functions. For example:

  • Processing or loading data larger than the space available in /tmp (512MB).
  • Loading the most updated version of files that change frequently.
  • Using data science packages that require storage space to load models and other dependencies.
  • Saving function state across invocations (using unique file names, or file system locks).
  • Building applications requiring access to large amounts of reference data.
  • Migrating legacy applications to serverless architectures.
  • Interacting with data intensive workloads designed for file system access.
  • Partially updating files (using file system locks for concurrent access).
  • Moving a directory and all its content within a file system with an atomic operation.

Creating an EFS File System
To mount an EFS file system, your Lambda functions must be connected to an Amazon Virtual Private Cloud that can reach the EFS mount targets. For simplicity, I am using here the default VPC that is automatically created in each AWS Region.

Note that, when connecting Lambda functions to a VPC, networking works differently. If your Lambda functions are using Amazon Simple Storage Service (S3) or Amazon DynamoDB, you should create a gateway VPC endpoint for those services. If your Lambda functions need to access the public internet, for example to call an external API, you need to configure a NAT Gateway. I usually don’t change the configuration of my default VPCs. If I have specific requirements, I create a new VPC with private and public subnets using the AWS Cloud Development Kit, or use one of these AWS CloudFormation sample templates. In this way, I can manage networking as code.

In the EFS console, I select Create file system and make sure that the default VPC and its subnets are selected. For all subnets, I use the default security group that gives network access to other resources in the VPC using the same security group.

In the next step, I give the file system a Name tag and leave all other options to their default values.

Then, I select Add access point. I use 1001 for the user and group IDs and limit access to the /message path. In the Owner section, used to create the folder automatically when first connecting to the access point, I use the same user and group IDs as before, and 750 for permissions. With this permissions, the owner can read, write, and execute files. Users in the same group can only read. Other users have no access.

I go on, and complete the creation of the file system.

Using EFS with Lambda Functions
To start with a simple use case, let’s build a Lambda function implementing a MessageWall API to add, read, or delete text messages. Messages are stored in a file on EFS so that all concurrent execution environments of that Lambda function see the same content.

In the Lambda console, I create a new MessageWall function and select the Python 3.8 runtime. In the Permissions section, I leave the default. This will create a new AWS Identity and Access Management (IAM) role with basic permissions.

When the function is created, in the Permissions tab I click on the IAM role name to open the role in the IAM console. Here, I select Attach policies to add the AWSLambdaVPCAccessExecutionRole and AmazonElasticFileSystemClientReadWriteAccess AWS managed policies. In a production environment, you can restrict access to a specific VPC and EFS access point.

Back in the Lambda console, I edit the VPC configuration to connect the MessageWall function to all subnets in the default VPC, using the same default security group I used for the EFS mount points.

Now, I select Add file system in the new File system section of the function configuration. Here, I choose the EFS file system and accesss point I created before. For the local mount point, I use /mnt/msg and Save. This is the path where the access point will be mounted, and corresponds to the /message folder in my EFS file system.

In the Function code editor of the Lambda console, I paste the following code and Save.

import os
import fcntl

MSG_FILE_PATH = '/mnt/msg/content'


def get_messages():
    try:
        with open(MSG_FILE_PATH, 'r') as msg_file:
            fcntl.flock(msg_file, fcntl.LOCK_SH)
            messages = msg_file.read()
            fcntl.flock(msg_file, fcntl.LOCK_UN)
    except:
        messages = 'No message yet.'
    return messages


def add_message(new_message):
    with open(MSG_FILE_PATH, 'a') as msg_file:
        fcntl.flock(msg_file, fcntl.LOCK_EX)
        msg_file.write(new_message + "\n")
        fcntl.flock(msg_file, fcntl.LOCK_UN)


def delete_messages():
    try:
        os.remove(MSG_FILE_PATH)
    except:
        pass


def lambda_handler(event, context):
    method = event['requestContext']['http']['method']
    if method == 'GET':
        messages = get_messages()
    elif method == 'POST':
        new_message = event['body']
        add_message(new_message)
        messages = get_messages()
    elif method == 'DELETE':
        delete_messages()
        messages = 'Messages deleted.'
    else:
        messages = 'Method unsupported.'
    return messages

I select Add trigger and in the configuration I select the Amazon API Gateway. I create a new HTTP API. For simplicity, I leave my API endpoint open.

With the API Gateway trigger selected, I copy the endpoint of the new API I just created.

I can now use curl to test the API:

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
No message yet.
$ curl -X POST -H "Content-Type: text/plain" -d 'Hello from EFS!' https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
Hello from EFS!

$ curl -X POST -H "Content-Type: text/plain" -d 'Hello again :)' https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
Hello from EFS!
Hello again :)

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
Hello from EFS!
Hello again :)

$ curl -X DELETE https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
Messages deleted.

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
No message yet.

It would be relatively easy to add unique file names (or specific subdirectories) for different users and extend this simple example into a more complete messaging application. As a developer, I appreciate the simplicity of using a familiar file system interface in my code. However, depending on your requirements, EFS throughput configuration must be taken into account. See the section Understanding EFS performance later in the post for more information.

Now, let’s use the new EFS file system support in AWS Lambda to build something more interesting. For example, let’s use the additional space available with EFS to build a machine learning inference API processing images.

Building a Serverless Machine Learning Inference API
To create a Lambda function implementing machine learning inference, I need to be able, in my code, to import the necessary libraries and load the machine learning model. Often, when doing so, the overall size of those dependencies goes beyond the current AWS Lambda limits in the deployment package size. One way of solving this is to accurately minimize the libraries to ship with the function code, and then download the model from an S3 bucket straight to memory (up to 3 GB, including the memory required for processing the model) or to /tmp (up 512 MB). This custom minimization and download of the model has never been easy to implement. Now, I can use an EFS file system.

The Lambda function I am building this time needs access to the public internet to download a pre-trained model and the images to run inference on. So I create a new VPC with public and private subnets, and configure a NAT Gateway and the route table used by the the private subnets to give access to the public internet. Using the AWS Cloud Development Kit, it’s just a few lines of code.

I create a new EFS file system and an access point in the new VPC using similar configurations as before. This time, I use /ml for the access point path.

Then, I create a new MLInference Lambda function with the same set up as before for permissions and connect the function to the private subnets of the new VPC. Machine learning inference is quite a heavy workload, so I select 3 GB for memory and 5 minutes for timeout. In the File system configuration, I add the new access point and mount it under /mnt/inference.

The machine learning framework I am using for this function is PyTorch, and I need to put the libraries required to run inference in the EFS file system. I launch an Amazon Linux EC2 instance in a public subnet of the new VPC. In the instance details, I select one of the availability zones where I have an EFS mount point, and then Add file system to automatically mount the same EFS file system I am using for the function. For the security groups of the EC2 instance, I select the default security group (to be able to mount the EFS file system) and one that gives inbound access to SSH (to be able to connect to the instance).

I connect to the instance using SSH and create a requirements.txt file containing the dependencies I need:

torch
torchvision
numpy

The EFS file system is automatically mounted by EC2 under /mnt/efs/fs1. There, I create the /ml directory and change the owner of the path to the user and group I am using now that I am connected (ec2-user).

$ sudo mkdir /mnt/efs/fs1/ml
$ sudo chown ec2-user:ec2-user /mnt/efs/fs1/ml

I install Python 3 and use pip to install the dependencies in the /mnt/efs/fs1/ml/lib path:

$ sudo yum install python3
$ pip3 install -t /mnt/efs/fs1/ml/lib -r requirements.txt

Finally, I give ownership of the whole /ml path to the user and group I used for the EFS access point:

$ sudo chown -R 1001:1001 /mnt/efs/fs1/ml

Overall, the dependencies in my EFS file system are using about 1.5 GB of storage.

I go back to the MLInference Lambda function configuration. Depending on the runtime you use, you need to find a way to tell where to look for dependencies if they are not included with the deployment package or in a layer. In the case of Python, I set the PYTHONPATH environment variable to /mnt/inference/lib.

I am going to use PyTorch Hub to download this pre-trained machine learning model to recognize the kind of bird in a picture. The model I am using for this example is relatively small, about 200 MB. To cache the model on the EFS file system, I set the TORCH_HOME environment variable to /mnt/inference/model.

All dependencies are now in the file system mounted by the function, and I can type my code straight in the Function code editor. I paste the following code to have a machine learning inference API:

import urllib
import json
import os

import torch
from PIL import Image
from torchvision import transforms

transform_test = transforms.Compose([
    transforms.Resize((600, 600), Image.BILINEAR),
    transforms.CenterCrop((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

model = torch.hub.load('nicolalandro/ntsnet-cub200', 'ntsnet', pretrained=True,
                       **{'topN': 6, 'device': 'cpu', 'num_classes': 200})
model.eval()


def lambda_handler(event, context):
    url = event['queryStringParameters']['url']

    img = Image.open(urllib.request.urlopen(url))
    scaled_img = transform_test(img)
    torch_images = scaled_img.unsqueeze(0)

    with torch.no_grad():
        top_n_coordinates, concat_out, raw_logits, concat_logits, part_logits, top_n_index, top_n_prob = model(torch_images)

        _, predict = torch.max(concat_logits, 1)
        pred_id = predict.item()
        bird_class = model.bird_classes[pred_id]
        print('bird_class:', bird_class)

    return json.dumps({
        "bird_class": bird_class,
    })

I add the API Gateway as trigger, similarly to what I did before for the MessageWall function. Now, I can use the serverless API I just created to analyze pictures of birds. I am not really an expert in the field, so I looked for a couple of interesting images on Wikipedia:

I call the API to get a prediction for these two pictures:

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MLInference?url=https://path/to/image/atlantic-puffin.jpg

{"bird_class": "106.Horned_Puffin"}

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MLInference?url=https://path/to/image/western-grebe.jpg

{"bird_class": "053.Western_Grebe"}

It works! Looking at Amazon CloudWatch Logs for the Lambda function, I see that the first invocation, when the function loads and prepares the pre-trained model for inference on CPUs, takes about 30 seconds. To avoid a slow response, or a timeout from the API Gateway, I use Provisioned Concurrency to keep the function ready. The next invocations take about 1.8 seconds.

Understanding EFS Performance
When using EFS with your Lambda function, is very important to understand how EFS performance works. For throughput, each file system can be configured to use bursting or provisioned mode.

When using bursting mode, all EFS file systems, regardless of size, can burst at least to 100 MiB/s of throughput. Those over 1 TiB in the standard storage class can burst to 100 MiB/s per TiB of data stored in the file system. EFS uses a credit system to determine when file systems can burst. Each file system earns credits over time at a baseline rate that is determined by the size of the file system that is stored in the standard storage class. A file system uses credits whenever it reads or writes data. The baseline rate is 50 KiB/s per GiB of storage.

You can monitor the use of credits in CloudWatch, each EFS file system has a BurstCreditBalance metric. If you see that you are consuming all credits, and the BurstCreditBalance metric is going to zero, you should enable provisioned throughput mode for the file system, from 1 to 1024 MiB/s. There is an additional cost when using provisioned throughput, based on how much throughput you are adding on top of the baseline rate.

To avoid running out of credits, you should think of the throughput as the average you need during the day. For example, if you have a 10GB file system, you have 500 KiB/s of baseline rate, and every day you can read/write 500 KiB/s * 3600 seconds * 24 hours = 43.2 GiB.

If the libraries and everything you function needs to load during initialization are about 2 GiB, and you only access the EFS file system during function initialization, like in the MLInference Lambda function above, that means you can initialize your function (for example because of updates or scaling up activities) about 20 times per day. That’s not a lot, and you would probably need to configure provisioned throughput for the EFS file system.

If you have 10 MiB/s of provisioned throughput, then every day you have 10 MiB/s * 3600 seconds * 24 hours = 864 GiB to read or write. If you only use the EFS file system at function initialization to read about 2 GB of dependencies, it means that you can have 400 initializations per day. That may be enough for your use case.

In the Lambda function configuration, you can also use the reserve concurrency control to limit the maximum number of execution environments used by a function.

If, by mistake, the BurstCreditBalance goes down to zero, and the file system is relatively small (for example, a few GiBs), there is the possibility that your function gets stuck and can’t execute fast enough before reaching the timeout. In that case, you should enable (or increase) provisioned throughput for the EFS file system, or throttle your function by setting the reserved concurrency to zero to avoid all invocations until the EFS file system has enough credits.

Understanding Security Controls
When using EFS file systems with AWS Lambda, you have multiple levels of security controls. I’m doing a quick recap here because they should all be considered during the design and implementation of your serverless applications. You can find more info on using IAM authorization and access points with EFS in this post.

To connect a Lambda function to an EFS file system, you need:

  • Network visibility in terms of VPC routing/peering and security group.
  • IAM permissions for the Lambda function to access the VPC and mount (read only or read/write) the EFS file system.
  • You can specify in the IAM policy conditions which EFS access point the Lambda function can use.
  • The EFS access point can limit access to a specific path in the file system.
  • File system security (user ID, group ID, permissions) can limit read, write, or executable access for each file or directory mounted by a Lambda function.

The Lambda function execution environment and the EFS mount point uses industry standard Transport Layer Security (TLS) 1.2 to encrypt data in transit. You can provision Amazon EFS to encrypt data at rest. Data encrypted at rest is transparently encrypted while being written, and transparently decrypted while being read, so you don’t have to modify your applications. Encryption keys are managed by the AWS Key Management Service (KMS), eliminating the need to build and maintain a secure key management infrastructure.

Available Now
This new feature is offered in all regions where AWS Lambda and Amazon EFS are available, with the exception of the regions in China, where we are working to make this integration available as soon as possible. For more information on availability, please see the AWS Region table. To learn more, please see the documentation.

EFS for Lambda can be configured using the console, the AWS Command Line Interface (CLI), the AWS SDKs, and the Serverless Application Model. This feature allows you to build data intensive applications that need to process large files. For example, you can now unzip a 1.5 GB file in a few lines of code, or process a 10 GB JSON document. You can also load libraries or packages that are larger than the 250 MB package deployment size limit of AWS Lambda, enabling new machine learning, data modelling, financial analysis, and ETL jobs scenarios.

Amazon EFS for Lambda is supported at launch in AWS Partner Network solutions, including Epsagon, Lumigo, Datadog, HashiCorp Terraform, and Pulumi.

There is no additional charge for using EFS from Lambda functions. You pay the standard price for AWS Lambda and Amazon EFS. Lambda execution environments always connect to the right mount target in an AZ and not across AZs. You can connect to EFS in the same AZ via cross account VPC but there can be data transfer costs for that. We do not support cross region, or cross AZ connectivity between EFS and Lambda.

Danilo

Cost Optimize your Jenkins CI/CD pipelines using EC2 Spot Instances

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/cost-optimize-your-jenkins-ci-cd-pipelines-using-ec2-spot-instances/

Author: Rajesh Kesaraju, Sr. Specialist Solution Architect, EC2 Spot Instances

In this blog post, I go over using Amazon EC2 Spot Instances on continuous integration and continuous deployment (CI/CD) workloads, via the popular open-source automation server Jenkins. I also break down the steps required to adopt Spot Instances into your CI/CD pipelines for cost optimization purposes. In this blog, I explain how to configure your Jenkins environment to achieve significant cost savings by using Spot Instances with the EC2 Fleet Jenkins plugin.

Overview of EC2 Spot, CI/CD, and Jenkins

AWS offers multiple purchasing models for its EC2 instances. This particular blog post focuses on Amazon EC2 Spot Instances, which lets you take advantage of unused EC2 capacity in the AWS Cloud at a steep discount.

You can use Spot Instances for various stateless, fault-tolerant, or flexible applications such as big data, containerized workloads, CI/CD, web servers, high performance computing (HPC), and other test and development workloads.

CI/CD pipelines are familiar to many readers via a popular piece of open-source software called Jenkins. Jenkins’ automation of development, testing and deployment scenarios, courtesy of more than 2000 plugins, plays a key role in many organizations’ software development and delivery ecosystems. Jenkins accelerates software development through multiple stages, including building and documenting, packaging and analytics, staging and deploying, etc.

Lyft began using EC2 Spot Instances for their Jenkins CI pipelines, and discovered they could save up to 90 percent compared to their previous non-Spot EC2 implementations. They moved their entire CI/CD pipeline to EC2 Spot Instances by modifying just four lines of their deployment code.

In this blog, I walk through how to configure your Jenkins environment to achieve significant cost savings by using Spot Instances with the EC2 Fleet Jenkins plugin.

Solution Overview

For the following tutorial, you need both an AWS account and Jenkins downloaded and installed on your system.

This blog post uses Spot Instances. If your Jenkins server runs on On-Demand Instances, you can easily switch to Spot Instances with EC2-Fleet Plugin. Now, let’s look at how this plugin can be configured to make your Jenkins elastically scale up/down depending on pending jobs, and save significantly on compute costs.

Solution

Create a new EC2 key pair

To access the SSH interfaces of your Jenkins instances, you must have an EC2 key pair. Please follow below steps to create a new EC2 key pair.

1. Log in to your AWS Account;
2. Switch to your preferred Region;
3. Provision a new EC2 key pair:

    1. Go to the EC2 console and click on the key pairs option from the left frame.
    2. Click on the Create key pair button;
    3. Provide key pair name and click on the Create button;
    4. Your web browser should download a .pem file – keep this file as it will be required to access the EC2 instances that you create in this workshop. If you’re using a Windows system, convert the .pem file to a PuTTY .ppk file. If you’re not sure how to do this, instructions are available here.

Create an AWS IAM User for EC2-Fleet Plugin

To control Spot Instances from the EC2-Fleet plugin, you first create an IAM user (with programmatic access) in your AWS account. Then configure an IAM policy for AWS permissions. Use the following code to achieve this step.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:*",
                "autoscaling:*",
                "iam:ListInstanceProfiles",
                "iam:ListRoles",
                "iam:PassRole"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

These permissions allow you to configure the plugin, allow programmatic access to AWS resources to create and terminate Spot Instances, and control Auto Scaling Group (ASG) parameters.

In the IAM dashboard, click User and select the Jenkins User you created. Next, click Create access key, and save the Access key ID and Secret access key for use in next steps.

IAM > Users > Jenkins User > Security Credentials > Create Access Key

Create Access Key for Jenkins user fig. 1

Create Access Key for Jenkins user fig. 2

Create an Auto Scaling Group

Auto Scaling groups help you configure your Jenkins EC2-Fleet plugin to control Jenkins build agents and scale up or down depending on the job queue. It also replaces instances that were terminated due to demand spike in specific Spot Instance pools.

Documentation on how to create Auto Scaling Group is here.

Set your ASG to diversify your Spot Fleet across multiple Spot pools to increase your chances of getting a Spot Instance for your Jenkins jobs, and set an allocation strategy. I used “capacity-optimized” as the allocation strategy in the following example.

The “capacity-optimized” option allocates Spot Instances from the deepest pools of available spare capacity, which lowers the chance of interruptions. Alternatively, you may choose the “lowest-price” allocation strategy if you have builds that finish quicker, and the cost of re-processing of failed jobs due to interruption isn’t that significant. Learn more about allocation strategies in this blog.

The Jenkins EC2-Fleet plugin overrides and controls ASG’s capacity configuration. So, start with one instance for now. I also cover a scenario that starts with 0 instances to further minimize costs.

A sample ASG configuration looks like the following image. Notice there are 6 instance types across 3 Availability Zones, this means EC2 Spot capacity is provided from 18 (6×3) Spot Instance pools! This configuration increases the likelihood of getting Spot Instances from the deepest Spot pools at a steep discount.

An example ASG configuration Image

Install and Configure an EC2-Fleet plugin in Jenkins

Install the latest version of EC2-Fleet plugin in Jenkins. This plugin launches Spot Instances using ASG or EC2 Spot Fleet where you can run your build jobs. In this blog, I launch EC2 Spot Instances using ASG.

After installing you see it in the plugin manager. This blog uses current version 2.0.0.

Go to Manage Jenkins > Plugin Manager then install EC2 Fleet Jenkins Plugin

Jenkins Server EC2-Fleet Plugin

In the first part of this solution, you created an AWS user. Now, configure this user in your Jenkins Amazon EC2 Fleet configuration section.

Navigate to Manage Jenkins -> Configure Clouds -> Add a New Cloud -> Amazon EC2 Fleet.

Configure Amazon EC2 Fleet plugin as Cloud Setting

Create a name for your EC2-Fleet plugin configuration. I use Amazon EC2 Spot Fleet. Then configure your AWS Credentials.

Configure ASG in Jenkins EC2-fleet plugin

1. Change the Kind to AWS Credentials;
2. Change the Scope to System (Jenkins and nodes only) – you don’t want your builds to have access to these credentials!
3. At the ID (optional) field. Enter this if need to access this using scripts
4. Provide Access key ID and Secret access key fields, saved when you created Jenkins user before, then click Add.
5. Once you are done adding credentials, select the corresponding AWS Region to your ASG.
6. EC2 Fleet dropdown automatically populates the ASG that you created earlier.
7. Once the ASG is selected, check your configuration. The following image shows this test:

Testing Jenkins EC2 Fleet Plugin configuration

Configure Launcher

1. Change the Kind to SSH Username with private key;
2. Change the Scope to System (Jenkins and nodes only) – you also don’t want your builds to have access to these credentials;
3. Enter ec2-user as the Username.
4. Select the Enter directly button for the Private Key. Open the .pem file that you downloaded previously, and copy the contents of the file to the Key field including the BEGIN RSA PRIVATE KEY and END RSA PRIVATE KEY fields.
5. Verify your launcher looks like as below, and click on the Add button

Configuring Launcher and providing Jenkins credentials

Once your credentials are added, you move on to complete rest of the Launcher configuration.

1. Select the ec2-user option from the Credentials drop-down.
2. Select the “Non verifying Verification Strategy” option from the Host Key Verification Strategy drop-down. Select this option because Spot Instances have a random SSH host fingerprint.

Configuring Jenkins launch agents by ec2-user via SSH

3. Mark the Connect Private check box to ensure that your Jenkins Master always communicates with the Agents via their internal VPC IP addresses (in real-world scenarios, your build      agents would likely not be publicly addressable).
4. Change the Label field to spot-agents.
5. Set the Max Idle Minutes Before Scaledown. In this example, I used AWS launched per-second billing in 2017, so there’s no need to keep a build agent running for too much longer than it’s required.
6. Change the Maximum Cluster Size depending on your need. For example, I set the Maximum Cluster Size to 2.

After saving these configurations, your screen should look similar to the following image.

Configuring Launcher with cluster scaling parameters

Configure Number of Executors

Determine the number of executors based on your build requirements, such as how many builds on average can be executed concurrently on each machine based on machine’s vCPU and RAM allocations.

If you cram many executors into one machine, each build average execution may increase, which slows down the pipeline.

Once you determine optimum executors per machine, any additional pending jobs get executed on scaled out machines by auto scaling.

Some Important aspects about Cluster size settings

Jenkins EC2-Fleet agent settings override ASG settings. So, “Minimum Cluster Size” and “Maximum Cluster Size” values mentioned here override ASG’s settings dynamically.

If you set minimum cluster size as 0, then when there are no pending jobs there won’t be any idle servers after Max Idle Minutes before shut down minutes are met. In this scenario, when there is a new build request, it takes roughly two to five minutes for new EC2 Spot Instances to start processing after boot strapping and installing necessary Jenkins agents.

If your jobs are time-insensitive, this strategy maximizes savings, as you eliminate spending money on idle instances.

Alternatively, you may set “Minimum Cluster Size” as 1 or 2 so you have running instances all the time if you have a need to process several builds/tests a day and/or builds taking very long time and occur on a daily basis.

In this blog the idea is to cost optimize CI/CD pipelines, to avoid idle instances.

Configure Jenkins build Jobs to utilize EC2 Spot Instances

Finally, configure your build job to check “Restrict where this project can be run” and enter “spot-agents” as the label expression. When builds are initiated, they are executed against Spot Instance.

Configuring Jenkins builds to utilize EC2 Spot Instances with Label Expression

At this point, you are ready to run your builds on Spot Instances and save significantly!

Configuring Jenkins server to run on EC2 Spot Instances

Now you started saving on your build agents, is it possible to save on Jenkins server also using Spot Instances? Yes! Let’s see how this can be done with a few simple techniques.

By running your Jenkins server on Spot, you optimize the compute costs associated with the whole Jenkins CI/CD environment. With Spot Instance diversification strategy and making use of ASG features, you can move your Jenkins server also to EC2 Spot Instance.

There is one slight wrinkle in the above process. Jenkins requires persistent data on a local file system, whereas a Spot Instance cannot be guaranteed to be persistent due to the chance of interruptions. Therefore, to switch your Jenkins server over to Spot, you must first move your Jenkins data to an Amazon Elastic File System (EFS) volume — which your Spot Instance can then access.

Amazon Elastic File System provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources. More here

Here are the steps to mount EFS volume to an existing Jenkins server, and move its content to Amazon EFS managed store. Then, you can point the Jenkins server to use EFS mount point for its operations and maintain server state. This way when a Spot Instance gets interrupted, server state is not lost, and another Spot Instance can pick up from where the previous server left off.

Here are the steps move data from JENKINS_HOME to Amazon EFS

1. Mount EFS volume:

sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)\
.%FILE-SYSTEM-ID%.efs.<AWS Region>.amazonaws.com:/ (http://efs.<AWS Region>.amazonaws.com/) /mnt

2. Copy existing JENKINS_HOME (/var/lib/Jenkins) content to EFS after shutting down Jenkins server
3. >> sudo chown jenkins:jenkins /mnt
>> sudo cp -rpv /var/lib/jenkins/* /mnt

Once you’ve moved the contents of JENKINS_HOME to Amazon EFS, now it’s safe to run Jenkins Server to EC2 Spot Instance.

Spot Instances can be interrupted by AWS, so you may lose your Jenkins server access momentarily when an interruption occurs.

Since you already externalized the state to Amazon EFS, you don’t lose any previous state. So, when the instance running your Jenkins server gets interrupted, in a matter of a few minutes, ASG replenishes new EC2 Spot Instance to run your Jenkins Server.

To get this into service automatically complete the following steps:

1. Ensure ASG launch instances into target group that is pointed by Application Load Balancer (ALB)
2. If Spot Instance is terminated with the two minute warning, ASG launches a replacement instance. The new Spot Instance will be bootstrapped just as your original Spot Instance was and then mount to EFS as configured in user data of “Launch Template”
3. Configure ASG from Launch Template (Sample UserData section as below)

#!/bin/bash
# Install all pending updates to the system
yum -y update
# Configure YUM to be able to access official Jenkins RPM packages
wget -O /etc/yum.repos.d/jenkins.repo https://pkg.jenkins.io/redhat-stable/jenkins.repo
# Import the Jenkins repository public key
rpm --import https://pkg.jenkins.io/redhat-stable/jenkins.io.key
# Configure YUM to be able to access contributed Maven RPM packages
wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
# Update the release version in the Maven repository configuration for this mainline release of Amazon Linux
sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
# Install the Java 8 SDK, Git, Jenkins and Maven
yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel git jenkins apache-maven
# Set the default version of java to run out of the Java 8 SDK path (required by Jenkins)
update-alternatives --set java /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java
update-alternatives --set javac /usr/lib/jvm/java-1.8.0-openjdk.x86_64/bin/javac
# Mount the Jenkins EFS volume at JENKINS_HOME
mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone).${EFSJenkinsHomeVolume}.efs.${AWSRegion}.amazonaws.com:/ /var/lib/jenkins
# Start the Jenkins service
service jenkins start

That’s it, you are now ready to run your Jenkins server on EC2 Spot Instances and save up to 90% compared to On-Demand price.

Conclusion

You are now ready to leverage the power and flexibility of EC2 Spot Instances.

With a few modifications to your deployment you can significantly reduce your compute costs or accelerate throughput by accessing 10x compute for the same cost.

For users getting started on their Amazon EC2 Spot Instances, we are here to help. Please also share any questions in the comments section below.

Here is where to begin in the Amazon EC2 Spot Instance console — and start transforming your Jenkins workloads today.


About the author

Rajesh Kesaraju

Rajesh Kesaraju is a Sr. Specialist SA for EC2 Spot with Amazon AWS. He helps customers to cost optimize their workloads by utilizing EC2 Spot instances in various types of workloads such as Big Data, Containers, HPC, CI/CD, and Sateless Applications Etc.

Amazon Elastic Container Service now supports Amazon EFS file systems

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/amazon-ecs-supports-efs/

It has only been five years since Jeff wrote on this blog about the launch of the Amazon Elastic Container Service. I remember reading that post and thinking how exotic and unusual containers sounded. Fast forward just five years, and containers are an everyday part of most developers lives, but whilst customers are increasingly adopting container orchestrators such as ECS, there are still some types of applications that have been hard to move into this containerized world.

Customers building applications that require data persistence or shared storage have faced a challenge since containers are temporary in nature. As containers are scaled in and out dynamically, any local data is lost as containers are terminated. Today we are changing that for ECS by launching support for Amazon Elastic File System (EFS) file systems. Both containers running on ECS and AWS Fargate will be able to use Amazon Elastic File System (EFS).

This new capability will help customers containerize applications that require shared storage such as content management systems, internal DevOps tools, and machine learning frameworks. A whole new set of workloads will now enjoy the benefits containers bring, enabling customers to speed up their deployment process, optimize their infrastructure utilization, and build more resilient systems.

Amazon Elastic File System (EFS) provides a fully managed, highly available, scalable shared file system; it means that you can store your data separately from your compute. It is also a regional service, meaning that the service is storing data within and across 3 Availability Zones for high availability and durability.

Until now, it was possible to get EFS working with ECS if you were running your containers on a cluster of EC2 instances. However, if you wanted to use AWS Fargate as your container data plane then, prior to this announcement, you couldn’t mount an EFS file system. Fargate does not allow you as a customer to gain access to the managed instances inside the Fargate fleet and so you are unable to make the modifications required to the instances to setup EFS.

I’m sure many of our customers will be delighted that they now have a way of connecting EFS file systems easily to ECS, personally I’m ecstatic that we can use this new feature in combination with Fargate, it will be perfect for a little side project that I am currently building and finally give us a way of having persistent storage work in combination with serverless containers.

There is a good reason both ECS and EFS include the word Elastic in their name as both of these services can scale up and down as your application requires. EFS scales on demand from zero to petabytes with no disruptions, growing and shrinking automatically as you add and remove files. With ECS there are options to use either Cluster Auto Scaling or Fargate to ensure that your capacity grows and shrinks to meet demand. For you, our customer, this means that you are only ever paying for the storage and compute that you are actually going to use.

So, enough talking, let’s get to the fun bit and see how we can get a containerized application working with Amazon Elastic File System (EFS).

A Simple Shared File System Example
For this example, I have built some basic infrastructure, so I can show you the before and after effect of adding EFS to a Fargate cluster. Firstly I have created a VPC that spans two Availability Zones. Secondly, I have created an ECS cluster. On the ECS Cluster, I plan to run two containers using Fargate, and this means that I don’t have to set up any EC2 instances as my containers will run on the Fargate fleet that is managed by AWS.

To deploy my application, I create a Task Definition that uses a Docker Image of an Open Source application called Cloud Commander, which is a simple drag and drop, file manager.

In the ECS console, I create a service and use the Task Definition that I created to deploy my application. Once the service is deployed, and the containers are provisioned, I head over to the Application Load Balancer URL which was created as part of my service, and I can see that my application appears to be working. I can drag a file to upload it to the application.

However there is a problem. If I refresh the page, occasionally, the file that I dragged to upload disappears. This happens because I have two containers running my application, and they are both using their local file systems. As I refresh my browser, the load balancer sends me to one of the two containers, and only one of the containers is storing the image on its local volume.

What I need is a shared file system that both the containers can mount and to which they can both write files.

Next, I create a new file system inside the EFS console. In the wizard, I choose the same VPC that I used when I created my ECS cluster and select all of the availability zones that the VPC spans and ask the service to create mount targets in each one. These mount targets will mean that containers that are in different Availability Zones will still be able to connect to the file system.

I select the defaults for all the other options in the wizard. In Step 3, I click the button to Add an access Point. An access point is a way of me giving a particular application access to the file system and gives me incredibly granular control over what data my application is allowed to access. You can add multiple Access points to your EFS file system and provide different applications, different levels of access to the same file system.

The application I am deploying will handle user uploads for my web site, so I will create an EFS Access Point that gives this application full access to the /uploads directory, but nothing else. To do this, I will create an access point with a new User ID (1000) and Group ID (1000), and a home directory of /uploads. The directory will be created with this user and group as the owner with full permissions, giving read permissions to all other users.

Security is the number one priority at AWS and the team have worked hard to ensure ECS integrates with EFS to provide multiple layers of security for protecting EFS filesystems from unauthorized access, including IAM role-based access control, VPC security groups, and encryption of data in transit.

After working through the wizard, my file system is created, and I’m given a File system ID and an Access point ID. I will need these IDs to configure the task definitions in Fargate.

I go back to the Task Definition inside my ECS Cluster and create a new revision of the Task Definition. I scroll down to the Volumes section of the definition and click Add volume.

I can then add my EFS File System details, I select the correct File system ID and also the correct Access point ID that I created earlier.

I have opted to Enable Encryption in transit, but for this example I have not enabled EFS IAM authorization which would be helpful in a larger application with many clients requiring different levels of access for different portions of a filesystem. This feature can simplify management by using IAM authorization, if you want more details on this check out the blog we wrote when the feature launched earlier in the year.

Now that I have updated my task definition, I can update my ECS service to use this new definition. It’s also essential here to make sure that I set the platform version to 1.4.0.

The service deploys my two new containers and decommissions the old two. The new containers will now be using the shared EFS file system, and so my application now works as expected.

If I upload files and then revisit the application, my files will still be there. If my containers are replaced or are scaled up or down, the file system will persist.

Looking to the Future
I am loving the innovation that has been coming out of the containers teams recently, and looking at their public roadmap; they have some really exciting plans for the future. If you have ideas or feature requests, make sure you add your voice to the many customers that are already guiding their roadmap.

The new feature is available in all regions where ECS and EFS are available and comes at no additional cost. So, go check it out in the AWS console and let us know what you think.

Happy Containerizing

— Martin

 

New for Amazon EFS – IAM Authorization and Access Points

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-efs-iam-authorization-and-access-points/

When building or migrating applications, we often need to share data across multiple compute nodes. Many applications use file APIs and Amazon Elastic File System (EFS) makes it easy to use those applications on AWS, providing a scalable, fully managed Network File System (NFS) that you can access from other AWS services and on-premises resources.

EFS scales on demand from zero to petabytes with no disruptions, growing and shrinking automatically as you add and remove files, eliminating the need to provision and manage capacity. By using it, you get strong file system consistency across 3 Availability Zones. EFS performance scales with the amount of data stored, with the option to provision the throughput you need.

Last year, the EFS team focused on optimizing costs with the introduction of the EFS Infrequent Access (IA) storage class, with storage prices up to 92% lower compared to EFS Standard. You can quickly start reducing your costs by setting a Lifecycle Management policy to move to EFS IA the files that haven’t been accessed for a certain amount of days.

Today, we are introducing two new features that simplify managing access, sharing data sets, and protecting your EFS file systems:

  • IAM authentication and authorization for NFS Clients, to identify clients and use IAM policies to manage client-specific permissions.
  • EFS access points, to enforce the use of an operating system user and group, optionally restricting access to a directory in the file system.

Using IAM Authentication and Authorization
In the EFS console, when creating or updating an EFS file system, I can now set up a file system policy. This is an IAM resource policy, similar to bucket policies for Amazon Simple Storage Service (S3), and can be used, for example, to disable root access, enforce read-only access, or enforce in-transit encryption for all clients.

Identity-based policies, such as those used by IAM users, groups, or roles, can override these default permissions. These new features work on top of EFS’s current network-based access using security groups.

I select the option to disable root access by default, click on Set policy, and then select the JSON tab. Here, I can review the policy generated based on my settings, or create a more advanced policy, for example to grant permissions to a different AWS account or a specific IAM role.

The following actions can be used in IAM policies to manage access permissions for NFS clients:

  • ClientMount to give permission to mount a file system with read-only access
  • ClientWrite to be able to write to the file system
  • ClientRootAccess to access files as root

I look at the policy JSON. I see that I can mount and read (ClientMount) the file system, and I can write (ClientWrite) in the file system, but since I selected the option to disable root access, I don’t have ClientRootAccess permissions.

Similarly, I can attach a policy to an IAM user or role to give specific permissions. For example, I create a IAM role to give full access to this file system (including root access) with this policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticfilesystem:ClientMount",
                "elasticfilesystem:ClientWrite",
                "elasticfilesystem:ClientRootAccess"
            ],
            "Resource": "arn:aws:elasticfilesystem:us-east-2:123412341234:file-system/fs-d1188b58"
        }
    ]
}

I start an Amazon Elastic Compute Cloud (EC2) instance in the same Amazon Virtual Private Cloud as the EFS file system, using Amazon Linux 2 and a security group that can connect to the file system. The EC2 instance is using the IAM role I just created.

The open source efs-utils are required to connect a client using IAM authentication, in-transit encryption, or both. Normally, on Amazon Linux 2, I would install efs-utils using yum, but the new version is still rolling out, so I am following the instructions to build the package from source in this repository. I’ll update this blog post when the updated package is available.

To mount the EFS file system, I use the mount command. To leverage in-transit encryption, I add the tls option. I am not using IAM authentication here, so the permissions I specified for the “*” principal in my file system policy apply to this connection.

$ sudo mkdir /mnt/shared
$ sudo mount -t efs -o tls fs-d1188b58 /mnt/shared

My file system policy disables root access by default, so I can’t create a new file as root.

$ sudo touch /mnt/shared/newfile
touch: cannot touch ‘/mnt/shared/newfile’: Permission denied

I now use IAM authentication adding the iam option to the mount command (tls is required for IAM authentication to work).

$ sudo mount -t efs -o iam,tls fs-d1188b58 /mnt/shared

When I use this mount option, the IAM role from my EC2 instance profile is used to connect, along with the permissions attached to that role, including root access:

$ sudo touch /mnt/shared/newfile
$ ls -la /mnt/shared/newfile
-rw-r--r-- 1 root root 0 Jan  8 09:52 /mnt/shared/newfile

Here I used the IAM role to have root access. Other common use cases are to enforce in-transit encryption (using the aws:SecureTransport condition key) or create different roles for clients needing write or read-only access.

EFS IAM permission checks are logged by AWS CloudTrail to audit client access to your file system. For example, when a client mounts a file system, a NewClientConnection event is shown in my CloudTrail console.

Using EFS Access Points
EFS access points allow you to easily manage application access to NFS environments, specifying a POSIX user and group to use when accessing the file system, and restricting access to a directory within a file system.

Use cases that can benefit from EFS access points include:

  • Container-based environments, where developers build and deploy their own containers (you can also see this blog post for using EFS for container storage).
  • Data science applications, that require read-only access to production data.
  • Sharing a specific directory in your file system with other AWS accounts.

In the EFS console, I create two access points for my file system, each using a different POSIX user and group:

  • /data – where I am sharing some data that must be read and updated by multiple clients.
  • /config – where I share some configuration files that must not be updated by clients using the /data access point.

I used file permissions 755 for both access points. That means that I am giving read and execute access to everyone and write access to the owner of the directory only. Permissions here are used when creating the directory. Within the directory, permissions are under full control of the user.

I mount the /data access point adding the accesspoint option to the mount command:

$ sudo mount -t efs -o tls,accesspoint=fsap-0204ce67a2208742e fs-d1188b58 /mnt/shared

I can now create a file, because I am not doing that as root, but I am automatically using the user and group ID of the access point:

$ sudo touch /mnt/shared/datafile
$ ls -la /mnt/shared/datafile
-rw-r--r-- 1 1001 1001 0 Jan  8 09:58 /mnt/shared/datafile

I mount the file system again, without specifying an access point. I see that datafile was created in the /data directory, as expected considering the access point configuration. When using the access point, I was unable to access any files that were in the root or other directories of my EFS file system.

$ sudo mount -t efs -o tls /mnt/shared/
$ ls -la /mnt/shared/data/datafile 
-rw-r--r-- 1 1001 1001 0 Jan  8 09:58 /mnt/shared/data/datafile

To use IAM authentication with access points, I add the iam option:

$ sudo mount -t efs -o iam,tls,accesspoint=fsap-0204ce67a2208742e fs-d1188b58 /mnt/shared

I can restrict a IAM role to use only a specific access point adding a Condition on the AccessPointArn to the policy:

"Condition": {
    "StringEquals": {
        "elasticfilesystem:AccessPointArn" : "arn:aws:elasticfilesystem:us-east-2:123412341234:access-point/fsap-0204ce67a2208742e"
    }
}

Using IAM authentication and EFS access points together simplifies securely sharing data for container-based architectures and multi-tenant-applications, because it ensures that every application automatically gets the right operating system user and group assigned to it, optionally limiting access to a specific directory, enforcing in-transit encryption, or giving read-only access to the file system.

Available Now
IAM authorization for NFS clients and EFS access points are available in all regions where EFS is offered, as described in the AWS Region Table. There is no additional cost for using them. You can learn more about using EFS with IAM and access points in the documentation.

It’s now easier to create scalable architectures sharing data and configurations. Let me know what you are going use these new features for!

Danilo

Optimize Storage Cost with Reduced Pricing for Amazon EFS Infrequent Access

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/optimize-storage-cost-with-reduced-pricing-for-amazon-efs-infrequent-access/

Today we are announcing a new price reduction – one of the largest in AWS Cloud history to date – when using Infrequent Access (IA) with Lifecycle Management with Amazon Elastic File System. This price reduction makes it possible to optimize cost even further and automatically save up to 92% on file storage costs as your access patterns change. With this new reduced pricing you can now store and access your files natively in a file system for effectively $0.08/GB-month, as we’ll see in an example later in this post.

Amazon Elastic File System (EFS) is a low-cost, simple to use, fully managed, and cloud-native NFS file system for Linux-based workloads that can be used with AWS services and on-premises resources. EFS provides elastic storage, growing and shrinking automatically as files are created or deleted – even to petabyte scale – without disruption. Your applications always have the storage they need immediately available. EFS also includes, for free, multi-AZ availability and durability right out of the box, with strong file system consistency.

Easy Cost Optimization using Lifecycle Management
As storage grows the likelihood that a given application needs access to all of the files all of the time lessens, and access patterns can also change over time. Industry analysts such as IDC, and our own analysis of usage patterns confirms, that around 80% of data is not accessed very often. The remaining 20% is in active use. Two common drivers for moving applications to the cloud are to maximize operational efficiency and to reduce the total cost of ownership, and this applies equally to storage costs. Instead of keeping all of the data on hand on the fastest performing storage it may make sense to move infrequently accessed data into a different storage class/tier, with an associated cost reduction. Identifying this data manually can be a burden so it’s also ideal to have the system monitor access over time and perform the movement of data between storage tiers automatically, again without disruption to your running applications.

EFS Infrequent Access (IA) with Lifecycle Management provides an easy to use, cost-optimized price and performance tier suitable for files that are not accessed regularly. With the new price reduction announced today builders can now save up to 92% on their file storage costs compared to EFS Standard. EFS Lifecycle Management is easy to enable and runs automatically behind the scenes. When enabled on a file system, files not accessed according to the lifecycle policy you choose will be moved automatically to the cost-optimized EFS IA storage class. This movement is transparent to your application.

Although the infrequently accessed data is held in a different storage class/tier it’s still immediately accessible. This is one of the advantages to EFS IA – you don’t have to sacrifice any of EFS‘s benefits to get the cost savings. Your files are still immediately accessible, all within the same file system namespace. The only tradeoff is slightly higher per operation latency (double digit ms vs single digit ms — think magnetic vs SSD) for the files in the IA storage class/tier.

As an example of the cost optimization EFS IA provides let’s look at storage costs for 100 terabytes (100TB) of data. The EFS Standard storage class is currently charged at $0.30/GB-month. When it was launched in July the EFS IA storage class was priced at $0.045/GB-month. It’s now been reduced to $0.025/GB-month. As I noted earlier, this is one of the largest price drops in the history of AWS to date!

Using the 20/80 access statistic mentioned earlier for EFS IA:

  • 20% of 100TB = 20TB at $0.30/GB-month = $0.30 x 20 x 1,000 = $6,000
  • 80% of 100TB = 80TB at $0.025/GB-month = $0.025 x 80 x 1,000 = $2,000
  • Total for 100TB = $8,000/month or $0.08/GB-month. Remember, this price also includes (for free) multi-AZ, full elasticity, and strong file system consistency.

Compare this to using only EFS Standard where we are storing 100% of the data in the storage class, we get a cost of $0.30 x 100 x 1,000 = $30,000. $22,000/month is a significant saving and it’s so easy to enable. Remember too that you have control over the lifecycle policy, specifying how frequently data is moved to the IA storage tier.

Getting Started with Infrequent Access (IA) Lifecycle Management
From the EFS Console I can quickly get started in creating a file system by choosing a Amazon Virtual Private Cloud and the subnets in the Virtual Private Cloud where I want to expose mount targets for my instances to connect to.

In the next step I can configure options for the new volume. This is where I select the Lifecycle policy I want to apply to enable use of the EFS IA storage class. Here I am going to enable files that have not been accessed for 14 days to be moved to the IA tier automatically.

In the final step I simply review my settings and then click Create File System to create the volume. Easy!

A Lifecycle Management policy can also be enabled, or changed, for existing volumes. Navigating to the volume in the EFS Console I can view the applied policy, if any. Here I’ve selected an existing file system that has no policy attached and therefore is not benefiting from EFS IA.

Clicking the pencil icon to the right of the field takes me to a dialog box where I can select the appropriate Lifecycle policy, just as I did when creating a new volume.

Amazon Elastic File System IA with Lifecycle Management is available now in all regions where Elastic File System is present.

— Steve

 

New – Infrequent Access Storage Class for Amazon Elastic File System (EFS)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-infrequent-access-storage-class-for-amazon-elastic-file-system-efs/

Amazon Elastic File System lets you create petabyte-scale file systems that can be accessed in massively parallel fashion from hundreds or thousands of EC2 instances and on-premises servers, while scaling on demand without disrupting applications. Since the mid-2016 launch of EFS, we have added many new features including encryption of data at rest and in transit, a provisioned throughput option when you need high throughput access to a set of files that do not occupy a lot of space, on-premises access via AWS Direct Connect, EFS File Sync, support for AWS VPN and Inter-Region VPC Peering, and more.

Infrequent Access Storage Class
Today I would like to tell you about the new Amazon EFS Infrequent Access storage class, as pre-announced at AWS re:Invent. As part of a new Lifecycle Management option for EFS file systems, you can now indicate that you want to move files that have not been accessed in the last 30 days to a storage class that is 85% less expensive. You can enable the use of Lifecycle Management when you create a new EFS file system, and you can enable it later for file systems that were created on or after today’s launch.

The new storage class is totally transparent. You can still access your files as needed and in the usual way, with no code or operational changes necessary.

You can use the Infrequent Access storage class to meet auditing and retention requirements, create nearline backups that can be recovered using normal file operations, and to keep data close at hand that you need on an occasional basis.

Here are a couple of things to keep in mind:

Eligible Files – Files that are 128 KiB or larger and that have not been accessed or modified for at least 30 days can be transitioned to the new storage class. Modifications to a file’s metadata that do not change the file will not delay a transition.

Priority – Operations that transition files to Infrequent Access run at a lower priority than other operations on the file system.

Throughput – If your file system is configured for Bursting mode, the amount of Standard storage determines the throughput. Otherwise, the provisioned throughput applies.

Enabling Lifecycle Management
You can enable Lifecycle Management and benefit from the Infrequent Access storage class with one click:

As I noted earlier, you can check this when you create the file system, or you can enable it later for file systems that you create from now on.

Files that have not been read or written for 30 days will be transitioned to the Infrequent Access storage class with no further action on your part. Files in the Standard Access class can be accessed with latency measured in single-digit milliseconds; files in the Infrequent Access class have latency in the double-digits. Your next AWS bill will include information on your use of both storage classes, so that you can see your cost savings.

Available Now
This feature is available now and you can start using it today in all AWS Regions where EFS is available. Infrequent Access storage is billed at $0.045 per GB/Month in US East (N. Virginia), with correspondingly low pricing in other regions. There’s also a data transfer charge of $0.01 per GB for reads and writes to Infrequent Access storage.

Like every AWS service and feature, we are launching with an initial set of features and a really strong roadmap! For example, we are working on additional lifecycle management flexibility, and would be very interested in learning more about what kinds of times and rules you would like.

Jeff;

PS – AWS DataSync will help you to quickly and easily automate data transfer between your existing on-premises storage and EFS.

Optimizing a Lift-and-Shift for Security

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-security/

This is the third and final blog within a three-part series that examines how to optimize lift-and-shift workloads. A lift-and-shift is a common approach for migrating to AWS, whereby you move a workload from on-prem with little or no modification. This third blog examines how lift-and-shift workloads can benefit from an improved security posture with no modification to the application codebase. (Read about optimizing a lift-and-shift for performance and for cost effectiveness.)

Moving to AWS can help to strengthen your security posture by eliminating many of the risks present in on-premise deployments. It is still essential to consider how to best use AWS security controls and mechanisms to ensure the security of your workload. Security can often be a significant concern in lift-and-shift workloads, especially for legacy workloads where modern encryption and security features may not present. By making use of AWS security features you can significantly improve the security posture of a lift-and-shift workload, even if it lacks native support for modern security best practices.

Adding TLS with Application Load Balancers

Legacy applications are often the subject of a lift-and-shift. Such migrations can help reduce risks by moving away from out of date hardware but security risks are often harder to manage. Many legacy applications leverage HTTP or other plaintext protocols that are vulnerable to all manner of attacks. Often, modifying a legacy application’s codebase to implement TLS is untenable, necessitating other options.

One comparatively simple approach is to leverage an Application Load Balancer or a Classic Load Balancer to provide SSL offloading. In this scenario, the load balancer would be exposed to users, while the application servers that only support plaintext protocols will reside within a subnet which is can only be accessed by the load balancer. The load balancer would perform the decryption of all traffic destined for the application instance, forwarding the plaintext traffic to the instances. This allows  you to use encryption on traffic between the client and the load balancer, leaving only internal communication between the load balancer and the application in plaintext. Often this approach is sufficient to meet security requirements, however, in more stringent scenarios it is never acceptable for traffic to be transmitted in plaintext, even if within a secured subnet. In this scenario, a sidecar can be used to eliminate plaintext traffic ever traversing the network.

Improving Security and Configuration Management with Sidecars

One approach to providing encryption to legacy applications is to leverage what’s often termed the “sidecar pattern.” The sidecar pattern entails a second process acting as a proxy to the legacy application. The legacy application only exposes its services via the local loopback adapter and is thus accessible only to the sidecar. In turn the sidecar acts as an encrypted proxy, exposing the legacy application’s API to external consumers via TLS. As unencrypted traffic between the sidecar and the legacy application traverses the loopback adapter, it never traverses the network. This approach can help add encryption (or stronger encryption) to legacy applications when it’s not feasible to modify the original codebase. A common approach to implanting sidecars is through container groups such as pod in EKS or a task in ECS.

Implementing the Sidecar Pattern With Containers

Figure 1: Implementing the Sidecar Pattern With Containers

Another use of the sidecar pattern is to help legacy applications leverage modern cloud services. A common example of this is using a sidecar to manage files pertaining to the legacy application. This could entail a number of options including:

  • Having the sidecar dynamically modify the configuration for a legacy application based upon some external factor, such as the output of Lambda function, SNS event or DynamoDB write.
  • Having the sidecar write application state to a cache or database. Often applications will write state to the local disk. This can be problematic for autoscaling or disaster recovery, whereby having the state easily accessible to other instances is advantages. To facilitate this, the sidecar can write state to Amazon S3, Amazon DynamoDB, Amazon Elasticache or Amazon RDS.

A sidecar requires customer development, but it doesn’t require any modification of the lift-and-shifted application. A sidecar treats the application as a blackbox and interacts with it via its API, configuration file, or other standard mechanism.

Automating Security

A lift-and-shift can achieve a significantly stronger security posture by incorporating elements of DevSecOps. DevSecOps is a philosophy that argues that everyone is responsible for security and advocates for automation all parts of the security process. AWS has a number of services which can help implement a DevSecOps strategy. These services include:

  • Amazon GuardDuty: a continuous monitoring system which analyzes AWS CloudTrail Events, Amazon VPC Flow Log and DNS Logs. GuardDuty can detect threats and trigger an automated response.
  • AWS Shield: a managed DDOS protection services
  • AWS WAF: a managed Web Application Firewall
  • AWS Config: a service for assessing, tracking, and auditing changes to AWS configuration

These services can help detect security problems and implement a response in real time, achieving a significantly strong posture than traditional security strategies. You can build a DevSecOps strategy around a lift-and-shift workload using these services, without having to modify the lift-and-shift application.

Conclusion

There are many opportunities for taking advantage of AWS services and features to improve a lift-and-shift workload. Without any alteration to the application you can strengthen your security posture by utilizing AWS security services and by making small environmental and architectural changes that can help alleviate the challenges of legacy workloads.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Optimizing a Lift-and-Shift for Cost Effectiveness and Ease of Management

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-cost/

Lift-and-shift is the process of migrating a workload from on premise to AWS with little or no modification. A lift-and-shift is a common route for enterprises to move to the cloud, and can be a transitionary state to a more cloud native approach. This is the second blog post in a three-part series which investigates how to optimize a lift-and-shift workload. The first post is about performance.

A key concern that many customers have with a lift-and-shift is cost. If you move an application as is  from on-prem to AWS, is there any possibility for meaningful cost savings? By employing AWS services, in lieu of self-managed EC2 instances, and by leveraging cloud capability such as auto scaling, there is potential for significant cost savings. In this blog post, we will discuss a number of AWS services and solutions that you can leverage with minimal or no change to your application codebase in order to significantly reduce management costs and overall Total Cost of Ownership (TCO).

Automate

Even if you can’t modify your application, you can change the way you deploy your application. The adopting-an-infrastructure-as-code approach can vastly improve the ease of management of your application, thereby reducing cost. By templating your application through Amazon CloudFormation, Amazon OpsWorks, or Open Source tools you can make deploying and managing your workloads a simple and repeatable process.

As part of the lift-and-shift process, rationalizing the workload into a set of templates enables less time to spent in the future deploying and modifying the workload. It enables the easy creation of dev/test environments, facilitates blue-green testing, opens up options for DR, and gives the option to roll back in the event of error. Automation is the single step which is most conductive to improving ease of management.

Reserved Instances and Spot Instances

A first initial consideration around cost should be the purchasing model for any EC2 instances. Reserved Instances (RIs) represent a 1-year or 3-year commitment to EC2 instances and can enable up to 75% cost reduction (over on demand) for steady state EC2 workloads. They are ideal for 24/7 workloads that must be continually in operation. An application requires no modification to make use of RIs.

An alternative purchasing model is EC2 spot. Spot instances offer unused capacity available at a significant discount – up to 90%. Spot instances receive a two-minute warning when the capacity is required back by EC2 and can be suspended and resumed. Workloads which are architected for batch runs – such as analytics and big data workloads – often require little or no modification to make use of spot instances. Other burstable workloads such as web apps may require some modification around how they are deployed.

A final alternative is on-demand. For workloads that are not running in perpetuity, on-demand is ideal. Workloads can be deployed, used for as long as required, and then terminated. By leveraging some simple automation (such as AWS Lambda and CloudWatch alarms), you can schedule workloads to start and stop at the open and close of business (or at other meaningful intervals). This typically requires no modification to the application itself. For workloads that are not 24/7 steady state, this can provide greater cost effectiveness compared to RIs and more certainty and ease of use when compared to spot.

Amazon FSx for Windows File Server

Amazon FSx for Windows File Server provides a fully managed Windows filesystem that has full compatibility with SMB and DFS and full AD integration. Amazon FSx is an ideal choice for lift-and-shift architectures as it requires no modification to the application codebase in order to enable compatibility. Windows based applications can continue to leverage standard, Windows-native protocols to access storage with Amazon FSx. It enables users to avoid having to deploy and manage their own fileservers – eliminating the need for patching, automating, and managing EC2 instances. Moreover, it’s easy to scale and minimize costs, since Amazon FSx offers a pay-as-you-go pricing model.

Amazon EFS

Amazon Elastic File System (EFS) provides high performance, highly available multi-attach storage via NFS. EFS offers a drop-in replacement for existing NFS deployments. This is ideal for a range of Linux and Unix usecases as well as cross-platform solutions such as Enterprise Java applications. EFS eliminates the need to manage NFS infrastructure and simplifies storage concerns. Moreover, EFS provides high availability out of the box, which helps to reduce single points of failure and avoids the need to manually configure storage replication. Much like Amazon FSx, EFS enables customers to realize cost improvements by moving to a pay-as-you-go pricing model and requires a modification of the application.

Amazon MQ

Amazon MQ is a managed message broker service that provides compatibility with JMS, AMQP, MQTT, OpenWire, and STOMP. These are amongst the most extensively used middleware and messaging protocols and are a key foundation of enterprise applications. Rather than having to manually maintain a message broker, Amazon MQ provides a performant, highly available managed message broker service that is compatible with existing applications.

To use Amazon MQ without any modification, you can adapt applications that leverage a standard messaging protocol. In most cases, all you need to do is update the application’s MQ endpoint in its configuration. Subsequently, the Amazon MQ service handles the heavy lifting of operating a message broker, configuring HA, fault detection, failure recovery, software updates, and so forth. This offers a simple option for reducing management overhead and improving the reliability of a lift-and-shift architecture. What’s more is that applications can migrate to Amazon MQ without the need for any downtime, making this an easy and effective way to improve a lift-and-shift.

You can also use Amazon MQ to integrate legacy applications with modern serverless applications. Lambda functions can subscribe to MQ topics and trigger serverless workflows, enabling compatibility between legacy and new workloads.

Integrating Lift-and-Shift Workloads with Lambda via Amazon MQ

Figure 1: Integrating Lift-and-Shift Workloads with Lambda via Amazon MQ

Amazon Managed Streaming Kafka

Lift-and-shift workloads which include a streaming data component are often built around Apache Kafka. There is a certain amount of complexity involved in operating a Kafka cluster which incurs management and operational expense. Amazon Kinesis is a managed alternative to Apache Kafka, but it is not a drop-in replacement. At re:Invent 2018, we announced the launch of Amazon Managed Streaming Kafka (MSK) in public preview. MSK provides a managed Kafka deployment with pay-as-you-go pricing and an acts as a drop-in replacement in existing Kafka workloads. MSK can help reduce management costs and improve cost efficiency and is ideal for lift-and-shift workloads.

Leveraging S3 for Static Web Hosting

A significant portion of any web application is static content. This includes videos, image, text, and other content that changes seldom, if ever. In many lift-and-shifted applications, web servers are migrated to EC2 instances and host all content – static and dynamic. Hosting static content from an EC2 instance incurs a number of costs including the instance, EBS volumes, and likely, a load balancer. By moving static content to S3, you can significantly reduce the amount of compute required to host your web applications. In many cases, this change is non-disruptive and can be done at the DNS or CDN layer, requiring no change to your application.

Reducing Web Hosting Costs with S3 Static Web Hosting

Figure 2: Reducing Web Hosting Costs with S3 Static Web Hosting

Conclusion

There are numerous opportunities for reducing the cost of a lift-and-shift. Without any modification to the application, lift-and-shift workloads can benefit from cloud-native features. By using AWS services and features, you can significantly reduce the undifferentiated heavy lifting inherent in on-prem workloads and reduce resources and management overheads.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Optimizing a Lift-and-Shift for Performance

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-performance/

Many organizations begin their cloud journey with a lift-and-shift of applications from on-premise to AWS. This approach involves migrating software deployments with little, or no, modification. A lift-and-shift avoids a potentially expensive application rewrite but can result in a less optimal workload that a cloud native solution. For many organizations, a lift-and-shift is a transitional stage to an eventual cloud native solution, but there are some applications that can’t feasibly be made cloud-native such as legacy systems or proprietary third-party solutions. There are still clear benefits of moving these workloads to AWS, but how can they be best optimized?

In this blog series post, we’ll look at different approaches for optimizing a black box lift-and-shift. We’ll consider how we can significantly improve a lift-and-shift application across three perspectives: performance, cost, and security. We’ll show that without modifying the application we can integrate services and features that will make a lift-and-shift workload cheaper, faster, more secure, and more reliable. In this first blog, we’ll investigate how a lift-and-shift workload can have improved performance through leveraging AWS features and services.

Performance gains are often a motivating factor behind a cloud migration. On-premise systems may suffer from performance bottlenecks owing to legacy infrastructure or through capacity issues. When performing a lift-and-shift, how can you improve performance? Cloud computing is famous for enabling horizontally scalable architectures but many legacy applications don’t support this mode of operation. Traditional business applications are often architected around a fixed number of servers and are unable to take advantage of horizontal scalability. Even if a lift-and-shift can’t make use of auto scaling groups and horizontal scalability, you can achieve significant performance gains by moving to AWS.

Scaling Up

The easiest alternative to scale up to compute is vertical scalability. AWS provides the widest selection of virtual machine types and the largest machine types. Instances range from small, burstable t3 instances series all the way to memory optimized x1 series. By leveraging the appropriate instance, lift-and-shifts can benefit from significant performance. Depending on your workload, you can also swap out the instances used to power your workload to better meet demand. For example, on days in which you anticipate high load you could move to more powerful instances. This could be easily automated via a Lambda function.

The x1 family of instances offers considerable CPU, memory, storage, and network performance and can be used to accelerate applications that are designed to maximize single machine performance. The x1e.32xlarge instance, for example, offers 128 vCPUs, 4TB RAM, and 14,000 Mbps EBS bandwidth. This instance is ideal for high performance in-memory workloads such as real time financial risk processing or SAP Hana.

Through selecting the appropriate instance types and scaling that instance up and down to meet demand, you can achieve superior performance and cost effectiveness compared to running a single static instance. This affords lift-and-shift workloads far greater efficiency that their on-prem counterparts.

Placement Groups and C5n Instances

EC2 Placement groups determine how you deploy instances to underlying hardware. One can either choose to cluster instances into a low latency group within a single AZ or spread instances across distinct underlying hardware. Both types of placement groups are useful for optimizing lift-and-shifts.

The spread placement group is valuable in applications that rely on a small number of critical instances. If you can’t modify your application  to leverage auto scaling, liveness probes, or failover, then spread placement groups can help reduce the risk of simultaneous failure while improving the overall reliability of the application.

Cluster placement groups help improve network QoS between instances. When used in conjunction with enhanced networking, cluster placement groups help to ensure low latency, high throughput, and high network packets per second. This is beneficial for chatty applications and any application that leveraged physical co-location for performance on-prem.

There is no additional charge for using placement groups.

You can extend this approach further with C5n instances. These instances offer 100Gbps networking and can be used in placement group for the most demanding networking intensive workloads. Using both placement groups and the C5n instances require no modification to your application, only to how it is deployed – making it a strong solution for providing network performance to lift-and-shift workloads.

Leverage Tiered Storage to Optimize for Price and Performance

AWS offers a range of storage options, each with its own performance characteristics and price point. Through leveraging a combination of storage types, lift-and-shifts can achieve the performance and availability requirements in a price effective manner. The range of storage options include:

Amazon EBS is the most common storage service involved with lift-and-shifts. EBS provides block storage that can be attached to EC2 instances and formatted with a typical file system such as NTFS or ext4. There are several different EBS types, ranging from inexpensive magnetic storage to highly performant provisioned IOPS SSDs. There are also storage-optimized instances that offer high performance EBS access and NVMe storage. By utilizing the appropriate type of EBS volume and instance, a compromise of performance and price can be achieved. RAID offers a further option to optimize EBS. EBS utilizes RAID 1 by default, providing replication at no additional cost, however an EC2 instance can apply other RAID levels. For instance, you can apply RAID 0 over a number of EBS volumes in order to improve storage performance.

In addition to EBS, EC2 instances can utilize the EC2 instance store. The instance store provides ephemeral direct attached storage to EC2 instances. The instance store is included with the EC2 instance and provides a facility to store non-persistent data. This makes it ideal for temporary files that an application produces, which require performant storage. Both EBS and the instance store are expose to the EC2 instance as block level devices, and the OS can use its native management tools to format and mount these volumes as per a traditional disk – requiring no significant departure from the on prem configuration. In several instance types including the C5d and P3d are equipped with local NVMe storage which can support extremely IO intensive workloads.

Not all workloads require high performance storage. In many cases finding a compromise between price and performance is top priority. Amazon S3 provides highly durable, object storage at a significantly lower price point than block storage. S3 is ideal for a large number of use cases including content distribution, data ingestion, analytics, and backup. S3, however, is accessible via a RESTful API and does not provide conventional file system semantics as per EBS. This may make S3 less viable for applications that you can’t easily modify, but there are still options for using S3 in such a scenario.

An option for leveraging S3 is AWS Storage Gateway. Storage Gateway is a virtual appliance than can be run on-prem or on EC2. The Storage Gateway appliance can operate in three configurations: file gateway, volume gateway and tape gateway. File gateway provides an NFS interface, Volume Gateway provides an iSCSI interface, and Tape Gateway provides an iSCSI virtual tape library interface. This allows files, volumes, and tapes to be exposed to an application host through conventional protocols with the Storage Gateway appliance persisting data to S3. This allows an application to be agnostic to S3 while leveraging typical enterprise storage protocols.

Using S3 Storage via Storage Gateway

Figure 1: Using S3 Storage via Storage Gateway

Conclusion

A lift-and-shift can achieve significant performance gains on AWS by making use of a range of instance types, storage services, and other features. Even without any modification to the application, lift-and-shift workloads can benefit from cutting edge compute, network, and IO which can help realize significant, meaningful performance gains.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Amazon ECS and Docker volume drivers, part 2: Amazon EFS

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/amazon-ecs-and-docker-volume-drivers-amazon-efs/

← Introduction and Part 1: Amazon EBS

 

Post by: Tiffany Jernigan and Jeremy Cowan

Introduction

This is the second post in a series showing how to use Docker volumes with Amazon ECS. If you are unfamiliar with Docker volumes or REX-Ray, or want to know how to use a volume plugin with ECS and Amazon Elastic Block Store (Amazon EBS), see Part 1.

In this post, you use the REX-Ray EFS plugin with Amazon Elastic File System (Amazon EFS) to persist and share data among multiple ECS tasks. To help you get started, we have created an AWS CloudFormation template that builds a two-instance ECS cluster across two Availability Zones.

The template bootstraps the REX-Ray EFS plugin onto each node. Each instance has the REX-Ray EFS plugin installed, is assigned an IAM role with an inline policy with permissions for REX-Ray to issue the necessary AWS API calls, and a security group to open port 2049 for EFS. The template also creates a Network Load Balancer that is used to expose an ECS service to the internet.

Set up the environment

First, create a folder in which you create all files and enter it. Next, set the full path for your EC2 key pair that you need later to connect to your instance using SSH.

#example path /Users/tiffany/.aws/ec2-keypair.pem
export KeyPairPath=<your-keypair>

Step 1: Instantiate the CloudFormation template

Next, create a CloudFormation stack with the following S3 template:
rexray-demo-efs.yaml

KeyPairName=$(echo $KeyPairPath | cut -d / -f5 | sed 's/.pem//')
Region=$(aws configure get region) #You can also replace this
CloudFormationStack=$(aws cloudformation create-stack \
--region $Region \
--stack-name rexray-demo-efs \
--capabilities CAPABILITY_NAMED_IAM \
--template-url http://s3.amazonaws.com/ecs-refarch-volume-plugins/rexray-demo-efs.yaml \
--parameters ParameterKey=KeyName,ParameterValue=$KeyPairName \
| jq -r .StackId)

The ECS container instances are bootstrapped with a user data script that installs the rexray/efs Docker plugin using:

docker plugin install rexray/efs REXRAY_PREEMPT=true \
EFS_REGION=${AWS::Region} \
EFS_SECURITYGROUPS=${EFSSecurityGroup} \
--grant-all-permissions

Step 2: Export output parameters as environment variables

This shell script exports the output parameters from the CloudFormation template. With the following command, import them as OS environment variables. Later, you use these variables to create task and service definitions.

cat > get-outputs.sh << 'EOF'
#!/bin/bash
function usage {
  echo "usage: source <(./get-outputs.sh  )"
  echo "stack name or ID must be provided or exported as the CloudFormationStack environment variable"
  echo "region must be provided or set with aws configure"
}

function main {
    #Get stack
    if [ -z "$1" ]; then
        if [ -z "$CloudFormationStack" ]; then
            echo "please provide stack name or ID"
            usage
            exit 1
        fi
    else
        CloudFormationStack="$1"
    fi
    #Get region
    if [ -z "$2" ]; then
        region=$(aws configure get region)
        if [ -z $region ]; then
            echo "please provide region"
            usage
            exit 1
        fi
    else
        region="$2"
    fi
    
    echo "#Region: $region"
    echo "#Stack: $CloudFormationStack"
    echo "#---"
    
    echo "#Checking if stack exists..."
    aws cloudformation wait stack-exists \
    --region $region \
    --stack-name $CloudFormationStack
    
    echo "#Checking if stack creation is complete..."
    aws cloudformation wait stack-create-complete \
    --region $region \
    --stack-name $CloudFormationStack
     
    echo "#Getting output keys and values..."
    echo "#---"
    aws cloudformation describe-stacks \
    --region $region \
    --stack-name $CloudFormationStack \
    --query 'Stacks[].Outputs[].[OutputKey, OutputValue]' \
    --output text | awk '{print "export", $1"="$2}'
}
main "[email protected]"
EOF
#Add executable permissions
chmod +x get-outputs.sh

Now run the script:

./get-outputs.sh && source <(./get-outputs.sh)

Step 3: Create a task definition

In this step, you create a task definition for an Apache web service, Space, which is an example website using Apache2 on Ubuntu. The scheduler and the REX-Ray EFS plugin ensure that each copy of the task establishes a connection with EFS.

cat > space-taskdef-efs.json << EOF 
{
    "containerDefinitions": [
        {
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "${CWLogGroupName}",
                    "awslogs-region": "${AWSRegion}",
                    "awslogs-stream-prefix": "ecs"
                }
            },
            "portMappings": [
               {
                    "containerPort": 80,
                    "protocol": "tcp"
                }
            ],
            "mountPoints": [
                {
                    "containerPath": "/var/www/",
                    "sourceVolume": "rexray-efs-vol"
                }
            ],
            "image": "tiffanyfay/space:apache",
            "essential": true,
            "name": "space"
        }
    ],
    "memory": "512",
    "family": "rexray-efs",
    "networkMode": "awsvpc",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "512",
    "volumes": [
        {
            "name": "rexray-efs-vol",
            "dockerVolumeConfiguration": {
                "autoprovision": true,
                "scope": "shared",
                "driver": "rexray/efs"
            }
        }
    ]
}
EOF

Because autoprovision is set to true, the Docker volume driver, rexray/efs, creates a new file system for you. And because scope is shared, the file system can be used across multiple tasks.

Register the task definition and extract the task definition ARN from the result:

TaskDefinitionArn=$(aws ecs register-task-definition \
--region $AWSRegion \
--cli-input-json 'file://space-taskdef-efs.json' \
| jq -r .taskDefinition.taskDefinitionArn)

Step 4: Create a service definition

In this step, you create a service definition for the rexray-efs task definition. An ECS service is a long-running task that is monitored by the service scheduler. If the task dies or becomes unhealthy, the scheduler automatically attempts to restart the task.

The web service is fronted by a Network Load Balancer that is configured for forward traffic on port 80 to the tasks registered with a specific target group. The desired count is the desired number of task copies to run. The minimum and maximum healthy percent parameters inform the scheduler to run only exactly the number of desired copies of this task at a time. Unless a task has been stopped, it does not try starting a new one.

cat > space-svcdef-efs.json << EOF 
{
    "cluster": "${ECSClusterName}",
    "serviceName": "space-svc",
    "taskDefinition": "${TaskDefinitionArn}",
    "loadBalancers": [
        {
            "targetGroupArn": "${WebTargetGroupArn}",
            "containerName": "space",
            "containerPort": 80
        }
    ],
    "desiredCount": 4,
    "launchType": "EC2",
    "healthCheckGracePeriodSeconds": 60, 
    "deploymentConfiguration": {
        "maximumPercent": 100,
        "minimumHealthyPercent": 0
    },
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [
                "${SubnetIds}"
            ],
            "securityGroups": [
                "${EFSSecurityGroupId}",
                "${InstanceSecurityGroupId}"
            ]
        }
    }
}
EOF

Create the Apache service:

SvcDefinitionArn=$(aws ecs create-service \
--region $AWSRegion \
--cli-input-json file://space-svcdef-efs.json \
| jq -r .service.serviceArn)

Wait for service to be up with the last status as RUNNING for the tasks using either the CLI or the console:

aws ecs wait services-stable \
--region $AWSRegion \
--cluster $ECSClusterName \
--services $SvcDefinitionArn

Next, look at your file system and see two mount points—one for each Availability Zone:

FileSystemId=$(aws efs describe-file-systems \
--region $AWSRegion \
--query 'FileSystems[?Name==`/rexray-efs-vol`].FileSystemId' \
--output text)
aws efs describe-mount-targets \
--region $AWSRegion \
--file-system-id $FileSystemId 

Step 5: View the webpage

Now, open a browser and paste NLBDNSName as the URL.

echo $NLBDNSName

If you refresh the page, you can see that the task ID and EC2 instance ID change as the traffic is being load balanced.

Get the DNS info for an instance so that you can connect to it using SSH and modify index.shtml:

InstanceDns=$(aws ec2 describe-instances \
--region $AWSRegion \
--filter Name="tag:aws:cloudformation:stack-id",Values="$CloudFormationStack" \
--query 'Reservations[1].Instances[].PublicDnsName' \
--output text)
ssh -i $KeyPairPath [email protected]$InstanceDns

Now, get one of the Docker container IDs and use docker exec to change the image being displayed:

ContainerId=$(docker ps --filter volume="rexray-efs-vol" \
--format "{{.ID}}" --latest)
docker exec -it $ContainerId sed -i "s/ecsship/cruiser/" /var/www/index.shtml

To see the update, refresh the load balancer webpage.

Step 6: Clean up

To clean up the resources that you created in this post, take the following steps.

Delete the mount targets and file system.

FileSystemId=$(aws efs describe-file-systems \
--region $AWSRegion \
--query 'FileSystems[?Name==`/rexray-efs-vol`].FileSystemId' \
--output text)
MountTargetIds=($(aws efs describe-mount-targets \
--region $AWSRegion \
--file-system-id $FileSystemId \
--query 'MountTargets[].MountTargetId' --output text))
aws efs delete-mount-target --region $AWSRegion \
--mount-target-id ${MountTargetIds[2]}
aws efs delete-mount-target --region $AWSRegion \
--mount-target-id ${MountTargetIds[1]}
aws efs delete-file-system --region $AWSRegion \
--file-system-id $FileSystemId 

Delete the service.

aws ecs update-service \
--region $AWSRegion \
--cluster $ECSClusterName \
--service $SvcDefinitionArn \
--desired-count 0
aws ecs delete-service \
--region $AWSRegion \
--cluster $ECSClusterName \
--service $SvcDefinitionArn

Delete the CloudFormation template. This removes the rest of the environment that was pre-created for this exercise.

aws cloudformation delete-stack --region $AWSRegion \
--stack-name $CloudFormationStack

Summary

Congratulations on getting your service up and running with Docker volume plugins and EFS!

You have created a CloudFormation stack including two instances, running the REX-Ray EFS plugin, across two subnets, a Network Load Balancer, as well as an ECS cluster. You also created a task definition and service which used the plugin to create an elastic filesystem.

We look forward to hearing about how you use Docker Volume Plugins with ECS.

Tiffany and Jeremy

New – Provisioned Throughput for Amazon Elastic File System (EFS)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-provisioned-throughput-for-amazon-elastic-file-system-efs/

Amazon Elastic File System lets you create petabyte-scale file systems that can be accessed in massively parallel fashion from hundreds or thousands of Amazon Elastic Compute Cloud (EC2) servers and on-premises resources, scaling on demand without disrupting applications. Behind the scenes, storage is distributed across multiple Availability Zones and redundant storage servers in order to provide you with file systems that are scalable, durable, and highly available. Space is allocated and billed on as as-needed basis, allowing you to consume as much as you need while keeping costs proportional to actual usage. Applications can achieve high levels of aggregate throughput and IOPS, with consistent low latencies. Our customers are using EFS for a broad spectrum of use cases including media processing workflows, big data & analytics jobs, code repositories, build trees, and content management repositories, taking advantage of the ability to simply lift-and-shift their existing file-based applications and workflows to the cloud.

A Quick Review
As you may already know, EFS lets you choose one of two performance modes each time you create a file system:

General Purpose – This is the default mode, and the one that you should start with. It is perfect for use cases that are sensitive to latency, and supports up to 7,000 operations per second per file system.

Max I/O – This mode scales to higher levels of aggregate throughput and performance, with slightly higher latency. It does not have an intrinsic limit on operations per second.

With either mode, throughput scales with the size of the file system, and can also burst to higher levels as needed. The size of the file system determines the upper bound on how high and how long you can burst. For example:

1 TiB – A 1 TiB file system can deliver 50 MiB/second continuously, and burst to 100 MiB/second for up to 12 hours each day.

10 TiB – A 10 TiB file system can deliver 500 MiB/second continuously, and burst up to 1 GiB/second for up to 12 hours each day.

EFS uses a credit system that allows you to “save up” throughput during quiet times and then use it during peak times. You can read about Amazon EFS Performance to learn more about the credit system and the two performance modes.

New Provisioned Throughput
Web server content farms, build trees, and EDA simulations (to name a few) can benefit from high throughput to a set of files that do not occupy a whole lot of space. In order to address this usage pattern, you now have the option to provision any desired level of throughput (up to 1 GiB/second) for each of your EFS file systems. You can set an initial value when you create the file system, and then increase it as often as you’d like. You can dial it back down every 24 hours, and you can also switch between provisioned throughput and bursting throughput on the same cycle. You can for example, configure an EFS file system to provide 50 MiB/second of throughput for your web server, even if the volume contains a relatively small amount of content.

If your application has throughput requirements that exceed what is possible using the default (bursting) model, the provisioned model is for you! You can achieve the desired level of throughput as soon as you create the file system, regardless of how much or how little storage you consume.

Here’s how I set this up:

Using provisioned throughput means that I will be billed separately for storage (in GiB/month units) and for provisioned throughput (in MiB/second-month units).

I can monitor average throughput by using a CloudWatch metric math expression. The Amazon EFS Monitoring Tutorial contains all of the formulas, along with CloudFormation templates that I can use to set up a complete CloudWatch Dashboard in a matter of minutes:

I select the desired template, enter the Id of my EFS file system, and click through the remaining screens to create my dashboard:

The template creates an IAM role, a Lambda function, a CloudWatch Events rule, and the dashboard:

The dashboard is available within the CloudWatch Console:

Here’s the dashboard for my test file system:

To learn more about how to use the EFS performance mode that is the best fit for your application, read Amazon Elastic File System – Choosing Between the Different Throughput & Performance Modes.

Available Now
This feature is available now and you can start using in today in all AWS regions where EFS is available.

Jeff;

 

New – Encryption of Data in Transit for Amazon EFS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-encryption-of-data-in-transit-for-amazon-efs/

Amazon Elastic File System was designed to be the file system of choice for cloud-native applications that require shared access to file-based storage. We launched EFS in mid-2016 and have added several important features since then including on-premises access via Direct Connect and encryption of data at rest. We have also made EFS available in additional AWS Regions, most recently US West (Northern California). As was the case with EFS itself, these enhancements were made in response to customer feedback, and reflect our desire to serve an ever-widening customer base.

Encryption in Transit
Today we are making EFS even more useful with the addition of support for encryption of data in transit. When used in conjunction with the existing support for encryption of data at rest, you now have the ability to protect your stored files using a defense-in-depth security strategy.

In order to make it easy for you to implement encryption in transit, we are also releasing an EFS mount helper. The helper (available in source code and RPM form) takes care of setting up a TLS tunnel to EFS, and also allows you to mount file systems by ID. The two features are independent; you can use the helper to mount file systems by ID even if you don’t make use of encryption in transit. The helper also supplies a recommended set of default options to the actual mount command.

Setting up Encryption
I start by installing the EFS mount helper on my Amazon Linux instance:

$ sudo yum install -y amazon-efs-utils

Next, I visit the EFS Console and capture the file system ID:

Then I specify the ID (and the TLS option) to mount the file system:

$ sudo mount -t efs fs-92758f7b -o tls /mnt/efs

And that’s it! The encryption is transparent and has an almost negligible impact on data transfer speed.

Available Now
You can start using encryption in transit today in all AWS Regions where EFS is available.

The mount helper is available for Amazon Linux. If you are running another distribution of Linux you will need to clone the GitHub repo and build your own RPM, as described in the README.

Jeff;

Event-Driven Computing with Amazon SNS and AWS Compute, Storage, Database, and Networking Services

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/event-driven-computing-with-amazon-sns-compute-storage-database-and-networking-services/

Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging

Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.

However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.

If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.

What is event-driven computing?

Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.

Which AWS services publish events to SNS natively?

Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.

Compute services

  • Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).

  • AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.

  • Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.

Storage services

  • Amazon S3: Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.

  • Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.

  • Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.

  • AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.

Database services

  • Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.

  • Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.

  • Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.

  • AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.

Networking services

  • Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.

  • AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.

More event-driven computing on AWS

In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:

Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.

Conclusion

Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.

Start now by visiting Amazon SNS in the AWS Management Console, or by trying the AWS 10-Minute Tutorial, Send Fan-out Event Notifications with Amazon SNS and Amazon SQS.