Tag Archives: Technical How-to

Monitoring and troubleshooting serverless data analytics applications

2021-06-28 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/monitoring-and-troubleshooting-serverless-data-analytics-applications/

This series is about building serverless solutions in streaming data workloads. The application example used in this series is Alleycat, which allows bike racers to compete with each other virtually on home exercise bikes.

The first four posts have explored the architecture behind the application, which is enabled by Amazon Kinesis, Amazon DynamoDB, and AWS Lambda. This post explains how to monitor and troubleshoot issues that are common in streaming applications.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file. Note that this walkthrough uses services that are not covered by the AWS Free Tier and incur cost.

Monitoring the Alleycat application

The business requirements for Alleycat state that it must handle up to 1,000 simultaneous racers. With each racer emitting a message every second, each 5-minute race results in 300,000 messages.

While the architecture can support this throughput, the settings for each service determine how the workload scales up. The deployment templates in the GitHub repo do not use sufficiently high settings to handle this amount of data. In the section, I show how this results in errors and what steps you can take to resolve the issues. To start, I run the simulator for several races with the maximum racers configuration set to 1,000.

Monitoring the Kinesis stream

The monitoring tab of the Kinesis stream provides visualizations of stream metrics. This immediately shows that there is a problem in the application when running at full capacity:

The iterator age is growing, indicating that the data consumers are falling behind the data producers. The Get records graph also shows the number of records in the stream growing.
The Incoming data (count) metric shows the number of separate records ingested by the stream. The red line indicates the maximum capacity of this single-shard stream. With 1,000 active racers, this is almost at full capacity.
However, the Incoming data – sum (bytes) graph shows that the total amount of data ingested by the stream is currently well under the maximum level shown by the red line.

There are two solutions for improving the capacity on the stream. First, the data producer application (the Alleycat frontend) could combine messages before sending. It’s currently reaching the total number of messages per second but the total byte capacity is significantly below the maximum. This action improves message packing but increases latency since the frontend waits to group messages.

Alternatively, you can add capacity by resharding. This enables you to increase (or decrease) the number of shards in a stream to adapt to the rate of data flowing through the application. You can do this with the UpdateShardCount API action. The existing stream goes into an Updating status and the stream scales by splitting shards. This creates two new child shards that split the partition keyspace of the parent. It also results in another, separate Lambda consumer for the new shard.

Monitoring the Lambda function

The monitoring tab of the consuming Lambda function provides visualization of metrics that can highlight problems in the workload. At full capacity, the monitoring highlights issues to resolve:

The Duration chart shows that the function is exceeding its 15-second timeout, when the function normally finishes in under a second. This typically indicates that there are too many records to process in a single batch or throttling is occurring downstream.
The Error count metric is growing, which highlights either logical errors in the code or errors from API calls to downstream resources.
The IteratorAge metric appears for Lambda functions that are consuming from streams. In this case, the growing metric confirms that data consumption is falling behind data production in the stream.
Concurrent executions remain at 1 throughout. This is set by the parallelization factor in the event source mapping and can be increased up to 10.

Monitoring the DynamoDB table

The metric tab on the application’s table in the DynamoDB console provides visualizations for the performance of the service:

The consumed Read usage is well within the provisioned maximum and there is no read throttling on the table.
Consumed Write usage, shown in blue, is frequently bursting through the provisioned capacity.
The number of Write throttled requests confirms that the DynamoDB service is throttling requests since the table is over capacity.

You can resolve this issue by increasing the provisioned throughput on the table and related global secondary indexes. Write capacity units (WCUs) provide 1 KB of write throughput per second. You can set this value manually, use automatic scaling to match varying throughout, or enable on-demand mode. Read more about the pricing models for each to determine the best approach for your workload.

Monitoring Kinesis Data Streams

Kinesis Data Streams ingests data into shards, which are fixed capacity sequences of records, up to 1,000 records or 1 MB per second. There is no limit to the amount of data held within a stream but there is a configurable retention period. By default, Kinesis stores records for 24 hours but you can increase this up to 365 days as needed.

Kinesis is integrated with Amazon CloudWatch. Basic metrics are published every minute, and you can optionally enable enhanced metrics for an additional charge. In this section, I review the most commonly used metrics for monitoring the health of streams in your application.

Metrics for monitoring data producers

When data producers are throttled, they cannot put new records onto a Kinesis stream. Use the WriteProvisionedThroughputExceeded metric to detect if producers are throttled. If this is more than zero, you won’t be able to put records to the stream. Monitoring the Average for this statistic can help you determine if your producers are healthy.

When producers succeed in sending data to a stream, the PutRecord.Success and PutRecords.Success are incremented. Monitoring for spikes or drops in these metrics can help you monitor the health of producers and catch problems early. There are two separate metrics for each of the API calls, so watch the Average statistic for whichever of the two calls your application uses.

Metrics for monitoring data consumers

When data consumers are throttled or start to generate errors, Kinesis continues to accept new records from producers. However, there is growing latency between when records are written and when they are consumed for processing.

Using the GetRecords.IteratorAgeMilliseconds metric, you can measure the difference between the age of the last record consumed and the latest record put to the stream. It is important to monitor the iterator age. If the age is high in relation to the stream’s retention period, you can lose data as records expire from the stream. This value should generally not exceed 50% of the stream’s retention period – when the value reaches 100% of the stream retention period, data is lost.

If the iterator age is growing, one temporary solution is to increase the retention time of the stream. This gives you more time to resolve the issue before losing data. A more permanent solution is to add more consumers to keep up with data production, or resolve any errors that are slowing consumers.

When consumers exceed the ReadProvisionedThroughputExceeded metric, they are throttled and you cannot read from the stream. This results in a growth of records in the stream waiting for processing. Monitor the Average statistic for this metric and aim for values as close to 0 as possible.

The GetRecords.Success metric is the consumer-side equivalent of PutRecords.Success. Monitor this value for spikes or drops to ensure that your consumers are healthy. The Average is usually the most useful statistic for this purpose.

Increasing data processing throughput for Kinesis Data Streams

Adjusting the parallelization factor

Kinesis invokes Lambda consumers every second with a configurable batch size of messages. It’s important that the processing in the function keeps pace with the rate of traffic to avoid a growing iterator age. For compute intensive functions, you can increase the memory allocated in the function, which also increases the amount of virtual CPU available. This can help reduce the duration of a processing function.

If this is not possible or the function is falling behind data production in the stream, consider increasing the parallelization factor. By default, this is set to 1, meaning that each shard has a single instance of a Lambda function it invokes. You can increase this up to 10, which results in multiple instances of the consumer function processing additional batches of messages.

Using enhanced fan-out to reduce iterator age

Standard consumers use a pull model over HTTP to fetch batches of records. Each consumer operates in serial. A stream with five consumers averages 200 ms of latency each, meaning it takes up to 1 second for all five to receive batches of records.

You can improve the overall latency by removing any unnecessary data consumers. If you use Kinesis Data Firehose and Kinesis Data Analytics on a stream, these count as consumers too. If you can remove subscribers, this helps with over data consumption throughput.

If the workload needs all of the existing subscribers, use enhanced fan-out (EFO). EFO consumers use a push model over HTTP/2 and are independent of each other. With EFO, the same five consumers in the previous example would receive batches of messages in parallel, using dedicated throughput. Overall latency averages 70 ms and typically data delivery speed is improved by up to 65%. There is an additional charge for this feature.

To learn more about processing streaming data with Lambda, see this AWS Online Tech Talk presentation.

Conclusion

In this post, I show how the existing settings in the Alleycat application are not sufficient for handling the expected amount of traffic. I walk through the metrics visualizations for Kinesis Data Streams, Lambda, and DynamoDB to find which quotas should be increased.

I explain which CloudWatch metrics can be used with Kinesis Data Stream to ensure that data producers and data consumers are healthy. Finally, I show how you can use the parallelization factor and enhanced fan-out features to increase the throughput of data consumers.

For more serverless learning resources, visit Serverless Land.

Building a CI/CD pipeline to update an AWS CloudFormation StackSets

2021-06-26 Karim Afifi

Post Syndicated from Karim Afifi original https://aws.amazon.com/blogs/devops/building-a-ci-cd-pipeline-to-update-an-aws-cloudformation-stacksets/

AWS CloudFormation StackSets can extend the functionality of CloudFormation Stacks by enabling you to create, update, or delete one or more stack across multiple accounts. As a developer working in a large enterprise or for a group that supports multiple AWS accounts, you may often find yourself challenged with updating AWS CloudFormation StackSets. If you’re building a CI/CD pipeline to automate the process of updating CloudFormation stacks, you can do so natively. AWS CodePipeline can initiate a workflow that builds and tests a stack, and then pushes it to production. The workflow can either create or manipulate an existing stack; however, working with AWS CloudFormation StackSets is currently not a supported action at the time of this writing.

You can update an existing CloudFormation stack using one of two methods:

Directly updating the stack – AWS immediately deploys the changes that you submit. You can use this method when you want to quickly deploy your updates.
Running change sets – You can preview the changes AWS CloudFormation will make to the stack, and decide whether to proceed with the changes.

You have several options when building a CI/CD pipeline to automate creating or updating a stack. You can create or update a stack, delete a stack, create or replace a change set, or run a change set. Creating or updating a CloudFormation StackSet, however, is not a supported action.

The following screenshot shows the existing actions supported by CodePipeline against AWS CloudFormation on the CodePipeline console.

CodePipeline console

This post explains how to use CodePipeline to update an existing CloudFormation StackSet. For this post, we update the StackSet’s parameters. Parameters enable you to input custom values to your template each time you create or update a stack.

Overview of solution

To implement this solution, we walk you through the following high-level steps:

Update a parameter for a StackSet by passing a parameter key and its associated value via an AWS CodeCommit
Create an AWS CodeBuild
Build a CI/CD pipeline.
Run your pipeline and monitor its status.

After completing all the steps in this post, you will have a fully functional CI/CD that updates the CloudFormation StackSet parameters. The pipeline starts automatically after you apply the intended changes into the CodeCommit repository.

The following diagram illustrates the solution architecture.

Solution Architecture

The solution workflow is as follows:

Developers integrate changes into a main branch hosted within a CodeCommit repository.
CodePipeline polls the source code repository and triggers the pipeline to run when a new version is detected.
CodePipeline runs a build of the new revision in CodeBuild.
CodeBuild runs the changes in the yml file, which includes the changes against the StackSets. (To update all the stack instances associated with this StackSet, do not specify DeploymentTargets or Regions in the buildspec.yml file.)
Verify that the changes were applied successfully.

Prerequisites

To complete this tutorial, you should have the following prerequisites:

An AWS account.
Access to either AWS Cloud9 or the AWS Command Line Interface (AWS CLI).
Basic knowledge of AWS CloudFormation.
A CloudFormation StackSet. You can create a StackSet via the AWS Management Console or the AWS CLI. For instructions, see Create a StackSet. You can also use one of the sample templates provided by AWS CloudFormation device to create a StackSet. For more information, please visit the Create a StackSet
An AWS Identity and Access Management (IAM) service role to allow CodeBuild to build this project.

Retrieving your StackSet parameters

Your first step is to verify that you have a StackSet in the AWS account you intend to use. If not, create one before proceeding. For this post, we use an existing StackSet called StackSet-Test.

Sign in to your AWS account.
On the CloudFormation console, choose StackSets.
Choose your StackSet.

StackSet

For this post, we modify the value of the parameter with the key KMSId.

On the Parameters tab, note the value of the key KMSId.

Parameters

Creating a CodeCommit repository

To create your repository, complete the following steps:

On the CodeCommit console, choose Repositories.
Choose Create repository.

Repositories name

For Repository name, enter a name (for example, Demo-Repo).
Choose Create.

Repositories Description

Choose Create file to populate the repository with the following artifacts.

Create file

A buildspec.yml file informs CodeBuild of all the actions that should be taken during a build run for our application. We divide the build run into separate predefined phases for logical organization, and list the commands that run on the provisioned build server performing a build job.

Enter the following code in the code editor:

YAML

phases:

pre_build:

commands:

- aws cloudformation update-stack-set --stack-set-name StackSet-Test --use-previous-template --parameters ParameterKey=KMSId,ParameterValue=newCustomValue

The preceding AWS CloudFormation command updates a StackSet with the name StackSet-Test. The command results in updating the parameter value of the parameter key KMSId to newCustomValue.

Name the file yml.
Provide an author name and email address.
Choose Commit changes.

Creating a CodeBuild project

To create your CodeBuild project, complete the following steps:

On the CodeBuild console, choose Build projects.
Choose Create build project.

create build project

For Project name, enter your project name (for example, Demo-Build).
For Description, enter an optional description.

project name

For Source provider, choose AWS CodeCommit.
For Repository, choose the CodeCommit repository you created in the previous step.
For Reference type, keep default selection Branch.
For Branch, choose master.

Source configuration

To set up the CodeBuild environment, we use a managed image based on Amazon Linux 2.

For Environment Image, select Managed image.
For Operating system, choose Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose amazonlinux2-aarch64-standard:1.0.
For Image version, choose Always use the latest for this runtime version.

Environment

For Service role¸ select New service role.
For Role name, enter your service role name.

Service Role

Chose Create build project.

Creating a CodePipeline pipeline

To create your pipeline, complete the following steps:

On the CodePipeline console, choose Pipelines.
Choose Create pipeline

Code Pipeline

For Pipeline name, enter a name for the pipeline (for example, DemoPipeline).
For Service role, select New service role.
For Role name, enter your service role name.

Pipeline name

Choose Next.
For Source provider, choose AWS CodeCommit.
For Repository name, choose the repository you created.
For Branch name, choose master.

Source Configurations

Choose Next.
For Build provider, choose AWS CodeBuild.
For Region, choose your Region.
For Project name, choose the build project you created.

CodeBuild

Choose Next.
Choose Skip deploy stage.
Choose Skip
Choose Create pipeline.

The pipeline is now created successfully.

Running and monitoring your pipeline

We use the pipeline to release changes. By default, a pipeline starts automatically when it’s created and any time a change is made in a source repository. You can also manually run the most recent revision through your pipeline, as in the following steps:

On the CodePipeline console, choose the pipeline you created.
On the pipeline details page, choose Release change.

The following screenshot shows the status of the run from the pipeline.

Release change

Under Build, choose Details to view build logs, phase details, reports, environment variables, and build details.

Build details

Choose the Build logs tab to view the logs generated as a result of the build in more detail.

The following screenshot shows that we ran the AWS CloudFormation command that was provided in the buildspec.yml file. It also shows that all phases of the build process are successfully complete.

Phase Details

The StackSet parameter KMSId has been updated successfully with the new value newCustomValue as a result of running the pipeline. Please note that we used the parameter KMSId as an example for demonstration purposes. Any other parameter that is part of your StackSet could have been used instead.

Cleaning up

You may delete the resources that you created during this post:

AWS CloudFormation StackSet.
AWS CodeCommit repository.
AWS CodeBuild project.
AWS CodePipeline.

Conclusion

In this post, we explored how to use CodePipeline, CodeBuild, and CodeCommit to update an existing CloudFormation StackSet. Happy coding!

About the author

Karim Afifi is a Solutions Architect Leader with Amazon Web Services. He is part of the Global Life Sciences Solution Architecture team. team. He is based out of New York, and enjoys helping customers throughout their journey to innovation.

Field Notes: Accelerating Data Science with RStudio and Shiny Server on AWS Fargate

2021-06-22 Chayan Panda

Post Syndicated from Chayan Panda original https://aws.amazon.com/blogs/architecture/field-notes-accelerating-data-science-with-rstudio-and-shiny-server-on-aws-fargate/

Data scientists continuously look for ways to accelerate time to value for analytics projects. RStudio Server is a popular Integrated Development Environment (IDE) for R, which is used to render analytics visualizations for faster decision making. These visualizations are traditionally hosted on legacy unix servers along with Shiny Server to support analytics. In this previous blog, we provided a solution architecture to run Data Science use cases for medium to large enterprises across industry verticals.

In this post, we describe and deliver the infrastructure code to run a secure, scalable and highly available RStudio and Shiny Server installation on AWS. We use these services: AWS Fargate, Amazon Elastic Container Service (Amazon ECS), Amazon Elastic File System (Amazon EFS), AWS DataSync, and Amazon Simple Storage Service (Amazon S3). We will then demonstrate a Data Science use case in RStudio and create an application on Shiny. The use case discussed involves pre-processing a dataset, and training a machine learning model in RStudio. The goal is to build a shiny application to surface breast cancer prediction insights against a set of parameters to users.

Overview of solution

We show how to deploy a Open Source RStudio Server and a Shiny Server in a serverless architecture from an automated deployment pipeline built with AWS Developer Tools. This is illustrated in the diagram that follows. The deployment adheres to best practices for following an AWS Multi-Account strategy using AWS Organizations.

Figure 1. RStudio/Shiny Open Source Deployment Pipeline on AWS Serverless Infrastructure

Multi-Account Setup

In the preceding architecture, a central development account hosts the development resources. From this account, the deployment pipeline creates AWS services for RStudio and Shiny along with the integrated services into another AWS account. There can be multiple RStudio/Shiny accounts and instances to suit your requirements. You can also host multiple non-production instances of RStudio/Shiny in a single account.

Public URL Domain and Data Feed

The RStudio/Shiny deployment accounts obtain the networking information for the publicly resolvable domain from a central networking account. The data feed for the containers comes from a central data repository account. Users upload data to the S3 buckets in the central data account or configure an automated service like AWS Transfer Family to programmatically upload files. AWS DataSync transfers the uploaded files from Amazon S3 and stores the files on Amazon EFS mount points on the containers. Amazon EFS provides shared, persistent, and elastic storage for the containers.

Security Footprint

We recommend that you configure AWS Shield or AWS Shield Advanced for the networking account and enable Amazon GuardDuty in all accounts. You can also use AWS Config and AWS CloudTrail for monitoring and alerting on security events before deploying the infrastructure code. You should use an outbound filter such as AWS Network Firewall for network traffic destined for the internet. AWS Web Application Firewall (AWS WAF) protects the Amazon Elastic Load Balancers (Amazon ELB). You can restrict access to RStudio and Shiny from only allowed IP ranges using the automated pipeline.

High Availability

You deploy all AWS services in this architecture in one particular AWS Region. The AWS services used are managed services and configured for high availability. Should a service become unavailable, it automatically launches in the same Availability Zone (AZ) or in a different AZ within the same AWS Region. This means if Amazon ECS restarts the container in another AZ, following a failover, the files and data for the container will not be lost as these are stored on Amazon EFS.

Deployment

The infrastructure code provided in this blog creates all resources described in the preceding architecture. The following numbered items refer to Figure 1.

1. We used AWS Cloud Development Kit (AWS CDK) for Python to develop the infrastructure code and stored the code in an AWS CodeCommit repository.
2. AWS CodePipeline integrates the AWS CDK stacks for automated builds. The stacks are divided into four different stages and are organized by AWS service.
3. AWS CodePipeline fetches the container images from public Docker Hub and stores the images into Amazon Elastic Container Registry (Amazon ECR) repositories for cross-account access. The deployment pipeline accesses these images to create the Amazon ECS container on AWS Fargate in the deployment accounts.
4. The build script uses a key from AWS Key Management Service (AWS KMS) to create secrets. These include a RStudio front-end password, public key for bastion containers, and central data account access keys in AWS Secrets Manager. The deployment pipeline uses these secrets to configure the cross-account containers.
5. The central networking account Amazon Route 53 has the pre-configured base public domain. This is done outside the automated pipeline and the base domain info is passed on as a parameter to the deployment pipeline.
6. The central networking account delegates the base public domain to the RStudio deployment accounts via AWS Systems Manager (SSM) Parameter Store.
7. An AWS Lambda function retrieves the delegated Route 53 zone for configuring the RStudio and Shiny sub-domains.
8. AWS Certificate Manager configures encryption in transit by applying HTTPS certificates on the RStudio and Shiny sub-domains.
9. The pipeline configures an Amazon ECS cluster to control the RStudio, Shiny and Bastion containers and to scale up and down the number of containers as needed.
10. The pipeline creates RStudio container for the instance in a private subnet. RStudio container is not horizontally scalable for the Open Source version of RStudio.
– If you create only one container, the container will be configured for multiple front-end users. You need to specify the user names as email ids in cdk.json.
– Users receive their passwords and Rstudio/Shiny URLs via emails using Amazon Simple Email Service (SES).
– You can also create one RStudio container for each Data Scientist depending on your compute requirements by setting the cdk.json parameter individual_containers to true. You can also control the container memory/vCPU using cdk.json.
– Further details are provided in the readme. If your compute requirements exceed Fargate container compute limits, consider using EC2 launch type of Amazon ECS which offers a range of Amazon EC2 servers to fit your compute requirement. You can specify your installation type in cdk.json and choose either Fargate or EC2 launcg type for your RStudio containers.
11. To help you SSH to RStudio and Shiny containers for administration tasks, the pipeline creates a Bastion container in the public subnet. A Security Group restricts access to the bastion container and you can only access it from the IP range you provide in the cdk.json.
12. Shiny containers are horizontally scalable and the pipeline creates the Shiny containers in the private subnet using Fargate launch type of Amazon ECS. You can specify the number of containers you need for Shiny Server in cdk.json.
13. Application Load Balancers route traffic to the containers and perform health checks. The pipeline registers the RStudio and Shiny load balancers with the respective Amazon ECS services.
14. AWS WAF rules are built to provide additional security to RStudio and Shiny endpoints. You can specify approved IPs to restrict access to RStudio and Shiny from only allowed IPs.
15. Users upload files to be analysed to a central data lake account either with manual S3 upload or programmatically using AWS Transfer for SFTP.
16. AWS DataSync transfers files from Amazon S3 to cross-account Amazon EFS on an hourly interval schedule.
17. An AWS Lambda initiates DataSync transfer on demand outside of the hourly schedule for files that require urgent analysis. It is expected that bulk of the data transfer will happen on the hourly schedule and on-demand trigger will only be used when necessary.
18. Amazon EFS file systems provide shared, persistent and elastic storage for the containers. This is to facilitate deployment of Shiny Apps from RStudio containers using shared file system. The EFS file systems will live through container recycles.
19. You can create Amazon Athena tables on the central data account S3 buckets for direct interaction using JDBC from the RStudio container. Access keys for cross account operation are stored in the RStudio container R environment.

Note: It is recommended that you implement short term credential vending for this operation.

The source code for this deployment can be found in the aws-samples GitHub repository.

Prerequisites

To deploy the cdk stacks from the source code, you should have the following prerequisites:

1. Access to four AWS accounts (minimum three) for a basic multi-account deployment.
2. Permission to deploy all AWS services mentioned in the solution overview.
3. Review RStudio and Shiny Open Source Licensing: AGPL v3 (https://www.gnu.org/licenses/agpl-3.0-standalone.html)
4. Basic knowledge of R, RStudio Server, Shiny Server, Linux, AWS Developer Tools (AWS CDK in Python, AWS CodePipeline, AWS CodeCommit), AWS CLI and, the AWS services mentioned in the solution overview
5. Ensure you have a Docker hub login account, otherwise you might get an error while pulling the container images from Docker Hub with the pipeline – You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limits.
6. Review the readmes delivered with the code and ensure you understand how the parameters in cdk.json control the deployment and how to prepare your environment to deploy the cdk stacks via the pipeline detailed below.

Installation

Create the AWS accounts to be used for deployment and ensure you have admin permissions access to each account. Typically, the following accounts are required:

Central Development account – this is the account where the AWS Secret Manager parameters, AWS CodeCommit repository, Amazon ECR repositories, and AWS CodePipeline will be created.
Central Network account – the Route53 base public domain will be hosted in this account
Rstudio instance account – You can use as many of these accounts as required, this account will deploy RStudio and Shiny containers for an instance (dev, test, uat, prod) along with a bastion container and associated services as described in the solution architecture.
Central Data account – this is the account to be used for deploying the data lake resources – such as S3 bucket for selecting up ingested source files.

1. Install AWS CLI and create an AWS CLI profile for each account (pipeline, rstudio, network, datalake ) so that AWS CDK can be used.
2. Install AWS CDK in Python and bootstrap each account and allow the Central Development account to perform cross-account deployment to all the other accounts.

export CDK_NEW_BOOTSTRAP=1
npx cdk bootstrap --profile <AWS CLI profile of central development account> --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess aws://<Central Development Account>/<Region>

cdk bootstrap \
--profile <AWS CLI profile of rstudio deployment account> \
--trust <Central Development Account> \
--cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess \
aws://<RStudio Deployment Account>/<Region>

cdk bootstrap \
--profile <AWS CLI profile of central network account> \
--trust <Central Development Account> \
--cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess \
aws://<Central Network Account>/<Region>

cdk bootstrap \
--profile <AWS CLI profile of central data account> \
--trust <Central Development Account> \
--cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess \
aws://<Central Data Account>/<Region>

3. Build the Docker container images in Amazon ECR in the central development account by running the image build pipeline as instructed in the readme.
a. Using the AWS console, create an AWS CodeCommit repository to hold the source code for building the images – for example, rstudio_docker_images.
b. Clone the GitHub repository and move into the image-build folder.
c. Using the CLI – Create a secret to store your DockerHub login details as follows:

aws secretsmanager create-secret --profile <AWS CLI profile of central development account> --name ImportedDockerId --secret-string '{"username":"<dockerhub username>", "password":"<dockerhub password>"}'

d. Create an AWS CodeCommit repository to hold the source code for building the images – e.g. rstudio_docker_images and pass the repository name to the name parameter in cdk.json for the image build pipeline
e. Pass the account numbers (comma separated) where rstudio instances will be deployed in the cdk.json paramter rstudio_account_ids
f. Synthesize the image build stack

cdk synth --profile <AWS CLi profile of central development account>

g. Commit the changes into the AWS CodeCommit repo you created using GitHub.
h. Deploy the pipeline stack for container image build.

cdk deploy --profile <AWS CLI profile of central development account>

i. Log into AWS console in the central development account and navigate to CodePipeline service. Monitor the pipeline (pipeline name is the name you provided in the name parameter in cdk.json) and confirm the docker images build successfully.

4. Move into the rstudio-fargate folder. Provide the comma separated accounts where rstudio/shiny will be deployed in the cdk.json against the parameter rstudio_account_ids.

5. Synthesize the stack Rstudio-Configuration-Stack in the Central Development account.

cdk synth Rstudio-Configuration-Stack --profile <AWS CLI profile of central development account>

6. Deploy the Rstudio-Configuration-Stack. This stack should create a new CMK KMS Key to use for creating the secrets with AWS Secrets Maanger. The stack will output the AWS ARN for the KMS key. Note down the ARN. Set the parameter “encryption_key_arn” inside cdk.json to the above ARN.

cdk deploy Rstudio-Configuration-Stack --profile <AWS CLI profile of rstudio deployment account>

7. Run the script rstudio_config.sh after setting the required cdk.json parameters. Refer readme.

sh ./rstudio_config.sh <AWS CLI profile of the central development account> "arn:aws:kms:<region>:<AWS CLI profile of central development account>:key/<key hash>" <AWS CLI profile of central data account> <comma separated AWS CLI profiles of the rstudio deployment accounts>

8. Run the script check_ses_email.sh with comma separated profiles for rstudio deployment accounts. This will check whether all user emails have been registed with Amazon SES for all the rstudio deployment accounts in the region, before you can deploy rstudio/shiny.

sh ./check_ses_email.sh <comma separated AWS CLI profiles of the rstudio deployment accounts>

9. Before committing the code into the AWS CodeCommit repository, synthesize the pipeline stack against all the accounts involved in this deployment. This ensures all the necessary context values are populated into cdk.context.json file and to avoid the DUMMY values being mapped.

cdk synth --profile <AWS CLI profile of the central development account>
cdk synth --profile <AWS CLI profile of the central network account>
cdk synth --profile <AWS CLIrepeat for each profile of the RStudio deplyment account>

10. Deploy the Rstudio Fargate pipeline stack.

cdk deploy --profile <AWS CLI profile of the central development account> Rstudio-Piplenine-Stack

Data Science use case

Now the installation is completed. We can demonstrate a typical data science use case:

Explore, and pre-process a dataset, and train a machine learning model in RStudio,
Build a Shiny application that makes prediction against the trained model to surface insight to dashboard users.

This showcases how to publish a Shiny application from RStudio containers to Shiny containers via a common EFS filesystem.

First, we log on to the RStudio container with the URL from the deployment and clone the accompanying repository using the command line terminal. The ML example is in ml_example directory. We use the UCI Breast Cancer Wisconsin (Diagnostic) dataset from mlbench library. Refer to the ml_example/breast_cancer_modeling.r.

$ git clone https://github.com/aws-samples/aws-fargate-with-rstudio-open-source.git

Figure 2 – Use the terminal to clone the repository in RStudio IDE.

Let’s open the ml_example/breast_cancer_modeling.r script in the RStudio IDE. The script does the following:

Install and import the required libraries, mainly caret, a popular machine learning library, and mlbench, a collection of ML datasets;
Import the UCI breast cancer dataset, create an 80/20 split for training and testing (in shiny app) purposes;
Perform preprocessing to impute the missing values (shown as NA) in the dataframe and standardize the numeric columns;
Train a stochastic gradient boosting model with cross-validation with the area under the ROC curve (AUC) as the tuning metric;
Save the testing split, preprocessing object and the trained model into the directory where shiny app script is located/breast-cancer-prediction.

You can execute the whole script with this command in the console.

> source('~/aws-fargate-with-rstudio-open-source/ml_example/breast_cancer_modeling.r')

We can then inspect the model evaluation in the model object gbmFit.

> gbmFit
Stochastic Gradient Boosting 

560 samples
  9 predictor
  2 classes: 'benign', 'malignant' 

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times) 
Summary of sample sizes: 504, 505, 503, 504, 504, 504, ... 
Resampling results across tuning parameters:

  interaction.depth  n.trees  ROC        Sens       Spec     
  1                   50      0.9916391  0.9716967  0.9304474
  1                  100      0.9917702  0.9700676  0.9330789
  1                  150      0.9911656  0.9689790  0.9305000
  2                   50      0.9922102  0.9708859  0.9351316
  2                  100      0.9917640  0.9681682  0.9346053
  2                  150      0.9910501  0.9662613  0.9361842
  3                   50      0.9922109  0.9689865  0.9381316
  3                  100      0.9919198  0.9684384  0.9360789
  3                  150      0.9912103  0.9673348  0.9345263

If the results are as expected, move on to developing a dashboard and publishing the model for business users to consume the machine learning insights.

In the repository, ml_example/breast-cancer-prediction/app.R has a Shiny application that displays a summary statistics and distribution of the testing data, and an interactive dashboard. This allows users to select data points on the chart and understand get the machine learning model inference as needed. Users can also modify the threshold to alter the specificity and sensitivity of the prediction. Thanks to the shared EFS filesystem across the RStudio and Shiny containers, we can publish the Shiny application with the following shell command to /srv/shiny-server.

$ cp ~/aws-fargate-with-rstudio-open-source/ml_example/breast-cancer-prediction/ \ /srv/shiny-server/ -rfv

That’s it. The Shiny application is now on the Shiny containers accessible from the Shiny URL, load balanced by Application Load Balancer. You can slide over the Probability Threshold to test how it changes the total count in the prediction, change the variables for the scatter plot and select data points to test the individual predictions.

Figure 3 – The Shiny Application

Cleaning up

Please follow the readme in the repository to delete the stacks created.

Conclusion

In this blog, we demonstrated how a serverless architecture can be deployed, walked through a data science use case in RStudio server and deployed an interactive dashboard in Shiny server. The solution creates a scalable, secure, and serverless data science environment for the R community that accelerates the data science process. The infrastructure and data science code is available in the github repository.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Building leaderboard functionality with serverless data analytics

2021-06-21 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-serverless-applications-with-streaming-data-part-4/

Part 1 explains the application’s functionality, how to deploy to your AWS account, and provides an architectural review. Part 2 compares different ways to ingest streaming data into Amazon Kinesis Data Streams and shows how to optimize shard capacity. Part 3 uses Amazon Kinesis Data Firehose with AWS Lambda to implement the all-time rankings functionality.

This post walks through the leaderboard functionality in the application. This solution uses Kinesis Data Streams, Lambda, and Amazon DynamoDB to provide both real-time and historical rankings.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file. Note that this walkthrough uses services that are not covered by the AWS Free Tier and incur cost.

Overview of leaderboards in Alleycat

Alleycat races are 5 minutes long and run continuously throughout the day for each class. Competitors select a class type and automatically join the current race by choosing Start Race. There are up to 1,000 competitors per race. To track performance, there are two types of leaderboard in Alleycat: Race results for completed races and the Realtime rankings for active races.

The user selects a completed race from the dropdown to a leaderboard ranking of results. This table is not real time and only reflects historical results.
The user selects the Here now option to see live results for all competitors in the current virtual race for the chosen class.

Architecture overview

The backend microservice supporting these features is located in the 1-streaming-kds directory in the GitHub repo. It uses the following architecture:

The tumbling window function receives results every second from Kinesis Data Streams. This function aggregates metrics and saves the latest results to the application’s DynamoDB table.
DynamoDB Streams invoke the publishing Lambda function every time an item is changed in the table. This reformats the message and publishes to the application’s IoT topic in AWS IoT Core to update the frontend.
The final results function is an additional consumer on Kinesis Data Streams. It filters for only the last result in each competitor’s race and publishes the results to the DynamoDB table.
The frontend calls an Amazon API Gateway endpoint to fetch the historical results for completed races via a Lambda function.

Configuring the tumbling window function

This Lambda function aggregates data from the Kinesis stream and stores the result in the DynamoDB table:

This function receives batches of 1,000 messages per invocation. The messages may contain results for multiple races and competitors. The function groups the results by race and racer ID and then flattens the data structure. It writes an item to the DynamoDB table in this format:

{
 "PK": "race-5403307",
 "SK": "results",
 "ts": 1620992324960,
 "results": "{\"0\":167.04,\"1\":136,\"2\":109.52,\"3\":167.14,\"4\":129.69,\"5\":164.97,\"6\":149.86,\"7\":123.6,\"8\":154.29,\"9\":89.1,\"10\":137.41,\"11\":124.8,\"12\":131.89,\"13\":117.18,\"14\":143.52,\"15\":95.04,\"16\":109.34,\"17\":157.38,\"18\":81.62,\"19\":165.76,\"20\":181.78,\"21\":140.65,\"22\":112.35,\"23\":112.1,\"24\":148.4,\"25\":141.75,\"26\":173.24,\"27\":131.72,\"28\":133.77,\"29\":118.44}",
 "GSI": 2
}

This transformation significantly reduces the number of writes to the DynamoDB table per function invocation. Each time a write occurs, this triggers the publishing process and notifies the frontend via the IoT topic.

DynamoDB can handle almost any number of writes to the table. You can set up to 40,000 write capacity units with default limits and can request even higher limits via an AWS Support ticket. However, you may want to limit the throughput to reduce the WCU cost.

The tumbling window feature of Lambda allows invocations from a streaming source to pass state between invocations. You can specify a window interval of up to 15 minutes. During the window, a state is passed from one invocation to the next, until a final invocation at the end of the window. Alleycat uses this feature to buffer aggregated results and only write the output at the end of the tumbling window:

For a tumbling window period of 5 seconds, this means that the Lambda function is invoked multiple times, passing an intermediate state from invocation to invocation. Once the window ends, it then writes the final aggregated result to DynamoDB. The tradeoff in this solution is that it reduces the number of real-time notifications to the frontend since these are published from the table’s stream. This increases the latency of live results in the frontend application.

Implementing tumbling windows in Lambda functions

The template.yaml file describes the tumbling Lambda function. The event definition specifies the tumbling window duration in the TumblingWindowInSeconds attribute:

  TumblingWindowFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: tumblingFunction/    
      Handler: app.handler
      Runtime: nodejs14.x
      Timeout: 15
      MemorySize: 256
      Environment:
        Variables:
          DDB_TABLE: !Ref DynamoDBtableName
      Policies:
        DynamoDBCrudPolicy:
          TableName: !Ref DynamoDBtableName
      Events:
        Stream:
          Type: Kinesis
          Properties:
            Stream: !Sub "arn:aws:kinesis:${AWS::Region}:${AWS::AccountId}:stream/${KinesisStreamName}"
            BatchSize: 1000
            StartingPosition: TRIM_HORIZON      
            TumblingWindowInSeconds: 15

When you enable tumbling windows, the function’s event payload contains several new attributes:

{
    "Records": [
        {
 	    ...
        }
    ],
    "shardId": "shardId-000000000000",
    "eventSourceARN": "arn:aws:kinesis:us-east-2:123456789012:stream/alleycat",
    "window": {
        "start": "2021-05-05T18:51:00Z",
        "end": "2021-05-05T18:51:15Z"
    },
    "state": {},
    "isFinalInvokeForWindow": false,
    "isWindowTerminatedEarly": false
}

These include:

Window start and end: The beginning and ending timestamps for the current tumbling window.
State: An object containing the state returned from the previous invocation, which is initially empty in a new window. The state object can contain up to 1 MB of data.
isFinalInvokeForWindow: Indicates if this is the last invocation for the current window. This only occurs once per window period.
isWindowTerminatedEarly: A window ends early if the state exceeds the maximum allowed size of 1 MB.

The event handler in app.js uses tumbling windows if they are defined in the AWS SAM template:

// Main Lambda handler
exports.handler = async (event) => {

	// Retrieve existing state passed during tumbling window	
	let state = event.state || {}
	
	// Process the results from event
	let jsonRecords = getRecordsFromPayload(event)
	jsonRecords.map((record) => raceMap[record.raceId] = record.classId)
	state = getResultsByRaceId(state, jsonRecords)

	// If tumbling window is not configured, save and exit
	if (event.window === undefined) {
		return await saveCurrentRaces(state) 
	}

	// If tumbling window is configured, save to DynamoDB on the 
	// final invocation in the window
	if (event.isFinalInvokeForWindow) {
		await saveCurrentRaces(state) 
	} else {
		return { state }
	}
}

The final results function and API

The final results function filters for the last event in each competitor’s race, which contains the final score. The function writes each score to the DynamoDB table. Since there may be many write events per invocation, this function uses the Promise.all construct in Node.js to complete the database operations in parallel:

const saveFinalScores = async (raceResults) => {
	let paramsArr = []

	raceResults.map((result) => {
		paramsArr.push({
		  TableName : process.env.DDB_TABLE,
		  Item: {
		    PK: `race-${result.raceId}`,
		    SK: `racer-${result.racerId}`,
		    GSI: result.output,
		    ts: Date.now()
		  }
		})
	})
	// Save to DDB in parallel
	await Promise.all(paramsArr.map((params) => documentClient.put (params).promise()))
}

Using this approach, each call to documentClient.put is made without waiting for a response from the SDK. The call returns a Promise object, which is in a pending state until the database operation returns with a status. Promise.all waits for all promises to resolve or reject before code execution continues. Comparing the serial and concurrent approach, this reduces the overall time for multiple database writes. The tradeoff is that it increases the number of writes to DynamoDB and the number of WCUs consumed.

For a large number of put operations to DynamoDB, you can also use the DocumentClient’s batchWrite operation. This delegates to the underlying DynamoDB BatchWriteItem operation in the AWS SDK and can accept up to 25 separate put requests in a single call. To handle more than 25, you can make multiple batchWrite requests and still use the parallelized invocation method shown above.

The DynamoDB table maintains a list of race results with one item per racer per race:

The frontend calls an API Gateway endpoint that invokes the getLeaderboard function. This function uses the DocumentClient’s query API to return results for a selected race, sorted by a global secondary index containing the final score:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const documentClient = new AWS.DynamoDB.DocumentClient()

// Main Lambda handler
exports.handler = async (event) => {

    const classId = parseInt(event.queryStringParameters.classId)

    const params = {
        TableName: process.env.DDB_TABLE,
        IndexName: 'GSI_PK_Index',
        KeyConditionExpression: 'PK = :ID',
        ExpressionAttributeValues: {
          ':ID': `class-${classId}`
        },
        ScanIndexForward: false,
        Limit: 1000
    }   

    const result = await documentClient.query(params).promise()   
    return result.Items
}

By default, this returns the top 1,000 places by using the Limit parameter. You can customize this or use pagination to implement fetching large result sets more efficiently.

Conclusion

In this post, I explain the all-time leaderboard logic in the Alleycat application. This is an asynchronous, eventually consistent process that checks batches of incoming records for new personal best records. This uses Kinesis Data Firehose to provide a zero-administration way to deliver and process large batches of records continuously.

This post shows the architecture in Alleycat and how this is defined in AWS SAM. Finally, I walk through how to build a data transformation Lambda function that decodes a payload and returns records back to Kinesis.

Part 5 discusses how to troubleshoot issues in Alleycat and how to monitor streaming applications generally.

For more serverless learning resources, visit Serverless Land.

Field Notes: SQL Server Deployment Options on AWS Using Amazon EC2

2021-06-18 Saqlain Tahir

Post Syndicated from Saqlain Tahir original https://aws.amazon.com/blogs/architecture/field-notes-sql-server-deployment-options-on-aws-using-amazon-ec2/

Many enterprise applications run Microsoft SQL Server as their backend relational database. There are various options for customers to benefit from deploying their SQL Server on AWS. This blog will help you choose the right architecture for your SQL Server Deployment with high availability options, using Amazon EC2 for mission-critical applications.

SQL Server on Amazon EC2 offers more efficient control of deployment options and enables customers to fine-tune their Microsoft workload performance with full control. Most importantly you can bring your own licenses (BYOL) on AWS. You can re-host, aka “lift and shift”, your SQL Server to Amazon Elastic Compute Cloud (Amazon EC2) for large scale Enterprise applications. If you are re-hosting, you can still use your existing SQL Server license on AWS. Lifting and shifting your on-premises MS SQL Server environment to AWS using Amazon EC2 is recommended to migrate your SQL Server workloads to the cloud.

First, it is important to understand the considerations for deploying a SQL Server using Amazon EC2. For example, when would you want to use Failover Cluster over Availability Groups?

The following table will help you to choose the right architecture for SQL Server architecture based on the type of workload and availability requirements:

Following table will help you to choose the right architecture for SQL Server architecture based on the type of workload and high availability requirements:

Self-managed MS SQL Server on EC2 usually means hosting MS SQL on EC2 backed by Amazon Elastic Block Store (EBS) or Amazon FSx for Windows File Server. Persistent storage from Amazon EBS and Amazon FSx delivers speed, security, and durability for your business-critical relational databases such as Microsoft SQL Server.

Amazon EBS delivers highly available and performant block storage for your most demanding SQL Server deployments to achieve maximum performance.
Amazon FSx delivers fully managed Windows native shared file storage (SMB) with a multi-Availability Zone (AZ) design for highly available (HA) SQL environments.

Previously, if you wanted to migrate your Failover Cluster SQL databases to AWS, there was no native shared storage option. You would need to implement third party solutions that added a cost and complexity to install, set up, and maintain the storage configuration.

Amazon FSx for Windows File Server provides shared storage that multiple SQL databases can connect to across multiple AZs for a DR and HA solution. It is also helpful to achieve throughput and certain IOPS without scaling up the instance types to get the same IOPS from EBS volumes.

Overview of solution

Most customers need High Availability (HA) for their SQL Server production environment to ensure uptime and availability. This is important to minimize changes to the SQL Server applications while migrating. Customers may want to protect their investment in Microsoft SQL Server licenses by taking a Bring your own license (BYOL) approach to cloud migration.

There are some scenarios where applications running on Microsoft SQL Server need full control of the infrastructure and software. If customers require it, they can deploy their SQL Server to AWS on Amazon EC2. Currently, there are three ways to deploy SQL Server workloads on AWS as shown in the following diagram:

There are some scenarios where applications running on Microsoft SQL Server need full control of the infrastructure and software. If customers require it, they can deploy their SQL Server to AWS on Amazon EC2. Currently, there are various ways to deploy SQL Server workloads on AWS as shown in the following diagram:

Walkthrough

Now the question comes, how do you deploy the preceding SQL Server architectures?

First, let’s discuss the high-level breakdown of deployment options including the two types of SQL HA modes:

Standalone
- Single SQL Server Node without HA
- Provision Amazon EC2 instance with EBS volume
- Single Availability Zone deployment
Always On Failover Cluster Instance (FCI): EC2 and FSx for Windows File Server
- Protects the whole instance, including system databases
- Failovers over at the instance level
- Requires Shared Storage, Amazon FSx for Windows File Server is a great option
- Can be used in conjunction with Availability Groups to provide read-replicas and DR copies (dependent upon SQL Server Edition)
- Can be implemented at the Enterprise or Standard Edition level (with limitations)
- Multi Availability Zone Deployment
Always On Availability Groups (AG): EC2 and EBS
- Protects one or more user databases (Standard Edition is limited to a single user database per AG)
- Failover is at the Availability Group level, meaning potentially only a subset of user databases can failover versus the whole instance
- System databases are not replicated, meaning users, jobs etc. will not automatically appear on passive nodes, manual creation is needed on all nodes
- Natively provides access to read-replicas and DR copies (dependent upon Edition)
- Can be implemented at the Enterprise or Standard
- Multi Availability Zone Deployment

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
SQL Server Licenses in case of BYOL Deployment
Identify Software and Hardware requirements for SQL Server Environment
Identify SQL Server application requirements based on best practices in this deployment guide

Deployment options on AWS

Here are some tools and services provided by AWS to deploy the SQL Server production ready environment by following best practices.

SQL Server on the AWS Cloud: Quick Start Reference Deployment

Use Case:

You want to deploy SQL Server on AWS for a Proof of Concept (PoC) or Pilot deployment using CloudFormation templates within hours by following these best practices.

Overview:

The Quick Start deployment guide provides step-by-step instructions for deploying SQL Server on the Amazon Web Services (AWS) Cloud, using AWS CloudFormation templates and AWS Systems Manager Automation documents that automate the deployment.

SQL Server on the AWS Cloud: Quick Start Reference Deployment

Implementation:

Quick Start Link: SQL Server with WSFC Quick Start

Source Code: GitHub

SQL Server Always On Deployments with AWS Launch Wizard

Use Case:

You intend to deploy SQL Server on AWS for your production workloads to benefit from automation, time and cost savings, and most importantly by leveraging proven deployment best practices from AWS.

Overview:

AWS Launch Wizard is a service that guides you through the sizing, configuration, and deployment of Microsoft SQL Server applications on AWS, following the AWS Well-Architected Framework. AWS Launch Wizard supports both single instance and high availability (HA) application deployments.

AWS Launch Wizard reduces the time it takes to deploy SQL Server solutions to the cloud. You input your application requirements, including performance, number of nodes, and connectivity, on the service console. AWS Launch Wizard identifies the right AWS resources to deploy and run your SQL Server application. You can also receive an estimated cost of deployment, modify your resources and instantly view the updated cost assessment.

When you approve, AWS Launch Wizard provisions and configures the selected resources in a few hours to create a fully-functioning production-ready SQL Server application. It also creates custom AWS CloudFormation templates, which can be reused and customized for subsequent deployments.

Once deployed, your SQL Server application is ready to use and can be accessed from the EC2 console. You can manage your SQL Server application with AWS Systems Manager.

SQL Server Always On Deployments with AWS Launch Wizard

Implementation:

AWS Launch Wizard Link: AWS Launch Wizard for SQL Server

Simplify your Microsoft SQL Server high availability deployments using Amazon FSx for Windows File Server

Use Case:

You need SQL enterprise edition to run an Always on Availability Group (AG), whereas you only need the standard edition to run Failover Cluster Instance (FCI). You want to use standard licensing to save costs but want to achieve HA. SQL Server Standard is typically 40–50% less expensive than the Enterprise Edition.

Overview:

Always On Failover Cluster (FCI) uses block level replication rather than database-level transactional replication. You can migrate to AWS without re-architecting. As the shared storage handles replication you don’t need to use SQL nodes for it, and frees up CPU/Memory for primary compute jobs. With FCI, the entire instance is protected – if the primary node becomes unavailable, the entire instance is moved to the standby node. This takes care of the SQL Server logins, SQL Server Agent jobs, and certificates that are stored in the system databases. These are physically stored in shared storage.

Simplify your Microsoft SQL Server high availability deployments using Amazon FSx for Windows File Server

Implementation:

FCI implementation: SQL Server Deployment using FCI, FSx QuickStart.

Clustering for SQL Server High Availability using SIOS Data Keeper

Use Case:

Windows Server Failover Clustering is a requirement if you are using SQL Server Enterprise or the SQL Server Standard edition and it might appear to be the perfect HA solution for applications running on Windows Server. But like FCIs, it requires the use of shared storage. If you want to use software SAN across multiple instances, then SIOS Data Keeper can be an option.

Overview:

WSFC has a potential role to play in many HA configurations, including for SQL Server FCIs, but its use requires separate data replication provisions in a SANless environment, whether in an enterprise datacenter or in the cloud. SIOS data keeper is a partner solution software SAN across multiple instances. Instead of FSx, you deploy another cluster for SIOS data keeper to host the shared volumes or use a hyper-converged model to deploy SQL Server on the same server as the SIOS data keeper. You can also use SIOS DataKeeper Cluster Edition, a highly optimized, host-based replication solution.

Clustering for SQL Server High Availability using SIOS Data Keeper

Implementation:

QuickStart: SIOS DataKeeper Cluster Edition on the AWS Cloud

Conclusion

In this blog post, we covered the different options of SQL Server Deployment on AWS using EC2. The options presented showed how you can have the exact same experience from an administration point of view, as well as full control over your EC2 environment, including sysadmin and root-level access.

We also showed various ways to achieve High Availability, by deploying SQL Server on AWS as a new environment using AWS QuickStart and AWS Launch Wizard. We also showed how you can deploy SQL Server using AWS managed windows storage Amazon FSx to handle shared storage constraint, cost and IOPS requirement scenarios. If you need shared storage in the cloud outside the Windows FSx option, AWS supports a partner solution using SIOS DataKeeper Cluster Edition.

We hope you found this blog post useful and welcome your feedback in the comments!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Develop Data Pre-processing Scripts Using Amazon SageMaker Studio and an AWS Glue Development Endpoint

2021-06-17 Sam Mokhtari

Post Syndicated from Sam Mokhtari original https://aws.amazon.com/blogs/architecture/field-notes-develop-data-pre-processing-scripts-using-amazon-sagemaker-studio-and-an-aws-glue-development-endpoint/

This post was co-written with Marcus Rosen, a Principal – Machine Learning Operations with Rio Tinto, a global mining company.

Data pre-processing is an important step in setting up Machine Learning (ML) projects for success. Many AWS customers use Apache Spark on AWS Glue or Amazon EMR to run data pre-processing scripts while using Amazon SageMaker to build ML models. To develop spark scripts in AWS Glue, you can create an environment called a Glue Development (Dev) Endpoint that lets you author and test your data pre-processing scripts iteratively. When you’re satisfied with the results of your development, you can create a Glue ETL job that runs the final script as part of your automation framework.

With the introduction of Amazon SageMaker Studio in AWS re:Invent 2020, you can now use a single web-based IDE to spin up a notebook and perform all ML development steps. These include data pre-processing, ML model training, ML model deployment and monitoring.

This post walks you through how to connect a SageMaker Studio notebook to an AWS Glue Dev Endpoint, so you can use a single tool to iteratively develop both data pre-processing scripts and ML models.

Solution Overview

The following diagram shows the components that are used in this solution.

First, we use an AWS CloudFormation template to set up the required networking components (for example, VPC, subnets).
Then, we create an AWS Glue Dev Endpoint and use a security group to allow SageMaker Studio to securely access the endpoint.
Finally, we create a studio domain and use a SparkMagic kernel to connect to the AWS Glue Dev Endpoint and run spark scripts.

In the Amazon SageMaker Studio notebook, SparkMagic will call a REST API against a Livy server running on the AWS Glue Dev Endpoint. Apache Livy is a service that enables interaction with a remote Spark cluster over a REST API.

The following diagram shows the components that are used in this solution. We use an AWS CloudFormation template to set up the required ntworking components (for example, VPC, subnets).

Set up the VPC

You can use the following CloudFormation template to set up the environment needed for this solution.

This template deploys the following resources in your account:

A new VPC, with both public and private subnet.
VPC endpoints for the following resources:
- Amazon S3
- AWS Glue
- Amazon SageMaker API
- Amazon SageMaker RunTime
Security groups for SageMaker Studio, Glue endpoint and VPC endpoints
SageMaker Service IAM role
AWS Glue Dev Endpoint IAM role
Set up the AWS Glue Dev Endpoint

Set up AWS Glue Dev Endpoint

Review this Developer Guide: Adding a Development Endpoint for instructions to create an AWS Glue Dev Endpoint.

Note: you must use the AWS Glue Dev Endpoint IAM role provisioned by the CloudFormation template.

In the Networking section, select Choose a VPC, subnet, and security groups.

Then choose the VPC glue security group, which you provisioned through the CloudFormation template.

The AWS Glue Dev Endpoint needs to be secured with an SSH public key, which should be generated within your local environment. An SSH key pair (public/private) can be generated using the ssh-keygen on Linux or using PuTTYgen on Windows.

Glue Dev Endpoint screenshot

The final review page looks similar to the following screenshot.

Final review page

Once the AWS Glue Dev Endpoint is in Ready status, keep note of its private IP address (Glue -> ETL -> Dev Endpoints). You will use this IP for the Livy port forwarding.

Set up SageMaker Studio

We recommend launching the SageMaker Studio resource by following the instructions in Securing Amazon SageMaker Studio connectivity using a private VPC .

Follow these steps when you provision the SageMaker Studio resources:

Select Standard setup with the AWS Identity and Access Management (IAM) authentication method.
Attach a SageMaker Service IAM role, created by the CloudFormation template, to SageMaker Studio.
Under Network and storage, select the same VPC and private subnet as the AWS Glue endpoint.
For the Network Access for Studio option, select VPC Only — SageMaker Studio will use your VPC. Direct internet access is disabled.

Then ensure that the security group with the self-referencing rule is attached. Also, check your other required security groups are attached for SageMaker Studio from the CloudFormation template output.

Connect the SageMaker Studio notebook to the AWS Glue Dev Endpoint

Once you launch the SageMaker Studio and you add the users. Follow these steps to connect the SageMaker Studio notebook to the AWS Glue Dev Endpoint:

Open the Studio and go to the launcher page (by pressing the “+” icon on the top-left of the page.
Under Notebooks and compute resources, select SparkMagic in the dropdown menu and select Notebook.
Then open another launcher page, select SparkMagic in the same dropdown menu and select Image terminal. One thing to note is that the SparkMagic app will take some time to initialize. Proceed once the apps are in Ready status (2-3 minutes).

Notebooks and compute resources screenshot

4. Upload the private key into SparkMagic Image terminal. In other words, copy the private key to “.ssh” directory and update its permissions using “chmod 400”.

Note: the private key is corresponding to the public key used when you create the AWS Glue Dev Endpoint.

5. Now, you need to achieve port forwarding of the Livy service in order for SparkMagic kernel to be able to connect to the AWS Glue Dev Endpoint. You run the following command in the image terminal:

/usr/bin/ssh -4 -N -o ServerAliveInterval=60 -o ServerAliveCountMax=3 -o StrictHostKeyChecking=no -i /root/.ssh/{PRIVATE_KEY} -L 8998:169.254.76.1:8998 glue@{GLUE_ENDPOINT_PRIVATE_IP_ADDRESS}

The command consists of:

{PRIVATE_KEY} is the private key file name that you copied into .ssh directory.
{GLUE_ENDPOINT_PRIVATE_IP_ADDRESS} is the private IP address of the AWS Glue Dev Endpoint.
“8998” is the Livy port we are using for port forwarding.
“169.254.76.1” is the remote IP address defined by AWS Glue, this IP address does not change.

Note: Keep this terminal open and the SSH command running in order to keep the Livy session active.

6. Go to the SparkMagic notebook and restart the kernel, by going to the top menu and selecting Kernel > Restart Kernel.

7. Once the notebook kernel is restarted, the connection between the Studio Notebook and the AWS Glue Dev Endpoint is ready. To test the integration, you can run the following example command to list the tables in the AWS Glue Data Catalog.

spark.sql("show tables").show()

To test the integration, you can run the following command to list the tables in the Glue Data Catalog

Cleaning up

To avoid incurring future charges, delete the resources you created:

Shut down the notebooks you started with Studio.
Delete the Studio domain.
Delete Amazon EFS storage volume for SageMaker Studio
Delete AWS Glue Endpoint
On the AWS CloudFormation console, delete the CloudFormation stack.

Conclusion

Our customers needed a single web-based IDE to spin up a notebook and perform all ML development steps including data pre-processing, ML model training, ML model deployment and monitoring. This blog post demonstrated how you can configure a SageMaker Studio notebook and connect to AWS Glue Dev Endpoint. This provides a framework for you to use when developing both data preprocessing scripts and ML models.

To learn more about how to develop data pre-processing scripts and ML models in Amazon SageMaker, you can check out the examples in this repository.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Increase your e-commerce website reliability using chaos engineering and AWS Fault Injection Simulator

2021-06-17 Bastien Leblanc

Post Syndicated from Bastien Leblanc original https://aws.amazon.com/blogs/devops/increase-e-commerce-reliability-using-chaos-engineering-with-aws-fault-injection-simulator/

Customer experience is a key differentiator for retailers, and improving this experience comes through speed and reliability. An e-commerce website is one of the first applications customers use to interact with your brand.

For a long time, testing an application has been the only way to battle-test an application before going live. Testing is very effective at identifying issues in an application, through processes like unit testing, regression testing, and performance testing. But this isn’t enough when you deploy a complex system such as an e-commerce website. Planning for unplanned events, circumstances, new deployment dependencies, and more is rarely covered by testing. That’s where chaos engineering plays its part.

In this post, we discuss a basic implementation of chaos engineering for an e-commerce website using AWS Fault Injection Simulator.

Chaos engineering for retail

At AWS, we help you build applications following the Well-Architected Framework. Each pillar has a different importance for each customer, but the reliability pillar has consistently been valued as high priority by retailers for their e-commerce website.

One of the recommendations of this pillar is to run game days on your application.

A game day simulates a failure or event to test systems, processes, and team responses. The purpose is to perform the actions the team would perform as if an exceptional event happened. These should be conducted regularly so that your team builds muscle memory of how to respond. Your game days should cover the areas of operations, security, reliability, performance, and cost.

Chaos engineering is the practice of stressing an application in testing or production environments by creating disruptive events, such as a sudden increase in CPU or memory consumption, observing how the system responds, and implementing improvements. E-commerce websites have increased in complexity to the point that you need automated processes to detect the unknown unknowns.

Let’s see how retailers can run game days, applying chaos engineering principles using AWS FIS.

Typical e-commerce components for chaos engineering

If we consider a typical e-commerce architecture, whether you have a monolith deployment, a well-known e-commerce software, or a microservices approach, all e-commerce websites contain critical components. The first task is to identify which components should be tested using chaos engineering.

We advise you to consider specific criteria when choosing which components to prioritize for chaos engineering. From our experience, the first step is to look at your critical customer journey:

Homepage
Search
Recommendations and personalization
Basket and checkout

From these critical components, consider focusing on the following:

High and peak traffic: Some components have specific or unpredictable traffic, such as slots, promotions, and the homepage.
Proven components: Some components have been tested and don’t have any existing issues. If the component isn’t tested, chaos engineering isn’t the right tool. You should return to unit testing, QA, stress testing, and performance testing and fix the known issues, then chaos engineering can help identify the unknown unknowns.

The following are some real-world examples of relevant e-commerce services that are great chaos engineering candidates:

Authentication – This is customer-facing because it’s part of every critical customer journey buying process
Search – Used by most customers, search is often more important than catalog browsing
Products – This is a critical component that is customer-facing
Ads – Ads may not be critical, but have high or peak traffic
Recommendations – A website without recommendations should still be 100% functional (to be checked with hypothesis during experiments), but without personal recommendations, a customer journey is greatly impacted

Solution overview

Let’s go through an example with a simplified recommendations service for an e-commerce application. The application is built with microservices, which is a typical target for chaos experiments. In a microservices architecture, unknown issues are potentially more frequent because of the distributed nature of the development. The following diagram illustrates our simplified architecture.

Recommendations Service Architecture

Following the principles of chaos engineering, we define the following for each scenario:

A steady state
One or multiple hypothesis
One or multiple experimentations to test these hypotheses

Defining a steady state is about knowing what “good” looks like for your application. In our recommendations example, steady state is measured as follows:

Customer latency at p90 between 0.3–0.5 seconds (end-to-end latency when asking for a recommendations)
A success rate of 100% at the 95 percentile

For the sake of simplification of this article, we use a simplified version of a steady state than what is done in a real environment. You could go deeper by checking latency, for example (such as if the answer is fast but wrong). You could also analyze the metrics with an anomaly detection band instead of fixed metrics.

We could test the following situations and what should occur as a result:

What if Amazon DynamoDB isn’t accessible from the recommendations engine? In this case, the recommendations engine should fall back to using Amazon Elasticsearch (Amazon ES) only.
What if Amazon Personalize is slow to answer (over 2 seconds)? Recommendations should be served from a cache or reply with empty responses (which the front end should handle gracefully)
What if failures occur in Amazon Elastic Container Service (Amazon ECS), such as instances in the cluster failing or not being accessible? Scaling should kick in and continue serving customers.

Chaos experiments run the hypotheses and check the outcomes. Initially, we run the experiments individually to avoid any confusion, but going forward we can run these experiments regularly and concurrently (for example, what happens if you introduce failing tasks on Amazon ECS and DynamoDB).

Create an experiment

We measure observability and metrics through X-Ray and Amazon CloudWatch metrics. The service is fronted by a load balancer so we can use the native CloudWatch metrics for the customer-facing state. Based on our definitions, we include the metrics that matter for our customer, as summarized in the following table.

Metric	Steady state	CloudWatch Namespace	Metric Name
Latency	< 0.5 seconds	AWS/X-Ray	ResponseTime
Success Rate	100% at 95 percentile	AWS/X-Ray	OkRate

Now that we have ways to measure a steady state, we implement the hypothesis and experiments in AWS FIS. For this post, we test what happens if failures occur in Amazon ECS.

We use the action aws:ecs:drain-container-instances, which targets the cluster running the relevant task.

Let’s aim for 20% of instances that are impacted by the experiment. You should modify this percentage based on your environment, striking a balance between enough disturbance without failing the entire service.

1. On the AWS FIS console, choose Create experiment template to start creating your experiment.

FIS Home page -> create experiment template

Configure the experiment with an action aws:ecs:drain-container-instances

add action for experiment, drainage 30%, duration: 600sec

Setting up the experiment action using ECS drain instances

Configure the targeted ECS cluster(s) you want to include in your chaos experiment, we recommend to use tags to easily target a component without changing the experiment again.

set target as resource tag, key=chaos, value=true

Definition target for the chaos experiment

Before running an experiment, we have to define the stop conditions. It’s usually a combination of multiple CloudWatch alarms, which could be a manual stop (a specific alarm that can be set to the ALARM state to stop the experiment), but more importantly alarms on business metrics that you define as criteria for the applications to serve your customers. For an e-commerce website, this could be the following:

Error rate over 5%
Search errors over x%
Order or minimum decreased by more than 15% than baseline (for more information about using the anomaly detection prediction band for business metrics, see How to set up CloudWatch Anomaly Detection to set dynamic alarms, automate actions, and drive online sales)

For this post, we focus on error rate.

2. Create a CloudWatch alarm for error rate on the service.

CW graphs : X-ray responsetime to p50 and a second one on p90

Clouwatch alarm conditions : static, greater than 0.5

3. Configure this alarm in AWS FIS as a stop condition.

FIS Stop conditions = RecommendationResponseTime

Run the experiment

We’re now ready to run the experiment. Let’s generate some load on the e-commerce website and see how it copes before and after the experiment. For the purpose of this post, we assume we’re running in a performance or QA environment without actual customers, so the load generated should be representative of the typical load on the application.

In our example, we ingest the load using the open-source tool vegeta. Some general load is generated using a command similar to the following:

echo "GET http://xxxxx.elb.amazonaws.com/recommendations?userID=aaa&amp;currentItemID=&amp;numResults=12&amp;feature=home_product_recs&amp;fullyQualifyImageUrls=1" | vegeta attack -rate=5 -duration 0 &gt; add-results.bin

We created a dedicated CloudWatch dashboard to monitor how the recommendations service is serving customer workload. The steady state looks like the following screenshot.

Dashboard - steady state

The p90 latency is under 0.5 seconds, p90 of success is greater than x% , the number of requests varies, but the response time is steady.

Now let’s start the experiment on AWS FIS console.

FIS - start the experiment

After a few minutes, let’s check how the recommendations service is running.

Dashboard - 1st experiment, Responsetime < SLA, CPU at 80%

The number of tasks running on the ECS cluster has decreased as expected, but the service has enough room to avoid any issue due to losing part of the ECS cluster. However, the average CPU usage starts to go over 80%, so we can suspect that we’re close to saturation.

AWS FIS helped us prove that even with some degradation in the ECS cluster, the service-level agreement was still met.

But what if we increase the impact of the disruption and confirm this CPU saturation assumption? Let’s run the same experiment with more instances drained from the ECS cluster and observe our metrics.

Dashboard - breached SLA on response time, 100% CPU

With less capacity available, the response time has largely exceeded the SLA, and we have reached the limit of the architecture. We would recommend to explore optimizing the architecture with concepts like auto scaling, or caching.

Going further

Now that we have a simple chaos experiment up and running, what are the next steps? One way of expanding on this is by increasing the number of hypotheses.

As a second hypothesis, we suggest adding network latency to the application. Network latency, especially for a distributed application, is a very interesting use case for chaos engineering. It’s not easy to test manually, and often applications are designed with a “perfect” network mindset. We use the action arn:aws:ssm:send-command/AWSFIS-Run-Network-Latency to target the instances running our application.

For more information about actions, see SSM Agent for AWS FIS actions.

However, having only technical metrics (such as latency and success code) lacks a customer-centric view. When running an e-commerce website, customer experience matters. Think about how your customers are using your website and how to measure the actual outcome for a customer.

Conclusion

In this post, we covered a basic implementation of chaos engineering for an e-commerce website using AWS FIS. For more information about chaos engineering, see Principles of Chaos Engineering.

Amazon Fault Injection Simulator is now generally available, you can use it to run chaos experiments today. Click here to learn more

To go beyond these first steps, you should consider increasing the number of experiments in your application, targeting crucial elements, starting with your development and environments and moving gradually to run experiments in production.

Author bio

Bastien Leblanc - Profile Photo Bastien Leblanc is the AWS Retail technical lead for EMEA. He works with retailers focusing on delivering exceptional end-user experience using AWS Services. With a strong background in data and analytics he helps retailers transform their business with cutting-edge AWS technologies.

Field Notes: Benchmarking Performance of the New M5zn, D3, and R5b Instance Types with Datadog

2021-06-15 Ray Zaman

Post Syndicated from Ray Zaman original https://aws.amazon.com/blogs/architecture/field-notes-benchmarking-performance-of-the-new-m5zn-d3-and-r5b-instance-types-with-datadog/

This post was co-written with Danton Rodriguez, Product Manager at Datadog.

At re:Invent 2020, AWS announced the new Amazon Elastic Compute Cloud (Amazon EC2) M5zn, D3, and R5b instance types. These instances are built on top of the AWS Nitro System, a collection of AWS-designed hardware and software innovations that enable the delivery of private networking, and efficient, flexible, and secure cloud services with isolated multi-tenancy.

If you’re thinking about deploying your workloads to any of these new instances, Datadog helps you monitor your deployment and gain insight into your entire AWS infrastructure. The Datadog Agent—open-source software available on GitHub—collects metrics, distributed traces, logs, profiles, and more from Amazon EC2 instances and the rest of your infrastructure.

How to deploy Datadog to analyze performance data

The Datadog Agent is compatible with the new instance types. You can use a configuration management tool such as AWS CloudFormation to deploy it automatically across all your instances. You can also deploy it with a single command directly to any Amazon EC2 instance. For example, you can use the following command to deploy the Agent to an instance running Amazon Linux 2:

DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=[Your API Key] bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/datadog-agent/master/cmd/agent/install_script.sh)"

The Datadog Agent uses an API key to send monitoring data to Datadog. You can find your API key in the Datadog account settings page. Once you deploy the Agent, you can instantly access system-level metrics and visualize your infrastructure. The Agent automatically tags EC2 metrics with metadata, including Availability Zone and instance type, so you can filter, search, and group data in Datadog. For example, the Host Map helps you visualize how I/O load on d3.2xlarge instances is distributed across Availability Zones and individual instances (as shown in Figure 1 ).

Figure 1 – Visualizing read I/O operations on d3.2xlarge instances across three Availability Zones.

Enabling trace collection for better visibility

Installing the Agent allows you to use Datadog APM to collect traces from the services running on your Amazon EC2 instances, and monitor their performance with Datadog dashboards and alerts.

Datadog APM includes support for auto-instrumenting applications built on a wide range of languages and frameworks, such as Java, Python, Django, and Ruby on Rails. To start collecting traces, you add the relevant Datadog tracing library to your code. For more information on setting up tracing for a specific language, Datadog has language-specific guides to help get you started.

Visualizing M5zn performance in Datadog

The new M5zn instances are a high frequency, high speed and low-latency networking variant of Amazon EC2 M5 instances. M5zn instances deliver the highest all-core turbo CPU performance from Intel Xeon Scalable processors in the cloud, with a frequency up to 4.5 GHz —making them ideal for gaming, simulation modeling, and other high performance computing applications across a broad range of industries.

To demonstrate how to visualize M5zn’s performance improvements in Datadog, we ran a benchmark test for a simple Python application deployed behind Apache servers on two instance types:

M5 instance (hellobench-m5-x86_64 service in Figure 2)
M5zn instance (hellobench-m5zn-x86_64 service in Figure 2).

Our benchmark application used the aiohttp library and was instrumented with Datadog’s Python tracing library. To run the test, we used Apache’s HTTP server benchmarking tool to generate a constant stream of requests across the two instances.

The hellobench-m5zn-x86_64 service running on the M5zn instance reported a 95th percentile latency that was about 48 percent lower than the value reported by the hellobench-m5-x86_64 service running on the M5 instance (4.73 ms vs. 9.16 ms) over the course of our testing. The summary of results is shown in Datadog APM (as shown in figure 2 below):

Figure 2 – Performance benchmarks for a Python application running on two instance types: M5 and M5zn.

To analyze this performance data, we visualize the complete distribution of the benchmark response time results in a Datadog dashboard. Viewing the full latency distribution allows us to have a more complete picture when considering selecting the right instance type, so we can better adhere to Service Level Objective (SLO) targets.

Figure 3 shows that the M5zn was able to outperform the M5 across the entire latency distribution, both for the median request and for the long tail end of the distribution. The median request, or 50th percentile, was 36 percent faster (299.65 µs vs. 465.28 µs) while the tail end of the distribution process was 48 percent faster (4.73 ms vs. 9.16 ms) as mentioned in the preceding paragraph.

Screenshot of Latency Distribution : M5

Figure 3 – Using a Datadog dashboard to show how the M5zn instance type performed faster across the entire latency distribution during the benchmark test.

We can also create timeseries graphs of our test results to show that the M5zn was able to sustain faster performance throughout the duration of the test, despite processing a higher number of requests. Figure 4 illustrates the difference by displaying the 95th percentile response time and the request rate of both instances across 30-second intervals.

Figure 4 – The M5zn’s p95 latency was nearly half of the M5’s despite higher throughput during the benchmark test.

We can dig even deeper with Datadog Continuous Profiler, an always-on production code profiler used to analyze code-level performance across your entire environment, with minimal overhead. Profiles reveal which functions (or lines of code) consume the most resources, such as CPU and memory.

Even though M5zn is already designed to deliver excellent CPU performance, Continuous Profiler can help you optimize your applications to leverage the M5zn high-frequency processor to its maximum potential. As shown in Figure 5, the Continuous Profiler highlights lines of code where CPU utilization is exceptionally high, so you can optimize those methods or threads to make better use of the available compute power.

After you migrate your workloads to M5zn, Continuous Profiler can help you quantify CPU-time improvements on a per-method basis and pinpoint code-level bottlenecks. This also occurs even as you add new features and functionalities to your application.

Figure 5 – Using Datadog Continuous Profiler to identify functions with the most CPU time.

Comparing D3, D3en, and D2 performance in Datadog

The new D3 and D3en instances leverage 2nd-generation Intel Xeon Scalable Processors (Cascade Lake) and provide a sustained all core frequency up to 3.1 GHz.
Compared to D2 instances, D3 instances provide up to 2.5x higher networking speed and 45 percent higher disk throughput.
D3en instances provide up to 7.5x higher networking speed, 100 percent higher disk throughput, 7x more storage capacity, and 80 percent lower cost-per-TB of storage.

These instances are ideal for HDD storage workloads, such as distributed/clustered file systems, big data and analytics, and high capacity data lakes. D3en instances are the densest local storage instances in the cloud. For our testing, we deployed the Datadog Agent to three instances: the D2, D3, and D3en. We then used two benchmark applications to gauge performance under demanding workloads.

Our first benchmark test used TestDFSIO, an open-source benchmark test included with Hadoop that is used to analyze the I/O performance of an HDFS cluster.

We ran TestDFSIO with the following command:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 48 -size 10GB

The Datadog Agent automatically collects system metrics that can help you visualize how the instances performed during the benchmark test. The D3en instance led the field and hit a maximum write speed of 259,000 Kbps.

Figure 6 – Using Datadog to visualize and compare write speed of an HDFS cluster on D2, D3, and D3en instance types during the TestDFSIO benchmark test.

The D3en instance completed the TestDFSIO benchmark test 39 percent faster than the D2 (204.55 seconds vs. 336.84 seconds). The D3 instance completed the benchmark test 27 percent faster at 244.62 seconds.

Datadog helps you realize additional benefits of the D3en and D3 instances: notably, they exhibited lower CPU utilization than the D2 during the benchmark (as shown in figure 7).

Figure 7 – Using Datadog to compare CPU usage on D2, D3, and D3en instances during the TestDFSIO benchmark test.

For our second benchmark test, we again deployed the Datadog Agent to the same three instances: D2, D3, and D3en. In this test, we used TPC-DS, a high CPU and I/O load test that is designed to simulate real-world database performance.

TPC-DS is a set of tools that generates a set of data that can be loaded into your database of choice, in this case PostgreSQL 12 on Ubuntu 20.04. It then generates a set of SQL statements that are used to exercise the database. For this benchmark, 8 simultaneous threads were used on each instance.

The D3en instance completed the TPC-DS benchmark test 59 percent faster than the D2 (220.31 seconds vs. 542.44 seconds). The D3 instance completed the benchmark test 53 percent faster at 253.78 seconds.

Using the metrics collected from Datadog’s PostgreSQL integration, we learn that the D3en not only finished the test faster, but had lower system load during the benchmark test run. This is further validation of the overall performance improvements you can expect when migrating to the D3en.

Figure 8 – Using Datadog to compare system load on D2, D3, and D3en instances during the TPC-DS benchmark test.

The performance improvements are also visible when comparing rows returned per second. While all three instances had similar peak burst performance, the D3en and D3 sustained a higher rate of rows returned throughout the duration of the TPC-DS test.

Figure 9 – Using Datadog to compare Rows returned per second on D2, D3, and D3en instances during the TPC-DS benchmark test.

From these results, we learn that not only do the new D3en and D3 instances have faster disk throughput, but they also offer improved CPU performance, which translates into superior performance to power your most critical workloads.

Comparing R5b and R5 performance

The new R5b instances provide 3x higher EBS-optimized performance compared to R5 instances, and are frequently used to power large workloads that rely on Amazon EBS. Customers that operate applications with stringent storage performance requirements can consolidate their existing R5 workloads into fewer or smaller R5b instances to reduce costs.

To compare I/O performance across these two instance types, we installed the FIO benchmark application and the Datadog Agent on an R5 instance and an R5b instance. We then added EBS io1 storage volumes to each with a Provisioned IOPS setting of 25,000.

We ran FIO with a 75 percent read, 25 percent write workload using the following command:

sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/disk-io1/test --bs=4k --iodepth=64 --size=16G —readwrite=randrw —rwmixread=75

Using the metrics collected from the Datadog Agent, we were able to visualize the benchmark performance results. In approximately one minute, FIO ramped up and reached the maximum I/O operations per second.

The left side of Figure 10 shows the R5b instance reaching the provisioned maximum IOPS of 25,000, while the read operations at 25 percent as expected. The right side shows the R5 reaching its EBS IOPS limit of 18,750, with its relative 25 percent write operations.

It should be noted that R5b instances have far higher performance ceilings than what is being shown here, which you can find in the User Guide: Amazon EBS–optimized instances.

Figure 10 – Comparing IOPS on R5b and R5 instances during the FIO benchmark test.

Also, note that the R5b finished the benchmark test approximately one minute faster than the R5 (166 seconds vs. 223 seconds). We learn that the shorter test duration is driven by the R5b’s faster read time, which reached a maximum of 75,000 Kbps.

Figure 11 – The R5b instance’s faster read time enabled it to complete the benchmark test more quickly than the R5 instance.

From these results, we have learned that the R5b delivers superior I/O capacity with higher throughput, making it a great choice for large relational databases and other IOPS-intensive workloads.

Conclusion

If you are thinking about shifting your workloads to one of the new Amazon EC2 instance types, you can use the Datadog Agent to immediately begin collecting and analyzing performance data. With Datadog’s other AWS integrations, you can monitor even more of your AWS infrastructure and correlate that data with the data collected by the Agent. For example, if you’re running EBS-optimized R5b instances, you can monitor them alongside performance data from your EBS volumes with Datadog’s Amazon EBS integration.

About Datadog

Datadog is an AWS Partner Network (APN) Advanced Technology Partner with AWS Competencies in DevOps, Migration, Containers, and Microsoft Workloads.

Read more about the M5zn, D3en, D3, and R5b instances, and sign up for a free Datadog trial if you don’t already have an account.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Monitoring memory usage in Amazon Lightsail instance

2021-06-14 Emma White

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/monitoring-memory-usage-lightsail-instance/

This post is written by Sebastian Lee, Solution Architect, Startup Singapore.

Amazon Lightsail is a great starting point for those looking to get started on AWS. Lightsail is ideal for startups, SMBs, and hobbyist developers because it simplifies the deployment of instances, databases, load-balancers, CDNs, and even containers. However, you cannot track metrics beyond CPU utilization, network utilization, and error messages. Many startups and small businesses need to review more metrics like memory usage and disk usage.

In this blog, I walk through the steps to configure a Lightsail instance to send memory usage to Amazon CloudWatch for monitoring, alarming and notifications.

architecture overview

Product and Solution Overview

Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site-reliability engineers and IT managers. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. It provides a unified view of your AWS resources, applications and services that run on AWS and on-premise servers. You can configure your Lightsail resources to work with Amazon CloudWatch to receive more metrics.

The following sections include steps to install a Cloudwatch agent on your Amazon Lightsail instance and configure it to have the necessary permission to send memory usage metrics to Amazon Cloudwatch.

Prerequisites

Before you begin the walkthrough, you must have an instance running in your Lightsail account. You can follow the steps here if you need help creating an instance.

Walkthrough

1. Create IAM user

First, you must create an IAM user to provide permission to send data to CloudWatch.

Sign in to the AWS Management Console and open the IAM console.
In the navigation pane, choose Users, and then choose Add user.
Enter “lightsail-cloudwatch-agent” in the User name text box.
For Access type, select Programmatic access, and then choose Next: Permissions.
For Set permissions, choose Attach existing policies directly.
1. In the list of policies, select the check box next to CloudWatchAgentServerPolicy. You can use the search text box to find the policy.
Choose Next: Tags.
Optionally, you can add one or more tag-key value pairs to organize, track, or control access for this role, and then choose Next: Review.
Confirm that the correct policies are listed, and then choose Create user.
In the row for the new user, choose Show. Copy the access key and secret key to a file so that you can use them when installing the agent.
1. Important: You will not be able to copy the secret key after leaving this page. If you lose it, you will have to create a new one
Choose Close.

Now that you created an IAM user, you can SSH into your Lightsail instance.

2. SSH into Amazon Lightsail instance

You can connect to your instance using the browser-based SSH client available in the Lightsail console, or by using your own SSH client with the SSH key of your instance.

Complete the following steps to connect to your instance using the browser-based SSH client in the Lightsail console:

Open the Lightsail console.
Click the terminal icon, next to the instance, as shown in the following screenshot.

3. Installing the CloudWatch agent

Now that you have SSH’d into your instance, you are ready to install the CloudWatch agent. The CloudWatch agent is available as a package on Amazon Linux 2 instances. For other operating systems, see Download and configure the CloudWatch agent using the command line.

Enter the following command to install the CloudWatch agent on a linux instance.

> sudo yum -y install amazon-cloudwatch-agent

========================================================================
Install 1 Package
…
Installed:
amazon-cloudwatch-agent.x86_64 0:1.247347.4-1.amzn2  

Complete!

4. Setup credentials

Now that you installed the CloudWatch Agent, you must allow it to access your AWS resources. First, setup the necessary credentials.

Enter the following command to create a credentials profile in the AWS Command Line Interface (AWS CLI).

Follow the prompts to enter the access key ID and secret access key you copied in the preceding steps.

> sudo aws configure --profile AmazonCloudWatchAgent

Follow the prompts to enter the access key ID and secret access key you copied earlier in this tutorial

AWS Access Key ID [None]: <Enter the access key from step 1>
AWS Secret Access Key [None]: <Enter the secret key from step 1>
Default region name [None]:
Default output format [None]:

5. Create CloudWatch configuration file to collect memory usage metrics

To tell CloudWatch agent to collect memory usage metrics, you will need to create a CloudWatch config file.

Enter the following command to create a config file for the CloudWatch agent.

> sudo vim /opt/aws/amazon-cloudwatch-agent/bin/config.json

Press “I” to enter insert mode in Vim, and paste the following text into the file.

{
    "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
    },
    "metrics": {
	"append_dimensions": {
	    "ImageID": "${aws:ImageId}",
	    "InstanceId":"${aws:InstanceId}",
	    "InstanceType":"${aws:InstanceType}"
	},
        "metrics_collected": {
            "mem": {
                "measurement": [
                    "mem_used_percent"
                ],
                "metrics_collection_interval": 60
            }
        }
    }
}

Press “ESC”, and then type “:wq!” to save the file and exit Vim.

6. Configure CloudWatch agent

In this section, you configure the CloudWatch agent to use the shared credential profile created earlier.

Enter the following command to create a common configuration file for the CloudWatch agent.

> sudo vim /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml

Press “I” to enter insert mode in Vim, and paste the following text into the file.

[credentials]
shared_credential_profile = "AmazonCloudWatchAgent"

Press “ESC”, and then type “:wq!” to save the file and exit Vim.

7. Start CloudWatch agent

Now the necessary configuration for CloudWatch agent is setup. Let’s start the agent.

Enter the following command to start the CloudWatch agent.

> sudo amazon-cloudwatch-agent-ctl -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -a fetch-config -s 

****** processing cwagent-otel-collector ******
cwagent-otel-collector will not be started as it has not been configured yet.

****** processing amazon-cloudwatch-agent ******
…
Redirecting to /bin/systemctl restart amazon-cloudwatch-agent.service

Enter the following command to verify that the CloudWatch agent is running.

> sudo amazon-cloudwatch-agent-ctl -a status
{
  "status": "running",
  "starttime": "2021-04-16T10:34:27+0000",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247347.4"
}

8. Verify metrics in CloudWatch

At this point, you should be able to view your metrics in CloudWatch.

Navigate to the CloudWatch console.
On the left navigation panel, choose Metrics.
Under “Custom Namespaces”, You should see a link for “CWAgent”.
Choose CWAgent.
Choose ImageId, InstanceId, InstanceType.
Select checkbox to display metrics on graph.

cloudwatch metrics

In addition, you can create a CloudWatch alarm to monitor the memory usage metrics to automatically send you a notification when the metric reaches a threshold you specify. To create an alarm in CloudWatch, you can follow this guide.

Conclusion

In this blog, I covered how you can install the CloudWatch agent on your Amazon Lightsail instance to send memory metrics to Amazon CloudWatch. For more information on additional metrics and logs supported by CloudWatch Agent, see the CloudWatch User Guide

To get started with Amazon Lightsail, check out our getting started page for more tutorial and resources.

Building serverless applications with streaming data: Part 3

2021-06-14 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-serverless-applications-with-streaming-data-part-3/

This series is about building serverless solutions in streaming data workloads. These are traditionally challenging to build, since data can be streamed from thousands or even millions of devices continuously. The application example used in this series is Alleycat, which allows bike racers to compete with each other virtually on home exercise bikes.

In this post, I explain how the all-time rankings functionality is implemented. This uses Amazon Kinesis Data Firehose to deliver hundreds of thousands of data points for processing by AWS Lambda functions.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file. Note that this walkthrough uses services that are not covered by the AWS Free Tier and incur cost.

Overview of real time rankings in Alleycat

In the example scenario, there are 40,000 users and up to 1,000 competitors may race at any given time. While a competitor is racing, there is a real time rankings display that shows their performance in the selected class:

The racer can select Here now to compare against racers in the current virtual race. Alternatively, selecting All time compares their performance against the best performance of all races who have ever competed in the race. The rankings board is dynamic and shows the rankings for the current second on the display for the local racer:

In Alleycat, races occur every five minutes continuously, and the all-time data is gathered from races that are completed. For this to work, the application must do the following:

Continuously aggregate race data from each of the six exercise classes and deliver to an Amazon S3 bucket.
Compare the incoming race data with the personal best records for every competitor in the same class.
If the current performance is a record, update the all-time best records dataset.
Compress the dataset, since there are thousands of racers, each with 5 minutes of personal racing history.

The resulting dataset contains second-by-second personal records for every racer in a selected race. This is saved in the application’s history bucket in S3 and loaded by the Alleycat front end before the beginning of each race:

        // From Home.vue's loadRealtimeHistory method

        // Load gz from S3 history bucket, unzip and parse to JSON
        const URL = `https://${this.$appConfig.historyBucket}.s3.${this.$appConfig.region}.amazonaws.com/class-${this.selectedClassId}.gz`
        const response = await axios.get(URL, { responseType: 'arraybuffer' })
        const buffer = Buffer.from(response.data, 'base64')
        const dezipped = await gunzip(buffer)
        const history = JSON.parse(dezipped.toString())

Architecture overview

The backend microservice handling this feature is located in the 2-streaming-kdf directory in the GitHub repo. It uses the following architecture:

Amazon Kinesis Data Firehose is a consumer of Kinesis Data Streams. Records are delivered from the application’s main stream as they arrive.
Kinesis Data Firehose invokes a data transformation Lambda function. This calculates the output for each racer’s data points and returns the modified records to Kinesis.
Kinesis Data Firehose delivers batches of records to an S3 bucket.
When objects are written to the S3 bucket, this event triggers the S3 processor Lambda function. This compares incoming racer performance with historical records.
The Lambda function saves the new all-time records in a compressed format to the application’s history bucket in S3.

The process happens continuously providing that new records are delivered to the application’s main Kinesis stream.

Using Kinesis Data Firehose

Kinesis Data Firehose can consume records from Kinesis Data Streams, or directly from the AWS SDK, AWS IoT Core, and other data producers. In Alleycat, there are multiple consumers that process incoming data, so Kinesis Data Firehose is configured as a consumer on the main stream.

The process of updating historical records is not a real-time process in Alleycat. Processing individual racer messages and comparing against all-time records is possible but would be computationally more complex. In practice, only a few incoming data points are all-time records. As a result, Alleycat asynchronously processes batches of records to find and update all-time records. The process is eventually consistent and the tradeoff is latency. There may be up to 1 minute between recording a personal record and updating the historical dataset in S3.

Kinesis Data Firehose provides several key functions for this process. First, it batches groups of messages based upon the batching hints provided in the AWS Serverless Application Model (AWS SAM) template. This application buffers records for up to 60 seconds or until 1 MB of records are available, whichever is reached first. You can adjust these settings and batch for up to 900 seconds or 128 MB of data. Note that these settings are hints and not absolute – the service can dynamically adjust these if data delivery falls behind data writing in the stream. As a result, hints should be treated as guidance and the actual settings may change.

Kinesis Data Firehose also enables data compression before delivery in various common formats. Since the S3 delivery bucket is an intermediary storage location in this application, using data compression reduces the storage cost. Kinesis can also encrypt data before delivery but this feature is not used in Alleycat. Finally, Kinesis Data Firehose can transform the incoming records by invoking a Lambda function. This enables the application to calculate the racer’s output before delivering the records.

The configuration for Kinesis Data Firehose, the S3 buckets, and the Lambda function is located in the template.yaml file in the GitHub repo. This AWS SAM template shows how to define the complete integration:

  DeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    DependsOn:
      - DeliveryStreamPolicy    
    Properties:
      DeliveryStreamName: "alleycat-data-firehose"
      DeliveryStreamType: "KinesisStreamAsSource"
      KinesisStreamSourceConfiguration: 
        KinesisStreamARN: !Sub "arn:aws:kinesis:${AWS::Region}:${AWS::AccountId}:stream/${KinesisStreamName}"
        RoleARN: !GetAtt DeliveryStreamRole.Arn
      ExtendedS3DestinationConfiguration: 
        BucketARN: !GetAtt DeliveryBucket.Arn
        BufferingHints: 
          SizeInMBs: 1
          IntervalInSeconds: 60
        CloudWatchLoggingOptions: 
          Enabled: true
          LogGroupName: "/aws/kinesisfirehose/alleycat-firehose"
          LogStreamName: "S3Delivery"
        CompressionFormat: "GZIP"
        EncryptionConfiguration: 
          NoEncryptionConfig: "NoEncryption"
        Prefix: ""
        RoleARN: !GetAtt DeliveryStreamRole.Arn
        ProcessingConfiguration: 
          Enabled: true
          Processors: 
            - Type: "Lambda"
              Parameters: 
                - ParameterName: "LambdaArn"
                  ParameterValue: !GetAtt FirehoseProcessFunction.Arn

Once Kinesis Data Firehose is set up, there is no ongoing administration. The service scales automatically to adjust to the amount of the data available in the application’s Kinesis data stream.

How the Lambda data transformer works

Kinesis Data Firehose buffers up to 3 MB before invoking the data transformation function (you can configure this setting with the ProcessingConfiguration API). The service delivers records as a JSON array:

{
    "invocationId": "d8ce3cc6-abcd-407a-abcd-d9bc2ce58e72",
    "sourceKinesisStreamArn": "arn:aws:kinesis:us-east-2:012345678912:stream/alleycat",
    "deliveryStreamArn": "arn:aws:firehose:us-east-2:012345678912:deliverystream/alleycat-firehose",
    "region": "us-east-2",
    "records": [
        {
            "recordId": "49617596128959546022271021234567...",
            "approximateArrivalTimestamp": 1619185789847,
            "data": "eyJ1dWlkIjoiYjM0MGQzMGYtjI1Yi00YWM4LThjY2QtM2ZhNThiMWZmZjNlIiwiZXZlbnQiOiJ1cGRhdGUiLCJkZXZpY2VUaW1lc3RhbXiOjE2MTkxODU3ODk3NUsInNlY29uZCI6MwicmFjZUlkIjoxNjE5MTg1NjIwMj1LCJuYW1lIjoiSHViZXJ0IiwicmFjZXJJZCI6MSwiY2xhc3NJZCI6MSwiY2FkZW5jZSI6ODAsInJlc2lzdGFuY2UiOjc4fQ==",
            "kinesisRecordMetadata": {
                "sequenceNumber": "49617596128959546022271020452",
                "subsequenceNumber": 0,
                "partitionKey": "1619185789798",
                "shardId": "shardId-000000000000",
                "approximateArrivalTimestamp": 1619185789847
            }
        }, 
   ...

The payload contains metadata about the invocation and data source and each record contains a recordId and a base64 encoded data attribute. The Lambda function can then decode and modify the data attribute for each record as needed. The Lambda function must finally return a JSON array of the same length as the incoming event.records array, with the following attributes:

recordId: this must match the incoming recordId, so Kinesis can map the modified data payload back to the original record.
result: this must be “Ok”, “Dropped”, or “ProcessingFailed”. “Dropped” means that the function has intentionally removed a payload from processing. Any records with “ProcessingFailed” are delivered to the S3 bucket in a folder called processing-failed. This includes metadata indicating the number of delivery attempts, the timestamp of the last attempt, and the Lambda function’s ARN.
data: the returned data payload must be base64 encoded and the modified record must be within the 1 MB limit per record.

The transformer function in Alleycat shows how to implement this process in Node.js:

exports.handler = (event) => {
  const output = event.records.map((record) => {
    // Extract JSON record from base64 data
    const buffer = Buffer.from(record.data, "base64").toString()
    const jsonRecord = JSON.parse(buffer)

    // Add calculated field
    jsonRecord.output = ((jsonRecord.cadence + 35) * (jsonRecord.resistance + 65)) / 100

    // Convert back to base64 + add a newline
    const dataBuffer = Buffer.from(JSON.stringify(jsonRecord) + "\n", "utf8").toString("base64")

    return {
      recordId: record.recordId,
      result: "Ok",
      data: dataBuffer,
    }
  })

  console.log(`{ recordsTotal: ${output.length} }`)
  return { records: output }
}

If you are processing the data in downstream services in JSON format, adding a newline at the end of the data buffer for each record can simplify the import process.

The data transformation Lambda function can run for up to 5 minutes and can use other AWS services for data processing. The Kinesis Data Firehose service scales up the Lambda function automatically if traffic increases, up to 5 outstanding invocations per shard (or 10 if the destination is Splunk).

Processing the Kinesis Data Firehose output

Kinesis Data Firehose puts a series of objects in the intermediary S3 bucket. Each put event invokes the S3 processor Lambda function. This function compares each data point to historical best performance, per racer ID, per class ID, at the same second of the race.

Data points that do not beat existing records are discarded. Otherwise, the function merges new historical records into the dataset and saves the result into the final S3 history bucket. This object is compressed in gzip format and contains the history for a single class ID.

When the frontend downloads this dataset from the S3 bucket, it decompresses the object. The resulting JSON structure shows personal output records per racer ID for each second of the race. The frontend uses this data to update the leaderboard on the local device during each second of the active race.

Conclusion

In this post, I explain the all-time leaderboard logic in the Alleycat application. This is an asynchronous, eventually consistent process that checks batches of incoming records for new personal records. This uses Kinesis Data Firehose to provide a zero-administration way to deliver and process large batches of records continuously.

This post shows the architecture in Alleycat and how this is defined in AWS SAM. Finally, I walk through how to build a data transformation Lambda function that correctly decodes a payload and returns records back to Kinesis.

Part 4 show how to combine Kinesis with Amazon DynamoDB to support queries for streaming data. Alleycat uses this architecture to provide real-time rankings for competitors in the same virtual race.

For more serverless learning resources, visit Serverless Land.

Field Notes: Data-Driven Risk Analysis with Amazon Neptune and Amazon Elasticsearch Service

2021-06-10 Adriaan de Jonge

Post Syndicated from Adriaan de Jonge original https://aws.amazon.com/blogs/architecture/field-notes-data-driven-risk-analysis-with-amazon-neptune-and-amazon-elasticsearch-service/

This blog post is co-authored with Charles Crouspeyre and Angad Srivastava. Charles is Director at Accenture Applied Intelligence and ASEAN AI SME (Subject Matter Expert) and Angad is Data and Analytics Consultant at AWS and NLP (Natural Language Processing) expert. Together, they are the lead architects of the solution presented in this blog.

In this blog, you learn how Amazon Neptune as a graph database, combined with Amazon Elasticsearch Service (Amazon ES) for full text indexing helps you shorten risk analysis processes from weeks to minutes. We give a walk-through of the steps involved in creating this knowledge management solution, which includes natural language processing components.

The business problem

Our Energy customer needs to do a risk assessment before acquiring raw materials that will be processed in their equipment. The process includes assessing the inventory of raw materials, the capacity of storage units, analyzing the performance of the processing units, and quality assurance of the end product. The cycle time for a comprehensive risk analysis across different teams working in silos is more than 2 weeks, while the window of opportunity for purchasing is a couple of days. So, the customer either puts their equipment and personnel at risk or misses good buying opportunities.

The solution described in this blog helps our customer improve and speed up their decision making. This is done through an automated analysis and understanding of the documents and information they have gathered over the years. They use Natural Language Processing (NLP) to analyze and better understand the documents which is discussed later on in this blog.

Our customer has accumulated years of documents that were mostly in silos across the organization: emails, SharePoint, local computer, private notes, and more.

The data is so heterogenous and widespread that it became hard for our customer to retrieve the right information in a timely manner. Our objective was to create a platform centralizing all this information, and to facilitate present and future information retrieval. Making informed decisions on time helps our customer to purchase raw materials at a better price, increasing their margins significantly.

Overview of business solution

To understand the tasks involved, let’s look at the high-level platform workflow:

Figure 1: This illustration visualizes a 4-step process consisting of Hydrate, Analyze, Search and Feedback.

We can summarize our workflow as a 4-step process:

Hydrate: where we extract the information from multiple sources and do a first level of processing such as document scanning and natural language processing (NLP).
Analyze: where the information extracted from the hydration step is ingested and merged with existing information.
Search: where information is retrieved from the system based on user queries, by leveraging our knowledge graph and the concept map representation that we have created.
Feedback: where users can rate the results for the system as good or bad. The feedback is collected and used to update the Knowledge graph, to re-train our models or to improve our query matching layer.

High-level technical architecture

The following architecture consists of a traditional data layer, combined with a knowledge layer. The compute part of the solution is serverless. The database storage part requires long-running solutions.

Figure 2: A diagram visualizing the steps involved in data processing across two layers, the data layer and the knowledge layer and their implementations with AWS services.

The data layer of our application is similar to many common data analytics setups, and includes:

An ingestion and normalization component, implemented with AWS Lambda, fronted by Amazon API Gateway and AWS Transfer Family
An ETL component, implemented with AWS Glue and AWS Lambda
A data enhancement component, implemented with Lambda
An analytics component, implemented with Amazon Redshift
A knowledge query component, implemented with Lambda
A user interface, a custom implementation based on React

Where our solution really adds value, is the knowledge layer, which is what we will focus on in this blog. We created this specifically for our knowledge representation and management. This layer consists of the following:

The knowledge extraction block, where the raw text is extracted, analyzed and classified into facts and structured data. This is implemented using Amazon SageMaker and Amazon Comprehend.
The knowledge repository, where the raw data is saved and kept is Amazon Simple Storage Service (Amazon S3).
The relationship, and knowledge extraction, and indexing, where the facts extracted earlier are analyzed and added to our knowledge graph. This is implemented with a combination of Neptune, Amazon S3, Amazon DocumentDB (with MongoDB compatibility) and Amazon ES. Neptune is used as a property graph, queried with the Gremlin graph traversal language.
The knowledge aggregator, where we leverage both our knowledge graph and business representation to extract facts to associate with the user query, and rank information based on their relevance. This is implemented leveraging Amazon ES.

The last component, the knowledge aggregator, is fundamental for our infrastructure. In general, when we talk about information retrieval system – a system designed to supply the right information in the hands of users at the right time – there are two common approaches:

Keyword-based search: take the user query and search for the presence of certain keywords from the query in the available documents.
Concept-based search: build a business-related taxonomy to extend the keyword-based search into a business-related concept-based search.

The downside of a keyword-based search is that it does not capture the complexity and specificity of the business domain in which the query occurs. Due to this limitation, we chose to go with a concept-based search approach as it allows us to inject a layer of business understanding to our ingestion and information retrieval.

Knowledge layer deep-dive

Because the value added from our solution is in the knowledge layer, let’s dive deeper into the details of this layer.

Figure 3: An architecture diagram of the knowledge layer of the solution, classified in 3 categories: ingestion, knowledge representation and retrieval

The architecture in Figure 3 describes the technical solution architecture broken down into 3 key steps. The 3 steps are:

Ingestion
Knowledge representation
Retrieval

Another way to approach the problem definition is by looking at the process flow for how raw data/information flows through the system to generate the knowledge layer. Figure 4 gives an example of how the information is broadly treated as it progresses the logical phases of the process flow.

Figure 6: Context based Knowledge Graph Generation

Figure 4: An illustration of how information is extracted from an unstructured document, modeled as a graph and visualized in a business-friendly and concise format.

In this example, we can recognize a raw material of type “Champion” and detect a relationship between this entity and another entity of type “basic nitrogen”. This relationship is classified as the type “is characterized by”.

The facts in the parsed content are then classified into different categories of relevancy based on the contextual information contained in the text i.e., an important paragraph that mentions a potential issue will get classified as a key highlight with a high degree of relevancy.

This paragraph text is further analyzed to recognize and extract entities mentioned such as “Champion” and “basic nitrogen”; and to determine the semantic relationship between these entities based on the context of the paragraph i.e., “characterized by” and, “incompatibility due to low levels of basic nitrogen”.

There is a correlation between the different steps of the technical architecture versus the phases in the information analysis process. So we will present them together.

This table shows the correlation between the different steps of the technical architecture versus the phases in the information analysis process.

During the Ingestion step in the technical solution architecture, the aim is to process the incoming raw data in any format as defined in the Extract Information phase of the information analysis process flow.
Once the data ingestion has occurred, the next step is to capture the knowledge representation. The contextualize information phase of the information analysis process flow helps ensure that comprehensive and accurate knowledge representation occurs within the system.
The last step for the solution is to then facilitate retrieval of information by providing appropriate interfaces for interacting with the knowledge representation within the system. This is facilitated by the assemble information phase of the Information Analysis process.

To further understand the proposed solution, let us review the steps and the associated process flow phases.

Technical Architecture Step 1: Ingestion

Information comes in through the ingestion pipeline from various sources, such as websites, reports, news, blogs and internal data. Raw data enters the system either through automated API-based integrations with external websites or internal systems like Microsoft SharePoint, or can be ingested manually through AWS Transfer Family. Once a new piece of data has been ingested into the solution, it initiates the process for extracting information from the raw data.

Information Analysis Phase 1: Extract information

Once the information lands in our system, the knowledge representation process starts with our Lambda functions acting as the orchestrator between other components. Amazon SageMaker was initially used to create custom models for document categorization and classification of ingested unstructured files.

For example, an unstructured file that is ingested into our system gets recognized as a new email (one of the acceptable data sources) and is classified as “compatibility highlights” based on the email contents. But with improvements in the capabilities of Amazon Comprehend managed service, the need for custom model development, maintenance, and machine learning operations (MLOps) could be reduced. The solution now uses Amazon Comprehend with custom training for the initial step of document categorization and information extraction. Additionally, Amazon Comprehend was also used to create custom named-entity recognition models, that were trained to recognize custom raw materials and properties.

In this example, an unstructured pdf document is ingested into our system as illustrated in Figure 5.

Example of an unstructured pdf document being ingested into our system

Figure 5: Phase 1 – Information Extraction

Amazon Comprehend analyzes the unstructured document, classifies its contents and extracts a specific piece of information regarding a type of raw material called “Champion”. This has an inherent property called “low basic nitrogen” associated with it.

Technical Architecture Step 2: Knowledge representation

Knowledge representation is the process of extracting semantic relationships between the various information/data elements within a piece of raw data. It then incorporates it into the existing layers of knowledge already identified and stored. Based on the categorization of the document, the raw text is pre-processed and parsed into logical units. The parsed data is then analyzed in our NLP layer for content identification and fact classification.

The facts and key relationships deduced from the results of both Amazon Comprehend are returned back to the Lambda functions, which in-turn store the detected facts to the knowledge graph.

Information Analysis Phase 2: Contextualize information

Once the information is extracted from the document; our first step is to contextualize the information using our business representation in the form of a taxonomy. The system detects different parts and entities that the paragraph is composed of, and structures the information into our knowledge graph as illustrated in Figure 6.

Figure 6: Context based Knowledge Graph Generation

This data extraction process is repeated iteratively, so that the knowledge graph grows over time through the detection of new facts and relationships. When we ingest new data into our knowledge graph, we search our knowledge graph for similar entities. If a similar entity exists, we analyze the type of relationships and properties both entities have. When we observe sufficient similarities between the entities, we associate relationships from one entity to the other.

For example, a new entity “Crude A” which has the properties – level of basic nitrogen and level of sulfur is ingested. Next, we have “Champion”, as described above, which has similar levels of basic nitrogen and a property “risk” associated to it. Based on the existing knowledge graph, we can now infer that there is a high probability that “Crude A” has a similar risk associated to it as shown in Figure 7.

Figure 7: Crude Knowledge Graph Representation

The probability calculations can take multiple factors into consideration to make the process more accurate. This makes the structure of the knowledge graph quite dynamic and evolve automatically.

The complete raw data is also stored in Amazon ES as a secondary implementation to perform free form queries. This process helps ensure that all the relevant information for any extracted factoid associated with an entity in the knowledge graph is completely represented with the system. Some of this information may not exist in the knowledge graph because the document data extraction model can’t capture all the relevant information. One reason for such a problem to occur can be poor quality of the source document making automated reading of documents and data extraction difficult. Another reason can be the sophistication of the Amazon Comprehend models.

Technical Architecture Step 3: Retrieval

To retrieve information, the user query is analyzed by the Lambda function on the right side of Figure 3. Based on the analysis, key terms are identified from the user query for which a search needs to be performed. For example, if the query provided is “What is the likelihood of damage due to processing Champion in location A”, semantic analysis of the query will indicate that we are looking for relationships between entities Champion, any type of risks, any known incidents at location A and possible mitigations to reduce identified risks.

To address the query, the information then needs to compiled together from the existing knowledge graph as well as Amazon ES to provide a complete answer.

Information Analysis Phase 3: Assemble information

Figure 8 illustrates the output of Information assembly process.

Figure 8: “Champion” crude assembled information for property Nitrogen

Based on the facts available within the knowledge graph, we have identified that for “Champion” there is a possibility of damage occurring “due to increased pressure drop and loss of heat transfer” but this can be mitigated by “blending champion to meet basic nitrogen levels”.

In addition, say there was information available about “Crude B” that has been processed at “location A”. This also originated from “Brunei” and had similar “Nitrogen” and properties such as “Kerogen3”, “napthenic” and had a processing incident causing damage. We can then conclude by looking at the information stored within the knowledge graph and Amazon ES, that there are other possibilities for damage to occur due to processing of “Champion” at “Location A” as well.

Once all the relevant pieces of information have been collected, a sorted list of information is sent back to the user interface to be displayed.

Fact reconciliation

In reality, it is possible that new information contradicts existing information, which causes conflicts during ingestion. There are various ways to handle such contradictions, for example:

Figure 5: Visualizations of four illustrative ways to deal with contradictory new facts.

Figure 9: Visualizations of four illustrative ways to deal with contradictory new facts.

Assume the latest data is the most accurate, by looking at the timestamp of each data point. This makes it possible to update the list of facts in our knowledge graph
Analyze how new facts alter the properties or relationships of existing facts and update them or create a relationship between nodes
Calculate a reliability score for the available sources, to rank the fact based on who has provided them
Ask for end user feedback through the user interface

In our solution, we have mechanism 1, 2, and 4. Mechanisms 1 and 2 are implemented within the contextualize information phase of the information analysis process.

Mechanism 4 is implemented in the search results use interface where the user has a ‘thumbs up’ and ‘thumbs down’ button to classify the different search results as relevant or not. This information is then fed back into the Amazon Comprehend model, the knowledge graph as well as captured within Amazon ES to optimize subsequent search results.

Over time, mechanism 4 can be expanded to capture more detailed feedback including corrections to the search result instead of a simple yes/no feedback mechanism. Such enhancements to mechanism 4 and the implementation of Mechanism 3 can be a possible future enhancement for the solution proposed.

Conclusion

Our customer needed help to shorten their risk analysis process to make high-impact purchase decisions for raw materials. Our knowledge management solution helped them extract knowledge from their vast set of documents and make it available in knowledge graph format, for risk analysts to analyze. Knowledge graphs are a great way to handle this “domain specificity”. It helps extract information during the ingestion phase. It also helps contextualize queries during the retrieval phase.

The possibilities are endless. One thing is certain: we’d encourage you to use graph databases with Neptune supported by Amazon ES for your use cases with highly connected data!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Complying with DMARC across multiple accounts using Amazon SES

2021-06-10 Brendan Paul

Post Syndicated from Brendan Paul original https://aws.amazon.com/blogs/messaging-and-targeting/complying-with-dmarc-across-multiple-accounts-using-amazon-ses/

Introduction

For enterprises of all sizes, email is a critical piece of infrastructure that supports large volumes of communication from an organization. As such, companies need a robust solution to deal with the complexities this may introduce. In some cases, companies have multiple domains that support several different business units and need a distributed way of managing email sending for those domains. For example, you might want different business units to have the ability to send emails from subdomains, or give a marketing company the ability to send emails on your behalf. Amazon Simple Email Service (Amazon SES) is a cost-effective, flexible, and scalable email service that enables developers to send mail from any application. One of the benefits of Amazon SES is that you can configure Amazon SES to authorize other users to send emails from addresses or domains that you own (your identities) using their own AWS accounts. When allowing other accounts to send emails from your domain, it is important to ensure this is done securely. Amazon SES allows you to send emails to your users using popular authentication methods such as DMARC. In this blog, we walk you through 1/ how to comply with DMARC when using Amazon SES and 2/ how to enable other AWS accounts to send authenticated emails from your domain.

DMARC: what is it, why is it important?

DMARC stands for “Domain-based Message Authentication, Reporting & Conformance”, and it is an email authentication protocol (DMARC.org). DMARC gives domain owners and email senders a way to protect their domain from being used by malicious actors in phishing or spoofing attacks. Email spoofing can be used as a way to compromise users’ financial or personal information by taking advantage of their trust of well-known brands. DMARC makes it easier for senders and recipients to determine whether or not an email was actually sent by the domain that it claims to have been sent by.

Solution Overview

In this solution, you will learn how to set up DKIM signing on Amazon SES, implement a DMARC Policy, and enable other accounts in your organization to send emails from your domain using Sending Authorization. When you set up DKIM signing, Amazon SES will attach a digital signature to all outgoing messages, allowing recipients to verify that the email came from your domain. You will then set your DMARC Policy, which tells an email receiver what to do if an email is not authenticated. Lastly, you will set up Sending Authorization so that other AWS accounts can send authenticated emails from your domain.

Prerequisites

In order to complete the example illustrated in this blog post, you will need to have:

A domain in an Amazon Route53 Hosted Zone or third-party provider. Note: You will need to add/update records for the domain. For this blog we will be using Route53.
An AWS Organization
A second AWS account to send Amazon SES Emails within a different AWS Organizations OU. If you have not worked with AWS Organizations before, review the Organizations Getting Started Guide

How to comply with DMARC (DKIM and SPF) in Amazon SES

In order to comply with DMARC, you must authenticate your messages with either DKIM (DomainKeys Identified Mail), SPF (Sender Policy Framework), or both. DKIM allows you to send email messages with a cryptographic key, which enables email providers to determine whether or not the email is authentic. SPF defines what servers are allowed to send emails for their domain. To use SPF for DMARC compliance you need to set up a custom MAIL FROM domain in Amazon SES. To authenticate your emails with DKIM in Amazon SES, you have the option of:

Setting up a sending identity, so that Amazon SES automatically adds a DKIM signature to your messages
Providing Your Own DKIM Authentication Token
Manually adding your DKIM signature to emails that you send

In this blog, you will be setting up a sending identity.

Setting up DKIM Signing in Amazon SES

Navigate to the Amazon SES Console
Select Verify a New Domain and type the name of your domain in
Select Generate DKIM Settings
Choose Verify This Domain
1. This will generate the DNS records needed to complete domain verification, DKIM signing, and routing incoming mail.
2. Note: When you initiate domain verification using the Amazon SES console or API, Amazon SES gives you the name and value to use for the TXT record. Add a TXT record to your domain’s DNS server using the specified Name and Value. Amazon SES domain verification is complete when Amazon SES detects the existence of the TXT record in your domain’s DNS settings.
If you are using Route 53 as your DNS provider, choose the Use Route 53 button to update the DNS records automatically
1. If you are not using Route 53, go to your third-party provider and add the TXT record to verify the domain as well as the three CNAME records to enable DKIM signing. You can also add the MX record at the end to route incoming mail to Amazon SES.
2. A list of common DNS Providers and instructions on how to update the DNS records can be found in the Amazon SES documentation
Choose Create Record Sets if you are using Route53 as shown below or choose Close after you have added the necessary records to your third-party DNS provider.

Note: in the case that you previously verified a domain, but did NOT generate the DKIM settings for your domain, follow the steps below. Skip these steps if this is not the case:

Go to the Amazon SES Console, and select your domain
Select the DKIM dropdown
Choose Generate DKIM Settings and copy the three values in the record set shown
1. You may also download the record set as a CSV file
Navigate to the Route53 console or your third-party DNS provider. Instructions on how to update the DNS records in your third-party can be found in the Amazon SES documentation
Select the domain you are using
Choose Create Record

Enter the values that Amazon SES has generated for you, and add the three CNAME records to your domain
Wait a few minutes, and go back to your domain in the Amazon SES Console
Check that the DKIM status is verified

You also want to set up a custom MAIL FROM domain that you will use later on. To do so, follow the steps in the documentation.

Setting up a DMARC policy on your domain

DMARC policies are TXT records you place in DNS to define what happens to incoming emails that don’t align with the validations provided when setting up DKIM and SPF. With this policy, you can choose to allow the email to pass through, quarantine the email into a folder like junk or spam, or reject the email.

As a best practice, you should start with a DMARC policy that doesn’t reject all email traffic and collect reports on emails that don’t align to determine if they should be allowed. You can also set a percentage on the DMARC policy to perform filtering on a subset of emails to, for example, quarantine only 50% of the emails that don’t align. Once you are in a state where you can begin to reject non-compliant emails, flip the policy to reject failed authentications. When you set the DMARC policy for your domain, any subdomains that are authorized to send on behalf of your domain will inherit this policy and the same rule will apply. For more information on setting up a DMARC policy, see our documentation.

In a scenario where you have multiple subdomains sending emails, you should be setting the DMARC policy for the organizational domain that you own. For example, if you own the domain example.com and also want to use the sub-domain sender.example.com to send emails you can set the organizational DMARC policy (as a DNS TXT record) to:

	Name	Type	Value
1	_dmarc.example.com	TXT	“v=DMARC1;p=quarantine;pct=50;rua=mailto:[email protected]”

This DMARC policy states that 50% of emails coming from example.com that fail authentication should be quarantined and you want to send a report of those failures to [email protected]. For your sender.example.com sub-domain, this policy will be inherited unless you specify another DMARC policy for our sub-domain. In the case where you want to be stricter on the sub-domain you could add another DMARC policy like you see in the following table.

	Name	Type	Value
1	_dmarc.sender.example.com	TXT	“v=DMARC1;p=reject;pct=100;rua=mailto:[email protected];ruf=mailto:[email protected]”

This policy would apply to emails coming from sender.example.com and would reject any email that fails authentication. It would also send aggregate feedback to [email protected] and detailed message-specific failure information to [email protected] for further analysis.

Sending Authorization in Amazon SES – Allowing Other Accounts to Send Authenticated Emails

Now that you have configured Amazon SES to comply with DMARC in the account that owns your identity, you may want to allow other accounts in your organization the ability to send emails in the same way. Using Sending Authorization, you can authorize other users or accounts to send emails from identities that you own and manage. An example of where this could be useful is if you are an organization which has different business units in that organization. Using sending authorization, a business unit’s application could send emails to their customers from the top-level domain. This application would be able to leverage the authentication settings of the identity owner without additional configuration. Another advantage is that if the business unit has its own subdomain, the top-level domain’s DKIM settings can apply to this subdomain, so long as you are using Easy DKIM in Amazon SES and have not set up Easy DKIM for the specific subdomains.

Setting up sending authorization across accounts

Before you set up sending authorization, note that working across multiple accounts can impact bounces, complaints, pricing, and quotas in Amazon SES. Amazon SES documentation provides a good understanding of the impacts when using multiple accounts. Specifically, delegated senders are responsible for bounces and complaints and can set up notifications to monitor such activities. These also count against the delegated senders account quotas. To set up Sending Authorization across accounts:

Navigate to the Amazon SES Console from the account that owns the Domain
Select Domains under Identity Management
Select the domain that you want to set up sending authorization with
Select View Details
Expand Identity Policies and Click Create Policy
You can either create a policy using the policy generator or create a custom policy. For the purposes of this blog, you will create a custom policy.
For the custom policy, you will allow a particular Organization Unit (OU) from our AWS Organization access to our domain. You can also limit access to particular accounts or other IAM principals. Use the following policy to allow a particular OU to access the domain:

{
  “Version”: “2012-10-17”,
  “Id”: “AuthPolicy”,
  “Statement”: [
   {
   “Sid”: “AuthorizeOU”,
   “Effect”: “Allow”,
   “Principal”: “*”,
   “Action”: [
   “SES:SendEmail”,
   “SES:SendRawEmail”
   ],
   “Resource”: “<Arn of Verified Domain>”,
   “Condition”: {
   “ForAnyValue:StringLike”: {
   “aws:PrincipalOrgPaths”: “<Organization Id>/<Root OU Id>/<Organizational Unit Id>”
   }
   }
   }
  ]
}

9. Make sure to replace the escaped values with your Verified Domain ARN and the Org path of the OU you want to limit access to.

You can find more policy examples in the documentation. Note that you can configure sending authorization such that all accounts under your AWS Organization are authorized to send via a certain subdomain.

Testing

You can now test the ability to send emails from your domain in a different AWS account. You will do this by creating a Lambda function to send a test email. Before you create the Lambda function, you will need to create an IAM role for the Lambda function to use.

Creating the IAM Role:

Log in to your separate AWS account
Navigate to the IAM Management Console
Select Role and choose Create Role
Under Choose a use case select Lambda
choose Next: Permissions
In the search bar, type SES and select the check box next to AmazonSESFullAccess
Choose Next:Tags and Review
Give the role a name of your choosing, and choose Create Role

Navigate to Lambda Console

Select Create Function
Choose the box marked Author from Scratch
Give the function a name of your choosing (Ex: TestSESfunction)
In this demo, you will be using Python 3.8 runtime, but feel free to modify to your language of choice
Select the Change default execution role dropdown, and choose the Use an existing role radio button
Under Existing Role, choose the role that you created in the previous step, and create the function

Edit the function

Navigate to the Function Code portion of the page and open the function python file
Replace the default code with the code shown below, ensuring that you put your own values in based on your resources
Values needed:
1. Test Email Address: an email address you have access to
  1. NOTE: If you are still operating in the Amazon SES Sandbox, this will need to be a verified email in Amazon SES. To verify an email in Amazon SES, follow the process here. Alternatively, here is how you can move out of the Amazon SES Sandbox
2. SourceArn: The arn of your domain. This can be found in Amazon SES Console → Domains → <YourDomain> → Identity ARN
3. ReturnPathArn: The same as your Source ARN
4. Source: This should be your Mail FROM Domain @ your domain
  1. Your Mail FROM Domain can be found under Domains → <YourDomain> → Mail FROM Domain dropdown
  2. Ex: [email protected]
5. Use the following function code for this example

import json
import boto3
from botocore.exceptions import ClientError

client = boto3.client('ses')
def lambda_handler(event, context):
   # Try to send the email.
   try:
   #Provide the contents of the email.
   response = client.send_email(
   Destination={
   'ToAddresses': [
   '<[email protected]>',
   ],
   },
   Message={
   'Body': {
   'Html': {
   'Charset': 'UTF-8',
   'Data': 'This email was sent with Amazon SES.',
   },
   },
   'Subject': {
   'Charset': 'UTF-8',
   'Data': 'Amazon SES Test',
   },
   },
   SourceArn='<your-ses-identity-ARN>',
   ReturnPathArn='<your-ses-identity-ARN>',
   Source='<[email protected]>',
           )
   # Display an error if something goes wrong.
   except ClientError as e:
   print(e.response['Error']['Message'])
   else:
   print("Email sent! Message ID:"),
   print(response['ResponseMetadata']['RequestId'])

Once you have replaced the appropriate values, choose the Deploy button to deploy your changes

Run a Test invocation

After you have deployed your changes, select the “Test” Panel above your function code

You can leave all of these keys and values as default, as the function does not use any event parameters
Choose the Invoke button in the top right corner
You should see this above the test event window:

Verifying that the Email has been signed properly

Depending on your email provider, you may be able to check the DKIM signature directly in the application. As an example, for Outlook, right click on the message, and choose View Source from the menu. You should see line that shows the Authentication Results and whether or not the DKIM/SPF signature passed. For Gmail, go to your Gmail Inbox on the Gmail web app. Choose the message you wish to inspect, and choose the More Icon. Choose View Original from the drop-down menu. You should then see the SPF and DKIM “PASS” Results.

Cleanup

To clean up the resources in your account,

Navigate to the Route53 Console
Select the Hosted Zone you have been working with
Select the CNAME, TXT, and MX records that you created earlier in this blog and delete them
Navigate to the SES Console
Select Domains
Select the Domain that you have been working with
Click the drop down Identity Policies and delete the one that you created in this blog
If you verified a domain for the sake of this blog: navigate to the Domains tab, select the domain and select Remove
Navigate to the Lambda Console
Select Functions
Select the function that you created in this exercise
Select Actions and delete the function

Conclusion

In this blog post, we demonstrated how to delegate sending and management of your sub-domains to other AWS accounts while also complying with DMARC when using Amazon SES. In order to do this, you set up a sending identity so that Amazon SES automatically adds a DKIM signature to your messages. Additionally, you created a custom MAIL FROM domain to comply with SPF. Lastly, you authorized another AWS account to send emails from a sub-domain managed in a different account, and tested this using a Lambda function. Allowing other accounts the ability to manage and send email from your sub-domains provides flexibility and scalability for your organization without compromising on security.

Now that you have set up DMARC authentication for multiple accounts in your enviornment, head to the AWS Messaging & Targeting Blog to see examples of how you can combine Amazon SES with other AWS Services!

If you have more questions about Amazon Simple Email Service, check out our FAQs or our Developer Guide.

If you have feedback about this post, submit comments in the Comments section below.

Frictionless hosting of containerized ASP.NET web apps using Amazon Lightsail

2021-06-10 Emma White

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/frictionless-hosting-of-containerized-asp-net-web-apps-using-amazon-lightsail/

This post is written by Fahad Mustafa, Cloud Application Architect, AWS Professional Services

There are many ways to deploy ASP.NET web apps to AWS. Each with its own use cases and differing pricing models. But what if you have a small website and database that you must deploy rapidly, manage, and scale? What if you want a cost-effective simple monthly plan? In these cases, Amazon Lightsail is a great choice. This post shows you how to take a containerized ASP.NET web application that connects to a PostgreSQL database and deploy it to Lightsail. So that you can get your ASP.NET web app up and running.

Product Overview

Amazon Lightsail is an easy way to get started on AWS. It gives you building blocks to deploy an application or website and provision a database at an affordable, monthly price.

Lightsail is perfect for students, small businesses, and startups to get their website or application up and running in the cloud. By providing a secure, highly available, and managed environment Lightsail does all the heavy lifting like setting up IAM roles and policies.

Lightsail can also run containers! By pointing Lightsail to a public image on Amazon ECR or Docker Hub, or uploading an image from your local machine, you can easily run the container, scale it, monitor it and use a custom domain.

Overview of solution

To deploy an ASP.NET app that connects to a PostgreSQL database, you create a Lightsail container service and PostgreSQL database through the AWS Management Console. Create your app and container image. Push the image to Lightsail and finally create the Lightsail deployment to run the container.

solution diagram

Overview of steps

In this post, you create a sample ASP.NET web app through the .NET CLI. Alternatively, you can use Visual Studio to create the app.

This is the sequence of steps I review in this post:

Create a PostgreSQL database
Create a Lightsail container service
Create an ASP.NET web app
Create a Dockerfile and build image
Upload the image to Lightsail
Deploy and run the image

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
.NET SDK: To build and run .NET applications. For this post, you use .NET 5.
Docker: To create and run container images on our local machine.
AWS Command Line Interface (AWS CLI): Version 2.1.1 or newer is required for the Lightsail Control plugin.
Lightsail Control (lightsailctl) plugin: Enables the AWS CLI to access the container images that are on the local machine.
PgAdmin: To connect and manage the PostgreSQL database

Walkthrough

Create a PostgreSQL database

In this step, you create a PostgreSQL database through the Lightsail console.

Create the database

Sign in to the Lightsail console.
On the Lightsail home page, choose the Database
Choose Create database.
Choose the Database location by changing the AWS Region and Availability Zone.
Choose the database engine. In this example, select PostgreSQL 12.6.
Optional – Specify login credentials. If not changed, AWS generates a default secure password.
Optional – Specify the master database name. If not changed, AWS will use “dbmaster” as the default.
Choose the database plan. Compare the plan’s memory, CPU, storage, and transfer quota to decide which best fits your needs. The smallest database plan is Free Tier eligible.
Identify your database by giving it a unique name.
Choose Create database.

Creating and configuring the database can take a few minutes. Once ready, the status changes to Available. For more information and options on creating a database in Lightsail, see Creating a database in Amazon Lightsail.

available database

Now you are ready to connect to the database and create a table. To connect, see Connecting to your PostgreSQL database in Amazon Lightsail. This sample uses a database named aspnetlightsaildb and a table named Person that you can create by running the following script using PgAdmin. Note that the Owner value is dbmasteruser. This is the default username AWS generates. If you changed the default, then use the username you specified in step 6.

-- Database: aspnetlightsaildb
CREATE DATABASE aspnetlightsaildb
    WITH 
    OWNER = dbmasteruser
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1;	
-- Table: public.Person
CREATE TABLE IF NOT EXISTS public."Person"
(
    "Id" integer NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    "Name" text COLLATE pg_catalog."default",
    "DateOfBirth" date,
    "Address" text COLLATE pg_catalog."default",
    CONSTRAINT "Person_pkey" PRIMARY KEY ("Id")
)
TABLESPACE pg_default;
ALTER TABLE public."Person"
    OWNER to dbmasteruser;

Now your database and table is created and you can create a container service.

Create a Lightsail container service

In this step, you create a Lightsail container service that is ready to accept your container images.

Create the container service

Sign in to the Lightsail console.
On the Lightsail home page, choose the Containers Tab
Choose Create container service.
In the Create a container service page, choose Change AWS Region, then choose an AWS Region for your container service.
Choose a capacity for your container service. For more information, see Container service capacity (scale and power).
Skip the Set up your first deployment step as you’ll create the deployment after creating the container image on your dev machine.
Enter a name for your container service. Take note of this name, you’ll need it later to deploy the container to Lightsail.
Click Create container service.

After a few minutes, your container service status changes from Pending to Ready. This indicates you can now deploy images. If this is the first time you created a service, it can take 10–15 minutes for the status to become Ready.

container service

Create an ASP.NET web app

Using the .NET CLI, you’ll create a sample ASP.NET web app. In an empty directory run the following command:

dotnet new webapp --name HelloWorldLightsail

The “webapp” segment of the commands specifies the project template to use. In this case, it’s a default ASP.NET web app. The “name” parameter is the name of the ASP.NET project.

To connect to your PostgreSQL Db from ASP.NET, you must install the “Npgsql” Nuget package. In the root directory of the project run the following command in the terminal:

dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL –-version 5.0.6

Once installed, you create a Model class to represent the data and a DbContext class to connect and query the database.

public class Person
    {
        public int Id { get; set; }

        public string Name { get; set; }

        public DateTime DateOfBirth { get; set; }

        public string Address { get; set; }
    }

public class PostgreSqlContext : DbContext
    {
        public PostgreSqlContext(DbContextOptions<PostgreSqlContext> options) : base(options)
        {
        }

        public DbSet<Person> Person { get; set; }
    }

The next step is to add the connection string to appSettings.json. In the root of the settings file add a new ConnectionStrings property as shown below. The following properties are required:

lightsail-endpoint: The database endpoint as shown in the Lightsail console.
db-name: The name of the database you want to connect to.
db-username: The username as shown in the Lightsail console.
db-password: The password as shown in the Lightsail console.

"ConnectionStrings": {
    "AspnetLightsailDb": "Server=<lightsail-endpoint>;Port=5432;Database=<db-name>;User Id=<db-username>;Password=<db-password>;"
  }

Next step is to tell ASP.NET where to find the connection string and which DbContext class to use. This is done by configuring the DbContext in Startup.cs. Under the ConfigureServices method add the following line of code:

  services.AddDbContext<PostgreSqlContext>(options => options.UseNpgsql(Configuration.GetConnectionString("AspnetLightsailDb")));

Now, you are ready to perform operations against the database. This is done by performing operations against the Person property of the PostgreSqlContext instance.

For example to fetch all records form the Person table:

public IList<Person> Person { get;set; }

        public async Task OnGetAsync()
        {
            Person = await _context.Person.ToListAsync();
        }

You now have an ASP.NET web application that can query the “Person” table against the PostgreSQL database.

Create a Dockerfile and build image

In order to containerize the web app, you must create a Dockerfile. This file provides instructions to Docker on how to build the container image.

To create a Dockerfile and build image

In the root directory of the project, you created (where the .csproj file lives) create an empty file named “Dockerfile”. Note this file does not have an extension.
Open the file with a text editor or IDE and insert the following:

# https://hub.docker.com/_/microsoft-dotnet
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /source

# copy csproj and restore as distinct layers
COPY *.csproj .
RUN dotnet restore

# copy everything else and build app
COPY . .
RUN dotnet publish -c release -o /app --no-restore

# final stage/image
FROM mcr.microsoft.com/dotnet/aspnet:5.0
WORKDIR /app
COPY --from=build /app ./
ENTRYPOINT ["dotnet", "HelloWorldLightsail.dll"]

To build the image, open a terminal in the same directory as the Dockerfile. Run the following command to build the image.
docker build -t helloworldlightsail .
The “-t” parameter is a human readable tag you give the image to make it easy to identify.
After the command completes, you can verify that the image exists by running
docker images
You should see the newly created image.

Upload the image to Lightsail

In this step, you upload the newly built image to the Lightsail container service that you created earlier.

To upload the image to Lightsail

Ensure you have configured the AWS CLI to access AWS.
In a terminal enter the following command:
aws lightsail push-container-image --region ap-southeast-2 --service-name aspnet-helloworld --label helloworldlightsail --image helloworldlightsail:latest
The –-region and –-service-name parameters should match the container service you created through the AWS Management Console. The –-label parameter is a descriptive name you give the image when it’s stored in the container service. This will help you track the different versions of the image. The –-image parameter consists of the image name and tag on your local machine that you want to push to Lightsail. Read more about how to push images to Lightsail.
After the command runs successfully browse your container service in the Lightsail console and click the “Images” tab. You should see the uploaded image.

3. After the command runs successfully browse your container service in the Lightsail console and click the “Images” tab

Deploy and run the image

Now that your image is uploaded to the container service it’s time to create a deployment to run the app.

To create a deployment

Go to the Deployments tab in the Lightsail console.
Click on Create your first deployment.
Enter the Container name.
Click Choose stored image and select the image you uploaded in the previous step.
Click on Add open ports to add a port mapping to the container. This allows Lightsail to forward web traffic to your ASP.NET web app. By default ASP.NET web server will listen to port 80.
Under the Public endpoint section, select the container from the drop-down. This specifies which container Lightsail will forward traffic to since a single deployment can have more than one container.
Click Save and deploy

Your configuration should looks like this. Read more about creating container services deployments in Lightsail.

configuration overview

After the deployment is complete, you can navigate to the Public domain of your container service. You will see your ASP.NET web app in action!

public domain

Conclusion

In this post, I demonstrated how easy it is to create a PostgreSQL DB and deploy an ASP.NET web app to Amazon Lightsail. Going from a container on your dev machine to a publicly accessible, scalable, and secure cloud environment within minutes.

You can now add a custom domain to your web app through the Lightsail console. Additionally, you can increase the scale of your container to keep up with demand based on the useful CPU and memory metrics provided in the console.

If you have more advanced needs for your web app, you have the whole robust ecosystem of AWS at your disposal. You can deploy your ASP.NET web app to Amazon Elastic Container Service (Amazon ECS) or even decide to go completely serverless and utilize AWS Lambda and API Gateway.

Visit the Amazon Lightsail homepage to get started with your next idea and read the docs for more details about container services on Amazon Lightsail.

Building an ARM64 Rust development environment using AWS Graviton2 and AWS CDK

2021-06-09 Alistair McLean

Post Syndicated from Alistair McLean original https://aws.amazon.com/blogs/devops/building-an-arm64-rust-development-environment-using-aws-graviton2-and-aws-cdk/

2020 was the year that ARM chips made the headlines by moving from largely mobile form factors into the cloud thanks to AWS Graviton2, allowing you to have up to 40% better price performance over comparable current generation x86 Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Relational Database Service (Amazon RDS) instances.

We speak to customers daily about Graviton2. One recurring question we hear is “Graviton2 is great, but how can my team develop for ARM natively without the complexity of cross-compilation or having to buy custom hardware on premises?” This post seeks to answer that question by setting up the Visual Studio Code-based Code Server IDE, running on a Graviton2 EC2 instance that enables native development in a cost-effective and secure manner accessed via your browser.

The Rust programming language has gained a huge amount of popularity recently. This post aims to show that you can use this environment for Rust development as well as hundreds of other supported languages. AWS has committed to supporting the Rust community and using the language to deliver fast and robust services to customers at scale, and we want to enable our customers to do the same.

We also include instructions for building and installing the rust-analyzer and CodeLLDB debugger plugins to add additional language features.

Solution overview

The following diagram illustrates our solution architecture.

Architecture of the solution showing components and their linkages

The solution consists of an EC2 Graviton2 instance located in a private VPC subnet routed through an AWS Global Accelerator accelerator to provide routing optimization and keep packet loss, jitter, and latency lower by up to 60%. An internal facing Application Load Balancer containing the AWS Certificate Manager certificate decrypts and forwards traffic to this instance.

Code Server queries AWS Secrets Manager to initially set the login password on startup and allow for continued password-based authentication and easy password rotation. The EC2 instance has access to the internet through a NAT gateway and has no public IP address or key pair associated, and is accessible only through AWS Systems Manager Session Manager.

Prerequisites

For this walkthrough, the following are prerequisites:

An AWS account
Familiarity with the AWS Command Line Interface (AWS CLI)
Account capacity for two Elastic IPs for the NAT gateways
Access to an AWS account with administrator or PowerUser (or equivalent) AWS Identity and Access Management (IAM) role policies attached
Access to AWS CloudShell
Basic knowledge of the Linux operating system
A public hosted zone in Amazon Route 53—for instructions on setting one up, see Getting started with Amazon Route 53
A private CIDR range for the new VPC that is created as part of the AWS Cloud Development Kit (AWS CDK) stack

AWS CDK stack

In order to deploy our architecture, I use the AWS CDK. As a developer, it’s more intuitive to me to define my infrastructure using a language and tooling with which I am familiar. I can also do things like environment variable injection and scripting as part of the stack creation to add stack parameters and customization points.

The AWS CDK application is comprised of five stacks. Each stack defines a separate part of the architecture:

Networking – Defines a VPC across two Availability Zones with the CIDR range of your choice. The routing and public/private subnet creation is done for us as part of the default configuration.
Certificate – This is the reason for the domain prerequisite. It’s a best practice to encrypt web applications using TLS, and for that we need a certificate and therefore a domain. This stack creates a certificate for the subdomain you specify as part of the stack creation and DNS validation in Route 53.
Amazon EC2 configuration – This defines both our AMI and the instance type and configuration. In this case, we’re using Amazon Linux 2 ARM64 edition. Here we also set the instance-managed roles that allow Session Manager connectivity and Secrets Manager access.
ALB configuration – Here we define the internal load balancer and specify the listener, certificate, and target configuration. I have injected the Amazon EC2 configuration as part of the class constructor so that I can reference it directly as a target.
Global accelerator configuration – Finally, the accelerator is defined here with two ports open, the ALB we defined in the ALB stack as a target, and most importantly adds in a CNAME DNS entry pointing to the DNS name of the accelerator.

Walkthrough overview

This walkthrough uses the AWS CDK command line tools to deploy the stack. Session Manager is enabled to allow access to the EC2 instance and configure the Code Server application and associated plugins.

The walkthrough specifically covers the following steps:

Deploy the AWS CDK stacks via CloudShell to build out the application infrastructure and associated IAM roles.
Launch Code Server via the official Docker container with the commands to get and set the password stored in Secrets Manager.
Log in and build the rust-analyzer and CodeLLDB plugins from a terminal to allow for debugging within a “Hello World” application.

Start CloudShell and install the appropriate tooling

In this section, I use dummy values for the domain, the VPC CIDR, AWS Region, and the secret password. You need to submit real values as appropriate.

sudo yum groupinstall -y "Development Tools"
sudo npm install aws-cdk -g
git clone https://github.com/aws-samples/cdk-graviton2-alb-aga-route53.git
cd cdk-graviton2-alb-aga-route53
python3 -m venv .
source bin/activate
python -m pip install -r requirements.txt
export VPC_CIDR=”10.0.0.1/16” #Substitute your CIDR here.
export CDK_DEPLOY_ACCOUNT=`aws sts get-caller-identity | jq -r '.Account'`
export CDK_DEPLOY_REGION=$AWS_REGION
export R53_DOMAIN=”code-server.example.com” #Substitute your domain here.
cdk bootstrap aws://$CDK_DEPLOY_ACCOUNT/$CDK_DEPLOY_REGION
cdk deploy --all

The deploy step takes around 10-15 mins to run and prompts a couple of times to add resources like security groups and IAM roles.

Log in to the new instance using Session Manager

Install the latest version of the Session Manager plugin for the AWS CLI:

cd ~
curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/linux_64bit/session-manager-plugin.rpm" -o "session-manager-plugin.rpm"
sudo yum install -y session-manager-plugin.rpm

Now start a session, logging into the newly created EC2 instance and log in as ec2-user:

aws ssm start-session --target i-1234xyz7890abc #Substitute the instance id we just created here
#Once session is active:
sudo su - ec2-user

Add the password as a secret and start the container

Enter the following code to add the password as a secret in Secrets Manager and start the container:

aws secretsmanager create-secret --name CodeServerProd --secret-string Password123abc # Substitute the appropriate password here.
sudo docker run -d --name=code-server -e PUID=1000 -e PGID=1000 -e PASSWORD=`aws secretsmanager get-secret-value --secret-id CodeServerProd | jq -r '.SecretString'` -p 8080:8080 -v /home/ec2-user/.config:/config --restart unless-stopped codercom/code-server

Access and configure the web application for Rust development

So far, we have accomplished the following:

Created the infrastructure in the diagram via AWS CDK deployment
Configured the EC2 instance to run Docker and added this to the systemctl startup scripts
Created a secret in Secrets Manager to use as the application login password
Instantiated a Docker container running Code Server

Next, we access the running container via the web interface and install the required development tools.

Log in to the Code Server web application

To log in to the Code Server web application, complete the following steps:

Browse to https://code-server.example.com, where example.com is the name of the domain you supplied in the AWS CDK step.
Log in using the password you created in Secrets Manager.
Create a new terminal by choosing the hamburger icon and, under Terminal, choosing New Terminal.
Issue the following commands into the terminal to install the Rust programming language:

bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential npm clang lldb
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

Install the rust-analyzer plugin

Open the extensions panel and enter Rust Analyzer in the search bar. Then install the plugin.

Install the debugger

Go back to the extensions panel in the Code Server application and enter CodeLLDB into the search bar. Then install this extension.

Create a sample application and open it in the Code Server window

To create and use our sample application, complete the following steps:

In the existing Code Server terminal, enter the following:

mkdir -p ~/src/
cd ~/src
cargo new helloworld --bin

Open the newly created folder in Code Server verifying that the helloworld directory was successfully created.

Open File or Folder dialog in Code Server

Rust-analyzer runs when you open up src/main.rs and index the file.
You can run the program by choosing Run in the editor.

Main Code Server editor window showing helloworld Rust program code.

Similarly, to launch the debugger, choose Debug in the editor.

Code Server Debugger view

Troubleshooting

If the CloudShell session times out, you need to reset your environment variables in order to re-deploy, modify, and delete the stack deployment.

Clean up

This stack incurs an estimated monthly cost of $143.00.

To delete the stack, log in to CloudShell and enter the following commands:

cd cdk-graviton2-alb-aga-route53
source bin/activate

# Re-set the environment variables again if required
export VPC_CIDR=”10.0.0.1/16” #Substitute your CIDR here.
export CDK_DEPLOY_ACCOUNT=`aws sts get-caller-identity | jq -r '.Account'`
export CDK_DEPLOY_REGION=$AWS_REGION
export R53_DOMAIN=”code-server.example.com” #Substitute your domain here.
cdk destroy --all

This destroys all the resources created in the first step. You can verify this by browsing to the AWS CloudFormation console and noting the deletion of all the stacks.

Conclusion

AWS is a place where builders can reinvent the future. The future of development means supporting different chipsets depending on different business requirements. This post is designed to enable development targeting the ARM64 microarchitecture by utilizing AWS Graviton2. Happy building!

Author bio

Author portrait

Alistair is a Principal Solutions Architect at AWS focused on EdTech customers. Originally from the west coast of Scotland, Alistair now lives in Fairfield, Connecticut, with his wife and two daughters and enjoys spending time with his family, skiing, golfing, cycling, and using his pellet smoker.

Building serverless applications with streaming data: Part 2

2021-06-07 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-serverless-applications-with-streaming-data-part-2/

Part 1 introduces the Alleycat application that allows bike racers to compete with each other virtually on home exercise bikes. I explain the application’s functionality, how to deploy to your AWS account, and provide an architectural review.

In the example scenario, there are 40,000 users and up to 1,000 competitors may race at any given time. The workload must continuously ingest and buffer this data, then process and analyze the information to provide analytics and leaderboard content for the frontend application.

In this post, I focus on data ingestion. I compare the two different methods used in Alleycat, and discuss other approaches available. This post refers to Amazon Kinesis Data Streams, the AWS SDK, and AWS IoT Core in the solutions.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file. Note that this walkthrough uses services that are not covered by the AWS Free Tier and incur cost.

Using AWS IoT Core to ingest streaming data

AWS IoT Core enables publish-subscribe capabilities for large numbers of client applications. Clients can send data to the backend using the AWS IoT Device SDK, which uses the MQTT standard for IoT messaging. After processing, the backend can publish aggregation and status messages back to the frontend via AWS IoT Core. This service fans out the messages to clients using topics.

When using this approach, note the Quality of Service (QoS) options available. By default, the SDK uses QoS level 0, which means the device does not confirm the message is received. This is intended for workloads that can lose messages occasionally without impacting performance. In Alleycat, if performance metrics are sometimes lost, this does not likely impact the overall end user experience.

For workloads requiring higher reliability, use QoS level 1, which causes the SDK to resend the message until an acknowledgement is received. While there is no additional charge for using QoS level 1, it generally increases the number of messages, which increases the overall cost. You are not charged for the PUBACK acknowledgement message – for more details, read more about AWS IoT Core pricing.

Frontend

In this scenario, the Alleycat frontend application is running on a physical exercise bike. The user selects a racer ID and exercise class and chooses Start Race to join the current virtual race for that class.

Every second, the frontend sends a message containing the cadence and resistance metrics and the current second in the race for the local racer. This message is created as a JSON object in the Home.vue component and sent to the ‘alleycat-publish’ topic:

      const message = {
        uuid: uuidv4(),
        event: this.event,
        deviceTimestamp: Date.now(),
        second: this.currentSecond,
        raceId: RACE_ID,
        name: this.racer.name,
        racerId: this.racer.id,
        classId: this.selectedClassId,
        cadence: this.racer.getCurrentCadence(),
        resistance: this.racer.getCurrentResistance
      }

The IoT.vue component contains the logic for this integration and uses the AWS IoT Device SDK to send and receive messages. On startup, the frontend connects to AWS IoT Core and publishes the messages using an MQTT client:

    bus.$on('publish', (data) => {
      console.log('Publish: ', data)
      mqttClient.publish(topics.publish, JSON.stringify(data))
    })

The SDK automatically attempts to retry in the event of a network disconnection and exposes an error handler to allow custom logic if other errors occur.

Backend

The resources used in the backend are defined using the AWS Serverless Application Model (AWS SAM) and configured in the core setup templates:

Messages are published to topics in AWS IoT Core, which act as channels of interest. The message broker uses topic names and topic filters to route messages between publishers and subscribers. Incoming messages are routed using rules. Alleycat’s IoT rule routes all incoming messages to a Kinesis stream:

  IotTopicRule:
    Type: AWS::IoT::TopicRule
    Properties:
      RuleName: 'alleycatIngest'
      TopicRulePayload:
        RuleDisabled: 'false'
        Sql: "SELECT * FROM 'alleycat-publish'"
        Actions:
        - Kinesis:
            StreamName: 'alleycat'
            PartitionKey: "${timestamp()}"
            RoleArn: !GetAtt IoTKinesisRole.Arn

  IoTKinesisRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - iot.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: IoTKinesisPutPolicy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: 'kinesis:PutRecord'
                Resource: !GetAtt KinesisStream.Arn

Using the AWS::IoT::TopicRule resource, you can optionally define an error action. This allows you to store messages in a durable location, such as an Amazon S3 bucket, if an error occurs. Errors can occur if a rule does not have permission to access a destination or throttling occurs in a target.

Rules can route matching messages to up to 10 targets. For debugging purposes, you can also enable Amazon CloudWatch Logs, which can help in troubleshoot failed message deliveries. The AWS IoT Core Message Broker allows up to 20,000 publish requests per second – if you need a higher limit for your workload, submit a request to AWS Support.

Using the AWS SDK to ingest streaming data

The Alleycat frontend creates traffic for a single user but there is also a simulator application that can generate messages for up to 1,000 riders. Instead of routing messages using an MQTT client, the simulator uses the AWS SDK to put messages directly into the Kinesis data stream.

The SDK provides a service interface object for Kinesis and two API methods for putting messages into streams: putRecord and putRecords. The first option accepts only a single message but the second enables batching of up to 500 messages per request. This is the preferred option for adding multiple messages, compared with calling putRecord multiple times.

The putRecords API takes parameters as a JSON array of messages:

const params = {
   StreamName: 'alley-cat',
   [{
      "Data":"{\"event\":\"update\",\"deviceTimestamp\":1620824038331,\"second\":3,\"raceId\":5402746,\"name\":\"Hayden\",\"racerId\":0,\"classId\":1,\"cadence\":79.8,\"resistance\":79}",
      "PartitionKey":"1620824038331"
   },
   {
      "Data":"{\"event\":\"update\",\"deviceTimestamp\":1620824038331,\"second\":3,\"raceId\":5402746,\"name\":\"Hubert\",\"racerId\":1,\"classId\":1,\"cadence\":60.4,\"resistance\":60.6}",
      "PartitionKey":"1620824038331"
   }
]}

The SDK automatically base64 encodes the Data attribute, which in this case is the JSON string output from JSON.stringify. In the JavaScript SDK, the putRecords API can return a promise, allowing the code to await the operation:

const result = await kinesis.putRecords(params).promise()

Shards and partition keys

Kinesis data streams consist of one or more shards, which are sequences of data records with a fixed capacity. Each shard can support up to 1,000 records per second for writes, up to maximum total data write rate of 1MB per second. The total capacity of a stream is the total of its shards.

When you send messages to a stream, the partitionKey attribute determines which shard it is routed to. The example application configures a Kinesis data stream with a single shard so the partitionKey attribute has no effect – all messages are routed to the same shard. However, many production applications have more than one shard and use the partitionKey to assign messages to shards.

The partitionKey is hashed by the Kinesis service to route to a shard. This diagram shows how partitionKey values from data producers are hashed by an MD5 function and mapped to individual shards:

While you cannot designate a specific shard ID in a message, you can influence the assignment depending on your choice of partitionKey:

Random: Using a randomized value results in random hash so messages are randomly sent to different shards. This effectively load balances messages across all available shards.
Time-based: A timestamp value may cause groups of messages sent to a single shard, if the messages arrive at the same time. The identical timestamp results in an identical hash.
Application-specific: if Alleycat used the classID as a partitionKey, racers in each class would always be routed to the same shard. This could be useful for downstream aggregation logic but would limit the capacity of messages per classID.

Optimizing capacity in a shard

Each shard can ingest data at a rate of 1 MB per second or 1,000 records per second, whichever limit is reached first. Since the payload maximum is 1MB, this could equate to one 1MB message per second. If the payload is larger, you must divide it into smaller pieces to avoid an error. For 1,000 messages, each payload must be under 1 KB on average to fit within the allowed capacity.

The combination of the two payload limits can result in different capacity profiles for a shard:

The data payloads are evenly sized and use the 1 MB per second capacity.
Data payload sizes vary, so the number of messages that can be packed into 1 MB varies per second.
There are a large number of very messages, consuming all 1,000 messages per second. However, the total data capacity used is significantly less than 1 MB.

In the Alleycat application, the average payload size is around 170 bytes. When producing 1,000 messages a second, the workload is only using about 20% of the 1 MB per second limit. Since PUT payload size is a factor in Kinesis pricing, messages that are much smaller than 25 KB are less cost-efficient. Compare these two messaging patterns for the Alleycat application:

In this default mode, a smaller message is published once per second. This reduces overall latency but results in higher overall messaging cost.
The client application batches outgoing messages and sends to Kinesis every 5 seconds. This results in lower cost and better packing of messages, but introduces additional latency.

There is a tradeoff between cost and latency when optimizing a shard’s capacity and the decision depends upon the needs of your workload. If the client buffers messages, this adds latency on the client side. This is acceptable in many workloads that collect metrics for archival or asynchronous reporting purchases. However, for low-latency applications like Alleycat, it provides a better experience for the application user to send messages as soon as they are available.

Conclusion

This post focuses on ingesting data into Kinesis Data Streams. I explain the two approaches used by the Alleycat frontend and the simulator application and highlight other approaches that you can use. I show how messages are routed to shards using partition keys. Finally, I explore additional factors to consider when ingesting data, to improve efficiency and reduce cost.

Part 3 covers using Amazon Kinesis Data Firehose for transforming, aggregating, and loading streaming data into data stores. This is used to provide the historical, second-by-second leaderboard for the frontend application.

For more serverless learning resources, visit Serverless Land.

Build and Deploy Docker Images to AWS using EC2 Image Builder

2021-05-22 Joseph Keating

Post Syndicated from Joseph Keating original https://aws.amazon.com/blogs/devops/build-and-deploy-docker-images-to-aws-using-ec2-image-builder/

The NFL, an AWS Professional Services partner, is collaborating with NFL’s Player Health and Safety team to build the Digital Athlete Program. The Digital Athlete Program is working to drive progress in the prevention, diagnosis, and treatment of injuries; enhance medical protocols; and further improve the way football is taught and played. The NFL, in conjunction with AWS Professional Services, delivered an EC2 Image Builder pipeline for automating the production of Docker images. Following similar practices from the Digital Athlete Program, this post demonstrates how to deploy an automated Image Builder pipeline.

“AWS Professional Services faced unique environment constraints, but was able to deliver a modular pipeline solution leveraging EC2 Image Builder. The framework serves as a foundation to create hardened images for future use cases. The team also provided documentation and knowledge transfer sessions to ensure our team was set up to successfully manage the solution.”—Joseph Steinke, Director, Data Solutions Architect, National Football League

A common scenario you may face is how to build Docker images that can be utilized throughout your organization. You may already have existing processes that you’re looking to modernize. You may be looking for a streamlined, managed approach so you can reduce the overhead of operating your own workflows. Additionally, if you’re new to containers, you may be seeking an end-to-end process you can use to deploy containerized workloads. With either case, there is need for a modern, streamlined approach to centralize the configuration and distribution of Docker images. This post demonstrates how to build a secure end-to-end workflow for building secure Docker images.

Image Builder now offers a managed service for building Docker images. With Image Builder, you can automatically produce new up-to-date container images and publish them to specified Amazon Elastic Container Registry (Amazon ECR) repositories after running stipulated tests. You don’t need to worry about the underlying infrastructure. Instead, you can focus simply on your container configuration and use the AWS tools to manage and distribute your images. In this post, we walk through the process of building a Docker image and deploying the image to Amazon ECR, share some security best practices, and demonstrate deploying a Docker image to Amazon Elastic Container Service (Amazon ECS). Additionally, we dive deep into building Docker images following modern principles.

The project we create in this post addresses a use case in which an organization needs an automated workflow for building, distributing, and deploying Docker images. With Image Builder, we build and deploy Docker images and test our image locally that we have created with our Image Builder pipeline.

Solution Overview

The following diagram illustrates our solution architecture.

Figure: Show the architecture of the Docker EC2 Image Builder Pipeline

We configure the Image Builder pipeline with AWS CloudFormation. Then we use Amazon Simple Storage Service (Amazon S3) as our source for the pipeline. This means that when we want to update the pipeline with a new Dockerfile, we have to update the source S3 bucket. The pipeline assumes an AWS Identity and Access Management (IAM) role that we generate later in the post. When the pipeline is run, it pulls the latest Dockerfile configuration from Amazon S3, builds a Docker image, and deploys the image to Amazon ECR. Finally, we use AWS Copilot to deploy our Docker image to Amazon ECS. For more information about Copilot, see Applications.

The style in which the Dockerfile application code was written is a personal preference. For more information, see Best practices for writing Dockerfiles.

Overview of AWS services

For this post, we use the following services:

EC2 Image Builder – Image Builder is a fully managed AWS service that makes it easy to automate the creation, management, and deployment of customized, secure, and up-to-date server images that are pre-installed and pre-configured with software and settings to meet specific IT standards.
Amazon ECR – Amazon ECR is an AWS managed container image registry service that is secure, scalable, and reliable.
CodeCommit – AWS CodeCommit is a fully-managed source control service that hosts secure Git-based repositories.
AWS KMS – Amazon Key Management Service (AWS KMS) is a fully managed service for creating and managing cryptographic keys. These keys are natively integrated with most AWS services. You use a KMS key in this post to encrypt resources.
Amazon S3 – Amazon Simple Storage Service (Amazon S3) is an object storage service utilized for storing and encrypting data. We use Amazon S3 to store our configuration files.
AWS CloudFormation – AWS CloudFormation allows you to use domain-specific languages or simple text files to model and provision, in an automated and secure manner, all the resources needed for your applications across all Regions and accounts. You can deploy AWS resources in a safe, repeatable manner, and automate the provisioning of infrastructure.

Prerequisites

To provision the pipeline deployment, you must have the following prerequisites:

A Git client to clone the source code provided.
Docker installed and running on the local host or laptop.
The latest version of the AWS Command Line Interface (AWS CLI). For more information, see Installing, updating, and uninstalling the AWS CLI.
An AWS account with local credentials properly configured (typically under ~/.aws/credentials).
An IAM user with Git credentials.
The source code cloned locally.

CloudFormation templates

You use the following CloudFormation templates to deploy several resources:

vpc.yml – Contains all the core networking configuration. It deploys the VPC, two private subnets, two public subnets, and the route tables. The private subnets utilize a NAT gateway to communicate to the internet. The public subnets have full outbound access to the internet gateway.
kms.yml – Contains the AWS Key Management Service (AWS KMS) configuration that we use for encrypting resources. The KMS key policy is also configured in this template.
s3-iam-config.yml – Contains the S3 bucket and IAM roles we use with our Image Builder pipeline.
docker-image-builder.yml – Contains the configuration for the Image Builder pipeline that we use to build Docker images.

Docker Overview

Containerizing an application comes with many benefits. By containerizing an application, the application is decoupled from the underlying infrastructure, greater consistency is gained across environments, and the application can now be deployed in a loosely coupled microservice model. The lightweight nature of containers enables teams to spend less time configuring their application and more time building features that create value for their customers. To achieve these great benefits, you need reliable resources to centralize the creation and distribution of your container images. Additionally, you need to understand container fundamentals. Let’s start by reviewing a Docker base image.

In this post, we follow the multi-stage pattern for building our Docker image. With this approach, we can selectively copy artifacts from one phase to another. This allows you to remove anything not critical to the application’s function in the final image. Let’s walk through some of the logic we put into our Docker image to optimize performance and security.

Let’s begin by looking at line 15-25. Here, we are pulling down the latest amazon/aws-cli Docker image. We are leveraging this image so that we can utilize IAM credentials to clone our CodeCommit repository. In lines 15-24 we are installing and configuring our git configuration. Finally, in line 25 we are cloning our application code from our repository.

In this next section, we set environment variables, installing packages, unpack tar files, and set up a custom Java Runtime Environment (JRE). Amazon Corretto is a no-cost, multi-platform, production-ready distribution of the Open Java Development Kit (OpenJDK). One important distinction to make here is how we are utilizing RUN and ADD in the Dockerfile. By configuring our own custom JRE we can remove unnecessary modules from our image. One of our goals with building Docker images is to keep them lightweight, which is why we are taking the extra steps to ensure that we don’t add any unnecessary configuration.

Let’s take a look at the next section of the Dockerfile. Now that we have all the package that we require, we will create a working directory where we will install our demo app. After the application code is pulled down from CodeCommit, we use Maven to build our artifact.

In the following code snippet, we use FROM to begin a new stage in our build. Notice that we are using the same base as our first stage. If objects on the disk/filesystem in in the first stage stay the same, the previous stage cache can be reused. Using this pattern can greatly reduce build time.

Docker images have a single unique digest. This is a SHA-256 value and is known as the immutable identifier for the image. When changes are made to your image, through a Dockerfile update for example, a new image with a new immutable identifier is generated. The immutable identifier is pinned to prevent unexpected behaviors in code due to change or update. You can also prevent man-in-the-middle attacks by adopting this pattern. Additionally, using a SHA can mitigate the risk of having to rely on mutable tags that can be applied or changed to the wrong image by mistake. You can use the following command to check to ensure that no unintended changes occured.

docker images <input_container_image_id> --digests

Lastly, we configure our final stage, in which we create a user and group to manage our application inside the container. As this user, we copy the binaries created from our first stage. With this pattern, you can clearly see the benefit of using stages when building Docker images. Finally, we note the port that should be published with expose for the container and we define our Entrypoint, which is the instruction we use to run our container.

Deploying the CloudFormation templates

To deploy your templates, complete the following steps:

1. Create a directory where we store all of our demo code by running the following from your terminal:

mkdir awsblogrepo && cd awsblogrepo

2. Clone the source code repository found in the following location:

git clone https://github.com/aws-samples/build-and-deploy-docker-images-to-aws-using-ec2-image-builder.git

You now use the AWS CLI to deploy the CloudFormation templates. Make sure to leave the CloudFormation template names as written in this post.

3. Deploy the VPC CloudFormation template:

aws cloudformation create-stack \
--stack-name vpc-config \
--template-body file://templates/vpc.yml \
--parameters file://parameters/vpc-params.json  \
--capabilities CAPABILITY_IAM \
--region us-east-1

The output should look like the following code:

{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/vpc-config/12e90fe0-76c9-11eb-9284-12717722e021"
}

4. Open the parameters/kms-params.json file and update the UserARN parameter with your account ID:

[
  {
      "ParameterKey": "KeyName",
      "ParameterValue": "DemoKey"
  },
  {
    "ParameterKey": "UserARN",
    "ParameterValue": "arn:aws:iam::<input_your_account_id>:root"
  }
]

5. Deploy the KMS key CloudFormation template:

aws cloudformation create-stack \
--stack-name kms-config \
--template-body file://templates/kms.yml \
--parameters file://parameters/kms-params.json \
--capabilities CAPABILITY_IAM \
--region us-east-1

The output should look like the following:

{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/kms-config/66a663d0-777d-11eb-ad2b-0e84b19d341f"
}

6. Open the parameters/s3-iam-config.json file and update the DemoConfigS3BucketName parameter to a unique name of your choosing:

[
  {
    "ParameterKey" : "Environment",
    "ParameterValue" : "dev"
  },
  {
    "ParameterKey": "NetworkStackName",
    "ParameterValue" : "vpc-config"
  },
  {
    "ParameterKey" : "EC2InstanceRoleName",
    "ParameterValue" : "EC2InstanceRole"
  },
  {
    "ParameterKey" : "DemoConfigS3BucketName",
    "ParameterValue" : "<input_your_unique_bucket_name>"
  },
  {
    "ParameterKey" : "KMSStackName",
    "ParameterValue" : "kms-config"
  }
]

7. Deploy the IAM role configuration template:

aws cloudformation create-stack \
--stack-name s3-iam-config \
--template-body file://templates/s3-iam-config.yml \
--parameters file://parameters/s3-iam-config.json \
--capabilities CAPABILITY_NAMED_IAM \
--region us-east-1

The output should look like the following:

{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/s3-iam-config/8b69c270-7782-11eb-a85c-0ead09d00613"
}

8. Open the parameters/kms-params.json file:

[
  {
      "ParameterKey": "KeyName",
      "ParameterValue": "DemoKey"
  },
  {
    "ParameterKey": "UserARN",
    "ParameterValue": "arn:aws:iam::1234567891012:root"
  }
]

9. Add the following values as a comma-separated list to the UserARN parameter key. Make sure to provide your AWS account ID:

arn:aws:iam::<input_your_aws_account_id>:role/EC2ImageBuilderRole

When finished, the file should look similar to the following:

[
  {
      "ParameterKey": "KeyName",
      "ParameterValue": "DemoKey"
  },
  {
    "ParameterKey": "UserARN",
    "ParameterValue": "arn:aws:iam::123456789012:role/EC2ImageBuilderRole,arn:aws:iam::123456789012:root"
  }
]

Now that the AWS KMS parameter file has been updated, you update the AWS KMS CloudFormation stack.

10. Run the following command to update the kms-config stack:

aws cloudformation update-stack \
--stack-name kms-config \
--template-body file://templates/kms.yml \
--parameters file://parameters/kms-params.json \
--capabilities CAPABILITY_IAM \
--region us-east-1

The output should look like the following:

{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/kms-config/66a663d0-777d-11eb-ad2b-0e84b19d341f"
}

11. Open the parameters/docker-image-builder-params.json file and update the ImageBuilderBucketName parameter to the bucket name you generated earlier:

[
  {
    "ParameterKey": "Environment",
    "ParameterValue": "dev"
  },
  {
      "ParameterKey": "ImageBuilderBucketName",
      "ParameterValue": "<input_your_s3_bucket_name>"
  },
  {
      "ParameterKey": "NetworkStackName",
      "ParameterValue": "vpc-config"
  },
  {
      "ParameterKey": "KMSStackName",
      "ParameterValue": "kms-config"
  },
  {
      "ParameterKey": "S3ConfigStackName",
      "ParameterValue": "s3-iam-config"
  },
  {
      "ParameterKey": "ECRName",
      "ParameterValue": "demo-ecr"
  }
]

12. Run the following commands to upload the Dockerfile and component file to S3. Make sure to update the s3 bucket name with the name you generated earlier:

aws s3 cp java/Dockerfile s3://<input_your_bucket_name>/Dockerfile && \
aws s3 cp components/component.yml s3://<input_your_bucket_name>/component.yml

The output should look like the following:

upload: java/Dockerfile to s3://demo12345/Dockerfile
upload: components/component.yml to s3://demo12345/component.yml

13. Deploy the docker-image-builder.yml template:

aws cloudformation create-stack \
--stack-name docker-image-builder-config \
--template-body file://templates/docker-image-builder.yml \
--parameters file://parameters/docker-image-builder-params.json \
--capabilities CAPABILITY_NAMED_IAM \
--region us-east-1

The output should look like the following:

{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/docker-image-builder/24317190-76f4-11eb-b879-0afa5528cb21"
}

Configure the Repository

You use AWS CodeCommit as your source control repository. You now walk through the steps of deploying our CodeCommit repository:

1. On the CodeCommit console, choose Repositories.

2. Locate your repository and under Clone URL, choose HTTPS.

Figure: Shows DemoRepo CodeCommit Repository

You clone this repository in the build directory you created when deploying the CloudFormation templates.

3. In your terminal, past the Git URL from the previous step and clone the repository:

git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/DemoRepo

4. Now let’s create and push your main branch:

cd DemoRepo

git checkout -b main

touch initial.txt

git add . && git commit -m "Initial commit"

git push -u origin main

5. On the Repositories page of the CodeCommit console, choose DemoRepo.

The following screenshot shows that we have created our main branch and pushed our first commit to our repository.

Figure: Shows the DemoRepo main branch

6. Back in your terminal, create a new feature branch:

git checkout -b feature/configure-repo

7. Create the build directories:

mkdir templates; \
mkdir parameters; \
mkdir java; \
mkdir components

You now copy over the configuration files from the cloned GitHub repository to our CodeCommit repository.

8. Run the following command from the awsblogrepo directory you created earlier:

cp -r build-and-deploy-docker-images-to-aws-using-ec2-image-builder/* DemoRepo/

9. Commit and push your changes:

git add . && git commit -m "Copying config files into source control."

git push --set-upstream origin feature/configure-repo

10. On the CodeCommit console, navigate to DemoRepo.

Figure: Shows the DemoRepo CodeCommit Repository

11. In the navigation pane, under Repositories, choose Branches.

Figure: Shows the DemoRepo’s code

12. Select the feature/configure-repo branch.

Figure: Shows the DemoRepo’s branches

13. Choose Create pull request.

Figure: Shows the DemoRepo code

14. For Title, enter Repository Configuration.

15. For Description, enter a brief description.

16. Choose Create pull request.

Figure: Shows a pull request for DemoRepo

17. Choose Merge to merge the pull request.

Figure: Shows merge for DemoRepo pull request

Now that you have all the code copied into your CodeCommit repository, you now build an image using the Image Builder pipeline.

EC2 Image Builder Deep Dive

With Image Builder, you can build and deploy Docker images to your AWS account. Let’s look at how your Image Builder pipeline is configured.

A recipe defines the source image to use as your starting point to create a new image, along with the set of components that you add to customize your image and verify that everything is working as expected. Take note of the ParentImage property. Here, you’re declaring that the parent image that your pipeline pulls from the latest Amazon Linux image. This enables organizations to define images that they have approved to be utilized downstream by development teams. Having better control over what Docker images development teams are using improves an organization security posture while enabling the developers to have the tools they need readily available. The DockerfileTemplateUri property refers to the location of the Dockerfile that your Image Builder pipeline is deploying. Take some time to review the configuration.

Run the Image Builder Pipeline

Now you build a Docker image by running the pipeline.

1. Update your account ID and run the following command:

aws imagebuilder start-image-pipeline-execution \
--image-pipeline-arn arn:aws:imagebuilder:us-east-1:<input_your_aws_account_id>:image-pipeline/docker-image-builder-config-docker-java-container

The output should look like the following:

{
    "requestId": "87931a2e-cd74-44e9-9be1-948fec0776aa",
    "clientToken": "e0f710be-0776-43ea-a6d7-c10137a554bf",
    "imageBuildVersionArn": "arn:aws:imagebuilder:us-east-1:123456789012:image/docker-image-builder-config-container-recipe/1.0.0/1"
}

2. On the Image Builder console, choose the docker-image-builder-config-docker-java-container pipeline.

Figure: Shows EC2 Image Builder Pipeline status

At the bottom of the page, a new Docker image is building.

3. Wait until the image status becomes Available.

Figure: Shows docker image building in EC2 Image Builder console

4. On the Amazon ECR console, open java-demo-ib.

The Docker image has been successfully created, tagged, and deployed to Amazon ECR from the Image Builder pipeline.

Figure: Shows demo-java-ib image in ECR

Test the Docker Image Locally

1. On the Amazon ECR console, open java-demo-ib.

2. Copy the image URI.

ECR Screenshot

3. Run the following commands to authenticate to your ECR repository:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <input_your_account_id>.dkr.ecr.us-east-1.amazonaws.com

4. Run the following command in your terminal, and update the Amazon ECR URI with the content you copied from the previous step:

docker pull <input_ecr_image_uri>

You should see output similar to the following:

1.0.0-80: Pulling from demo-java-ib
596ba82af5aa: Pull complete 
6f476912a053: Pull complete 
3e7162a86ef8: Pull complete 
ec7d8bb8d044: Pull complete 
Digest: sha256:14668cda786aa496f406062ce07087d66a14a7022023091e9b953aae0bdf2634
Status: Downloaded newer image for 123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-java-ib:1.0.0-1
123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-java-ib:1.0.0-1

5. Run the following command from your terminal:

docker image ls

You should see output similar to the following:

REPOSITORY                                                  TAG        IMAGE ID       CREATED          SIZE
123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-java-ib   1.0.0-1   ac75e982863c   34 minutes ago   47.3MB

6. Run the following command from your terminal using the IMAGE ID value from the previous output:

docker run -dp 8090:8090 --name java_hello_world -it <docker_image_id> sh

You should see an output similar to the following:

49ea3a278639252058b55ab80c71245d9f00a3e1933a8249d627ce18c3f59ab1

7. Test your container by running the following command:

curl localhost:8090

You should see an output similar to the following:

Hello World!

8. Now that you have verified that your container is working properly, you can stop your container. Run the following command from your terminal:

docker stop java_hello_world

Conclusion

In this article, we showed how to leverage AWS services to automate the creation, management, and distribution of Docker Images. We walked through how to configure EC2 Image Builder to create and distribute Docker images. Finally, we built a Docker image using our EC2 Image Builder pipeline and tested the image locally. Thank you for reading!

Joe Keating is a Modernization Architect in Professional Services at Amazon Web Services. He works with AWS customers to design and implement a variety of solutions in the AWS Cloud. Joe enjoys cooking with a glass or two of wine and achieving mediocrity on the golf course.

Virginia Chu is a Sr. Cloud Infrastructure Architect in Professional Services at Amazon Web Services. She works with enterprise-scale customers around the globe to design and implement a variety of solutions in the AWS Cloud.

BK works as a Senior Security Architect with AWS Professional Services. He love to solve security problems for his customers, and help them feel comfortable within AWS. Outside of work, BK loves to play computer games, and go on long drives.

Forwarding emails automatically based on content with Amazon Simple Email Service

2021-05-19 Murat Balkan

Post Syndicated from Murat Balkan original https://aws.amazon.com/blogs/messaging-and-targeting/forwarding-emails-automatically-based-on-content-with-amazon-simple-email-service/

Introduction

Email is one of the most popular channels consumers use to interact with support organizations. In its most basic form, consumers will send their email to a catch-all email address where it is further dispatched to the correct support group. Often, this requires a person to inspect content manually. Some IT organizations even have a dedicated support group that handles triaging the incoming emails before assigning them to specialized support teams. Triaging each email can be challenging, and delays in email routing and support processes can reduce customer satisfaction. By utilizing Amazon Simple Email Service’s deep integration with Amazon S3, AWS Lambda, and other AWS services, the task of categorizing and routing emails is automated. This automation results in increased operational efficiencies and reduced costs.

This blog post shows you how a serverless application will receive emails with Amazon SES and deliver them to an Amazon S3 bucket. The application uses Amazon Comprehend to identify the dominant language from the message body. It then looks it up in an Amazon DynamoDB table to find the support group’s email address specializing in the email subject. As the last step, it forwards the email via Amazon SES to its destination. Archiving incoming emails to Amazon S3 also enables further processing or auditing.

Architecture

By completing the steps in this post, you will create a system that uses the architecture illustrated in the following image:

Architecture showing how to forward emails by content using Amazon SES

The flow of events starts when a customer sends an email to the generic support email address like [email protected]. This email is listened to by Amazon SES via a recipient rule. As per the rule, incoming messages are written to a specified Amazon S3 bucket with a given prefix.

This bucket and prefix are configured with S3 Events to trigger a Lambda function on object creation events. The Lambda function reads the email object, parses the contents, and sends them to Amazon Comprehend for language detection.

Amazon DynamoDB looks up the detected language code from an Amazon DynamoDB table, which includes the mappings between language codes and support group email addresses for these languages. One support group could answer English emails, while another support group answers French emails. The Lambda function determines the destination address and re-sends the same email address by performing an email forward operation. Suppose the lookup does not return any destination address, or the language was not be detected. In that case, the email is forwarded to a catch-all email address specified during the application deployment.

In this example, Amazon SES hosts the destination email addresses used for forwarding, but this is not a requirement. External email servers will also receive the forwarded emails.

Prerequisites

To use Amazon SES for receiving email messages, you need to verify a domain that you own. Refer to the documentation to verify your domain with Amazon SES console. If you do not have a domain name, you will register one from Amazon Route 53.

Deploying the Sample Application

Clone this GitHub repository to your local machine and install and configure AWS SAM with a test AWS Identity and Access Management (IAM) user.

You will use AWS SAM to deploy the remaining parts of this serverless architecture.

The AWS SAM template creates the following resources:

An Amazon DynamoDB mapping table (language-lookup) contains information about language codes and associates them with destination email addresses.
An AWS Lambda function (BlogEmailForwarder) that reads the email content parses it, detects the language, looks up the forwarding destination email address, and sends it.
An Amazon S3 bucket, which will store the incoming emails.
IAM roles and policies.

To start the AWS SAM deployment, navigate to the root directory of the repository you downloaded and where the template.yaml AWS SAM template resides. AWS SAM also requires you to specify an Amazon Simple Storage Service (Amazon S3) bucket to hold the deployment artifacts. If you haven’t already created a bucket for this purpose, create one now. You will refer to the documentation to learn how to create an Amazon S3 bucket. The bucket should have read and write access by an AWS Identity and Access Management (IAM) user.

At the command line, enter the following command to package the application:

sam package --template template.yaml --output-template-file output_template.yaml --s3-bucket BUCKET_NAME_HERE

In the preceding command, replace BUCKET_NAME_HERE with the name of the Amazon S3 bucket that should hold the deployment artifacts.

AWS SAM packages the application and copies it into this Amazon S3 bucket.

When the AWS SAM package command finishes running, enter the following command to deploy the package:

sam deploy --template-file output_template.yaml --stack-name blogstack --capabilities CAPABILITY_IAM --parameter-overrides FromEmailAddress=info@ YOUR_DOMAIN_NAME_HERE CatchAllEmailAddress=catchall@ YOUR_DOMAIN_NAME_HERE

In the preceding command, change the YOUR_DOMAIN_NAME_HERE with the domain name you validated with Amazon SES. This domain also applies to other commands and configurations that will be introduced later.

This example uses “blogstack” as the stack name, you will change this to any other name you want. When you run this command, AWS SAM shows the progress of the deployment.

Configure the Sample Application

Now that you have deployed the application, you will configure it.

Configuring Receipt Rules

To deliver incoming messages to Amazon S3 bucket, you need to create a Rule Set and a Receipt rule under it.

Note: This blog uses Amazon SES console to create the rule sets. To create the rule sets with AWS CloudFormation, refer to the documentation.

Navigate to the Amazon SES console. From the left navigation choose Rule Sets.
Choose Create a Receipt Rule button at the right pane.
Add info@YOUR_DOMAIN_NAME_HERE as the first recipient addresses by entering it into the text box and choosing Add Recipient.

Choose the Next Step button to move on to the next step.

On the Actions page, select S3 from the Add action drop-down to reveal S3 action’s details. Select the S3 bucket that was created by the AWS SAM template. It is in the format of your_stack_name-inboxbucket-randomstring. You will find the exact name in the outputs section of the AWS SAM deployment under the key name InboxBucket or by visiting the AWS CloudFormation console. Set the Object key prefix to info/. This tells Amazon SES to add this prefix to all messages destined to this recipient address. This way, you will re-use the same bucket for different recipients.

Choose the Next Step button to move on to the next step.

In the Rule Details page, give this rule a name at the Rule name field. This example uses the name info-recipient-rule. Leave the rest of the fields with their default values.

Choose the Next Step button to move on to the next step.

Review your settings on the Review page and finalize rule creation by choosing Create Rule

In this example, you will be hosting the destination email addresses in Amazon SES rather than forwarding the messages to an external email server. This way, you will be able to see the forwarded messages in your Amazon S3 bucket under different prefixes. To host the destination email addresses, you need to create different rules under the default rule set. Create three additional rules for catchall@YOUR_DOMAIN_NAME_HERE , english@ YOUR_DOMAIN_NAME_HERE and french@YOUR_DOMAIN_NAME_HERE email addresses by repeating the steps 2 to 5. For Amazon S3 prefixes, use catchall/, english/, and french/ respectively.

Configuring Amazon DynamoDB Table

To configure the Amazon DynamoDB table that is used by the sample application

Navigate to Amazon DynamoDB console and reach the tables view. Inspect the table created by the AWS SAM application.

language-lookup table is the table where languages and their support group mappings are kept. You need to create an item for each language, and an item that will hold the default destination email address that will be used in case no language match is found. Amazon Comprehend supports more than 60 different languages. You will visit the documentation for the supported languages and add their language codes to this lookup table to enhance this application.

To start inserting items, choose the language-lookup table to open table overview page.
Select the Items tab and choose the Create item From the dropdown, select Text. Add the following JSON content and choose Save to create your first mapping object. While adding the following object, replace Destination attribute’s value with an email address you own. The email messages will be forwarded to that address.

{

“language”: “en”,

“destination”: “english@YOUR_DOMAIN_NAME_HERE”

}

Lastly, create an item for French language support.

{

“language”: “fr”,

“destination”: “french@YOUR_DOMAIN_NAME_HERE”

}

Testing

Now that the application is deployed and configured, you will test it.

Use your favorite email client to send the following email to the domain name info@ email address.

Subject: I need help

Body:

Hello, I’d like to return the shoes I bought from your online store. How can I do this?

After the email is sent, navigate to the Amazon S3 console to inspect the contents of the Amazon S3 bucket that is backing the Amazon SES Rule Sets. You will also see the AWS Lambda logs from the Amazon CloudWatch console to confirm that the Lambda function is triggered and run successfully. You should receive an email with the same content at the address you defined for the English language.

Next, send another email with the same content, this time in French language.

Subject: j’ai besoin d’aide

Body:

Bonjour, je souhaite retourner les chaussures que j’ai achetées dans votre boutique en ligne. Comment puis-je faire ceci?

Suppose a message is not matched to a language in the lookup table. In that case, the Lambda function will forward it to the catchall email address that you provided during the AWS SAM deployment.

You will inspect the new email objects under english/, french/ and catchall/ prefixes to observe the forwarding behavior.

Continue experimenting with the sample application by sending different email contents to info@ YOUR_DOMAIN_NAME_HERE address or adding other language codes and email address combinations into the mapping table. You will find the available languages and their codes in the documentation. When adding a new language support, don’t forget to associate a new email address and Amazon S3 bucket prefix by defining a new rule.

Cleanup

To clean up the resources you used in your account,

Navigate to the Amazon S3 console and delete the inbox bucket’s contents. You will find the name of this bucket in the outputs section of the AWS SAM deployment under the key name InboxBucket or by visiting the AWS CloudFormation console.
Navigate to AWS CloudFormation console and delete the stack named “blogstack”.
After the stack is deleted, remove the domain from Amazon SES. To do this, navigate to the Amazon SES Console and choose Domains from the left navigation. Select the domain you want to remove and choose Remove button to remove it from Amazon SES.
From the Amazon SES Console, navigate to the Rule Sets from the left navigation. On the Active Rule Set section, choose View Active Rule Set button and delete all the rules you have created, by selecting the rule and choosing Action, Delete.
On the Rule Sets page choose Disable Active Rule Set button to disable listening for incoming email messages.
On the Rule Sets page, Inactive Rule Sets section, delete the only rule set, by selecting the rule set and choosing Action, Delete.
Navigate to CloudWatch console and from the left navigation choose Logs, Log groups. Find the log group that belongs to the BlogEmailForwarderFunction resource and delete it by selecting it and choosing Actions, Delete log group(s).
You will also delete the Amazon S3 bucket you used for packaging and deploying the AWS SAM application.

Conclusion

This solution shows how to use Amazon SES to classify email messages by the dominant content language and forward them to respective support groups. You will use the same techniques to implement similar scenarios. You will forward emails based on custom key entities, like product codes, or you will remove PII information from emails before forwarding with Amazon Comprehend.

With its native integrations with AWS services, Amazon SES allows you to enhance your email applications with different AWS Cloud capabilities easily.

To learn more about email forwarding with Amazon SES, you will visit documentation and AWS blogs.

Create a serverless feedback collector application using Amazon Pinpoint’s two-way SMS functionality

2021-04-20 Murat Balkan

Post Syndicated from Murat Balkan original https://aws.amazon.com/blogs/messaging-and-targeting/create-a-serverless-feedback-collector-application-by-using-amazon-pinpoints-two-way-sms-functionality/

Introduction

Two-way SMS communication is used by many companies to create interactive engagements with their customers. Traditional SMS notifications are one-way. While this is valid for many different use cases like one-time passwords (OTP) notifications and security notifications or reminders, some other use-cases may benefit from collecting information from the same channel. Two-way SMS allows customers to create this feedback mechanism and enhance business interactions and overall customer experience.

SMS is chosen for its simplicity and availability across different sets of devices. By combining the two-way SMS mechanism with the vast breadth of services Amazon Web Services (AWS) offers, companies can create effective architectures to better interact and serve their customers.

This blog post shows you how a serverless online appointment application can use Amazon Pinpoint’s two-way SMS functionality to collect customer feedback for completed appointments. You will learn how Amazon Pinpoint interacts with other AWS serverless services with its out-of-the-box integrations to create a scalable messaging application.

Architecture

By completing the steps in this post, you can create a system that uses the architecture illustrated in the following image:

The architecture of a feedback collector application that is composed of serverless AWS services

The flow of events starts when a Amazon DynamoDB table item, representing an online appointment, changes its status to COMPLETED. An AWS Lambda function which is subscribed to these changes over DynamoDB Streams detects this change and sends an SMS to the customer by using Amazon Pinpoint API’s sendMessages operation.

Amazon Pinpoint delivers the SMS to the recipient and generates a unique message ID to the AWS Lambda function. The Lambda function then adds this message ID to a DynamoDB table called “message-lookup”. This table is used for tracking different feedback requests sent during a multi-step conversation and associate them with the appointment ids. At this stage, the Lambda function also populates another table “feedbacks” which will hold the feedback responses that will be sent as SMS reply messages.

Each time a recipient replies to an SMS, Amazon Pinpoint publishes this reply event to an Amazon SNS topic which is subscribed by an Amazon SQS queue. Amazon Pinpoint will also add a messageId to this event which allows you to bind it to a sendMessages operation call.

A second AWS Lambda function polls these reply events from the Amazon SQS queue. It checks whether the reply is in the correct format (i.e. a number) and also associated with a previous request. If all conditions are met, the AWS Lambda function checks the ConversationStage attribute’s value from its message-lookup table. According to the current stage and the SMS answer received, AWS Lambda function will determine the next step.

For example, if the feedback score received is less than 5, a follow-up SMS is sent to the user asking if they’ll be happy to receive a call from the customer support team.

All SMS replies from the users are reflected to “feedbacks” table for further analysis.

Deploying the Sample Application

Clone this GitHub repository to your local machine and install and configure AWS SAM with a test AWS IAM user.

You will use AWS SAM to deploy the remaining parts of this serverless architecture.

The AWS SAM template creates the following resources:

- An Amazon DynamoDB table (appointments) that contains information about appointments, customers and their appointment status.
- An Amazon DynamoDB table (feedbacks) that holds the received feedbacks from customers.
- An Amazon DynamoDB table (message-lookup) that holds the Amazon Pinpoint message ids and associate them to appointments to track a multi-step conversation.
- Two AWS Lambda functions (FeedbackSender and FeedbackReceiver)
- An Amazon SNS topic that collects state change events from Amazon Pinpoint.
- An Amazon SQS queue that queues the incoming messages.
- An Amazon Pinpoint Application with an associated SMS channel.

This architecture consists of two Lambda functions, which are represented as two different apps in the AWS SAM template. These functions are named FeedbackSender and FeedbackReceiver. The FeedbackSender function listens the Amazon DynamoDB Stream associated with the appointments table and sends the SMS message requesting a feedback. Second Lambda function, FeedbackReceiver, polls the Amazon SQS queue and updates the feedbacks table in Amazon DynamoDB. (pinpoint-two-way-sms)

Note: You’ll incur some costs by deploying this stack into your account.

To start the SAM deployment, navigate to the root directory of the repository you downloaded and where the template.yaml AWS SAM template resides. AWS SAM also requires you to specify an Amazon Simple Storage Service (Amazon S3) bucket to hold the deployment artifacts. If you haven’t already created a bucket for this purpose, create one now. The bucket should have read and write access by an AWS Identity and Access Management (IAM) user.

At the command line, enter the following command to package the application:

sam package --template template.yaml --output-template-file output_template.yaml --s3-bucket BUCKET_NAME_HERE

In the preceding command, replace BUCKET_NAME_HERE with the name of the Amazon S3 bucket that should hold the deployment artifacts.

AWS SAM packages the application and copies it into this Amazon S3 bucket.

When the AWS SAM package command finishes running, enter the following command to deploy the package:

sam deploy --template-file output_template.yaml --stack-name BlogStackPinpoint --capabilities CAPABILITY_IAM

When you run this command, AWS SAM shows the progress of the deployment. When the deployment finishes, navigate to the Amazon Pinpoint console and choose the project named “BlogApplication”. This example uses “BlogStackPinpoint” as the stack name, you can change this to any other name you want.

From the left navigation, choose Settings, SMS and voice. On the SMS and voice settings page, choose the Request phone number button under Number settings

Screenshot of request phone number screen

Choose a target country. Set the Default message type as Transactional, and click on the Request long codes button to buy a long code.

Note: In United States, you can also request a Toll Free Number(TFN)

Screenshot showing long code additio

A long code will be added to the Number settings list.

Choose the newly added number to reach the SMS Settings page and enable the option Enable two-way-SMS. At the Incoming messages destination, select Choose an existing SNS topic, and from the drop down select the Amazon SNS topic that was created by the BlogStackPinpoint stack.

Choose Save to save your SMS settings.

Testing the Sample Application

Now that the application is deployed and configured, test it by creating sample records in the Amazon DynamoDB table. Navigate to Amazon DynamoDB console and reach the tables view. Inspect the tables that were created by the AWS SAM application.

Here, appointments table is the table where the appointments and their statuses are kept. It tracks the appointment lifecycle events with items identified by unique ids. In this sample scenario, we are assuming that an appointment application creates a record with ‘CREATED’ status when a new appointment is planned. After the appointment is finished, same application updates the status to ‘COMPLETED’ which will trigger the feedback collection process. Feedback results are collected in the feedbacks table. Amazon Pinpoint message id’s, conversation stage and appointment id’s are kept in the message-lookup table.

To start testing the end-to-end flow, choose the appointments table to open table overview page.
Next, select the Items tab and choose the Create item From the dropdown, select Text. Add the following and choose Save to create your first appointment object. While adding the following object, replace CustomerPhone attribute’s value with a phone number you own. The feedback request messages will be delivered to that number. Note: This number should match the country number for the long code you provisioned.

{

"CustomerName": "Customer A",

"CustomerPhone": "+12345678900",

"AppointmentStatus":"CREATED",

"id": "1"

}

To trigger sending the feedback SMS, you need to set an existing item’s status to “COMPLETED” To do this, select the item and click Edit from the Actions menu.

Replace the item’s current JSON with the following.

{

"AppointmentStatus": "COMPLETED",

"CustomerName": "Customer A",

"CustomerPhone": "+12345678900",

"id": "1"

}

Before choosing the Save button, double check that you have set CustomerPhone attribute’s value to a valid phone number.

After the change, you should receive an SMS message asking for a feedback. Provide a numeric reply of that is less than five to this message. This will trigger a follow up question asking for a consent to receive an in-person callback.

During your SMS conversation with the application, inspect the feedbacks table. The feedback you have given over this two-way SMS channel should have been reflected into the table.

If you want to repeat the process, make sure to increment the AppointmentId field for any additional appointment records.

Cleanup

To clean up the resources you used in your account, simply navigate to AWS Cloudformation console and delete the stack named “BlogStackPinpoint”.

After the stack is deleted, you also need to delete the Long code from the Pinpoint Console by choosing the number and pressing Remove phone number button. You can also delete the Amazon S3 bucket you used for packaging and deploying the AWS SAM application.

Conclusion

This architecture shows how Amazon Pinpoint can be used to make two-way SMS communication with your customers. You can implement Two-way SMS functionality in other use cases such as appointment reminders, polls, Q&A services, and more.

To learn more about Pinpoint and it’s two-way SMS mechanism, you can visit the Pinpoint documentation.

Supporting AWS Graviton2 and x86 instance types in the same Auto Scaling group

2021-03-04 Emma White

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/supporting-aws-graviton2-and-x86-instance-types-in-the-same-auto-scaling-group/

This post is written by Tyler Lynch, Sr. Solutions Architect – EdTech, and Praneeth Tekula, Technical Account Manager.

As customers seek performance improvements and to cost optimize their workloads, they are evaluating and adopting AWS Graviton2 based instances. This post provides instructions on how to configure your Amazon EC2 Auto Scaling group (ASG) to use both Graviton2 and x86 based Amazon EC2 Instances in the same Auto Scaling group with different AMIs. This allows you to introduce Graviton2 based instances as part of a multiple instance type strategy.

For example, a customer may want to use the same Auto Scaling group definition across multiple Regions, but an instance type might not available in that region yet. Implementing instance and architecture diversity allow those Auto Scaling group definitions to be portable.

Solution Overview

The Amazon EC2 Auto Scaling console currently doesn’t support the selection of multiple launch templates, so I use the AWS Command Line Interface (AWS CLI) throughout this post. First, you create your launch templates that specify AMIs for use on x86 and arm64 based instances. Then you create your Auto Scaling group using a mixed instance policy with instance level overrides to specify the launch template to use for that instance.

Finally, you extend the launch templates to use architecture-specific EC2 user data to download architecture-specific binaries. Putting it all together, here are the high-level steps to follow:

Create the launch templates:
1. Launch template for x86– Creates a launch template for x86 instances, specifying the AMI but not the instance sizes.
2. Launch template for arm64– Creates a launch template for arm64 instances, specifying the AMI but not the instance sizes.
Create the Auto Scaling group that references the launch templates in a mixed instance policy override.
Create a sample Node.js application.
Create the architecture-specific user data scripts.
Modify the launch templates to use architecture-specific user data scripts.

Prerequisites

The prerequisites for this solution are as follows:

The AWS CLI installed locally. I use AWS CLI version 2 for this post.
- For AWS CLI v2, you must use 2.1.3+
- For AWS CLI v1, you must use 1.18.182+
The correct AWS Identity and Access Management(IAM) role permissions for your account allowing for the creation and execution of the launch templates, Auto Scaling groups, and launching EC2 instances.
A source control service such as AWS CodeCommit or GitHub that your user data script can interact with to git clone the Hello World Node.js application.
The source code repository initialized and cloned locally.

Create the Launch Templates

You start with creating the launch template for x86 instances, and then the launch template for arm64 instances. These are simple launch templates where you only specify the AMI for Amazon Linux 2 in US-EAST-1 (architecture dependent). You use the AWS CLI cli-input-json feature to make things more readable and repeatable.

You first must add the lt-x86-cli-input.json file to your local working for reference by the AWS CLI.

In your preferred text editor, add a new file, and copy paste the following JSON into the file.


{
    "LaunchTemplateName": "lt-x86",
    "VersionDescription": "LaunchTemplate for x86 instance types using Amazon Linux 2 x86 AMI in US-EAST-1",
    "LaunchTemplateData": {
        "ImageId": "ami-04bf6dcdc9ab498ca"
    }
}

Save the file in your local working directory and name it lt-x86-cli-input.json.

Now, add the lt-arm64-cli-input.json file into your local working directory.

In a text editor, add a new file, and copy paste the following JSON into the file.


{
    "LaunchTemplateName": "lt-arm64",
    "VersionDescription": "LaunchTemplate for Graviton2 instance types using Amazon Linux 2 Arm64 AMI in US-EAST-1",
    "LaunchTemplateData": {
        "ImageId": "ami-09e7aedfda734b173"
    }
}

Save the file in your local working directory and name it lt-arm64-cli-input.json.

Now that your CLI input files are ready, create your launch templates using the CLI.

From your terminal, run the following commands:


aws ec2 create-launch-template \
            --cli-input-json file://./lt-x86-cli-input.json \
            --region us-east-1

aws ec2 create-launch-template \
            --cli-input-json file://./lt-arm64-cli-input.json \
            --region us-east-1

After you run each command, you should see the command output similar to this:


{
	"LaunchTemplate": {
		"LaunchTemplateId": "lt-07ab8c76f8e021b0c",
		"LaunchTemplateName": "lt-x86",
		"CreateTime": "2020-11-20T16:08:08+00:00",
		"CreatedBy": "arn:aws:sts::111111111111:assumed-role/Admin/myusername",
		"DefaultVersionNumber": 1,
		"LatestVersionNumber": 1
	}
}

{
	"LaunchTemplate": {
		"LaunchTemplateId": "lt-0c65656a2c75c0f76",
		"LaunchTemplateName": "lt-arm64",
		"CreateTime": "2020-11-20T16:08:37+00:00",
		"CreatedBy": "arn:aws:sts::111111111111:assumed-role/Admin/myusername",
		"DefaultVersionNumber": 1,
		"LatestVersionNumber": 1
	}
}

Create the Auto Scaling Group

Moving on to creating your Auto Scaling group, start with creating another JSON file to use the cli-input-json feature. Then, create the Auto Scaling group via the CLI.

I want to call special attention to the LaunchTemplateSpecification under the MixedInstancePolicy Overrides property. This Auto Scaling group is being created with a default launch template, the one you created for arm64 based instances. You override that at the instance level for x86 instances.

Now, add the asg-mixed-arch-cli-input.json file into your local working directory.

In a text editor, add a new file, and copy paste the following JSON into the file.
You need to change the subnet IDs specified in the VPCZoneIdentifier to your own subnet IDs.


{
    "AutoScalingGroupName": "asg-mixed-arch",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "lt-arm64",
                "Version": "$Default"
            },
            "Overrides": [
                {
                    "InstanceType": "t4g.micro"
                },
                {
                    "InstanceType": "t3.micro",
                    "LaunchTemplateSpecification": {
                        "LaunchTemplateName": "lt-x86",
                        "Version": "$Default"
                    }
                },
                {
                    "InstanceType": "t3a.micro",
                    "LaunchTemplateSpecification": {
                        "LaunchTemplateName": "lt-x86",
                        "Version": "$Default"
                    }
                }
            ]
        }
    },    
    "MinSize": 1,
    "MaxSize": 5,
    "DesiredCapacity": 3,
    "VPCZoneIdentifier": "subnet-e92485b6, subnet-07fe637b44fd23c31, subnet-828622e4, subnet-9bd6a2d6"
}

Save the file in your local working directory and name it asg-mixed-arch-cli-input.json.

Now that your CLI input file is ready, create your Auto Scaling group using the CLI.

From your terminal, run the following command:


aws autoscaling create-auto-scaling-group \
            --cli-input-json file://./asg-mixed-arch-cli-input.json \
            --region us-east-1

After you run the command, there isn’t any immediate output. Describe the Auto Scaling group to review the configuration.

From your terminal, run the following command:


aws autoscaling describe-auto-scaling-groups \
            --auto-scaling-group-names asg-mixed-arch \
            --region us-east-1

Let’s evaluate the output. I removed some of the output for brevity. It shows that you have an Auto Scaling group with a mixed instance policy, which specifies a default launch template named lt-arm64. In the Overrides property, you can see the instances types that you specified and the values that define the lt-x86 launch template to be used for specific instance types (t3.micro, t3a.micro).


{
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "asg-mixed-arch",
            "AutoScalingGroupARN": "arn:aws:autoscaling:us-east-1:111111111111:autoScalingGroup:a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1:autoScalingGroupName/asg-mixed-arch",
            "MixedInstancesPolicy": {
                "LaunchTemplate": {
                    "LaunchTemplateSpecification": {
                        "LaunchTemplateId": "lt-0cc7dae79a397d663",
                        "LaunchTemplateName": "lt-arm64",
                        "Version": "$Default"
                    },
                    "Overrides": [
                        {
                            "InstanceType": "t4g.micro"
                        },
                        {
                            "InstanceType": "t3.micro",
                            "LaunchTemplateSpecification": {
                                "LaunchTemplateId": "lt-04b525bfbde0dcebb",
                                "LaunchTemplateName": "lt-x86",
                                "Version": "$Default"
                            }
                        },
                        {
                            "InstanceType": "t3a.micro",
                            "LaunchTemplateSpecification": {
                                "LaunchTemplateId": "lt-04b525bfbde0dcebb",
                                "LaunchTemplateName": "lt-x86",
                                "Version": "$Default"
                            }
                        }
                    ]
                },
                ...
            },
            ...
            "Instances": [
                {
                    "InstanceId": "i-00377a23630a5e107",
                    "InstanceType": "t4g.micro",
                    "AvailabilityZone": "us-east-1b",
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0cc7dae79a397d663",
                        "LaunchTemplateName": "lt-arm64",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                },
                {
                    "InstanceId": "i-07c2d4f875f1f457e",
                    "InstanceType": "t4g.micro",
                    "AvailabilityZone": "us-east-1a",
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0cc7dae79a397d663",
                        "LaunchTemplateName": "lt-arm64",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                },
                {
                    "InstanceId": "i-09e61e95cdf705ade",
                    "InstanceType": "t4g.micro",
                    "AvailabilityZone": "us-east-1c",
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0cc7dae79a397d663",
                        "LaunchTemplateName": "lt-arm64",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                }
            ],
            ...
        }
    ]
}

Create Hello World Node.js App

Now that you have created the launch templates and the Auto Scaling group you are ready to create the “hello world” application that self-reports the processor architecture. You work in the local directory that is cloned from your source repository as specified in the prerequisites. This doesn’t have to be the local working directory where you are creating architecture-specific files.

In a text editor, add a new file with the following Node.js code:


// Hello World sample app.
const http = require('http');

const port = 3000;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end(`Hello World. This processor architecture is ${process.arch}`);
});

server.listen(port, () => {
  console.log(`Server running on processor architecture ${process.arch}`);
});

Save the file in the root of your source repository and name it app.js.
Commit the changes to Git and push the changes to your source repository. See the following commands:


git add .
git commit -m "Adding Node.js sample application."
git push

Create user data scripts

Moving on to your creating architecture-specific user data scripts that will define the version of Node.js and the distribution that matches the processor architecture. It will download and extract the binary and add the binary path to the environment PATH. Then it will clone the Hello World app, and then run that app with the binary of Node.js that was installed.

Now, you must add the ud-x86-cli-input.txt file to your local working directory.

In your text editor, add a new file, and copy paste the following text into the file.
Update the git clone command to use the repo URL where you created the Hello World app previously.
Update the cd command to use the repo name.


sudo yum update -y
sudo yum install git -y
VERSION=v14.15.3
DISTRO=linux-x64
wget https://nodejs.org/dist/$VERSION/node-$VERSION-$DISTRO.tar.xz
sudo mkdir -p /usr/local/lib/nodejs
sudo tar -xJvf node-$VERSION-$DISTRO.tar.xz -C /usr/local/lib/nodejs 
export PATH=/usr/local/lib/nodejs/node-$VERSION-$DISTRO/bin:$PATH
git clone https://github.com/<<githubuser>>/<<repo>>.git
cd <<repo>>
node app.js

Save the file in your local working directory and name it ud-x86-cli-input.txt.

Now, add the ud-arm64-cli-input.txt file into your local working directory.

In a text editor, add a new file, and copy paste the following text into the file.
Update the git clone command to use the repo URL where you created the Hello World app previously.
Update the cd command to use the repo name.


sudo yum update -y
sudo yum install git -y
VERSION=v14.15.3
DISTRO=linux-arm64
wget https://nodejs.org/dist/$VERSION/node-$VERSION-$DISTRO.tar.xz
sudo mkdir -p /usr/local/lib/nodejs
sudo tar -xJvf node-$VERSION-$DISTRO.tar.xz -C /usr/local/lib/nodejs 
export PATH=/usr/local/lib/nodejs/node-$VERSION-$DISTRO/bin:$PATH
git clone https://github.com/<<githubuser>>/<<repo>>.git
cd <<repo>>
node app.js

Save the file in your local working directory and name it ud-arm64-cli-input.txt.

Now that your user data scripts are ready, you need to base64 encode them as the AWS CLI does not perform base64-encoding of the user data for you.

On a Linux computer, from your terminal use the base64 command to encode the user data scripts.


base64 ud-x86-cli-input.txt > ud-x86-cli-input-base64.txt
base64 ud-arm64-cli-input.txt > ud-arm64-cli-input-base64.txt

On a Windows computer, from your command line use the certutil command to encode the user data. Before you can use this file with the AWS CLI, you must remove the first (BEGIN CERTIFICATE) and last (END CERTIFICATE) lines.


certutil -encode ud-x86-cli-input.txt ud-x86-cli-input-base64.txt
certutil -encode ud-arm64-cli-input.txt ud-arm64-cli-input-base64.txt
notepad ud-x86-cli-input-base64.txt
notepad ud-arm64-cli-input-base64.txt

Modify the Launch Templates

Now, you modify the launch templates to use architecture-specific user data scripts.

Please note that the contents of your ud-x86-cli-input-base64.txt and ud-arm64-cli-input-base64.txt files are different from the samples here because you referenced your own GitHub repository. These base64 encoded user data scripts below will not work as is, they contain placeholder references for the git clone and cd commands.

Next, update the lt-x86-cli-input.json file to include your base64 encoded user data script for x86 based instances.

In your preferred text editor, open the ud-x86-cli-input-base64.txt file.
Open the lt-x86-cli-input.json file, and add in the text from the ud-x86-cli-input-base64.txt file into the UserData property of the LaunchTemplateData object. It should look similar to this:


{
    "LaunchTemplateName": "lt-x86",
    "VersionDescription": "LaunchTemplate for x86 instance types using Amazon Linux 2 x86 AMI in US-EAST-1",
    "LaunchTemplateData": {
        "ImageId": "ami-04bf6dcdc9ab498ca",
        "UserData": "IyEvYmluL2Jhc2gKeXVtIHVwZGF0ZSAteQoKVkVSU0lPTj12MTQuMTUuMwpESVNUUk89bGludXgteDY0CndnZXQgaHR0cHM6Ly9ub2RlanMub3JnL2Rpc3QvJFZFUlNJT04vbm9kZS0kVkVSU0lPTi0kRElTVFJPLnRhci54egpzdWRvIG1rZGlyIC1wIC91c3IvbG9jYWwvbGliL25vZGVqcwpzdWRvIHRhciAteEp2ZiBub2RlLSRWRVJTSU9OLSRESVNUUk8udGFyLnh6IC1DIC91c3IvbG9jYWwvbGliL25vZGVqcyAKZXhwb3J0IFBBVEg9L3Vzci9sb2NhbC9saWIvbm9kZWpzL25vZGUtJFZFUlNJT04tJERJU1RSTy9iaW46JFBBVEgKZ2l0IGNsb25lIGh0dHBzOi8vZ2l0aHViLmNvbS88PGdpdGh1YnVzZXI+Pi88PHJlcG8+Pi5naXQKY2QgPDxyZXBvPj4Kbm9kZSBhcHAuanMK"
    }
}

Save the file.

Next, update the lt-arm64-cli-input.json file to include your base64 encoded user data script for arm64 based instances.

In your text editor, open the ud-arm64-cli-input-base64.txt file.
Open the lt-arm64-cli-input.json file, and add in the text from the ud-arm64-cli-input-base64.txt file into the UserData property of the LaunchTemplateData It should look similar to this:


{
    "LaunchTemplateName": "lt-arm64",
    "VersionDescription": "LaunchTemplate for Graviton2 instance types using Amazon Linux 2 Arm64 AMI in US-EAST-1",
    "LaunchTemplateData": {
        "ImageId": "ami-09e7aedfda734b173",
        "UserData": "IyEvYmluL2Jhc2gKeXVtIHVwZGF0ZSAteQoKVkVSU0lPTj12MTQuMTUuMwpESVNUUk89bGludXgtYXJtNjQKd2dldCBodHRwczovL25vZGVqcy5vcmcvZGlzdC8kVkVSU0lPTi9ub2RlLSRWRVJTSU9OLSRESVNUUk8udGFyLnh6CnN1ZG8gbWtkaXIgLXAgL3Vzci9sb2NhbC9saWIvbm9kZWpzCnN1ZG8gdGFyIC14SnZmIG5vZGUtJFZFUlNJT04tJERJU1RSTy50YXIueHogLUMgL3Vzci9sb2NhbC9saWIvbm9kZWpzIApleHBvcnQgUEFUSD0vdXNyL2xvY2FsL2xpYi9ub2RlanMvbm9kZS0kVkVSU0lPTi0kRElTVFJPL2JpbjokUEFUSApnaXQgY2xvbmUgaHR0cHM6Ly9naXRodWIuY29tLzw8Z2l0aHVidXNlcj4+Lzw8cmVwbz4+LmdpdApjZCA8PHJlcG8+Pgpub2RlIGFwcC5qcwoKCg=="
    }
}

Save the file.

Now, your CLI input files are ready. Next, create a new version of your launch templates and then set the newest version as the default.

From your terminal, run the following commands:


aws ec2 create-launch-template-version \
            --cli-input-json file://./lt-x86-cli-input.json \
            --region us-east-1

aws ec2 create-launch-template-version \
            --cli-input-json file://./lt-arm64-cli-input.json \
            --region us-east-1

aws ec2 modify-launch-template \
            --launch-template-name lt-x86 \
            --default-version 2
			
aws ec2 modify-launch-template \
            --launch-template-name lt-arm64 \
            --default-version 2

After you run each command, you should see the command output similar to this:


{
    "LaunchTemplate": {
        "LaunchTemplateId": "lt-08ff3d03d4cf0038d",
        "LaunchTemplateName": "lt-x86",
        "CreateTime": "1970-01-01T00:00:00+00:00",
        "CreatedBy": "arn:aws:sts::111111111111:assumed-role/Admin/myusername",
        "DefaultVersionNumber": 2,
        "LatestVersionNumber": 2
    }
}

{
    "LaunchTemplate": {
        "LaunchTemplateId": "lt-0c5e1eb862a02f8e0",
        "LaunchTemplateName": "lt-arm64",
        "CreateTime": "1970-01-01T00:00:00+00:00",
        "CreatedBy": "arn:aws:sts::111111111111:assumed-role/Admin/myusername",
        "DefaultVersionNumber": 2,
        "LatestVersionNumber": 2
    }
}

Now, refresh the instances in the Auto Scaling group so that the newest version of the launch template is used.

From your terminal, run the following command:


aws autoscaling start-instance-refresh \
            --auto-scaling-group-name asg-mixed-arch

Verify Instances

The sample Node.js application self reports the process architecture in two ways: when the application is started, and when the application receives a HTTP request on port 3000. Retrieve the last five lines of the instance console output via the AWS CLI.

First, you need to get an instance ID from the autoscaling group.

From your terminal, run the following commands:


aws autoscaling describe-auto-scaling-groups \
            --auto-scaling-group-name asg-mixed-arch \
            --region us-east-1

Evaluate the output. I removed some of the output for brevity. You need to use the InstanceID from the output.


{
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "asg-mixed-arch",
            "AutoScalingGroupARN": "arn:aws:autoscaling:us-east-1:111111111111:autoScalingGroup:a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1:autoScalingGroupName/asg-mixed-arch",
            "MixedInstancesPolicy": {
                ...
            },
            ...
            "Instances": [
                {
                    "InstanceId": "i-0eeadb140405cc09b",
                    "InstanceType": "t4g.micro",
                    "AvailabilityZone": "us-east-1a",
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0c5e1eb862a02f8e0",
                        "LaunchTemplateName": "lt-arm64",
                        "Version": "2"
                    },
                    "ProtectedFromScaleIn": false
                }
            ],
          ....
        }
    ]
}

Now, retrieve the last five lines of console output from the instance.

From your terminal, run the following command:


aws ec2 get-console-output –instance-id d i-0eeadb140405cc09b \
            --output text | tail -n 5

Evaluate the output, you should see Server running on processor architecture arm64. This confirms that you have successfully utilized an architecture-specific user data script.


[  58.798184] cloud-init[1257]: node-v14.15.3-linux-arm64/share/systemtap/tapset/node.stp
[  58.798293] cloud-init[1257]: node-v14.15.3-linux-arm64/LICENSE
[  58.798402] cloud-init[1257]: Cloning into 'node-helloworld'...
[  58.798510] cloud-init[1257]: Server running on processor architecture arm64
2021-01-14T21:14:32+00:00

Cleaning Up

Delete the Auto Scaling group and use the force-delete option. The force-delete option specifies that the group is to be deleted along with all instances associated with the group, without waiting for all instances to be terminated.


aws autoscaling delete-auto-scaling-group \
            --auto-scaling-group-name asg-mixed-arch --force-delete \
            --region us-east-1

Now, delete your launch templates.


aws ec2 delete-launch-template --launch-template-name lt-x86
aws ec2 delete-launch-template --launch-template-name lt-arm64

Conclusion

You walked through creating and using architecture-specific user data scripts that were processor architecture-specific. This same method could be applied to fleets where you have different configurations needed for different instance types. Variability such as disk sizes, networking configurations, placement groups, and tagging can now be accomplished in the same Auto Scaling group.

Field Notes: Accelerate Research with Managed Jupyter on Amazon SageMaker

2021-03-02 Mrudhula Balasubramanyan

Post Syndicated from Mrudhula Balasubramanyan original https://aws.amazon.com/blogs/architecture/field-notes-accelerate-research-with-managed-jupyter-on-amazon-sagemaker/

Research organizations across industry verticals have unique needs. These include facilitating stakeholder collaboration, setting up compute environments for experimentation, handling large datasets, and more. In essence, researchers want the freedom to focus on their research, without the undifferentiated heavy-lifting of managing their environments.

In this blog, I show you how to set up a managed Jupyter environment using custom tools used in Life Sciences research. I show you how to transform the developed artifacts into scripted components that can be integrated into research workflows. Although this solution uses Life Sciences as an example, it is broadly applicable to any vertical that needs customizable managed environments at scale.

Overview of solution

This solution has two parts. First, the System administrator of an organization’s IT department sets up a managed environment and provides researchers access to it. Second, the researchers access the environment and conduct interactive and scripted analysis.

This solution uses AWS Single Sign-On (AWS SSO), Amazon SageMaker, Amazon ECR, and Amazon S3. These services are architected to build a custom environment, provision compute, conduct interactive analysis, and automate the launch of scripts.

Walkthrough

The architecture and detailed walkthrough are presented from both an admin and researcher perspective.

Architecture from an admin perspective

Architecture from admin perspective

In order of tasks, the admin:

authenticates into AWS account as an AWS Identity and Access Management (IAM) user with admin privileges
sets up AWS SSO and users who need access to Amazon SageMaker Studio
creates a Studio domain
assigns users and groups created in AWS SSO to the Studio domain
creates a SageMaker notebook instance shown generically in the architecture as Amazon EC2
launches a shell script provided later in this post to build and store custom Docker image in a private repository in Amazon ECR
attaches the custom image to Studio domain that the researchers will later use as a custom Jupyter kernel inside Studio and as a container for the SageMaker processing job.

Architecture from a researcher perspective

Architecture from a researcher perspective

In order of tasks, the researcher:

authenticates using AWS SSO
SSO authenticates researcher to SageMaker Studio
researcher performs interactive analysis using managed Jupyter notebooks with custom kernel, organizes the analysis into script(s), and launches a SageMaker processing job to execute the script in a managed environment
the SageMaker processing job reads data from S3 bucket and writes data back to S3. The user can now retrieve and examine results from S3 using Jupyter notebook.

Prerequisites

For this walkthrough, you should have:

An AWS account
Admin access to provision and delete AWS resources
Researchers’ information to add as SSO users: full name and email

Set up AWS SSO

To facilitate collaboration between researchers, internal and external to your organization, the admin uses AWS SSO to onboard to Studio.

For admins: follow these instructions to set up AWS SSO prior to creating the Studio domain.

Onboard to SageMaker Studio

Researchers can use just the functionality they need in Amazon SageMaker Studio. Studio provides managed Jupyter environments with sharable notebooks for interactive analysis, and managed environments for script execution.

When you onboard to Studio, a home directory is created for you on Amazon Elastic File System (Amazon EFS) which provides reliable, scalable storage for large datasets.

Once AWS SSO has been setup, follow these steps to onboard to Studio via SSO. Note the Studio domain id (ex. d-2hxa6eb47hdc) and the IAM execution role (ex. AmazonSageMaker-ExecutionRole-20201156T214222) in the Studio Summary section of Studio. You will be using these in the following sections.

Provision custom image

At the core of research is experimentation. This often requires setting up playgrounds with custom tools to test out ideas. Docker images are an effective[CE1] [BM2] way to package those tools and dependencies and deploy them quickly. They also address another critical need for researchers – reproducibility.

To demonstrate this, I picked a Life Sciences research problem that requires custom Python packages to be installed and made available to a team of researchers as Jupyter kernels inside Studio.

For the custom Docker image, I picked a Python package called Pegasus. This is a tool used in genomics research for analyzing transcriptomes of millions of single cells, both interactively as well as in cloud-based analysis workflows.

In addition to Python, you can provision Jupyter kernels for languages such as R, Scala, Julia, in Studio using these Docker images.

Launch an Amazon SageMaker notebook instance

To build and push custom Docker images to ECR, you use an Amazon SageMaker notebook instance. Note that this is not part of SageMaker Studio and unrelated to Studio notebooks. It is a fully managed machine learning (ML) Amazon EC2 instance inside the SageMaker service that runs the Jupyter Notebook application, AWS CLI, and Docker.

Use these instructions to launch a SageMaker notebook instance.
Once the notebook instance is up and running, select the instance and navigate to the IAM role attached to it. This role comes with IAM policy ‘AmazonSageMakerFullAccess’ as a default. Your instance will need some additional permissions.
Create a new IAM policy using these instructions.
Copy the IAM policy below to paste into the JSON tab.
Fill in the values for <region-id> (ex. us-west-2), <AWS-account-id>, <studio-domain-id>, <studio-domain-iam-role>. Name the IAM policy ‘sagemaker-notebook-policy’ and attach it to the notebook instance role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "additionalpermissions",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "sagemaker:UpdateDomain"
            ],
            "Resource": [
                "arn:aws:sagemaker:<region-id>:<AWS-account-id>:domain/<studio-domain-id>",
                "arn:aws:iam::<AWS-account-id>:role/<studio-domain-iam-role>"
            ]
        }
    ]
}

Start a terminal session in the notebook instance.
Once you are done creating the Docker image and attaching to Studio in the next section, you will be shutting down the notebook instance.

Create private repository, build, and store custom image, attach to SageMaker Studio domain

This section has multiple steps, all of which are outlined in a single bash script.

First the script creates a private repository in Amazon ECR.
Next, the script builds a custom image, tags, and pushes to Amazon ECR repository. This custom image will serve two purposes: one as a custom Python Jupyter kernel used inside Studio, and two as a custom container for SageMaker processing.
To use as a custom kernel inside SageMaker Studio, the script creates a SageMaker image and attaches to the Studio domain.
Before you initiate the script, fill in the following information: your AWS account ID, Region (ex. us-east-1), Studio IAM execution role, and Studio domain id.
You must create four files: bash script, Dockerfile, and two configuration files.
Copy the following bash script to a file named ‘pegasus-docker-images.sh’ and fill in the required values.

#!/bin/bash

# Pegasus python packages from Docker hub

accountid=<fill-in-account-id>

region=<fill-in-region>

executionrole=<fill-in-execution-role ex. AmazonSageMaker-ExecutionRole-xxxxx>

domainid=<fill-in-Studio-domain-id ex. d-xxxxxxx>

if aws ecr describe-repositories | grep 'sagemaker-custom'
then
    echo 'repo already exists! Skipping creation'
else
    aws ecr create-repository --repository-name sagemaker-custom
fi

aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $accountid.dkr.ecr.$region.amazonaws.com

docker build -t sagemaker-custom:pegasus-1.0 .

docker tag sagemaker-custom:pegasus-1.0 $accountid.dkr.ecr.$region.amazonaws.com/sagemaker-custom:pegasus-1.0

docker push $accountid.dkr.ecr.$region.amazonaws.com/sagemaker-custom:pegasus-1.0

if aws sagemaker list-images | grep 'pegasus-1'
then
    echo 'Image already exists! Skipping creation'
else
    aws sagemaker create-image --image-name pegasus-1 --role-arn arn:aws:iam::$accountid:role/service-role/$executionrole
    aws sagemaker create-image-version --image-name pegasus-1 --base-image $accountid.dkr.ecr.$region.amazonaws.com/sagemaker-custom:pegasus-1.0
fi

if aws sagemaker list-app-image-configs | grep 'pegasus-1-config'
then
    echo 'Image config already exists! Skipping creation'
else
   aws sagemaker create-app-image-config --cli-input-json file://app-image-config-input.json
fi

aws sagemaker update-domain --domain-id $domainid --cli-input-json file://default-user-settings.json

Copy the following to a file named ‘Dockerfile’.

FROM cumulusprod/pegasus-terra:1.0

USER root

Copy the following to a file named ‘app-image-config-input.json’.

{
    "AppImageConfigName": "pegasus-1-config",
    "KernelGatewayImageConfig": {
        "KernelSpecs": [
            {
                "Name": "python3",
                "DisplayName": "Pegasus 1.0"
            }
        ],
        "FileSystemConfig": {
            "MountPath": "/root",
            "DefaultUid": 0,
            "DefaultGid": 0
        }
    }
}

Copy the following to a file named ‘default-user-settings.json’.

{
    "DefaultUserSettings": {
        "KernelGatewayAppSettings": { 
           "CustomImages": [ 
              { 
                 "ImageName": "pegasus-1",
                 "ImageVersionNumber": 1,
                 "AppImageConfigName": "pegasus-1-config"
              }
           ]
        }
    }
}

Launch ‘pegasus-docker-images.sh’ in the directory with all four files, in the terminal of the notebook instance. If the script ran successfully, you should see the custom image attached to the Studio domain.

Amazon SageMaker dashboard

Perform interactive analysis

You can now launch the Pegasus Python kernel inside SageMaker . If this is your first time using Studio, you can get a quick tour of its UI.

For interactive analysis, you can use publicly available notebooks in Pegasus tutorial from this GitHub repository. Review the license before proceeding.

To clone the repository in Studio, open a system terminal using these instructions. Initiate $ git clone https://github.com/klarman-cell-observatory/pegasus

In the directory ‘pegasus’, select ‘notebooks’ and open ‘pegasus_analysis.ipynb’.
For kernel choose ‘Pegasus 1.0 (pegasus-1/1)’.
You can now run through the notebook and examine the output generated. Feel free to work through the other notebooks for deeper analysis.

Pagasus tutorial

At any point during experimentation, you can share your analysis along with results with your colleagues using these steps. The snapshot that you create also captures the notebook configuration such as instance type and kernel, to ensure reproducibility.

Formalize analysis and execute scripts

Once you are done with interactive analysis, you can consolidate your analysis into a script to launch in a managed environment. This is an important step, if you want to later incorporate this script as a component into a research workflow and automate it.

Copy the following script to a file named ‘pegasus_script.py’.

"""
BSD 3-Clause License

Copyright (c) 2018, Broad Institute
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

"""

import pandas as pd
import pegasus as pg

if __name__ == "__main__":
    BASE_DIR = "/opt/ml/processing"
    data = pg.read_input(f"{BASE_DIR}/input/MantonBM_nonmix_subset.zarr.zip")
    pg.qc_metrics(data, percent_mito=10)
    df_qc = pg.get_filter_stats(data)
    pd.DataFrame(df_qc).to_csv(f"{BASE_DIR}/output/qc_metrics.csv", header=True, index=False)

The Jupyter notebook following provides an example of launching a processing job using the script in SageMaker.

Create a notebook in SageMaker Studio in the same directory as the script.
Copy the following code to the notebook and name it ‘sagemaker_pegasus_processing.ipynb’.
Select ‘Python 3 (Data Science)’ as the kernel.
Launch the cells.

import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket()

prefix = 'pegasus'

account_id = boto3.client('sts').get_caller_identity().get('Account')
ecr_repository = 'research-custom'
tag = ':pegasus-1.0'

uri_suffix = 'amazonaws.com'
if region in ['cn-north-1', 'cn-northwest-1']:
    uri_suffix = 'amazonaws.com.cn'
processing_repository_uri = '{}.dkr.ecr.{}.{}/{}'.format(account_id, region, uri_suffix, ecr_repository + tag)
print(processing_repository_uri)

script_processor = ScriptProcessor(command=['python3'],
                image_uri=processing_repository_uri,
                role=role,
                instance_count=1,
                instance_type='ml.m5.xlarge')
!wget https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_nonmix_subset.zarr.zip

local_path = "MantonBM_nonmix_subset.zarr.zip"

s3 = boto3.resource("s3")

base_uri = f"s3://{bucket}/{prefix}"
input_data_uri = sagemaker.s3.S3Uploader.upload(
    local_path=local_path, 
    desired_s3_uri=base_uri,
)
print(input_data_uri)

code_uri = sagemaker.s3.S3Uploader.upload(
    local_path="pegasus_script.py", 
    desired_s3_uri=base_uri,
)
print(code_uri)

script_processor.run(code=code_uri,
                      inputs=[ProcessingInput(source=input_data_uri, destination='/opt/ml/processing/input'),],
                      outputs=[ProcessingOutput(source="/opt/ml/processing/output", destination=f"{base_uri}/output")]
                     )
script_processor_job_description = script_processor.jobs[-1].describe()
print(script_processor_job_description)

output_path = f"{base_uri}/output"
print(output_path)

The ‘output_path’ is the S3 prefix where you will find the results from SageMaker processing. This will be printed as the last line after execution. You can examine the results either directly in S3 or by copying the results back to your home directory in Studio.

Cleaning up

To avoid incurring future charges, shut down the SageMaker notebook instance. Detach image from the Studio domain, delete image in Amazon ECR, and delete data in Amazon S3.

Conclusion

In this blog, I showed you how to set up and use a unified research environment using Amazon SageMaker. Although the example pertained to Life Sciences, the architecture and the framework presented are generally applicable to any research space. They strive to address the broader research challenges of custom tooling, reproducibility, large datasets, and price predictability.

As a logical next step, take the scripted components and incorporate them into research workflows and automate them. You can use SageMaker Pipelines to incorporate machine learning into your workflows and operationalize them.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.