Tag Archives: Field Notes

Field Notes: How to Deploy End-to-End CI/CD in the China Regions Using AWS CodePipeline

Post Syndicated from Ashutosh Pateriya original https://aws.amazon.com/blogs/architecture/field-notes-how-to-deploy-end-to-end-ci-cd-in-the-china-regions-using-aws-codepipeline/

This post was co-authored by Ravi Intodia, Cloud Architect, Infosys Technologies Ltd; Nirmal Tomar, Principal Consultant, Infosys Technologies Ltd; and Ashutosh Pateriya, Solution Architect, AWS.

Today’s businesses must contend with fast-changing competitive environments, expanding security needs, and scalability issues. Businesses must find a way to reconcile the need for operational stability with the need for quick product development. Continuous integration and continuous delivery (CI/CD) enables rapid software iterations while maintaining system stability and security.

With an increase in AWS Cloud and DevOps adoption, many organizations seek solutions which go beyond geographical boundaries. AWS CodePipeline, along with its related services, lets you integrate and deploy your solutions across multiple AWS accounts and Regions. However, it becomes more challenging when you want to deploy your application in multiple AWS Regions as well as in China, due to the unavailability of AWS CodePipeline in the Beijing and Ningxia Regions.

In this blog post, you will learn how to overcome the unique challenges when deploying applications across many parts of the world, including China. For this solution, we will use the power and flexibility of AWS CodeBuild to implement AWS Command Line Interface (AWS CLI) commands to perform custom actions that are not directly supported by CodePipeline or AWS CodeDeploy.

CodePipeline for multi-account and multi-Region deployment consists of the following components:

  • ArtifactStore and encryption keys – In the AWS account which hosts CodePipeline, there should be an Amazon Simple Storage Service (Amazon S3) bucket and an AWS Key Management Service (AWS KMS) key for each Region where resources need to be deployed.
  • CodeBuild and CodePipeline roles – In the AWS account which hosts CodePipeline there should be roles created that can be used or assumed by CodeBuild and CodePipeline projects for performing required actions.
  • Cross-account roles – In each AWS account where cross-account deployments are required, an AWS role with the necessary permissions must be created. The CodePipeline role of the deploying account must be allowed to assume this role for all required accounts. Cross-account roles will also have access to the required S3 buckets and AWS KMS keys for deploying accounts.
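
As a rough sketch of this wiring (the account ID and role name here are placeholders, not values from this post), the CodePipeline role in the deploying account assumes the cross-account role in a target account through AWS STS:

$ aws sts assume-role \
    --role-arn arn:aws:iam::<target-account-id>:role/<CrossAccountDeploymentRole> \
    --role-session-name codepipeline-cross-account-deploy
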
Figure 1. High-level solution for AWS Regions

Although the solution works for most Regions, we encounter challenges when we try to expand our current worldwide solutions into the China Regions.

The challenges are as follows:

  • Cross-account roles – Cross-account roles cannot be created between accounts in non-China Regions and the China Regions. This means that CodeDeploy will be unable to assume the target account role necessary to complete component deployment.
  • Availability of services – Services required to configure a cloud native CI/CD pipeline are unavailable in the China Regions.
  • Connectivity – There is no direct network connectivity available between the China Regions and other AWS Regions.
  • User management – User accounts in the China Regions are distinct from accounts in other AWS Regions, and must be maintained independently.

Due to the lack of cross-account roles and the CodePipeline service, setting up a worldwide CI/CD pipeline that includes the China Regions is not automatically supported.

High-level solution

In the proposed solution, we build and deploy the application to both Regions using AWS CI/CD services in the non-China Region. In the China account, we create an access key for an IAM user that has permission to deploy the application using services such as AWS Lambda, AWS Elastic Beanstalk, Amazon Elastic Container Service, and Amazon Elastic Kubernetes Service. This access key is encrypted and stored in the non-China account as an SSM parameter. On commit, the CodePipeline in the non-China Region is initiated; it builds the package and deploys the application to both Regions from a single place.

Solution architecture

Figure 2. High-level solution for cross-account deployment from AWS Regions to a China Region

In this architecture, AWS CLI commands are used to configure an AWS CLI profile on the CodeBuild instance with China account credentials (retrieved from the AWS Systems Manager Parameter Store). This enables the CodeBuild instance to run AWS CloudFormation package and deploy commands directly against the China account, thereby deploying the required resources in the desired China Region.

This solution does not rely on any AWS CI/CD services, such as CodeDeploy, in the China Region. With it, we can create a complete CI/CD pipeline running in a non-China AWS Region that deploys an application to both Regions.

The following key components are needed for deployment:

  • AWS Identity and Access Management (IAM) user credentials – An IAM user needs to be created in the target account in China.
  • SSM parameters (secure string) – The China IAM user's access key ID and secret access key need to be saved as secure string SSM parameters in the deployment AWS account.
  • Update CloudFormation templates – CloudFormation templates need to be updated to support China Region mappings (such as using “arn:aws-cn” instead of “arn:aws”).
  • Enhance CodeBuild to support build and deployment – CodeBuild buildspec.yml needs to be enhanced to perform build and deployment to China accounts, as mentioned in the following.

Prerequisites

  • Two AWS accounts: One AWS account outside of China, and one account in China.
  • Practical experience in deploying Lambda functions using CodeBuild, CodeDeploy, and CodePipeline, and using the AWS CLI. Because this example focuses specifically on extending CodePipeline from Regions outside of China to deploy in a China Region, we are not going to explore a standard CodePipeline setup.

Detailed Implementation

This solution is built using CodePipeline, CodeCommit, CodeBuild, AWS Cloud​Formation templates, and IAM.

Steps

  1. One-time access key generation in the China account for an IAM user with the necessary access to deploy the application, including creation of one S3 bucket for CodeBuild artifacts.
    Note: As a best practice, we suggest rotation of the access key every 30 days.
  2. Complete the setup of CodePipeline to deploy the application in Regions outside of China, as well as in the China Region.

As a demonstration, let’s deploy a Lambda function in us-east-1 and cn-north-1 and discuss the steps in detail. The same steps can be followed to deploy any other AWS service.

Part 1 – In the account based in China Region: cn-north-1

  1. Create an S3 bucket with default encryption enabled for CodeBuild artifacts.

  2. Create an IAM user (with programmatic access only) with the required permissions to deploy Lambda functions and related resources. The IAM user will also have access to the S3 bucket created for CodeBuild artifacts. To create an IAM policy, refer to the AWS IAM Policy resource.
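
If you prefer the AWS CLI for this one-time setup, the following is a minimal sketch (bucket and user names are placeholders, and the china profile is assumed to be configured with administrative credentials for the China account):

$ aws s3api create-bucket \
    --bucket <codebuild-artifact-bucket-cn> \
    --region cn-north-1 \
    --create-bucket-configuration LocationConstraint=cn-north-1 \
    --profile china
$ aws s3api put-bucket-encryption \
    --bucket <codebuild-artifact-bucket-cn> \
    --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}' \
    --profile china
$ aws iam create-user --user-name <china-deployment-user> --profile china
$ aws iam create-access-key --user-name <china-deployment-user> --profile china

The create-access-key output contains the access key ID and secret access key that you will store as SSM parameters in the non-China account in Part 2.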

Part 2 – In AWS account based in non-China Region: us-east-1

  1. Create two SSM parameters of type secure string.
SSM Parameter Name          SSM Parameter Value
/China/Dev/UserAccessKey    <Value of China IAM User Access Key>
/China/Dev/UserSecretKey    <Value of China IAM User Secret Access Key>

To create secure SSM parameters using the AWS CLI, refer to the Create a Systems Manager parameter (AWS CLI) tutorial.

To create secure SSM parameters using the AWS Management Console, refer to the Create a Systems Manager parameter (console) tutorial.
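
For reference, a minimal CLI sketch for creating the two secure string parameters listed above might look like the following (run in the deployment account's Region):

$ aws ssm put-parameter \
    --name "/China/Dev/UserAccessKey" \
    --type SecureString \
    --value "<China IAM user access key ID>"
$ aws ssm put-parameter \
    --name "/China/Dev/UserSecretKey" \
    --type SecureString \
    --value "<China IAM user secret access key>"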

Note: Creating secure string SSM parameters is not supported by CloudFormation templates. Also, as a security best practice, you should not include any sensitive information in CloudFormation templates, to avoid a possible security breach.

  2. Create an AWS KMS key for encrypting CodeBuild or CodePipeline artifacts (for cross-Region deployments, create an AWS KMS key in each Region, and create an SSM parameter for each key in the Region hosting CodePipeline).
  3. Create an artifacts S3 bucket for CodeBuild or CodePipeline artifacts.
  4. Create CI/CD related roles. For CI/CD service roles, refer to:
    https://docs.aws.amazon.com/codepipeline/latest/userguide/pipelines-create-service-role.html
    https://docs.aws.amazon.com/codebuild/latest/userguide/setting-up.html#setting-up-service-role
  5. Create a CodeCommit repository.
  6. Create SSM parameters for the following.
SSM Parameter Name               SSM Parameter Value
/China/Dev/DeploymentS3Bucket    <Artifact Bucket Name in China Region>
/US/Dev/CodeBuildRole            <Role ARN of CodeBuild Service Role>
/US/Dev/CodePipelineRole         <Role ARN of CodePipeline Service Role>
/US/Dev/CloudformationRole       <Role ARN of CloudFormation Service Role>
/US/Dev/DeploymentS3Bucket       <Artifact Bucket Name in Pipeline Region>
/US/Dev/CodeBuildImage           <CodeBuild Image Details>
/US/Stage/CrossAccountStageRole  <Role ARN of Cross-Account Service Role for Stage>
/US/Prod/CrossAccountStageRole   <Role ARN of Cross-Account Service Role for Prod>
  7. In CodeCommit, push the Lambda code and CloudFormation template for deploying Lambda resources (Lambda function, Lambda role, Lambda log group, and so forth).
  8. In CodeCommit, push two buildspec YAML files, one for us-east-1, and one for cn-north-1.
    1. buildspec.yml: For us-east-1
# Buildspec Reference Doc: https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.7
  pre_build:
    commands:
      - echo "[+] Updating PIP...."
      - pip install --upgrade pip
      - echo "[+] Installing dependencies...."
      #- Commands To Install required dependencies
      - yum install zip unzip -y -q
      - pip install awscli --upgrade
      
  build:
    commands:
      - echo "Starting build `date` in `pwd`"
      - echo "Starting SAM packaging `date` in `pwd`"
      - aws cloudformation package --template-file cloudformation_template.yaml --s3-bucket ${S3_BUCKET} --output-template-file transform-packaged.yaml
      # Additional package commands for cross-region deployments
      - echo "SAM packaging completed on `date`"
      - echo "Build completed `date` in `pwd`"
     
artifacts:
  type: zip
  files:
    - transform-packaged.yaml
    # - additional artifacts for cross-region deployments
  discard-paths: yes

cache:
  paths:
    - '/root/.cache/pip'
    2. buildspec-china.yml: For cn-north-1
      buildspec-china.yml is customized to perform both build and deployment. Refer to the following for details.
# Buildspec Reference Doc: https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html
version: 0.2    
phases:
  install:
    runtime-versions:
      python: 3.7

  pre_build:
    commands:
      - echo "[+] Updating PIP...."
      - pip install --upgrade pip
      - echo "[+] Installing dependencies...."
      #- Commands To Install required dependencies
      - yum install zip unzip -y -q
      - pip install awscli --upgrade
      # Setting China Region IAM User Profile
      - echo "Start setting User Profile  `date` in `pwd`"
      - USER_ACCESS_KEY=`aws ssm get-parameter --name ${USER_ACCESS_KEY_SSM} --with-decryption --query Parameter.Value --output text`
      - USER_SECRET_KEY=`aws ssm get-parameter --name ${USER_SECRET_KEY_SSM} --with-decryption --query Parameter.Value --output text`
      - aws configure --profile china set aws_access_key_id ${USER_ACCESS_KEY}
      - aws configure --profile china set aws_secret_access_key ${USER_SECRET_KEY}
      - echo "Setting User Profile Completed `date` in `pwd`"

  build:
    commands:
      # Creating Deployment Package 
      - echo "Start build/packaging  `date` in `pwd`"
      - S3_BUCKET=`aws ssm get-parameter --name ${S3_BUCKET_SSM} --query Parameter.Value --output text`
      - zip -q -r package.zip *
      - >
        bash -c '
        aws cloudformation package 
        --template-file cloudformation_template_china.yaml
        --s3-bucket ${S3_BUCKET}
        --output-template-file transformed-template-china.yaml
        --profile china
        --region cn-north-1'
      - echo "Completed build/packaging  `date` in `pwd`"
  post_build:
    commands:
      # Deploying 
      - echo "Start deployment  `date` in `pwd`"
      - >
        bash -c '
        aws cloudformation deploy 
        --capabilities CAPABILITY_NAMED_IAM
        --template-file transformed-template-china.yaml
        --stack-name ${ProjectName}-app-stack-dev
        --profile china
        --region cn-north-1'
      - echo "Completed deployment  `date` in `pwd`"
artifacts:
  type: zip
  files:
    - package.zip
    - transformed-template-china.yaml

The buildspec-china.yml file expects the following CodeBuild environment variables: USER_ACCESS_KEY_SSM, USER_SECRET_KEY_SSM, and S3_BUCKET_SSM.

After creating and committing the previous files, your CodeCommit repository will look like the following.

Now that we have a CodeCommit repository, next we will create a CodePipeline for Lambda with the following stages:

  1. Source – Use the previously created CodeCommit repository (CodeRepository-US-East-1) as the source.
  2. Build – The CodeBuild project uses buildspec.yml by default. It takes the output of the Source stage as input and builds artifacts for us-east-1.
  3. Deploy
    1. Code-Deploy Project for deploying to us-east-1
      This takes the output of the previous Build stage as input and performs deployment in two steps: create-changeset and execute-changeset (assuming the required role attached to the deploy action).
    2. CodeBuild Project for deploying to cn-north-1
      This takes the output of the Source stage as input and performs both build and deployment to cn-north-1 using buildspec-china.yml. It retrieves the China IAM user credentials and deployment bucket name from the SSM parameters referenced in its environment variables. CodeBuild project details are outlined in the following image.

  4. Optional – Add further steps like manual approval, deployment to higher environment, and so forth, as required.

Congratulations! You have just created a CodePipeline that deploys a Lambda function in both a non-China Region and a China Region. Your CodePipeline should appear similar to the following.

Figure 3. CodePipeline implementation for both Regions

Note: The actual CodePipeline view will be vertical, with each environment deployment appearing one after the other. For the purpose of this example, we have placed them side by side to more easily showcase multiple environments.

CodePipeline Implementation Steps

We have created this pipeline with the following high-level steps, and you can add or remove steps as needed.

Step 1. After you commit the source code, CodePipeline will launch in the non-China Region and fetch the source code.

Step 2. Build the package using buildspec.yml.

Step 3. Deploy the application in both Regions by following the subsection steps for the development environment.

    1. Create the changeset for the development environment.
    2. Implement the changeset for the development environment.

Step 4. Repeat step 3, but deploy the application in the staging environment.

Step 5. Wait for approval from your administrator or application owner before deploying the application in the production environment.

Step 6. Repeat steps 3 and 4 to deploy the application in the production environment.

Cleaning up

To avoid incurring future charges, clean up the resources created as part of this blog post.

  • Delete the CloudFormation stack created in the non-China Region.
  • Delete the SSM parameter created to store the access key.
  • Delete the access key created in the China Region.

Conclusion

In this blog post, we have explored the question: how can you use AWS services to implement CI/CD in a China Region and keep it in sync with an AWS Region? Although we are using us-east-1 as an example here, this solution will work in any Region where CodePipeline is available, and can deploy applications to other Regions, including the China Regions.

The question has been answered by dividing it into three problem statements as follows.

Problem 1: CodePipeline is not available in the China Regions.
Solution: Set up CodePipeline in a non-China Region and deploy to a China Region.

Problem 2: AWS cross-account roles are not possible between a non-China Region and the China Regions.
Solution: Use the power and flexibility of CodeBuild to build your application and also deploy your application using the AWS CLI.

Problem 3: Keep a non-China Region and the China Regions in sync.
Solution: Maintain all code and manage deployments from a common deployment AWS account.

Reference:

AWS CodeCommit | Managed Source Control Service

AWS CodePipeline | Continuous Integration and Continuous Delivery

AWS CodeBuild – Fully Managed Build Service

Field Notes: How to Boost Your Search Results Using Relevance Tuning with Amazon Kendra

Post Syndicated from Sam Palani original https://aws.amazon.com/blogs/architecture/field-notes-how-to-boost-your-search-results-using-amazon-kendra-relevance-tuning/

One challenge enterprises face when they implement an intelligent search solution for their large data sources is the ability to quickly provide relevant search results. When working with large data sources, not all features or attributes within your data will be equally relevant to all your users. We want to identify and boost specific attributes to provide the most relevant search results for your users.

Relevance tuning in Amazon Kendra allows you to boost a result in the response when the query includes terms that match the attribute. For example, you might have two similar documents but one is created more recently. A good practice is to boost the relevance for the newer (or earlier) document.

Relevance tuning in Amazon Kendra can be performed manually at the Index level or at the query level. In this blog post, we show how to tune an existing index that is connected to external data sources, and ultimately optimize your internal search results.

Solution overview

We will walk through how you can manually tune your index using boosting techniques to achieve the best results. This enables you to prioritize the results from a specific data source so your users get the most relevant results when they perform searches.

Figure 1. Amazon Kendra setup

Figure 1 illustrates a standard Amazon Kendra setup. An Amazon Kendra index is connected to different Amazon Simple Storage Service (Amazon S3) buckets with multiple data sources.

There are two types of user personas. The first persona is administrators who are responsible for managing the index and performing administrative tasks such as access control, index tuning, and so forth. The second persona is users who access Amazon Kendra either directly or through a custom application that can make API search requests on an Amazon Kendra index. You can use relevance tuning to boost results from one of these data sources to provide a more relevant search result.

Prerequisites

This solution requires the following:

If you do not have these prerequisites set up, check out Create and query an index, which walks you through how to create and query an index in Amazon Kendra.

Furthermore, the AWS services you use in this tutorial are within the AWS Free Tier under a 30-day trial.

Step 1 – Check facet definition

First, review your facet definition and confirm it is facetable, displayable, searchable, and sortable.

In the Amazon Kendra console, select your Amazon Kendra index, then select Facet definition in the Data management panel. Confirm that _data_source_id has all of its attributes checked.

Figure 2. Check facet definition

Step 2 – Review data sources

Next, verify that you have at least two data sources for your Amazon Kendra index.

In the Amazon Kendra console, select your Amazon Kendra index, and then select Data sources in the Data management panel. Confirm that your data sources are correctly synced and available. In our example, data-source-2 is an earlier version and contains unprocessed documents, compared to sample-datasource, which has newer versions and more relevant content.

Figure 3. Verify data sources

Step 3 – Perform a regular Amazon Kendra search

Next, we will test a regular search without any relevance tuning. Select Search console, and enter the search term Amazon Kendra VPC. Review your search results.

Figure 4. Regular Amazon Kendra search

In our example search results, the document from the second data source 39_kendra-dg_kendra-dg appears as the third result.
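
If you prefer to test the same query from the AWS CLI instead of the search console, a minimal sketch looks like the following (the index ID is a placeholder):

$ aws kendra query --index-id <index-id> --query-text "Amazon Kendra VPC"

The response lists the matching documents along with their ranking, which is useful for comparing results before and after tuning.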

Step 4 – Relevance tuning through boosting

Now we will boost the first data source so documents from the first data source are displayed ahead of the other data sources.

Select data source, and boost the first data source sample-datasource to 8. Press the Save button to save your tuning. Wait several seconds for the changes to propagate.

Figure 5. Boost sample-datasource

Step 5 – Perform the search after boosting

Next, we will test the search with relevance tuning applied. In the search text box enter the search term Amazon Kendra VPC. Review your search results.

Figure 6. Searching after boosting

Notice that the search result no longer contains the document from the second data source.

Cleaning up

To avoid incurring any future charges, remove any index created specifically for this tutorial. In the Amazon Kendra console, select your index. Then select Actions, and choose Delete.

Figure 7. Delete index

Conclusion

In this blog post, we showed you how relevance tuning can be used to produce results ranked by their relevance. We also walked you through an example regarding how to manually perform relevance tuning at the index level in Amazon Kendra to boost your search results.

In addition to relevance tuning at the index level, you can also perform relevance tuning at the query level. Finally, check out the What is Amazon Kendra? and Relevance tuning with Amazon Kendra blog posts to learn more.

Field Notes: Deploy and Visualize ROS Bag Data on AWS using rviz and Webviz for Autonomous Driving

Post Syndicated from Aubrey Oosthuizen original https://aws.amazon.com/blogs/architecture/field-notes-deploy-and-visualize-ros-bag-data-on-aws-using-rviz-and-webviz-for-autonomous-driving/

In the automotive industry, ROS bag files are frequently used to capture drive data from test vehicles configured with cameras, LIDAR, GPS, and other input devices. The data for each device is stored as a topic in the ROS bag file. Developers and engineers need to visualize and inspect the contents of ROS bag files to identify issues or replay the drive data.

There are a couple of challenges to be addressed by migrating the visualization workflow into Amazon Web Services (AWS):

  • Search, identify, and stream scenarios for ADAS engineers. The visualization tool should be ready instantly, load the data for a given scenario over a search API, and show the first result through data streaming, to provide a good user experience.
  • Native integration with the tool chain. Many customers implement the Data Catalog, data storage, and search API in AWS native services. This visualization tool should be integrated into such a tool chain directly.

Overview of solutions

This blog post describes three solutions on how to deploy and visualize ROS bag data on AWS by using two popular visualization tools:

  • rviz is the standard open-source visualization tool used by the ROS community and has a large set of tools and plugin support.
  • Webviz is an open-source tool created by Cruise that provides modular and configurable browser-based visualization.

In the autonomous driving data lake reference architecture, both visualization tools are covered in step 10: Provide an advanced analytics and visualization toolchain, including a search function for particular scenarios, using AWS AppSync, Amazon QuickSight (KPI reporting and monitoring), and Webviz, rviz, or other tooling for visualization.

Prerequisites

Solution 1 – Visualize ROS bag files using rviz on AWS RoboMaker virtual desktops

AWS RoboMaker provides simulation and testing infrastructure for robotics as a managed service. This includes out of the box support for virtual desktops with ROS tooling preconfigured and installed. When you launch a virtual desktop, AWS RoboMaker launches the NICE DCV web browser client. This client provides access to your AWS Cloud9 desktop and streaming applications.

Launch rviz on AWS RoboMaker virtual desktop

Follow the guide on creating a new development environment to provision a new integrated development environment (IDE) and open it. After your AWS Cloud9 IDE is open, you can launch a new virtual desktop by pressing the Virtual Desktop button at the top center of the IDE, and selecting Launch Virtual Desktop. This might take a couple of seconds to open in a new browser tab.

After your virtual desktop is loaded, you can run rviz by opening the terminal and running the following commands:

$ source /opt/ros/melodic/setup.bash
$ roscore &
$ rosrun rviz rviz

Solution 2 – Visualize ROS bag files using rviz on EC2 and TigerVNC

Note: We strongly recommend using the AWS RoboMaker managed solution for provisioning virtual desktops for your visualization needs. In cases where this is not possible due to different Linux distributions or versions, this solution allows an alternative method for setting up a virtual desktop on EC2.

In this solution (source code) we use AWS Cloud Development Kit (AWS CDK) to deploy a new Ubuntu 18.04 AMI EC2 instance to your AWS account and preconfigure it with rviz, TigerVNC, and Ubuntu Desktop.

 

Figure 1. Architecture for solution 2 (visualize ROS bag files using rviz on EC2 and TigerVNC)

Open a shell terminal that has your AWS CLI configured. Run the following commands to clone the code and install the corresponding Node.js dependencies.

$ git clone https://github.com/aws-samples/aws-autonomous-driving-data-lake-ros-bag-visualization-using-rviz.git rviz-infra
$ cd rviz-infra
$ npm install

Note: Review the README to understand the project structure and commands.

Next, configure your project-specific settings, like your AWS Account, Region, VNC password, VPC to deploy the Amazon EC2 machine into, and EC2 instance type by running the bootstrap script:

$ npm run project-bootstrap

You will be prompted for various inputs to bootstrap the project, including the VNC password to use. Most of these input values will be stored into your cdk.json file, and the VNC password will be stored in the AWS Systems Manager Parameter Store, a capability of AWS Systems Manager.

Run the following command to deploy your stack into your AWS account.

$ cdk deploy

After the stack has been deployed, and the EC2 instance provisioned, its user-data script will initiate and install TigerVNC and the required ROS tooling.

To see the progress, let’s connect to the instance using SSH and then tail the bootstrapping log.

$ ./ssm.sh ssh
$ sudo su ubuntu
$ tail -f /var/log/cloud-init-output.log

It takes approximately 15 minutes for the user-data bootstrapping script to finish. When it finishes, you will see the message “rviz-setup bootstrapping completed. You can now log in via VNC”.

Open a new shell terminal in the project root and start a port forwarding session using SSM through the ssm helper script:

$ ./ssm.sh vnc

After you see the waiting for connections output, you can open your VNC viewer and connect to localhost:5901.

When prompted for a password, enter the one you used when previously running the bootstrapping script.

You now have access to your Ubuntu Desktop environment.

Running rviz

If you opted to install sample data, the Ford AV Sample Data has been downloaded and installed on the EC2 instance already. To visualize it in rviz, a few helper scripts have been created and can be run by opening a new terminal and initiating the following commands:

$ cd /home/ubuntu/catkin_ws
$ ./0-run-all.sh

This helper script will open and run rviz on the Ford sample data.

Figure 2. Ford AV Dataset visualized with rviz on Ubuntu Desktop

You should now be able to use this server to run and visualize your ROS bag files using rviz.

Solution 3 – Visualize ROS bag files using Webviz on Amazon Elastic Container Service (Amazon ECS)

The third solution (source code) uses AWS CDK to deploy Webviz as a container running on AWS Fargate, fronted by an Application Load Balancer (ALB). In addition, the Infrastructure as Code (IaC) can either create a new Amazon S3 bucket or import an existing one. The S3 bucket would have its cross-origin resource sharing (CORS) rules updated to allow streaming bag files from your ALB domain.

The bucket and ROS bag files won’t need to be made public since we will use presigned Amazon S3 URLs for authorizing the streaming of the files.

Finally, the AWS CDK code deploys an AWS Lambda function that can be invoked to generate a properly formatted Webviz streaming URL that contains the HTTP encoded and presigned URL for streaming your bag files directly in the browser.

Figure 3. Architecture for solution 3 (visualize ROS bag files using Webviz on Amazon ECS)

Clone the repository and install its dependencies:

$ git clone https://github.com/aws-samples/aws-autonomous-driving-data-lake-ros-bag-visualization-using-webviz.git webviz-infra
$ cd webviz-infra
$ npm install

Note: Review the project README to understand the different files, project structures, and commands.

Next, modify cdk.json to specify your project-specific configuration (for example, Region, bucket name, and whether you wish to import an existing bucket or create a new one).

{
  "context": {
    ....
    "bucketName": "<bucket_name>", // [required] Name of bucket to use or create
    "bucketExists": true, // [required] Should create or update existing bucket
    "generateUrlFunctionName": "generate_ros_streaming_url", // [required] Name of lambda function
    "scenarioDB": { // [optional] Configuration of SceneDescription table
      "partitionKey": "bag_file",
      "sortKey": "scene_id",
      "region": "eu-west-1",
      "tableName": "SceneDescriptions"
    }
  }
}

To deploy the CDK stack into your AWS account, run the following command.

$ cdk deploy

You can now access your Webviz instance by opening the URL value for the webvizserviceServiceURL output from the previous deploy command.

Figure 4. Example showing Webviz being served by our ALB

Custom layouts for Webviz can be imported through json configs. The solution contains an example config in the project root. This custom layout contains the topic configurations and window layouts specific to our ROS bag format and should be modified according to your ROS bag topics.

  1. Select Config → Import/Export Layout
  2. Copy and paste the contents of layout.json

Figure 5. Configuring custom layout for Webviz

Next, you need some ROS bag files to stream into your S3 bucket. If you configured AWS CDK to use an existing bucket, and the bucket already contains some ROS bag files, then you can skip the next step.

Upload ROS bag files

You can use the aws s3 cp command to copy a local bag file to the specified S3 bucket, or to copy files between S3 buckets.
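
For example, a copy of a local bag file might look like the following (file name, bucket, and key prefix are placeholders):

$ aws s3 cp ./drive-001.bag s3://<bucket_name>/rosbags/drive-001.bag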

Generate streaming URL with helper script

The code repository contains a Python helper script in the project root to invoke your deployed Lambda function generate_ros_streaming_url locally with the correct payload.

To run the helper script, run the following command in your terminal:

$ python get_url.py \
--bucket-name <bucket_name> \
--key <ros_bag_key>

Response format: http://webviz-lb-<account>.<region>.elb.amazonaws.com?remote-bag-url=<presigned-url>

The response URL can be opened directly in your browser to visualize your targeted ROS bag file.

Generate a streaming URL through Lambda function in the AWS Console

You can generate a streaming URL by invoking your Lambda function generate_ros_streaming_url with the following example payload in the AWS console.

{
"key": "<ros_bag_key>",
"bucket": "<bucket_name>",
"seek_to": "<ros_timestamp>"
}

The seek_to value informs the Lambda function to add a parameter to jump to the specified ROS timestamp when generating the streaming URL.
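
If you prefer the AWS CLI over the console, the same invocation might look like the following sketch (with AWS CLI v2, the --cli-binary-format flag lets you pass the raw JSON payload); the response written to response.json has the same shape as the example output shown next:

$ aws lambda invoke \
    --function-name generate_ros_streaming_url \
    --cli-binary-format raw-in-base64-out \
    --payload '{"key": "<ros_bag_key>", "bucket": "<bucket_name>", "seek_to": "<ros_timestamp>"}' \
    response.json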

Example output:

{
"statusCode": 200,
"body":
"{\"url\": \"http://webviz-lb-<domain>.<region>.elb.amazonaws.com?remote-bag-url=<PRESIGNED_ENCODED_URL>&seek-to=<ros_timestamp>\"}"
}

This body output URL can be opened in your browser to start visualizing your bag files directly.

By using a Lambda function to generate the streaming URL, you have the flexibility to integrate it with other user interfaces or dashboards. For example, if you use Amazon QuickSight to visualize different detected scenarios, you can define a customer action to invoke the Lambda function through API Gateway to get a streaming URL for the target scenario.

Similarly, custom web applications can be used to visualize the scenes and their corresponding ROS bag files stored in a metadata store. Invoke the Lambda function from your web server to generate and return a visualization URL that can be used by the web application.

Using the streaming URL

Open the streaming URL in your browser. If you added a seek_to value while generating the URL, it should jump to that point in the bag file.

Figure 6. Example visualization of ROS bag file streamed from Amazon S3

That’s it. You should now start to see your ROS bag file being streamed directly from Amazon S3 in your browser.

Deploying Webviz as part of the Autonomous Driving Data Lake Reference Solution

This solution forms part of the autonomous driving data lake solution which consists of a reference architecture and corresponding field notes and open-source code modules:

  1. Building an automated scene detection pipeline for Autonomous Driving – ADAS Workflow
  2. Deploying Autonomous Driving and ADAS Workloads at Scale with Amazon Managed Workflows for Apache Airflow
  3. Building an Automated Image Processing and Model Training Pipeline for Autonomous Driving

If this Webviz solution (solution 3) is deployed in conjunction with Building an automated scene detection pipeline for Autonomous Driving – ADAS Workflow (ASD), it supports some out-of-the-box integration.

You can configure the solution’s cdk.json to specify the relevant values for the SceneDescription table created by the ASD CDK code. Redeploy the stack after changing this using $ cdk deploy.

With these values, the Lambda function generate_ros_streaming_url now supports an additional payload format:

{
"record_id": "<scene_description_table_partition_key>",
"scene_id": "<scene_description_table_sort_key>"
}

The get_url.py script also supports the additional scene lookup parameters. To look up a scene stored in your SceneDescription table run the following commands in your terminal:

$ python get_url.py --record <scene_description_table_partition_key> --scene <scene_description_table_sort_key>

Invoking the generate_ros_streaming_url function with the record and scene parameters will result in a lookup of the ROS bag file for the scene from DynamoDB, presigning the ROS bag file, and returning a URL to stream the file directly in your browser.

Cleaning Up

To clean up the AWS RoboMaker development environment, review Deleting an Environment.

For the CDK application, you can destroy your CDK stack by running $ cdk destroy from your terminal. Some buckets will need to be manually emptied and deleted from the AWS console.

Conclusion

This blog post illustrated how to deploy two common tools used to visualize ROS bag files, using three different solutions. First, we showed you how to set up an AWS RoboMaker development environment and run rviz. Second, we showed you how to deploy an Amazon EC2 machine that automatically configures Ubuntu-Desktop with TigerVNC and rviz preinstalled. Third, we showed you how to deploy Webviz on Fargate and configure a bucket to allow streaming bag files. Finally, you learned how streaming URLs can be generated and integrated into your custom scenario detection and visualization tools.

We hope you found this post interesting and helpful in extending your autonomous vehicle solutions, and invite your comments and feedback.

Field Notes: How to Scale OpenTravel Messaging Architecture with Amazon Kinesis Data Streams

Post Syndicated from Danny Gagne original https://aws.amazon.com/blogs/architecture/field-notes-how-to-scale-opentravel-messaging-architecture-with-amazon-kinesis-data-streams/

The travel industry relies on OpenTravel messaging systems to collect and distribute travel data—like hotel inventory and pricing—to many independent ecommerce travel sites. These travel sites need immediate access to the most current hotel inventory and pricing data. This allows shoppers access to the available rooms at the right prices. Each time a room is reserved, unreserved, or there is a price change, an ordered message with minimal latency must be sent from the hotel system to the travel site.

Overview of Solution

In this blog post, we will describe the architectural considerations needed to scale a low latency FIFO messaging system built with Amazon Kinesis Data Streams.

The architecture must satisfy the following constraints:

  1. Messages must arrive in order to each destination, with respect to their hotel code.
  2. Messages are delivered to multiple destinations independently.

Kinesis Data Streams is a managed service that enables real-time processing of streaming records. The service provides ordering of records, as well as the ability to read and replay records in the same order to multiple Amazon Kinesis applications. Kinesis data stream applications can also consume data in parallel from a stream through parallelization of consumers.

We will start with an architecture that supports one hotel with one destination, and scale it to support 1,000 hotels and 100 destinations (38,051 records per second). With the OpenTravel use case, the order of messages matters only within a stream of messages from a given hotel.
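
The per-second rates used throughout this post follow from simple arithmetic, assuming a 730-hour (roughly 30.4-day) month:

$ python3 -c "print(1_000_000 / (730 * 3600))"        # ~0.3805 records per second (1 hotel, 1 destination)
$ python3 -c "print(100_000_000_000 / (730 * 3600))"  # ~38,051 records per second (1,000 hotels, 100 destinations)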

Smallest scale: One hotel with one processed delivery destination. One million messages a month.

Figure 1: Architecture showing a system sending messages from 1 hotel to 1 destination (.3805 RPS)

Full scale: 1,000 hotels with 100 processed delivery destinations. 100 billion messages a month.

Figure 2: Architecture showing a system sending messages from 1,000 hotels to 100 destinations (38,051 RPS)

In the example application, OpenTravel XML message data is the record payload. The record is written into the stream with a producer. The record is read from the stream shard by the consumer and sent to several HTTP destinations.

Each Kinesis data stream shard supports writes up to 1,000 records per second, up to a maximum data write total of 1 MB per second. Each shard supports reads up to five transactions per second, up to a maximum data read total of 2 MB per second. There is no upper quota on the number of streams you can have in an account.

Streams  Shards  Writes/Input limit                               Reads/Output limit
1        1       1,000 records per second, or 1 MB per second     2 MB per second
1        500     500,000 records per second, or 500 MB per second 1,000 MB per second

OpenTravel message

In the following OpenTravel message, there is only one field that is important to our use case: HotelCode. In OTA (OpenTravel Alliance) messaging, order matters, but it only matters in the context of a given hotel, specified by the HotelCode. As we scale up our solution, we will use the HotelCode as a partition key. Each destination receives the same messages. If we are sending to 100 destinations, then we will send the message 100 times. The average message size is 4 KB.

<HotelAvailNotif>
    <request>
        <POS>
            <Source>
                <RequestorID ID = "1000">
                </RequestorID>
                <BookingChannel>
                    <CompanyName Code = "ClientTravelAgency1">
                    </CompanyName>
                </BookingChannel>
            </Source>
        </POS>
   <AvailStatusMessages HotelCode="100">
     <AvailStatusMessage BookingLimit="9">
         <StatusApplicationControl Start="2013-12-20" End="2013-12-25" RatePlanCode="BAR" InvCode="APT" InvType="ROOM" Mon="true" Tue="true" Weds="true" Thur="false"  Fri="true" Sat="true" Sun="true"/>
         <LengthsOfStay ArrivalDateBased="true">
       <LengthOfStay Time="2" TimeUnit="Day" MinMaxMessageType="MinLOS"/>
       <LengthOfStay Time="8" TimeUnit="Day" MinMaxMessageType="MaxLOS"/>
         </LengthsOfStay>
         <RestrictionStatus Status="Open" SellThroughOpenIndicator="false" MinAdvancedBookingOffset="5"/>
     </AvailStatusMessage>    
    </AvailStatusMessages>
 </request>
</HotelAvailNotif>

Source: https://github.com/XML-Travelgate/xtg-content-articles-pub/blob/master/docs/hotel-push/HotelPUSH.md 
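
As a sketch of how a producer might write such a message to the stream, using the HotelCode as the partition key (stream name and file path are placeholders; the --cli-binary-format flag applies to AWS CLI v2):

$ aws kinesis put-record \
    --stream-name <ota-stream> \
    --partition-key 100 \
    --data file://hotel-avail-notif.xml \
    --cli-binary-format raw-in-base64-out

Here the partition key 100 matches the HotelCode in the sample message, so all records for that hotel land on the same shard in order.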

Producer and consumer message error handling

Asynchronous message processing presents challenges with error handling, because an error can be logged to the producer or consumer application logs. Short-lived errors can be resolved with a delayed retry. However, there are cases when the data is bad and the message will always cause an error; these messages should be added to a dead letter queue (DLQ).

Producer retries and consumer retries are some of the reasons why records may be delivered more than once. Kinesis Data Streams does not automatically remove duplicate records. If the destination needs a strict guarantee of uniqueness, the message must have a primary key to remove duplicates when processed in the client (Handling Duplicate Records). In most cases, the destination can mitigate duplicate messages by processing the same message multiple times in a way that produces the same result (idempotence). If a destination endpoint becomes unavailable or is not performant, the consumer should back off and retry until the endpoint is available. The failure of a single destination endpoint should not impact the performance of delivery of messages to other destinations.

Messaging with one hotel and one destination

When starting at the smallest scale—one hotel and one destination—the system must support one million messages per month. This volume expects an input of 4 GB of message data per month at a rate of 0.3805 records per second, and .0015 MB per second for both input and output. For the system to process one hotel to one destination, it needs at least one producer, one stream, one shard, and a single consumer.

1 hotel, 1 destination (4-KB average record size): 1 stream, 1 shard, 1 standard consumer
  Input:  0.3805 records per second, or .0015 MB per second
  Output: 0.3805 records per second, or .0015 MB per second

Maximum stream limits: 1 stream, 1 shard, 1 standard consumer
  Input:  250 (4-KB records) per second, or 1 MB per second (per stream)
  Output: 500 (4-KB records) per second, or 2 MB per second (per stream)

With this design, the single shard supports writes up to 250 records per second, up to a maximum data write total of 1 MB per second. PutRecords request supports up to 500 records and each record in the request can be as large as 1 MB. Because the average message size is 4 KB, it enables 250 records per second to be written to the shard.

Figure 3: Architecture using Amazon Kinesis Data Streams (1 hotel, 1 destination)

The single shard can also support consumer reads up to five transactions per second, up to a maximum data read total of 2 MB per second. The producer writes to the shard in FIFO order. A consumer pulls from the shard and delivers to the destination in the same order as received. In this one hotel and one destination example, the consumer read capacity would be 500 records per second.

Figure 4: Records passing through the architecture

Messaging with 1,000 hotels and one destination

Next, we will maintain a single destination and scale the number of hotels from one hotel to 1,000 hotels. This scale increases the producer inputs from one million messages to one billion messages a month. This volume expects an input of 4 TB of message data per month at a rate of 380 records per second, and 1.486 MB per second.

1,000 hotels, 1 destination (4-KB average record size): 1 stream, 2 shards, 1 standard consumer
  Input:  380 records per second, or 1.486 MB per second
  Output: 380 records per second, or 1.486 MB per second

Maximum stream limits: 1 stream, 2 shards, 1 standard consumer
  Input:  500 (4-KB records) per second, or 2 MB per second (per stream)
  Output: 1,000 (4-KB records) per second, or 4 MB per second (per stream)

The volume of incoming messages increased to 1.486 MB per second, requiring one additional shard. A shard can be split using the API to increase the overall throughput capacity. The write capacity is increased by distributing messages to the shards. We will use the HotelCode as a partition key, to maintain order of messages with respect to a given hotel. This will distribute the writes into the two shards. The messages for each HotelCode will be written into the same shard and maintain FIFO order. By using the HotelCode as a partition key, it doubles the stream write capacity to 2 MB per second.
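
For example, resharding an existing stream from one shard to two can be done with a single call (the stream name is a placeholder):

$ aws kinesis update-shard-count \
    --stream-name <ota-stream> \
    --target-shard-count 2 \
    --scaling-type UNIFORM_SCALING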

A single consumer can read the records from each shard at 2 MB per second per shard. The consumer will iterate through each shard, processing the records in FIFO order based on the HotelCode. It then sends messages to the destination in the same order they were sent from the producer.

Messaging with 1,000 hotels and five destinations

As we scale up further, we maintain 1,000 hotels and scale the delivery to five destinations. This increases the consumer reads from one billion messages to five billion messages a month. It has the same input of 4 TB of message data per month at a rate of 380 records per second, and 1.486 MB per second. The output is now 326 GB of message data per month at a rate of 1,902.5 records per second, and 7.43 MB per second.

1,000 hotels, 5 destinations (4-KB average record size): 1 stream, 4 shards, 5 standard consumers
  Input:  380 records per second, or 1.486 MB per second
  Output: 1,902.5 records per second, or 7.43 MB per second

Maximum stream capacity: 1 stream, 4 shards, 5 standard consumers
  Input:  1,000 (4-KB records) per second (per shard), or 4 MB per second (per stream)
  Output: 2,000 (4-KB records) per second per shard with a standard consumer, or 8 MB per second (per stream)

To support this scale, we increase the number of consumers to match the destinations. A dedicated consumer for each destination allows the system to have committed capacity to each destination. If a destination fails or becomes slow, it will not impact other destination consumers.

Figure 5: Four Kinesis shards to support read throughput

We increase to four shards to support additional read throughput required for the increased destination consumers. Each destination consumer iterates through the messages in each shard in FIFO order based on the HotelCode. It then sends the messages to the assigned destination maintaining the hotel specific ordering. Like in the previous 1,000-to-1 design, the messages for each HotelCode are written into the same shard while maintaining its FIFO order.

The distribution of messages per HotelCode should be monitored with metrics like WriteProvisionedThroughputExceeded and ReadProvisionedThroughputExceeded to ensure an even distribution between shards, because HotelCode is the partition key. If there is uneven distribution, it may require a dedicated shard for each HotelCode. In the following examples, it is assumed that there is an even distribution of messages.

Figure 6: Architecture showing 1,000 Hotels, 4 Shards, 5 Consumers, and 5 Destinations

Messaging with 1,000 hotels and 100 destinations

With 1,000 hotels and 100 destinations, the system processes 100 billion messages per month. It continues to have the same number of incoming messages, 4 TB of message data per month, at a rate of 380 records per second, and 1.486 MB per second. The number of destinations has increased from five to 100, resulting in 390 TB of message data per month, at a rate of 38,051 records per second, and 148.6 MB per second.

1,000 hotels, 100 destinations (4-KB average record size): 1 stream, 78 shards, 100 standard consumers
  Input:  380 records per second, or 1.486 MB per second
  Output: 38,051 records per second, or 148.6 MB per second

Maximum stream capacity: 1 stream, 78 shards, 100 standard consumers
  Input:  19,500 (4-KB records) per second (per stream), or 78 MB per second (per stream)
  Output: 39,000 (4-KB records) per second per stream with a standard consumer, or 156 MB per second (per stream)

The number of shards increases from four to 78 to support the required read throughput needed for 100 consumers, to attain a read capacity of 156 MB per second with 78 shards.

 

Figure 7: Architecture with 1,000 hotels, 78 Kinesis shards, 100 consumers, and 100 destinations

Next, we will look at a different architecture using a different type of consumer called the enhanced fan-out consumer. The enhanced fan-out consumer will improve the efficiency of the stream throughput and processing efficiency.

Lambda enhanced fan-out consumers

Enhanced fan-out consumers can increase the per shard read consumption throughput through event-based consumption, reduce latency with parallelization, and support error handling. The enhanced fan-out consumer increases the read capacity of consumers from a shared 2 MB per second, to a dedicated 2 MB per second for each consumer.

When using Lambda as an enhanced fan-out consumer, you can use the Event Source Mapping Parallelization Factor to have one Lambda pull from one shard concurrently with up to 10 parallel invocations per consumer. Each parallelized invocation contains messages with the same partition key (HotelCode) and maintains order. The invocations complete each message before processing with the next parallel invocation.

Figure 8: Lambda fan-out consumers with parallel invocations, maintaining order

Consumers can gain performance improvements by setting a larger batch size for each invocation. When setting the Event Source Mapping batch size, the consumer can set the maximum number of records that the Lambda will retrieve from the stream at the time of invoking the function. The Event Source Mapping batch window can be used to adjust the time to wait to gather records for the batch before invoking the function.

The Lambda Event Source Mapping has a setting ReportBatchItemFailures that can be used to keep track of the last successfully processed record. When the next invocation of the Lambda starts the batch from the checkpoint, it starts with the failed record. This will occur until the maximum number of retries for the failed record occurs and it is expired. If this feature is enabled and a failure occurs, Lambda will prioritize checkpointing, over other set mechanisms, to minimize duplicate processing.

Lambda has built-in support for sending old or exhausted retries to an on-failure destination, with the option of an Amazon Simple Queue Service (Amazon SQS) queue as a DLQ or an Amazon SNS topic. The Lambda consumer can be configured with maximum record retries and maximum record age, so repeated failures are sent to a DLQ and handled with a separate Lambda function.
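
A sketch of an event source mapping that combines these settings might look like the following (function, stream, and queue names are placeholders, and the retry and age values are illustrative):

$ aws lambda create-event-source-mapping \
    --function-name <destination-consumer-function> \
    --event-source-arn arn:aws:kinesis:<region>:<account-id>:stream/<ota-stream> \
    --starting-position LATEST \
    --parallelization-factor 10 \
    --batch-size 100 \
    --maximum-batching-window-in-seconds 1 \
    --function-response-types ReportBatchItemFailures \
    --maximum-retry-attempts 3 \
    --maximum-record-age-in-seconds 3600 \
    --destination-config '{"OnFailure":{"Destination":"arn:aws:sqs:<region>:<account-id>:<dlq-name>"}}'

For an enhanced fan-out consumer, you would first register a stream consumer (aws kinesis register-stream-consumer) and pass its consumer ARN as the --event-source-arn instead of the stream ARN.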

Figure 9: Lambda fan-out consumers with parallel invocations, and error handling

Messaging with Lambda fan-out consumers at scale

In the 1,000 hotel and 100 destination scenario, we will scale by shard and stream. Enhanced fan-out has a hard quota of 20 fan-out consumers per stream. With one consumer per destination, 100 fan-out consumers will need five streams. The producer will write to one stream, which has a consumer that writes to the five streams that our destination consumers are reading from.

1,000 hotels, 100 destinations (4-KB average record size): 5 streams, 10 shards (2 per stream), 100 fan-out consumers
  Input:  380 records per second, or 74.3 MB per second
  Output: 38,051 records per second, or 148.6 MB per second

Maximum stream capacity: 5 streams, 10 shards (2 per stream), 100 fan-out consumers
  Input:  500 (4-KB records) per second per stream, or 2 MB per second per stream
  Output: 40,000 (4-KB records) per second per stream (200,000 per second with 5 streams), or 40 MB per second per stream (200 MB per second with 5 streams)

 

Figure 10: Architecture 1,000 hotels, 100 destinations, multiple streams, and fan-out consumers

This architecture increases throughput to 200 MB per second, with only 12 shards, compared to 78 shards with standard consumers.

Conclusion

In this blog post, we explained how to use Kinesis Data Streams to process low latency ordered messages at scale with OpenTravel data. We reviewed the aspects of efficient processing and message consumption, scaling considerations, and error handling scenarios.  We explored multiple architectures, one dimension of scale at a time, to demonstrate several considerations for scaling the OpenTravel messaging system.

References

Field Notes: How to Back Up a Database with KMS Encryption Using AWS Backup

Post Syndicated from Ram Konchada original https://aws.amazon.com/blogs/architecture/field-notes-how-to-back-up-a-database-with-kms-encryption-using-aws-backup/

An AWS security best practice from The 5 Pillars of the AWS Well-Architected Framework is to ensure that data is protected both in transit and at rest. One option is to use SSL/TLS to encrypt data in transit, and use cryptographic keys to encrypt data at rest. To meet your organization’s disaster recovery goals, periodic snapshots of databases should be scheduled and stored across Regions and across administrative accounts. This helps ensure a quick Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

From a security standpoint, these snapshots should also be encrypted. Consider a scenario where one administrative AWS account is used for running the Amazon Relational Database Service (Amazon RDS) instance and backups. In this scenario, if this AWS account is compromised, data cannot be recovered from either the production instance or the backups. Amazon RDS snapshots encrypted using AWS managed customer master keys (CMKs) cannot be copied across accounts. Cross-account backup helps you avoid this situation.

This blog post presents a solution that helps you to perform cross-account backup using AWS Backup service and restore database instance snapshots encrypted using AWS Key Management Service (KMS) CMKs across the accounts. This can help you to meet your security, compliance, and business continuity requirements. Although the solution uses RDS as the database choice, it can be applied to other database services (with some limitations).

Architecture

Figure 1 illustrates the solution described in this blog post.

Figure 1: Reference architecture for Amazon RDS with AWS Backup using KMS over two different accounts

Solution overview

When resources like Amazon RDS instances (including Aurora clusters) are encrypted, cross-account copy can only be performed if they are encrypted with AWS KMS customer managed customer master keys (CMKs). The default vault is encrypted using service master keys (SMKs). Therefore, to perform cross-account backups, you must use CMK-encrypted vaults instead of your default backup vault.

  1. In the source account, create a backup of the Amazon RDS instance encrypted with customer managed key.
  2. Give the backup account access to the customer-managed AWS KMS encryption key used by the source account’s RDS instance.
  3. In the backup account, ensure that you have a backup vault encrypted using a customer-managed key created in the backup account. Also, make sure that this vault is shared with the source account using a vault access policy.
  4. Copy the encrypted snapshots to the target account. This will re-encrypt snapshots using the target vault account’s AWS KMS encryption keys in the target Region.

Prerequisites

  • Ensure you are in the correct AWS Region of operation.
  • Two AWS accounts within the same AWS Organization.
  • Source account where you have a CMK encrypted Amazon RDS instance.
  • Opt-in to cross-account backup.
  • Backup account to which you will copy the encrypted snapshots.
  • A backup vault encrypted with backup CMK (different from source CMK) in the backup account.
  • An IAM role to perform cross-account backup. You can also use the AWSBackupDefaultServiceRole.
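
Cross-account backup is an account-level setting that is turned on from the organization’s management account. If you want to check or enable it from the command line, a minimal sketch follows; run it with management account credentials.

```bash
# Check whether cross-account backup is already enabled for the organization.
aws backup describe-global-settings

# Enable cross-account backup if it is not already on (run from the management account).
aws backup update-global-settings \
  --global-settings isCrossAccountBackupEnabled=true
```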

Solution

In this blog post, two accounts are used that are part of the same organization. Ensure that you update your account IDs accordingly.

Source account – 111222333444
AWS Backup account – 666777888999

Create a customer-managed key in the source account

Step 1 – Create CMKs

You can create symmetric or asymmetric CMKs in the AWS Management Console; this walkthrough uses a symmetric CMK. During this process, you will determine the cryptographic configuration of your CMK and the origin of its key material. You cannot change these properties after the CMK is created. You can also set the key policy for the CMK, which can be changed later. Follow key rotation best practices to ensure maximum security.

  1. Choose Create key.
  2. To create a symmetric CMK, for key type choose Symmetric. Use AWS KMS as the key material origin, and choose the single-region key for Regionality.
  3. Choose Next, and proceed with Step 2.

Step 2 – Add labels

  1. Type an alias for the CMK (alias cannot begin with aws/. The aws/ prefix is reserved by Amazon Web Services to represent AWS managed CMKs in your account).
  2. Add a description that identifies the key usage.
  3. Add tags based on an appropriate tagging strategy.
  4. Choose Next, and proceed with Step 3.

Step 3 – Key administrative permissions

Select the IAM users and roles that can administer the CMK. Ensure that least privilege design is implemented when assigning roles and permissions, in addition to following best practices.

Step 4 – Key usage permissions

Next, we will need to define the key usage and permissions. Complete the following steps:

  1. Select the IAM users and roles that can use the CMK for cryptographic operations.
  2. Within the Other AWS accounts section, type the 12-digit account number of the backup account.
Figure 2: Key usage permissions

Figure 2: Key usage permissions

Step 5 – Key configuration

Review the chosen settings, and press the Finish button to complete key creation.

Figure 3: Key configuration

Figure 3: Key configuration

 

Cross-account access key policy

Read the blog post on sharing custom encryption keys more securely between accounts using AWS Key Management Service for more information.
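
If you prefer to script the cross-account access instead of using the console’s key usage settings, a KMS grant is one option. The following is a minimal sketch; the key ID is a placeholder for the CMK created earlier, and the grantee is the backup account used in this walkthrough.

```bash
# Placeholder for the source account CMK created in the preceding steps.
KEY_ID=<source-cmk-key-id-or-arn>

# Grant the backup account (666777888999) the operations needed to copy encrypted snapshots.
aws kms create-grant \
  --key-id "$KEY_ID" \
  --grantee-principal arn:aws:iam::666777888999:root \
  --operations Decrypt DescribeKey GenerateDataKeyWithoutPlaintext ReEncryptFrom ReEncryptTo CreateGrant
```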

Step 6 – CMK verification

Within the AWS KMS console page, verify that the CMK has been successfully created and that its status is Enabled.

Create an Amazon RDS database in source account

  1. Choose the correct AWS Region.
  2. Navigate to RDS through the console search option.
  3. Choose Create a Database option, and choose your Database type.
  4. In Database encryption settings, use the CMK key you created in the preceding steps.
  5. Create the database.
  6. Follow Amazon RDS security best practices.

Create an AWS Backup vault in the source account

  1. On the AWS Backup service, navigate to AWS Backup > AWS Backup vault.
  2. Create a backup vault by specifying the name, and add tags based on an appropriate tagging strategy.

Create an on-demand AWS Backup in the source account

For the purpose of this solution, we will create an on-demand backup from the AWS Backup dashboard. You can also choose an existing snapshot if it is already available.

  1. Choose Create on-demand backup. Choose resource type as Amazon RDS, and select the appropriate database name. Choose the option to create backup now. Complete the setup by providing an appropriate IAM role and tag values (you can use the prepopulated default IAM role).
  2. Wait for the backup to be created, processed, and completed. This may take several hours, depending on the size of the database. Note – this step might fail if the on-demand backup window is too close to, or overlaps, the scheduled backup time determined by the Amazon RDS service. If this occurs, create the on-demand backup after the scheduled backup is completed.
Figure 4: Create on-demand backup

Figure 4: Create on-demand backup

  1. Confirm the status is completed once the backup process has finished.
Figure 5: Completed on-demand backup

Figure 5: Completed on-demand backup

  1. If you navigate to the backup vault you should see the recovery point stored within the source account’s vault.
Figure 6: Backup vault

Figure 6: Backup vault
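
If you want to script the on-demand backup instead of using the console, the equivalent AWS CLI call is sketched below; the vault name, database ARN, and IAM role ARN are placeholders for the resources created in the preceding steps.

```bash
# Start an on-demand backup of the source RDS instance into the source account's vault.
# Vault name, Region, and database identifier are placeholders.
aws backup start-backup-job \
  --backup-vault-name source-backup-vault \
  --resource-arn arn:aws:rds:<region>:111222333444:db:<database-identifier> \
  --iam-role-arn arn:aws:iam::111222333444:role/service-role/AWSBackupDefaultServiceRole
```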

Prepare AWS Backup account (666777888999)

Create SYMMETRIC CMK in the backup account

Follow the same steps as before to create a symmetric CMK in the backup account. Ensure that you do not grant the source AWS account access to this key.

Figure 7: CMK in backup account

Figure 7: CMK in backup account

Add IAM policy to users and roles to use CMK created in source account

The key policy in the account that owns the CMK sets the valid range for permissions. However, users and roles in the external account cannot use the CMK until you attach IAM policies that delegate those permissions, or use grants to manage access to the CMK. The IAM policies are set in the external account and should follow IAM policy best practices. Review the blog post on sharing custom encryption keys more securely between accounts using AWS Key Management Service for more information.

Create an AWS Backup vault in backup account

In the backup account, navigate to the “Backup vaults” section on the AWS Backup service page, and choose Create Backup vault. Next, provide a backup vault name, and choose the CMK key you previously created. In addition, you can specify Backup vault tags. Finally, press the Create Backup vault button.

Figure 8: Create backup vault

Figure 8: Create backup vault

Allow access to the backup vault from the organization (or organizational unit)

This step will enable multiple accounts within the organization to store snapshots in the backup vault.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "backup:CopyIntoBackupVault",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "o-XXXXXXXXXX"
                }
            }
        }
    ]
}
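
You can attach this policy from the command line instead of the console; in the following sketch the vault name is a placeholder and the policy above is assumed to be saved as vault-policy.json.

```bash
# Attach the organization-scoped access policy to the backup vault in the backup account.
aws backup put-backup-vault-access-policy \
  --backup-vault-name <backup-vault-name> \
  --policy file://vault-policy.json
```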

Copy recovery point from source account vault to backup account vault

Initiate a recovery point copy by navigating to the backup vault in the source account, and create a copy job. Select the correct Region, provide the backup vault ARN, and press the Copy button.

Figure 9: Copy job initiation

Figure 9: Copy job initiation

Next, allow the backup account to copy the data back into the source account by adding permissions to your backup vault “sourcebackupvault” access policy.

Figure 10: Allow AWS Backup vault access

Figure 10: Allow AWS Backup vault access

Initiate copy job

From the backup vault in the source account, press the Copy button to copy a recovery point to the backup account (shown in Figure 11).

Figure 11: Initiate copy job

Figure 11: Initiate copy job

Verify copy job is successfully completed

Verify that the copy job is completed and the recovery point is copied successfully to the AWS Backup account vault.

Figure 12: Copy job is completed

Figure 12: Copy job is completed
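
For automation, the copy can also be started with the AWS CLI from the source account; the recovery point ARN, vault names, Region, and role ARN below are placeholders.

```bash
# Copy a recovery point from the source vault to the backup account's vault.
aws backup start-copy-job \
  --recovery-point-arn <recovery-point-arn> \
  --source-backup-vault-name source-backup-vault \
  --destination-backup-vault-arn arn:aws:backup:<region>:666777888999:backup-vault:<backup-vault-name> \
  --iam-role-arn arn:aws:iam::111222333444:role/service-role/AWSBackupDefaultServiceRole
```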

Restore Amazon RDS database in AWS Backup account

Initiate restore of recovery point

In the backup account, navigate to the backup vault on the AWS Backup service page. Push the Restore button to initiate the recovery job.

Figure 13: Restore recovery point

Figure 13: Restore recovery point

Restore AWS Backup

The process of restoring the AWS backup will automatically detect the database (DB) engine. Choose the DB instance class, storage type, and the Multi-AZ configuration based on your application requirements. Set the encryption key to the CMK created in the backup account.

Scroll down to bottom of the page, and press the Restore backup button.

Figure 14: Restore backup

Figure 14: Restore backup

Restore job verification

Confirm that Restore job is completed in the Status field.

Figure 15: Restore job verification

Figure 15: Restore job verification

Database verification

Once the job completes, the database is restored. This can be verified on the Management Console of the RDS service.
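
If you also want to confirm from the command line which key the restored instance uses, a quick check such as the following can help; the instance identifier is a placeholder for the one you chose during the restore.

```bash
# Verify that the restored RDS instance is encrypted and note the KMS key it uses.
aws rds describe-db-instances \
  --db-instance-identifier <restored-database-identifier> \
  --query "DBInstances[0].{StorageEncrypted:StorageEncrypted,KmsKeyId:KmsKeyId}"
```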

Conclusion

In this blog post, we showed you how to back up AWS KMS-encrypted Amazon RDS instances across accounts using AWS Backup and customer managed CMKs. We also verified the encryption keys used by the source and backup snapshots.

Using AWS Backup cross-account backup and cross-Region copy capabilities, you can quickly restore data to your backup accounts in your preferred AWS Region. This helps AWS Backup users minimize business impact in the event of compromised accounts, unexpected disasters, or service interruptions. You can create consistent database snapshots and recover them across Regions to meet your security, compliance, RTO, and RPO requirements.

Thanks for reading this blog post. If you have any feedback or questions, please add them in the comments section.

 

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Three Steps to Port Your Containerized Application to AWS Lambda

Post Syndicated from Arthi Jaganathan original https://aws.amazon.com/blogs/architecture/field-notes-three-steps-to-port-your-containerized-application-to-aws-lambda/

AWS Lambda support for container images allows porting containerized web applications to run in a serverless environment. This gives you automatic scaling, built-in high availability, and a pay-for-value billing model so you don’t pay for over-provisioned resources. If you are currently using containers, container image support for AWS Lambda means you can use these benefits without additional upfront engineering efforts to adopt new tooling or development workflows. You can continue to use your team’s familiarity with containers while gaining the benefits from the operational simplicity and cost effectiveness of Serverless computing.

This blog post describes the steps to take a containerized web application and run it on Lambda with only minor changes to the development, packaging, and deployment process. We use Ruby, as an example, but the steps are the same for any programming language.

Overview of solution

The following sample application is a containerized web service that generates PDF invoices. We will migrate this application to run the business logic in a Lambda function, and use Amazon API Gateway to provide a Serverless RESTful web API. API Gateway is a managed service to create and run API operations at scale.

Figure 1: Generating PDF invoice with Lambda

Figure 1: Generating PDF invoice with Lambda

Walkthrough

In this blog post, you will learn how to port the containerized web application to run in a serverless environment using Lambda.

At a high level, you are going to:

  1. Get the containerized application running locally for testing
  2. Port the application to run on Lambda
    1. Create a Lambda function handler
    2. Modify the container image for Lambda
    3. Test the containerized Lambda function locally
  3. Deploy and test on Amazon Web Services (AWS)

Prerequisites

For this walkthrough, you need the following:

  • An AWS account
  • The AWS Command Line Interface (AWS CLI) installed and configured
  • Docker installed locally
  • Git to clone the sample repository

1. Get the containerized application running locally for testing

The sample code for this application is available on GitHub. Clone the repository to follow along.

```bash

git clone https://github.com/aws-samples/aws-lambda-containerized-custom-runtime-blog.git

```

1.1. Build the Docker image

Review the Dockerfile in the root of the cloned repository. The Dockerfile uses Bitnami’s Ruby 3.0 image from the Amazon ECR Public Gallery as the base. It follows security best practices by running the application as a non-root user and exposes the invoice generator service on port 4567. Open your terminal and navigate to the folder where you cloned the GitHub repository. Build the Docker image using the following command.

```bash
docker build -t ruby-invoice-generator .
```

1.2 Test locally

Run the application locally using the Docker run command.

```bash
docker run -p 4567:4567 ruby-invoice-generator
```

In a real-world scenario, the order and customer details for the invoice would be passed as POST request body or GET request query string parameters. To keep things simple, we are randomly selecting from a few hard coded values inside lib/utils.rb. Open another terminal, and test invoice creation using the following command.

```bash
curl "http://localhost:4567/invoice" \
  --output invoice.pdf \
  --header 'Accept: application/pdf'
```

This command creates the file invoice.pdf in the folder where you ran the curl command. You can also test the URL directly in a browser. Press Ctrl+C to stop the container. At this point, we know our application works and we are ready to port it to run on Lambda as a container.

2. Port the application to run on Lambda

There is no change to the Lambda operational model and request plane. This means the function handler is still the entry point to application logic when you package a Lambda function as a container image. Also, by moving our business logic to a Lambda function, we separate concerns and replace the web server code in the container image with an HTTP API powered by API Gateway. You can focus on the business logic in the container, with API Gateway acting as a proxy to route requests.

2.1. Create the Lambda function handler

The code for our Lambda function is defined in function.rb, and the handler function from that file is shown in the following example. The main difference to note between the original Sinatra-powered code and our Lambda handler version is the need to base64 encode the PDF. This is a requirement for returning binary media from an API Gateway Lambda proxy integration. API Gateway will automatically decode this to return the PDF file to the client.

```ruby

def self.process(event:, context:)

  self.logger.debug(JSON.generate(event))

  invoice_pdf = Base64.encode64(Invoice.new.generate)

  { 'headers' => { 'Content-Type': 'application/pdf' },

    'statusCode' => 200,

    'body' => invoice_pdf,

    'isBase64Encoded' => true

  }

end

```

If you need a reminder on the basics of Lambda function handlers, review the documentation on writing a Lambda handler in Ruby. This completes the new addition to the development workflow—creating a Lambda function handler as the wrapper for the business logic.

2.2 Modify the container image for Lambda

AWS provides open source base images for Lambda. At this time, these base images are only available for Ruby runtime versions 2.5 and 2.7. But, you can bring any version of your preferred runtime (Ruby 3.0 in this case) by packaging it with your Docker image. We will use Bitnami’s Ruby 3.0 image from the Amazon ECR Public Gallery as the base. Amazon ECR is a fully managed container registry, and it is important to note that Lambda only supports running container images that are stored in Amazon ECR; you can’t upload an arbitrary container image to Lambda directly.

Because the function handler is the entry point to business logic, the Dockerfile CMD must be set to the function handler instead of starting the web server. In our case, because we are using our own base image to bring a custom runtime, there is an additional change we must make. Custom images need the runtime interface client to manage the interaction between the Lambda service and our function’s code.

The runtime interface client is an open-source lightweight interface that receives requests from Lambda, passes the requests to the function handler, and returns the runtime results back to the Lambda service. The following are the relevant changes to the Dockerfile.

```bash

ENTRYPOINT ["aws_lambda_ric"]

CMD [ "function.Billing::InvoiceGenerator.process" ]

```

The Docker command that is implemented when the container runs is: aws_lambda_ric function.Billing::InvoiceGenerator.process.

Refer to Dockerfile.lambda in the cloned repository for the complete code. This image follows best practices for optimizing Lambda container images by using a multi-stage build. The final image uses the Bitnami Ruby image tagged 3.0-prod as its base, which does not include development dependencies and helps keep the image size down. Create the Lambda-compatible container image using the following command.

```bash

docker build -f Dockerfile.lambda -t lambda-ruby-invoice-generator .

```

This concludes changes to the Dockerfile. We have introduced a new dependency on the runtime interface client and used it as our container’s entrypoint.

2.3. Test the containerized Lambda function locally

The runtime interface client expects requests from the Lambda Runtime API. But when we test on our local development workstation, we don’t run the Lambda service. So, we need a way to proxy the Runtime API for local testing. Because local testing is an integral part of most development workflows, AWS provides the Lambda runtime interface emulator. The emulator is a lightweight web server running on port 8080 that converts HTTP requests to Lambda-compatible JSON events. The flow for local testing is shown in Figure 2.

Figure 2: Testing the Lambda container image locally

Figure 2: Testing the Lambda container image locally

When we want to perform local testing, the runtime interface emulator becomes the entrypoint. Consequently, the Docker command that is executed when the container runs locally is: aws-lambda-rie aws_lambda_ric function.Billing::InvoiceGenerator.process.

You can package the emulator with the image. The Lambda container image support launch blog documents steps for this approach. However, to keep the image as slim as possible, we recommend that you install it locally, and mount it while running the container instead. Here is the installation command for Linux platforms.

```bash
mkdir -p ~/.aws-lambda-rie && \
  curl -Lo ~/.aws-lambda-rie/aws-lambda-rie \
  https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie && \
  chmod +x ~/.aws-lambda-rie/aws-lambda-rie

```

Use the Docker run command with the appropriate overrides to entrypoint and cmd to start the Lambda container. The emulator is mapped to local port 9000.

```bash

docker run \

  -v ~/.aws-lambda-rie:/aws-lambda \

  -p 9000:8080 \

  -e LOG_LEVEL=DEBUG \

  --entrypoint /aws-lambda/aws-lambda-rie \

  lambda-ruby-invoice-generator \

  /opt/bitnami/ruby/bin/aws_lambda_ric function.Billing::InvoiceGenerator.process

```

Open another terminal and run the curl command below to simulate a Lambda request.

```bash

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'

```
You will see a JSON output with the base64-encoded value of the invoice as the body, and the isBase64Encoded property set to true. After Lambda is integrated with API Gateway, the API endpoint uses this flag to decode the text before returning the response to the caller. The client will receive the decoded PDF invoice. Press Ctrl+C to stop the container. This concludes the changes to the local testing workflow.

3. Deploy and test on AWS

The final step is to deploy the invoice generator service. Set your AWS Region and AWS Account ID as environment variables.

```bash

export AWS_REGION=Region

export AWS_ACCOUNT_ID=account

```

3.1. Push the Docker image to Amazon ECR

Create an Amazon ECR repository for the image.

```bash

ECR_REPOSITORY=`aws ecr create-repository \

  --region $AWS_REGION \

  --repository-name lambda-ruby-invoice-generator \

  --tags Key=Project,Value=lambda-ruby-invoice-generator-blog \

  --query "repository.repositoryName" \

  --output text`

```

Login to Amazon ECR, and push the container image to the newly-created repository.

```bash

aws ecr get-login-password \

  --region $AWS_REGION | docker login \

  --username AWS \

  --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com

 

docker tag \

  lambda-ruby-invoice-generator:latest \

  $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$ECR_REPOSITORY:latest

 

docker push \
  $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$ECR_REPOSITORY:latest

```

3.2. Create the Lambda function

After the push succeeds, you can create the Lambda function. You need to create a Lambda execution role first and attach the managed IAM policy named AWSLambdaBasicExecutionRole. This gives the function access to Amazon CloudWatch for logging and monitoring.

```bash

LAMBDA_ROLE=`aws iam create-role \

  --region $AWS_REGION \

  --role-name ruby-invoice-generator-lambda-role \

  --assume-role-policy-document file://lambda-role-trust-policy.json \

  --tags Key=Project,Value=lambda-ruby-invoice-generator-blog \

  --query "Role.Arn" \

  --output text`

 

aws iam attach-role-policy \

  --region $AWS_REGION \

  --role-name ruby-invoice-generator-lambda-role \

  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

 

LAMBDA_ARN=`aws lambda create-function \

  --region $AWS_REGION \

  --function-name ruby-invoice-generator \

  --description "[AWS Blog] Lambda Ruby Invoice Generator" \

  --role $LAMBDA_ROLE \

  --package-type Image \

  --code ImageUri=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$ECR_REPOSITORY:latest \

  --timeout 15 \

  --memory-size 256 \

  --tags Project=lambda-ruby-invoice-generator-blog \

  --query "FunctionArn" \
  --output text`
```

Wait for the function to be ready. Use the following command to verify that the function state is set to Active.

```bash

aws lambda get-function \

  --region $AWS_REGION \

  --function-name ruby-invoice-generator \

  --query "Configuration.State"

```
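
Alternatively, you can block until the function reaches the Active state by using the AWS CLI waiter.

```bash
# Wait until the newly created function is active before integrating it with API Gateway.
aws lambda wait function-active \
  --region $AWS_REGION \
  --function-name ruby-invoice-generator
```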

3.3. Integrate with API Gateway

API Gateway offers two options for creating RESTful APIs: HTTP APIs and REST APIs. We will use an HTTP API because it offers lower cost and latency compared to a REST API. REST APIs provide additional features that we don’t need for this demo.

```bash

aws apigatewayv2 create-api \

  --region $AWS_REGION \

  --name invoice-generator-api \

  --protocol-type HTTP \

  --target $LAMBDA_ARN \

  --route-key "GET /invoice" \

  --tags Key=Project,Value=lambda-ruby-invoice-generator-blog \

  --query "{ApiEndpoint: ApiEndpoint, ApiId: ApiId}" \

  --output json

```

Record the ApiEndpoint and ApiId from the earlier command, and substitute them for the placeholders in the following command. You need to update the Lambda resource policy to allow HTTP API to invoke it.

```bash

export API_ENDPOINT="<ApiEndpoint>"

export API_ID="<ApiId>"

 

aws lambda add-permission \

  --region $AWS_REGION \

  --statement-id invoice-generator-api \

  --action lambda:InvokeFunction \

  --function-name $LAMBDA_ARN \

  --principal apigateway.amazonaws.com \

  --source-arn "arn:aws:execute-api:$AWS_REGION:$AWS_ACCOUNT_ID:$API_ID/*/*/invoice"

```

3.4. Verify the AWS deployment

Use the curl command to generate a PDF invoice.

```bash

curl "$API_ENDPOINT/invoice" \
  --output lambda-invoice.pdf \
  --header 'Accept: application/pdf'

```

This creates the file lambda-invoice.pdf in the local folder. You can also test directly from a browser.

This concludes the final step for porting the containerized invoice web service to Lambda. This deployment workflow is very similar to any other containerized application where you first build the image and then create or update your application to the new image as part of deployment. Only the actual deploy command has changed because we are deploying to Lambda instead of a container platform.

Cleaning up

To avoid incurring future charges, delete the resources. Follow cleanup instructions in the README file on the GitHub repo.

Conclusion

In this blog post, we learned how to port an existing containerized application to Lambda with only minor changes to development, packaging, and testing workflows. For teams with limited time, this accelerates the adoption of Serverless by using container domain knowledge and promoting reuse of existing tooling. We also saw how you can easily bring your own runtime by packaging it in your image and how you can simplify your application by eliminating the web application framework.

For more Serverless learning resources, visit https://serverlessland.com.

 

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Field Notes: Creating Custom Analytics Dashboards with FireEye Helix and Amazon QuickSight

Post Syndicated from Karish Chowdhury original https://aws.amazon.com/blogs/architecture/field-notes-creating-custom-analytics-dashboards-with-fireeye-helix-and-amazon-quicksight/

FireEye Helix is a security operations platform that allows organizations to take control of any incident from detection to response. FireEye Helix detects security incidents by correlating logs and configuration settings from sources like VPC Flow Logs, AWS CloudTrail, and Security groups.

In this blog post, we will discuss an architecture that allows you to create custom analytics dashboards with Amazon QuickSight. These dashboards are based on the threat detection logs collected by FireEye Helix. We automate this process so that data can be pulled and ingested based on a provided schedule. This approach uses AWS Lambda, and Amazon Simple Storage Service (Amazon S3) in addition to QuickSight.

Architecture Overview

The solution outlines how to ingest the security log data from FireEye Helix to Amazon S3 and create QuickSight visualizations from the log data. With this approach, Amazon EventBridge invokes a Lambda function that connects to the FireEye Helix API. There are two steps to this process:

  1. Download the log data from FireEye Helix and store it in Amazon S3.
  2. Create a visualization Dashboard in QuickSight.

The architecture shown in Figure 1 represents the process we will walk through in this blog post. To implement this solution, the following AWS services are involved: Amazon EventBridge, AWS Lambda, AWS Secrets Manager, Amazon S3, AWS Identity and Access Management (IAM), AWS CloudFormation, and Amazon QuickSight.

Figure 1: Solution architecture

Figure 1: Solution architecture

Prerequisites to implement the solution:

The following items are required to get your environment set up for this walkthrough.

  1. AWS account.
  2. FireEye Helix search alerts API endpoint. This is available under the API documentation in the FireEye Helix console.
  3. FireEye Helix API key. This FireEye community page explains how to generate an API key with appropriate permissions (always follow least privilege principles). This key is used by the Lambda function to periodically fetch alerts.
  4. AWS Secrets Manager secret (to store the FireEye Helix API key). To set it up, follow the steps outlined in the Creating a secret.

Extract the data from FireEye Helix and load it into Amazon S3

You will use the following high-level steps to retrieve the necessary security log data from FireEye Helix and store it on Amazon S3 to make it available for QuickSight.

  1. Establish an AWS Identity and Access Management (IAM) role for the Lambda function. It must have permissions to access Secrets Manager and Amazon S3 so they can retrieve the FireEye Helix API key and store the extracted data, respectively.
  2. Create an Amazon S3 bucket to store the FireEye Helix security log data.
  3. Create a Lambda function that uses the API key from Secrets Manager, calls the FireEye Helix search alerts API to extract FireEye Helix’s threat detection logs, and stores the data in the S3 bucket.
  4. Establish an Amazon EventBridge rule to invoke the Lambda function on an automated schedule.

To simplify this deployment, we have developed a CloudFormation template to automate creating the preceding requirements. Follow these steps to deploy the template:

  • Download the source code from the GitHub repository
  • Navigate to CloudFormation console and select Create Stack
  • Select “Upload a template file” radio button, click on “Choose file” button, and select “helix-dashboard.yaml” file in the downloaded Github repository. Click “Next” to proceed.
  • On “Specify stack details” screen enter the parameters shown in Figure 2.
Figure 2: CloudFormation stack creation with initial parameters

Figure 2: CloudFormation stack creation with initial parameters

The parameters in Figure 2 include:

  • HelixAPISecretName – Enter the Secrets Manager secret name where the FireEye Helix API key is stored.
  • HelixEndpointUrl – Enter the Helix search API endpoint URL.
  • Amazon S3 bucket – Enter the bucket prefix (a random suffix will be added to make it unique).
  • Schedule – Choose the default option that pulls logs once a day, or enter the CloudWatch event schedule expression.

Select the check box next to “I acknowledge that AWS CloudFormation might create IAM resources.” and then press the Create Stack button. After the CloudFormation stack completes, you will have a fully functional process that will retrieve the FireEye Helix security log data and store it on the S3 bucket.
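
If you prefer to deploy the same template from the command line, a minimal sketch is shown below; the stack name and parameter values are placeholders for your environment, and any additional template parameters (such as the bucket prefix and schedule) can be passed the same way.

```bash
# Deploy the downloaded template with the AWS CLI instead of the console.
# Stack name and parameter values are placeholders.
aws cloudformation create-stack \
  --stack-name helix-dashboard \
  --template-body file://helix-dashboard.yaml \
  --parameters ParameterKey=HelixAPISecretName,ParameterValue=<secret-name> \
               ParameterKey=HelixEndpointUrl,ParameterValue=<helix-search-api-url> \
  --capabilities CAPABILITY_IAM
```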

You can also select the Lambda function from the CloudFormation stack outputs to navigate to the Lambda console. Review the following default code, and add any additional transformation logic according to your needs after fetching the results (line 32).

import boto3
from datetime import datetime
import requests
import os

region_name = os.environ['AWS_REGION']
secret_name = os.environ['APIKEY_SECRET_NAME']
bucket_name = os.environ['S3_BUCKET_NAME']
helix_api_url = os.environ['HELIX_ENDPOINT_URL']

def lambda_handler(event, context):

    now = datetime.now()
    # Create a Secrets Manager client to fetch API Key from Secrets Manager
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )
    apikey = client.get_secret_value(
            SecretId=secret_name
        )['SecretString']
    
    datestr = now.strftime("%B %d, %Y")
    apiheader = {'x-fireeye-api-key': apikey}
    
    try:
        # Call Helix Rest API to fetch the Alerts
        helixalerts = requests.get(
            f'{helix_api_url}?format=csv&query=start:"{datestr}" end:"{datestr}" class=alerts', headers=apiheader)
    
        # Optionally transform the content according to your needs..
        
        # Create a S3 client to upload the CSV file
        s3 = boto3.client('s3')
        path = now.strftime("%Y/%m/%d")+'/alerts-'+now.strftime("%H-%M-%S")+'.csv'
        response = s3.put_object(
            Body=helixalerts.content,
            Bucket=bucket_name,
            Key=path,
        )
        print('S3 upload response:', response)
    except Exception as e:
        print('error while fetching the alerts', e)
        raise e
        
    return {
        'statusCode': 200,
        'body': f'Successfully fetched alerts from Helix and uploaded to {path}'
    }

Creating a visualization in QuickSight

Once the FireEye data is ingested into an S3 bucket, you can start creating custom reports and dashboards using QuickSight. Following is a walkthrough on how to create a visualization in QuickSight based on the data that was ingested from FireEye Helix.

Step 1 – When placing the FireEye data into Amazon S3, ensure you have a clean directory structure so you can partition your data. By partitioning your data, you can restrict the amount of data scanned by each query, thus improving performance and reducing cost. The following is a sample directory structure you could use.

      ssw-fireeye-logs

           2021

               04

               05

The following is an example of what the data will look like after it is ingested from FireEye Helix into your Amazon S3 bucket. In this blog post, we will use the Alert_Desc column to report on the types of common attacks.

Step 2 – Next, you must create a manifest file that will instruct QuickSight how to read the FireEye log files on Amazon S3. The following example is a manifest file that instructs QuickSight to recursively search for files in the ssw-fireeye-logs bucket, as seen in the URIPrefixes section. The globalUploadSettings section informs QuickSight of the type and format of the files it will read.

{
	"fileLocations": [
		{
			"URIPrefixes": [
				"s3://ssw-fireeye-logs/"
			]
		}
	],
	"globalUploadSettings": {
		"format": "CSV",
		"delimiter": ",",
		"textqualifier": "'",
		"containsHeader": "true"
	}
}

Step 3 – Open Amazon QuickSight. Use the AWS Management Console and search for QuickSight.

Step 4 – Below the QuickSight logo, find and select Datasets.

Step 5 – Push the blue New dataset button.

 

 

Step 6 – Now you are on Create a Dataset page which enables you to select a data source you would like QuickSight to ingest. Because we have stored the FireEye Helix data on S3, you should choose the S3 data source.

 

 

 

 

Step 7 – A pop-up box will appear called New S3 data source. Type a data source name, and upload the manifest file you created. Next, push the Connect button.

Step 8 – You are now directed to the Visualize screen. For this exercise, choose a Pie chart; you can find it in the Visual types section by hovering over each icon and reading the tool tip that appears. After selecting the Pie chart visual type, two entries named Group/Color and Value will show up in the Field wells section at the top of the screen. Click the drop down in Group/Color and select the Alert_Desc column. Then click the drop down in Value, also select the Alert_Desc column, and choose Count as the aggregate. This will create an informative visualization of the most common attacks based on the sample data shown previously in Step 1.

Figure 3: Visualization screen in QuickSight

Figure 3: Visualization screen in QuickSight

Clean up

If you created any resources in AWS for this solution, consider removing them and any example resources you deployed. This includes the FireEye Helix API keys, S3 buckets, Lambda functions, and QuickSight visualizations. This helps ensure that there are no unwanted recurring charges. For more information, review how to delete the CloudFormation stack.

Conclusion

This blog post showed you how to ingest security log data from FireEye Helix and store that data in Amazon S3. We also showed you how to create your own visualizations in QuickSight to better understand the overall security posture of your AWS environment. With this solution, you can perform deep analysis and share these insights among different audiences in your organization (such as, operations, security, and executive management).

 

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Field Notes: Implementing HA and DR for Microsoft SQL Server using Always On Failover Cluster Instance and SIOS DataKeeper

Post Syndicated from Sudhir Amin original https://aws.amazon.com/blogs/architecture/field-notes-implementing-ha-and-dr-for-microsoft-sql-server-using-always-on-failover-cluster-instance-and-sios-datakeeper/

To ensure high availability (HA) of Microsoft SQL Server in Amazon Elastic Compute Cloud (Amazon EC2), there are two options: Always On Failover Cluster Instance (FCI) and Always On availability groups. With a wide range of networking solutions such as VPN and AWS Direct Connect, you have options to further extend your HA architecture to another AWS Region to also meet your disaster recovery (DR) objectives.

You can also run asynchronous replication between Regions, or a combination of both synchronous replication between Availability Zones and asynchronous replication between Regions.

Choosing the right SQL Server HA solution depends on your requirements. If you have hundreds of databases, need to protect system databases that hold agent jobs, usernames, and passwords in addition to user-defined databases, or are looking to avoid the expense of SQL Server Enterprise Edition, then SQL Server FCI is the preferred option. If you already own SQL Server Enterprise Edition licenses, we recommend continuing with that option.

On-premises Always On FCI deployments typically require shared storage solutions such as a fiber channel storage area network (SAN). When deploying SQL Server FCI in AWS, a SAN is not a viable option. Although Amazon FSx for Microsoft Windows is the recommended storage option for SQL Server FCI in AWS, it does not support adding cluster nodes in different Regions. To build a SQL Server FCI that spans both Availability Zones and Regions, you can use a solution such as SIOS DataKeeper.

In this blog post, we will show you how SIOS DataKeeper solves both HA and DR requirements for customers by enabling a SQL Server FCI that spans both Availability Zones and Regions. Synchronous replication will be used between Availability Zones for HA, while asynchronous replication will be used between Regions for DR.

Architecture overview

We will create a three-node Windows Server Failover Cluster (WSFC) with the configuration shown in Figure 1 between two Regions. Virtual private cloud (VPC) peering was used to enable routing between the different VPCs in each Region.

High-level architecture showing two cluster nodes replicating synchronously between Availability Zones.

Figure 1 – High-level architecture showing two cluster nodes replicating synchronously between Availability Zones.

Walkthrough

SIOS DataKeeper is a host-based, block-level volume replication solution that integrates with WSFC to enable what is known as a SANLess cluster. DataKeeper runs on each cluster node and has two primary functions: data replication and cluster integration through the DataKeeper volume cluster resource.

The DataKeeper volume resource takes the place of the traditional Cluster Disk resource found in a SAN-based cluster. Upon failover, the DataKeeper volume resource controls the replication direction, ensuring the active cluster node becomes the source of the replication while all the remaining nodes become the target of the replication.

After the SANLess cluster is configured, it is managed through Windows Server Failover Cluster Manager just like its SAN-based counterpart. Configuration of DataKeeper can be done through the DataKeeper UI, CLI, or PowerShell. AWS CloudFormation templates can also be used for automated deployment, as shown in AWS Quick Start.

The following steps explain how to configure a three-node SQL Server SANless cluster with SIOS DataKeeper.

Prerequisites

  • IAM permissions to create VPCs and VPC Peering connection between them.
  • A VPC that spans two Availability Zones and includes two private subnets to host Primary and Secondary SQL Server.
  • Another VPC with a single Availability Zone that includes one private subnet to host the tertiary SQL Server node for disaster recovery.
  • Subscribe to the SIOS DataKeeper Cluster Edition AMI in the AWS Marketplace, or sign up for FTP download for SIOS Cluster Edition software.
  • An existing deployment of Active Directory with networking access to support the Windows Failover Cluster and SQL Server deployment. Active Directory can be deployed on EC2 with AWS Launch Wizard for Active Directory or through our Quick Start for Active Directory.
  • Active Directory Domain administrator account credentials.

Configuring a three-node cluster

Step 1: Configure the EC2 instances

It’s good practice to record the system and network configuration details, as shown in Figure 2. Indicate the hostname, volume information, and networking configuration of each cluster node. Each node requires a primary IP address and two secondary IP addresses: one for the core cluster resource and one for the SQL Server client listener.

The example shown in Figure 2 uses a single volume; however, it is completely acceptable to use multiple volumes in your SQL Server FCI.

Figure 2 – Shows the storage, network, and AWS configuration details of all three cluster nodes

Figure 2 – Shows the storage, network, and AWS configuration details of all three cluster nodes

Due to Region and Availability Zone constructs, each cluster node will reside in a different subnet. A cluster that spans multiple subnets is referred to as a multi-subnet failover cluster. Refer to Create a failover cluster and SQL Server Multi-Subnet Clustering to learn more.

Step 2: Join the domain

With properly configured networking and security groups, DNS name resolution, and Active Directory credentials with required permissions, join all the cluster nodes to the domain. You can use AWS Managed Microsoft AD or manage your own Microsoft Active Directory domain controllers.

Step 3: Create the basic cluster

  1. Install the Failover Clustering feature and its prerequisite software components on all three nodes using PowerShell, shown in the following example.
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
  2. Run cluster validation using PowerShell, shown in the following example.
    Test-Cluster -Node wsfcnode1.awslab.com, wsfcnode2.awslab.com, drnode.awslab.com
    NOTE: The Windows PowerShell cmdlet Test-Cluster tests the underlying hardware and software, directly and individually, to obtain an accurate assessment of how well Failover Clustering can be supported in a given configuration.
  3. Create the cluster using PowerShell, shown in the following example.
    New-Cluster -Name WSFCCluster1 -NoStorage -StaticAddress 10.0.0.101, 10.0.32.101, 11.0.0.101 -Node wsfcnode1.awslab.com, wsfcnode2.awslab.com, drnode.awslab.com
  4. For more information using PowerShell for WSFC, view Microsoft’s documentation.
  5. It is always best practice to configure a file share witness and add it to your cluster quorum. You can use Amazon FSx for Windows File Server to provide a Server Message Block (SMB) share, or you can create a file share on another domain-joined server in your environment. See the Microsoft documentation on cluster quorum for more details.

A successful implementation will show all three nodes UP (shown in Figure 3) in the Failover Cluster Manager.

Figure 3 – Shows the status of all three nodes participating in the Windows Failover Cluster

Figure 3 – Shows the status of all three nodes participating in the Windows Failover Cluster

Step 4: Configure SIOS DataKeeper

After the basic cluster is created, you are ready to proceed with the DataKeeper configuration. Below are the basic steps to configure DataKeeper. For more detailed instructions, review the SIOS documentation.

Step 4.1: Install DataKeeper on all three nodes

  • After you sign up for a free demo, SIOS provides you with an FTP URL to download the software package of SIOS Datakeeper Cluster edition. Use the FTP URL to download and run the latest software release.
  • Choose Next, and read and accept the license terms.
  • By Default, both components SIOS Datakeeper Server Components and SIOS DataKeeper User Interface will be selected. Choose Next to proceed.
  • Accept the default install location, and choose Next.
  • Next, you will be prompted to accept the system configuration changes to add firewall exceptions for the required ports. Choose Yes to enable.
  • Enter credentials for SIOS Service account, and choose Next.
  • Finally, choose Yes, and then Finish, to restart your system.

Step 4.2: Relocate the intent log (bitmap) file to the ephemeral drive

  • Relocate the bitmap file to the ephemeral drive (confirm the ephemeral drive uses Z:\ for its drive letter).
  • SIOS DataKeeper uses an intent log (also referred to as a bitmap file) to track changes made to the source, or to the target volume during times that the target is unlocked. SIOS recommends using ephemeral storage as the preferred location for the intent log to minimize the performance impact associated with it. By default, the SIOS InstallShield Wizard stores the bitmap at: “C:\Program Files (x86)\SIOS\DataKeeper\Bitmaps”.
  • To change the intent log location, make the following registry change using regedit: “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ExtMirr\Parameter”
  • Modify the “BitmapBaseDir” parameter by changing it to the new location (for example, Z:\), and then reboot your system.
  • To ensure the ephemeral drive is reattached upon reboot, specify volume settings in the DriveLetterMappingConfig.json, and save the file under the location.
    “C:\ProgramData\Amazon\EC2-Windows\Launch\Config\DriveLetterMappingConfig.json”.

    { 
      "driveLetterMapping": [
      {
       "volumeName": "Temporary Storage 0",
       "driveLetter": "Z"
      },
      {
       "volumeName": "Data",
        "driveLetter": "D"
      }
     ]
    }
    
  • Next, open Windows PowerShell and use the following command to run the EC2Launch script that initializes the disks:
    C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeDisks.ps1 -Schedule
    For more information, see the EC2Launch documentation.
    NOTE: If you are using the SIOS DataKeeper Marketplace AMI, you can skip the previous steps for installing the SIOS software and relocating the bitmap file, and continue with the following steps.

Step 4.3: Create a DataKeeper job

  • Open SIOS DataKeeper application. From the right-side Actions panel, select Create Job.
  • Enter Job name and Job description, and then choose Create Job.
  • A create mirror wizard will open. In this job, we will set up a mirror relationship between WSFCNode1(Primary) and WSFCNode2(Secondary) within us-east-1 using synchronous replication.
    Select the source WSFCNode1.awscloud.com and volume D. Choose Next to proceed.

    NOTE: The volume on both the source and target systems must be NTFS file system type, and the target volume must be greater than or equal to the size of the source volume.
  • Choose the target WSFCNODE2.awscloud.com and correctly-assigned volume drive D.
  • Now, select how the source volume data should be sent to the target volume.
    For our design, choose Synchronous replication for nodes within us-east-1.
  • After you choose Done, you will be prompted to register the volume into Windows Failover Cluster as a clustered datakeeper volume as a storage resource. When prompted, choose Yes.
  • After you have successfully created the first mirror you will see the status of the mirror job (as shown in Figure 4).
    If you have additional volumes, repeat the previous steps for each additional volume.

    Figure 4 – Shows the SIOS DataKeeper job summary of the mirrored volume between WSFCNODE1 & WSFCNODE2

    Figure 4 – Shows the SIOS DataKeeper job summary of the mirrored volume between WSFCNODE1 & WSFCNODE2


Step 4.4: Add another mirror pair to the same job using asynchronous replication between nodes residing in different AWS Regions (us-east-1 and us-east-2)

  • Open the create mirror wizard from the right-side action pane.
  • Choose the source WSFCNode1.awscloud.com and volume D.
  • For Server, enter DRNODE.awscloud.com.
  • After you are connected, select the target DRNODE.awscloud.com and assigned volume D.
  • For, How should the source volume data be sent to the target volume?, choose Asynchronous replication for nodes between us-east-1 and us-east-2.
  • Choose Done to finish, and a pop-up window will provide you with additional information on action to take in the event of failover.
  • You configured WSFCNode1.awscloud.com to mirror asynchronously to DRNODE.awscloud.com. However, during the event of failover, WSFCNode2.awscloud.com will become the new owner of the Failover Cluster and therefore own the source volume. This function of SIOS DataKeeper, called switchover, automatically changes the source of the mirror to the new owner of the Windows Failover Cluster. For our example, SIOS DataKeeper will automatically perform a switchover function to create a new mirror pair between WSFCNODE2.awscloud.com and DRNODE.awscloud.com. Therefore, select Asynchronous mirror type, and choose OK. For more information, see Switchover and Failover with Multiple Targets.
  • A successfully-configured job will appear under Job Overview.

 

If you prefer to script the process of creating the DataKeeper job, and adding the additional mirrors, you can use the DataKeeper CLI as described in How to Create a DataKeeper Replicated Volume that has Multiple Targets via CLI.

If you have done everything correctly, the DataKeeper UI will show both mirror pairs under the job you created.

Furthermore, Failover Cluster Manager will show the DataKeeper Volume D resource as online and registered.

Step 5: Install the SQL Server FCI

  • Install SQL Server on WSFCNode1 with the New SQL Server failover cluster installation option in the SQL Server installer.
  • Install SQL Server on WSFCNode2 with the Add node to a SQL Server failover cluster option in the SQL Server installer.
  • If you are using SQL Server Enterprise Edition, install SQL Server on DRNode using the “Add Node to Existing Cluster” option in the SQL Server installer.
  • If you are using SQL Server Standard Edition, you will not be able to add the DR node to the cluster. Follow the SIOS documentation: Accessing Data on the Non-Clustered Disaster Recovery Node.

For detailed information on installing a SQL Server FCI, visit SQL Server Failover Cluster Installation.

At the end of the installation, the SQL Server clustered role will appear in Failover Cluster Manager.

Step 6: Client redirection considerations

When you install SQL Server into a multi-subnet cluster, RegisterAllProvidersIP is set to true. With that setting enabled, your DNS Server will register three Name Records (A), each using the name that you used when you created the SQL Listener during the SQL Server FCI installation. Each Name (A) record will have a different IP address, one for each of the cluster IP addresses that are associated with that cluster name resource.

If your clients are using .NET Framework 4.6.1 or later, the clients will manage the cross-subnet failover automatically. If your clients are using .NET Framework 4.5 through 4.6, then you will need to add MultiSubnetFailover=true to your connection string.

If you have an older client that does not allow you to add the MultiSubnetFailover=true parameter, then you will need to change the way that the failover cluster behaves. First, set the RegisterAllProvidersIP to false, so that only one Name (A) record will be entered in DNS. Each time the cluster fails over the name resource, the Name (A) record in DNS will be updated with the cluster IP address associated with the node coming online. Finally, adjust the TTL of the Name (A) record so that the clients do not have to wait the default 20 minutes before the TTL expires and the IP address is refreshed.

In the following PowerShell, you can see how to change the RegisterAllProvidersIP setting and update the TTL to 30 seconds.

Get-ClusterResource "sql name resource" | Set-ClusterParameter RegisterAllProvidersIP 0
Get-ClusterResource "sql name resource" | Set-ClusterParameter HostRecordTTL 30

Cleanup

When you are finished, you can terminate all three EC2 instances you launched.

Conclusion

Deploying a SQL Server FCI, that includes both Availability Zones and AWS Regions, is ideal for a solution that includes HA and DR. Although AWS provides the necessary infrastructure in terms of compute and networking, it is the combination of SIOS DataKeeper with WSFC that makes this configuration possible.

The solution described in this blog post works with all supported versions of Windows Failover Cluster and SQL Server. Other configurations, such as hybrid cloud and multi-cloud, are also possible. If adequate networking is in place and Windows Server is running, where the cluster nodes reside is entirely up to you.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Building an automated scene detection pipeline for Autonomous Driving – ADAS Workflow

Post Syndicated from Kevin Soucy original https://aws.amazon.com/blogs/architecture/field-notes-building-an-automated-scene-detection-pipeline-for-autonomous-driving/

A Field Notes blog post in 2020 explained how to build an Autonomous Driving Data Lake using this Reference Architecture. Many organizations face the challenge of ingesting, transforming, labeling, and cataloging massive amounts of data to develop automated driving systems. In the related re:Invent session, we explored an architecture to solve this problem using Amazon EMR, Amazon S3, Amazon SageMaker Ground Truth, and more. You will learn how the BMW Group collects more than 1 billion km of anonymized perception data from its worldwide connected fleet of customer vehicles to develop safe and performant automated driving systems.

Architecture Overview

The objective of this post is to describe how to design and build an end-to-end scene detection pipeline for autonomous driving data.

This architecture integrates an event-driven ROS bag ingestion pipeline running Docker containers on Elastic Container Service (ECS). This includes a scalable batch processing pipeline based on Amazon EMR and Spark. The solution also leverages AWS Fargate, Spot Instances, Elastic File System, AWS Glue, S3, and Amazon Athena.

reference architecture - build automated scene detection pipeline - Autonomous Driving

Figure 1 – Architecture Showing how to build an automated scene detection pipeline for Autonomous Driving

The data included in this demo was produced by one vehicle across four different drives in the United States. Because the ROS bag files produced by the vehicle’s on-board software contain very complex data, such as Lidar point clouds, the files are usually very large (1+ TB files are not uncommon).

These files usually need to be split into smaller chunks before being processed, as is the case in this demo. These files also may need to have post-processing algorithms applied to them, like lane detection or object detection.

In our case, the ROS bag files are split into approximately 10GB chunks and include topics for post-processed lane detections before they land in our S3 bucket. Our scene detection algorithm assumes the post processing has already been completed. The bag files include object detections with bounding boxes, and lane points representing the detected outline of the lanes.

Prerequisites

This post uses an AWS Cloud Development Kit (CDK) stack written in Python. You should follow the instructions in the AWS CDK Getting Started guide to set up your environment so you are ready to begin.

You can also use the config.json to customize the names of your infrastructure items, to set the sizing of your EMR cluster, and to customize the ROS bag topics to be extracted.

You will also need to be authenticated into an AWS account with permissions to deploy resources before executing the deploy script.

Deployment

The full pipeline can be deployed with one command: `bash deploy.sh deploy true`. The progress of the deployment can be followed on the command line, and also in the CloudFormation section of the AWS console. Once deployed, you must upload two or more bag files to the rosbag-ingest bucket to initiate the pipeline.

The default configuration requires two bag files to be processed before an EMR pipeline is initiated. You also have to manually initiate the AWS Glue Crawler to be able to explore the parquet data with tools like Athena or QuickSight.
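
The crawler can be started from the console or from the command line; in the following sketch the crawler name is a placeholder, because the actual name depends on your config.json settings.

```bash
# Start the Glue Crawler that catalogs the extracted ROS bag topic parquet files.
aws glue start-crawler --name <rosbag-topics-crawler-name>
```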

ROS bag ingestion with ECS Tasks, Fargate, and EFS

This solution provides an end-to-end scene detection pipeline for ROS bag files, ingesting the ROS bag files from S3, and transforming the topic data to perform scene detection in PySpark on EMR. This then exposes scene descriptions via DynamoDB to downstream consumers.

The pipeline starts with an S3 bucket (Figure 1 – #1) where incoming ROS bag files can be uploaded from local copy stations as needed. We recommend using AWS Direct Connect for a private, high-throughput connection to the cloud.

This ingestion bucket is configured to initiate S3 notifications each time an object ending in the prefix “.bag” is created. An AWS Lambda function then initiates a Step Function for orchestrating the ECS Task. This passes the bucket and bag file prefix to the ECS task as environment variables in the container.

The ECS Task (Figure 1 – #2) runs serverless, leveraging Fargate as the capacity provider. This avoids the need to provision and autoscale EC2 instances in the ECS cluster. Each ECS Task processes exactly one bag file. We use Amazon Elastic File System (Amazon EFS) to provide virtually unlimited file storage to the container, in order to easily work with larger bag files. The container uses the open-source bagpy Python library to extract structured topic data (for example, GPS, detections, and inertial measurement data). The topic data is uploaded as parquet files to S3, partitioned by topic and source bag file. The application writes metadata about each file, such as the topic names found in the file and the number of messages per topic, to a DynamoDB table (Figure 1 – #4).

This module deploys an AWS Glue crawler configured to crawl this bucket of topic parquet files. These files populate the AWS Glue Data Catalog with the schemas of each topic table and make this data accessible in Amazon Athena, AWS Glue jobs, Amazon QuickSight, and Spark on EMR. We use the AWS Glue Data Catalog (Figure 1 – #5) as a permanent Hive Metastore.

Glue Data Catalog of parquet datasets on S3

Figure 2 – Glue Data Catalog of parquet datasets on S3

 

Run ad-hoc queries against the Glue tables using Amazon Athena

Figure 3 – Run ad-hoc queries against the Glue tables using Amazon Athena

The topic parquet bucket also has an S3 notification configured for all newly created objects, which is consumed by an EMR-trigger Lambda function (Figure 1 – #5). This Lambda function is responsible for keeping track of bag files and their respective parquet files in DynamoDB (Figure 1 – #6). Once in DynamoDB, bag files are assigned to batches, which initiates the EMR batch processing step function. Metadata about each batch, including the step function execution ARN, is stored in DynamoDB.

EMR pipeline orchestration with AWS Step Functions

Figure 4 – EMR pipeline orchestration with AWS Step Functions

The EMR batch processing step function (Figure 1 – #7) orchestrates the entire EMR pipeline, from provisioning an EMR cluster using the open-source EMR-Launch CDK library, to submitting PySpark steps to the cluster, to terminating the cluster and handling failures.

Batch Scene Analytics with Spark on EMR

There are two PySpark applications running on our cluster. The first performs synchronization of ROS bag topics for each bag file. As the various sensors in the vehicle have different frequencies, we synchronize them to a uniform frequency of one signal per 100 ms per sensor. This makes it easier to work with the data.

We compute the minimum and maximum timestamp in each bag file, and construct a unified timeline. For each 100 ms we take the most recent signal per sensor and assign it to the 100 ms timestamp. After this is performed, the data looks more like a normal relational table and is easier to query and analyze.
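
As an illustration of this synchronization step, the following PySpark sketch buckets each signal into a 100 ms interval and keeps the most recent message per sensor; the column names (a millisecond timestamp, sensor, value) are assumptions, not the actual topic schema.

# synchronize topic data onto a unified 100 ms timeline
from pyspark.sql import functions as F, Window

def synchronize(topics):
    # assign each message to a 100 ms bucket on the unified timeline
    df = topics.withColumn("bucket", (F.col("timestamp") / 100).cast("long") * 100)
    # keep only the most recent message per sensor within each bucket
    w = Window.partitionBy("sensor", "bucket").orderBy(F.col("timestamp").desc())
    latest = (
        df.withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn")
    )
    # pivot so each sensor becomes a column, one row per 100 ms timestamp
    return latest.groupBy("bucket").pivot("sensor").agg(F.first("value"))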

Batch Scene Analytics with Spark on EMR

Figure 5 – Batch Scene Analytics with Spark on EMR

Scene Detection and Labeling in PySpark

The second spark application enriches the synchronized topic dataset (Figure 1 – #8), analyzing the detected lane points and the object detections. The goal is to perform a simple lane assignment algorithm for objects detected by the on-board ML models and to save this enriched dataset (Figure 1 – #9) back to S3 for easy-access by analysts and data scientists.

Object Lane Assignment Example

Figure 6 – Object Lane Assignment example

 

Synchronized topics enriched with object lane assignments

Figure 7 – Synchronized topics enriched with object lane assignments
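
One way to think about the lane assignment described above is sketched below; it assumes lane boundaries are available as sorted lateral offsets at the object's position and that each detection carries a bounding-box center, which is a simplification of the actual algorithm.

# simplified lane assignment: find the pair of lane boundaries enclosing the object
def assign_lane(object_center_x, lane_boundaries):
    """Return the 1-based index of the lane whose boundaries enclose the object center.

    lane_boundaries is a sorted list of lateral offsets for the detected lane lines,
    for example [-5.2, -1.7, 1.8, 5.3]; an object at x=0.4 falls into lane 2.
    """
    for i in range(len(lane_boundaries) - 1):
        if lane_boundaries[i] <= object_center_x < lane_boundaries[i + 1]:
            return i + 1
    return None  # object is outside all detected lanes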

Finally, the last step takes this enriched dataset (Figure 1 – #9) to summarize specific scenes or sequences where a person was identified as being in a lane. The output of this pipeline includes two new tables as parquet files on S3 – the synchronized topic dataset (Figure 1 – #8) and the synchronized topic dataset enriched with object lane assignments (Figure 1 – #9), as well as a DynamoDB table with scene metadata for all person-in-lane scenarios (Figure 1 – #10).

Scene Metadata

The Scene Metadata DynamoDB table (Figure 1 – #10) can be queried directly to find sequences of events, as will be covered in a follow-up post for visually debugging scene detection algorithms using WebViz/RViz. Using WebViz, we were able to detect that the on-board object detection model labels Crosswalks and Walking Signs as “person” even when a person is not crossing the street, for example:

Example DynamoDB item from the Scene Metadata table

Figure 8 – Example DynamoDB item from the Scene Metadata table

These scene descriptions can also be converted to Open Scenario format and pushed to an Elasticsearch cluster to support more complex scenario-based searches, for example, for downstream simulation use cases or for visualization in QuickSight. An example of syncing DynamoDB tables to Elasticsearch using DynamoDB Streams and Lambda can be found here (https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/). Because DynamoDB is a NoSQL data store, we can enrich the Scene Metadata table with additional scene parameters, for example, the maximum or minimum speed of the car during the identified event sequence, without worrying about breaking schema changes. It is also straightforward to save a DataFrame from PySpark to DynamoDB using open-source libraries.
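
For example, downstream consumers could pull all person-in-lane scenes with a query like the following sketch; the table name, key schema, and attribute names are assumptions and will differ in the deployed stack.

# scan the scene metadata table for person-in-lane scenes
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("scene-metadata")  # hypothetical table name

def person_in_lane_scenes():
    """Return all scenes flagged as a person being in a lane."""
    response = table.scan(FilterExpression=Attr("scene_type").eq("person_in_lane"))
    return response["Items"]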

As a final note, the modules are built to be exactly that, modular. The three modules that are easily isolated are:

  1. the ECS Task pipeline for extracting ROS bag topic data to parquet files
  2. the EMR Trigger Lambda for tracking incoming files, creating batches, and initiating a batch processing step function
  3. the EMR Pipeline for running PySpark applications leveraging Step Functions and EMR Launch

Clean Up

To clean up the deployment, you can run `bash deploy.sh destroy false`. Some resources, like S3 buckets and DynamoDB tables, may have to be manually emptied and deleted via the console to be fully removed.

Limitations

The bagpy library used in this pipeline does not yet support complex or non-structured data types like images or lidar data. Therefore, its usage is limited to data that can be stored in a tabular CSV format before being converted to parquet.

Conclusion

In this post, we showed how to build an end-to-end scene detection pipeline at scale on AWS to perform scene analytics and scenario detection with Spark on EMR from raw vehicle sensor data. In a subsequent blog post, we will cover how to extract and catalog images from ROS bag files, create a labeling job with Amazon SageMaker Ground Truth, and then train a machine learning model to detect cars.

Recommended Reading: Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

Field Notes: Deploying Autonomous Driving and ADAS Workloads at Scale with Amazon Managed Workflows for Apache Airflow

Post Syndicated from Hendrik Schoeneberg original https://aws.amazon.com/blogs/architecture/field-notes-deploying-autonomous-driving-and-adas-workloads-at-scale-with-amazon-managed-workflows-for-apache-airflow/

Cloud architects developing autonomous driving and ADAS workflows are challenged by loosely distributed process steps along the tool chain in hybrid environments. This is compounded by the need to create a holistic view of all running pipelines and jobs. Common challenges include:

  • finding and getting access to the data sources specific to your use case, understanding the data, moving and storing it,
  • cleaning and transforming it, and
  • preparing it for downstream consumption.

Verifying that your data pipeline works correctly and ensuring a certain data quality while your application is being developed adds to the complexity. Furthermore, the fact that the data itself constantly changes poses more challenges. In this blog post, we show you the steps to build and run a workflow in an Amazon Managed Workflows for Apache Airflow (MWAA) environment. All required infrastructure is created using the AWS Cloud Development Kit (CDK).

We illustrate how image data collected during field operational tests (FOTs) of vehicles can be processed by:

  • detecting objects within the video stream
  • adding the bounding boxes of all detected objects to the image data
  • providing visual and metadata information for data cataloguing as well as potential applications on the bounding boxes.

We define a workflow that processes ROS bag files on Amazon S3, and extracts individual PNGs from a video stream using AWS Fargate on Amazon Elastic Container Service (Amazon ECS). Then we show how to use Amazon Rekognition to detect objects within the exported images and draw bounding boxes for the detected objects within the images. Finally, we demonstrate how users’ interface with MWAA to develop and publish their workflows. We also show methods to keep track of the performance and costs of your workflows.

Overview of the solution

Deploying Autonomous Driving & ADAS workloads at scale

Figure 1 – Architecture for deploying Autonomous Driving & ADAS workloads at scale

The preceding diagram shows our target solution architecture. It includes five components:

  • Amazon MWAA environment: we use Amazon MWAA to orchestrate our image processing workflow’s tasks. A workflow, or DAG (directed acyclic graph), is a set of task definitions and dependencies written in Python. We create a MWAA environment using CDK and define our image processing pipeline as a DAG which MWAA can orchestrate.
  • ROS bag file processing workflow: this is a workflow run on Managed Workflows for Apache Airflow that detects and highlights objects within the camera stream of a ROS bag file (a minimal DAG sketch follows this list). The workflow is as follows:
    • Monitors the location in Amazon S3 for the new ROS bag files.
    • If a new file gets uploaded, we extract video data from the ROS bag file’s topic that holds the video stream, extract individual PNGs from the video stream and store them on S3.
    • We use AWS Fargate on Amazon ECS to run a containerized image extraction workload.
    • Once the extracted images are stored on S3, they are passed to Amazon Rekognition for object detection. We then provide an image or video to the Amazon Rekognition API, and the service can identify objects, people, text, scenes, and activities. Amazon Rekognition Image can return the bounding box for many object labels.
    • We use the information to highlight the detected objects within the image by drawing the bounding box.
    • The bounding boxes of the objects Amazon Rekognition detects within the extracted PNGs are rendered directly into the image using Pillow, a fork of the Python Imaging Library (PIL).
    • The resulting images are then stored in Amazon S3 and our workflow is complete.
  • Predefined DAGs: Our solution comes with a pre-defined ROS bag image processing DAG that will be copied into the Amazon MWAA environment with CDK.
  • AWS CodePipeline for automated CI/CD pipeline: CodePipeline automates the build, test, and deploy phases of your release process every time there is a code change, based on the release model you define.
  • Amazon CloudWatch for monitoring and operational dashboard: With Amazon CloudWatch, we can monitor important metrics of the ROS bag image processing workflow, for example, to verify the compute load, the number of running workflows or tasks, or to monitor and alert on processing errors.
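
Purely as an illustration of how such a workflow can be expressed, the following is a minimal DAG sketch assuming Airflow 2.x-style imports; the task callables are placeholders, not the solution's actual implementation, and only the dag_id matches the deployed pipeline.

# minimal sketch of a three-step ROS bag processing DAG
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_images(**context):
    # placeholder: run the Fargate task that exports PNGs from the ROS bag video topic
    pass

def detect_objects(**context):
    # placeholder: call Amazon Rekognition for each exported image
    pass

def draw_bounding_boxes(**context):
    # placeholder: render the returned bounding boxes into the images with Pillow
    pass

with DAG(
    dag_id="rosbag_processing",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,   # enabled and triggered from the Airflow UI
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_images", python_callable=extract_images)
    detect = PythonOperator(task_id="detect_objects", python_callable=detect_objects)
    draw = PythonOperator(task_id="draw_bounding_boxes", python_callable=draw_bounding_boxes)

    extract >> detect >> draw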

Prerequisites

Deployment

To deploy and run the solution, we follow these steps:

  1. Configure and create an MWAA environment in AWS using CDK
  2. Copy the ROS bag file to S3
  3. Manage the ROS bag processing DAG from the Airflow UI
  4. Inspect the results
  5. Monitor the operations dashboard

Each component illustrated in the solution architecture diagram, as well as all required IAM roles and configuration parameters are created from an AWS CDK application. This is included in these example sources.

1. Configure and create an MWAA environment in AWS using CDK

  • Download the deployment templates
  • Extract the zip file to a directory
  • Open a shell and go to the extracted directory
  • Start the deployment by typing the following code:
./deploy.sh <named-profile> deploy true <region>
  • Replace <named-profile> with the named profile which you configured for your AWS CLI to point to your target AWS account.
  • Replace <region> with the Region that is associated with your named profile, for example, us-east-1.
  • As an example: if your named profile for the AWS CLI is called rosbag-processing and your associated Region is us-east-1 the resulting command would be:
./deploy.sh rosbag-processing deploy true us-east-1

After confirming the deployment, the CDK will take several minutes to synthesize the CloudFormation templates, build the Docker image and deploy the solution to the target AWS account.

Figure 2 - Screenshot of Rosbag processing stack

Figure 2 – Screenshot of Rosbag processing stack

Inspect the CloudFormation stacks being created in the AWS console. Sign in to your AWS console and select the CloudFormation service in the AWS Region that you chose for your named profile (for example, us-east-1). After the installation script has run, the CloudFormation stack overview in the console should show the following stacks:

Figure 3 - Screenshot of CloudFormation stack overview

Figure 3 – Screenshot of CloudFormation stack overview

2. Copy the ROS bag file to S3

  • You can download the ROS bag file here. After downloading you need to copy the file to the input bucket that was created during the CDK deployment in the previous step.
  • Open the AWS management console and navigate to the service S3. You should find a list of your buckets including the buckets for input data, output data and DAG definitions that were created during the CDK deployment.
Figure 4 - Screenshot showing S3 buckets created

Figure 4 – Screenshot showing S3 buckets created

  • Select the bucket with prefix rosbag-processing-stack-srcbucket and copy the ROS bag file into the bucket.
  • Now that the ROS bag file is in S3, we need to enable the image processing workflow in MWAA.

3. Manage the ROS bag processing DAG from the Airflow UI

  • In order to access the Airflow web server, open the AWS management console and navigate to the MWAA service.
  • From the overview, select the MWAA environment that was created and select Open Airflow UI as shown in the following screenshot:
Figure 5 - Screenshot showing Airflow Environments

Figure 5 – Screenshot showing Airflow Environments

The Airflow UI and the ROS bag image processing DAG should appear as in the following screenshot:

  • We can now enable the DAG by flipping the switch (1) from “Off” to “On” position. Amazon MWAA will now schedule the launch for the workflow.
  • By selecting the DAG’s name (rosbag_processing) and selecting the Graph View, you can review the workflow’s individual tasks:
Figure 6 - The Airflow UI and the ROS bag image processing DAG

Figure 6 – The Airflow UI and the ROS bag image processing DAG

Figure 7 - Dag Workflow

Figure 7 – Dag Workflow

After enabling the image processing DAG by flipping the switch to “on” in the previous step, Airflow will detect the newly uploaded ROS bag file in S3 and start processing it. You can monitor the progress from the DAG’s tree view (1), as shown in the following screenshot:

Figure 8 - Monitor progress in the DAG's tree view

Figure 8 – Monitor progress in the DAG’s tree view

Re-running the image processing pipeline

Our solution uses S3 metadata to keep track of the processing status of each ROS bag file in the input bucket in S3. If you want to re-process a ROS bag file:

  • open the AWS management console,
  • navigate to S3, click on the ROS bag file you want to re-process,
  • remove the file’s metadata tag processing.status from the properties tab:
Figure 9 - Metadata tag

Figure 9 – Metadata tag

With the metadata tag processing.status removed, Amazon MWAA picks up the file the next time the image processing workflow is launched. You can manually start the workflow by choosing Trigger DAG (1) from the Airflow web server UI.
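
If you prefer to script this reset instead of using the console, the following sketch rewrites the object with empty user metadata, which drops the processing.status tag; bucket and key names are placeholders.

# copy the object in place with replaced (empty) user metadata
import boto3

s3 = boto3.client("s3")

def reset_processing_status(bucket, key):
    """Rewrite the object without user metadata so the DAG picks it up again."""
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata={},                     # drops processing.status
        MetadataDirective="REPLACE",
    )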

Figure 10 - Manually start the workflow by choosing Trigger DAG

Figure 10 – Manually start the workflow by choosing Trigger DAG

4. Inspect the results

In the AWS management console navigate to S3, and select the output bucket with prefix rosbag-processing-stack-destbucket. The bucket holds both the extracted images from the ROS bag’s topic containing the video data (1) and the images with highlighted bounding boxes for the objects detected by Amazon Rekognition (2).

Figure 11 - Output S3 buckets

Figure 11 – Output S3 buckets

  • Navigate to the folder 20201005 and select an image,
  • Navigate to the folder bounding_boxes
  • Select the corresponding image. You should receive an output similar to the following two examples:
Left: Image extracted from ROS bag camera feed topic. Right: Same image with highlighted objects that Amazon Rekognition detected.

Figure 12 – Left: Image extracted from ROS bag camera feed topic. Right: Same image with highlighted objects that Amazon Rekognition detected.

Amazon CloudWatch for monitoring and creating operational dashboard

Amazon MWAA comes with CloudWatch metrics to monitor your MWAA environment. You can create dashboards for quick and intuitive operational insights. To do so:

  • Open the AWS management console and navigate to the service CloudWatch.
  • Select the menu “Dashboards” from the navigation pane and click on “Create dashboard”.
  • After providing a name for your dashboard and selecting a widget type, you can choose from a variety of metrics that MWAA provides, as shown in the following picture.

MWAA comes with a range of pre-defined metrics that allow you to monitor failed tasks and the health of your managed Airflow environment, and that help you right-size your environment.

Figure 13 - CloudWatch metrics in a dashboard help to monitor your MWAA environment.

Figure 13 – CloudWatch metrics in a dashboard help to monitor your MWAA environment.

Clean Up

In order to delete the deployed solution from your AWS account, you must:

1. Delete all data in your S3 buckets

  • In the management console navigate to the service S3. You should see three buckets containing rosbag-processing-stack that were created during the deployment.
  • Select each bucket and delete its contents. Note: Do not delete the bucket itself.

2. Delete the solution’s CloudFormation stack

You can delete the deployed solution from your AWS account either from the management console or the command line interface.

Using the management console:

  • Open the management console and navigate to the CloudFormation service. Make sure to select the same Region you deployed your solution to (for example, us-east-1).
  • Select stacks from the side menu. The solution’s CloudFormation stack is shown in the following screenshot:
Figure 14 - Cleaning up the CloudFormation Stack

Figure 14 – Cleaning up the CloudFormation Stack

Select the rosbag-processing-stack and select Delete. CloudFormation will now destroy all resources related to the ROS bag image processing pipeline.

Using the Command Line Interface (CLI):

Open a shell and navigate to the directory from which you started the deployment, then type the following:

cdk destroy --profile <named-profile>

Replace <named-profile> with the named profile which you configured for your AWS CLI to point to your target AWS account.

As an example: if your named profile for the AWS CLI is called rosbag-processing and your associated Region is us-east-1 the resulting command would be:

cdk destroy --profile rosbag-processing

After confirming that you want to destroy the CloudFormation stack, the solution’s resources will be deleted from your AWS account.

Conclusion

This blog post illustrated how to deploy and leverage workflows for running Autonomous Driving & ADAS workloads at scale. This solution was designed using fully managed AWS services. The solution was deployed on AWS using AWS CDK templates. Using MWAA we were able to set up a centralized, transparent, intuitive workflow orchestration across multiple systems. Even though the pipeline involved several steps in multiple AWS services, MWAA helped us understand and visualize the workflow, its individual tasks and dependencies.

Usually, distributed systems are hard to debug in case of errors or unexpected outcomes. However, building the ROS bag processing pipelines on MWAA allowed us to inspect the log files involved centrally from the Airflow UI, for debugging or auditing. Furthermore, with MWAA the ROS bag processing workflow is part of the code base and can be tested, refactored, and versioned just like any other software artifact. By providing the CDK templates, we were able to architect a fully automated solution which you can adapt to your use case.

We hope you found this interesting and helpful and invite your comments on the solution!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Building an Automated Image Processing and Model Training Pipeline for Autonomous Driving

Post Syndicated from Antonia Schulze original https://aws.amazon.com/blogs/architecture/field-notes-building-an-automated-image-processing-and-model-training-pipeline-for-autonomous-driving/

In this blog post, we demonstrate how to build an automated and scalable data pipeline for autonomous driving. This solution was built with the goal of accelerating the process of analyzing recorded footage and training a model to improve the experience of autonomous driving.

We will demonstrate the extraction of images from ROS bag files, using Amazon Rekognition to label the images for cataloging, and build a searchable database using Amazon DynamoDB. This is so we can find relevant images for training computer vision machine learning (ML) algorithms. Next, we show you how to use the database to find suitable images, create a labeling job with Amazon SageMaker Ground Truth, and train a machine learning model to detect cars. The following diagram shows the architecture for this solution.

Overview of the solution

Figure 1 - Architecture Showing how to build an automated Image Processing and Model Training pipeline

Figure 1 – Architecture Showing how to build an automated Image Processing and Model Training pipeline

Prerequisites

This post uses an AWS Cloud Development Kit (AWS CDK) stack written in Python. Follow the instructions in the CDK Getting Started guide to set up your environment.

Deployment

The full pipeline can be deployed with one command: `bash deploy.sh deploy true`. We can follow the progress of the deployment on the command line, but also in the CloudFormation section of the AWS console. Once the pipeline is deployed, we must upload bag files to the rosbag-ingest bucket to launch the pipeline. Once the pipeline has finished, we can clone the repository to the SageMaker Notebook instance ros-bag-demo-notebook.

Walkthrough

  • The Robot Operating System (ROS) is a collection of open source middleware, which provides tools and libraries for building robotic systems. The middleware uses a Publish/Subscribe (pub/sub) architecture, which can be used for the transportation of sensor data to any software modules, which need to operate on that data.
    • Each sensor publishes its data as a topic, and then any module which needs that data subscribes to that topic.
  • This Pub/Sub architecture lends itself well to recording data from multiple sensors of varying modalities (camera, LIDAR, RADAR) into a single file which can be replayed for testing and diagnostic purposes. ROS supports this capability with its ROS bag module which stores data in an ROS bag format file.
    • An ROS bag file includes a collection of topics, each with a set of time-stamped messages. These files can be replayed on an ROS system, with the timestamps, ensuring that messages are published to the topics in real time and the order they were recorded.
  • The input for this example is a set of ROS bag files, each one is approximately 10 GB.
    • To extract the image data from our input ROS bag files, you create a Docker container based on an ROS image.
    • You then create an ROS launch configuration file to extract images to .png files based on the ROS bag tutorial instructions. The Docker container is stored in an Amazon Elastic Container Registry (Amazon ECR), ready to run as an AWS Fargate task.

AWS Fargate is a serverless compute engine for containers that works with both Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS). By using Fargate, we can create and run the Docker containers with our ROS environment, and have many containers running in parallel, each processing a single ROS bag file.

When you have the individual images, you need a way to assess their contents to build a searchable image catalog. This objective allows ML data scientists to search through the recorded images to find, for example, images containing pedestrians. The catalog can also be extended with data from other sources, such as weather data, location data, and so forth. You use Amazon Rekognition to process the images, and it helps add image and video analysis to your applications. When you provide an image or video to the Amazon Rekognition API, the service identifies objects, people, text, scenes, and activities. By requesting that Amazon Rekognition label each image, you receive a large amount of information to catalog the image.

The image ingestion pipeline is largely event driven. Many of the AWS services you use have limits on job concurrency and API access rates. To resolve these issues, you place all events into an Amazon Simple Queue Service (Amazon SQS) queue, invoke a Lambda function on the queue, and make the appropriate API call (for example, Amazon Rekognition DetectLabels). If the API call is successful, you delete the message from the queue; otherwise (for example, if the rate is exceeded) you exit the Lambda function and the message is returned to the queue. One benefit is that when service limits change, depending on the account configuration or Region, the pipeline will automatically scale to accommodate these changes.
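
A simplified sketch of such a queue-driven function is shown below; the message format, table name, and confidence threshold are assumptions for illustration, not the solution's exact implementation.

# queue-driven Lambda that labels one image per SQS message
import json

import boto3

rekognition = boto3.client("rekognition")
catalog = boto3.resource("dynamodb").Table("image-catalog")  # hypothetical table name

def handler(event, context):
    for record in event["Records"]:
        body = json.loads(record["body"])
        # if this call is throttled, the raised exception fails the invocation
        # and SQS returns the message to the queue for a later retry
        result = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": body["bucket"], "Name": body["key"]}},
            MinConfidence=70,
        )
        # store confidence scores as strings to avoid DynamoDB float restrictions
        catalog.put_item(Item={
            "image_key": body["key"],
            "labels": {l["Name"]: str(round(l["Confidence"], 2)) for l in result["Labels"]},
        })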

  • The pipeline is launched when an ROS bag file is uploaded to the Amazon Simple Storage Service (Amazon S3) bucket which has been configured to post an object creation event to an SQS queue.
  • A Lambda function is invoked from the SQS queue and it starts a Step Functions execution, which runs our dockerized container on a Fargate cluster. An extracted image is stored in an S3 bucket, which sends a message to a second SQS queue to start a Lambda function. The Lambda function calls the DetectLabels function of the Amazon Rekognition API, which returns labels for everything that Amazon Rekognition can detect in the scene.
  • This also includes the confidence level for each label. The labels and confidence scores are stored in a DynamoDB data catalog table. You can query all images for specific objects that you are interested in and filter to create subsets that are of interest.
Figure 2 - DynamoDB table containing detected objects and confidence scores

Figure 2 – DynamoDB table containing detected objects and confidence scores

Because you will use a public workforce for labeling in the next section, you will need to create anonymized versions of images where faces and license plates are blurred out. Amazon Rekognition has a DetectFaces API call to find any faces in the image. There is no corresponding call for detecting license plates, so you detect all text in the image with the DetectText API. Use the write of the .json output file to invoke a Lambda function which calls the Amazon Rekognition APIs and blurs the relevant regions before saving the images to S3.
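
As an illustration of the blurring step, the following sketch converts the ratio-based bounding boxes returned by DetectFaces into pixel coordinates and blurs those regions with Pillow; file paths and the blur radius are placeholders, and the same approach would apply to the DetectText results.

# blur face regions in a locally downloaded image
import boto3
from PIL import Image, ImageFilter

rekognition = boto3.client("rekognition")

def blur_faces(local_path, bucket, key, output_path):
    faces = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )["FaceDetails"]

    img = Image.open(local_path)
    width, height = img.size
    for face in faces:
        # Rekognition bounding boxes are ratios of the image dimensions
        box = face["BoundingBox"]
        left, top = int(box["Left"] * width), int(box["Top"] * height)
        right = left + int(box["Width"] * width)
        bottom = top + int(box["Height"] * height)
        region = img.crop((left, top, right, bottom))
        img.paste(region.filter(ImageFilter.GaussianBlur(radius=15)), (left, top))
    img.save(output_path)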

Image labeling with Amazon SageMaker Ground Truth

Since the images are now stored in their raw and anonymized formats, we can start the data labeling step. We will sample the images we want to label. The data catalog in DynamoDB lets you query the table based on your parameters and the sub-area you want to optimize your model on. For example, you could query the DynamoDB table for images containing a crowd of pedestrians, specifically label these images, and allow your model to improve in these particular circumstances. Once we have identified the images of interest, we can copy them to a specific S3 folder and start the SageMaker Ground Truth job on an object detection task. You can find a detailed blog post on streamlining data for object detection in Amazon SageMaker Ground Truth.
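
As an illustration, a query for catalogued images containing a given label might look like the following sketch; the table name and attribute layout are assumptions based on the description above.

# find catalogued images that contain a given label above a confidence threshold
import boto3
from boto3.dynamodb.conditions import Attr

catalog = boto3.resource("dynamodb").Table("image-catalog")  # hypothetical table name

def images_with_label(label, min_confidence=90):
    """Return the S3 keys of catalogued images containing the given label."""
    response = catalog.scan(FilterExpression=Attr(f"labels.{label}").exists())
    return [
        item["image_key"]
        for item in response["Items"]
        if float(item["labels"][label]) >= min_confidence
    ]

# example: images_with_label("Person", min_confidence=95)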

The result of a SageMaker Ground Truth job is a manifest file containing the S3 path, bounding box coordinates, and class labels (per image). This is automatically uploaded to S3. We need to replace the anonymized images with the raw image S3 paths since we want to train the model on raw images. We have provided a sample manifest file in the repository, and you can follow along with the blog post using the Jupyter notebook provided in `object-detection/Transfer-Learning.iypnb`. First, we can verify that the annotations are of high quality by viewing the following sample image.

Visualization of annotations from SageMaker Ground Truth job

Figure 3 – Visualization of annotations from SageMaker Ground Truth job

Fine-tune a GluonCV model with SageMaker Script Mode

The ML technique transfer learning allows us to use neural networks that have previously been trained on large datasets of similar applications, and fine-tune them on a smaller, custom-annotated dataset. Frameworks such as GluonCV provide a model zoo for object detection that gives us quick access to these pre-trained models. In this case, we have selected a YOLOv3 model that has been pre-trained on the COCO dataset. Based on empirical analysis, other networks such as Faster R-CNN outperform YOLOv3, but tend to have slower inference times as measured in frames per second, which is a key aspect for real-time applications.

The preferred object detection format for GluonCV is the .lst file format, which is converted to the RecordIO format, providing faster disk access and compact storage. GluonCV provides a tutorial on how to convert a .lst file into a RecordIO file.

To train a customized neural network, we will use Amazon SageMaker Script Mode, which allows us to use our own training algorithms and the straightforward SageMaker UI.

import sagemaker
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet

sagemaker_session = sagemaker.Session()
role = get_execution_role()
s3_output_path = "s3://<path to bucket where model weights will be saved>/"

# MXNet estimator running our GluonCV training script in SageMaker Script Mode
model_estimator = MXNet(
    entry_point="train_yolov3.py",
    role=role,
    instance_count=1,
    instance_type="ml.p3.8xlarge",
    framework_version="1.8.0",
    output_path=s3_output_path,
    py_version="py37",
)

model_estimator.fit("s3://<bucket path for train and validation record-io files>/")

Hyperparameter optimization on SageMaker

While training neural networks, there are many parameters that can be optimized for the use case and the custom dataset. We refer to this as automatic model tuning in SageMaker, or hyperparameter optimization. SageMaker launches multiple training jobs, each with a unique combination of hyperparameters, and searches for the configuration achieving the highest mean average precision (mAP) on our held-out test data.

from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)

hyperparameter_ranges = {
    "lr": ContinuousParameter(0.001, 0.1),
    "wd": ContinuousParameter(0.0001, 0.001),
    "batch-size": CategoricalParameter([8, 16]),
}
metric_definitions = [
    {"Name": "val:car mAP", "Regex": "val:mAP=(.*?),"},
    {"Name": "test:car mAP", "Regex": "test:mAP=(.*?),"},
    {"Name": "BoxCenterLoss", "Regex": "BoxCenterLoss=(.*?),"},
    {"Name": "ObjLoss", "Regex": "ObjLoss=(.*?),"},
    {"Name": "BoxScaleLoss", "Regex": "BoxScaleLoss=(.*?),"},
    {"Name": "ClassLoss", "Regex": "ClassLoss=(.*?),"},
]
objective_metric_name = "val:car mAP"

hpo_tuner = HyperparameterTuner(
    model_estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=10,  # maximum number of training jobs to launch
    max_parallel_jobs=2
)

hpo_tuner.fit("s3://<bucket path for train and validation record-io files>/")

Model compilation

Although we don’t have hard constraints for a model environment when training in the cloud, we should mind the production environment when running inference with trained models: no powerful GPUs and limited storage are common challenges. Fortunately, Amazon SageMaker Neo allows you to train once and run anywhere in the cloud and at the edge, while reducing the memory footprint of your model.

best_estimator = hpo_tuner.best_estimator()
compiled_model = best_estimator.compile_model(
    target_instance_family="ml_c4",
    role=role,
    input_shape={"data": [1, 3, 512, 512]},
    output_path=s3_output_path,
    framework="mxnet",
    framework_version="1.8",
    env={"MMS_DEFAULT_RESPONSE_TIMEOUT": "500"}
)

Deploying the model

Deploying a model requires a few additional lines of code for hosting.

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = compiled_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c4.xlarge",
    endpoint_name="YOLO-DEMO-endpoint",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

Run inference

Once the model is deployed with an endpoint, we can test some inference. As the model has been trained on 512×512 pixel images, we need to resize inference images accordingly before serializing the data and making a prediction request to the SageMaker endpoint.

import PIL.Image
import numpy as np
test_image = PIL.Image.open("test.png")
test_image = np.asarray(test_image.resize((512, 512))) 
endpoint_response = predictor.predict(test_image)

We can then visualize the response and show the confidence score associated with the prediction on the test image.

Figure 4 - Visualization of the response and confidence score associated with the prediction on the test image.

Figure 4 – Visualization of the response and confidence score associated with the prediction on the test image.

Clean Up

To clean up the deployment, run `bash deploy.sh destroy false`. In addition, you need to delete the SageMaker endpoint. Some resources, like S3 buckets and DynamoDB tables, must be manually emptied and deleted through the console to be fully removed.

Conclusion

This post described how to extract images at large scale from ROS bag files and label a subset of them with SageMaker Ground Truth. With this labeled training dataset, we fine-tuned an object detection neural network using SageMaker Script Mode. To deploy the model in the autonomous driving vehicle, we compiled the model with SageMaker Neo, reducing the storage size and optimizing the model graph on the specific hardware. Finally, you ran some test inference predictions and visualized them in a SageMaker Notebook. You can find the code for this blog post in this GitHub repository.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Deliver Messages Using an IoT Rule Action to Amazon Managed Streaming for Apache Kafka

Post Syndicated from Siddhesh Keluskar original https://aws.amazon.com/blogs/architecture/field-notes-deliver-messages-using-an-iot-rule-action-to-amazon-managed-streaming-for-apache-kafka/

With IoT devices scaling up rapidly, real-time data integration and data processing has become a major challenge. This is why customers often choose Message Queuing Telemetry Transport (MQTT) for message ingestion, and Apache Kafka to build a real-time streaming data pipeline. AWS IoT Core now supports a new IoT rule action to deliver messages from your devices directly to your Amazon MSK or self-managed Apache Kafka clusters for data analysis and visualization, without you having to write a single line of code.

In this post, you learn how to set up a real-time streaming data pipeline for IoT data using an AWS IoT Core rule and Amazon Managed Streaming for Apache Kafka (Amazon MSK). The audience for this post is architects and developers creating solutions to ingest sensor data and high-volume, high-frequency streaming data, and process it using a Kafka cluster. This blog also describes the SASL_SSL (user name and password) method to access your Kafka cluster.

Overview of solution

Figure 1 represents an IoT data ingestion pipeline where multiple IoT devices connect to AWS IoT Core. These devices can send messages to AWS IoT Core over MQTT or HTTPS protocol. AWS IoT Core rule for Kafka is configured to intercept messages from the desired topic and route them to the Apache Kafka cluster. These messages can then be received by multiple consumers connected to the Kafka cluster. In this post, we will use AWS Python SDK to represent IoT devices and publish messages.

Figure 1 - Architecture representing an IoT ingestion pipeline

Figure 1 – Architecture representing an IoT ingestion pipeline

Prerequisites

Walkthrough

I will show you how to stream AWS IoT data on an Amazon MSK cluster using AWS IoT Core rules and SASL_SSL SCRAM-SHA-512 mechanism of authentication. Following are the steps for this walkthrough:

  1. Create an Apache Kafka cluster using Amazon MSK.
  2. Configure an Apache Kafka cluster for SASL_SSL authentication.
  3. Set up a Kafka producer and consumer on AWS Cloud9 to test the setup.
  4. Configure an IoT Rule action to send a message to Kafka.

1. Create an Apache Kafka cluster using Amazon MSK

  • The first step is to create an Apache Kafka cluster. Open the service page for Amazon MSK by signing in to your AWS account.
  • Choose Create Cluster, and select Custom Create. AWS IoT Core supports SSL and SASL_SSL based authentication for Amazon MSK. We are using custom settings to configure these authentication methods.
Figure 2 - Screenshot showing how to create an MSK cluster.

Figure 2 – Screenshot showing how to create an MSK cluster.

  • Assign a cluster name, and select Apache Kafka (version of your choice), for this walkthrough, we are using 2.6.1.
  • Keep the configuration as Amazon MSK default configuration. Choose your Networking components: VPC, number of Availability Zones (a minimum of two is required for high availability), and subnets.
  • Choose SASL/SCRAM authentication (default selection is None).

Use the encryption settings as shown in the following screenshot:

Figure 3 - Screenshot showing Encryption Settings

Figure 3 – Screenshot showing Encryption Settings

  • Keep the monitoring settings as Basic Monitoring, and Choose Create Cluster.
  • It takes approximately 15–20 minutes for the cluster to be created.

2. Configure an Apache Kafka cluster for SASL_SSL authentication

  • When the Apache Kafka cluster is available, we must then configure authentication for producers and consumers.
  • Open AWS Secrets Manager, choose Store a new secret, and then choose Other type of secrets.
  • Enter username and password as the two keys, and assign the user name and password values of your choice.
Figure 4 - Screenshot showing how to store a new secret

Figure 4 – Screenshot showing how to store a new secret

  • Next, select Add new key link.
  • Note: Do not select DefaultEncryptionKey! A secret created with the default key cannot be used with an Amazon MSK cluster. Only a Customer managed key can be used as an encryption key for an Amazon MSK–compatible secret.
  • To add a new key, select Create key, select Symmetric key, and choose Next.
  • Type an Alias, and choose Next.
  • Select appropriate users as Key administrators, and choose Next.
  • Review the configuration, and select Finish.
Figure 5 - Select the newly-created Customer Managed Key as the encryption key

Figure 5 – Select the newly-created Customer Managed Key as the encryption key

 

Figure 6 - Specify the key value pair to be stored in this secret

Figure 6 – Specify the key value pair to be stored in this secret

  • Select the newly-created Customer Managed Key as the encryption key, and choose Next.
  • Provide a Secret name (Secret name must start with AmazonMSK_ for Amazon MSK cluster to recognize it), for example, AmazonMSK_SECRET_NAME.
  • Choose Next twice, and then choose Store.

Figure 7 – Storing a new secret

  • Open the Amazon MSK service page, and select your Amazon MSK cluster. Choose Associate Secrets, and then select Choose secrets (this will only be available after the cluster is created and in Active Status).
  • Choose the secret we created in the previous step, and choose Associate secrets. Only the secret name starting with AmazonMSK_ will be visible.

3. Set up Kafka producer and consumer on AWS Cloud9 to test the setup

  • To test if the cluster and authentication is correctly setup, we use Kafka SDK on AWS Cloud9 IDE.
  • Choose Create environment, and follow the console to create a new AWS Cloud9 environment, or use an existing AWS Cloud9 environment with a Kafka consumer and producer already configured.
  • This blog requires Java 8 or later.
  • Verify your version of Java with the command: java -version. Next, add your AWS Cloud9 instance Security Group to inbound rules of your Kafka cluster.
  • Open the Amazon MSK page and select your cluster, then choose Security groups applied.
Figure 8 - Selecting Security Groups Applied

Figure 8 – Selecting Security Groups Applied

  • Next, choose Inbound rules, and then choose Edit inbound rules.
  • Choose Add rule, and add Custom TCP ports 2181 and 9096 with Security Group of your AWS Cloud9 instance.
Figure 9 - Screenshot showing rules applied

Figure 9 – Screenshot showing rules applied

  • The Security Group for your AWS Cloud9 can be found in the Environment details section of your AWS Cloud9 instance.
Figure 10 - Screenshot showing Edit Inbound Rules added

Figure 10 – Screenshot showing Edit Inbound Rules added

  • Use Port range values as per the client information section of your Bootstrap server and Zookeeper connection.
Figure 11 - Screenshot showing where to access ‘View client information’

Figure 11 – Screenshot showing where to access ‘View client information’

 

Figure 12 - Screenshot showing client integration information

Figure 12 – Screenshot showing client integration information

Invoke the following commands on AWS Cloud9 console to download and extract Kafka CLI tools:

wget https://archive.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
tar -xzf kafka_2.12-2.2.1.tgz
cd kafka_2.12-2.2.1/
mkdir client && cd client 

Next, create a file users_jaas.conf, and add the user name and password that you added in Secrets Manager:

sudo nano users_jaas.conf

Paste the following configuration and save. Verify the user name and passwords are the same as saved in Secrets Manager.

KafkaClient {
   org.apache.kafka.common.security.scram.ScramLoginModule required
   username="hello"
   password="world";
};

Invoke the following commands:

export KAFKA_OPTS=-Djava.security.auth.login.config=$PWD/users_jaas.conf

Create a new file with name client_sasl.properties.

sudo nano client_sasl.properties

Copy the following content to file:

security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
ssl.truststore.location=<path-to-keystore-file>/kafka.client.truststore.jks

<path-to-keystore-file> can be retrieved by running following command:

cd ~/environment/kafka_2.12-2.2.1/client
echo $PWD

Next, copy the cacerts file from your Java lib folder to client folder. The path of Java lib folder might be different based on your version of Java.

cd ~/environment/kafka_2.12-2.2.1/client
cp /usr/lib/jvm/java-11-amazon-corretto.x86_64/lib/security/cacerts kafka.client.truststore.jks  
Figure 13 - Screenshot showing client integration information

Figure 13 – Screenshot showing client integration information

Save the previous endpoints as BOOTSTRAP_SERVER and ZOOKEEPER_STRING.

export BOOTSTRAP_SERVER=b-2.iot-demo-cluster.slu5to.c13.kafka.us-east-1.amazonaws.com:9096,b-1.iot-demo-cluster.slu5to.c13.kafka.us-east-1.amazonaws.com:9096
export ZOOKEEPER_STRING=z-1.iot-demo-cluster.slu5to.c13.kafka.us-east-1.amazonaws.com:2181,z-3.iot-demo-cluster.slu5to.c13.kafka.us-east-1.amazonaws.com:2181,z-2.iot-demo-cluster.slu5to.c13.kafka.us-east-1.amazonaws.com:2181

Save the Topic name in an environment variable.

TOPIC="AWSKafkaTutorialTopic"

  • Next, create a new Topic using the Zookeeper String.
cd ~/environment/kafka_2.12-2.2.1
bin/kafka-topics.sh --create --zookeeper $ZOOKEEPER_STRING --replication-factor 2 --partitions 1 --topic $TOPIC 
  • Confirm that you receive the message: Created topic AWSKafkaTutorialTopic.
  • Start Kafka producer by running this command in your Kafka folder:
cd ~/environment/kafka_2.12-2.2.1

bin/kafka-console-producer.sh --broker-list $BOOTSTRAP_SERVER --topic $TOPIC --producer.config client/client_sasl.properties
  • Next, open a new Terminal by pressing the + button, and initiate the following commands to configure the environment variables:
export BOOTSTRAP_SERVER=b-2.iot-demo-cluster.slu5to.c13.kafka.us-east-1.amazonaws.com:9096,b-1.iot-demo-cluster.slu5to.c13.kafka.us-east-1.amazonaws.com:9096
TOPIC="AWSKafkaTutorialTopic"

cd ~/environment/kafka_2.12-2.2.1/client
export KAFKA_OPTS=-Djava.security.auth.login.config=$PWD/users_jaas.conf

cd ~/environment/kafka_2.12-2.2.1/
bin/kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVER --topic $TOPIC --from-beginning --consumer.config client/client_sasl.properties
  • Now that you have a Kafka consumer and producer opened side-by-side, you can type a message in the producer terminal and verify it from the consumer terminal.

Figure 14 – Screenshot showing Kafka consumer and producer opened side-by-side

4. Configure an IoT Rule action to send a message to Kafka

  • Create an AWS Identity and Access Management (IAM) role with SecretsManager permissions to allow IoT rule to access Kafka KeyStore in AWS Secrets Manager.
  • Sign in to IAM, select Policies from the left-side panel, choose Create policy.
  • Select Choose a service, and search for AWS KMS.
  • In Actions, choose All AWS KMS actions. Select All resources in the Resources section, and choose Next.
  • Name the policy KMSfullAccess, and choose Create policy.
  • Select Roles from the left-side panel, choose Create Role, then select EC2 from Choose a use case, and choose Next:Permissions.
  • Assign the policy SecretsManagerReadWrite. Note: if you do not select EC2, SecretsManager Policy will be unavailable.
  • Search for and select SecretsManagerReadWrite and KMSfullAccess Policy.

Add tags, type Role name as kafkaSASLRole, and choose Create Role.

  • After the Role is created, search the newly-created Role name to view the Summary of the role.
  • Choose the Trust relationships tab, and choose Edit trust relationship.

Enter the following trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "iot.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
  • Choose Update Trust Policy.
  • Next, create a new AWS IoT Core rule by signing in to the AWS IoT Core service. Choose Act from the left side-menu, and select Rules.
  • Choose Create. Insert details for Name, Description, and Rule query statement, and then choose Add action. The following query is used for this post:
  • SELECT * from 'iot/topic'
  • Select Send a message to an Apache Kafka cluster. Next, choose Configure action.
Figure 15 - Screenshot to create a rule

Figure 15 – Screenshot to create a rule

 

Create a VPC destination (if you do not already have one).

Figure 16 – How to Create a VPC destination

  • Create a VPC destination (if you do not already have one).
  • Select the VPC ID of your Kafka cluster. Select a Security Group with access to Kafka cluster Security Group.
  • Choose the security group of the EC2 instance we created, or the security group of the Kafka cluster.
  • Choose Create Role, and then select Create Destination. It takes approximately 5–10 minutes for the Destination to be Enabled. After the status is Enabled, navigate back to the Rule creation page and select the VPC Destination.
  • Enter AWSKafkaTutorialTopic as Kafka topic (confirm there is no extra space after the topic name, or you will get an error). Do not update Key and Partition boxes.

    Figure 17 - Screenshot showing how to enter the AWSKafkaTutorialTopic

    Figure 17 – Screenshot showing how to enter the AWSKafkaTutorialTopic

  • Verify the Security Group of your VPC destination is added to the inbound list for your Kafka cluster.
Figure 18 - Showing Inbound list for Kafka Cluster

Figure 18 – Showing Security Group for Kafka Cluster

 

Figure 19 - Screenshot showing Inbound rules

Figure 19 – Screenshot showing Inbound rules

The first two Custom TCP entries are for AWS Cloud9 security group. The last two entries are for VPC endpoint.

Set the Client properties as follows:

Bootstrap.server = The TLS bootstrap string for Kafka cluster

security.protocol = SASL_SSL

ssl.truststore = EMPTY for Amazon MSK, enter SecretBinary template for self-managed Kafka

ssl.truststore.password = EMPTY for Amazon MSK, enter truststore password for self-managed Kafka

sasl.mechanism = SCRAM-SHA-512

  • Replace the secret name with your stored secret name starting with AmazonMSK_, replace the IAM role ARN with your IAM role ARN.
  • The secret and IAM role are created in previous steps of this post. Enter the following template in the sasl.scram.username field to retrieve username from Secrets Manager.
${get_secret('AmazonMSK_cluster_secret','SecretString','username','arn:aws:iam::318219976534:role/kafkaSASLRole')}

Perform a similar step for sasl.scram.password field:

${get_secret('AmazonMSK_cluster_secret','SecretString','password','arn:aws:iam::318219976534:role/kafkaSASLRole')}
  • Choose Add action.
  • Choose Create rule.

Testing the data pipeline

  • Open MQTT test client from AWS IoT Core page.
  • Publish a message to the MQTT topic that you configured while creating the rule (a minimal publishing sketch follows this list).
  • Keep the consumer session active (created in an earlier step). You will see data published on the MQTT topic being streamed to the Kafka consumer.
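
As a minimal illustration of publishing with the AWS SDK for Python (boto3), the following sketch sends a JSON payload to the rule's topic; the topic name must match the rule query configured earlier, and the payload fields are placeholders.

# publish a test message to the AWS IoT Core MQTT topic used by the rule
import json

import boto3

iot_data = boto3.client("iot-data")

def publish_reading(device_id, temperature):
    iot_data.publish(
        topic="iot/topic",   # must match the rule query statement
        qos=1,
        payload=json.dumps({"device_id": device_id, "temperature": temperature}),
    )

# example: publish_reading("sensor-001", 23.7)
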
Figure 20 - Screenshot showing testing the data pipeline

Figure 20 – Screenshot showing testing the data pipeline

Common troubleshooting checks

Confirm that your:

  1. AWS Cloud9 Security Group is added to Amazon MSK Security Group Inbound rule
  2. VPC endpoint Security Group is added to Amazon MSK Security Group Inbound rule
  3. Topic is created in the Kafka cluster
  4. IAM role has Secrets Manager and KMS permissions
  5. Environment variables are correctly configured in terminal
  6. Folder paths have been correctly followed

Cleaning up

To avoid incurring future charges, delete the following resources:

  • Amazon MSK cluster
  • AWS IoT Core rule
  • IAM role
  • Secrets Manager Secret
  • AWS Cloud9 instance

Conclusion

In this post, I showed you how to configure an IoT Rule Action to deliver messages to Apache Kafka cluster using AWS IoT Core and Amazon MSK. You can now build a real-time streaming data pipeline by securely delivering MQTT messages to a highly-scalable, durable, and reliable system using Apache Kafka.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Field Notes: Building a Scalable Real-Time Newsfeed Watchlist Using Amazon Comprehend

Post Syndicated from Zahi Ben Shabat original https://aws.amazon.com/blogs/architecture/field-notes-building-a-scalable-real-time-newsfeed-watchlist-using-amazon-comprehend/

One of the challenges businesses have is to constantly monitor information via media outlets and be alerted when a key interest is picked up, such as individual, product, or company information. One way to do this is to scan media and news feeds against a company watchlist. The list may contain personal names, organizations or suborganizations of interest, and other type of entities (for example, company products). There are several reasons why a company might need to develop such a process: reputation risk mitigation, data leaks, competitor influence, and market change awareness.

In this post, I will share with you a prototype solution that combines several AWS Services: Amazon Comprehend, Amazon API Gateway, AWS Lambda, and Amazon Aurora Serverless. We will examine the solution architecture and I will provide you with the steps to create the solution in your AWS environment.

Overview of solution

Architecture showing how to build a Scalable Real-Time Newsfeed Watchlist Using Amazon Comprehend

Figure 1 – Architecture showing how to build a Scalable Real-Time Newsfeed Watchlist Using Amazon Comprehend

Walkthrough

The preceding architecture shows an event-driven design. To interact with the solution, we use Lambda functions behind API Gateway, which are invoked upon user request. Following are the high-level steps.

  1. Create a watchlist. Use the “Refresh_watchlist” API to submit new data, or load existing data from a CSV file located in a bucket. More details in the following “Loading Data” section.
  2. Make sure the data is loaded properly and check a known keyword. Use the “check-keyword” API. More details in the following “Testing” section.
  3. Once the watchlist data is ready, submit a request to “query_newsfeed” with a given newsfeed configuration (URL, document section qualifier) to start a new job that scans the content against the watchlist. Review the example in the “Testing” section.
  4. If an entity or keyword matched, you will get a notification email with the match results.

Technical Walkthrough

  • When a new request to “query_newsfeed” is submitted. The Lambda handler extracts the content of the URL and creates a new message in the ‘Incoming Queue’.
  • Once there are available messages in the incoming queue, a subscribed Lambda function, “evaluate content”, is invoked. This takes the scraped content from the message and submits it to Amazon Comprehend to extract the desired elements (entities, key phrases, sentiment).
  • The result of Amazon Comprehend is passed through a matching logic component, which runs the results against the watchlist (Aurora Serverless PostgreSQL database), utilizing fuzzy name matching (a query sketch follows this list).
  • If a match occurs, a new message is generated for Amazon SNS which initiates a notification email.
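
As referenced above, a sketch of the matching query is shown below; it assumes the fuzzystrmatch extension (which provides levenshtein and soundex) is enabled on the Aurora PostgreSQL database, and uses hypothetical table, column, and database names, queried through the RDS Data API.

# fuzzy-match a candidate name against the watchlist via the RDS Data API
import boto3

rds_data = boto3.client("rds-data")

SQL = """
SELECT entity, entity_type
FROM watchlist
WHERE levenshtein(lower(entity), lower(:candidate)) <= 2
   OR soundex(entity) = soundex(:candidate)
"""

def find_matches(candidate, cluster_arn, secret_arn):
    """Return watchlist rows that fuzzily match a candidate name."""
    response = rds_data.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,
        database="newsfeed",   # hypothetical database name
        sql=SQL,
        parameters=[{"name": "candidate", "value": {"stringValue": candidate}}],
    )
    return response["records"]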

To deploy and test the solution we follow four steps:

  1. Create infrastructure
  2. Create serverless API Layer
  3. Load Watchlist data
  4. Test the match

The code for Building a Scalable real-time newsfeed watchlist is available in this repository.

Prerequisites

You will need an AWS account and a Serverless Framework CLI Installed.

Security best practices

Before setting up your environment, review the following best practices, and if required change the source code to comply with your security standards.

Creating the infrastructure

I recommend reviewing these resources to get started with: Amazon Aurora Serverless, Lambda, Amazon Comprehend, and Amazon S3.

To begin the procedure:

  1. Clone the GitHub repository to your local drive.
  2. Navigate to “infrastructure” directory.
  3. Use CDK or AWS CLI to deploy the stack:
aws cloudformation deploy --template RealtimeNewsAnalysisStack.template.json --stack-name RealtimeNewsAnalysis --parameter-overrides [email protected]
cdk synth / deploy –-parameters [email protected]

4. Navigate back to the root directory and into the “serverless” directory.

5. Initiate the following serverless deployment commands:

sls plugin install -n serverless-python-requirements 
sls deploy

6. Load data to the watchlist using a standard web service call to the “refresh_watchlist” API.

7. Test the service by calling the web service “check-keyword”.

8. Use “query_newsfeed” Web service to scan newsfeed and articles against the watchlist.

9. Check your mailbox for match notifications.

10. For cleanup and removal of the solution, review the “clean up” section at the end of this post.


Loading the watchlist data

We can use the refresh watchlist API to recreate the list with the list provided in the message. Use a tool like Postman to send a POST web service call to the refresh_watchlist.

Set the message body to RAW – JSON:

{
    "refresh_list_from_bucket": false,
    "watchlist": [
        {"entity":"Mateo Jackson", "entity_type": "person"},
        {"entity":"AnyCompany", "entity_type": "organization"},
        {"entity":"Example product", "entity_type": "product"},
        {"entity":"Alice", "entity_type": "person"},
        {"entity":"Li Juan", "entity_type": "person"}
    ]
}

It is possible to use a CSV file to load the data into the watchlist. Locate your newsfeed bucket, create a “watchlist” directory in it, and upload a CSV file named “watchlist.csv” (no header required) to that directory.
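
For example, the upload can be done with the AWS CLI; the bucket name below is a placeholder for the newsfeed bucket created by the stack:

# Bucket name is a placeholder; use the newsfeed bucket created by the infrastructure stack
aws s3 cp watchlist.csv s3://<your-newsfeed-bucket>/watchlist/watchlist.csv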

CSV Example:

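A minimal sample of what the CSV contents might look like, assuming the same two columns as the JSON payload (entity, entity_type); confirm the exact column order expected by the loader in the repository:

Mateo Jackson,person
AnyCompany,organization
Example product,product
Alice,person
Li Juan,person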

The following is a screenshot showing how Postman initiates the request.

Figure 3 – Screenshot showing Postman initiating the request

Testing

You can use the dedicated check-keyword API to test against a list of keywords to see if the matching works. This does not utilize Amazon Comprehend, but it can verify that the list is loaded properly and matches against a given criterion.

Figure 4 – You can use the dedicated check keyword API to test against a list of keywords

Note the deliberate spelling mistakes: “Alise” with an “s” instead of a “c”, and “Li” spelled phonetically as “Lee”. Both are returned as matches.
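
As a sketch of such a test, the following Python snippet posts intentionally misspelled names to the check-keyword API. The endpoint URL and payload field names are assumptions and may differ from the deployed API:

import requests

# Endpoint and field names are assumptions; take the real URL from the Serverless deploy output
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/dev/check-keyword"

payload = {"keywords": ["Alise Jackson", "Lee Juan"]}  # misspellings to exercise fuzzy matching
response = requests.post(API_URL, json=payload)
print(response.json())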

Now, let’s test it with a related news article.

Figure 5 – Screenshot showing test with a related news article
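
A request to query_newsfeed might look like the following sketch. The endpoint URL and the exact field names for the newsfeed configuration (URL and document section qualifier) are assumptions, so check the repository for the real schema:

import requests

# Endpoint and field names are assumptions; take the real URL from the Serverless deploy output
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/dev/query_newsfeed"

payload = {
    "url": "https://example.com/news/markets",  # newsfeed or article to scan
    "section_qualifier": "article"              # document section to extract
}
response = requests.post(API_URL, json=payload)
print(response.status_code, response.text)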

Check your mailbox! You should receive an email with the match result.

Cleaning up

To clean up, delete the infrastructure stack using CloudFormation or the CDK (cdk destroy), and remove the serverless API layer with `sls remove`.

Conclusion

In this post, you learned how to create a scalable watchlist and use it to monitor newsfeed content. This is a practical demonstration for a typical customer problem. The Levenshtein distance and soundex algorithms, along with Amazon Comprehend’s built-in machine learning capabilities, provide a powerful method to process and analyze text. To support a high volume of queries, the solution uses Amazon SQS to process messages and Amazon Aurora Serverless to automatically scale the database as needed. It is possible to use the same queue for additional data source ingestion.

This solution can be modified for additional purposes, such as a financial institution’s OFAC watchlist (work in progress) or other monitoring applications. Feel free to provide feedback and tell us how this solution can be useful for you.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

References

Developer Guide: Amazon Comprehend

Amazon Aurora Serverless

Amazon Simple Queue Service

PostgreSQL Documentation

Get started with Serverless Framework Open Source & AWS

 

 

Field Notes: Build Dynamic IVR Menus with Amazon Connect and AWS Lambda

Post Syndicated from Marius Cealera original https://aws.amazon.com/blogs/architecture/field-notes-build-dynamic-ivr-menus-with-amazon-connect-and-aws-lambda/

This post was co-written by Marius Cealera, Senior Partner Solutions Architect at AWS, and Zdenko Estok, Cloud Architect and DevOps Engineer at Accenture. 

Modern interactive voice response (IVR) systems help customers find answers to their questions through a series of menus, usually relying on the customer to filter and select the right options. Adding more options in these IVR menus and hoping to increase the rate of self-serviced calls can be tempting, but it can also overwhelm customers and lead them to ‘zeroing out’. That is, they select to be transferred to a human agent, defeating the purpose of a self-service solution.

This post provides a technical overview of one of Accenture’s Advanced Customer Engagement (ACE+) solutions, explaining how to build a dynamic IVR menu in Amazon Connect, in combination with AWS Lambda and Amazon DynamoDB. The solution can help address the zeroing-out problem by providing each customer with a personalized list of menu options. Solutions architects, developers, and contact center administrators will learn how to use Lambda and DynamoDB to build Amazon Connect flows where menu options are customized for every known customer. We have provided code examples to deploy a similar solution.

Overview of solution

“Imagine a situation where a customer navigating a call center IVR needs to choose from many menu options – for example, an insurance company providing a self-service line for customers to check their policies. For dozens of insurance policy variations, the IVR would need a long and complex menu. Chances are that after the third or fourth menu choice the customers will be confused, irritated or may even forget what they are looking for,” says Zdenko Estok, Cloud Architect and Amazon Connect Specialist at Accenture.

“Even splitting the menu in submenus does not completely solve the problem. It is a step in the right direction as it can reduce the total time spent listening to menu options, but it has the potential to grow into a huge tree of choices.”

Figure 1. A static one-layer menu on the left, a three-layer menu on the right

One way to solve this issue is by presenting only relevant menu options to customers. This approach significantly reduces the time spent in the IVR menu, leading to a better customer experience. This also minimizes the chance the customer will require transfer to a human agent.

Figure 2. Selectively changing the IVR menu structure based on customer profile

For use cases with a limited number of menu options, the solution can be achieved directly in the Amazon Connect IVR designer through the use of the Check Contact Attributes block. However, this approach can lead to complex and hard to maintain flows for situations where dozens of menu variations are possible. A more scalable solution is to store customer information and menu options in DynamoDB and build the menu dynamically by using a series of Lambda functions (Figure 3).

Consider a customer with the following information stored in a database: phone number, name, and active insurance policies. A dynamic menu implementation will authenticate the user based on the phone number, retrieve the policy information from the database, and build the menu options.

The first Lambda function retrieves the customer’s active policies and builds the personalized greeting and menu selection prompt. The second Lambda function maps the customer’s menu selection to the correct menu path. This is required since the menu is dynamic and the items and ordering are different for different customers. This approach also allows administrators to add or change insurance types and their details, directly in the database, without the need to update the IVR structure. This can be useful when maintaining IVR flows for dozens of products or services.
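
The following Python sketch shows the general shape of the first Lambda function. The DynamoDB table name, attribute names, and prompt format are assumptions for illustration; the CDK app in the repository defines its own schema:

import boto3

dynamodb = boto3.resource("dynamodb")
# Table and attribute names are assumptions; the CDK app creates its own table (CdkAppStack-policiesDb*)
table = dynamodb.Table("CustomerPolicies")

def handler(event, context):
    # Amazon Connect passes the caller's number in the contact data
    phone = event["Details"]["ContactData"]["CustomerEndpoint"]["Address"]

    item = table.get_item(Key={"phoneNumber": phone}).get("Item", {})
    policies = item.get("policies", [])

    # Build a personalized prompt such as "press 1 for car insurance, press 2 for home insurance"
    prompt = ", ".join(f"press {i + 1} for {policy}" for i, policy in enumerate(policies))

    # Amazon Connect expects a flat map of attributes in the Lambda response
    return {
        "customerName": item.get("name", "valued customer"),
        "menuPrompt": prompt or "no active policies found",
    }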

Figure 3 – IVR flow leveraging dynamically generated menu options.

Walkthrough

Sample code for this solution is provided in this GitHub repo. The code is packaged as a CDK application, allowing the solution to be deployed in minutes. The deployment tasks are as follows:

  1. Deploy the CDK app.
  2. Update Amazon Connect instance settings.
  3. Import the demo flow and data.

Prerequisites

For this walkthrough, you need the following prerequisites:

  • An AWS account.
  • AWS CLI access to the AWS account where you would like to deploy your solution.
  • An Amazon Connect instance. If you do not have an Amazon Connect instance, you can deploy one and claim a phone number with Set up your Amazon Connect instance.

Deploy the CDK application

The resources required for this demo are packaged as a CDK app. Before proceeding, confirm you have CLI access to the AWS account where you would like to deploy your solution.

  1. Open a terminal window and clone the GitHub repository in a directory of your choice:

git clone git@github.com:aws-samples/amazon-connect-dynamic-ivr-menus.git

  2. Navigate to the cdk-app directory and follow the deployment instructions. The default region is usually us-east-1. If you would like to deploy in another Region, you can run:

export AWS_DEFAULT_REGION=eu-central-1

Update Amazon Connect instance settings

You need to update your Amazon Connect instance settings to implement the Lambda functions created by the CDK app.

  1. Log into the AWS console.
  2. Navigate to Services > Amazon Connect. Select your Amazon Connect instance.
  3. Select Contact Flows.
  4. Scroll down to the Lambda section and add the getCustomerDetails* and selectionFulfilment* functions. If the Lambda functions are not listed, return to the Deploy the CDK application section and verify there are no deployment errors.
  5. Select +Add Lambda function.

Import the demo flow

  1. Download the DemoMenu Amazon Connect flow from the flow_archive section of the sample code repository.
  2. Log in to the Amazon Connect console. You can find the Amazon Connect access url for your instance in the AWS Console, under Services > Amazon Connect > (Your Instance Name). The access url will have the following format: https://<your_instance_name>.awsapps.com/connect/login
  3.  Create a new contact flow by selecting ‘Contact Flows’ from the left side menu and then select Create New Contact Flow.
  4.  Select ‘Import Flow(beta)’ from the upper right corner menu and select the DemoMenu file downloaded at step 1.
  5.  Click on the first ‘Invoke Lambda’ block, and verify the getCustomerDetails* Lambda is selected.
  6.  Select the second ‘Invoke Lambda’ block, and verify the selectionFulfilment* Lambda is selected.
  7.  Select Publish.
  8.  Associate the new flow with your claimed phone number (phone numbers are listed in the left side menu).

Update the demo data and test

  1. For the demo to work and recognize your phone number, you will need to enter your phone number into the demo customers table.
  2. Navigate to the AWS console and select DynamoDB.
  3. From the left hand side menu select Tables, open the CdkAppStack-policiesDb* table, and navigate to the Items tab. If the table is empty, verify you started the populateDBLambda, as mentioned in the CDK deployment instructions.
  4. Select one of the customers in the table, then select Actions > Duplicate. In the new item, enter your phone number (in international format).
  5. Select Save.
  6. Dial your claimed Connect number. You should hear the menu options based on your database table entry.

Clean up

You can remove all resources provisioned for the CDK app by navigating to the cdk-app directory and running the following command:

cdk destroy

This will not remove your Amazon Connect instance. You can remove it by navigating to the AWS console > Services > Amazon Connect. Find your Connect instance and select Remove.

Conclusion

In this post we showed you how a dynamic IVR menu can be implemented in Amazon Connect. Using a dynamic menu can significantly reduce call durations by helping customers reach relevant content faster in the IVR system, which often leads to improved customer satisfaction. Furthermore, this approach to building IVR menus provides call center administrators with a way to manage menus with dozens or hundreds of branches directly in a backend database, as well as add or update menu options.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Accelerating Innovation with the Accenture AWS Business Group (AABG)

By working with the Accenture AWS Business Group (AABG), you can learn from the resources, technical expertise, and industry knowledge of two leading innovators, helping you accelerate the pace of innovation to deliver disruptive products and services. The AABG helps customers ideate and innovate cloud solutions through rapid prototype development.

Connect with our team at [email protected] to learn how to use machine learning in your products and services.

 

Zdenko Estok

Zdenko Estok works as a Cloud Architect and DevOps engineer at Accenture. He works with AABG to develop and implement innovative cloud solutions, and specializes in Infrastructure as Code and Cloud Security. Zdenko likes to bike to the office and enjoys pleasant walks in nature.

Field Notes: Launch a Fully Configured AWS Deep Learning Desktop with NICE DCV

Post Syndicated from Ajay Vohra original https://aws.amazon.com/blogs/architecture/field-notes-launch-a-fully-configured-aws-deep-learning-desktop-with-nice-dcv/

You want to start quickly when doing deep learning using GPU-enabled Amazon Elastic Compute Cloud (Amazon EC2) instances in the AWS Cloud. Although AWS provides end-to-end machine learning (ML) in Amazon SageMaker, working at the deep learning frameworks level, the quickest way to start is with AWS Deep Learning AMIs (DLAMIs), which provide preconfigured Conda environments for most of the popular frameworks.

DLAMIs make it straightforward to launch Amazon EC2 instances, but these instances do not automatically provide the high-performance graphics visualization required during deep learning research. Additionally, they are not preconfigured to use AWS storage services or SageMaker. This post explains how you can launch a fully-configured deep learning desktop in the AWS Cloud. Not only is this desktop preconfigured with the popular frameworks such as TensorFlow, PyTorch, and Apache MXNet, but it is also enabled for high-performance graphics visualization. NICE DCV is the remote display protocol used for visualization. In addition, it is preconfigured to use AWS storage services and SageMaker.

Overview of the Solution

The deep learning desktop described in this solution is ready for research and development of deep neural networks (DNNs) and their visualization. You no longer need to set up low-level drivers, libraries, and frameworks, or configure secure access to AWS storage services and SageMaker. The desktop has preconfigured access to your data in a Simple Storage Service (Amazon S3) bucket, and a shared Amazon Elastic File System (Amazon EFS) is automatically attached to the desktop. It is automatically configured to access SageMaker for ML services, and provides you with the ability to prepare the data needed for deep learning, and to research, develop, build, and debug your DNNs. You can use all the advanced capabilities of SageMaker from your deep learning desktop. The following diagram shows the reference architecture for this solution.

Figure 1 – Architecture overview of the solution to launch a fully configured AWS Deep Learning Desktop with NICE DCV

The deep learning desktop solution discussed in this post is contained in a single AWS CloudFormation template. To launch the solution, you create a CloudFormation stack from the template. Before we provide a detailed walkthrough for the solution, let us review the key benefits.

DNN Research and Development

During the DNN research phase, there is iterative exploration until you choose the preferred DNN architecture. During this phase, you may prefer to work in an isolated environment (for example, a dedicated desktop) with your favorite integrated development environment (IDE) (for example, Visual Studio Code or PyCharm). Developers like the ability to step through code using the IDE debugger. With the increasing support for imperative programming in modern ML frameworks, the ability to step through code in the research phase can accelerate DNN development.

The DLAMIs are preconfigured with NVIDIA GPU drivers, the NVIDIA CUDA Toolkit, and low-level libraries such as the CUDA Deep Neural Network library (cuDNN). Deep learning ML frameworks such as TensorFlow, PyTorch, and Apache MXNet are also preconfigured.

After you launch the deep learning desktop, you need to install and open your favorite IDE, clone your GitHub repository, and you can start researching, developing, debugging, and visualizing your DNN. The acceleration of DNN research and development is the first key benefit for the solution described in this post.

Figure 2 – Developing on deep learning desktop with Visual Studio Code IDE

Elasticity in number of GPUs

During the research phase, you need to debug any issues using a single GPU. However, as the DNN is stabilized, you horizontally scale across multiple GPUs in a single machine, followed by scaling across multiple machines.

Most modern deep learning frameworks support distributed training across multiple GPUs in a single machine, and also across multiple machines. However, when you use a single GPU in an on-premises desktop equipped with multiple GPUs, the idle GPUs are wasted. With the deep learning desktop solution described in this post, you can stop the deep learning desktop instance, change its Amazon EC2 instance type to another compatible type, restart the desktop, and get the exact number of GPUs you need at the moment. The elasticity in the number of GPUs in the deep learning desktop is the second key benefit for the solution described in this post.

Integrated access to storage services 

Since the deep learning desktop is running in AWS Cloud, you have access to all of the AWS data storage options, including the S3 object store, the Amazon EFS, and the Amazon FSx file system for Lustre. You can build your favorite data pipeline and it will be supported by one or more data storage options. You can also easily use ML-IO library, which is a high-performance data access library for ML tasks with support for multiple data formats. The integrated access to highly durable and scalable object and file system storage services for accessing ML data is the third key benefit for the solution described in this post.

Integrated access to SageMaker

Once you have a stable version of your DNN, you need to find the right hyperparameters that lead to model convergence during training. Having tuned the hyperparameters, you need to run multiple trials over permutations of datasets and hyperparameters to fine-tune your models. Finally, you may need to prune and compile the models to optimize inference. To compress the training time, you may need to do distributed data parallel training across multiple GPUs in multiple machines. For all of these activities, the deep learning desktop is preconfigured to use SageMaker. You can use jupyter-lab notebooks running on the desktop to launch SageMaker training jobs for distributed training in infrastructure automatically managed by SageMaker.

Figure 3 – Submitting a SageMaker training job from deep learning desktop using Jupyter Lab notebook

The SageMaker training logs, TensorBoard summaries, and model checkpoints can be configured to be written to the Amazon EFS attached to the deep learning desktop. You can use the Linux command tail to monitor the logs, or start a TensorBoard server from the Conda environment on the deep learning desktop, and monitor the progress of your SageMaker training jobs. You can use a Jupyter Lab notebook running on the deep learning desktop to load a specific model checkpoint available on the Amazon EFS, and visualize the predictions from the model checkpoint, even while the SageMaker training job is still running.
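
For example, assuming your training job writes logs and summaries under the Amazon EFS mount, monitoring might look like the following (the paths are placeholders):

# Follow the SageMaker training logs written to the shared Amazon EFS (path is a placeholder)
tail -f ~/efs/my-training-job/logs/training.log

# Start TensorBoard from the desktop's Conda environment to monitor summaries (path is a placeholder)
tensorboard --logdir ~/efs/my-training-job/summaries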

Figure 4 – Locally monitoring the TensorBoard summaries from SageMaker training job

SageMaker offers many advanced capabilities, such as profiling ML training jobs using Amazon SageMaker Debugger, and these services are easily accessible from the deep learning desktop. You can manage the training input data, training model checkpoints, training logs, and TensorBoard summaries of your local iterative development, in addition to the distributed SageMaker training jobs, all from your deep learning desktop. The integrated access to SageMaker services is the fourth key use case for the solution described in this post.

Prerequisites

To get started, complete the following steps:

Walkthrough

The complete source and reference documentation for this solution is available in the repository accompanying this post. Following is a walkthrough of the steps.

Create a CloudFormation stack

Create a stack on the CloudFormation console in your selected AWS Region using the CloudFormation template in your cloned GitHub repository. This CloudFormation stack creates IAM resources. When you are creating a CloudFormation stack using the console, you must confirm: I acknowledge that AWS CloudFormation might create IAM resources.

To create the CloudFormation stack, you must specify values for the following input parameters (for the rest of the input parameters, default values are recommended):

  • DesktopAccessCIDR – Use the public internet address of your laptop as the base value for the CIDR.
  • DesktopInstanceType – For deep learning, the recommended value for this parameter is p3.2xlarge, or larger.
  • DesktopVpcId – Select an Amazon Virtual Private Cloud (VPC) with at least one public subnet.
  • DesktopVpcSubnetId – Select a public subnet in your VPC.
  • DesktopSecurityGroupId – The specified security group must allow inbound access over ports 22 (SSH) and 8443 (NICE DCV) from your DesktopAccessCIDR, and must allow inbound access from within the security group to port 2049 and all network ports required for distributed SageMaker training in your subnet.
  • If you leave it blank, the automatically-created security group allows inbound access for SSH, and NICE DCV from your DesktopAccessCIDR, and allows inbound access to all ports from within the security group.
  • KeyName – Select your SSH key pair name.
  • S3Bucket – Specify your S3 bucket name. The bucket can be empty.

Visit the documentation on all the input parameters.
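
If you prefer the AWS CLI over the console, stack creation might look like the following sketch. The template file name and all parameter values are placeholders, and CAPABILITY_IAM is required because the stack creates IAM resources:

# All values below are placeholders; see the repository documentation for the full parameter list
aws cloudformation create-stack \
  --stack-name deep-learning-desktop \
  --template-body file://deep-learning-desktop.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=DesktopAccessCIDR,ParameterValue=203.0.113.10/32 \
    ParameterKey=DesktopInstanceType,ParameterValue=p3.2xlarge \
    ParameterKey=DesktopVpcId,ParameterValue=vpc-0123456789abcdef0 \
    ParameterKey=DesktopVpcSubnetId,ParameterValue=subnet-0123456789abcdef0 \
    ParameterKey=KeyName,ParameterValue=my-key-pair \
    ParameterKey=S3Bucket,ParameterValue=my-deep-learning-bucket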

Connect to the deep learning desktop

  • When the status for the stack in the CloudFormation console is CREATE_COMPLETE, find the deep learning desktop instance launched in your stack in the Amazon EC2 console.
  • Connect to the instance using SSH as user ubuntu, using your SSH key pair. When you connect using SSH, if you see the message, “Cloud init in progress. Machine will REBOOT after cloud init is complete!!”, disconnect and try again in about 15 minutes.
  • The desktop installs the NICE DCV server on first-time startup, and automatically reboots after the install is complete. If instead you see the message, “NICE DCV server is enabled!”, the desktop is ready for use.
  • Before you can connect to the desktop using the NICE DCV client, you need to set a new password for user ubuntu using the Bash command:
    sudo passwd ubuntu 
  • After you successfully set the new password for user ubuntu, exit the SSH connection. You are now ready to connect to the desktop using a suitable NICE DCV client (a non–web browser client is recommended) using the user ubuntu, and the new password.
  • NICE DCV client asks you to specify the server host and port to connect. For the server host, use the public IPv4 DNS address of the desktop Amazon EC2 instance available in Amazon EC2 console.
  • You do not need to specify the port, because the desktop is configured to use the default NICE DCV server port of 8443.
  • When you first login to the desktop using the NICE DCV client, you will be asked if you would like to upgrade the OS version. Do not upgrade the OS version!

Develop on the deep learning desktop

When you are connected to the desktop using the NICE DCV client, use the Ubuntu Software Center to install Visual Studio Code, or your favorite IDE. To view the available Conda environments containing the popular deep learning frameworks preconfigured on the desktop, open a desktop terminal, and run the Bash command:

conda env list

The deep learning desktop instance has secure access to the S3 bucket you specified when you created the CloudFormation stack. You can verify access to the S3 bucket by running the Bash command (replace ‘your-bucket-name’ following with your S3 bucket name):

aws s3 ls your-bucket-name 

If your bucket is empty, a successful run of the previous command will produce no output, which is normal.

An Amazon Elastic Block Store (Amazon EBS) root volume is attached to the instance. In addition, an Amazon EFS is mounted on the desktop at the value of EFSMountPath input parameter, which by default is /home/ubuntu/efs. You can use the Amazon EFS for staging deep learning input and output data.

Use SageMaker from the deep learning desktop

The deep learning desktop is preconfigured to use SageMaker. To get started with SageMaker examples in a JupyterLab notebook, launch the following Bash commands in a desktop terminal:

mkdir ~/git
cd ~/git
git clone https://github.com/aws/amazon-sagemaker-examples.git
jupyter-lab

This will start a ‘jupyter-lab’ notebook server in the terminal, and open a tab in your web browser. You can explore any of the SageMaker example notebooks. We recommend starting with the example Distributed Training of Mask-RCNN in SageMaker using Amazon EFS found at the following path in the cloned repository:

advanced_functionality/distributed_tensorflow_mask_rcnn/mask-rcnn-scriptmode-efs.ipynb

The preceding SageMaker example requires you to specify a subnet and a security group. Use the preconfigured OS environment variables as follows:

security_group_ids = [os.environ['desktop_sg_id']]
subnets = [os.environ['desktop_subnet_id']]

Stopping and restarting the desktop

You may safely reboot, stop, and restart the desktop instance at any time. The desktop will automatically mount the Amazon EFS at restart.

Clean Up

When you no longer need the deep learning desktop, you may delete the CloudFormation stack from the CloudFormation console. Deleting the stack will shut down the desktop instance, and delete the root Amazon EBS volume attached to the desktop. The Amazon EFS is not automatically deleted when you delete the stack.

Conclusion

In this post, we showed how to launch a desktop preconfigured with popular machine learning frameworks for research and development of deep learning neural networks. NICE DCV was used for high-performance visualization related to deep learning. AWS storage services were used for highly scalable access to deep learning data. Finally, Amazon SageMaker was used for distributed training of deep learning models.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: How Sportradar Accelerated Data Recovery Using AWS Services

Post Syndicated from Mithil Prasad original https://aws.amazon.com/blogs/architecture/field-notes-how-sportradar-accelerated-data-recovery-using-aws-services/

This post was co-written by Mithil Prasad, AWS Senior Customer Solutions Manager, Patrick Gryczkat, AWS Solutions Architect, Ben Burdsall, CTO at Sportradar and Justin Shreve, Director of Engineering at Sportradar. 

Ransomware is a type of malware which encrypts data, effectively locking those affected by it out of their own data and requesting a payment to decrypt the data.  The frequency of ransomware attacks has increased over the past year, with local governments, hospitals, and private companies experiencing cases of ransomware.

For Sportradar, providing their customers with access to high quality sports data and insights is central to their business. Ensuring that their systems are designed securely and in a way which minimizes the possibility of a ransomware attack is top priority.  While ransomware attacks can occur both on premises and in the cloud, AWS services offer increased visibility and native encryption and back up capabilities. This helps prevent and minimize the likelihood and impact of a ransomware attack.

Recovery, backup, and the ability to go back to a known good state is best practice. To further expand their defense and diminish the value of ransom, the Sportradar architecture team set out to leverage their AWS Step Functions expertise to minimize recovery time. The team’s strategy centered on achieving a short deployment process. This process commoditized their production environment, allowing them to spin up interchangeable environments in new isolated AWS accounts, pulling in data from external and isolated sources, and diminishing the value of a production environment as a ransom target. This also minimized the impact of a potential data destruction event.

By partnering with AWS, Sportradar was able to build a secure and resilient infrastructure to provide timely recovery of their service in the event of data destruction by an unauthorized third party. Sportradar automated the deployment of their application to a new AWS account and established a new isolation boundary from an account with compromised resources. In this blog post, we show how the Sportradar architecture team used a combination of AWS CodePipeline and AWS Step Functions to automate and reduce their deployment time to less than two hours.

Solution Overview

Sportradar’s solution uses AWS Step Functions to orchestrate the deployment of resources, the recovery of data, and the deployment of application code, and to navigate all necessary dependencies for order of deployment. While deployment can be orchestrated through CodePipeline, Sportradar used their familiarity with Step Functions to create a quick and repeatable deployment process for their environment.

Sportradar’s solution to a ransomware Disaster Recovery scenario has also provided them with a reliable and accelerated process for deploying development and testing environments. Developers are now able to scale testing and development environments up and down as needed.  This has allowed their Development and QA teams to follow the pace of feature development, versus weekly or bi-weekly feature release and testing schedules tied to a single testing environment.

Figure 1 – Reference Architecture Diagram showing Automated Deployment Flow

Prerequisites

The prerequisites for implementing this deployment strategy are:

  • An implemented database backup policy
  • Ideally data should be backed up to a data bunker AWS account outside the scope of the environment you are looking to protect. This is so that in the event of a ransomware attack, your backed up data is isolated from your affected environment and account
  • Application code within a GitHub repository
  • Separation of duties
  • Access and responsibility for the backups and GitHub repository should be separated to different stakeholders in order to reduce the likelihood of both being impacted by a security breach

Step 1: New Account Setup 

Once data destruction is identified, the first step in Sportradar’s process is to use a pre-created runbook to create a new AWS account.  A new account is created in case the malicious actors who have encrypted the application’s data have access to not just the application, but also to the AWS account the application resides in.

The runbook sets up a VPC for a selected Region, as well as spinning up the following resources:

  • Security Groups with network connectivity to their Git repository (in this case GitLab), and IAM Roles for their resources
  • KMS Keys
  • Amazon S3 buckets with CloudFormation deployment templates
  • CodeBuild, CodeDeploy, and CodePipeline

Step 2: Deploying Secrets

It is a security best practice to ensure that no secrets are hard coded into your application code. So, after account setup is complete, the new AWS account's access keys and the selected AWS Region are passed into CodePipeline variables. The application secrets are then deployed to the AWS Parameter Store.

Step 3: Deploying Orchestrator Step Function and In-Memory Databases

To optimize deployment time, Sportradar decided to leave the deployment of their in-memory databases running on Amazon EC2 outside of their orchestrator Step Function.  They deployed the database using a CloudFormation template from their CodePipeline. This was in parallel with the deployment of the Step Function, which orchestrates the rest of their deployment.

Step 4: Step Function Orchestrates the Deployment of Microservices and Alarms

The AWS Step Functions orchestrate the deployment of Sportradar’s microservices solutions, deploying 10+ Amazon RDS instances, and restoring each dataset from DB snapshots. Following that, 80+ producer Amazon SQS queues and  S3 buckets for data staging were deployed. After the successful deployment of the SQS queues, the Lambda functions for data ingestion and 15+ data processing Step Functions are deployed to begin pulling in data from various sources into the solution.

Then the API Gateways and Lambda functions which provide the API layer for each of the microservices are deployed in front of the restored RDS instances. Finally, 300+ Amazon CloudWatch Alarms are created to monitor the environment and trigger necessary alerts. In total, Sportradar’s deployment process brings online: 15+ Step Functions for data processing, 30+ microservices, 10+ Amazon RDS instances with over 150 GB of data, 80+ SQS queues, 180+ Lambda functions, a CDN for the UI, Amazon ElastiCache, and 300+ CloudWatch alarms to monitor the applications. In all, that is over 600 resources deployed with data restored consistently in less than 2 hours total.

Figure 2 – Reference Architecture Diagram of the Recovered Application

Conclusion

In this blog, we showed how Sportradar’s team used Step Functions to accelerate their deployments, and a walk-through of an example disaster recovery scenario. Step Functions can be used to orchestrate the deployment and configuration of a new environment, allowing complex environments to be deployed in stages, and for those stages to appropriately wait on their dependencies.

For examples of Step Functions being used in different orchestration scenarios, check out how Step Functions acts as an orchestrator for ETLs in Orchestrate multiple ETL jobs using AWS Step Functions and AWS Lambda and Orchestrate Apache Spark applications using AWS Step Functions and Apache Livy. For migrations of Amazon EC2 based workloads, read more about CloudEndure, Migrating workloads across AWS Regions with CloudEndure Migration.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Ben Burdsall

Ben is currently the chief technology officer of Sportradar, a data provider to the sporting industry, where he leads a product and engineering team of more than 800. Before that, Ben was part of the global leadership team of Worldpay.

Justin Shreve

Justin is Director of Engineering at Sportradar, leading an international team to build an innovative enterprise sports analytics platform.

Field Notes: How to Set Up Your Cross-Account and Cross-Region Database for Amazon Aurora

Post Syndicated from Ashutosh Pateriya original https://aws.amazon.com/blogs/architecture/field-notes-how-to-set-up-your-cross-account-and-cross-region-database-for-amazon-aurora/

This post was co-written by Ashutosh Pateriya, Solution Architect at AWS and Nirmal Tomar, Principal Consultant at Infosys Technologies Ltd. 

Various organizations have stringent regulatory compliance obligations or business requirements that require an effective cross-account and cross-region database setup. We recommend establishing a Disaster Recovery (DR) environment in different AWS accounts and Regions for a system design that can handle AWS Region failures.

Account-level separation is important for isolating production environments from other environments, that is, development, test, UAT, and DR environments. This separation is often mandated by external compliance requirements, such as PCI-DSS or HIPAA. Cross-account replication helps an organization recover data from a replicated account when the primary AWS account is compromised and access to the account is lost. Using multiple Regions gives you greater control over the recovery time if there is a hard dependency failure on a Regional AWS service.

You can use Amazon Aurora Global Database to replicate your RDS database across Regions, within the same AWS Account. In this blog, we show how to build a custom solution for Cross-Account, Cross-Region database replication with configurable Recovery Time Objective (RTO)/ Recovery Point Objective (RPO).

Solution Overview

Figure 1 – Architecture of a Cross-Account, Cross-Region database replication with configurable Recovery Time Objective (RTO)/ Recovery Point Objective (RPO)

Step 1: Automated Amazon Aurora snapshots cannot be shared with other AWS accounts. Hence, create a manual copy of the automated snapshot.

Step 2: An Amazon Aurora cluster cannot be restored from a snapshot created in a different AWS Region. To overcome this, you must copy the snapshot to the target AWS Region after AWS KMS encryption.

Step 3: Share the DB cluster snapshot with the target account, in order to restore the cluster.

Step 4: Restore Aurora Cluster from the shared snapshot.

Step 5: Restore the required Aurora Instances.

Step 6: Swap the cluster and instance names, to make it accessible from the previous Reader/Writer endpoint.

Step 7: Perform a sanity test and shut down the old cluster.

Prerequisites

  • Two AWS accounts with permission to spin up required resources.
  • A running Aurora instance in the source AWS account.

Walkthrough

The solution outlined is custom built, and automated using AWS Step Functions, AWS Lambda, Amazon Simple Notification Service (SNS), AWS Key Management Service (KMS), AWS CloudFormation templates, Amazon Aurora, and AWS Identity and Access Management (IAM) by following these four steps:

  1. Launch a one-time set up of the AWS KMS in the AWS source account to share the key with destination account. This key will be used by the destination account to restore the Aurora cluster from a KMS encrypted snapshot.
  2. Set up the step function workflow in the AWS source account and source Region. On a scheduled interval, this workflow creates a snapshot in the source account and source Region, copies it to the target Region, and pushes a notification to the SNS topic; set up a Lambda function as a target for the SNS topic. Refer to the documentation to learn more about Using Lambda with Amazon SNS.
  3. Set up the step function workflow in the AWS target account and in the target Region. This workflow is initiated from the previous workflow after sharing the snapshot. It restores the Aurora cluster from the shared snapshot in the target account and in the target Region.
  4. Provide access to the Lambda function in the target account and target Region. Set it up to listen to the SNS topic in the source account and source Region.
Figure 2 – Architecture showing AWS Step Function in Source Account and Source Region

We can implement a Lambda function with different supported programming languages. For this solution, we selected Python to implement various Lambda functions used in both step function workflows.

Note: We are using us-west-2 as the source Region and us-west-1 as the target Region for this demo. You can choose Regions as per your design need.

Part 1: One Time AWS KMS key setup in the Source Account and Target Region for cross-account access of the Encrypted Snapshot

We can share a Customer Master Key (CMK) across multiple AWS accounts in different ways, such as using AWS CloudFormation templates, the AWS CLI, or the AWS console.

Following are one-time steps that allow you to create an AWS KMS key in the source AWS account. You can then provide cross-account usage access of the same key to the destination account for restoring the Aurora cluster. You can achieve this with the AWS Console and AWS CloudFormation Template.

Note: Several AWS services being used do not fall under the AWS Free Tier. To avoid extra charges, make sure to delete resources once the demo is completed.

Using the AWS Console

You can set up cross-account AWS KMS keys using the AWS Console. For more information on the setup, refer to the guide: Allowing users in other accounts to use a CMK.

Using the AWS CloudFormation Template

You can set up cross account AWS KMS keys using CloudFormation templates by following these steps:

  • Launch the template:

Launch with CloudFormation Console button

  • Use the source account and destination AWS Region.
  • Type the appropriate stack name and destination account number and select Create stack.

  • Click on the Resources tab to confirm if the KMS key and its alias name have been created.

Part 2: Snapshot creation and sharing with the Target Account in the Target Region using the Step Function workflow

Now, we schedule the AWS Step Functions workflow to run one or more times per day, week, or month using Amazon EventBridge or Amazon CloudWatch Events, according to your Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

This workflow first creates a snapshot in the AWS source account and source Region, copies the snapshot to the target Region after KMS encryption, shares the snapshot with the target account, and sends a notification to the SNS topic. A Lambda function is subscribed to the topic in the target account.

 

Figure 3 – Architecture showing the AWS Step Functions Workflow

The main Lambda functions involved in this workflow are as follows (a simplified sketch of the underlying API calls follows this list):

a.     CreateRDSSnapshotInSourceRegion:  This Lambda function creates a snapshot in the source Account and Region for an existing Aurora cluster.

b.     FetchSourceRDSSnapshotStatus: This Lambda function fetches the snapshot status and waits for its availability before moving to the next step.

c.     EncrytAndCopySnapshotInTargetRegion: This Lambda function copies the source snapshot in the target AWS Region after encryption with AWS KMS keys.

d.     FetchTargetRDSSnapshotStatus: This Lambda function fetches the snapshot status in the target Region and waits until the snapshot status changes to available before moving to the next step.

e.     ShareSnapshotStateWithTargetAccount: This Lambda function shares the snapshot with the target account and target Region.

f.      SendSNSNotificationToTargetAccount: This Lambda function sends a notification to the SNS topic. An AWS Lambda function in the destination AWS account is subscribed to this topic.
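
A simplified Python (boto3) sketch of the API calls behind these functions follows. Cluster, snapshot, account, Region, and KMS key identifiers are placeholders, and the actual Lambda functions also poll for snapshot availability between steps:

import boto3

source_rds = boto3.client("rds", region_name="us-west-2")  # source Region
target_rds = boto3.client("rds", region_name="us-west-1")  # target Region

# Create a manual snapshot of the source Aurora cluster
source_rds.create_db_cluster_snapshot(
    DBClusterSnapshotIdentifier="manual-cluster-snapshot",
    DBClusterIdentifier="source-aurora-cluster",
)

# Copy the snapshot to the target Region, encrypting it with the shared KMS key
target_rds.copy_db_cluster_snapshot(
    SourceDBClusterSnapshotIdentifier="arn:aws:rds:us-west-2:111122223333:cluster-snapshot:manual-cluster-snapshot",
    TargetDBClusterSnapshotIdentifier="manual-cluster-snapshot-copy",
    KmsKeyId="arn:aws:kms:us-west-1:111122223333:key/example-key-id",
    SourceRegion="us-west-2",
)

# Share the copied snapshot with the target account
target_rds.modify_db_cluster_snapshot_attribute(
    DBClusterSnapshotIdentifier="manual-cluster-snapshot-copy",
    AttributeName="restore",
    ValuesToAdd=["444455556666"],  # target account ID placeholder
)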

Using the following AWS CloudFormation Template, we set up the required services/workflow in the source account and source Region:

a.     Launch the template in the source account and source Region:

Launch with CloudFormation Console button

b.     Fill out the preceding form with the following details and select Next.

Stack name:  Stack Name which you want to create.

DestinationAccountNumber: Destination AWS Account number where you want to restore the Aurora  cluster.

KMSKey: KMS key ARN number setup in previous steps.

ScheduleExpression: Schedule a cron expression when you want to run this workflow. Sample expressions are available in this guide, Schedule expressions using rate or cron.

SourceDBClusterIdentifier: The identifier of the DB cluster, for which a snapshot is created. It must match the identifier of an existing DBCluster. This parameter isn’t case-sensitive.

SourceDBSnapshotName: Snapshot name which we will use in this flow.

TargetAWSRegion: Destination AWS Account Region where you want to restore the Aurora cluster.

c.     Select IAM role to launch this template per best practices.

d.     Acknowledge to create various resources including IAM roles/policies and select Create Stack.

Part 3: Workflow to Restore the Aurora Cluster in the AWS Target Account and Target Region

Once AWS Lambda receives a cross-account SNS notification for the shared snapshot, this workflow begins:

  • The Lambda function starts the AWS Step Function workflow in an asynchronous mode.
  • The workflow checks for the snapshot availability and starts restoring the Aurora cluster based on the configured engine, version, and other parameters.
  • Depending on the availability of the cluster, it adds instances based on a predefined configuration.
  • Once the instances are available, it swaps cluster names so that the updated database is available at the same reader and writer endpoints.
  • Once the databases are restored, the old Aurora cluster is shut down.

Note:

a)       The database is restored in the default VPC, Availability Zone, and option group, with the default cluster version. If you want to restore with specific configurations, you can edit the restore cluster function with details available in the same section.

b)      Database credentials will be the same as in the source account. If you want to override credentials or change the authentication mechanism, you can edit the restore cluster function with details available in the same section.

Figure 4 – Architecture showing the AWS Step Functions Workflow

The main Lambda functions involved in this workflow are as follows (a simplified sketch of the restore API calls follows this list):

a)     FetchRDSSnapshotStatus: This Lambda function fetches the snapshot status and moves to RestoreCluster functionality once the snapshot is available.

b)    RestoreRDSCluster: This Lambda function restores the cluster from the shared snapshot with the mandatory attributes only.

–     You can edit the Lambda function as described following to add specific parameters like specific VPC, Subnets, AvailabilityZones, DBSubnetGroupName, Autoscaling for reader capacity, DBClusterParameterGroupName, BacktrackWindow and many more.

c)     RestoreInstance: This Lambda function creates an instance for the restored cluster.

d)    SwapClusterName: This Lambda function swaps the cluster names so that the updated database is available at the previous reader and writer endpoints.

e)     TerminateOldCluster: This Lambda function will shut down the previous cluster.
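
A simplified Python (boto3) sketch of the restore calls follows. Identifiers, engine, and instance class are placeholders, and the actual functions add further parameters (VPC, subnet group, parameter groups, and so on) and wait for each resource to become available:

import boto3

rds = boto3.client("rds", region_name="us-west-1")  # target Region

# Restore a temporary cluster from the shared, encrypted snapshot
rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="temp-aurora-cluster",
    SnapshotIdentifier="arn:aws:rds:us-west-1:111122223333:cluster-snapshot:manual-cluster-snapshot-copy",
    Engine="aurora-mysql",
)

# Add a DB instance to the restored cluster
rds.create_db_instance(
    DBInstanceIdentifier="temp-aurora-instance",
    DBInstanceClass="db.t3.medium",
    Engine="aurora-mysql",
    DBClusterIdentifier="temp-aurora-cluster",
)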

Using the following AWS CloudFormation Template, we can set up the required services/workflow in the target account and target Region:

a)     Launch the template in target account and target Region:

Launch with CloudFormation Console button

b)     Fill out the preceding form with the following details and select Next.

Stack name: Stack name which you want to create.

Engine: The database engine to be used for the new DB cluster, such as aurora-mysql. It must be compatible with the engine of the source.

DBInstanceClass: The compute and memory capacity of the Amazon RDS DB instance, for example, db.t3.medium.

Not all DB instance classes are available in all AWS Regions, or for all database engines. For the full list of DB instance classes, and availability for your engine, see DB Instance Class in the Amazon RDS User Guide.

ExistingDBClusterIdentifier: Name of the existing DB cluster. In the final step, if the old cluster is available with this name, it will be deleted and a new cluster will be available with this name, so the reader/writer endpoint will not change. This parameter isn’t case-sensitive. It must contain from 1 to 63 letters, numbers, or hyphens. The first character must be a letter. DBClusterIdentifier can’t end with a hyphen or contain two consecutive hyphens.

TempDBClusterIdentifier: Temporary name of the DB cluster created from the DB snapshot or DB cluster snapshot. In the final step, this cluster name will be swapped with the original cluster and the previous cluster will be deleted. This parameter isn’t case-sensitive. Must contain from 1 to 63 letters, numbers, or hyphens. The first character must be a letter. DBClusterIdentifier can’t end with a hyphen or contain two consecutive hyphens.

ExistingDBInstanceIdentifier: Name of the existing DB instance identifier. In the final step, if a previous instance is available with this name, it will be deleted and a new instance will be available with this name. This parameter is stored as a lowercase string. Must contain from 1 to 63 letters, numbers, or hyphens. The first character must be a letter, and the name can’t end with a hyphen or contain two consecutive hyphens.

TempDBInstanceIdentifier: Temporary name of the existing DB instance. In the final step, if a previous instance is available with this name, it will be deleted and a new instance will be available. This parameter is stored as a lowercase string. Must contain from 1 to 63 letters, numbers, or hyphens. The first character must be a letter, can’t end with a hyphen or contain two consecutive hyphens.


c)     Select IAM role to launch this template as per best practices and select Next.


d)     Acknowledge to create various resources including IAM roles/policies and select Create Stack.

After launching the CloudFormation steps, all resources like IAM roles, step function workflow and Lambda functions will be set up in the target account.

Part 4: Cross Account Role for Lambda Invocation to restore the Aurora Cluster

In this step, we provide access for the Lambda function set up in the target account and target Region to listen to the SNS topic set up in the source account and source Region. An SNS notification is initiated by the first workflow after the snapshot sharing steps. The Lambda function initiates a workflow to restore the Aurora cluster in the target account and target Region.

Using the following code, we can configure a Lambda function in the target account to listen to the SNS topic created in the source account and Region. More details are available in this Tutorial: Using AWS Lambda with Amazon Simple Notification Service.

a)     Configure the SNS topic in source account to allow Subscriptions for target account by using the following AWS CLI command in the source account.

aws sns add-permission \
  --label lambda-access --aws-account-id <<targetAccountID>> \
  --topic-arn <<snsTopicinSourceAccount>> \
  --action-name Subscribe ListSubscriptionsByTopic

Replace << targetAccountID >> with target AWS account number and <<snsTopicinSourceAccount>> with sns topic created as an output of part 2 CloudFormation template in the source account.

b)     Configure the InvokeStepFunction Lambda function in target account to allow invocation from the SNS topic in source account by using following AWS CLI command in the target account. This policy allows the specific SNS topic in the source account to invoke the Lambda function.

aws lambda add-permission \
  --function-name InvokeStepFunction \
  --source-arn <<snsTopicinSourceAccount>> \
  --statement-id sns-x-account \
  --action "lambda:InvokeFunction" \
   --principal sns.amazonaws.com

Replace <<snsTopicinSourceAccount>> with sns topic created as an output of part 2 CloudFormation template.

c)     Subscribe the Lambda function in the target account to the SNS topic in the source account by using the following AWS CLI command in the target account.

aws sns subscribe \
  --topic-arn <<snsTopicinSourceAccount>> \
   --protocol "lambda" \
  --notification-endpoint << InvokeStepFunctionArninTargetAccount>>

Run this CLI command in the source Region, as the SNS topic is created there. Replace <<snsTopicinSourceAccount>> with the SNS topic created as an output of the part 2 CloudFormation template, and <<InvokeStepFunctionArninTargetAccount>> with the InvokeStepFunction Lambda function in the target account, created as an output of the part 3 CloudFormation template.

Security Considerations:

1.     We can share the snapshots with different accounts using public and private mode.

  • Public mode permits all AWS accounts to restore a DB instance from the shared snapshot.
  • Private mode permits only AWS accounts that you specify to restore a DB instance from the shared snapshot. We recommend sharing the snapshot with the destination account in private mode, as we are using the same in our solution.

2.     Sharing a manual DB cluster snapshot (whether encrypted or unencrypted) enables authorized AWS accounts to directly restore a DB cluster from the snapshot. This is instead of taking a copy and restoring from that. We recommend sharing an encrypted snapshot with the destination account, as we are using the same in our solution.

Cleaning up

Delete the resources to avoid incurring future charges.

Conclusion

In this blog, you learned how to replicate databases across the Region and across different accounts. This helps in designing your DR environment and fulfilling compliance requirements. This solution can be customized for any RDS-based databases or Aurora Serverless, and you can achieve the desired level of RTO and RPO.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Nirmal Tomar

Nirmal is a principal consultant with Infosys, assisting vertical industries on application migration and modernization to the public cloud. He is part of the Infosys Cloud CoE team and leads various initiatives with AWS. Nirmal has a keen interest in industry and cloud trends, and in working with hyperscalers to build differentiated offerings. Nirmal also specializes in cloud native development, managed containerization solutions, data analytics, data lakes, IoT, AI, and machine learning solutions.

Field Notes: Orchestrating and Monitoring Complex, Long-running Workflows Using AWS Step Functions

Post Syndicated from Max Winter original https://aws.amazon.com/blogs/architecture/field-notes-orchestrating-and-monitoring-complex-long-running-workflows-using-aws-step-functions/

Situation:

IHS Markit’s Wall Street Office (WSO) offers financial reports to hundreds of clients worldwide. When IHS Markit completed the migration of WSO’s SaaS software to AWS, it unlocked the power and agility to deliver new product features monthly, as opposed to a multi-year release cycle. This migration also presented a great opportunity to further enhance the customer experience by automating the WSO reporting team’s own Continuous Integration and Continuous Deployment (CI/CD) workflow. WSO then offered the same migration workflow to its on-prem clients, who needed the ability to upgrade quickly in order to meet a regulatory LIBOR reporting deadline. This rapid upgrade was enabled by fully automating regression testing of new software versions.

In this blog post, I outline the architectures created in collaboration with WSO to orchestrate and monitor the complex, long-running reconciliation workflows in their environment by leveraging the power of AWS Step Functions. To enable each client’s migration to AWS, WSO needed to ensure that the new, AWS version of the reporting application produced identical outputs to the previous, on-premises version. For a single migration, the process is as follows:

  • spin up the old version of the SQL Server and reporting engine on Windows servers,
  • run reports,
  • repeat the process with the new version,
  • compare the outputs and review the differences.

The problem came with scaling this process.  IHS Markit provides financial solutions and tools to numerous clients. To enable these clients to transition away from LIBOR, the WSO team was tasked with migrating over 80 instances of the application, and reconciling hundreds of reports for each migration. During upgrades, customers must manually validate custom extracts created in the WSO Reporting application against current and next version, which limits upgrade frequency and increases the resourcing cost of these validations. Without automation, upgrading all clients would have taken an entire new Operations team, and cost the firm over 700 developer-hours to meet the regulatory LIBOR cessation deadline.

WSO was able to save over 4,000 developer-hours by making this process repeatable, so it can be used as an automated regression test as part of the regular Systems Development Lifecycle process. The following diagram shows the reconciliation workflow steps enabled as part of this automated process.

Reconciliation Workflow Steps

Figure 1 – Reconciliation Workflow Steps

Complication:

The team quickly realized that a Serverless and event-driven solution would be required to make this process manageable. The initial approach was to use AWS Lambda functions to call PowerShell scripts to perform each step in the reconciliation process. They also used Amazon SNS to invoke the next Lambda function when the previous step completed.

The problem came when the Operations team tried to monitor these Lambda functions, with multiple parallel reconciliations running concurrently. The Lambda outputs became mixed together in shared Amazon CloudWatch log groups, and there was no way to quickly see the overall progress of any given reconciliation workflow. It was also difficult to figure out how to recover from errors.

Furthermore, the team found that some steps in this process, such as database restoration, ran longer than the 15-minute Lambda timeout limit. As a result, they were forced to look for alternatives to manage these long-running steps. Following is an architecture diagram showing the serverless components used to automate and scale the process.

Initial Orchestration Architecture

Figure 2 – Initial Orchestration Architecture

Solution:

Enter Step Functions and AWS Systems Manager (formerly known as SSM) Automation. To address the problem of orchestrating the many sequential and parallel steps in our workflow, AWS Solutions Architects suggested replacing Amazon SNS with AWS Step Functions.

The Step Functions state machine controls the order in which the steps are invoked, including successful and error state transitions. The service is integrated with 14 other AWS services (Lambda functions, SSM Automation, Amazon ECS, and more) and can invoke them, as well as manual actions. These calls can be synchronous or run via steps that wait for an event. A state machine instance is long-lived and can support processes that take up to a year to complete.
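The WSO state machine itself is not shown in the post, but a minimal Amazon States Language sketch (written here as a Python dictionary, with hypothetical state and function names) illustrates the pattern of sequential Lambda tasks with explicit success and error transitions:

import json

# Hypothetical reconciliation workflow; every name and ARN below is illustrative only.
reconciliation_definition = {
    "Comment": "Sketch: sequential Lambda tasks with an explicit error transition",
    "StartAt": "RestoreDatabase",
    "States": {
        "RestoreDatabase": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "StartDatabaseRestore", "Payload.$": "$"},
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "ReviewFailure"}],
            "Next": "RunReports",
        },
        "RunReports": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "RunReports", "Payload.$": "$"},
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "ReviewFailure"}],
            "End": True,
        },
        "ReviewFailure": {
            "Type": "Fail",
            "Error": "ReconciliationFailed",
            "Cause": "A step failed and needs human review",
        },
    },
}

print(json.dumps(reconciliation_definition, indent=2))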


Figure 3 – Step Function Designer UI

This immediately gave the Development team a holistic, visual way to design the workflow, and offered Operations a graphical user interface (UI) to monitor ongoing reconciliations in real time. The Step Functions console lists all running and past reconciliations, including their status, and allows the operator to drill down into the detailed state diagram of any given reconciliation. The operator can then see how far it has progressed or where it encountered an error.

The UI also provides Amazon CloudWatch links for any given step, isolating the logs of that particular Lambda invocation and eliminating the need to search through the CloudWatch log group manually. The following screenshot illustrates what an in-progress Step Function looks like to an operator, with each step listed with its own status and a link to its log.


Figure 4 – Step Function Execution Monitor

 


Figure 5 – Step Function Execution Detail Viewer

 

Figure 6 – Step Function Log Group in Amazon CloudWatch

The team also used the Step Function state machine as a container for metadata about each particular reconciliation process instance (like the environment ID and the database and Amazon EC2 instances associated with that environment), reducing the need to pass this data between Lambda functions.
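For example, that metadata can be supplied once as the execution input when a reconciliation is started. The following boto3 sketch is illustrative only; the state machine ARN, execution name, and fields are placeholders, not values from the WSO environment:

import json
import boto3

sfn = boto3.client("stepfunctions")

# Start one reconciliation, passing environment metadata as the execution input.
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:ReconciliationWorkflow",
    name="reconcile-env-42",
    input=json.dumps({
        "EnvironmentId": "env-42",
        "DatabaseInstanceId": "wso-sql-old-42",
        "Ec2InstanceId": "i-0123456789abcdef0",
    }),
)
print(response["executionArn"])

Each state can then read these values from the execution data instead of having them threaded through every Lambda payload.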

To solve the problem of long-running PowerShell scripts, AWS Solutions Architects suggested using SSM Automation. Unlike Lambda functions, SSM Automation is meant to run operational scripts, with no maximum time limit. It also has native PowerShell integration, which you can use to call the existing scripts and capture their output.
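As a hedged illustration of how such a long-running step can be driven from code, the sketch below starts a hypothetical Automation runbook that wraps an existing PowerShell script and polls it to completion (the runbook name and parameters are assumptions):

import time
import boto3

ssm = boto3.client("ssm")

# Kick off a hypothetical runbook that wraps the existing PowerShell restore script.
execution_id = ssm.start_automation_execution(
    DocumentName="WSO-RestoreDatabase",                      # placeholder runbook name
    Parameters={"InstanceId": ["i-0123456789abcdef0"]},      # placeholder parameter
)["AutomationExecutionId"]

# Poll until the automation finishes; the 15-minute Lambda ceiling does not apply here.
while True:
    status = ssm.get_automation_execution(AutomationExecutionId=execution_id)[
        "AutomationExecution"]["AutomationExecutionStatus"]
    if status in ("Success", "Failed", "Cancelled", "TimedOut"):
        break
    time.sleep(30)

print(f"Automation finished with status: {status}")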

 


Figure 7 – SSM Automation UI for Monitoring Long-running Tasks and Manual Approval Steps

To save time running hundreds of reports, the team looked into the ‘Map State’ feature of Step Functions. Map takes an array of input data, then creates an instance of the step (in this case a Lambda call) for each item in the array. It waits for them all to complete before proceeding.

This is a recommended way to implement a fan-out pattern with almost no orchestration code. The Map State step also gives Operations users the option to limit the level of parallelism, in this case letting only five reports run simultaneously. This prevents overloading the reporting applications and databases.
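A Map state along these lines fans out one comparison per report while capping concurrency at five. This is a hedged sketch with hypothetical state and function names, not the exact definition used by WSO:

# Illustrative Map state fragment, expressed as a Python dictionary.
compare_reports_state = {
    "CompareReports": {
        "Type": "Map",
        "ItemsPath": "$.reports",   # an array of report identifiers in the execution input
        "MaxConcurrency": 5,        # run at most five comparisons at a time
        "Iterator": {
            "StartAt": "CompareReport",
            "States": {
                "CompareReport": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {"FunctionName": "CompareReport", "Payload.$": "$"},
                    "End": True,
                },
            },
        },
        "End": True,
    },
}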

Figure 8 – Map State is used to Fan Out the “CompareReport” Step as Multiple Parallel Steps

To deal with errors in any of the workflow steps, the Development team introduced a manual review step, which you can model in Step Functions. The manual step notifies a mailing list of the error, then waits for a reply to tell it whether to retry or abort the workflow.

The only challenge the Development team found was the mechanism for re-running an individual failed step. At this time, any failure needs to have an explicit state transition within the Step Function’s state diagram. While the Step Function can auto-retry a step, the team wanted to insert a wait-for-human-investigation step before retrying the more expensive and complex steps.

This presented two options:

  1. add wait-and-loop-back steps around every step we may want to retry,
  2. route all failures to a single wait-for-investigation step.

The former added significant complexity to the state machine, so AWS Solutions Architects raised this as a product feature that should be added to the Step Functions UI.

The proposed enhancement would allow any failed step to be manually rerun or skipped via UI controls, without adding explicit steps to each state machine to model this. In the meantime, the Dev team went with the latter approach, and had the human error review step loop back to the top of the state machine to retry the entire workflow. To avoid re-running long steps, they created a check within the step Lambda function to query the Step Functions API and determine whether that step had already succeeded before the loop-back, and complete it instantly if it had.
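A hedged sketch of that idempotency check follows. It assumes the execution ARN and the state's name are passed into the Lambda event (for example, from the Step Functions context object), which is not shown in the post:

import boto3

sfn = boto3.client("stepfunctions")

def step_already_completed(execution_arn: str, state_name: str) -> bool:
    """Simplified check: has the named state already exited in this execution?
    A production version would also confirm it exited via a success path."""
    paginator = sfn.get_paginator("get_execution_history")
    for page in paginator.paginate(executionArn=execution_arn, reverseOrder=True):
        for event in page["events"]:
            details = event.get("stateExitedEventDetails")
            if details and details["name"] == state_name:
                return True
    return False

def handler(event, context):
    # Both keys are assumed to be injected into the event by the state machine definition.
    if step_already_completed(event["ExecutionArn"], event["StateName"]):
        return {"status": "skipped", "reason": "completed before the loop-back"}
    # ... otherwise perform the expensive work here ...
    return {"status": "completed"}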

 


Figure 9 – Human Intervention Steps Used to Allow Time to Review Results or Resolve Errors Before Retrying Failed Steps

Conclusion:

Within 6 weeks, WSO was able to run the first reconciliations and begin the LIBOR migration on time. The Step Function Designer instantly gave the Developer team an operator UI and workflow orchestration engine. Normally, this would have required the creation of an entire 3-tier stack, scheduler and logging infrastructure.

Instead, using Step Functions allowed the developers to spend their time on the reconciliation logic that makes their application unique. The report compare tool developed by the WSO team provides clients with automated artifacts confirming that customer report data remained identical between the current and next versions of WSO. The new testing artifacts provide clients with robust and comprehensive testing of critical data extracts.


Figure 10 – The Completed Step Function which Orchestrates an Entire Reconciliation Process

 

We hope that this blog post provided useful insights to help you determine whether AWS Step Functions is a good fit for your workflows.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Field Notes: Use AWS Cloud9 to Power Your Visual Studio Code IDE

Post Syndicated from Nick Ragusa original https://aws.amazon.com/blogs/architecture/field-notes-use-aws-cloud9-to-power-your-visual-studio-code-ide/

Everyone has their favorite integrated development environment, or IDE, as it’s more commonly known. For many of us, it’s a tool that we rely on for our day-to-day activities. In some instances, it’s a tool we’ve spent years getting set up just the way we want – from the theme that looks the best to the most productive plugins and extensions that help us optimize our workflows.

After many iterations of trying different IDEs, I’ve chosen Visual Studio Code as my daily driver. One of the things I appreciate most about Visual Studio Code is its vast ecosystem of extensions, allowing me to extend its core functionality to exactly how I need it. I’ve spent hours installing (and sometimes subsequently removing) extensions and figuring out keyboard shortcut combinations, but it’s the theme and syntax highlighter that I find the most appealing. Despite all of this time invested in building the perfect IDE, it can still fall somewhat short.

One of the hardest challenges to overcome, regardless of IDE, is the fact that it’s confined to the hardware that it’s installed on. Some of these challenges include running out of local disk space, RAM exhaustion, or requiring more CPU cores to process a build. In this blog post, I show you how to overcome the limitations of your laptop or desktop by using AWS Cloud9 to power your Visual Studio Code IDE.

Solution Overview

One way to overcome the limitations of your laptop or desktop is to use a remote machine. You’ve probably tried to mount some remote file system over SSH on your machine so you could continue to use your IDE, but timeouts and connection resets made that a frustrating and unusable solution.

Imagine a solution where you can:

  • Use SSH to connect to a remote instance, install all of your favorite extensions on the remote instance, and take advantage of the remote machine’s resources including CPU, disk, RAM, and more.
  • Access the instance without needing to open security groups and ACLs from your (often changing) desktop IP.
  • Leverage your existing AWS infrastructure including your VPC, IAM user / role and policies, and take advantage of AWS Cloud9 to power your remote instance.

With AWS Cloud9, you start with an environment pre-packaged with essential tools for popular programming languages, coupled with the power of Amazon EC2. As an added advantage, you can even take advantage of AWS Cloud9’s built-in auto-shutdown capability which powers off your instance when you’re not actively connected to it. The following diagram shows the architecture you can build to use AWS Cloud9 to power your Visual Studio Code IDE.

Visual Studio Ref Architecture

Walkthrough

I will walk you through setting up Visual Studio Code with the Remote SSH extension. Once it is installed, I’ll show you how to leverage the features of AWS Cloud9 to power your Visual Studio Code IDE, including how to:

  • Automatically power on your instance when you connect to it
  • Configure your SSH client to use AWS Systems Manager Session Manager to connect to your AWS Cloud9 instance
  • Modify your AWS Cloud9 instance to shut down after you disconnect

Before you get started, launch the following AWS CloudFormation template in your AWS account. This will be the easiest way to get up and running to be able to evaluate this solution.


Once launched, a second CloudFormation stack is deployed named aws-cloud9-VS-Code-Remote-SSH-Demo-<unique id>.

  • Select this stack from your console.
  • Select the Resources tab, and take note of the instance Physical ID.
  • Save this value for use later on.

CloudFormation Screenshot

 

Finally, you may wish to encrypt your AWS Cloud9 instance’s EBS volume for an additional layer of security. Follow this procedure to encrypt your AWS Cloud9 instance’s volume.

Prerequisites

For this walkthrough, you should have the following available to you:

Install Remote – SSH Extension

First, install the Remote SSH extension in Visual Studio Code. You have the option of clicking the Install button from the preceding link, or by opening the extensions panel (View -> Extensions) from within Visual Studio Code and searching for Remote SSH.

Install Remote – SSH Extension

Once installed, there are a few extension settings I suggest modifying for this solution. To access the extension settings, click on the icon in the extension browser, and go to Extension Settings.

Here, you will land on a screen similar to the following. Adjust:

  • Remote.SSH: Connect Timeout – I suggest putting a value such as 60 here. Doing so will prevent a timeout from happening in the event your AWS Cloud9 instance is powered off. This gives it ample time to power on and get ready.
  • Remote.SSH: Default Extensions – For your essential extensions, specify them on this screen. Whenever you connect to a new remote host, these extensions will be installed by default.

Screen Settings screenshot

Install the Session Manager plugin for the AWS CLI

Next, install the Session Manager plugin, which lets the AWS CLI start and stop sessions that connect to your EC2 instances. This plugin can be installed on supported versions of Microsoft Windows, macOS, Linux, and Ubuntu Server.

Note: the plugin requires AWS CLI version 1.16.12 or later to work.

Create an SSH key pair

An SSH key pair is required to access our instance over SSH. Using public key authentication over a simple password provides cryptographic strength over even the most complex passwords. In my macOS environment, I have a utility called ssh-keygen, which I will use to create a key pair. From a terminal, run:

$ ssh-keygen -b 4096 -C 'VS Code Remote SSH user' -t rsa

To avoid any confusion with existing SSH keys, I chose to save my key to /Users/nickragusa/.ssh/id_rsa-cloud9 for this example.

Now that we have a key pair created, we need to copy the public key to the authorized_keys file on the AWS Cloud9 instance. Since I saved my key to /Users/nickragusa/.ssh/id_rsa-cloud9, the corresponding public key has been saved to /Users/nickragusa/.ssh/id_rsa-cloud9.pub.

Open the AWS Cloud9 service console in the same region you deployed the CloudFormation stack. Your environment should be named VS Code Remote SSH Demo. Select Open IDE:

 

AWS Cloud9 console

Now that you’re in your AWS Cloud9 IDE, edit ~/.ssh/authorized_keys and paste your public key, being sure to paste it below the lines. Following is an example of using the AWS Cloud9 editor to first reveal the file and then edit it:

AWS Cloud9 Editor

Save the file and move on to the next steps.

Modify the shutdown script

As an optional cost-saving measure in AWS Cloud9, you can specify a timeout value in minutes; when it is reached, the instance automatically powers down. This works perfectly when you are connected to your AWS Cloud9 IDE through your browser. In our use case, however, we connect to the instance strictly via SSH from our Visual Studio Code IDE, so we modify the shutdown logic on the instance itself to check for connectivity from our IDE and prevent the automatic shutdown while we are actively connected.

To prevent the instance from shutting down while connected from Visual Studio Code, download this script and place it in ~/.c9/stop-if-inactive.sh. From your AWS Cloud9 instance, run:

# Save a copy of the script first
$ sudo mv ~/.c9/stop-if-inactive.sh ~/.c9/stop-if-inactive.sh-SAVE
$ curl https://raw.githubusercontent.com/aws-samples/cloud9-to-power-vscode-blog/main/scripts/stop-if-inactive.sh -o ~/.c9/stop-if-inactive.sh
$ sudo chown root:root ~/.c9/stop-if-inactive.sh
$ sudo chmod 755 ~/.c9/stop-if-inactive.sh

That’s all of the configuration changes that are needed on your AWS Cloud9 instance. The remainder of the work will be done on your laptop or desktop.

  • ssm-proxy.sh – This script will launch each time you SSH to the AWS Cloud9 instance. Upon invocation, the script checks the current state of your instance. If your instance is powered off, it runs the aws ec2 start-instances CLI command to power on the instance and waits for it to start. Once started, it uses the aws ssm start-session command, along with the Session Manager plugin installed earlier, to create an SSH session with the instance via AWS Systems Manager Session Manager. (A rough Python equivalent of this start-and-wait logic is sketched after this list.)
  • ~/.ssh/config – For OpenSSH clients, this file defines any user specific SSH configurations you need. In this file, you specify the EC2 instance ID of your AWS Cloud9 instance, the private SSH key to use, the user to connect as, and the location of the ssm-proxy.sh script.
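For illustration only, here is a rough boto3 equivalent of the start-and-wait logic that ssm-proxy.sh performs (the walkthrough itself uses the shell script downloaded below; the instance ID is a placeholder):

import subprocess
import boto3

INSTANCE_ID = "i-0123456789abcdef0"   # your AWS Cloud9 EC2 instance ID (placeholder)

ec2 = boto3.client("ec2")

# Start the AWS Cloud9 instance if it is stopped, then wait until it is running.
state = ec2.describe_instances(InstanceIds=[INSTANCE_ID])[
    "Reservations"][0]["Instances"][0]["State"]["Name"]
if state == "stopped":
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

# Hand the connection off to Session Manager, which tunnels SSH without opening port 22.
subprocess.run([
    "aws", "ssm", "start-session",
    "--target", INSTANCE_ID,
    "--document-name", "AWS-StartSSHSession",
    "--parameters", "portNumber=22",
], check=True)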

Start by downloading the ssm-proxy.sh script to your machine. To keep things organized, I downloaded this script to my ~/.ssh/ folder.

$ curl https://raw.githubusercontent.com/aws-samples/cloud9-to-power-vscode-blog/main/scripts/ssm-proxy.sh -o ~/.ssh/ssm-proxy.sh
$ chmod +x ~/.ssh/ssm-proxy.sh

Next, edit this script to match your environment. Specifically, there are four variables at the top of ssm-proxy.sh that you may want to edit:

  • AWS_PROFILE – If you use named profiles, you specify the appropriate profile here. If you do not use named profiles, specify a value of default here.
  • AWS_REGION – The Region into which you launched your AWS Cloud9 environment.
  • MAX_ITERATION – The number of times this script will check if the environment has successfully powered on.
  • SLEEP_DURATION – The number of seconds to wait between MAX_ITERATION checks.

Once this script is set up, you need to edit your SSH configuration. You can use the example provided to get you started.

Edit ~/.ssh/config and paste the contents of the example to the bottom of the file. Edit any values that are unique to your environment, specifically:

  • IdentityFile – This should be the full path to the private key you created earlier.
  • HostName – Specify the EC2 instance ID of your AWS Cloud9 environment. You should have copied this value down from the ‘Walkthrough’ section of this post.
  • ProxyCommand – Make sure to specify the full path of the location where you downloaded the ssm-proxy.sh script from the previous step.

Test the solution

Your AWS Cloud9 environment is now up, and you have configured your local SSH configuration to use AWS Systems Manager Session Manager for connectivity. Time to try it all out!

From the Visual Studio Code editor, open the command palette, start typing remote-ssh to bring up the correct command, and select Remote-SSH: Connect Current Window to Host.

Command Palette screenshot

A list of your SSH hosts should appear as defined in ~/.ssh/config. Select the host called cloud9 and wait for the connection to establish.

Now your Visual Studio Code environment is powered by your AWS Cloud9 instance! As some next steps, I suggest:

  • Open the integrated terminal so you can explore the file system and check what tools are already installed.
  • Browse the extension gallery and install any extensions you frequently use. Remember that extensions are installed locally – so any extensions you may have already installed on your desktop / laptop are not automatically available in this remote environment.

Cleaning up

To avoid incurring future charges, delete any resources you created. I recommend deleting the CloudFormation stack you launched at the beginning of this walkthrough.

Conclusion

In this post, I covered how you can use the Remote SSH extension within Visual Studio Code to connect to your AWS Cloud9 instance. By making some adjustments to your local OpenSSH configuration, you can use AWS Systems Manager Session Manager to establish an SSH connection to your instance. This is done without the need to open an SSH specific port in the instance’s security group.

Now that you can power your Visual Studio Code IDE with AWS Cloud9, how will you take your IDE to the next level? Feel free to share some of your favorite extensions, keyboard shortcuts, or your vastly improved build times in the comments!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Automating Data Ingestion and Labeling for Autonomous Vehicle Development

Post Syndicated from Amr Ragab original https://aws.amazon.com/blogs/architecture/field-notes-automating-data-ingestion-and-labeling-for-autonomous-vehicle-development/

This post was co-written by Amr Ragab, AWS Sr. Solutions Architect, EC2 Engineering and Anant Nawalgaria, former AWS Professional Services EMEA.

One of the most common needs we have heard from customers in Autonomous Vehicle (AV) development is the ability to launch a hybrid deployment environment at scale. As vehicle fleets are deployed across the globe, they capture real-time telemetry and sensory data. A data lake is created to process that data, and iterating on the dataset improves machine learning models for L3+ development. These datasets can include 4K 60 Hz camera video captures, LIDAR, RADAR, and car telemetry data. The first step is to create the data gravity on which the compute and labeling portions will operate.

Architecture Overview

In this blog, we explain how to build the components in the following architecture, which takes an input dataset of 4K camera data, and performs event-driven video processing. This also includes anonymizing the dataset with face and license plate blurring using AWS Batch.


Figure 1 – Architecture for Automating Data Ingestion and Labeling for Autonomous Vehicle Development

The output is a cleaned dataset which is processed in an Amazon SageMaker Ground Truth workflow. You can visualize the results in a SageMaker Jupyter notebook.

You can transfer data from on-premises to AWS either online, using AWS Storage Gateway, AWS Transfer Family, or AWS DataSync over AWS Direct Connect, or offline, using AWS Snowball. Whether you use an online or offline approach, a fully automated processing workflow can still be initiated.

Description of the Dataset

The dataset we acquired was a driving sample in the N. Virginia/Washington DC metro area, as shown in the following map.


Figure 2 – Map of the area in which the driving sample was taken.

This path was chosen because of several different driving patterns including city, suburban, highway and unique driving characteristics.

We captured 4K 60 Hz video as well as telemetry data from the CAN bus, and finally GPS coordinates. The telemetry data from the CAN bus and the GPS coordinates were streamed in real time over a 4G/5G mobile network through the Amazon Kinesis service. Reach out to your AWS account team if you are interested in exploring connected car applications.

Video Processing Workflow and Manifest Creation

We walk you through the process for the video processing workflow and manifest creation.

  • The first step in the workflow is to take the 4K video file and split it into frames. Some additional processing is done to perform color and lens correction but that is specific to the camera used in this blog.
  • The next step is to use the ML-based anonymizer application which processes incoming video frames and applies face and license plate blurring on the dataset.
  • It uses the excellent work from Understand.ai and is available on Github via the Apache 2.0 License.
  • We then take the processed data and create a manifest.json file, which is uploaded to an S3 bucket (a minimal sketch of this step follows the stack list below).
  • The S3 bucket then becomes the source for the SageMaker Ground Truth workflow.
  • Ancillary steps also include applying a lifecycle policy to Amazon S3 to transfer the raw video file to Amazon S3 Glacier. The video is then reconstructed from the processed frames. The Docker image contains the following enablement stack:

Ubuntu 18.04 – nvcr.io/nvidia/cuda
AWS command line utility
FFMPEG
Understand.ai anonymizer GitHub
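As a minimal sketch of the manifest step referenced above (bucket, prefix, and key names are hypothetical), the following code lists the anonymized frames and writes the JSON Lines manifest that SageMaker Ground Truth expects:

import json
import boto3

s3 = boto3.client("s3")

BUCKET = "av-dataset-bucket"          # placeholder bucket name
FRAME_PREFIX = "processed-frames/"    # placeholder prefix holding the anonymized frames

# Build one manifest line per image, using the "source-ref" key Ground Truth expects.
paginator = s3.get_paginator("list_objects_v2")
lines = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=FRAME_PREFIX):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".jpg"):
            lines.append(json.dumps({"source-ref": f"s3://{BUCKET}/{obj['Key']}"}))

# Upload the manifest as JSON Lines (one object per line).
s3.put_object(
    Bucket=BUCKET,
    Key="manifests/manifest.json",
    Body="\n".join(lines).encode("utf-8"),
)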

SageMaker Ground Truth Labeling and Analysis

To prepare image metadata, we use Amazon Rekognition. Any extra information, such as other detected objects, is added to the image metadata using Amazon Rekognition; the following is the Lambda code for it.


Figure 3 – Lambda code to add additional objects to the image using Amazon Rekognition.
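The Lambda code above appears only as a screenshot. The following is a hedged sketch of the kind of call it describes, using Amazon Rekognition label detection; the event shape and output fields are assumptions:

import boto3

rekognition = boto3.client("rekognition")

def handler(event, context):
    # Assumes the event carries the S3 location of one processed frame.
    bucket = event["bucket"]
    key = event["key"]

    labels = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=20,
        MinConfidence=70,
    )["Labels"]

    # Return the detected object names so they can be attached to the image metadata,
    # matching the "Objects" field queried later in this post.
    return {"key": key, "Objects": [label["Name"] for label in labels]}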

Before starting the analysis, let’s create some helper functions using the following code:

Code sample to show Helper functions

Code sample to show additional helper functions

Compute Summary Statistics

First, let’s compute some summary statistics for a set of labeling jobs. For example, we might want to make a high-level comparison between two labeling jobs performed with different settings. Here, we’ll calculate the number of annotations and the mean confidence score for two jobs. We know that the first job was run with three workers per task, and the second was run with five workers per task.

Code to calculate the number of annotations and the mean confidence score for two jobs.
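That notebook code is shown as a screenshot. A hedged sketch of the same calculation, assuming each output manifest has been downloaded locally and follows the Ground Truth bounding-box output layout used in the queries below:

import json

def job_summary(manifest_path: str, job_name: str):
    """Count annotations and average worker confidence for one job's output manifest."""
    annotation_count = 0
    confidences = []
    with open(manifest_path) as f:
        for line in f:
            record = json.loads(line)
            annotation_count += len(record[job_name]["annotations"])
            confidences += [obj["confidence"] for obj in record[f"{job_name}-metadata"]["objects"]]
    mean_confidence = sum(confidences) / len(confidences) if confidences else 0.0
    return annotation_count, mean_confidence

# Hypothetical manifest files and job names for the three-worker and five-worker jobs.
for path, job in [("output-3-workers.manifest", "demo-full-dataset-1"),
                  ("output-5-workers.manifest", "demo-full-dataset-2")]:
    count, mean_conf = job_summary(path, job)
    print(f"{job}: {count} annotations, mean confidence {mean_conf:.0%}")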

We can determine that the mean confidence of the whole job was 40%. Now, let us query the data.

Example 1:

Objective: Find all images with at least five car annotations, each with a confidence score of at least 80%.

Query: select * from s3object s where s."demo-full-dataset-2" is not null and 'Car' in s."demo-full-dataset-2-metadata"."class-map".* and size(s."demo-full-dataset-2"."annotations") >= 5 and min(s."demo-full-dataset-2-metadata".objects[*]."confidence") >= 0.8 limit 20

Code:

Query output

Results:

 

Photos showing video output

Example 2:

Objective: Identify all images with at least one Person in them.

Query: select * from s3object s where 'Person' in s."Objects"[*] limit 10

Code:

Objective: Identify all images with at least one Person in them

Results:

Identify all images with at least one Person in them

Results - Identify all images with at least one Person in them
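The notebook code for these examples appears only as screenshots. A rough boto3 sketch of submitting the Example 2 query with Amazon S3 Select, assuming the augmented manifest is stored as JSON Lines (the bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

QUERY = "select * from s3object s where 'Person' in s.\"Objects\"[*] limit 10"

response = s3.select_object_content(
    Bucket="av-dataset-bucket",                      # placeholder bucket
    Key="manifests/augmented-output.manifest",       # placeholder manifest key
    ExpressionType="SQL",
    Expression=QUERY,
    InputSerialization={"JSON": {"Type": "LINES"}},
    OutputSerialization={"JSON": {}},
)

# The response payload is an event stream; print the matching records as they arrive.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))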

Example 3:

Objective: Identify all images with at least some text in them (for example, on signboards)

Query: select * from s3object s where CHAR_LENGTH(s.Text) > 0 and s.Text limit 10

Code:

Objective: Identify all images with at least some text in them

Results:

Results - images with at least some text in them

images with at least some text in them

Conclusion

In this blog, we showed an architecture to automate data ingestion and Ground Truth labeling for autonomous vehicle development. We initiated a workflow to process a data lake, anonymize the individual video frames, and then prepare the dataset for Ground Truth labeling. The Ground Truth labeling UI was offered to a globally distributed workforce, which labeled our images at scale. If you are developing an autonomous vehicle or robotics platform, contact your AWS account team for more information.

Recommended Reading:

Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.