Tag Archives: Field Notes

Field Notes: Restricting Amazon WorkSpaces Users to Run Amazon Athena Queries

Post Syndicated from Somdeb Bhattacharjee original https://aws.amazon.com/blogs/architecture/field-notes-restricting-amazon-workspaces-users-to-run-amazon-athena-queries/

One of the use cases we hear from customers is that they want to provide very limited access to Amazon Workspaces users (for example contractors, consultants) in an AWS account. At the same time they want to allow them to query Amazon Simple Storage Service (Amazon S3) data in another account using Amazon Athena over a JDBC connection.

For example, marketing companies might provide private access to the first party data to media agencies through this mechanism.

The restrictions they want to put in place are:

  • For security reasons these Amazon WorkSpaces should not have internet connectivity. So the access to Amazon Athena must be over AWS PrivateLink.
  • Public access to Athena is not allowed using the credentials used for the JDBC connection. This is to prevent the users from leveraging the credentials to query the data from anywhere else.

In this post, we show how to use Amazon Virtual Private Cloud (Amazon VPC) endpoints for Athena, along with AWS Identity and Access Management (AWS IAM) policies. This provides private access to query the Amazon S3 data while preventing users from querying the data from outside their Amazon WorkSpaces or using the Athena public endpoint.

Let’s review the steps to achieve this:

  • Initial setup of two AWS accounts (AccountA and AccountB)
  • Set up Amazon S3 bucket with sample data in AccountA
  • Set up an IAM user with Amazon S3 and Athena access in AccountA
  • Create an Amazon VPC endpoint for Athena in AccountA
  • Set up Amazon WorkSpaces for a user in AccountB
  • Install a SQL client tool (we will use DbVisualizer Free) and Athena JDBC driver in Amazon WorkSpaces in AccountB
  • Use DbVisualizer to the query the Amazon S3 data in AccountA using the Athena public endpoint
  • Update IAM policy for user in AccountA to restrict private only access

 Prerequisites

To follow the steps in this post, you need two AWS Accounts. The Amazon VPC and subnet requirements are specified in the detailed steps.

Note: The AWS CloudFormation template used in this blog post is for US-EAST-1 (N. Virginia) Region so ensure the Region setting for both the accounts are set to US-EAST-1 (N. Virginia).

Walkthrough

The two AWS accounts are:

AccountA – Contains the Amazon S3 bucket where the data is stored. For AccountA you can create a new Amazon VPC or use the default Amazon VPC.

AccountB – Amazon WorkSpaces account. Use the following AWS CloudFormation template for AccountB:

  • The AWS CloudFormation template will create a new Amazon VPC in AccountB with CIDR 10.10.0.0/16 and set up one public subnet and two private subnets.
  • It will also create a NAT Gateway in the public subnet and create both public and private route tables.
  • Since we will be launching Amazon WorkSpaces in these private subnets and not all Availability Zones (AZ) are supported by Amazon WorkSpaces, it is important to choose the right AZ when creating them.

Review the documentation to learn which AWS Regions/AZ are supported.

We have provided two parameters in the AWS CloudFormation template:

  • AZName1
  • AZName2

Step 1

Before launching the CloudFormation stack:

  • Log in to AccountB
  • Search for AWS Resource Access Manager
  • On the right-hand side, you will notice the AZ ID to AZ Name mapping. Note down the AZ Name corresponding to AZ ID use1-az2 and use1-az4
  • Now launch the CloudFormation template and remember to choose the AZ names you noted down earlier
    • https://athena-workspaces-blogpost.s3.amazonaws.com/vpc.yaml
  • Enter the CloudFormation Stack Name as – ‘AthenaWorkspaces’ and leave everything default.
  • Once the CloudFormation stack creation is complete, create a peering connection from AccountB to AccountA.
  • Update the associated route tables for the private subnets with the new peering connection.

For information on how to create a VPC peering connection, refer to AWS documentation on VPC Peering.

AccountB VPC Route Table:

AccountB VPC Route Table:

AccountA VPC Route Table:

AccountA VPC Route Table

Step 2

  • Create a new Amazon S3 bucket in AccountA with a bucket name that starts with ‘athena-’.
  • Next, you can download a sample file and upload it to the Amazon S3 bucket you just created.
  • Use the following statements to create AWS Glue database. Use an external table for the data in the Amazon S3 bucket so that you can query it from Athena.
  • Go to Athena console and define a new database:

CREATE DATABASE IF NOT EXISTS sampledb

Once the database is created, create a new table in sampledb (by selecting sampledb from the “Database” drop down menu). Replace the <<your bucket name>> with the bucket you just created:

CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.amazon_reviews_tsv(
  marketplace string, 
  customer_id string, 
  review_id string, 
  product_id string, 
  product_parent string, 
  product_title string,
  product_category string,
  star_rating int, 
  helpful_votes int, 
  total_votes int, 
  vine string, 
  verified_purchase string, 
  review_headline string, 
  review_body string, 
  review_date string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
LOCATION
  's3://<<your bucket name>>/'
TBLPROPERTIES ("skip.header.line.count"="1")

 

Step 3

  • In AccountA, create a new IAM user with programmatic access.
  • Save the access key and secret access key.
  • For the same user add an Inline Policy which allows the following actions:

IAM summary

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAthenaReadActions",
            "Effect": "Allow",
            "Action": [
                "athena:ListWorkGroups",
                "athena:ListDataCatalogs",
                "athena:GetExecutionEngine",
                "athena:GetExecutionEngines",
                "athena:GetNamespace",
                "athena:GetCatalogs",
                "athena:GetNamespaces",
                "athena:GetTables",
                "athena:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowAthenaWorkgroupActions",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:DeleteNamedQuery",
                "athena:GetNamedQuery",
                "athena:ListQueryExecutions",
                "athena:StopQueryExecution",
                "athena:GetQueryResultsStream",
                "athena:ListNamedQueries",
                "athena:CreateNamedQuery",
                "athena:GetQueryExecution",
                "athena:BatchGetNamedQuery",
                "athena:BatchGetQueryExecution",
                "athena:GetWorkGroup"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowGlueActionsViaVPCE",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowGlueActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowS3ActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::athena-*"
            ]
        }
    ]
}

 

Step 4

  • In this step, we create an Interface VPC endpoint (AWS PrivateLink) for Athena in AccountA. When you use an interface VPC endpoint, communication between your Amazon VPC and Athena is conducted entirely within the AWS network.
  • Each VPC endpoint is represented by one or more Elastic Network Interfaces (ENIs) with private IP addresses in your VPC subnets.
  • To create an Interface VPC endpoint follow the instructions and select Athena in the AWS Services list. Do not select the checkbox for Enable Private DNS Name.
  • Ensure the security group that is attached to the Amazon VPC endpoint is open to inbound traffic on port 443 and 444 for source AccountB VPC CIDR 10.10.0.0/16. Port 444 is used by Athena to stream query results.
  • Once you create the VPC endpoint, you will get a DNS endpoint name which is in the following format. We are going to use this in JDBC connection from the SQL client.

      VPC_Endpoint_ID.athena.Region.vpce.amazonaws.com

Step 5

  • In this step we set up Amazon WorkSpaces in AccountB.
  • Each Amazon WorkSpace is associated with the specific Amazon VPC and AWS Directory Service construct that you used to create it. All Directory Service constructs (Simple AD, AD Connector, and Microsoft AD) require two subnets to operate, each in different Availability Zones. This is why we created 2 private subnets at the beginning.
  • For this blog post I have used Simple AD as the directory service for the Amazon WorkSpaces.
  • By default, IAM users don’t have permissions for Amazon WorkSpaces resources and operations.
  • To allow IAM users to manage Amazon WorkSpaces resources, you must create an IAM policy that explicitly grants them permissions
  • Then attach the policy to the IAM users or groups that require those permissions.
  • To start, go to the Amazon WorkSpaces console and select Advanced Setup.
    • Set up a new directory using the SimpleAD option.
    • Use the “small” directory size and choose the Amazon VPC and private subnets you created in Step 1 for AccountB.
    • Once you create the directory, register the directory with Amazon WorkSpaces by selecting “Register” from the “Action” menu.
    • Select private subnets you created in Step 1 for AccountB.

Directory info

  • Next, launch Amazon WorkSpaces by following the Launch WorkSpaces button.
  • Select the directory you created and create a new user.
  • For the bundle, choose Standard with Windows 10 (PCoIP).
  • After the Amazon WorkSpaces is created, you can log in to the Amazon WorkSpaces using a client software. You can download it from https://clients.amazonworkspaces.com/
  • Login to your Amazon WorkSpace, install a SQL Client of your choice. At this point your Amazon WorkSpace still has Internet access via the NAT Gateway
  • I have used DbVisualizer (the free version) as the SQL client. Once you have that installed, install the JDBC driver for Athena following the instructions
  • Now you can set up the JDBC connections to Athena using the access key and secret key you set up for an IAM user in AccountA.

Step 6

To test out both the Athena public endpoint and the Athena VPC endpoint, create two connections using the same credentials.

For the Athena public endpoint, you need to use athena.us-east-1.amazonaws.com service endpoint. (jdbc:awsathena://athena.us-east-1.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)

Athena public

For the VPC Endpoint Connection, use the VPC Endpoint you created in Step 4 (jdbc:awsathena://vpce-<>.athena.us-east-1.vpce.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)

Database connection Athena

Now run a simple query to select records from the amazon_reviews_tsv table using both the connections.

SELECT * FROM sampledb.amazon_reviews_tsv limit 10

You should be able to see results using both the connections. Since the private subnets are still connected to the internet via the NAT Gateway, you can query using the Athena public endpoint.

Run the AWS Command Line Interface (AWS CLI) command using the credentials used for the JDBC connection from your workstation. You should be able to access the Amazon S3 bucket objects and the Athena query run list using the following commands.

aws s3 ls s3://athena-workspaces-blogpost

aws athena list-query-executions

Step 7

  • Now we lock down the access as described in the beginning of this blog post by taking the following actions:
  • Update the route table for the private subnets by removing the route for the internet so access to the Athena public endpoint is restricted from the Amazon WorkSpaces. The only access will be allowed through the Athena VPC Endpoint.
  • Add conditional checks to the IAM user access policy that will restrict access to the Amazon S3 buckets and Athena only if:
    • The request came in through the VPC endpoint. For this we use the “aws:SourceVpce” check and provide the VPC Endpoint ID value.
    • The request for Amazon S3 data is through Athena. For this we use the condition “aws:CalledVia” and provide a value of “athena.amazonaws.com”.
  • In the IAM access policy below replace <<your vpce id>> with your VPC endpoint id and update the previous inline policy which was added to the IAM user in Step 3.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAthenaReadActions",
            "Effect": "Allow",
            "Action": [
                "athena:ListWorkGroups",
                "athena:ListDataCatalogs",
                "athena:GetExecutionEngine",
                "athena:GetExecutionEngines",
                "athena:GetNamespace",
                "athena:GetCatalogs",
                "athena:GetNamespaces",
                "athena:GetTables",
                "athena:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowAthenaWorkgroupActions",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:DeleteNamedQuery",
                "athena:GetNamedQuery",
                "athena:ListQueryExecutions",
                "athena:StopQueryExecution",
                "athena:GetQueryResultsStream",
                "athena:ListNamedQueries",
                "athena:CreateNamedQuery",
                "athena:GetQueryExecution",
                "athena:BatchGetNamedQuery",
                "athena:BatchGetQueryExecution",
                "athena:GetWorkGroup"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowGlueActionsViaVPCE",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowGlueActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "ForAnyValue:StringEquals":{
                  "aws:CalledVia":[
                     "athena.amazonaws.com"
                  ]
               }
            }
        },
        {
            "Sid": "AllowS3ActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::athena-*"
            ],
            "Condition":{
               "ForAnyValue:StringEquals":{
                  "aws:CalledVia":[
                     "athena.amazonaws.com"
                  ]
               }
            }
        }
    ]
}

Once you applied the changes, try to reconnect using both the Athena VPC endpoint as well Athena public endpoint connections. The Athena VPC endpoint connection should work but the public endpoint connection will time out. Also try the same Amazon S3 and Athena AWS CLI commands. You should get access denied for both the operations.

Clean Up

To avoid incurring costs, remember to delete the resources that you created.

For AWS AccountA:

  • Delete the S3 buckets
  • Delete the database you created in AWS Glue
  • Delete the Amazon VPC endpoint you created for Amazon Athena

For AccountB:

  • Delete the Amazon Workspace you created along with the Simple AD directory. You can review more information on how to delete your Workspaces.

Conclusion

In this blog post, I showed how to leverage Amazon VPC endpoints and IAM policies to privately connect to Amazon Athena from Amazon Workspaces that don’t have internet connectivity.

Give this solution a try and share your feedback in the comments!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Optimize your Java application for Amazon ECS with Quarkus

Post Syndicated from Sascha Moellering original https://aws.amazon.com/blogs/architecture/field-notes-optimize-your-java-application-for-amazon-ecs-with-quarkus/

In this blog post, I show you an interesting approach to implement a Java-based application and compile it to a native image using Quarkus. This native image is the main application, which is containerized, and runs in an Amazon Elastic Container Service and Amazon Elastic Kubernetes Service cluster on AWS Fargate.

Amazon ECS is a fully managed container orchestration service, Amazon EKS is a fully managed Kubernetes service, both services support Fargate to provide serverless compute for containers. Fargate removes the need to provision and manage servers, lets you specify and pay for resources per application, and improves security through application isolation by design. AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.

Quarkus is a Supersonic Subatomic Java framework that uses OpenJDK HotSpot as well as GraalVM and over fifty different libraries like RESTEasy, Vertx, Hibernate, and Netty. In a previous blog post, I demonstrated how GraalVM can be used to optimize the size of Docker images. GraalVM is an open source, high-performance polyglot virtual machine from Oracle. I use it to compile native images ahead of time to improve startup performance, and reduce the memory consumption and file size of Java Virtual Machine (JVM)-based applications. The framework that allows ahead-of-time-compilation (AOT) is called Substrate.

Application Architecture

First, review the GitHub repository containing the demo application.

Our application is a simple REST-based Create Read Update Delete (CRUD) service that implements basic user management functionalities. All data is persisted in an Amazon DynamoDB table. Quarkus offers an extension for Amazon DynamoDB that is based on AWS SDK for Java V2. This Quarkus extension supports two different programming models: blocking access and asynchronous programming. For local development, DynamoDB Local is also supported. DynamoDB Local is the downloadable version of DynamoDB that lets you write and test applications without accessing the DynamoDB service. Instead, the database is self-contained on your computer. When you are ready to deploy your application in production, you can make a few minor changes to the code so that it uses the DynamoDB service.

The REST-functionality is located in the class UserResource which uses the JAX-RS implementation RESTEasy. This class invokes the UserService that implements the functionalities to access a DynamoDB table with the AWS SDK for Java. All user-related information is stored in a Plain Old Java Object (POJO) called User.

Building the application

To create a Docker container image that can be used in the task definition of my ECS cluster, follow these three steps: build the application, create the Docker Container Image, and push the created image to my Docker image registry.

To build the application, I used Maven with different profiles. The first profile (default profile) uses a standard build to create an uber JAR – a self-contained application with all dependencies. This is very useful if you want to run local tests with your application, because the build time is much shorter compared to the native-image build. When you run the package command, it also execute all tests, which means you need DynamoDB Local running on your workstation.

$ docker run -p 8000:8000 amazon/dynamodb-local -jar DynamoDBLocal.jar -inMemory -sharedDb

$ mvn package

The second profile uses GraalVM to compile the application into a native image. In this case, you use the native image as base for a Docker container. The Dockerfile can be found under src/main/docker/Dockerfile.native and uses a build-pattern called multi-stage build.

$ mvn package -Pnative -Dquarkus.native.container-build=true

An interesting aspect of multi-stage builds is that you can use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base image, and begins a new stage of the build. You can pick the necessary files and copy them from one stage to another, thereby limiting the number of files you have to copy. Use this feature to build your application in one stage and copy your compiled artifact and additional files to your target image. In this case, you use ubi-quarkus-native-image:20.1.0-java11 as base image and copy the necessary TLS-files (SunEC library and the certificates) and point your application to the necessary files with JVM properties.

FROM quay.io/quarkus/ubi-quarkus-native-image:20.1.0-java11 as nativebuilder
RUN mkdir -p /tmp/ssl-libs/lib \
  && cp /opt/graalvm/lib/security/cacerts /tmp/ssl-libs \
  && cp /opt/graalvm/lib/libsunec.so /tmp/ssl-libs/lib/

FROM registry.access.redhat.com/ubi8/ubi-minimal
WORKDIR /work/
COPY target/*-runner /work/application
COPY --from=nativebuilder /tmp/ssl-libs/ /work/
RUN chmod 775 /work
EXPOSE 8080
CMD ["./application", "-Dquarkus.http.host=0.0.0.0", "-Djava.library.path=/work/lib", "-Djavax.net.ssl.trustStore=/work/cacerts"]

In the second and third steps, I have to build and push the Docker image to a Docker registry of my choice which is straight forward:

$ docker build -f src/main/docker/Dockerfile.native -t

$ docker push <repo/image:tag>

Setting up the infrastructure

You’ve compiled the application to a native-image and have built a Docker image. Now, you set up the basic infrastructure consisting of an Amazon Virtual Private Cloud (VPC), an Amazon ECS or Amazon EKS cluster with on AWS Fargate launch type, an Amazon DynamoDB table, and an Application Load Balancer.

Figure 1: Architecture of the infrastructure (for Amazon ECS)

Figure 1: Architecture of the infrastructure (for Amazon ECS)

Codifying your infrastructure allows you to treat your infrastructure just as code. In this case, you use the AWS Cloud Development Kit (AWS CDK), an open source software development framework, to model and provision your cloud application resources using familiar programming languages. The code for the CDK application can be found in the demo application’s code repository under eks_cdk/lib/ecs_cdk-stack.ts or ecs_cdk/lib/ecs_cdk-stack.ts. Set up the infrastructure in the AWS Region us-east-1:

$ npm install -g aws-cdk // Install the CDK
$ cd ecs_cdk
$ npm install // retrieves dependencies for the CDK stack
$ npm run build // compiles the TypeScript files to JavaScript
$ cdk deploy  // Deploys the CloudFormation stack

The output of the AWS CloudFormation stack is the load balancer’s DNS record. The heart of our infrastructure is an Amazon ECS or Amazon EKS cluster with AWS Fargate launch type. The Amazon ECS cluster is set up as follows:

const cluster = new ecs.Cluster(this, "quarkus-demo-cluster", {
      vpc: vpc
    });
    
    const logging = new ecs.AwsLogDriver({
      streamPrefix: "quarkus-demo"
    })

    const taskRole = new iam.Role(this, 'quarkus-demo-taskRole', {
      roleName: 'quarkus-demo-taskRole',
      assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com')
    });
    
    const taskDef = new ecs.FargateTaskDefinition(this, "quarkus-demo-taskdef", {
      taskRole: taskRole
    });
    
    const container = taskDef.addContainer('quarkus-demo-web', {
      image: ecs.ContainerImage.fromRegistry("<repo/image:tag>"),
      memoryLimitMiB: 256,
      cpu: 256,
      logging
    });
    
    container.addPortMappings({
      containerPort: 8080,
      hostPort: 8080,
      protocol: ecs.Protocol.TCP
    });

    const fargateService = new ecs_patterns.ApplicationLoadBalancedFargateService(this, "quarkus-demo-service", {
      cluster: cluster,
      taskDefinition: taskDef,
      publicLoadBalancer: true,
      desiredCount: 3,
      listenerPort: 8080
    });

Cleaning up

After you are finished, you can easily destroy all of these resources with a single command to save costs.

$ cdk destroy

Conclusion

In this post, I described how Java applications can be implemented using Quarkus, compiled to a native-image, and ran using Amazon ECS or Amazon EKS on AWS Fargate. I also showed how AWS CDK can be used to set up the basic infrastructure. I hope I’ve given you some ideas on how you can optimize your existing Java application to reduce startup time and memory consumption. Feel free to submit enhancements to the sample template in the source repository or provide feedback in the comments.

We also encourage you to explore how you to optimize your Java application for AWS Lambda with Quarkus.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: How to Identify and Block Fake Crawler Bots Using AWS WAF

Post Syndicated from Vijay Menon original https://aws.amazon.com/blogs/architecture/field-notes-how-to-identify-and-block-fake-crawler-bots-using-aws-waf/

In this blog post, we focus on how to identify fake bots using these AWS services: AWS WAF, Amazon Kinesis Data Firehose, Amazon S3 and AWS Lambda. We use fake Google/Bing bots to demonstrate, but the principles can be applied to other popular crawlers like Slurp Bot from Yahoo, DuckDuckBot from DuckDuckGo, Alexa crawler from Alexa internet ranking service.

For industries like media, online retailors, news or social websites, content is critical and often sets them apart from other competitors. These companies put in a significant amount of effort to make the content as visible and accessible as possible. To do that these companies rely on crawler bots, so that legitimate users searching for content can find the content easily. Crawler bots are useful for indexing the site pages and helping make the content more searchable and improve rankings.

However, this capability can be misused. So it is important to distinguish between genuine crawler bots and fake ones that are doing more than just indexing your site. It’s important to properly identify good and bad actors so that you can stop the bad ones without impacting the ability of good ones, and at scale. This helps in driving more traffic, visitors, and more revenue from your websites.

Identifying bots

There are two primary sources of information required to identify a fake bot:

  • HTTP Header User-Agent: Fake bots try to present themselves as real bots, for example as Google or Bing, by using the same user agent string used by Google or Bing.
  • IP Address: You can look at the source IP address of the incoming request and determine if it belongs to the search engine provider network like Google or Bing. You can do this by performing a forward and reverse look up and comparing the results. These methods are well documented by the search engine providers Google and Bing.

Solution Overview

The solution leverages the capabilities of AWS WAF. Our demonstration application is a static website hosted on Amazon S3 fronted by Amazon CloudFront. This means that we can provide permission of access to the S3 bucket only to CloudFront using origin access identity.

The logs are streamed in near real time using Amazon Kinesis Data Firehose, inspected using a Lambda function to help identify fake bots before storing the logs on Amazon S3. The Lambda function does two things:

  1. Inspect the traffic using the rules to identify bad or fake bots. In this case it uses forward and reverse DNS lookup results of the Client IP address of packets with User-Agent string resembling GoogleBot or BingBot.
  2. Once identified as a fake bot, the Lambda function updates AWS WAF IP-Set to permanently block the requests coming from IP addresses of fake bots.

Note: For the sake of this demonstration, we are using a static website hosted on Amazon S3 with CloudFront. This requires the AWS WAF and IP-Set used by AWS WAF to be of scope ‘CLOUDFRONT’. You can modify it to use scope ‘REGIONAL’ if you chose to protect your web properties behind Application Load Balancer or API Gateway.

Solution Architecture

WAF Solution Architecture

Prerequisites

Readers of this blog post should be familiar with HTTP and the following AWS services:

For this walkthrough, you should have the following:

  • AWS Account
  • AWS Command Line Interface (AWS CLI): You need AWS CLI installed and configured on the workstation from where you are going to try the steps mentioned below.
  • Credentials configured in AWS CLI should have the required IAM permissions to spin up and modify the resources mentioned in this post.
  • Make sure that you deploy the solution to us-east-1 Region and your AWS CLI default Region is us-east-1. If us-east-1 is not the default Region, reference the Regions explicitly while executing AWS CLI commands using --region us-east-1 switch.
  • Amazon S3 bucket in us-east-1 region

Walkthrough

1.     Create an Amazon S3 static Website and put it behind an Amazon CloudFront distribution. You can follow these steps. Note down the CloudFront distribution ID. We will use it in subsequent steps.

2.     Download and unzip the file containing CloudFormation template and lambda function code from here to a folder in your local workstation. You will need to run all of the subsequent commands from this folder.

3.     Zip the lambda function lambda_function.py and upload it to an Amazon S3 bucket of your choice in us-east-1. Note the bucket name as it will be used in the subsequent steps. The blog uses waf-logs-fake-bots-us-east-1 for reference as the S3 bucket name.

$ zip lambda_function.py lambda_function.py.zip
$ aws s3 cp lambda_function.py.zip s3://waf-logs-fake-bots-us-east-1/

4.     Create the resources required for this blog post by deploying the AWS CloudFormation template and running the below command:

aws cloudformation create-stack \
--stack-name FakeBotBlockBlog \
--template-body file://BotBlog.yml \
--parameters ParameterKey=KinesisBufferIntervalSeconds,ParameterValue=900 ParameterKey=KinesisBufferSizeMB,ParameterValue=3 ParameterKey=IPSetName,ParameterValue=BlockFakeBotIPSet ParameterKey=IPSetScope,ParameterValue=CLOUDFRONT ParameterKey=S3BucketWithDeploymentPackage,ParameterValue=waf-logs-fake-bots-us-east-1 ParameterKey=DeploymentPackageZippedFilename,ParameterValue=lambda_function.py.zip \
--capabilities CAPABILITY_IAM \
--region us-east-1

You need to provide the following information, and you can change the parameters based on your specific needs:

a.     KinesisBufferIntervalSeconds and KinesisBufferSizeMB. These will define the interval at which Kinesis Firehose ships the logs to Amazon S3, the Default is 900 seconds and 3MB respectively, whichever is met first.

b.     IPSetName. Name of the IP Set which will be used to record the client IP address of fake bots. Default value is BlockFakeBotIPSet.

c.     IPSetScope. Scope of the IP Set. I am using CLOUDFRONT and associate it with the CloudFront distribution created in step 1. You can choose to make it REGIONAL in which case WebACLassociation will need to be with an ALB or an API Gateway.

d.     S3BucketWithDeploymentPackage. Name of S3 bucket used in step 3. The blog assumes waf-logs-fake-bots-us-east-1.

e.     DeploymentPackageZipedFilename. Lambda function filename without the file extension. For example, the blog assumes lambda_function.py.zip is available on the Amazon S3 bucket and uses this value for this parameter.

Some stack templates might include resources that can affect permissions in your AWS account, for example, by creating new AWS Identity and Access Management (IAM) role. For those stacks, you must explicitly acknowledge this by specifying CAPABILITY_IAM or CAPABILITY_NAMED_IAM value for the –capabilities parameter.

Stack creation will take you approximately 5-7 minutes. Check the status of the stack by executing the below command every few minutes. You should see StackStatus value as CREATE_COMPLETE.

Example:

aws cloudformation describe-stacks --stack-name FakeBotBlockBlog | grep StackStatus

The CloudFormation template will create the following resources:

  • IP Set for AWS WAF
  • WebACL with rules to block the client IP addresses of fake bots, and an AWS-managed common rule set.
  • Lambda function to help detect fake bots and modify the AWS WAF IP Set to block them
  • Kinesis Firehose delivery stream, which will use the above Lambda function for processing
  • IAM roles with required permissions for the Lambda function and Kinesis Firehose
  • S3 bucket for AWS WAF logs

5.     Enable logging for the WebACL using AWS CLI. For this you need the ARN of the WebACL and Kinesis Firehose. You can find that information from the output of the CloudFormation stack created in step 4 using the below AWS CLI command

aws cloudformation describe-stacks --stack-name FakeBotBlockBlog

Please note the 2 ARNs and run the following commands by replacing (1) ResourceArn value with WebACL ARN and (2) LogDestinationConfigs value with Kinesis Firehose delivery stream ARN.

Example:

aws wafv2 put-logging-configuration –-logging-configuration ResourceArn=arn:aws:wafv2:us-east-1:123456789012:global/webacl/FakeBotWebACL/259ea98f-24ba-4acd-8803-3e7d02e8d482,LogDestinationConfigs=arn:aws:firehose:us-east-1:123456789012:deliverystream/aws-waf-logs-FakeBotBlockBlog --region us-east-1

6.     Associate CloudFront distribution with this WebACL: Sign in to the AWS Management Console and open the AWS WAF and Shield console at https://console.aws.amazon.com/wafv2/homev2/web-acls?region=global .

  • Click on the WebACL created earlier in this procedure
  • Navigate to ‘Associated AWS resources’ tab and select Add AWS resources
  • In the subsequent screen, select the CloudFront distribution created in step 1
  • Select Add

Note: If you are using an ALB or API Gateway for your web property, then you need to use REGIONAL WebACL and IP Set.  Review the procedure to associate an ALB or API Gateway to the WebACL.

You can monitor WebACL performance from the Overview section of WebACL from the AWS WAF and Shield console.

Testing

To test, you will need to generate some traffic which will trigger the lambda function to detect and block the fake bots created earlier in this blog. The web traffic can be generated from the local machine or from an EC2 instance with access to the internet using curl. Manually set the user agent to resemble Googlebot by running the following command from shell:

Replace http://www.awsdemodesign.com/ with the URL of your CloudFront distribution you created in step 1 of the walkthrough.

for i in {1..1000}; do curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.awsdemodesign.com/; done

Initially you will see HTTP1.1/ 200 OK response. This will trigger Lambda and modify the IP Set to include your public IP address to be blocked. You can verify that by inspecting the IP set from AWS WAF and Shield console.

  • Sign in to the AWS Management Console and open the AWS WAF and Shield console
  • Click on IP Set created earlier in this blog. In the subsequent screen you can see your public IP address in the list of IP addresses.

If you run the curl command again, you will see that the response now is HTTP/1.1 403 Forbidden.

Clean Up

  • Disassociate the CloudFront Distribution from WebACL
  • Delete the S3 bucket and CloudFront Distribution created in Step 1
  • Empty and delete the S3 bucket created by the CloudFormation stack for AWS WAF logs.
  • Delete CloudFormation stack created in Step 4.

Conclusion

In this blog post, we demonstrated how you can set up and inspect incoming web traffic using AWS Lambda, AWS WAF native logging capabilities, and Kinesis Firehose to help detect and block bad or fake bots at scale. Furthermore, the solution outlined in this post provides a framework which can be extended to identify similar unwanted traffic impersonating as other good bots.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Field Notes: Integrating IoT and ITSM using AWS IoT Greengrass and AWS Secrets Manager – Part 2

Post Syndicated from Gary Emmerton original https://aws.amazon.com/blogs/architecture/field-notes-integrating-iot-and-itsm-using-aws-iot-greengrass-and-aws-secrets-manager-part-2/

In part 1 of this blog I introduced the need for organizations to securely connect thousands of IoT devices with many different systems in the hyperconnected world that exists today, and how that can be addressed using AWS IoT Greengrass and AWS Secrets Manager.  We walked through the creation of ServiceNow credentials in AWS Secrets Manager, the creation of IAM roles and the Lambda functions that will run on our edge device (a Raspberry Pi).

In this second part of the blog, we will setup AWS IoT Greengrass, on our Raspberry Pi, and AWS IoT Core so that we can run the AWS Lambda functions and access our ServiceNow credentials, retrieved securely from AWS Secrets Manager.

Setting up AWS IoT Core and AWS IoT Greengrass

The overall sequence for configuring AWS IoT Core and AWS IoT Greengrass is:

  • Create a certificate, and IoT Thing and link them
  • Create AWS IoT Greengrass group
  • Associate IAM role to the AWS IoT Greengrass group
  • Create and attach a policy to the certificate
  • Create an AWS IoT Greengrass Resource Definition for our ‘Secret’
  • Create an AWS IoT Greengrass Function Definition for our Lambda functions
  • Create an AWS IoT Greengrass Subscription Definition for IoT Topics to be used
  • Finally associate our Resource, Function and Subscription Definitions with our AWS IoT Greengrass Core

Steps

For this walkthrough, I have selected the AWS region “eu-west-1”, however, feel free to use other Regions where AWS IoT Core and AWS IoT Greengrass are available.

First, let’s install Greengrass on the Raspberry Pi:

  • Follow the instructions to configure the pre-requisites on the Raspberry Pi
  • Then we download the AWS IoT Greengrass software
  • And then we unzip the AWS IoT Greengrass software using the following command (note, this command is for version 1.10.0 of Greengrass and will change as later versions are released):

sudo tar -xzvf greengrass-linux-armv6l-1.10.0.tar.gz -C /

Note that AWS IoT Greengrass must be compatible with the version of the AWS Greengrass SDK installed to identify what versions are compatible and use sudo pip3 install greengrasssdk==<version_number> to install the SDK compatible with the version of AWS IoT Greengrass that we installed.

Our AWS IoT Greengrass core will authenticate with AWS IoT Core in AWS using certificates, so we need to generate these first using the following command:

aws iot create-keys-and-certificate --set-as-active --certificate-pem-outfile "iot-ge.cert.pem" --public-key-outfile "iot-ge.public.key" --private-key-outfile "iot-ge.private.key"

This command will generate three files containing the private key, public key and certificate.  All of these files need to be copied to the /greengrass/certs folder on the Raspberry Pi.  Also, the output of the preceding command will give the ARN of the certificate – we need to make a note of this ARN as we will use it in the next steps.

We also need to download a copy of the Amazon Root CA into the /greegrass/certs folder using the command below:

sudo wget -O root.ca.pem https://www.amazontrust.com/repository/AmazonRootCA1.pem

For the next step we need our AWS account number and IoT Host address unique to our account – we get the IoT Host address using the command:

aws iot describe-endpoint --endpoint-type iot:Data-ATS

Now we need to create a config.json file on the Raspberry Pi in the /greengrass/config folder, with the account number and IoT Host address obtained in the previous step;

{
  "coreThing" : {
    "caPath" : "root.ca.pem",
    "certPath" : "iot-ge.cert.pem",
    "keyPath" : "iot-ge.private.key",
    "thingArn" : "arn:aws:iot:eu-west-1:<aws_account_number>:thing/IoT-blog_Core",
    "iotHost" : "<endpoint_address>",
    "ggHost" : "greengrass-ats.iot.eu-west-1.amazonaws.com",
    "keepAlive" : 600
  },
  "runtime" : {
    "cgroup" : {
      "useSystemd" : "yes"
    },
    "allowFunctionsToRunAsRoot" : "yes"
  },
  "managedRespawn" : false,
  "crypto" : {
    "principals" : {
      "SecretsManager" : {
        "privateKeyPath" : "file:///greengrass/certs/iot-ge.private.key"
      },
      "IoTCertificate" : {
        "privateKeyPath" : "file:///greengrass/certs/iot-ge.private.key",
        "certificatePath" : "file:///greengrass/certs/iot-ge.cert.pem"
      }
    },
    "caPath" : "file:///greengrass/certs/root.ca.pem"
  }
}

Note that the line "allowFunctionsToRunAsRoot" : "yes" allows the Lambda functions to easily access the SenseHat on the Raspberry Pi. This configuration should normally be avoided in Production environments for security reasons but has been used here for simplicity.

Next we create the IoT Thing to represent our Raspberry Pi to match the entry we added into the config.json file previously:

aws iot create-thing --thing-name IoT-blog_Core

Now that our config.json file is in place and our IoT ‘thing’ created we can start the AWS IoT Greengrass software using the following commands:

cd /greengrass/ggc/core/
sudo ./greengrassd start

Then we attach the certificate to our new Thing – we need the ARN of the certificate that was noted in the earlier steps when we created the certificates:

aws iot attach-thing-principal --thing-name "IoT-blog_Core" --principal "<certificate_arn>"

Now we create the AWS IoT Greengrass group – make a note of the Group ID in the output of this command as we use it later:

aws greengrass create-group --name IoT-blog-group

Next we create the AWS IoT Greengrass Core definition file – create this using a text editor and save as core-def.json

{
  "Cores": [
    {
      "CertificateArn": "<certificate_arn>",
      "Id": "<IoT Thing Name>",
      "SyncShadow": true,
      "ThingArn": "<thing_arn>"
    }
  ]
}

Then, using the preceding file we just created, we create the core definition using the following command:

aws greengrass create-core-definition --name "IoT-blog_Core" --initial-version file://core-def.json

Now we associate the AWS IoT Greengrass core with the AWS IoT Greengrass group – we need the LatestVersionARN from the output of the command above and the group ID of your existing AWS IoT Greengrass group (in the output from the command for creation of the group in previous steps):

aws greengrass create-group-version --group-id "<greengrass_group_id>" --core-definition-version-arn "<core_definition_version_arn>"

Then we associate the IAM role (created earlier) to the AWS IoT Greengrass group;

aws greengrass associate-role-to-group --group-id "<greengrass_group_id>" --role-arn "arn:aws:iam::<aws_account_number>:role/IoTGGRole"

We need to create a policy to associate with the certificate so that our AWS IoT Greengrass Core (authenticated/authorized by our certificates) has rights to interact with AWS IoT Core.  To do this we create the policy.json file:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iot:Publish",
        "iot:Subscribe",
        "iot:Connect",
        "iot:Receive"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iot:GetThingShadow",
        "iot:UpdateThingShadow",
        "iot:DeleteThingShadow"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "greengrass:*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Then create the policy using the policy file using the command below:

aws iot create-policy --policy-name myGGPolicy --policy-document file://policy.json

And finally attach our new policy to the certificate – as the certificate is attached to our AWS IoT Greengrass Core, this gives the rights defined in the policy to our AWS IoT Greengrass Core;

aws iot attach-policy --target "<certificate_arn>" --policy-name "myGGPolicy"

Now we have the AWS IoT Greengrass Core and permissions in place, it’s time to add our Secret as a resource for AWS IoT Greengrass.

First, we need to create a resource definition that refers to the ARN of the secret we created earlier.  Get the ARN of the secret using the following command:

aws secretsmanager describe-secret --secret-id "greengrass-snow-creds"

And then we create a text file containing the following and save it as resource.json:

{
"Resources": [
    {
      "Id": "SNOW-Credentials",
      "Name": "SNOW-Credentials",
      "ResourceDataContainer": {
        "SecretsManagerSecretResourceData": {
          "ARN": "<secret_arn>"
        }
      }
    }
  ]
}

Now we command to create the resource reference in IoT to the Secret:

aws greengrass create-resource-definition --name "MySNOWSecret" --initial-version file://resource.json

Note the Resource ID from the output as it is needed as it has to be added to the Lambda definition json file in the next steps.  The function definition file contains the details of the Lambda function(s) that we will attach to our AWS IoT Greengrass group.  We create a text file with the content below and save as lambda-def.json.

We also specify a couple of variables in the definition file; these are the same as the environment variables that can be specified for Lambda, but they make the variables available in AWS IoT Greengrass.

Note, if we specify environment variables for the functions in the Lambda console then these will NOT be available when the function is running under AWS IoT Greengrass.  We will need our ServiceNow API URL to add to the configuration below, and this will be in the form of https://devXXXXX.service-now.com/api/now/table/incident, where XXXXX is the developer instance number assigned by ServiceNow when our instance is created.

We need the ARNs of the Lambda functions that we created in part 1 of the blog – these appear in the output after successfully creating the functions from the command line, or can be obtained using the aws lambda list-functions command – we need to have the ‘:1’ at the end of the ARN as AWS IoT Greengrass needs to reference published function versions.

{
  "DefaultConfig": {
    "Execution": {
      "IsolationMode": "NoContainer",
      "RunAs": {
        "Gid": 0,
        "Uid": 0
      }
    }
  },
  "Functions": [
    {
      "FunctionArn": "<lambda_function1_arn>:1",
      "FunctionConfiguration": {
        "EncodingType": "json",
        "Environment": {
          "Execution": {
            "IsolationMode": "NoContainer"
          },
          "Variables": { 
            "tempLimit": "30",
            "humidLimit": "50"
          }
        },
        "ExecArgs": "string",
        "Executable": "lambda_function.lambda_handler",
        "Pinned": true,
        "Timeout": 10
      },
    "Id": "sensorLambda"
    },
    {
      "FunctionArn": "<lambda_function2_arn>:1",
      "FunctionConfiguration": {
        "EncodingType": "json",
        "Environment": {
          "Execution": {
            "IsolationMode": "NoContainer"
          },
          "ResourceAccessPolicies": [
            {
              "Permission": "ro",
              "ResourceId": "SNOW-Credentials"
            }
          ],
          "Variables": { 
            "snowUrl": "<service_now_api_url>"
          }
        },
        "ExecArgs": "string",
        "Executable": "lambda_function.lambda_handler",
        "Pinned": false,
        "Timeout": 10
      },
    "Id": "anomalyLambda"
    }
  ]
}

The Lambda function now needs to be registered within our AWS IoT Greengrass core using the definition file just created, using the following command:

aws greengrass create-function-definition --name "IoT-blog-lambda" --initial-version file://lambda-def.json

Create Subscriptions

We now need to create some IoT Topics to pass data between the two Lambda functions and also to submit all sensor data to AWS IoT Core, which gives us visibility of the successful collection of sensor data.cd.

First, let’s create a subscription configuration file (subscriptions.json) for sensor data and anomaly data:

{
  "Subscriptions": [
    {
      "Id": "SensorData",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/sensorData",
      "Target": "cloud"
    },
    {
      "Id": "AnomalyData",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/anomaly",
      "Target": "<lambda_function2_arn>:1"
    },
    {
      "Id": "AnomalyDataB",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/anomaly",
      "Target": "cloud"
    }
  ]
}

And next, we run the command to create the subscription from this configuration:

aws greengrass create-subscription-definition --name "IoT-sensor-subs" --initial-version file://subscriptions.json

Update AWS IoT Greengrass Group Associations and Deploy

Now that the functions, subscriptions and resources have been defined, we run the following command to update our AWS IoT Greengrass group to the new version with those components included:

aws greengrass create-group-version --group-id <gg_group_id> --core-definition-version-arn "<core_def_version_arn>" --function-definition-version-arn "<function_def_version_arn>" --resource-definition-version-arn "<resource_def_version_arn>" --subscription-definition-version-arn "<subscription_def_version_arn>"

And finally, we can deploy our configuration.  Use the following command to deploy the Greengrass group to our device, using the group-version-id from the output of the previous command and also the group-id:

aws greengrass create-deployment --deployment-type NewDeployment --group-id <gg_group_id> --group-version-id <gg_group_version_id>

Summarized below is the integration between the different functions and components that we have now deployed to get from our sensor data through to an incident being raised in ServiceNow:

Raspberry PI

Create an Incident

Everything is setup now from an IoT perspective, so we can attempt to trigger a threshold breach on the sensors to trigger the creation of an incident in ServiceNow.  In order to trigger the incident creation, let’s raise the humidity around the sensor so that it breaches the threshold defined in the environment variables of the Lambda function.

Under normal conditions we will just see the data published by the first Lambda function in the IoTBlog/sensorData topic:

IoTblog sensordata

However, when a threshold is breached (in our example, humidity above 50%), the data is published to the IoTBlog/anomaly topic as shown below:

ioTblog Anomaly

Via the AWS IoT Greengrass subscriptions created earlier, this message arriving in the anomaly topic also triggers the second Lambda function to create the ticket in ServiceNow.

The log for the second Lambda function on AWS IoT Greengrass (stored in /greengrass/ggc/var/log/user/<region>/<aws_account_number>/ on the Raspberry Pi) will show a ‘201’ return code if the incident is successfully created in ServiceNow.

201 response

Now let’s log on to ServiceNow and check out our new incident.  Good news, our new incident appears correctly:

And when we click on our incident we can see the detail, including the full data from the IoT topic in the Activities section;

This is only a basic use of the ServiceNow API and there are many other parameters that you can use to increase the richness of the incident, refer to the ServiceNow API documentation for more details.

Cleaning up

To avoid incurring future charges, delete the resources that you created in the walkthrough.

Conclusion

We have built an IoT device (Raspberry Pi), running AWS IoT Greengrass, AWS Lambda, and using ServiceNow credentials managed in AWS Secrets Manager.  Using this we have triggered an anomaly event that has created an incident automatically in ServiceNow, directly from the Lambda function running on our Pi.  You can use this architecture as the foundation to integrate your edge devices and ITSM solution to automate ticket generation in your organization.

Look out for follow-up blogs that will extend this solution to provide a real-time dashboard for the sensor data and store the sensor data in a Data Lake for historical visualization.

Find out more about deploying Secrets to AWS IoT Greengrass Core.

Check out the AWS IoT Blog for more examples of how to use AWS to integrate your edge devices with the AWS Cloud.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Field Notes: Integrating IoT and ITSM using AWS IoT Greengrass and AWS Secrets Manager – Part 1

Post Syndicated from Gary Emmerton original https://aws.amazon.com/blogs/architecture/field-notes-integrating-iot-and-itsm-using-aws-iot-greengrass-and-aws-secrets-manager-part-1/

IT Security is a hot topic in every organization, and in a hyper connected world the need to integrate thousands of IoT devices securely with many different systems at scale is critical.

AWS Secrets Manager helps customers manage their system credentials securely in the AWS Cloud, and with its integration with AWS IoT Greengrass, that capability now extends out to your edge-connected IoT devices.

In this two part blog post, I will walk through the steps to use this integration to give edge devices the capability to connect to and log incidents directly into ServiceNow. The credentials for connecting to ServiceNow are created in AWS Secrets Manager, and deployed locally (encrypted) to the edge device via AWS IoT Greengrass.

Part 1 (this post) gives an overview of the whole solution and the steps for setting up AWS Secrets Manager, creating the required IAM roles and AWS Lambda functions.  In part 2 of the blog we will then set up AWS IoT Greengrass and AWS IoT Core so that we can run the functions and access the secret on our edge device (Raspberry Pi).

Enabling edge devices to automatically raise incidents in your organization’s ITSM toolset ensures that you can use existing workflows and incident escalation paths for your edge devices.  Previously this would have been challenging to integrate. Additionally, by running this capability at the edge, it enables quicker responses and reduces the need to make calls back to the AWS Cloud.

Overview of solution

The solution makes use of a Raspberry Pi, running AWS IoT Greengrass. AWS Lambda functions on AWS IoT Greengrass capture temperature and humidity sensor data and make calls directly to ServiceNow when thresholds are breached.  The integration of AWS Secrets Manager with AWS IoT Core enables the credentials required for the ServiceNow API calls to be available locally. These calls are encrypted and available on the Pi for the Lambda function to use.

The sensors for temperature and humidity in this example are on a Raspberry Pi Sense Hat, which illustrates sensors that could be used in an industrial or manufacturing use case.  You can use any type of sensor such as vibration, strain gauges, or other electro-mechanical sensors.

One of the Lambda functions running on AWS IoT Greengrass on the RPi captures the sensor readings, and should a threshold on either be exceeded then it triggers a second Lambda function (again running on AWS IoT Greengrass). This then makes a ‘create incident’ API call to ServiceNow, using the credentials stored in AWS Secrets Manager.

In order to have visibility of the sensor data, and to manage communications between the first and second Lambda functions, all data from the sensors is published to one IoT Topic Data related to any threshold breaches is published to another IoT Topic.

Following is a high-level diagram for the architecture used in this blog.

ServiceNow RA

Prerequisites

To complete the steps in this blog, you need:

  • An AWS Account
  • A ServiceNow developer instance or other test ServiceNow instance that you can access – You can sign-up for a free ServiceNow developer account 
  • A Raspberry Pi (I used a Pi 3B with Raspbian Buster)
  • A Raspberry Pi SenseHat
  • A workstation with the latest AWS Command Line Interface (CLI) installed

Additionally, ensure that you have Python 3.7.x installed with the following Python modules on the Raspberry Pi (Raspian Buster includes Python 3.7 by default). Install the following packages as the root user (sudo pip):

  • greengrasssdk
  • boto3
  • requests
  • sense_hat
  • datetime

I have taken the approach of having these modules installed on the Pi in order to simplify the creation of the Lambda function.  This ensures the function will run locally on the Pi rather than having to build all of the Python modules on the Pi and then zip them to run the Lambda in an AWS IoT Greengrass container.

Walkthrough

The steps in the walkthrough can be achieved from the AWS console. I have focused on the command line approach, using the AWS CLI, as this will give a more detailed view of what is happening and the dependencies between the different components.  The overall sequence for the steps is:

  • Create secret in AWS Secrets Manager – this will contain the credentials required to access ServiceNow for Lambda running on AWS IoT Greengrass
  • Create IAM role – provides permissions for AWS IoT Greengrass to other AWS services, including AWS Secrets Manager
  • Create Lambda functions – the functions that will capture sensor data and create the Service Now ticket
  • Configure and Deploy IoT Core – deploy our configuration to the Raspberry Pi, covered in Part 2 of this blog

I’ve structured the order of the steps in a logical sequence so that any dependencies of later steps are created first.  There are a number of places where a value in the output of one command (such as an ARN) needs to be noted as required for subsequent commands. At times the steps may seem counter-intuitive, but whilst developing this blog, I found this sequence has proven to be the most effective.

Create Secret

First, we create our ‘Secret’ in AWS Secrets Manager.  This consists of a secret string containing a JSON object for the username and password required for authentication to the API of my ServiceNow developer instance.  For the purposes of this example, we will use the default encryption key for the AWS Secrets Manager service.

The following command, from the AWS CLI, is used to create the new secret, entering the relevant username and password for the ServiceNow instance.

IMPORTANT: the name of the secret must start with “greengrass-“(specified in the command after the “–name” parameter) because the IAM Greengrass managed service role (which we will use later) has permission by default to access secrets that start with this text.

aws secretsmanager create-secret --name "greengrass-snow-creds" --description "Credentials for ServiceNow API access" --secret-string '{"username":"&lt;username&gt;","password":"&lt;password&gt;"}'

The successful completion of the *preceding command results in an output on your terminal screen, containing the ARN (Amazon Resource Name) for the new secret.

Create IAM roles

In order for AWS IoT Greengrass to access our new secret and download it securely to AWS IoT Greengrass running on the Pi, we need to give it an IAM role. If we do not set this up correctly then there will be an error when trying to deploy the AWS IoT Greengrass group later in the walkthrough.

Our new IAM role needs a policy document that describes the permissions that we will give the role.  The first step is to create the IAM policy document, containing only the permissions for the Greengrass service this new role – to do this we create a text file called assume-role.json containing the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "greengrass.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then we run the following command, referencing the file just created:

aws iam create-role --role-name "IoTGGRole" --assume-role-policy-document file://assume-role.json

And finally we need to attach the IAM managed AWS IoT Greengrass service role to our new custom role:

aws iam attach-role-policy --role-name "IoTGGRole" --policy-arn arn:aws:iam::aws:policy/service-role/AWSGreengrassResourceAccessRolePolicy

We also need a role for our Lambda functions with basic execution permissions; create a text file called lambda-role.json containing the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then we run the following command, referencing the file just created:

aws iam create-role --role-name "IoTLambdaRole" --assume-role-policy-document file://lambda-role.json

And finally we need to attach the IAM managed Lambda basic execution role to our new custom role:

aws iam attach-role-policy --role-name "IoTLambdaRole" --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

Create Lambda Functions

We now create the Lambda functions that will be deployed to Greengrass – these functions manage the interactions between the sensors and ServiceNow.

Lambda Function 1 – Get/Publish Sensor Data

The first Lambda function reads the sensor data and publishes the data to an IoT Topic (IoTBlog/sensorData), every 5 seconds, which can then be used by downstream services for analytics.  This function also determines whether a threshold has been breached and if so, it publishes the data to a separate IoT Topic (IotBlog/anomaly) to which our second Lambda function is subscribed.

import os
import json
from datetime import datetime
import time
import sys
from sense_hat import SenseHat
import greengrasssdk
import boto3

client = greengrasssdk.client('iot-data')
secClient = greengrasssdk.client('secretsmanager')
sense = SenseHat()
sense.clear()

t_threshold = int(os.environ['tempLimit'])
h_threshold = int(os.environ['humidLimit'])

def lambda_handler(event, context):
    return
 
# Get sensor data and check against thresholds
def getSensorData():
    while True:
        eventTitle = "no event"
        anomaly = False
        ts = time.time()
        dt = datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
        
        temp     = round(sense.get_temperature(),2)
        humidity = round(sense.get_humidity(),2)
        
        time.sleep(5)

        if (temp > int(t_threshold)):
            anomaly = True
            eventTitle = "Temperature breach"
            
        if (humidity > int(h_threshold)):
            anomaly = True
            eventTitle = "Humidity breach"

        sensorData = { 'title': eventTitle,'dt':dt,'ts':ts,'t':temp,'h':humidity }
        publishData(anomaly,sensorData)

# Publish sensor data to IoT topic(s)
def publishData(anomaly,myData):
    response = client.publish(
        topic = 'IoTBlog/sensorData',
        payload = json.dumps(myData) )
    

We create a text file lambda_function.py containing the code above and then compress into a zip file named lambda_function_1.zip.  Once this is done we can then create the function in AWS using the following command:

aws lambda create-function --function-name "1-IoT-GetSensorData" --runtime python3.7 --zip-file fileb://lambda_function_1.zip --handler lambda_function.lambda_handler –role arn:aws:iam::&lt;aws_account_number&gt;:role/IoTLambdaRole
In order to make use of Lambda functions in AWS IoT Greengrass we then need to publish a version of the function using the following command:
aws lambda publish-version --function-name "1-IoT-GetSensorData"

Lambda Function 2 – Create Anomaly Ticket

The second Lambda function is triggered through its subscription to the anomaly data published to the anomaly IoT Topic by the first Lambda function. This then makes a call to the ServiceNow API to create an incident.  Prior to making the API call, the function obtains the ServiceNow credentials from the secret that has been made available to AWS IoT Greengrass.

As this is a Resource within the AWS IoT Greengrass Core, it is automatically downloaded to the Raspberry Pi as part of the deployment of the AWS IoT Greengrass Core.

import os
import json
import sys
import greengrasssdk
import requests

client = greengrasssdk.client('iot-data')
secClient = greengrasssdk.client('secretsmanager')
secret_name = "greengrass-snow-creds"
snow_instance = os.environ['snowUrl']
msg = ""

def publishAnomaly(msg,title):
    auth = getLocalSecret()
    createTicket(auth,msg,title)

# Get secret from GG
def getLocalSecret():
    secret = secClient.get_secret_value(SecretId=secret_name)
    rawSecret = secret.get('SecretString')
    return json.loads(str(rawSecret))

# Create ticket in ServiceNow            
def createTicket(auth,eventData,title):
    API_ENDPOINT = snow_instance
    HEADERS = {"Content-Type":"application/json","Accept":"application/json"}
    PARAMS = { 
        "short_description":title,
        "assignment_group":"sensor_team",
        "urgency":"2",
        "impact":"2",
        "comments":eventData
    } 

    request = requests.post(url = API_ENDPOINT, auth=(str(auth["username"]),str(auth["password"])), headers=HEADERS, data = json.dumps(PARAMS))
    print("Response:",request)

def lambda_handler(event, context):
    msg = json.dumps(event)
    msg = json.loads(msg)
    title = "Sensor Threshold - " + msg["title"]
    
    publishAnomaly(msg,title)
    return

We create another text file, lambda_function.py containing the code above and then compress into a zip file named lambda_function_2.zip.  Once this is done we can then create the function in AWS using the following command:

aws lambda create-function --function-name "2-IoT-ServiceNow" --runtime python3.7 --zip-file fileb://lambda_function_2.zip --handler lambda_function.lambda_handler –role arn:aws:iam::&lt;account_number&gt;:role/IoTLambdaRole

In order to make use of Lambda functions in Greengrass we then need to publish a version of the function using the following command:

aws lambda publish-version --function-name "2-IoT-ServiceNow"

Conclusion

In this post, I showed you the steps for integrating IoT and ITSM by setting up AWS Secrets Manager, creating the required IAM roles and AWS Lambda functions.  Now you can proceed to part 2 of the blog to set up AWS IoT-Core and AWS IoT Greengrass to make use of the secret and functions that you created in this post.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Customizing the AWS Control Tower Account Factory with AWS Service Catalog

Post Syndicated from Remek Hetman original https://aws.amazon.com/blogs/architecture/field-notes-customizing-the-aws-control-tower-account-factory-with-aws-service-catalog/

Many AWS customers who are managing hundreds or thousands of accounts know how complex and time consuming this process can be. To reduce the burden and simplify the process of creating new accounts, last year AWS released a new service, AWS Control Tower.

AWS Control Tower helps you automate the process of setting up a multi-account AWS environment (AWS Landing Zone) that is secure, well-architected, and ready to use. This Landing Zone is created following best practices established through AWS’ experience working with thousands of enterprises as they move to the cloud. This includes the configuration of AWS Organizations, centralized logging, federated access, mandatory guardrails, and networking.

Those elements are a good starting point to cover the initial configuration of the new account. For some organizations, the next step is to baseline a newly vended account to align it with the company policies and compliance requirements. This means create or deploy necessary roles, policies, governance controls, security groups and so on.

In this blog post, I describe a solution that helps achieve consistent governance and compliance requirements across accounts created by AWS Control Tower. The solution uses the AWS Service Catalog as the repository of products that will be deployed into the new accounts.

Prerequisites and assumptions

Before I get into how it works, let’s first review a few key AWS Service Catalog concepts:

  • An AWS Service Catalog product is a blueprint for building your AWS resources that you want to make available for deployment on AWS along with the configuration information.
  • A portfolio is a collection of products, together with the configuration information.
  • A provisioned product is an AWS CloudFormation stack.
  • Constraints control the way users can deploy a product. With launch constraints, you can specify a role that the AWS Service Catalog can assume to launch a product.
  • Review AWS Service Catalog reference blueprints for a quick way to set up and configure AWS Service Catalog portfolios and products.

You need an AWS Service Catalog portfolio with products that you plan to deploy to the accounts. The portfolio has to be created in the AWS Control Tower primary account. If you don’t have a portfolio, the starting point would be to review these resources:

Solution Overview

This solution supports the following scenarios:

  • Deployment of products to the newly vended AWS Control Tower account (Figure 1)
  • Update and deployment of products to existing accounts (Figure 2)
Figure 1 Deployment to new account

Figure 1 – Deployment to new account

The architecture diagram in figure 1 shows the process of deploying products to the new account.

  1. AWS Control Tower creates a new account
  2. Once an account is created successfully, Amazon CloudWatch Events trigger an AWS Lambda function
  3. The Lambda function pulls the configuration from an Amazon S3 bucket and:
    • Validates configuration
    • Grants Lambda role access to portfolio(s)
    • Creates StackSet constrains for products
  4. Lambda calls the AWS Step Function
  5. The AWS Step Function orchestrates deployment of the AWS Service Catalog products and monitors progress
Figure 2 Update or deployment to an existing account

Figure 2 – Update or deployment to an existing account

The architecture diagram in figure 2 shows process of updating product or deploying product to the existing account.

  1. User uploads update configuration to an Amazon S3 Bucket
  2. Update triggers AWS Lambda Function
  3. Lambda function reads uploaded configuration and:
    • Validates configuration
    • Grants Lambda role access to portfolio(s)
    • Creates StackSet constrains for products
  4. Calls AWS Step Function
  5. AWS Step Function orchestrate update/deployment of AWS Service Catalog products and monitoring progress

Deployment and configuration

Before proceeding with the deployment, you must install Python3.

Then, follow these steps:

  1. Download the solution from GitHub
  2. Go to folder src and run the following command:                                                                                    “pip3 install –r requirements.txt –t .”
  3.  Zip content of the src folder to “control-tower-account-factory-solution.zip” file
  4. Upload zip file to your Amazon S3 bucket created in the AWS Region where you want to deploy the solution
  5. Launch the AWS CloudFormation template

AWS CloudFormation Template Parameters:

  • ConfigurationBucketName – name of the Amazon S3 bucket from where AWS Lambda function should pull the configuration file. You can provide the name of an existing bucket.
  • CreateConfigurationBucket – set to true if you want to create a bucket specified in previous parameter. If a bucket exists, set value to false
  • ConfigurationFileName – name of the configuration file for a new account. Default value: config.yml
  • UpdateFileName – name of the configuration file for updates. Default value: update.yml
  • SourceCodeBucketName – name of the bucket where you uploaded the zipped Lambda code
  • SourceCodePackageName – name of the zipped file contain Lambda. If the file was uploaded to folder, include the name of folder(s) as well. Example: my_folder/control-tower-account-factory-solution.zip
  • BaselineFunctionName –name of the AWS Lambda function. Default value: control-tower- account-factory-lambda
  • BaselinetLambdaRoleName –name of the AWS Lambda function IAM role. Default value: control-tower- account-factory-lambda-role
  • StateMachineName  – name of the AWS Step Function. Default value: control-tower- account-factory-state-machine
  • StateMachineRoleName – name of the AWS Step Function IAM role. Default value: control-tower- account-factory-state-machine-role
  • TrackingTableName – name of the DynamoDB table to track all deployment and updates. Default value: control-tower-account-factory-tracking-table
  • MaxIterations –maximum iteration of the AWS Step Function before the reports time out. Default value: 30. It can be overwrite in configuration file. For more information see Configuration section.
  • TopicName – (optional) name of the SNS Topic that will be used for sending notification. Default value: control-tower- account-factory-account-notification
  • NotificationEmail – (optional) the email address where to send the notification. If this isn’t provided, the SNS topic won’t be created.

Configuration Files

The configuration files have to be in the YAML format, you can use the following schemas for new or existing accounts:

Configuration schema for new account:

The configuration can apply to all new accounts or can be divided based on the organization unit associated with the new account. Also, you can mix deployments where some products are always deployed and some only to specific organization units.

Example configuration file

Configuration schema for existing accounts

This configuration can be used to update provisioned products or deploy products into existing accounts. You can specify a target location either as account(s), organization unit(s) or both. If you specified an organization unit as a target, the product(s) will be updated/deployed to all accounts associated with organization unit.

If the product doesn’t exist in the account, the Lambda function attempts to deploy it. You can manage this by setting the parameter ‘deployifnotexist’ to true. If omitted or set to false, Lambda won’t provision the product to an existing account.

Example configuration file

All products deployed by the AWS Service Catalog are provisioned under their own name that has to be unique across all provisioned products. In this example, products are provisioned under the following naming convention:

<account id where product is provision>-<”provision_name” from configuration file>

Example:  123456789-my-product

In the configuration file you can specify dependencies between products. Dependencies need to be provided as the list of provisioned product name not a product name. If provided, deployment of the product will be suspended until all dependencies successfully deployed.

The Step Function is running in the loop with interval of 1 minute and checking if dependencies were deployed. In the event of an error in the dependency naming or configuration, the step function iterates only until it reaches the maximum iterations defined in the configuration file. If the maximum iterations are reached, the step function reports time out and interrupts the products’ deployment.

Important: if you updating a product by adding a new AWS Region, you cannot specify the dependency or run updates in existing regions the same time. In this scenario you should break the update as follows:

  • Upload configuration to create dependencies in the new region
  • Upload configuration to create products only in the new region

You can find different examples of configuration files in GitHub.

Note: The name of the configuration file and Amazon S3 location must match the values provided in the AWS CloudFormation template during solution deployment.

AWS Lambda functions deployment considerations

When deploying product is a Lambda function, you need to consider two requirements:

  1. The source code of the Lambda function needs to be in the Amazon S3 bucket created in the same AWS region where you are planning to deploy a Lambda function
  2. The destination account needs to have permission to the source Amazon S3 bucket

To accommodate both requirements, the approach is to create an Amazon S3 bucket under the AWS Control Tower primary account. There is one Amazon S3 bucket in each Region where the functions will be deployed.

Each deployment bucket should have attached the following policy:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<S3 bucket name>/*",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "<AWS Organizations Id>"
                },
                "StringLike": {
                    "aws:PrincipalArn": [
                        "arn:aws:iam::*:role/AWSControlTowerExecution",
        "arn:aws:iam::*:role/< name of the Lambda IAM role – default: control-tower-account-factory-lambda-role>"
                    ]
                }
            }
        }
    ]
}

This allows the deployment role to access Lambda’s source code from any account in your AWS Organizations.

Conclusion

In this blog post, I’ve outlined a solution to help you drive consistent governance and compliance requirements across accounts vended though AWS Control Tower. This solution provides enterprises with mechanisms to manage all deployments from a centralized account. Also, this reduces the need for maintaining multiple separate CI/CD pipelines.  Therefore you can simplify and reduce deployment time in a multi-account environment.

This solution allows you keep provisioned products up to date by updating existing products or bringing old accounts to the same compliance level as the new accounts.

Since this solution supports deployment to existing accounts and can be run without AWS Control Tower, it can be used for deployment of any AWS Service Catalog product either in single or multiple account environments. This solution then becomes an integral part of CI/CD pipeline.

For more information, review the AWS Control Tower documentation and AWS Service Catalog documentation. Also, review the links listed in the “Prerequisites and assumptions” section of this post.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Migrating a Self-managed Kubernetes Cluster on Amazon EC2 to Amazon EKS

Post Syndicated from Ahmed Bham original https://aws.amazon.com/blogs/architecture/field-notes-migrating-a-self-managed-kubernetes-cluster-on-ec2-to-amazon-eks/

AWS customers from startups to enterprises have been successfully running Kubernetes clusters on Amazon EC2 instances since 2015, well before Amazon Elastic Kubernetes Service (Amazon EKS), was launched in 2018. As a fully managed Kubernetes service, Amazon EKS customers can run Kubernetes on AWS without needing to install, operate, and maintain their own Kubernetes control plane. Since its launch, many existing and new customers are building and running their Kubernetes clusters on Amazon EKS.

At re:Invent 2019, AWS announced AWS Fargate for Amazon EKS, which is serverless compute for containers. Then, in January 2020, AWS announced a price reduction of Amazon EKS  by 50% to $0.10 per hour, per cluster. These developments, coupled with realization of management and cost overhead of Kubernetes control plane operations at scale, made more customers look into migrating their self-managed Kubernetes clusters to Amazon EKS.

The “how” of this migration is the focus of this blog.

Overview of Solution

For most customers migrating from self-managed Kubernetes clusters on Amazon EC2 to Amazon EKS can usually be accomplished with minimal or no downtime. However, for large clusters involving hundreds of nodes and thousands of pods, this requires more planning and testing, and it is recommended to engage AWS Support for guidance.

Kubernetes control plane

Considerations

There are certain considerations to ensure a successful Amazon EKS migration and operational excellence.

Security

  • Access Control
    • Amazon EKS uses AWS Identity and Access Management (IAM) to provide authentication to your Kubernetes cluster, but it still relies on native Kubernetes Role Based Access Control (RBAC) for authorization. It’s important to plan for the creation and governance of IAM users, roles, or groups, for Kubernetes cluster administration.
    • You can enable private access to the Kubernetes API server so that all communication between your nodes and the API server stays within your VPC. You can limit the IP addresses that can access your API server from the internet, or completely disable internet access to the API server.
  • IAM Role for Service Account
    • With IAM roles for service accounts on Amazon EKS clusters, you can associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the containers in any pod that uses that service account. With this feature, you no longer need to provide extended permissions to the node IAM role so that pods on that node can call AWS APIs.
  • Security groups for pods
    • Security groups for pods integrate Amazon EC2 security groups with Kubernetes pods. You can use Amazon EC2 security groups to define rules that allow inbound and outbound network traffic to and from pods. These pods are deployed to nodes running on many Amazon EC2 instance types.

Networking

  • Amazon EKS supports native VPC networking with the Amazon VPC Container Network Interface (CNI) plugin for Kubernetes. Using this plugin allows Kubernetes pods to have the same IP address inside the pod as they do on the VPC network.
  •  VPC CNI plugin uses IP addresses for the pods from the VPC CIDR ranges, and specifically from the subnet where the worker node is hosted. Therefore, customers must ensure the VPC and subnets that will host their Amazon EKS cluster have sufficient IP addresses available for the expected number of pods running at a time. Additionally, IP addresses are allocated to the Elastic Network Interface (ENI) attached to the EC2 instances.  The EC2 instance selection for the worker nodes should take into account the number of ENI attachments supported by the instance type.

Compute Options

Kubernetes Versions

  • Amazon EKS supports four major Kubernetes versions at a time,  which you can review in the available AWS documentation, along with a calendar for future Amazon EKS releases.
  • If you are currently running a non-supported Kubernetes cluster, or would like to migrate to a newer version on Amazon EKS, consider the following:
    1. Review release notes for specific Kubernetes version you want to migrate to, and make necessary updates to Kubernetes manifest files.
    2. Update your Kubernetes add-ons (CNI plugin, CoreDNS, Kube-Proxy) to compatible versions, as listed in the Updating an Amazon EKS Cluster guide.

Prerequisites

  1. Create an IAM Role for the creator and/or administrator of the Amazon EKS cluster. Specify this role when creating the Amazon EKS cluster.
  2. If using an existing VPC and subnets to host Amazon EKS cluster:
    • You will need subnets in at least two Availability Zones
    • All public subnets should have the property MapPublicIpOnLaunch enabled (that is, Auto-assign public IPv4 address in the AWS Management Console) to host self-managed and managed node groups.

3. If your pods are currently accessing AWS resources, and if you would like to use IAM roles for service accounts, then:

    • Create service accounts in Kubernetes to be used by your pods.
    • Follow these steps to create IAM roles, and assign to service account created.
    • Update your pod manifest files to specify the newly created service account and role ARN annotation. Remove any existing code for storing or passing IAM credentials.

4. If you are planning to use AWS Fargate to run your pods, you need to create the appropriate Fargate Profile and pod execution role.

Application and Data Migration

  • For stateless workloads, apply your resource definitions (YAML or Helm) to the new cluster, and make sure everything works as expected. This includes the connection to resources external to the cluster.
  • For stateful workloads:
    1. You will need to carefully plan your migration to avoid data loss or unexpected downtime.
    2. If you are currently using shared persistent file storage based on Amazon EFS or Amazon FSx for Lustre, they can be mounted to Amazon EKS pods concurrently. Just make sure that pods don’t write to the same files concurrently.
    3. For pods using EBS volumes, and for other persistent storage types, you can use a Kubernetes backup and restore tool, Velero.

Traffic Migration

If you have an entire domain migration that you would like to smoothly migrate, you can take advantage of Amazon Route 53’s Weighted Routing (as shown in the following diagram). With Weighted Routing, you are able to have a progressive transition from your existing cluster to the new one with zero downtime by splitting the traffic at the DNS level.

Your customers are slowly being transferred to your new cluster as their cached TTL expires. The split could start with a small share of your customers, for example, 10% being pointed to the new Amazon EKS cluster and 90% still on the old one. As soon as traffic is confirmed to be working as expected on the new cluster, that percentage of clients pointed to the new one can be increased.

mydomain.com

 

This implementation is flexible, it can be tied to Load Balancers, EC2 Instances, and even to external on-premises infrastructure.

Conclusion

In this blog post, we showed how to migrate your live-traffic serving self-hosted Kubernetes Cluster to Amazon EKS. Amazon EKS offers a cost-effective and highly available option for running Kubernetes clusters in the cloud. Since Amazon EKS is upstream Kubernetes compliant, you can migrate existing self-managed Kubernetes workloads to Amazon EKS, with multiple options to minimize or avoid service disruption. To create your first Amazon EKS cluster, visit Getting started with Amazon EKS.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

 

Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

Post Syndicated from Junjie Tang original https://aws.amazon.com/blogs/architecture/field-notes-building-an-autonomous-driving-and-adas-data-lake-on-aws/

Customers developing self-driving car technology are continuously challenged by the amount of data captured and created during the development lifecycle. This is accelerated by the need to design and launch incremental feature improvements on advanced driver-assistance systems (ADAS). Efforts to advance ADAS functionality have led to new approaches for storing, cataloging, and analyzing driving data captured from vehicles on the road today. Combining data from connected vehicle fleets transmitted over cellular networks in combination with manually ingesting data from vehicle data loggers requires a complex architecture and elastic data lake capability that only AWS provides.

This blog explains how to build an Autonomous Driving Data Lake using this Reference Architecture. We cover the workflow from how to ingest the data, prepare it for machine learning, catalog the output from ADAS systems and vehicle sensors, label it, automatically detect scenarios, and manage the various workflows required for moving it into an organized data lake construct. The AWS Autonomous Driving and ADAS Data Lake Reference Architecture was developed after working with numerous customers on the challenges they faced in achieving this. Using multiple AWS services and solution best practices, we outlined an approach we found helpful to others.

In the blog post, Autonomous Vehicle and ADAS development on AWS Part 1: Achieving Scale, we outlined the benefits having a data lake in the cloud, as opposed to building your own on-premises data lake solution.

Before we dive into the details of the reference architecture, let’s review the typical workflow of autonomous and ADAS development. The diagram below shows the following steps, much of which are common in any machine learning project:

  • data acquisition and ingest,
  • data processing and analytics,
  • labeling, map development,
  • model and algorithm development,
  • simulation and validation, and
  • orchestration and deployment.

This blog focuses on data ingest, data processing and analytics, labeling, and the data lake itself, as shown in the following workflow diagram:

driving dev workflow

Why build a data lake for ADAS and Autonomous Driving System development?

At AWS re:Invent 2019, BMW spoke at this session; (AUT306): Creating a data-driven, cloud-native ecosystem at BMW Group where they explained the motivation for developing an enterprise-wide cloud data lake, or the Cloud Data Hub as they call it. The Cloud Data Hub ingests information from multiple lines of business including sources like manufacturing systems, logistics, customer service, after-sales, and connected vehicle sensor telemetry data.

BMW created a global organization to drive DevOps culture with a strong emphasis on tooling, data quality, and end-to-end data lineage. Their journey began in the Hadoop eco-system (Hive, HBase) with an on-premises, heterogeneous, and difficult-to-scale environment, then moved to cloud-native building blocks with a focus on data quality. Examples of benefits derived from the AWS Cloud implementation are multi-region deployments, high security standards, and compliance with local regulations. The goal is to democratize the most valuable information assets so they can be used across a broad community globally.

The BMW Cloud Data Hub is just one example of how customers manage information assets for sharing across the enterprise. Using a similar approach, the AWS Autonomous Driving and ADAS Data Lake Reference Architecture extends the data lake pattern to address specific challenges around:

  1. metadata and the data catalog including automated scenario detection;
  2. data lineage from source to semantic layer;
  3. data sharing with external consumers and third party service providers via Amazon SageMaker Ground Truth.

 

Now let’s review the details of the Autonomous Driving Data Lake Reference Architecture:

1. Ingest data from autonomous fleet with AWS Outposts for local data processing. 

Sensor data is captured and written to data loggers containing multiple SSD hard drives. Once at the garage or customer facility, the hard drives are removed and inserted into copy stations.  From there the data is copied to Amazon S3 or to a local storage system and AWS Outposts for pre-processing. AWS Outposts is a fully managed service that extends AWS infrastructure with AWS Services like Amazon Elastic Kubernetes Service, Amazon Relational Database Service, Amazon EMR on Amazon S3 (EMR supports EMRFS and HDFS which both natively support S3). These services are used to run data integrity checks, compress the data to remove redundant information and prepare the data for downstream AD workloads.

Using AWS DataSync, the data is synchronized at high data rates and securely between on-premises network attached storage (NAS) sources and Amazon S3. This is done over a high bandwidth connection provided by AWS Direct Connect.

2. Ingest vehicle telemetry data in real time using AWS IoT Core and Amazon Kinesis Data Firehose.

Vehicle telemetry is captured and published to the cloud using a number of different technologies, typically over HTTPS or MQTT. In this architecture, AWS IoT Greengrass provides an intelligent edge runtime in the vehicle with application logic running in Lambda functions deployed locally to filter vehicle network signals like CAN data, GPS location, ADAS system output, road condition metadata derived from cameras, and other vehicle sensor information.

AWS IoT Greengrass allows customers to deploy containers, machine learning inference models, create multiple data streams and prioritize them based on business logic you define. Eventually, the data ends up in S3 to be combined with the sensor data captured from the data logger process defined in step one above.

3. Remove and transform low quality data.

Autonomous vehicles produce terabytes of data per hour. In this trove of information, there may be redundant as well as corrupted data coming from the vehicle telemetry stream and raw sensor stack.  This data needs to be normalized for optimal downstream processing. Customers use a number of technologies to do this. For example, Amazon EMR provides a runtime for high volume, complex data processing using open source Apache big data processing engines like Spark.  There are a few common steps in the data transformation including;

  • checking if the driving is complete by combining batch files and streaming data;
  • parsing the log files based on recording formats (rosbag, mdf4, etc.);
  • decoding the signals from binary formats to readable text;
  • filtering inconsistent data files; and
  • synchronize the timestamp of signals

EMR Launch is an open source framework developed by AWS Labs available for customers to accelerate and simplify defining, deploying, managing, and using Amazon EMR clusters with the following features:

  • separating the definition of cluster security configurations (EMR Profile) and cluster resource configurations (Cluster Configuration) into reusable and shareable constructs; and
  • provide a suite of tools to simplify the construction of orchestration pipelines using Amazon Step Functions (EMR Launch Function).

4. Schedule the extract, transform, load (ETL) jobs using Apache Airflow.

To create trustworthy insights and empower your next machine learning use case with a reliable data foundation you need to bring the data creation process under design control. Fundamental to this approach is a centralized, governed workflow system powered by Apache Airflow. With Airflow you can establish trust in your data processing pipelines by making the workflow part of your code base to enable transparent, repeatable, pipeline executions.

The following solution diagram shows how radar and video data processing in MDF4 format achieves the highest scalability by leveraging AWS Fargate for Amazon ECS.

  • To ensure data pipeline integrity, the solution is deployed securely in an Amazon Virtual Private Cloud with end-to-end TLS and is only accessed from a private bastion host.
  • The containers for Airflow Webserver, Scheduler and Worker are deployed in multiple Availability Zones for high availability.
  • The communication between the components are decoupled via Amazon ElastiCache for Redis.
  • The status of running jobs is stored in Amazon Aurora.
  • To learn more about how to leverage complex workflows and model training jobs on AWS, a future blog post will describe the detailed architecture of Apache Airflow.

radar and video data processing in MDF4 format

5. Enrich data with map information and weather conditions based on GPS location and timestamp.

Data-sets are enriched using Amazon EMR with map or weather information from external geospatial and weather service providers and stored in Amazon S3, or database services like Amazon DynamoDB. Sensors like cameras and LiDAR could malfunction or even fail in adverse weather conditions. Advanced sensor fusing could perform the weather perception in real-time. The result of the weather perception could be verified with the real weather conditions.

6. Extract metadata into Amazon DynamoDB and Amazon Elasticsearch Service. 

Using drive logs that contain the telemetry, telematics, perception, and sensor data, a catalog is built to create searchable quantitative metadata that includes speed, turning angles, location as well as simple and complex semantic descriptions of scene snippets such as “high velocity,” “left turn,” or “pedestrian.”  The data lake catalog is updated with scenario data and indexed in Amazon Elastic Search for discovery by analysts and ADS engineers.  The extraction process is ideally fully automated, but many higher order behavioral descriptions may require human annotations from later processing steps.

The majority of systems leverage quantitative and simple semantic descriptions for the drive data, with a clear trend towards needing higher order behaviors extracted in the data lake catalog.  These more complex behaviors are ideal for enhanced searching capabilities and better validating coverage mapping.  ASAM OpenSCENARIO defines a scenario description language that provides a common ontology and hierarchy for detailing the behavior of the vehicle and surroundings.

This standard provides an open approach for describing complex, synchronized maneuvers that involve multiple entities including other vehicles, vulnerable road users (VRUs) like pedestrians, bikers, construction workers, and other traffic participants. The description of a maneuver may be based on driver actions like performing a lane change by the ego, or based on the actions of others in the scenario such as a cut in from another driver. OpenSCENARIO also accounts for the appearance/description of the participant in the scene.

7. Store data lineage in Amazon Neptune and catalog data using AWS Glue Data Catalog.

Amazon Neptune is a fully managed graph database service. It’s useful to catalog data lineage in a graph model to visualize file and object dependencies. AWS Glue is a fully managed service that provides a data catalog making assets in the data lake discoverable. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

The following diagram shows how we parse data from the Ford Autonomous Vehicle Dataset in Rosbag format using Amazon EMR. We store it in Amazon S3 in parquet format, use AWS Glue Crawler to read the file schema and create the tables in the AWS Glue Data Catalog, and finally use Amazon Athena to query the velocity data.

Ford Autonomous Vehicle Dataset

8. Process drive data and perform deep signal validation. 

Deploy your drive data signal validation code in Amazon Elastic Kubernetes Service (Amazon EKS). EKS is a managed service that makes it easy for you to run Kubernetes without needing to stand up or maintain your own Kubernetes control plane. Only a subset of the signals from the Rosbag or MDF4 files will be extracted for KPI calculation and aggregation which potentially reduces the stored data volume from gigabytes to megabytes.

9. Perform automated labeling using Amazon SageMaker Ground Truth.

Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning. Ground Truth offers automatic data labeling and/or annotation which uses a machine learning model to label your data.

In addition, the service helps you create custom workflows for data labeling that leverage human workers from Amazon Mechanical Turk, the AWS Partner Network, or your own private workforce to improve the automated labeling accuracy. Ground Truth now supports 3D Point Cloud Labeling for task types like Object Detection, Object Tracking and Semantic Segmentation. This blog shows how to use the service with open data sets from Audi A2D2 and KITTI. Alternatively, customers can run a custom container on Amazon EKS for Ground Truth generation and labeling.

10. Provide a search function for particular scenarios using AWS AppSync.

Developers and data scientists can search for a particular scenario and all of the associated metadata related to it. AWS AppSync is a managed service that uses GraphQL to make it easy for applications to get data from a range of data sources such as Amazon DynamoDB, Amazon ES and AWS Lambda.

Additional Aspects to consider

  • China: The collection of raw data that includes video, lidar, radar, and GPS data is defined as a controlled activity by the government (geographic information surveying and mapping). This is a regulated activity and must be done under the governance of local certified map providers with navigation surveying licenses.
  • Data Encryption and Anonymization: Some ADS/ADAS use cases include sensitive or personal information. AWS Key Management Service (KMS) supports Customer Master Keys (CMKs). Vehicle Identification Numbers (VIN) can be anonymized by Amazon EMR jobs. This blog shows how to anonymize personal data like faces from the video using Amazon Rekognition.
  • Exchange Data with partners: AWS Data Exchange is a service that makes it easy for AWS customers to securely exchange file-based data sets in the AWS Cloud. Providers in AWS Data Exchange have a secure, transparent, and reliable channel to reach AWS customers and grant existing customers their subscriptions more efficiently.
  • Data Lake as code: AWS provides a full stack of DevOps tooling including AWS CodeCommit, AWS CodeBuild and AWS CodePipeline to simplify the provisioning and management infrastructure, to deploy application code, automate software release processes, and monitor application and infrastructure performance. Third party CI/CD tools like Jenkins and Zuul can be integrated as well.

Conclusion

In this post, we discussed the steps outlined in this Reference architecture to build an Autonomous Driving and ADAS Data Lake on AWS. We hope you found this interesting and helpful and invite your comments on the architecture.

Also, check out the Automotive issue of the AWS Architecture Monthly Magazine.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Implementing Hardware-in-the-Loop for Autonomous Driving Development on AWS

Post Syndicated from Bryan Berezdivin original https://aws.amazon.com/blogs/architecture/field-notes-implementing-hardware-in-the-loop-for-autonomous-driving-development-on-aws/

Automotive customers use AWS as their platform for advanced driving assistance systems (ADAS) and autonomous driving (AD) development to accelerate their development cycles and experience faster time-to-market.  In the blog post, Autonomous Vehicle and ADAS development on AWS Part 1: Achieving Scale, we illustrated how software in the loop (SiL) and hardware in the loop (HiL) simulations are part of the workflow used to develop and validate safe AD and ADAS functionality. In this post, I run through some of the more common questions and patterns for implementing HiL on AWS, while looking at some of the differences from running your development all on-premises.

HiL simulations leverage test drive data and derived synthetic data to develop and validate various functions in the AD software stack.  Test drive log data is ingested and stored in Amazon S3 for use for HiL simulations in parallel with other AD development workloads including visualization, processing, labeling, analysis, and model and algorithm development.

As such, we see customers exist in a hybrid context with their HiL workloads running on-premises to support customized equipment. For ADAS and ADS customers, this poses a few questions and considerations:

  • What are the recommendations to deploy HiL for AD on AWS?
  • How is this different from what customers were used to on-premises?

For hybrid customers, there are assumptions and misconceptions:

  • Do I need to replicate the test drive data locally, and if so, what are the considerations and consequences?

For the purposes of brevity, the remainder of this blog post will use the term AD development to encompass ADAS and AD unless specifically called out.

HiL Building Blocks

Simulations and validations make up an important aspect of AD development. According to Rand’s analysis, there is a need to demonstrate safe driving on billions of miles for an autonomous vehicle to have a lower failure rate than a human driver. While this analysis is statistically derived for fully autonomous driving (SAE Level 5), it demonstrates a need for further validation on millions of miles. This pattern is reflected in most ADAS and AD development projects, where software in the loop (SiL) and HiL are used for verification and validation.

The following diagram is an illustration of the ISO 26262 V-Model, a product development approach for matching requirements with corresponding tests, where HiL simulation is required for much of system level testing and validation phases on the right side of the overlaid V-Model.

Figure 1: V-Model as defined by ISO-26262

Figure 1: V-Model as defined by ISO-26262

HiL simulations require a few key elements. The main component is the device under test (DUT), such as one or more electronic control units (ECUs) running the AD software stack. HiL simulations allow customers to put the device under test (DUT) under the rigor of real-world signals found in a vehicle. By providing more accurate environments and scenarios for the DUT, it can be fine-tuned for key performance indicators (KPIs) such as power utilization, response time, and accuracy.

The DUT is connected to a “HIL Rig,” a high performance server with multiple expansion boards to connect to various components of the AD system. The various interfaces are identified in Figure 2:  High Level Hardware-in-the-loop (HiL) System and Interfaces and include Controller Area Network (CAN), Automotive Ethernet, Low Voltage Differential Signal (LVDS), and PCIExpress. These interfaces emulate the vehicular topology for testing purposes and allow system level validation of the DUT.

The HiL Rigs and the corresponding software tooling are offered by companies like Elektrobit, dSpace, National Instruments, and Opal-RT. The HiL systems facilitate time synchronized inputs and outputs to the DUT and measures system performance. These solutions have optimizations for large-scale operations aligned with faster validation cycles for customers. This latter point is relevant when a project requires validation across a large number of miles. Elektrobit provides the ability to orchestrate and deploy large HiL server farms that can be deployed to work in parallel. The larger server farms allow parallel HiL simulations to reduce the time to validate thousands of miles of drive time and assess the key performance indicators (KPIs) for the feature sets. Results can be acted on more quickly and reduce the overall development time.

Figure 2: High Level Hardware-in-the-loop (HiL) System and Interfaces

Figure 2: High Level Hardware-in-the-loop (HiL) System and Interfaces

The HIL system loads sensor data derived from test drive logs. These logs vary in format, but often are captured and stored as MDF4, ADTF, rosbag, or other data logger proprietary formats. These are then processed for HIL simulations to implement open-loop and closed-loop simulations.

  • Open loop simulations refer to replay of log data from test drives.
  • Closed-loop simulations rely on the behavior of the system as inputs vary based on new outputs of the simulation.

Both open loop and closed loop simulations are part of autonomous driving development, but open-loop simulations require the largest datasets (multiple petabytes on average) due to reliance on the log data from test drives making them a primary concern for deploying HiL in a hybrid manner.

Overview of Solution

Architectures for supporting HIL simulations with AWS for AD development vary based primarily on the networking available at HiL locations. A common pattern for AWS customers is to have the HiL systems directly interfacing with Amazon S3 over high-bandwidth network links leveraging AWS Direct Connect. This is the simplest approach to deploying HiL and avoids hybrid data management of the petabytes of data in Amazon S3 to a local storage system.

AWS Direct Connect provides customers options to deploy their HIL rigs at their data center or in AWS colocation facilities with low latency connections. AWS has the largest number of Direct Connection locations and points-of-presence (POPs) to enable low latency connectivity to any of the >24 AWS Regions. The following diagram illustrates a reference architecture leveraging a direct interface from the HiL systems and Amazon S3.

Figure 3 : Reference Architecture for Hardware-in-the-Loop (HiL) Direct to Amazon S3

As shown in Figure 3, we illustrate the common interfaces, topology, and AWS services used for autonomous driving customers.

  • Amazon S3 is used to store and analyze the test drive logs used by the HiL simulations and also the results from the simulation runs for further analysis.
  • Metadata of test drive data is populated in various database and analytics services, referred to as the data catalogues, with metadata crawlers and processing pipelines that extract from the drive log and test result data on Amazon S3.
  • The data catalogues provide flexible search interfaces for developers and validation engineers or advanced analytics tools. These systems provide keyword search in Amazon Elasticsearch or SQL queries in Amazon Redshift or Amazon RDS and noSQL interfaces using Amazon DynamoDB. Amazon Partner Network solutions for these database and analysis tools are common as well, such as those in AWS Marketplace.
  • Validation engineer, data scientists, or developers use these data catalogues to find scenarios for testing. These personas also use the HiL management interfaces to configure and orchestrate the HiL simulation runs on the scenarios identified and ensure traceability.
  • HiL management systems control the HiL Rigs that interface to the DUT and implement the HiL simulations using the test drive logs. The HiL management system then writes results back to S3 for further analysis via various tool chains.

A common question AWS customers have is how to determine an optimal hybrid architecture using this approach. The primary factors are properly sized network links to accommodate data sets used by the HiL simulations as well as low latency network links between Amazon S3 and the HiL rigs. As a result, a key factor is ensuring use of an AWS region for your AWS storage that is in close proximity to your HiL testing site(s).

Based on current HiL implementations, open-loop simulations can sustain latencies of 30-50 ms RTT. AWS has numerous AWS Direct Connect locations in co-location facilities with latencies <5ms RTT. Sizing for these network links can be calculated based on the expected dataset sizes and the interval of time targeted for simulation run. We show a basic formula used for network sizing.

Average_Throughput (Gbps) = Average_Dataset_Size(GB)*8 / Time_Interval (seconds)

As an example, for a scenario where an average of 20PB is needed by the HIL rig every 2 weeks, we require ~200Gbps for the AWS Direct Connect bandwidth.

Figure 4 shows an example of a high-level architecture supported by Elektrobit with multiple EB 9101 test racks grouped together. This architecture supports multiple ECUs to be tested at once, leveraging drive log data in Amazon S3. This system is controlled with a central management software that allows optimal orchestration to keep the Elektrobit HiL system running optimally.

Use cases include:

  • The automated replay of all relevant sensor data with high time precision to ECU
  • Capture of ECU responses including debug data
  • Integration of customer components inside the HiL rack for visualization or post-processing.

Another common question from AWS customers is whether this architecture is supported for their HiL implementation. Many HiL providers are adding AWS functionality to their software and hardware stacks in response to customers transitioning to cloud for the development platforms.  Some vendors still require Amazon S3 as a supported interface in their HiL Rigs. The work needed to accommodate Amazon S3 is usually a small level of effort for any developer by using Amazon SDKs on the HiL rig software stack. If there is a project where this is needed, contact the AWS account teams and your HiL vendors to ensure a successful and cost efficient project implementation.

Figure 4: Elektrobit HiL Architecture with AWS

Figure 4: Elektrobit HiL Architecture with AWS

An alternative HiL solution shown in Figure 5 includes Amazon S3 as the primary storage for drive log data and the scale out NAS storage system is located on-premises operating as a cache for the HIL rigs. This is common when the networking options at the HIL site are limited in bandwidth or latency to handle the target datasets and time windows.

AWS customers calculate the size of the cache to transfer the entire dataset over the intended time interval. Following is a simple calculation to demonstrate this.

Cache_Size(GB) = Average_Dataset_Size(GB) - Average_Throughput (Gbps) /8 * Time_Interval (seconds)

In this example, a customer has 40 Gbps AWS Direct Connect available and a 10PB dataset needed for HIL simulations every 2 weeks. Using the preceding formula there is a need for local cache of four PB capable of high read-rates.

Figure 5: Reference Architecture for Hardware-in-the Loop (HiL) with Local Cache to Amazon S3

Figure 5: Reference Architecture for Hardware-in-the Loop (HiL) with Local Cache to Amazon S3

In this hybrid architecture there is a need to orchestrate the data movement in line with the needs of the HIL simulation data set. This requires third party software generally or built in functionality into workflow orchestration tools like Apache Airflow. At CES 2019, Dell EMC and AWS illustrated a solution for this hybrid architecture documented in this short solution brief using Isilon as the scale out NAS storage system and DataIQ as the data movement and orchestration mechanism.

Any of these architectures can be cost-optimized, and AWS has programs and pricing options for Amazon Direct Connect as well as the other AWS services involved. There are Enterprise Agreements and Migration Acceleration Programs (MAP)  in line with the holistic AD development platform needs, that reduce the costs for hybrid architecture functionality needed in the HiL solutions. One common need is support for AWS Direct Connect “flat rate” pricing option to accommodate the data transfer out (DTO) needs for the HiL workload. If you need details on these programs for your AD development project, contact your AWS account team.

Conclusion

In this blog post, we discussed two common architectural patterns for supporting HiL simulations for ADAS and Autonomous Driving development. These help customers decide on the right networking, storage, and hybrid topologies for these systems.

HiL systems directly interfacing with Amazon S3 is the most common pattern as you see with Elektrobit HiL solutions, but for customers with limited network links the use of a local cache is an option. Autonomous driving customers looking to increase velocity in their SAE Level 2-5 development programs with HiL simulations have achieved success with AWS as the development platform using these patterns. AWS has a team dedicated to autonomous driving, so contact your AWS account team to get a more prescriptive solution for your HiL or related ADAS and AD development needs.

Also, check out the Automotive issue of the AWS Architecture Monthly Magazine.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Powering the Connected Vehicle with Amazon Alexa

Post Syndicated from Amit Kumar original https://aws.amazon.com/blogs/architecture/field-notes-powering-the-connected-vehicle-with-amazon-alexa/

Alexa has improved the in-home experience and has potential to greatly enhance the in-car experience. This blog is a continuation of my previous blog: Field Notes: Implementing a Digital Shadow of a Connected Vehicle with AWS IoT. Multiple OEMs (Original Equipment Manufacturers) have showcased this capability during CES 2020. Use cases include; a person seating at the rear seat can play a song, control HVAC (Heating, ventilation, and air conditioning), pay for gas/coffee, all while using Alexa. In this blog, I cover how you create a connected vehicle using Alexa, to initiate a command, such as; ‘Alexa, open my trunk’.

Solution Architecture

“Alexa, open my trunk”

The preceding architecture shows a message flowing in the following example:

  1. A user of a connected vehicle wants to open their trunk using an Alexa voice command. Alexa will identify the right intent based on utterances and invoke a Lambda function. The Lambda function updates the device shadow with (desired {““trunk””: ““open””}).
  2. Vehicle TCU registered the callback function shadowRegisterDeltaCallback(). Listen to delta topics for the device shadow by subscribing to delta topics. Whenever there is a difference between the desired and reported state, the registered callback will be called.  The delta payload will be available in the callback. Update performed in #1 will be received in delta callback.
  3. Now, the vehicle must act on the desired state. In this case, it acts on the trunk status change. After performing the required action for the trunk change, the vehicle TCU will update the device shadow with the reported state (reported : { “trunk”: “open”} )
  4. The web/mobile app subscribed to the topic $aws/things/tcu/shadow/update/accepted”. Therefore, as soon as the vehicle TCU updates the shadow, the Web/Mobile app received the update and synchronized the UI state.

As part of the previous blog, we implemented #2, #3 and #4. Lets implement #1 and incorporate into the solution.

The source code (vehicle-command) of this blog is available in this code repository.

The Alexa voice command required the implementation of three key areas:

  1. Configure Alexa – which will listen to utterances and identify the right intent and invoke a Lambda function.
  2. Set up the Lambda function – which will interpret the command and invoke the AWS IoT Core device shadow API.
  3. Handle Command at Vehicle tcu and App – Vehicle tcu must register shadowRegisterDeltaCallback so any update in the device shadow will receive a call message to perform the  actual command by the vehicle and synchronize the state with a web/mobile app.

Let’s ‘Open a trunk’ using Alexa voice command. First set up the environment:

  • Open AWS Cloud9 IDE created in an earlier lab and run the following command:

Set up permanent credentials. Note: Alexa doesn’t work with temporary credentials.  Configure it with permanent credentials for ASK command line interface (CLI).

  1. Open Cloud9 Preferences by clicking AWS Cloud9 > Preference or  by clicking on the “gear” icon in the upper right corner of the  Cloud9 window
  2. Select “AWS  Settings”
  3. Disable “AWS  managed temporary credentials”
  4. $ aws  configure
  5. Enter the Access Key  and Secret Access Key of a user that has required access credentials
  6. Use us-east-1 as the region. It will store in ~/.aws/config

Verify that everything worked by examining the file ~/.aws/credentials. It should resemble the following:

[default]
 aws_access_key_id = <access_key>
 aws_secret_access_key = <secrect_key>
 aws_session_token=

*Remove aws_session_token line from credentials file.

Next, install the Alexa CLI:

$ npm install ask-cli --global

Initialize ASK CLI by issuing the following command. This will initialize the ASK CLI with a profile associated with your Amazon developer credentials.

$ ask configure --no-browser

Check you are linking AWS account with Alexa:

Do you want to link your AWS account in order to host your Alexa skills? Yes

#At the end output should look as follows:

------------------------- Initialization Complete -------------------------
Here is the summary for the profile setup:
ASK Profile: default
AWS Profile: default
Vendor ID: MXXXXXXXXXX

As part of the previous blog, you have already cloned the following git repository in AWS Cloud9 IDE. It has a baseline code to jump start.

$ git clone

Configure Alexa Skills

The Alexa Developer console GUI can be used but we are doing it programmatically so it can be done at scale and allows versioning.

1. Open connected-vehicle-lab/vehicle-command/skill-package/skill.json . We have 2 locale en-US, en-IN are defined in the base code for Alexa command. Let’s add en-GB locale in the json file located at “manifest”/”publishingInformation”/”locales”.  Similarly, you can add locale for your preferred language:

"en-GB": {
"name": "vehicle-command",
"summary": "Control Vehicle using voice command",
"description": "Allow you to control vehicle using voice command",
"examplePhrases": [
    "Alexa open genie",
    "ask genie to lower window",
    "window up"
    ],
"keywords": []
}

If you are inserting into the middle then make sure it is separated by a comma.

2. Let’s create a copy of models connected-vehicle-lab/vehicle-command/skill-package/interactionModels/custom/en-US.json and rename it to en-GB.json and add our intent

  • We have “invocationName”: “genie”.  Here, we  are using “genie” as a command to invoke our Alexa skill. You  can change if needed
  • The key elements in this json file is intent, slots, sample utterance and slot types. Let’s define the  slot types t_action_type for ‘open’, ‘close’, ‘lock’, ‘unlock’. under “types”: [].
        {
        "name": "t_action_type",
        "values": [
            {
                "name": {
                "value": "unlock"
                }
            },
            {
                "name": {
                "value": "lock"
                }
            },
            {
                "name": {
                "value": "close"
                }
            },
            {
                "name": {
                "value": "open"
                }
            }
          ]
        }
  • Let’s add intent under “intents”: [] for trunk  ‘TrunkCommandIntent’ and define the sample utterance speech like ‘lock my trunk’,  ‘open trunk’. We are using slot types to simplify the utterance and  understand the operation requested by a user.
        {
            "name": "TrunkCommandIntent",
            "slots": [
            {
                "name": "t_action",
                "type": "t_action_type"
            }
            ],
            "samples": [
                "{t_action} trunk",
                "trunk {t_action}",
                "{t_action} my trunk",
                "{t_action} trunk"
            ]
}
  • Now add the same intent, slots, slot type and sample utterances  for other locales files (en-US.json and en-IN.json) as well.

3. Let’s add response message under languageString.js (available at /connected-vehicle-lab/vehicle-command/lambda/custom).

TRUNK_OPEN: 'Trunk Open',
TRUNK_CLOSE: 'Trunk Close' 

If you are inserting into the middle then make sure it is separated by a comma.

Set up the Lambda function

1. Add a Lambda function which will get invoked by Alexa. This Lambda function will handle  the intent and invoke IoT Core Device Shadow API and execute the actual command of ‘Trunk open/unlock or lock/close’.

  • Open /connected-vehicle-lab/vehicle-command/lambda/custom/index.js  and add our TrunkCommandIntent
const TrunkCommandIntentHandler = {
                canHandle(handlerInput) {
                return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
                && Alexa.getIntentName(handlerInput.requestEnvelope) === 'TrunkCommandIntent';
                },
                    handle(handlerInput) {
                    var t_action_value = handlerInput.requestEnvelope.request.intent.slots.t_action.value;
                    console.log(t_action_value);
                    var speakOutput;
                    const obj = "trunk";
                    if (t_action_value == "lock" || t_action_value == "open")
                    {
                        updateDeviceShadow(obj, "open");
                        speakOutput = handlerInput.t('TRUNK_OPEN')
                    }
                    else 
                    {
                        updateDeviceShadow(obj, "close");
                        speakOutput = handlerInput.t('TRUNK_CLOSE')
                    } 
                    console.log(speakOutput);
                    return handlerInput.responseBuilder
                    .speak(speakOutput)
                    //.reprompt('add a reprompt if you want to keep the session open for the user to respond')
                    .getResponse();
                }
            };
  • We have  UpdateDeviceShadow(“vehicle_part”, “command”) function  which actually invokes the IoT core Device Shadow API
 function updateDeviceShadow (obj, command)
    {
        shadowMessage.state.desired[obj] = command;
        var iotdata = new AWS.IotData({endpoint: ioT_EndPoint});
        var params = {
        payload: JSON.stringify(shadowMessage) , /* required */
        thingName: deviceName /* required */ 
        };
        iotdata.updateThingShadow(params, function(err, data) {
            if (err) 
            console.log(err, err.stack); // an error occurred
            else 
            console.log(data); 
            //reset the shadow 
            shadowMessage.state.desired = {}
        });
} 

2. Update the value of ioT_EndPoint from AWS IoT Core > Settings > Custom Endpoint

3.  Add Trunk CommandIntent in request handler

exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        LaunchRequestHandler,
        WindowCommandIntentHandler,
        DoorCommandIntentHandler,
        TrunkCommandIntentHandler,

4. Deploy Alexa Skills

$ cd ~/environment/connected-vehicle-lab/vehicle-command
$ ask deploy 

Handle Command at Vehicle tcu and App

For more detail on this section, refer to part 1 of this blog: Field Notes: Implementing a Digital Shadow of a Connected Vehicle with AWS IoT.

@ Vehicle tcu – tcuShadowRead.py has trunk_handle() function to receive a message from device shadow

def trunk_handle(status):
  if status is not None:
    shadowClient.reportedShadowMessage['state']['reported']['trunk'] = status
    print ('Perform action on trunk status change : ' + str(status))

@web App – demo-car/js/websocket.js has handleTrunkCommand function receive callback message as soon any update happened on Device Shadow

//this function will be called by onMessageArrive
function handleTrunkCommand(trunkStatus) {
    obj = document.getElementsByClassName("action trunk")[0];
    obj.checked = trunkStatus == "open" ? true : false;
    console.log(obj.getAttribute("data-text") + " : " + obj.checked);
}

demo-car/js/demo-car.js has handleTrunkCommand function to handle UI input and invoke IoT Core Device Gateway API to update the desired state.

//this function will be called when user will click on trunk checkbox
    handleTrunkCommand: function(obj) {
        obj.checked ? demoCar.shadowMessage.state.desired.trunk = "open" : demoCar.shadowMessage.state.desired.trunk = "close";
        console.log(obj.getAttribute("data-text") + " : " + demoCar.shadowMessage.state.desired.trunk);
        demoCar.accessIoTDevice();
    },

Use Alexa skill to invoke a command

Let’s test or command ‘Alexa, open my trunk’. We can use a command line and execute:

$ask dialog --locale "en-GB" 

Using Alexa GUI, provides an interesting visualization, as shown in the following screenshot.

  1. Open the Alexa GUI,  Select ‘vehicle command’ skill and select test tab. Allow “developer.amazon.com” to use your microphone?
  2. Open a demo.html web app side by side of the Alexa GUI to check an actual operation happened at the Vehicle tcu and synchronize the  status with virtual car model.
  3. Now test the Alexa skill. You can use an audio command as well. You can ask or write ‘ask genie’.

Alexa developer console

Clean Up

What a fun exploration this has been! Now clean up AWS resources created for this and the previous post to avoid incurring any future AWS services costs. Resources created by CDK can be deleted by deleting the stack on the CloudFormation console. Resources created manually need to be deleted individually.

Conclusion

In this blog post, I showed how you can enable voice command for a connected vehicle and enhance in-vehicle user experience.  Similarly, you can also extend this solution for the use cases like Alexa ‘open my garage’. AWS IoT Core Device Shadow API does all the heavy-lifting in this case. Any update in device shadow allows both device and user application to act. Alexa skill is acting as an interface to capture the user command and invoke the lambda function.

Since these are all serverless services, that means this implementation can scale without making any change in the application and you only pay when someone invokes a command. Creating an engaging, high-quality interaction with Alexa in the vehicle is critical. You can refer to Alexa Automotive Documentation for an Alexa Built-in automotive experience.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Field Notes: Implementing a Digital Shadow of a Connected Vehicle with AWS IoT

Post Syndicated from Amit Sinha original https://aws.amazon.com/blogs/architecture/field-notes-implementing-a-digital-shadow-of-a-connected-vehicle-with-aws-iot/

Innovations in connected vehicle technology are expected to improve the quality and speed of vehicle communications and create a safer driving experience. As connected vehicles are becoming part of the mainstream, OEMs (Original Equipment Manufacturers) are broadening the capabilities of their products and dramatically improving the in-vehicle experience for customers.

An important feature in a connected vehicle is its ability to execute a remote command and synchronize the state of the vehicle between a web/mobile app in real time.

This blog demonstrates how to:

  • secure two-way communication between a device (vehicle telematics control unit) and the AWS Cloud using AWS IoT
  • execute command at vehicle
  • execute a remote command
  • and test with a vehicle virtual model

You can watch a quick animation of a remote command execution in the following GIF:

Animated car GIF

Solution Overview

In a traditional connected vehicle approach, there are many processes running on multiple servers. These processes are subscribing to one another, coordinating with each other, and polling for an update. This makes scalability and availability a challenge. We use AWS IoT Core and AWS IoT Device Shadow service as primary components for this solution.

This solution has three building blocks:

  1. a vehicle TCU (telematics control unit),
  2. the AWS Cloud (with connection via AWS IoT Core) and
  3. a virtual Model (e.g.; web/mobile app to send/receive commands to TCU). These three building blocks together reflect the current state of a vehicle.

Alexa Solution Overview

The previous diagram shows a message flowing in the following example:

  1. A user of a connected vehicle wants to open their door using a web/mobile app. The app updates the device shadow with (desired {““door””: ““open””}). The app will always request the vehicle to execute the command; therefore, it will always update the device shadow with the desired state.
  2. Vehicle TCU registered the callback function shadowRegisterDeltaCallback(). Listen on delta topics for the device shadow by subscribing to delta topics. Whenever there is a difference between the desired and reported state, the registered callback is called and the delta payload will be available in the callback. Update performed in #1 will received in delta callback.
  3. Now, the vehicle needs to act on the desired state. In this case, ‘act on’ is the door status change. After performing the required action for the door change, the vehicle TCU will update the device shadow with the reported state (reported : { “door”: “open”} )
  4. Now, the vehicle is closing the door. The vehicle will always perform the action; therefore, it will always update device shadow with reports state (reported: {“door” : “close”})
  5. The Web/Mobile app subscribed to topic $aws/things/tcu/shadow/update/accepted”. Therefore, as soon as the vehicle TCU updates the shadow, the Web/Mobile app received the update and synchronized the UI state.
  6. You can also build an Amazon Alexa skill to control your vehicle (“Alexa, raise my window”). After identifying the utterance, Alexa can invoke the Lambda function to update the device shadow and perform the requested action.

Note: For the Web/Mobile app developments for production, it is recommended to use AWS AppSync and AWS Amplify SDK for building a flexible and decoupled application from the API. Refer to this code sample for more detail.

Implementation

First, you need to set up the code. Refer to the directions in this code sample.

Create device

In AWS IoT Core, name a device ‘TCU’ (created by connected-vehicle-app-cdk-stack). Create a new certificate (download files) and attach the policy generated by cdk.

create a certificate

Next, deploy the certificate key and pem file on your device so it can connect with the AWS Cloud using the X.509 certificate. For more detail, refer to the directions in the code sample.

Execute Command at Vehicle

AWS IoT Device Shadow is an important feature of AWS IoT core for remote command execution because it allows you to decouple the vehicle and the app which controls and commands the vehicle. A device’s shadow is a JSON document that is used to store and retrieve current state information for a device. Primarily we use state.desired and state.reported. properties of a device’s shadow document.

The device shadow (Device SKD and APIs) enables applications to interact with devices even when they are offline and allow:

  • Cloud representation of device state
  • Query last known state for offline devices
  • Real-time state changes
  • Track last known device state
  • Control devices via change of state
  • Automatic synchronization once devices connect to the cloud
  • APIs for applications to discover and interact with devices

The rich features of a device shadow allows the app to interact with the vehicle TCU even when there is no connectivity. Once connectivity is established, the device gateway pushes the changes to device and vice versa.

We need to deploy a program (tcuShadowWrite.py) on the vehicle TCU device to update the device shadow and send the update to the AWS Cloud. This program is available in this code repository.

Let’s assume that after reaching their home, the vehicle’s user closes the door, switches off the headlights, and rolls up the windows. The same state of the vehicle should be reflected on their web/mobile app in real time. The vehicle TCU has to update the “reported” state in the device shadow JSON document.

shadow message

AWSIoTMQTTShadowClient library has a method called shadowUpdate that needs to be called from the vehicle TCU to update the device shadow. Essentially, it is publishing the shadow reported state on topic $aws/things/<thingName>/shadow/update.

If you run tcuShadowWrite.py script, you should be able to see the output as described in the following image.

tcushadowscript

  • Open the AWS IoT Core console.
  • Select Manage -> Things -> Select tcu, and then choose Shadow. You should be able to see the shadow message sent from the device described in the following image.

shadow document

Execute Remote Command

We need to deploy a program (tcuShadowRead.py) on the vehicle TCU to receive updates from the AWS Cloud. It is available in this code sample.

Let’s assume the vehicle owner uses the mobile app to open the door, switch on the headlight and roll down the windows. The vehicle TCU should receive this command and instruct the Electronic Control Unit (ECU) to execute the command. The web/mobile app will update the “desire” state in the device shadow JSON document.

shadow message2

In tcuShadowRead.py, AWSIoTMQTTShadowClient has a method shadowRegisterDeltaCallback. It listens on delta topics for this device shadow by subscribing to delta topics. Whenever there is a difference between the desired and reported state, the registered callback is called and the delta payload will be available in the callback.

callback

The callback function has a code to handle the state change request. In an actual implementation, a function like door_handle() would be calling the ECU to execute the door open command.

door open command

If you make changes in Device Shadow on AWS IoT for the tcu device, you should receive the output in the following image.

Device shadow

Test with a Virtual Vehicle Model

To help you test this solution, you can deploy the virtual vehicle model shown in the following image. Detailed steps for the deployment of the virtual vehicle is available in this code sample.

virtual vehicle model

Any changes in the model state should be reflected on the virtual demo vehicle and vice versa.

Here, we use open-source Paho-mqtt library.  and Developers can use this to write JavaScript applications that access AWS IoT using MQTT or MQTT over the WebSocket protocol without using AWS IoT SDK. This implementation is made simpler by using AWS IoT Device SDK for JavaScript v2 Readme.

Review the JavaScript file named webSocketApp.js:

websocket app

  • onMessageArrived() function will be invoked whenever the device will change the shadow state.
  • handle<object>Command functions (such as handleDoorCommand) will be called with the current state.  Call this function if the device has received any status change.

We have another JavaScript file demo-car.js in the demo-car folder. This includes the functions that our simulated vehicle will use in order to change the device shadow.

Let’s review the following code:

democar javascript

  • We have 3 handle command function defined (e.g., handleDoorCommand) to take the user’s input and access AWS IoT Core services.
  • connectDevice is an actual function to invoke updateThingsShadow function to send the desired state
  • accessIoTDevice uses Amazon Cognito Identity to get the authenticated identities to access AWS IoT Core securely without exposing the access key or secret key.

Now, keep demo.html side by side to your code and run the tcuShadowRead.py script. Any change made at the virtual model will reflect at the command output. Similarly, any change made by tcuShadowWrite.py will reflect the state update on the virtual model.

Conclusion

In this blog, we showed how to implement a digital shadow of a connected vehicle using AWS IoT. This solution removes complexity from running multiple processes in parallel and ensures a successful outcome. AWS IoT Core enables scalable, secure, low-latency, low-overhead, bi-directional communication between connected devices, tolerate and recover from slow/brittle connection, the AWS Cloud and customer-facing applications.

The Device Shadow in AWS IoT Core enables the AWS Cloud and applications to easily and accurately receive data from connected vehicles and send commands to the vehicles. The Device Shadow’s uniform and always-available interface simplifies the implementation of time-sensitive use cases. These include, remote command execution and two-way state synchronization between a device and app where the cloud is acting as a broker. This solution enables you to shift operational responsibilities of a connected vehicle infrastructure to the AWS Cloud while paying only for what you use, with no minimum fees or mandatory service usage.

For more information about how AWS can help you build connected vehicle solutions, refer to the AWS Connected Vehicle solution page.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Gaining Insights into Labeling Jobs for Machine Learning

Post Syndicated from Michael Graumann original https://aws.amazon.com/blogs/architecture/field-notes-gaining-insights-into-labeling-jobs-for-machine-learning/

In an era where more and more data is generated, it becomes critical for businesses to derive value from it. With the help of supervised learning, it is possible to generate models to automatically make predictions or decisions by leveraging historical data. For example, image recognition for self-driving cars, predicting anomalies on X-rays, fraud detection in finance and more. With supervised learning, these models learn from labeled data. The success of those models is highly dependent on readily available, high quality labeled data.

However, you might encounter cases where a high percentage of your pre-existing data is unlabeled. In these situations, providing correct labeling to previously unlabeled data points would directly translate to higher model accuracy.

Amazon SageMaker Ground Truth helps you with exactly that. It lets you build highly accurate training datasets for machine learning quickly. SageMaker Ground Truth provides your labelers with built-in workflows and interfaces for common labeling tasks. This process could take several hours or more depending on the size of your unlabeled dataset, and you might have a need to track the progress easily, preferably in the form of a dashboard.

In this blogpost we show how to gain deep insights into the progress of labeling and the performance of the workers by using Amazon Athena and Amazon QuickSight. We use Amazon Athena former to set up several views with specific insights into the labeling progress. Finally we will reference these views in Amazon QuickSight to visualize the data in a dashboard.

This approach also works for combining multiple AWS services in general. AWS provides many building blocks than you can mix-and-match to create a unique, integrated solution with cohesive insights. In this blog post we use data produced by one service (Ground Truth), prepare it with another (Athena) and visualize with a third (QuickSight). The following diagram shows this architecture.

Solution Architecture

ML Solution Architecture

Mapping a JSON structure to a table structure

Ground Truth creates several directories in your Amazon S3 output path. These directories contain the results of your labeling job and other artifacts of the job. The top-level directory for a labeling job has the same name as your labeling job, while the output directories are placed inside it. We will create all insights from what SageMaker Ground Truth calls worker responses.

All respective JSON files reside in the path s3://bucket/<job-name>/annotations/worker-response/.

To analyze the labeling data with Amazon Athena we need to understand the structure of the underlying JSON files. Let’s review the example below. For each item that was labeled, we see the label itself, followed by the submission time and a workerId pointing to an identity. This identity lives in Amazon Cognito, a fully managed service that provides the user directory for our labelers.

{
    "answers": [
        {
            "answerContent": {
                "crowd-classifier": {
                    "label": "Compute"
                }
            },
            "submissionTime": "2020-03-27T10:31:04.210Z",
            "workerId": "private.eu-west-1.1111111111111111",
            "workerMetadata": {
                "identityData": {
                    "identityProviderType": "Cognito",
                    "issuer": "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_111111111",
                    "sub": "11111111-1111-1111-1111-111111111111"
                }
            }
        },
        ...
    ]
}

Although the data is stored in Amazon S3 object storage, we are able to use SQL to access the data by using Amazon Athena. Since we now understand the JSON structure from shown in the preceding code, we use Athena and define how to interpret the data that is relevant to us. We do so by first creating a database using the Athena Query Editor:

CREATE DATABASE analyze_labels_db;

Once inside the database, we add the table schema. The actual files remain on Amazon S3, but using the metadata catalog, Athena then knows where the data lies and how to interpret it. The AWS Glue Data Catalog is a central repository to store structural and operational metadata for all your data assets. For a given dataset, you can store its table definition, physical location, add business relevant attributes, in addition to track how this data has changed over time. Besides, Athena the AWS Glue Data Catalog also provides out-of-box integration with Amazon EMR and Amazon Redshift Spectrum. Once you add your table definitions to the Glue Data Catalog, they are available for ETL. They are also readily available for querying in Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum so that you can have a common view of your data between these services.

When going from JSON to SQL, we are crossing format boundaries. To further facilitate how to read the JSON formatted data we are using SerDe Properties to replace the hyphen in crowd-classifier with an underscore due to DDL constraints. Finally we point the location to our Amazon S3 bucket containing the single worker responses. Recognize in the following script that we translate the nested structure of the JSON file itself into a hierarchical, nested data structure in the schema definition. Also, we could leave out the workerMetadata as we don’t need it at this time. The data would still stay in the files on Amazon S3, so that we could later change and add the workerMetadata STRUCT into the table definition for our analysis.

CREATE EXTERNAL TABLE annotations_raw (
  answers array<
    struct<answercontent: 
      struct<crowd_classifier: 
        struct<label: string>
      >,
      submissionTime: string,
      workerId: string,
      workerMetadata: 
        struct<identityData: 
          struct<identityProviderType: string, issuer: string, sub: string>
        >
    >
  >
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.crowd_classifier"="crowd-classifier" 
) 
LOCATION 's3://<YOUR_BUCKET>/<JOB_NAME>/annotations/worker-response/'

Creating Views in Athena

Now, we have nested data in our annotations_raw table. For many use cases, especially for analytical uses, representing data in a tabular fashion—as rows—is more natural. This is also the standard way when using SQL and business intelligence tools. To unnest the hierarchical data into flattened rows, we create the following view which will serve as foundation for the other views we create. For an in-depth look into unnesting data with Amazon Athena, read this blog post.

Some of the information we’re interested in might not be part of the document, but is encoded in the path. We use a trick in Athena by using the $path variable from the Presto Hive Connector. This determines which Amazon S3 file contains data that is returned by a specific row in an Athena table. This way we can find out which data object an annotation belongs to. Since Athena is built on top of Presto, we are able to use Presto’s built-in regexp_extract function to find out the iteration as well as the data object id per labeling result. We also cast the submission time in date format to later determine the labeling progress per day.

CREATE OR REPLACE VIEW annotations_view AS
SELECT 
  regexp_extract("$path", 'iteration-[0-9]*') as iteration,
  regexp_extract("$path", '(iteration-[0-9]*\/([0-9]*))',2) as dataRecord,
  answer.answercontent.crowd_classifier.label,
  cast(from_iso8601_timestamp(answer.submissionTime) as timestamp) as submissionTime,
  cast(from_iso8601_timestamp(answer.submissionTime) as date) as submissionDay,
  answer.workerId,
  answer.workerMetadata.identityData.identityProviderType,
  answer.workerMetadata.identityData.issuer,
  answer.workerMetadata.identityData.sub,
  "$path" path
FROM 
  annotations_raw
CROSS JOIN UNNEST(answers) AS t(answer)

This view, annotations_view, will be the starting point for the other views we will be creating in further in this post.

Visualizing with QuickSight

In this section, we explore a way to visualize the views we build in Athena by pointing Amazon QuickSight to the respective view. Amazon QuickSight lets you create and publish interactive dashboards that include ML Insights. Dashboards can then be accessed from any device, and embedded into your applications, portals, and websites.

Thanks to the tight integration between Athena and QuickSight, we are able to map one dataset in QuickSight to one Athena view. In order to further optimize the performance of the dashboard, we can optionally import the datasets into the in-memory optimized calculation engine for Amazon QuickSight called SPICE. With the datasets in place we can now create an analysis in order to interact with the visuals we’re going to add. You can think of an analysis as a container for a set of related visuals. You can use multiple datasets in an analysis, although any given visual can only use one of those datasets. After you create an analysis and an initial visual, you can expand the analysis. You can do this for example by adding datasets and visuals.

Let’s start with our first insight.

Annotations per worker

We’d like to gain insights not only into the total number of labeled items but also on the level of contributions of each individual workers. This could give us an indication whether the labels were created by a diverse crowd of labelers or by a few productive ones. A largely disproportionate amount of contributions from a handful of workers who may have brought along their biases.

SageMaker Ground Truth calls labeled data objects annotations, which is the result of a single workers labeling task.

Luckily we encapsulated all the heavy lifting of format conversion in the annotations_view, so that it is now easy to create a view for the annotations per user:

CREATE OR REPLACE VIEW annotations_per_user AS
SELECT COUNT(sub) AS LabeledItems,
sub AS User
FROM annotations_view
GROUP BY sub
ORDER BY LabeledItems DESC

Next we visualize this view in QuickSight. We add a visual to our analysis, select the respective dataset for the view and use the AutoGraph feature, which chooses the most appropriate visual type. Since we already arranged our view in Athena by the number of labeled items in descending order, there is no need now to sort the data in QuickSight. In the following screenshot, worker c4ef78e4... contributed more labels compared to their peers.

Annotations per worker

This view gives you an indicator to check for a bias that the leading worker might have brought along.

Annotations per label

One thing we want to be aware of is potential imbalances between classes in our dataset. Especially simple machine learning models, which may learn to frequently predict a label that is massively over represented in the dataset. If we can identify an imbalance, we can apply mitigation actions such as upsampling data of underrepresented classes. With the following view we list the total number of annotations per label.

CREATE OR REPLACE VIEW annotations_per_label AS
SELECT Count(dataRecord) AS TotalLabels, label As Label 
FROM annotations_view
GROUP BY label
ORDER BY TotalLabels DESC, Label;

As before, we create a dataset in QuickSight pointing to the annotations_per_label view, open the analysis, add a new visual and leverage the AutoGraph functionality. The result is the following visual representation:

Annotations per worker 2

One can clearly see that the Analytics & AI/ML class is massively underrepresented. At this point, you might want to try getting more data or think about upsampling data for that class.

Annotations per day

Seeing the total number of annotations per label and per worker is good, but we are also interested in how the labeling progress changes over time. This way we might see spikes related to labeller activations. We can also or estimate how long it takes to reach a certain goal of annotations given the current pace. For this purpose we create the following view aggregating the total annotations per day.

CREATE OR REPLACE VIEW annotations_per_day AS
SELECT COUNT(datarecord) AS LabeledItems,
submissionDay
FROM annotations_view
GROUP BY submissionDay
ORDER BY submissionDay, LabeledItems DESC

This time the QuickSight AutoGraph provides us with the following line chart. You might have noticed that the axis labels do not match the column names in Athena. That is because we renamed them in QuickSight for better readability.

Total annotations per day

In the preceding chart we see that there is no consistent pace of labeling, which makes it hard to predict when a certain amount of labeled data will be reached. In this example, after starting strong the progress immediately went down. Knowing this, we might want to take action into motivating our workers to contribute more and validate the effectiveness of these actions with the help of this chart. The spikes indicate an effective short-term action.

Distribution of total annotations by user

We already have insights into annotations per worker, per label and per day. Let us now now see what insights we can get from aggregating some of this information.

The bigger your labeling workforce gets, the harder it can become to see the whole picture. For that reason we will now create a histogram consisting of five buckets. Each bucket represents an interval of total annotations (for example, 0-25 annotations) mapped to the number of users whose amount of total annotations lies in that interval. This allows us to get a sense of what kind of bias might be introduced by the majority of annotations being contributed by a small amount of workers.

To do that, we use the Presto function width_bucket which returns the number of labeled data objects according to the five buckets we defined with a size of 25 each. We define these buckets by creating an Array with 5 elements that specify the boundaries.

CREATE OR REPLACE VIEW users_per_bucket_annotations AS
SELECT 
bucket,numberOfUsers,
CASE
   WHEN bucket=5 THEN 'B' || cast(bucket AS VARCHAR(10)) || ': ' || cast(((bucket-1) * 25) AS VARCHAR(10)) || '+'
   ELSE 'B' || cast(bucket AS VARCHAR(10)) || ': ' || cast(((bucket-1) * 25) AS VARCHAR(10)) || '-' || cast((bucket * 25) AS VARCHAR(10))
END AS NumberOfAnnotations
FROM
(SELECT width_bucket(labeleditems,ARRAY[0,25,50,75,100]) AS bucket,
 count(user) AS numberOfUsers
FROM annotations_per_user
GROUP BY 1
ORDER BY bucket)

A SELECT * FROM users_per_bucket_annotations produces the following result:

A SELECT FROM users_per_bucket_annotations

Let’s now investigate the same data via QuickSight:

Annotations per User in buckets of Size 25

Now that we can look at the data visually it becomes clear that we have a bimodal distribution, with many labelers having done very little, and many labelers doing quite a lot. This may warrant interviewing some labelers to find out if there is something holding back users from progressing, or if we can keep engagement high over time.

Putting it all together in QuickSight

Since we created all previous visuals into one analysis, we can now utilize it as a central place to consume our insights in a user-friendly way. Moreover, we can share our insights with others as a read-only snapshot which QuickSight calls a dashboard. User who are dashboard viewers can view and filter the dashboard data as below:

Groundtruth dashboard

Furthermore, you can generate a report and let QuickSight send it either once or on a schedule (daily, weekly or monthly) to your peers. This way users do not have to sign in and they can get reminders to check the progress of the labeling job. Lastly, sending out those reports is an opportunity to stay in touch with the labelers and keep the engagement high.

Conclusion

In this blogpost, we have shown one example of combining multiple AWS services in order to build a solution tailored to your needs. We took the Amazon S3 output generated by SageMaker Ground Truth and showed how it can be further processed and analyzed with Athena. Finally, we created a central place to consume our insights in a user-friendly way with QuickSight. By putting it all together in a dashboard we were able to share our insights with our peers.

You can take the same pattern and apply it to other situations: take some of the many building blocks AWS provides and mix-and-match them to create a unique, integrated solution with cohesive insights just as we did with Ground Truth, Athena, and QuickSight.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Monitoring the Java Virtual Machine Garbage Collection on AWS Lambda

Post Syndicated from Steffen Grunwald original https://aws.amazon.com/blogs/architecture/field-notes-monitoring-the-java-virtual-machine-garbage-collection-on-aws-lambda/

When you want to optimize your Java application on AWS Lambda for performance and cost the general steps are: Build, measure, then optimize! To accomplish this, you need a solid monitoring mechanism. Amazon CloudWatch and AWS X-Ray are well suited for this task since they already provide lots of data about your AWS Lambda function. This includes overall memory consumption, initialization time, and duration of your invocations. To examine the Java Virtual Machine (JVM) memory you require garbage collection logs from your functions. Instances of an AWS Lambda function have a short lifecycle compared to a long-running Java application server. It can be challenging to process the logs from tens or hundreds of these instances.

In this post, you learn how to emit and collect data to monitor the JVM garbage collector activity. Having this data, you can visualize out-of-memory situations of your applications in a Kibana dashboard like in the following screenshot. You gain actionable insights into your application’s memory consumption on AWS Lambda for troubleshooting and optimization.

The lifecycle of a JVM application on AWS Lambda

Let’s first revisit the lifecycle of the AWS Lambda Java runtime and its JVM:

  1. A Lambda function is invoked.
  2. AWS Lambda launches an execution context. This is a temporary runtime environment based on the configuration settings you provide, like permissions, memory size, and environment variables.
  3. AWS Lambda creates a new log stream in Amazon CloudWatch Logs for each instance of the execution context.
  4. The execution context initializes the JVM and your handler’s code.

You typically see the initialization of a fresh execution context when a Lambda function is invoked for the first time, after it has been updated, or it scales up in response to more incoming events.

AWS Lambda maintains the execution context for some time in anticipation of another Lambda function invocation. In effect, the service freezes the execution context after a Lambda function completes. It thaws the execution context when the Lambda function is invoked again if AWS Lambda chooses to reuse it.

During invocations, the JVM also maintains garbage collection as usual. Outside of invocations, the JVM and its maintenance processes like garbage collection are also frozen.

Garbage collection and indicators for your application’s health

The purpose of JVM garbage collection is to clean up objects in the JVM heap, which is the space for an application’s objects. It finds objects which are unreachable and deletes them. This frees heap space for other objects.

You can make the JVM log garbage collection activities to get insights into the health of your application. One example for this is the free heap after each garbage collection. If this metric keeps shrinking, it is an indicator for a memory leak – eventually turning into an OutOfMemoryError. If there is not enough of free heap, the JVM might be too busy with garbage collection instead of running your application code. Otherwise, a heap that is too big does indicate that there’s potential to decrease the memory configuration of your AWS Lambda function. This keeps garbage collection pauses low and provides a consistent response time.

The garbage collection logging can be configured via an environment variable as part of the AWS Lambda function configuration. The environment variable JAVA_TOOL_OPTIONS is considered by both the Java 8 and 11 JVMs. You use it to pass options that you would usually add to the command line when launching the JVM. The options to configure garbage collection logging and the output is specific to the Java version.

Java 11 uses the Unified Logging System (JEP 158 and JEP 271) which has been introduced in Java 9. Logging can be configured with the environment variable:

JAVA_TOOL_OPTIONS=-Xlog:gc+metaspace,gc+heap,gc:stdout:time,tags

The Serial Garbage Collector will output the logs:

[<TIMESTAMP>][gc] GC(4) Pause Full (Allocation Failure) 9M->9M(11M) 3.941ms (D)
[<TIMESTAMP>][gc,heap] GC(3) DefNew: 3063K->234K(3072K) (A)
[<TIMESTAMP>][gc,heap] GC(3) Tenured: 6313K->9127K(9152K) (B)
[<TIMESTAMP>][gc,metaspace] GC(3) Metaspace: 762K->762K(52428K) (C)
[<TIMESTAMP>][gc] GC(3) Pause Young (Allocation Failure) 9M->9M(21M) 23.559ms (D)

Prior to Java 9, including Java 8, you configure the garbage collection logging as follows:

JAVA_TOOL_OPTIONS=-XX:+PrintGCDetails -XX:+PrintGCDateStamps

The Serial garbage collector output in Java 8 is structured differently:

<TIMESTAMP>: [GC (Allocation Failure)
    <TIMESTAMP>: [DefNew: 131042K->131042K(131072K), 0.0000216 secs] (A)
    <TIMESTAMP>: [Tenured: 235683K->291057K(291076K), 0.2213687 secs] (B)
    366725K->365266K(422148K), (D)
    [Metaspace: 3943K->3943K(1056768K)], (C)
    0.2215370 secs]
    [Times: user=0.04 sys=0.02, real=0.22 secs]
<TIMESTAMP>: [Full GC (Allocation Failure)
    <TIMESTAMP>: [Tenured: 297661K->36658K(297664K), 0.0434012 secs] (B)
    431575K->36658K(431616K), (D)
    [Metaspace: 3943K->3943K(1056768K)], 0.0434680 secs] (C)
    [Times: user=0.02 sys=0.00, real=0.05 secs]

Independent of the Java version, the garbage collection activities are logged to standard out (stdout) or standard error (stderr). Logs appear in the AWS Lambda function’s log stream of Amazon CloudWatch Logs. The log contains the size of memory used for:

  • A: the young generation
  • B: the old generation
  • C: the metaspace
  • D: the entire heap

The notation is before-gc -> after-gc (committed heap). Read the JVM Garbage Collection Tuning Guide for more details.

Visualizing the logs in Amazon Elasticsearch Service

It is hard to fully understand the garbage collection log by just reading it in Amazon CloudWatch Logs. You must visualize it to gain more insight. This section describes the solution to achieve this.

Solution Overview

Java Solution Overview

Amazon CloudWatch Logs have a feature to stream CloudWatch Logs data to Amazon Elasticsearch Service via an AWS Lambda function. The AWS Lambda function for log transformation is subscribed to the log group of your application’s AWS Lambda function. The subscription filters for a pattern that matches the one of the garbage collection log entries. The log transformation function processes the log messages and puts it to a search cluster. To make the data easy to digest for the search cluster, you add code to transform and convert the messages to JSON. Having the data in a search cluster, you can visualize it with Kibana dashboards.

Get Started

To start, launch the solution architecture described above as a prepackaged application from the AWS Serverless Application Repository. It contains all resources ready to visualize the garbage collection logs for your Java 11 AWS Lambda functions in a Kibana dashboard. The search cluster consists of a single t2.small.elasticsearch instance with 10GB of EBS storage. It is protected with Amazon Cognito User Pools so you only need to add your user(s). The T2 instance types do not support encryption of data at rest.

Read the source code for the application in the aws-samples repository.

1. Spin up the application from the AWS Serverless Application Repository:

launch stack button

2. As soon as the application is deployed completely, the outputs of the AWS CloudFormation stack provide the links for the next steps. You will find two URLs in the AWS CloudFormation console called createUserUrl and kibanaUrl.

search stack

3. Use the createUserUrl link from the outputs, or navigate to the Amazon Cognito user pool in the console to create a new user in the pool.

a. Enter an email address as username and email. Enter a temporary password of your choice with at least 8 characters.

b. Leave the phone number empty and uncheck the checkbox to mark the phone number as verified.

c. If necessary, you can check the checkboxes to send an invitation to the new user or to make the user verify the email address.

d. Choose Create user.

create user dialog of Amazon Cognito User Pools

4. Access the Kibana dashboard with the kibanaUrl link from the AWS CloudFormation stack outputs, or navigate to the Kibana link displayed in the Amazon Elasticsearch Service console.

a. In Kibana, choose the Dashboard icon in the left menu bar

b. Open the Lambda GC Activity dashboard.

You can test that new events appear by using the Kibana Developer Console:

POST gc-logs-2020.09.03/_doc
{
  "@timestamp": "2020-09-03T15:12:34.567+0000",
  "@gc_type": "Pause Young",
  "@gc_cause": "Allocation Failure",
  "@heap_before_gc": "2",
  "@heap_after_gc": "1",
  "@heap_size_gc": "9",
  "@gc_duration": "5.432",
  "@owner": "123456789012",
  "@log_group": "/aws/lambda/myfunction",
  "@log_stream": "2020/09/03/[$LATEST]123456"
}

5. When you go to the Lambda GC Activity dashboard you can see the new event. You must select the right timeframe with the Show dates link.

Lambda GC activity

The dashboard consists of six tiles:

  • In the Filters you optionally select the log group and filter for a specific AWS Lambda function execution context by the name of its log stream.
  • In the GC Activity Count by Execution Context you see a heatmap of all filtered execution contexts by garbage collection activity count.
  • The GC Activity Metrics display a graph for the metrics for all filtered execution contexts.
  • The GC Activity Count shows the amount of garbage collection activities that are currently displayed.
  • The GC Duration show the sum of the duration of all displayed garbage collection activities.
  • The GC Activity Raw Data at the bottom displays the raw items as ingested into the search cluster for a further drill down.

Configure your AWS Lambda function for garbage collection logging

1. The application that you want to monitor needs to log garbage collection activities. Currently the solution supports logs from Java 11. Add the following environment variable to your AWS Lambda function to activate the logging.

JAVA_TOOL_OPTIONS=-Xlog:gc:stderr:time,tags

The environment variables must reflect this parameter like the following screenshot:

environment variables

2. Go to the streamLogs function in the AWS Lambda console that has been created by the stack, and subscribe it to the log group of the function you want to monitor.

streamlogs function

3. Select Add Trigger.

4. Select CloudWatch Logs as Trigger Configuration.

5. Input a Filter name of your choice.

6. Input "[gc" (including quotes) as the Filter pattern to match all garbage collection log entries.

7. Select the Log Group of the function you want to monitor. The following screenshot subscribes to the logs of the application’s function resize-lambda-ResizeFn-[...].

add trigger

8. Select Add.

9. Execute the AWS Lambda function you want to monitor.

10. Refresh the dashboard in Amazon Elasticsearch Service and see the datapoint added manually before appearing in the graph.

Troubleshooting examples

Let’s look at an example function and draw some useful insights from the Java garbage collection log. The following diagrams show the Sample Amazon S3 function code for Java from the AWS Lambda documentation running in a Java 11 function with 512 MB of memory.

  • An S3 event from a new uploaded image triggers this function.
  • The function loads the image from S3, resizes it, and puts the resized version to S3.
  • The file size of the example image is close to 2.8MB.
  • The application is called 100 times with a pause of 1 second.

Memory leak

For the demonstration of a memory leak, the function has been changed to keep all source images in memory as a class variable. Hence the memory of the function keeps growing when processing more images:

GC activity metrics

In the diagram, the heap size drops to zero at timestamp 12:34:00. The Amazon CloudWatch Logs of the function reveal an error before the next call to your code in the same AWS Lambda execution context with a fresh JVM:

Java heap space: java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
 at java.desktop/java.awt.image.DataBufferByte.<init>(Unknown Source)
[...]

The JVM crashed and was restarted because of the error. You leverage primarily the Amazon CloudWatch Logs of your function to detect errors. The garbage collection log and its visualization provide additional information for root cause analysis:

Did the JVM run out of memory because a single image to resize was too large?

Or was the memory issue growing over time?

The latter could be an indication that you have a memory leak in your code.

The Heap size is too small

For the demonstration of a heap that was chosen too small, the memory leak from the preceding image has been resolved, but the function was configured to 128MB of memory. From the baseline of the heap to the maximum heap size, there are only approximately 5 MB used.

GC activity metrics

This will result in a high management overhead of your JVM. You should experiment with a higher memory configuration to find the optimal performance also taking cost into account. Check out AWS Lambda power tuning open source tool to do this in an automated fashion.

Finetuning the initial heap size

If you review the development of the heap size at the start of an execution context, this indicates that the heap size is continuously increased. Each heap size change is an expensive operation consuming time of your function. Over time, the heap size is changed as well. The garbage collector logs 502 activities, which take almost 17 seconds overall.

GC activity metrics

This on-demand scaling is useful on a local workstation where the physical memory is shared with other applications. On AWS Lambda, the configured memory is dedicated to your function, so you can use it to its full extent.

You can do so by setting the minimum and maximum heap size to a fixed value by appending the -Xms and -Xmx parameters to the environment variable we introduced before.

The heap is not the only part of the JVM that consumes memory, so you must experiment with this setting and closely monitor the performance.

Start with the heap size that you observe to be working from the garbage collection log. If you set the heap size too large, your function will not initialize at all or break unexpectedly. Remember that the ability to tweak JVM parameters might change with future service features.

Let’s set 400 MB of the 512 MB memory and examine the results:

JAVA_TOOL_OPTIONS=-Xlog:gc:stderr:time,tags -Xms400m -Xmx400m

GC activity metrics

The preceding dashboard shows that the overall garbage collection duration was reduced by about 95%. The garbage collector had 80% fewer activities.

The garbage collection log entries displayed in the dashboard reveal that exclusively minor garbage collection (Pause Young) activities were triggered instead of major garbage collections (Pause Full). This is expected as the images are immediately discarded after the download, resize, upload operation. The effect on the overall function durations of 100 invocations, is a 5% decrease on average in this specific case.

Lambda duration

Cost estimation and clean up

Cost for the processing and transformation of your function’s Amazon CloudWatch Logs incurs when your function is called. This cost depends on your application and how often garbage collection activities are triggered. Read an estimate of the monthly cost for the search cluster. If you do not need the garbage collection monitoring anymore, delete the subscription filter from the log group of your AWS Lambda function(s). Also, delete the stack of the solution above in the AWS CloudFormation console to clean up resources.

Conclusion

In this post, we examined further sources of data to gain insights about the health of your Java application. We also demonstrated a pipeline to ingest, transform, and visualize this information continuously in a Kibana dashboard. As a next step, launch the application from the AWS Serverless Application Repository and subscribe it to your applications’ logs. Feel free to submit enhancements to the application in the aws-samples repository or provide feedback in the comments.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Requirements for Successfully Installing CloudEndure

Post Syndicated from Daniel Covey original https://aws.amazon.com/blogs/architecture/field-notes-requirements-for-successfully-installing-cloudendure/

Customers have been using CloudEndure for their Migration and Disaster Recovery needs for many years. In 2019, CloudEndure was acquired by AWS, and provided the licensing for CloudEndure to all of their users free of charge for migration. During this time, AWS has identified the requirements for replication to complete successfully after initial agent install. Customers can use the following tips to facilitate a smooth transition to AWS.

In this blog, we look at four sections of the CloudEndure configuration process required for a successful installation:

  1. CloudEndure Port configuration
  2. CloudEndure JSON Policy Options
  3. CloudEndure Staging Area Configuration
  4. CloudEndure Configuration for Proxies

Required CloudEndure Ports

There are 2 required ports that CloudEndure uses. TCP Ports 1500 and 443 have particular configurations based on source or staging area. TCP 1500 is used for replication of data, and 443 is for agent communication with the CloudEndure Console.

Architecture Overview

The following graphic is a high level overview of the required ports for CloudEndure, both from the source infrastructure, and the staging subnet you will be replicating to.

network architecture

Steps

  1. Is 443 outbound open to console.cloudendure.com on the source infrastructure?
  • Check OS level firewall
  • Check proxy settings
  • Ensure there is no SSL intercept or Deep Packet Inspection being done to packets from that machine

        2. Is 443 outbound open to console.cloudendure.com in the AWS Security Group assigned to the replication subnet?

  • Check no NACLs are in place to prevent SSL traffic outbound from the subnet
  • Check the machines on this subnet can reach the EC2 endpoint for the region.
  • If you have any restrictions on accessing Amazon S3 buckets, you can have CloudEndure use a CloudFront distribution instead.
  • Review The CloudEndure documentation for how to do this

       3. Is 1500 outbound open to the Staging Subnet from the source infrastructure?

  • Check OS level firewall
  • Check proxy settings

4. Is 1500 inbound from the source infrastructure open on the Security Group assigned to the replication subnet?

  • Check no NACLs are in place preventing traffic.

CloudEndure JSON Policies 

CloudEndure uses one of these JSON policies attached to IAM Users. These policies give CloudEndure specific access to your AWS account resources. This launches specific resources needed to ensure the tool is working properly. CloudEndure JSON policies use tag filtering to limit the creation and deletion of resources.

For the JSON policy, CloudEndure expects a specific set of permissions, even in the case where we may not be using them. CloudEndure does a policy check first, to ensure all permissions are available. It is not recommended to change the JSON policies, as it can cause CloudEndure to fail initial replication configuration. Use one of the following three policies.

AWS to AWS

  • Best policy to use if you are doing Inter-AWS replication, such as Region-to-Region, or AZ-to-AZ replication

Default

  • Default JSON policy. Allows for access to any of the resources needed by CloudEndure

Tagging based

  • A more restrictive policy, for customers that need a more secure solution.

Staging Area Configuration

CloudEndure replicates to a “Staging Area”, where you control the Replication Server and the AWS EBS volumes attached to that server. You define which VPC and Subnet you want CloudEndure to replicate to, with the following considerations.

staging area

  1. Default Subnet
    • You designate your specific AWS Subnet to use for replication here. Leaving the option as “Default” uses the default subnet for the VPC, which is usually deleted by customers when first configuring their VPCs.

2. Default Security group

    • This is often created by the Cloudendure tool, cannot be changed, and will be added if replication disconnects. Any changes made to this SG will be reverted back to default rules.
    • If utilizing a proxy, it is advised to add a Security Group that also allows access to the proxy

Proxy Servers

Some customers utilize Proxy servers within their environment. Review the following guidance regarding specific changes to configurations within your environment needed for CloudEndure to operate effectively.

  1. Make sure to set proxy in replication settings
    • This can be either IP address, or an FQDN

       2. Note the following for either Windows or Linux

    • Windows – CloudEndure agent runs as System, so please ensure the System account is part of the allow list in the proxy.
    • Linux – CloudEndure Agent creates a linux user to run commands (named cloudendure), so this user will need to be part of the allow list in the proxy

       3. Make sure environment variables are set for the machines

  • Windows Steps
    • Control Panel > System and Security > System > Advanced system settings.
    • In the Advanced Tab of the System Properties dialog box, select Environment Variables 
    • On the System Variables section of the Environment Variables pane, select New to add the https_proxy environment variable or Edit if the variable already exists.
    • Enter https://PROXY_ADDR:PROXY_PORT/ in the Variable value field. Select OK.
    • If the agent was already installed, restart the service
  • Alternatively, you can open CMD as Administrator and enter the following command:

setx https_proxy https://<proxy ip>:<proxy port>/ /m

  • Linux Steps
    • Complete one of the following lines in the terminal
    • $ export http_proxy=http://server-ip:port/
    • $ export http_proxy=http://127.0.0.1:3128/
    • $ export http_proxy=http://proxy-server.mycorp.com:3128/ Please note to include the trailing /

Cleaning Up

After you have finished utilizing the CloudEndure tool, remove any resources you may no longer need.

Conclusion

In conclusion, I have showed how best to prepare your environment for installation of the CloudEndure tool. CloudEndure is utilized to protect your business, and mitigate downtime during your move to the cloud. By following the preceding steps, you set up the configuration for success. Visit the AWS landing page for CloudEndure, to get a deeper understanding of the tool, get started with CloudEndure, or take the free online technical training. Should you need assistance with other configurations, visit the CloudEndure Documentation Library, which includes every aspect of the tooling, as well as a helpful FAQ.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Deploying UiPath RPA Software on AWS

Post Syndicated from Yuchen Lin original https://aws.amazon.com/blogs/architecture/field-notes-deploying-uipath-rpa-software-on-aws/

Running UiPath RPA software on AWS leverages the elasticity of the AWS Cloud, to set up, operate, and scale robotic process automation. It provides cost-efficient and resizable capacity, and scales the robots to meet your business workload. This reduces the need for administration tasks, such as hardware provisioning, environment setup, and backups. It frees you to focus on business process optimization by automating more processes.

This blog post guides you in deploying UiPath robotic processing automation (RPA) software on AWS. RPA software uses the user interface to capture data and manipulate applications just like humans do. It runs as a software robot to interpret, and trigger responses, as well as communicate with other systems to perform a variety of repetitive tasks.

UiPath Enterprise RPA Platform provides the full automation lifecycle including discover, build, manage, run, engage, and measure with different products. This blog post focuses on the Platform’s core products: build with UiPath Studio, manage with UiPath Orchestrator and run with UiPath Robots.

About UiPath software

UiPath Enterprise RPA Platform’s core products are:

UiPath Studio and UiPath Robot are individual products, you can deploy each on a standalone machine.

UiPath Orchestrator contains Web Servers, SQL Server and Indexer Server (Elasticsearch), you can use Single Machine deployment, or Multi-Node deployment, depends on the workload capacity and availability requirements.

For information on UiPath platform offerings, review UiPath platform products.

UiPath on AWS

You can deploy all UiPath products on AWS.

  • UiPath Studio is needed for automation design jobs and runs on single machine. You deploy it with Amazon EC2.
  • UiPath Robots are needed for automation tasks, runs on a single machine, and scales with the business workload. You deploy it with Amazon EC2 and scale with Amazon EC2 Auto Scaling.
  • UiPath Orchestrator is needed for automation administration jobs and contains three logical components that run on multiple machines. You deploy Web Server with Amazon EC2, SQL Server with Amazon RDS, and Indexer Server with Amazon Elasticsearch Service. For Multi-Node deployment, you deploy High Availability Add-On with Amazon EC2.

The architecture of UiPath Enterprise RPA Platform on AWS looks like the following diagram:

Figure 1 - UiPath Enterprise RPA Platform on AWS

Figure 1 – UiPath Enterprise RPA Platform on AWS

By deploying the UiPath Enterprise RPA Platform on AWS, you can set up, operate, and scale workloads. This controls the infrastructure cost to meet process automation workloads.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • AWS resources
  • UiPath Enterprise RPA Platform software
  • Basic knowledge of Amazon EC2, EC2 Auto Scaling, Amazon RDS, Amazon Elasticsearch Service.
  • Basic knowledge to set up Windows Server, IIS, SQL Server, Elasticsearch.
  • Basic knowledge of Redis Enterprise to set up High Availability Add-on.
  • Basic knowledge of UiPath Studio, UiPath Robot, UiPath Orchestrator.

Deployment Steps

Deploy UiPath Studio
UiPath Studio deploys on a single machine. Amazon EC2 instances provide secure and resizable compute capacity in the cloud, and the ability to launch applications when needed without upfront commitments.

  1. Download the UiPath Enterprise RPA Platform. UiPath Studio is integrated in the installation package.
  2. Launch an EC2 instance with a Windows OS-based Amazon Machine Image (AMI) that meets the UiPath Studio hardware requirements and software requirements.
  3. Install the UiPath Studio software. For UiPath Studio installation steps, review the UiPath Studio Guide.

Optionally, you can save the installation and pre-configuration work completed for UiPath Studio as a custom Amazon Machine Image (AMI). Then, you can launch more UiPath Studio instances from this AMI. For details, visit Launch an EC2 instance from a custom AMI tutorial.

UiPath Robot Deployment

Each UiPath Robot deploys one single machine with Amazon EC2. Amazon EC2 Auto Scaling helps you add or remove Robots to meet automation workload changes in demand.

  1. Download the UiPath Enterprise RPA Platform. The UiPath Robot is integrated in the installation package.
  2. Launch an EC2 instance with a Windows OS based Amazon Machine Image (AMI) that meets the UiPath Robot hardware requirements and software requirements.
  3. Install the business application (Microsoft Office, SAP, etc.) required for your business processes. Alternatively, select the business application AMI from the AWS Marketplace.
  4. Install the UiPath Robot software. For UiPath Robot installation steps, review Installing the Robot.

Optionally, you can save the installation and pre-configuration work completed for UiPath Robot as a custom Amazon Machine Image (AMI). Then you can create Launch templates with instance configuration information. With launch template, you can create Auto Scaling groups from launch templates and scale the Robots.

Scale the Robots’ Capacity

Amazon EC2 Auto Scaling groups help you use scaling policies to scale compute capacity based on resource use. By monitoring the process queue and creating a customized scaling policy, the UiPath Robot can automatically scale based on the workload. For details, review Scaling the size of your Auto Scaling group.

Use the Robot Logs

UiPath Robot generates multiple diagnostic and execution logs. Amazon CloudWatch provides the log collection, storage, and analysis, and enables the complete visibility of the Robots and automation tasks. For CloudWatch agent setup on Robot, review Quick Start: Enable Your Amazon EC2 Instances Running Windows Server to Send logs to CloudWatch Logs.

Monitor the Automation Jobs

UiPath Robot uses the user interface to capture data and manipulate applications. When UiPath Robot runs, it is important to capture processing screens for troubleshooting and auditing usage. This screen capture activity can be integrated with process in conjunction with UiPath Studio.

Amazon S3 provides cost-effective storage for retaining all Robot logs and processing screen captures. Amazon S3 Object Lifecycle Management automates the transition between different storage classes, and helps you manage the screenshots so that they are stored cost effectively throughout their lifecycle. For lifecycle policy creation, review How Do I Create a Lifecycle Policy for an S3 Bucket?.

UiPath Orchestrator Deployment

Deployment Components
UiPath Orchestrator Server Platform has many logical components, grouped in three layers:

  • presentation layer
  • web service layer
  • persistence layer

The presentation layer and web service layer are built into one ASP.NET website. The persistence layer contains SQL Server and Elasticsearch. There are three deployment components to be set up:

  • web application
  • SQL Server
  • Elasticsearch

The Web Server, SQL Server, and Elasticsearch Server require multiple different environments. Review the hardware requirements and software requirements for more details.

Note: set up the Web Server, SQL Server, Elasticsearch Server environments before running the UiPath Enterprise Platform installation wizard.

Set up Web Server with Amazon EC2

UiPath Orchestrator Web Server deploys on Windows Server with IIS 7.5 or later. For details, review the software requirements.

AWS provides various AMIs for Windows Server that can help you set up the environment required for the Web Server.

The Microsoft Windows Server 2019 Base AMI includes most prerequisites for installation except some features of Web Server (IIS) to be enabled. For configuration steps, review Server Roles and Features.

The Web Server should be put in correct subnet (Public or Private) and have proper security group (HTTPS visits) according to the business requirements. Review Allow user to connect EC2 on HTTP or HTTPS.

Set up SQL Server with Amazon RDS

Amazon Relational Database Service (Amazon RDS) provides you with a managed database service. With a few clicks, you can set up, operate, and scale a relational database in the AWS Cloud.

Amazon RDS support SQL Server Engine. For UiPath Orchestrator, both Standard Edition and Enterprise Edition are supported. For details, review software requirements.

Amazon RDS can be set up in multiple Available Zones to meet requirements for high availability.

UiPath Orchestrator can connect to the created Amazon RDS database with SQL Server Authentication.

Set up Elasticsearch Server with Amazon Elasticsearch Service (Amazon ES)

Amazon ES is a fully managed service for you to deploy, secure, and operate Elasticsearch at scale with generally zero down time.

Elasticsearch Service provides a managed ELS stack, with no upfront costs or usage requirements, and without the operational overhead.

All messages logged by UiPath Robots are sent through the Logging REST endpoint to the Indexer Server where they are indexed for future utilization.

Install UiPath Orchestrator on the Web Server

After Web Server, SQL Server, Elasticsearch Server environment are ready, download the UiPath Enterprise RPA Platform, and install it on the Web Server.

The UiPath Enterprise Platform installation wizard guides you in configuring and setting up each environment, including connecting to SQL Server and configuring the Elasticsearch API URL.

After you complete setup, the UiPath Orchestrator Portal is available for you to visit and manage processes, jobs, and robots.

The UiPath Orchestrator dashboard appears like in the following screenshot:

Figure UiPath Orchestrator Portal

Figure 2- UiPath Orchestrator Portal

Set up Orchestrator High Availability Architecture

One Orchestrator can handle many robots in a typical configuration, but any product running on a single server is vulnerable to failure if something happens to that server.

The High Availability add-on (HAA) enables you to add a second Orchestrator server to your environment that is generally fully synchronized with the first server.

To set up multi-node deployment, launch Amazon EC2 instances with a Linux OS-based Amazon Machine Image (AMI) that meets the HAA hardware and software requirements. Follow the installation guide to set up HAA.

Elastic Load Balancing automatically distributes incoming application traffic across multiple targets. Network Load Balancer should be set up to allow Robots to communicate with multi-node Orchestrators.

Cleaning up

To avoid incurring future charges, delete all the resources.

Conclusion

In this post, I showed you how to deploy the UiPath Enterprise RPA Platform on AWS to further optimize and automate your business processes. AWS Managed Services like Amazon EC2, Amazon RDS, and Amazon Elasticsearch Service help you set up the environment with high availability. This reduces the maintenance effort of backend services, as well as scaling Orchestrator capabilities. Amazon EC2 Auto Scaling helps you add or remove robots to meet automation workload changes in demand.

Learn more about how to integrate UiPath with AWS services, check out The UiPath and AWS partnership.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Building a Shared Account Structure Using AWS Organizations

Post Syndicated from Abhijit Vaidya original https://aws.amazon.com/blogs/architecture/field-notes-building-a-shared-account-structure-using-aws-organizations/

For customers considering the AWS Solution Provider Program, there are challenges to mitigate when building a shared account model with SI partners. AWS Organizations make it possible to build the right account structure to support a resale arrangement. In this engagement model, the end customer gets an AWS invoice from an AWS authorized partner instead of AWS directly.

Partners and customers who want to engage in this service resale arrangement need to build a new account structure. This process includes linking or transferring existing customer accounts to the partner master account. This is so that all the billing data from customer accounts is consolidated into the partner master account.

While linking or transferring existing customer accounts to the new master account, the partner must check the new master account structure. It should not compromise any customer security controls and continue to provide full control of linked accounts to the customer.  The new account structure must fulfill the following requirements for both the AWS customer and partner:

  • The customer maintains full access to the AWS organization and able to perform all critical security-related tasks except access to billing data.
  • The Partner is able to control only billing information and not able to perform any other task in the root account (master payer account) without approval from the customer.
  • In case of contract breach / termination, the customer is able to gain back full control of all accounts including the Master.

In this post, you will learn about how partners can create a master account with shared ownership. We also show how to link or transfer customer organization accounts to the new organization master account and set up policies that would provide appropriate access to both partner and customer.

Account Structure

The following architecture represents the account structure setup that can fulfill customer and partner requirements as a part of a service resale arrangement.

As illustrated in the preceding diagram, the following list includes the key architectural components:

Architectural Components

As a part of resale arrangement, the customer’s existing AWS organization and related accounts are linked to the partner’s master payer account. The customer can continue to maintain their existing master root account, while all child accounts are linked to the master account (as shown in the list).

Customers may have valid concerns about linking/transferring accounts to the owned master payee account and may come up with many ‘what-if’ scenarios for example “What if the partner shuts down environment/servers?”  or, “What if partner blocks access to child accounts?”.

This account structure provides the right controls to both customer and partner that would address customer concerns around the security of their accounts. This includes the following benefits:

  • Starting with access to the root account, neither customer nor partner can access the root account without the other party’s involvement.
  • The partner controls the id/password for the root account while the customer maintains the MFA token for the account. The customer also controls the phone number, security questions associated with the root account. That way, the partner cannot replace the MFA token on their own.
  • The partner only has billing access and does not control any other parts of account including child accounts. Anytime the root account access is needed, both customer and partner team need to collaborate and access the root account.
  • The customer or partner cannot assign new IAM roles to themselves, therefore protecting the initial account setup.

Security considerations in the shared account setup

The following table highlights both customer and partner responsibilities and access controls provided by the architecture in the previous section.

The following points highlight security recommendations to provide adequate access rights to both partner and customers.

  • New master payer/ root account has a joint ownership between the Partner and the Customer.
  • AWS account root users (user id/password) would be with Partner and MFA (multi-factor authentication) device with Customer.
  • IAM (AWS Identity and Access Management) role to be created under the master payer with policies “FullOrganizationAccess”, “Amazon S3” (Amazon Simple Storage Service), “CloudTrail” (AWS CloudTrail), “CloudWatch”(Amazon CloudWatch) for the Customer.
  • Security team to log in and manage security of the account. Additional permissions to be added to this role as needed in future. This role does not have ANY billing permissions.
  • Only the partner has access to AWS billing and usage data.
  • IAM role / user would be created under master payer with just billing permission for the Partner team to log in and download all invoices. This role does not have any other permissions except billing and usage reports.
  • Any root login attempts to master payer triggers a notification to Customer’s SOC team and the Partner’s Customer account management team.
  • The Partner’ email address is used to create an account so invoices can be emailed to the partner’s email. The Customer cannot see these invoices.
  • The Customer phone number is used to create a master account and the customer maintains security questions/answers. This prevents replacement of MFA token by the Partner team without informing customer.  The Customer wouldn’t need the Partner’s help or permission to login and manage any security.
  • No aspect of  the Master Payer / Root Partner team can login to the master payer/Root without the Customer providing an MFA token.

Setting up the shared master AWS account structure

Create a playbook for the account transfer activity based on the following tasks. For each task, identify the owner. Make sure that owners have the right permissions to perform the tasks.

Part I – Setting up new partner master account

  1. Create new Partner Master payee Account
  2. Update payment details section with the required details for payment in Partner Master payee Account
  3. Enable MFA in the Partner Master payee Account
  4. Update contact for security and operations in the Partner Master payee Account
  5. Update demographics -Address and contact details in Partner Master payee Account
  6. Create an IAM role for Customer Team in Partner Master Payee account. IAM role is created under master payer with “FullOrganizationAccess”, “Amazon S3”, “CloudTrail”, “CloudWatch” “CloudFormationFullAccess” for the Customer SOC team to login and manage security of the account. Additional permissions can be added to this role in future if needed.

Select the roles:

create role

7. Create an IAM role/user for Partner billing role in the Partner Master Payee account.

Part II – Setting up customer master account

1.      Create an IAM user in the customer’s master account. This user assumes role into the new master payer/root account.

aws management console

 

2.      Confirm that when the IAM user from the customer account assumes a role in the new master account, and that the user does not have Billing Access.

Billing and cost management dashboard

Part III – Creating an organization structure in partner account

  1. Create an Organization in the Partner Master Payee Account
  2. Create Multiple Organizational Units (OU) in the Partner Master Payee Account

 

3. Enable Service Control Policies from AWS Organization’s Policies menu.

service control policies

5. Create/Copy Multiple in to Partner Master Payee Account from Customer root Account.  Any service control policies from the customer root account should be manually copied to new partner account.

6. If customer root account has any special software installed for example, security, install same software in Partner Master Payee Account.

7. Set alerts in Partner Master Payee root account. Any login to the root account would send alerts to customer and partner teams.

8. It is recommended to keep a copy of all billing history invoices for all accounts to be transferred to partner organization. This could be achieved by either downloading CSV or printing all invoices and storing files in Amazon S3 for long term archival. Billing history and invoices are found by clicking Orders and Invoices on Billing & Cost Management Dashboard. After accounts are transferred to new organization, historic billing data will not be available for those accounts.

9. Remove all the Member Accounts from the current Customer Root Account/ Organization. This step is performed by customer account admin and required before account can be transferred to Partner Account organization.

10. Send an invite from the Partner Master Payee Account to the delinked Member Account

master payee account

11.      Member Accounts to accept the invite from the Partner Master Payee Account.

invitations

12.      Move the Customer member account to the appropriate OU in the Partner Master Payee Account.

Setting the shared security model between partner and customer contact

While setting up the master account, three contacts need to be updated for notification.

  • Billing –  this is owned by the Partner
  • Operations – this is owned by the Customer
  • Security – this is owned by the Customer.

This will trigger a notification of any activity on the root account. The contact details contain Name, Title, Email Address and Phone number.  It is recommended to use the Customer’s SOC team distribution email for security and operations, and a phone number that belongs to the organization, and not the individual.

Alternate contacts

Additionally, before any root account activity takes place, AWS Support will verify using the security challenge questionnaire. These questions and answers are owned by the Customer’s SOC team.

security challenge questions

If a customer is not able to access the AWS account, alternate support options are available at Contact us by expanding the “I’m an AWS customer and I’m looking for billing or account support” menu. While contacting AWS Support, all the details that are listed on the account are needed, including full name, phone number, address, email address, and the last four digits of the credit card.

Clean Up

After recovering the account, the Customer should close any accounts that are not in use. It’s a good idea not to have open accounts in your name that could result in charges. For more information, review Closing an Account in the Billing and Cost Management User Guide.

The Shared master root account should be only used for selected activities referred to in the following document.

Conclusion

In this post, you learned how AWS Organizations features can be used to create a shared master account structure.  This helps both customer and partner engage in a service resale business engagement. Using AWS Organizations and cross account access, this solution allows customers to control all key aspects of managing the AWS Organization (Security / Logging / Monitoring) and also allows partners to control any billing related data.

Additional Resources

Cross Account Access

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Integrating a Multi-forest Source Environment with AWS SSO

Post Syndicated from Sudhir Amin original https://aws.amazon.com/blogs/architecture/field-notes-integrating-a-multi-forest-source-environment-with-aws-sso/

During re:Invent 2019, AWS announced a new way to integrate external identity sources such as Azure Active Directory with auto provisioning of identities and groups in AWS Single Sign-On (AWS SSO). In March 2020, AWS SSO afforded customers the possibility to connect their Okta Identity Cloud to AWS Single Sign-On (SSO) in order to manage access to AWS centrally in AWS SSO.

AWS Single Sign-On service helps to centralize management access to multiple AWS accounts and some cases tying back to corporate identities. This provides ready access to business applications and services. With this feature, companies can leverage AWS Single Sign-On for allowing federated access to multiple AWS accounts and cloud applications.

In this blog post, I discuss the challenges faced by customers running multi-forest environments or multiple Azure tenant subscriptions with this feature.  I also provide a different approach to solving this challenge with a brief overview of each solution presented.

Large Enterprise companies often require their security team to build centralized identity solutions that work across different Active Directory forests environments. This is commonly due to a merger, acquisition or partnership. Challenges include complex networking with different IP routes, DNS forwarding configurations, firewall rules to enable trust relationships between different Active Directory forests to support compliance of a single identity to manage the account lifecycle and password policies. This becomes even more challenging when your organization is working in multiple cloud platforms within a centralized Identity solution, with hybrid networking connectivity.

Customer Example

To illustrate my point, I use the following example of a real life customer scenario, under the fictitious name of ‘Acme Corporation’.

Acme Corporation is a capital wealth management company operating in three countries: USA, Canada, and Brazil. Business is growing and they are exploring cloud services.

Their corporate headquarters is located in NY, USA and they have established offices (branches) in Canada and Brazil. The organization operates in a decentralized model, which consists of different governance over their identity structure. An Active Directory Forest is established per Region with a cross-forest trust relationship. The company is looking to adopt cloud technologies and needed a common identity solution across on-premises and cloud services with Azure Active Directory and AWS.

We’ve outlined the solution in the following diagram:

Figure 1 - Solution Overview

Figure 1 – Solution Overview

Options to source identities into AWS Single Sign-On

AWS Single Sign-On offers the following 3 options to establish as an identity source:

  • AWS SSO
  • Active Directory
  • External Identity Provider
Figure 2 - Identity Source Options

Figure 2 – Identity Source Options

The first option; “AWS SSO” is a default native identity store. You can create and delete users and groups.

The second option; “Active Directory” allows administrators to source users and groups from Active Directory running On-Premises Active Directory, or Active Directory in EC2 (using AD Connector as the directory gateway) or AWS Managed Microsoft AD directory hosted in the AWS Cloud.

The third option; “External Identity Provider” enables administrators to provision users and groups from external identity providers (IdPs) through the Security Assertion Markup Language (SAML) 2.0 standard.

Note: AWS Single Sign-On allows only one identity source at any given time. In this post, we focus on two options that help integrate a multi-forest environment with AWS Single Sign-On and Azure Active Directory.

Solution

Option 1. Federating with Active Directory 

In the hub-and-spoke model, the AWS Managed Microsoft Active Directory is the hub and the spoke is the Active Directory forests.

  1. Provision a AWS Managed Microsoft Active Directory.
    • If you already have an AWS Managed Microsoft Active Directory for a hub, continue to the next step.
  2. Setup hybrid network connectivity, and firewall rules allowing trust traffic
  3. DNS, conditional forwarding allows to resolve the trusting forests. We need an Outbound Endpoint with Forwarding Rules to the different forests so the VPC resolves the names and an inbound endpoint so the forests can resolve the AWS Managed Microsoft AD names.
  4. Check the name resolution is working for the hybrid environment.
  5. Establish a Forest trust relationship and validate the trust.

The following snapshot shows how your trust relationship will be displayed on the console.

Figure 3 - Trust Relationships

Note 1: You cannot use the transitive trust relationship of a child domain in a forest or cross forest relationship. In that case, you have to create an explicit trust or a domain trust to the AWS Managed Microsoft AD domain for AWS Single Sign-On. This enables you to see the user and groups required to provision the permission sets and Accounts.

Note 2: AWS Managed Microsoft Active Directory in this example does not require you to host any users or groups, as this domain is only being used for the domain trust relationships. In short, this can be an empty forest.

Configure AWS Single Sign-On to use your AWS Managed Microsoft Active Directory for Active Directory option.

The following snapshot shows how to assign a group to an account in preparation for AWS Singles Sign-On enablement.

Figure 4 - Selecting Users or Groups

The following snapshot shows how to assign a group to an account in preparation for AWS Singles Sign-On enablement and selecting a group.

Figure 5 - Assigning Users

The following is a conceptual diagram of Acme corporation, after successful integration.

Figure 4 - Conceptual Diagram

Figure 3 – Option 1 – Conceptual Diagram

Option 2a. Federating with Azure Active Directory Single Tenant

If you have multiple-forests and would like to use a single tenant, here are the steps:

  1. Setup a single Azure AD Connect in any forest, to consolidate users from different forests to a single Azure Tenant.
  2. Configure AWS Single Sign-On to use your Single Azure Active Directory Tenant for External Identity Provider option.

The following is a conceptual diagram of Acme corporation, after successful integration.

Figure 5 - Option 2a - Conceptual Diagram

Figure 5 – Option 2a – Conceptual Diagram

Option 2b. Federating with Azure Active Directory Multiple Tenants

If option 2a is not feasible and you are using multiple Azure AD Connect sync servers and multiple Azure Active Directory tenants (as per the following diagram) then, you can nominate one of the Azure Active Directory tenants to connect with AWS SSO. Through B2B invitation, selectively invite users from other tenants into the nominated tenant.

Note: This is not a scalable solution, as it requires administrative overhead. This should be ideal for a small set of users requiring access to AWS API or console for administrative work.

The following is a conceptual diagram of Acme corporation, after successful integration.

Figure 6 - Option 2b - Conceptual Diagram

Figure 6 – Option 2b – Conceptual Diagram

Conclusion

In this post, we discussed the options for connecting AWS SSO to your preferred Identity Provider, with a multi-forest infrastructure. Customers running multi-forest environments or multiple Azure tenant subscriptions now have a guide to offer their users a continued way of centralizing management and enforcing least privilege access on cloud resources. To learn more, review our AWS Single Sign-On service content.

Additional Content:

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Serverless Container-based APIs with Amazon ECS and Amazon API Gateway

Post Syndicated from Simone Pomata original https://aws.amazon.com/blogs/architecture/field-notes-serverless-container-based-apis-with-amazon-ecs-and-amazon-api-gateway/

A growing number of organizations choose to build their APIs with Docker containers. For hosting and exposing these container-based APIs, they need a solution which supports HTTP requests routing, autoscaling, and high availability. In some cases, user authorization is also needed.

For this purpose, many organizations are orchestrating their containerized services with Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), while hosting their containers on Amazon EC2 or AWS Fargate. Then, they can add scalability and high availability with Service Auto Scaling (in Amazon ECS) or Horizontal Pod Auto Scaler (in Amazon EKS), and they expose the services through load balancers (for example, the AWS Application Load Balancer).

When you use Amazon ECS as an orchestrator (with EC2 or Fargate launch type), you also have the option to expose your services with Amazon API Gateway and AWS Cloud Map instead of a load balancer. AWS Cloud Map is used for service discovery: no matter how Amazon ECS tasks scale, AWS Cloud Map service names would point to the right set of Amazon ECS tasks. Then, API Gateway HTTP APIs can be used to define API routes and point them to the corresponding AWS Cloud Map services.

API Gateway and AWS Cloud Map could be a good fit if you want to leverage the capabilities provided by API Gateway HTTP APIs. For example, you could import/export your API as an OpenAPI definition file. You could configure the following features, either on the whole API or – more granularly – at route level: throttling, detailed metrics, or OAuth 2.0 / OIDC user authorization. You could also deploy your API at different stages over time. Or you could easily configure CORS for your API or for any route, instead of handling OPTIONS preflight requests yourself.

If you don’t need the capabilities of API Gateway HTTP APIs or if those of Elastic Load Balancing are a better fit, then you can use the latter. For example, the capabilities of the Application Load Balancer include: content-based routing (not only by path and HTTP method, but also by HTTP header, query-string parameter, source IP, etc.), redirects, fixed responses, and others. Additionally, the Network Load Balancer provides layer 4 load balancing capabilities. Ultimately, there are overlaps and differences between the features of Elastic Load Balancing and those of API Gateway HTTP APIs: so you may want to compare them to choose the right option for your use case.

This blog post guides you through the details of the option based on API Gateway and AWS Cloud Map, and how to implement it: first you learn how the different components (Amazon ECS, AWS Cloud Map, API Gateway, etc.) work together, then you launch and test a sample container-based API.

Architecture Overview

The following diagram shows the architecture of the sample API that you are going to launch.

Figure 1 - Architecture Diagram

Figure 1 – Architecture Diagram

This example API exposes two services: “Food store” to PUT and GET foods, and “Pet store” to PUT and GET pets. Unauthenticated users can only GET, while authenticated users can also PUT.

The following building blocks are used:

  1. Amazon Cognito User Pools: for user authentication. In this example API, Amazon Cognito is used for user authentication, but you could use any other OAuth 2.0 / OIDC identity provider instead. When the user authenticates with Amazon Cognito, user pool tokens are granted, including a JWT access token that is used for authorizing requests to the container APIs.
  2. API Gateway HTTP APIs: for exposing the containerized services to the user. API routes and the respective integrations are defined in API Gateway. A route is the combination of a path and a method. An integration is the backend service which is invoked by that route. In this API, private integrations point to AWS Cloud Map services, which in turn resolve to private Amazon ECS services (more about AWS Cloud Map in the next paragraph). As Amazon ECS services are private resources in a Virtual Private Cloud (VPC), API Gateway uses a VPC link to connect to them in a private way. A VPC link is a set of elastic network interfaces in the VPC, assigned to and managed by API Gateway, so that API Gateway can talk privately with other resources in the VPC. This way, Amazon ECS services can be launched in private subnets and don’t need a public IP. In this sample application, JWT authorization is configured in API Gateway for PUT routes. API Gateway performs requests authorization based on validation of the JWT Token provided, and optionally, scopes in the token. This way, you don’t need additional code in your containers for authorization.
  3. AWS Cloud Map: for service discovery of the containerized services. API Gateway needs a way to find physical addresses of the backend services, and AWS Cloud Map provides this capability. To enable this functionality, service discovery should be configured on Amazon ECS services. Amazon ECS performs periodic health checks on tasks in Amazon ECS services and registers the healthy tasks to the respective AWS Cloud Map service. AWS Cloud Map services can then be resolved either via DNS queries or by calling the DiscoverInstances API (API Gateway uses the API). AWS Cloud Map supports different DNS record types (including A, AAAA, CNAME, and SRV); at the time, of writing, API Gateway can only retrieve SRV records from AWS Cloud Map, so SRV records are used in this sample application. With SRV records, each AWS Cloud Map service returns a combination of IP addresses and port numbers of all the healthy tasks in the service. Consider that AWS Cloud Map would perform round-robin routing (with equal weighting to the targets): for this reason, to avoid hot spots, all tasks in each service should be homogeneous (in terms of container images, vCPU, memory, and other settings).
  4. Amazon ECS: for hosting the containerized services. Amazon ECS is a highly scalable and high-performance container orchestrator. In this blog post, the Fargate launch type is used, so that containers are launched on the Fargate serverless compute engine, and you don’t have to provision or manage any EC2 instances. In this sample API, service auto scaling is also enabled, so that the number of containers in each service can scale up and down automatically based on % CPU usage. Containers will be launched across multiple Availability Zones in the AWS Region, to get high availability.
  5. Amazon DynamoDB: for persisting the data. Amazon DynamoDB is a key-value and document database that provides single-millisecond performance at any scale. In a real-world scenario, you could still use DynamoDB or another data store, such as Amazon Relational Database Service (RDS).

All the code of this blog post is publicly available in this GitHub repository. You can explore the CloudFormation template used to define the sample application as code. You can view the source code of the two containerized services: Food store repository and Pet store repository. You can also explore the code of the sample web app that you’ll use to test the API (this web app has been developed with the Amplify framework). Note that the code provided is intended for testing purposes and not for production usage.

Walkthrough

In this section, you will deploy the sample application and test it.

Prerequisites

To launch the sample API, you first need an AWS user that has access to the AWS Management Console and has the IAM permissions to launch the AWS CloudFormation stack.

Deploying the sample application

Now it’s time to launch the sample API:

  1. Select Launch Stack
  2. In the page for quick stack creation, do the following:
    • Select the capability “I acknowledge that AWS CloudFormation might create IAM resources”.
    • Keep the rest as default.
    • Choose Create Stack.
  3. Wait until the status of the stack transitions to “CREATE_COMPLETE”.

Testing the sample application

In this section, you test the API from a sample web application client that I created. Open the sample web application:

  1. From the page of the stack, choose Outputs.
  2. Open the URL for the “APITestPage” output.
  3. On the opened page, choose Proceed.

The web page should state that you are not signed-in. In this sample API, any user can GET items, but only authenticated users can PUT items. Sign up to the sample web application:

  1. Choose Go To Sign In.
  2. Choose Create account.
  3. Complete the sign-up procedure (you will be asked for a valid email address, which will be registered into your Amazon Cognito User Pool).

The application should state that you are signed-in. Test the API as an authenticated user:

  1. Try to PUT an item. You would see that the operation succeeds. The item has been persisted by the containerized service to the DynamoDB table.

DynamoDB table

 

2.  Try to GET the same item that you previously PUT. You would see that the same JSON is returned. This JSON is retrieved by the containerized service from the DynamoDB table.

Test the API as an unauthenticated user:

  1. Choose Sign Out.
  2. Try to GET the same item that you previously PUT. You would see that the same JSON is returned. This JSON is retrieved by the containerized service from the DynamoDB table.
  3. Try to PUT any item. You would get a 401 Unauthorized error from the API. This behavior is expected because only signed-in users have a JWT token, and you configured API Gateway to only authorize PUT requests that provide a valid token.

DynamoDB table

Exploring the resources of the sample application

You can also explore the resources launched as part of the CloudFormation stack. To list all of them, from the page of your CloudFormation stack, choose Resources.

To see the Amazon ECS services, go to the Amazon ECS console, choose your cluster, and you would see that 2 services are running, one for the Foodstore and another for the Petstore, as shown in the following image.

Notice that the services use the Fargate launch type, meaning that they are running on serverless compute capacity, so you don’t have to launch or maintain any EC2 instances to run them.

Cluster demo

To see the details of a service, go to the Amazon ECS cluster page and choose a service. You land on the service page, where you can see the running tasks, the service events, and other details.

To view the service auto scaling configuration, choose Auto Scaling. You can notice that Amazon ECS is set to automatically scale the number of tasks according to the value of a metric. In this sample application, the metric is the average CPU utilization of the service (ECSServiceAverageCPUUtilization), but you could use another metric.

Auto scaling

The scaling policy of each service uses two Amazon CloudWatch Alarms, one for scaling out and one for scaling in. An alarm is triggered when the target metric deviates from the target value, which in turn is used to trigger the scaling action. To see the alarms, go to the CloudWatch Alarms console.

CloudWatch Alarms

To see the service discovery entries, go to the AWS Cloud Map console, choose your namespace (see the parameter “PrivateDNSNamespaceName” in the CloudFormation stack), and you would see that two services are defined. If you choose one of these services, you would see that multiple service instances are registered, each representing a single Amazon ECS task (in this sample application, each Amazon ECS task is a single container). If you choose one of these service instances, you would see the details about the task, including the private IP, the port, and the health status. API Gateway retrieves these entries to discover your services.

Service Instance

To see the API configuration, go to the API Gateway console and choose your API.

Then, from the left side of the screen select either Routes, Authorization, Integrations, or any other option.

Integrations

Cleaning up

To clean up the resources, simply delete the CloudFormation stack that you deployed as part of this blog post.

Conclusion

You have learned how API Gateway HTTP APIs can be used together with AWS Cloud Map to expose Amazon ECS services as APIs. You have deployed a sample API that also uses Amazon Cognito for authentication and DynamoDB for data persistence.

API Gateway HTTP APIs provides a number of features that you can leverage, such as OpenAPI import/export, throttling, OAuth 2.0 / OIDC user authorization, detailed metrics, and stages deployment. That said, API Gateway is not the only way to expose your ECS services; if you don’t need the features of API Gateway HTTP APIs or if those of Elastic Load Balancing are a better fit, then you can use the latter service. The recommended approach is to compare them to choose the most suitable for your use case.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.