Tag Archives: Technical How-to

Rapid and flexible Infrastructure as Code using the AWS CDK with AWS Solutions Constructs

Post Syndicated from Biff Gaut original https://aws.amazon.com/blogs/devops/rapid-flexible-infrastructure-with-solutions-constructs-cdk/

Introduction

As workloads move to the cloud and all infrastructure becomes virtual, infrastructure as code (IaC) becomes essential to leverage the agility of this new world. JSON and YAML are the powerful, declarative modeling languages of AWS CloudFormation, allowing you to define complex architectures using IaC. Just as higher-level languages like BASIC and C abstracted away the details of assembly language and made developers more productive, the AWS Cloud Development Kit (AWS CDK) provides a programming model above the native template languages, a model that makes developers more productive when creating IaC. When you instantiate CDK objects in your TypeScript (or Python, Java, etc.) application, those objects “compile” into a YAML template that the CDK deploys as an AWS CloudFormation stack.

AWS Solutions Constructs take this simplification a step further by providing a library of common service patterns built on top of the CDK. These multi-service patterns allow you to deploy multiple resources with a single object, resources that follow best practices by default – both independently and throughout their interaction.

[Figure: Application Development Stack vs. IaC Development Stack – comparing an application stack of assembly language, 4th-generation languages, and object libraries such as Hibernate with an IaC stack of CloudFormation, the AWS CDK, and AWS Solutions Constructs]

Solution overview

To demonstrate how using Solutions Constructs can accelerate the development of IaC, in this post you will create an architecture that ingests and stores sensor readings using Amazon Kinesis Data Streams, AWS Lambda, and Amazon DynamoDB.

[Architecture diagram: sensor readings are sent to a Kinesis data stream; a Lambda function receives the Kinesis records and stores them in a DynamoDB table]

Prerequisite – Setting up the CDK environment

Tip – If you want to try this example but are concerned about the impact of changing the tools or versions on your workstation, try running it on AWS Cloud9. An AWS Cloud9 environment is launched with an AWS Identity and Access Management (AWS IAM) role and doesn’t require configuring with an access key. It uses the current region as the default for all CDK infrastructure.

To prepare your workstation for CDK development, confirm the following:

  • Node.js 10.3.0 or later is installed on your workstation (regardless of the language used to write CDK apps).
  • You have configured credentials for your environment. If you’re running locally you can do this by configuring the AWS Command Line Interface (AWS CLI).
  • TypeScript 2.7 or later is installed globally (npm -g install typescript)

Before creating your CDK project, install the CDK toolkit using the following command:

npm install -g aws-cdk

Create the CDK project

  1. First create a project folder called stream-ingestion with these two commands:

mkdir stream-ingestion
cd stream-ingestion

  2. Now create your CDK application using this command:

npx aws-cdk@1.68.0 init app --language=typescript

Tip – This example will be written in TypeScript – you can also specify other languages for your projects.

At this time, you must use the same version of the CDK and Solutions Constructs. We’re using version 1.68.0 of both based upon what’s available at publication time, but you can update this with a later version for your projects in the future.

Let’s explore the files in the application this command created:

  • bin/stream-ingestion.ts – This is the module that launches the application. The key line of code is:

new StreamIngestionStack(app, 'StreamIngestionStack');

This creates the actual stack, and it’s in StreamIngestionStack that you will write the CDK code that defines the resources in your architecture.

  • lib/stream-ingestion-stack.ts – This is the important class. In the constructor of StreamIngestionStack you will add the constructs that will create your architecture.

During the deployment process, the CDK uploads your Lambda function to an Amazon S3 bucket so it can be incorporated into your stack.

  3. To create that S3 bucket and any other infrastructure the CDK requires, run this command:

cdk bootstrap

The CDK uses the same supporting infrastructure for all projects within a region, so you only need to run the bootstrap command once in any region in which you create CDK stacks.

  4. To install the required Solutions Constructs packages for our architecture, run these two commands from the command line:

npm install @aws-solutions-constructs/aws-kinesisstreams-lambda@1.68.0
npm install @aws-solutions-constructs/aws-lambda-dynamodb@1.68.0

Write the code

First you will write the Lambda function that processes the Kinesis data stream messages.

  1. Create a folder named lambda under stream-ingestion
  2. Within the lambda folder save a file called lambdaFunction.js with the following contents:
var AWS = require("aws-sdk");

// Create the DynamoDB service object
var ddb = new AWS.DynamoDB({ apiVersion: "2012-08-10" });

AWS.config.update({ region: process.env.AWS_REGION });

// We will configure our construct to 
// look for the .handler function
exports.handler = async function (event) {
  try {
    // Kinesis will deliver records 
    // in batches, so we need to iterate through
    // each record in the batch
    for (let record of event.Records) {
      const reading = parsePayload(record.kinesis.data);
      await writeRecord(record.kinesis.partitionKey, reading);
    };
  } catch (err) {
    console.log(`Write failed, err:\n${JSON.stringify(err, null, 2)}`);
    throw err;
  }
  return;
};

// Write the provided sensor reading data to the DynamoDB table
async function writeRecord(partitionKey, reading) {

  var params = {
    // Notice that Constructs automatically sets up 
    // an environment variable with the table name.
    TableName: process.env.DDB_TABLE_NAME,
    Item: {
      partitionKey: { S: partitionKey },  // sensor Id
      timestamp: { S: reading.timestamp },
      value: { N: reading.value}
    },
  };

  // Call DynamoDB to add the item to the table
  await ddb.putItem(params).promise();
}

// Decode the payload and extract the sensor data from it
function parsePayload(payload) {

  const decodedPayload = Buffer.from(payload, "base64").toString(
    "ascii"
  );

  // Our CLI command will send the records to Kinesis
  // with the values delimited by '|'
  const payloadValues = decodedPayload.split("|", 2)
  return {
    value: payloadValues[0],
    timestamp: payloadValues[1]
  }
}

We won’t spend a lot of time explaining this function – it’s pretty straightforward and heavily commented. It receives an event with one or more sensor readings, and for each reading it extracts the pertinent data and saves it to the DynamoDB table.

You will use two Solutions Constructs to create your infrastructure:

  • aws-kinesisstreams-lambda creates the Kinesis data stream and Lambda function that subscribes to that stream. To support this, it also creates other resources, such as IAM roles and encryption keys.

  • aws-lambda-dynamodb creates an Amazon DynamoDB table and a Lambda function with permission to access the table.
  3. To deploy the first of these two constructs, replace the code in lib/stream-ingestion-stack.ts with the following code:
import * as cdk from "@aws-cdk/core";
import * as lambda from "@aws-cdk/aws-lambda";
import { KinesisStreamsToLambda } from "@aws-solutions-constructs/aws-kinesisstreams-lambda";

import * as ddb from "@aws-cdk/aws-dynamodb";
import { LambdaToDynamoDB } from "@aws-solutions-constructs/aws-lambda-dynamodb";

export class StreamIngestionStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const kinesisLambda = new KinesisStreamsToLambda(
      this,
      "KinesisLambdaConstruct",
      {
        lambdaFunctionProps: {
          // Where the CDK can find the lambda function code
          runtime: lambda.Runtime.NODEJS_10_X,
          handler: "lambdaFunction.handler",
          code: lambda.Code.fromAsset("lambda"),
        },
      }
    );

    // Next Solutions Construct goes here
  }
}

Let’s explore this code:

  • It instantiates a new KinesisStreamsToLambda object. This Solutions Construct will launch a new Kinesis data stream and a new Lambda function, setting up the Lambda function to receive all the messages in the Kinesis data stream. It will also deploy all the additional resources and policies required for the architecture to follow best practices.
  • The third argument to the constructor is the properties object, where you specify overrides of default values or any other information the construct needs. In this case you provide properties for the encapsulated Lambda function that informs the CDK where to find the code for the Lambda function that you stored as lambda/lambdaFunction.js earlier.
  4. Now you’ll add the second construct that connects the Lambda function to a new DynamoDB table. In the same lib/stream-ingestion-stack.ts file, replace the line // Next Solutions Construct goes here with the following code:
    // Define the primary key for the new DynamoDB table
    const primaryKeyAttribute: ddb.Attribute = {
      name: "partitionKey",
      type: ddb.AttributeType.STRING,
    };

    // Define the sort key for the new DynamoDB table
    const sortKeyAttribute: ddb.Attribute = {
      name: "timestamp",
      type: ddb.AttributeType.STRING,
    };

    const lambdaDynamoDB = new LambdaToDynamoDB(
      this,
      "LambdaDynamodbConstruct",
      {
        // Tell construct to use the Lambda function in
        // the first construct rather than deploy a new one
        existingLambdaObj: kinesisLambda.lambdaFunction,
        tablePermissions: "Write",
        dynamoTableProps: {
          partitionKey: primaryKeyAttribute,
          sortKey: sortKeyAttribute,
          billingMode: ddb.BillingMode.PROVISIONED,
          removalPolicy: cdk.RemovalPolicy.DESTROY
        },
      }
    );

    // Add autoscaling
    const readScaling = lambdaDynamoDB.dynamoTable.autoScaleReadCapacity({
      minCapacity: 1,
      maxCapacity: 50,
    });

    readScaling.scaleOnUtilization({
      targetUtilizationPercent: 50,
    });

Let’s explore this code:

  • The first two const objects define the names and types for the partition key and sort key of the DynamoDB table.
  • The LambdaToDynamoDB construct instantiated creates a new DynamoDB table and grants access to your Lambda function. The key to this call is the properties object you pass in the third argument.
    • The first property sent to LambdaToDynamoDB is existingLambdaObj – by setting this value to the Lambda function created by KinesisStreamsToLambda, you’re telling the construct to not create a new Lambda function, but to grant the Lambda function in the other Solutions Construct access to the DynamoDB table. This illustrates how you can chain many Solutions Constructs together to create complex architectures.
    • The second property sent to LambdaToDynamoDB tells the construct to limit the Lambda function’s access to the table to write only.
    • The third property sent to LambdaToDynamoDB is actually a full properties object defining the DynamoDB table. It provides the two attribute definitions you created earlier as well as the billing mode. It also sets the RemovalPolicy to DESTROY. This policy setting ensures that the table is deleted when you delete this stack – in most cases you should accept the default setting to protect your data.
  • The last two lines of code show how you can use statements to modify a construct outside the constructor. In this case we set up auto scaling on the new DynamoDB table, which we can access with the dynamoTable property on the construct we just instantiated.

That’s all it takes to create all the resources needed to deploy your architecture.

  5. Save all the files, then compile the TypeScript into a CDK program using this command:

npm run build

  6. Finally, launch the stack using this command:

cdk deploy

(Enter “y” in response to Do you wish to deploy all these changes (y/n)?)

You will see some warnings where you override CDK default values. Because you are doing this intentionally you may disregard these, but it’s always a good idea to review these warnings when they occur.

Tip – Many mysterious CDK project errors stem from mismatched versions. If you get stuck on an inexplicable error, check package.json and confirm that all CDK and Solutions Constructs libraries have the same version number (with no leading caret ^). If necessary, correct the version numbers, delete the package-lock.json file and node_modules tree and run npm install. Think of this as the “turn it off and on again” first response to CDK errors.
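
For reference, the dependencies section of a package.json that follows this advice might look something like the following sketch (the exact module list depends on what you have installed; the point is that every CDK and Solutions Constructs entry is pinned to the same version with no caret):

"dependencies": {
  "@aws-cdk/core": "1.68.0",
  "@aws-cdk/aws-lambda": "1.68.0",
  "@aws-cdk/aws-dynamodb": "1.68.0",
  "@aws-solutions-constructs/aws-kinesisstreams-lambda": "1.68.0",
  "@aws-solutions-constructs/aws-lambda-dynamodb": "1.68.0"
}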

You have now deployed the entire architecture for the demo – open the CloudFormation stack in the AWS Management Console and take a few minutes to explore all 12 resources that the program deployed (and the 380-line template generated to create them).

Feed the Stream

Now use the CLI to send some data through the stack.

Go to the Kinesis Data Streams console and copy the name of the data stream. Replace the stream name in the following command and run it from the command line.

aws kinesis put-records \
--stream-name StreamIngestionStack-KinesisLambdaConstructKinesisStreamXXXXXXXX-XXXXXXXXXXXX \
--records \
PartitionKey=1301,'Data=15.4|2020-08-22T01:16:36+00:00' \
PartitionKey=1503,'Data=39.1|2020-08-22T01:08:15+00:00'

Tip – If you are using the AWS CLI v2, the previous command will result in an “Invalid base64…” error because v2 expects the inputs to be Base64 encoded by default. Adding the argument --cli-binary-format raw-in-base64-out will fix the issue.

To confirm that the messages made it through the service, open the DynamoDB console – you should see the two records in the table.

Now that you’ve got it working, pause to think about what you just did. You deployed a system that can ingest and store sensor readings and scale to handle heavy loads. You did that by instantiating two objects – well under 60 lines of code. Experiment with changing some property values and deploying the changes by running npm run build and cdk deploy again.
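
For example, one experiment is to override properties of the underlying Kinesis data stream. The following is a minimal sketch, assuming the construct accepts a kinesisStreamProps override (the shard count and retention period are illustrative values, not recommendations):

const kinesisLambda = new KinesisStreamsToLambda(
  this,
  "KinesisLambdaConstruct",
  {
    // Hypothetical overrides for the underlying Kinesis data stream
    kinesisStreamProps: {
      shardCount: 2,
      retentionPeriod: cdk.Duration.hours(48),
    },
    lambdaFunctionProps: {
      runtime: lambda.Runtime.NODEJS_10_X,
      handler: "lambdaFunction.handler",
      code: lambda.Code.fromAsset("lambda"),
    },
  }
);

After changing a value, run npm run build and cdk deploy again and compare the updated CloudFormation template to see what the override changed.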

Cleanup

To clean up the resources in the stack, run this command:

cdk destroy

Conclusion

Just as languages like BASIC and C allowed developers to write programs at a higher level of abstraction than assembly language, the AWS CDK and AWS Solutions Constructs allow us to create CloudFormation stacks in TypeScript, Java, or Python instead of JSON or YAML. Just as there will always be a place for assembly language, there will always be situations where we want to write CloudFormation templates manually – but for most situations, we can now use the AWS CDK and AWS Solutions Constructs to create complex and complete architectures in a fraction of the time with very little code.

AWS Solutions Constructs can currently be used in CDK applications written in TypeScript, JavaScript, Java, and Python and will be available in C# applications soon.

About the Author

Biff Gaut has been shipping software since 1983, from small startups to large IT shops. Along the way he has contributed to 2 books, spoken at several conferences and written many blog posts. He is now a Principal Solutions Architect at AWS working on the AWS Solutions Constructs team, helping customers deploy better architectures more quickly.

Field Notes: Integrating IoT and ITSM using AWS IoT Greengrass and AWS Secrets Manager – Part 2

Post Syndicated from Gary Emmerton original https://aws.amazon.com/blogs/architecture/field-notes-integrating-iot-and-itsm-using-aws-iot-greengrass-and-aws-secrets-manager-part-2/

In part 1 of this blog I introduced the need for organizations to securely connect thousands of IoT devices with many different systems in the hyperconnected world that exists today, and how that can be addressed using AWS IoT Greengrass and AWS Secrets Manager.  We walked through the creation of ServiceNow credentials in AWS Secrets Manager, the creation of IAM roles and the Lambda functions that will run on our edge device (a Raspberry Pi).

In this second part of the blog, we will set up AWS IoT Greengrass on our Raspberry Pi and AWS IoT Core so that we can run the AWS Lambda functions and access our ServiceNow credentials, retrieved securely from AWS Secrets Manager.

Setting up AWS IoT Core and AWS IoT Greengrass

The overall sequence for configuring AWS IoT Core and AWS IoT Greengrass is:

  • Create a certificate and an IoT Thing, and link them
  • Create AWS IoT Greengrass group
  • Associate IAM role to the AWS IoT Greengrass group
  • Create and attach a policy to the certificate
  • Create an AWS IoT Greengrass Resource Definition for our ‘Secret’
  • Create an AWS IoT Greengrass Function Definition for our Lambda functions
  • Create an AWS IoT Greengrass Subscription Definition for IoT Topics to be used
  • Finally associate our Resource, Function and Subscription Definitions with our AWS IoT Greengrass Core

Steps

For this walkthrough, I have selected the AWS region “eu-west-1”, however, feel free to use other Regions where AWS IoT Core and AWS IoT Greengrass are available.

First, let’s install Greengrass on the Raspberry Pi:

  • Follow the instructions to configure the pre-requisites on the Raspberry Pi
  • Then we download the AWS IoT Greengrass software
  • And then we unzip the AWS IoT Greengrass software using the following command (note, this command is for version 1.10.0 of Greengrass and will change as later versions are released):

sudo tar -xzvf greengrass-linux-armv6l-1.10.0.tar.gz -C /

Note that the version of the AWS IoT Greengrass SDK installed on the device must be compatible with the version of the AWS IoT Greengrass Core software. Check which versions are compatible, and use sudo pip3 install greengrasssdk==<version_number> to install an SDK version that is compatible with the version of AWS IoT Greengrass that we installed.

Our AWS IoT Greengrass core will authenticate with AWS IoT Core in AWS using certificates, so we need to generate these first using the following command:

aws iot create-keys-and-certificate --set-as-active --certificate-pem-outfile "iot-ge.cert.pem" --public-key-outfile "iot-ge.public.key" --private-key-outfile "iot-ge.private.key"

This command will generate three files containing the private key, public key and certificate.  All of these files need to be copied to the /greengrass/certs folder on the Raspberry Pi.  Also, the output of the preceding command will give the ARN of the certificate – we need to make a note of this ARN as we will use it in the next steps.

We also need to download a copy of the Amazon Root CA into the /greengrass/certs folder using the command below:

sudo wget -O root.ca.pem https://www.amazontrust.com/repository/AmazonRootCA1.pem

For the next step we need our AWS account number and IoT Host address unique to our account – we get the IoT Host address using the command:

aws iot describe-endpoint --endpoint-type iot:Data-ATS

Now we need to create a config.json file on the Raspberry Pi in the /greengrass/config folder, with the account number and IoT Host address obtained in the previous step;

{
  "coreThing" : {
    "caPath" : "root.ca.pem",
    "certPath" : "iot-ge.cert.pem",
    "keyPath" : "iot-ge.private.key",
    "thingArn" : "arn:aws:iot:eu-west-1:<aws_account_number>:thing/IoT-blog_Core",
    "iotHost" : "<endpoint_address>",
    "ggHost" : "greengrass-ats.iot.eu-west-1.amazonaws.com",
    "keepAlive" : 600
  },
  "runtime" : {
    "cgroup" : {
      "useSystemd" : "yes"
    },
    "allowFunctionsToRunAsRoot" : "yes"
  },
  "managedRespawn" : false,
  "crypto" : {
    "principals" : {
      "SecretsManager" : {
        "privateKeyPath" : "file:///greengrass/certs/iot-ge.private.key"
      },
      "IoTCertificate" : {
        "privateKeyPath" : "file:///greengrass/certs/iot-ge.private.key",
        "certificatePath" : "file:///greengrass/certs/iot-ge.cert.pem"
      }
    },
    "caPath" : "file:///greengrass/certs/root.ca.pem"
  }
}

Note that the line "allowFunctionsToRunAsRoot" : "yes" allows the Lambda functions to easily access the SenseHat on the Raspberry Pi. This configuration should normally be avoided in Production environments for security reasons but has been used here for simplicity.

Next we create the IoT Thing to represent our Raspberry Pi to match the entry we added into the config.json file previously:

aws iot create-thing --thing-name IoT-blog_Core

Now that our config.json file is in place and our IoT ‘thing’ created we can start the AWS IoT Greengrass software using the following commands:

cd /greengrass/ggc/core/
sudo ./greengrassd start

Then we attach the certificate to our new Thing – we need the ARN of the certificate that was noted in the earlier steps when we created the certificates:

aws iot attach-thing-principal --thing-name "IoT-blog_Core" --principal "<certificate_arn>"

Now we create the AWS IoT Greengrass group – make a note of the Group ID in the output of this command as we use it later:

aws greengrass create-group --name IoT-blog-group

Next we create the AWS IoT Greengrass Core definition file – create this using a text editor and save as core-def.json

{
  "Cores": [
    {
      "CertificateArn": "<certificate_arn>",
      "Id": "<IoT Thing Name>",
      "SyncShadow": true,
      "ThingArn": "<thing_arn>"
    }
  ]
}

Then, using the preceding file we just created, we create the core definition using the following command:

aws greengrass create-core-definition --name "IoT-blog_Core" --initial-version file://core-def.json

Now we associate the AWS IoT Greengrass core with the AWS IoT Greengrass group – we need the LatestVersionARN from the output of the command above and the group ID of your existing AWS IoT Greengrass group (in the output from the command for creation of the group in previous steps):

aws greengrass create-group-version --group-id "<greengrass_group_id>" --core-definition-version-arn "<core_definition_version_arn>"

Then we associate the IAM role (created earlier) to the AWS IoT Greengrass group;

aws greengrass associate-role-to-group --group-id "<greengrass_group_id>" --role-arn "arn:aws:iam::<aws_account_number>:role/IoTGGRole"

We need to create a policy to associate with the certificate so that our AWS IoT Greengrass Core (authenticated/authorized by our certificates) has rights to interact with AWS IoT Core.  To do this we create the policy.json file:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iot:Publish",
        "iot:Subscribe",
        "iot:Connect",
        "iot:Receive"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iot:GetThingShadow",
        "iot:UpdateThingShadow",
        "iot:DeleteThingShadow"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "greengrass:*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Then create the policy using the policy file using the command below:

aws iot create-policy --policy-name myGGPolicy --policy-document file://policy.json

And finally attach our new policy to the certificate – as the certificate is attached to our AWS IoT Greengrass Core, this gives the rights defined in the policy to our AWS IoT Greengrass Core;

aws iot attach-policy --target "<certificate_arn>" --policy-name "myGGPolicy"

Now that we have the AWS IoT Greengrass Core and permissions in place, it’s time to add our Secret as a resource for AWS IoT Greengrass.

First, we need to create a resource definition that refers to the ARN of the secret we created earlier.  Get the ARN of the secret using the following command:

aws secretsmanager describe-secret --secret-id "greengrass-snow-creds"

And then we create a text file containing the following and save it as resource.json:

{
"Resources": [
    {
      "Id": "SNOW-Credentials",
      "Name": "SNOW-Credentials",
      "ResourceDataContainer": {
        "SecretsManagerSecretResourceData": {
          "ARN": "<secret_arn>"
        }
      }
    }
  ]
}

Now we run the command to create the resource reference in AWS IoT to the secret:

aws greengrass create-resource-definition --name "MySNOWSecret" --initial-version file://resource.json

Note the Resource ID from the output, as it has to be added to the Lambda definition JSON file in the next steps.  The function definition file contains the details of the Lambda function(s) that we will attach to our AWS IoT Greengrass group.  We create a text file with the content below and save as lambda-def.json.

We also specify a couple of variables in the definition file; these are the same as the environment variables that can be specified for Lambda, but they make the variables available in AWS IoT Greengrass.

Note, if we specify environment variables for the functions in the Lambda console then these will NOT be available when the function is running under AWS IoT Greengrass.  We will need our ServiceNow API URL to add to the configuration below, and this will be in the form of https://devXXXXX.service-now.com/api/now/table/incident, where XXXXX is the developer instance number assigned by ServiceNow when our instance is created.

We need the ARNs of the Lambda functions that we created in part 1 of the blog – these appear in the output after successfully creating the functions from the command line, or can be obtained using the aws lambda list-functions command – we need to have the ‘:1’ at the end of the ARN as AWS IoT Greengrass needs to reference published function versions.

{
  "DefaultConfig": {
    "Execution": {
      "IsolationMode": "NoContainer",
      "RunAs": {
        "Gid": 0,
        "Uid": 0
      }
    }
  },
  "Functions": [
    {
      "FunctionArn": "<lambda_function1_arn>:1",
      "FunctionConfiguration": {
        "EncodingType": "json",
        "Environment": {
          "Execution": {
            "IsolationMode": "NoContainer"
          },
          "Variables": { 
            "tempLimit": "30",
            "humidLimit": "50"
          }
        },
        "ExecArgs": "string",
        "Executable": "lambda_function.lambda_handler",
        "Pinned": true,
        "Timeout": 10
      },
    "Id": "sensorLambda"
    },
    {
      "FunctionArn": "<lambda_function2_arn>:1",
      "FunctionConfiguration": {
        "EncodingType": "json",
        "Environment": {
          "Execution": {
            "IsolationMode": "NoContainer"
          },
          "ResourceAccessPolicies": [
            {
              "Permission": "ro",
              "ResourceId": "SNOW-Credentials"
            }
          ],
          "Variables": { 
            "snowUrl": "<service_now_api_url>"
          }
        },
        "ExecArgs": "string",
        "Executable": "lambda_function.lambda_handler",
        "Pinned": false,
        "Timeout": 10
      },
    "Id": "anomalyLambda"
    }
  ]
}

The Lambda functions now need to be registered within our AWS IoT Greengrass core using the definition file just created, using the following command:

aws greengrass create-function-definition --name "IoT-blog-lambda" --initial-version file://lambda-def.json

Create Subscriptions

We now need to create some IoT Topics to pass data between the two Lambda functions and also to submit all sensor data to AWS IoT Core, which gives us visibility of the successful collection of sensor data.

First, let’s create a subscription configuration file (subscriptions.json) for sensor data and anomaly data:

{
  "Subscriptions": [
    {
      "Id": "SensorData",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/sensorData",
      "Target": "cloud"
    },
    {
      "Id": "AnomalyData",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/anomaly",
      "Target": "<lambda_function2_arn>:1"
    },
    {
      "Id": "AnomalyDataB",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/anomaly",
      "Target": "cloud"
    }
  ]
}

And next, we run the command to create the subscription from this configuration:

aws greengrass create-subscription-definition --name "IoT-sensor-subs" --initial-version file://subscriptions.json

Update AWS IoT Greengrass Group Associations and Deploy

Now that the functions, subscriptions and resources have been defined, we run the following command to update our AWS IoT Greengrass group to the new version with those components included:

aws greengrass create-group-version --group-id <gg_group_id> --core-definition-version-arn "<core_def_version_arn>" --function-definition-version-arn "<function_def_version_arn>" --resource-definition-version-arn "<resource_def_version_arn>" --subscription-definition-version-arn "<subscription_def_version_arn>"

And finally, we can deploy our configuration.  Use the following command to deploy the Greengrass group to our device, using the group-version-id from the output of the previous command and also the group-id:

aws greengrass create-deployment --deployment-type NewDeployment --group-id <gg_group_id> --group-version-id <gg_group_version_id>

Summarized below is the integration between the different functions and components that we have now deployed to get from our sensor data through to an incident being raised in ServiceNow:

[Diagram: flow on the Raspberry Pi from sensor readings, through the two Lambda functions and IoT topics, to a ServiceNow incident]

Create an Incident

Everything is now set up from an IoT perspective, so we can attempt to breach a sensor threshold and trigger the creation of an incident in ServiceNow.  To do this, let’s raise the humidity around the sensor so that it breaches the threshold defined in the environment variables of the Lambda function.

Under normal conditions we will just see the data published by the first Lambda function in the IoTBlog/sensorData topic:

[Screenshot: sensor data messages published to the IoTBlog/sensorData topic]

However, when a threshold is breached (in our example, humidity above 50%), the data is published to the IoTBlog/anomaly topic as shown below:

[Screenshot: anomaly message published to the IoTBlog/anomaly topic]

Via the AWS IoT Greengrass subscriptions created earlier, this message arriving in the anomaly topic also triggers the second Lambda function to create the ticket in ServiceNow.

The log for the second Lambda function on AWS IoT Greengrass (stored in /greengrass/ggc/var/log/user/<region>/<aws_account_number>/ on the Raspberry Pi) will show a ‘201’ return code if the incident is successfully created in ServiceNow.

[Screenshot: Lambda function log showing a 201 response from ServiceNow]

Now let’s log on to ServiceNow and check out our new incident.  Good news, our new incident appears correctly.

When we click on our incident, we can see the detail, including the full data from the IoT topic in the Activities section.

This is only a basic use of the ServiceNow API and there are many other parameters that you can use to increase the richness of the incident; refer to the ServiceNow API documentation for more details.

Cleaning up

To avoid incurring future charges, delete the resources that you created in the walkthrough.

Conclusion

We have built an IoT device (Raspberry Pi), running AWS IoT Greengrass, AWS Lambda, and using ServiceNow credentials managed in AWS Secrets Manager.  Using this we have triggered an anomaly event that has created an incident automatically in ServiceNow, directly from the Lambda function running on our Pi.  You can use this architecture as the foundation to integrate your edge devices and ITSM solution to automate ticket generation in your organization.

Look out for follow-up blogs that will extend this solution to provide a real-time dashboard for the sensor data and store the sensor data in a Data Lake for historical visualization.

Find out more about deploying Secrets to AWS IoT Greengrass Core.

Check out the AWS IoT Blog for more examples of how to use AWS to integrate your edge devices with the AWS Cloud.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Building, bundling, and deploying applications with the AWS CDK

Post Syndicated from Cory Hall original https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.

The post CDK Pipelines: Continuous delivery for AWS CDK applications showed how you can use CDK Pipelines to deploy a TypeScript-based AWS Lambda function. In that post, you learned how to add additional build commands to the pipeline to compile the TypeScript code to JavaScript, which is needed to create the Lambda deployment package.

In this post, we dive deeper into how you can perform these build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality.

If you’re working with Python, TypeScript, or JavaScript-based Lambda functions, you may already be familiar with the PythonFunction and NodejsFunction constructs, which use the bundling functionality. This post describes how to write your own bundling logic for instances where a higher-level construct either doesn’t already exist or doesn’t meet your needs. To illustrate this, I walk through two different examples: a Lambda function written in Golang and a static site created with Nuxt.js.

Concepts

A typical CI/CD pipeline contains steps to build and compile your source code, bundle it into a deployable artifact, push it to artifact stores, and deploy to an environment. In this post, we focus on the building, compiling, and bundling stages of the pipeline.

The AWS CDK has the concept of bundling source code into a deployable artifact. As of this writing, this works for two main types of assets: Docker images published to Amazon Elastic Container Registry (Amazon ECR) and files published to Amazon Simple Storage Service (Amazon S3). For files published to Amazon S3, this can be as simple as pointing to a local file or directory, which the AWS CDK uploads to Amazon S3 for you.
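
For instance, a minimal sketch of a simple file asset inside a stack (assuming the @aws-cdk/aws-s3-assets module; the directory name here is just an illustration) could look like this:

import * as path from 'path';
import * as s3_assets from '@aws-cdk/aws-s3-assets';

// Stage the contents of a local directory as a file asset; during deployment
// the AWS CDK zips the directory and uploads it to its assets bucket in Amazon S3
const configAsset = new s3_assets.Asset(this, 'ConfigFiles', {
  path: path.join(__dirname, 'config'),
});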

When you build an AWS CDK application (by running cdk synth), a cloud assembly is produced. The cloud assembly consists of a set of files and directories that define your deployable AWS CDK application. In the context of the AWS CDK, it might include the following:

  • AWS CloudFormation templates and instructions on where to deploy them
  • Dockerfiles, corresponding application source code, and information about where to build and push the images to
  • File assets and information about which S3 buckets to upload the files to

Use case

For this use case, our application consists of front-end and backend components. The example code is available in the GitHub repo. In the repository, I have split the example into two separate AWS CDK applications. The repo also contains the Golang Lambda example app and the Nuxt.js static site.

Golang Lambda function

To create a Golang-based Lambda function, you must first create a Lambda function deployment package. For Go, this consists of a .zip file containing a Go executable. Because we don’t commit the Go executable to our source repository, our CI/CD pipeline must perform the necessary steps to create it.

In the context of the AWS CDK, when we create a Lambda function, we have to tell the AWS CDK where to find the deployment package. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  runtime: lambda.Runtime.GO_1_X,
  handler: 'main',
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-go-executable')),
});

In the preceding code, the lambda.Code.fromAsset() method tells the AWS CDK where to find the Golang executable. When we run cdk synth, it stages this Go executable in the cloud assembly, which it zips and publishes to Amazon S3 as part of the PublishAssets stage.

If we’re running the AWS CDK as part of a CI/CD pipeline, this executable doesn’t exist yet, so how do we create it? One method is CDK bundling. The lambda.Code.fromAsset() method takes a second optional argument, AssetOptions, which contains the bundling parameter. With this bundling parameter, we can tell the AWS CDK to perform steps prior to staging the files in the cloud assembly.

Breaking down the BundlingOptions parameter further, we can perform the build inside a Docker container or locally.

Building inside a Docker container

For this to work, we need to make sure that we have Docker running on our build machine. In AWS CodeBuild, this means setting privileged: true. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [
        'bash', '-c', [
          'go test -v',
          'GOOS=linux go build -o /asset-output/main',
        ].join(' && '),
      ],
    },
  })
  ...
});

We specify two parameters:

  • image (required) – The Docker image to perform the build commands in
  • command (optional) – The command to run within the container

The AWS CDK mounts the folder specified as the first argument to fromAsset at /asset-input inside the container, and mounts the asset output directory (where the cloud assembly is staged) at /asset-output inside the container.

After we perform the build commands, we need to make sure we copy the Golang executable to the /asset-output location (or specify it as the build output location like in the preceding example).

This is the equivalent of running something like the following code:

docker run \
  --rm \
  -v folder-containing-source-code:/asset-input \
  -v cdk.out/asset.1234a4b5/:/asset-output \
  lambci/lambda:build-go1.x \
  bash -c 'GOOS=linux go build -o /asset-output/main'

Building locally

To build locally (not in a Docker container), we have to provide the local parameter. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [],
      local: {
        tryBundle(outputDir: string) {
          try {
            spawnSync('go version')
          } catch {
            return false
          }

          spawnSync(`GOOS=linux go build -o ${path.join(outputDir, 'main')}`);
          return true
        },
      },
    },
  })
  ...
});

The local parameter must implement the ILocalBundling interface. The tryBundle method is passed the asset output directory, and expects you to return a boolean (true or false). If you return true, the AWS CDK doesn’t try to perform Docker bundling. If you return false, it falls back to Docker bundling. Just like with Docker bundling, you must make sure that you place the Go executable in the outputDir.

Typically, you should perform some validation steps to ensure that you have the required dependencies installed locally to perform the build. This could be checking to see if you have go installed, or checking a specific version of go. This can be useful if you don’t have control over what type of build environment this might run in (for example, if you’re building a construct to be consumed by others).
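
As an illustration, the following is a hedged sketch of such a check, assuming Node's built-in child_process module and a requirement of Go 1.15 or later (the helper name and minimum version are made up for this example):

import { execSync } from 'child_process';

// Returns true when a local Go toolchain of at least go1.<minMinor> is on the PATH
function hasRequiredGo(minMinor: number): boolean {
  try {
    // `go version` prints something like "go version go1.15.2 linux/amd64"
    const output = execSync('go version').toString();
    const match = output.match(/go1\.(\d+)/);
    return match !== null && parseInt(match[1], 10) >= minMinor;
  } catch {
    // go is not installed or not on the PATH
    return false;
  }
}

A tryBundle implementation could call a helper like this first (for example, hasRequiredGo(15)) and return false when the check fails, so that the AWS CDK falls back to Docker bundling.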

If we run cdk synth on this, we see a new message telling us that the AWS CDK is bundling the asset. If we include additional commands like go test, we also see the output of those commands. This is especially useful if you wanted to fail a build if tests failed. See the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

Cloud Assembly

If we look at the cloud assembly that was generated (located in the cdk.out directory), we see that it contains our GolangLambdaStack CloudFormation template that defines our Lambda function, as well as our Golang executable, bundled at asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952/main.

Let’s look at how the AWS CDK uses this information. The GolangLambdaStack.assets.json file contains all the information necessary for the AWS CDK to know where and how to publish our assets (in this use case, our Golang Lambda executable). See the following code:

{
  "version": "5.0.0",
  "files": {
    "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952": {
      "source": {
        "path": "asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952",
        "packaging": "zip"
      },
      "destinations": {
        "current_account-current_region": {
          "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
          "objectKey": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"
        }
      }
    }
  }
}

The file contains information about where to find the source files (source.path) and what type of packaging (source.packaging). It also tells the AWS CDK where to publish this .zip file (bucketName and objectKey) and what AWS Identity and Access Management (IAM) role to use (assumeRoleArn). In this use case, we only deploy to a single account and Region, but if you have multiple accounts or Regions, you see multiple destinations in this file.

The GolangLambdaStack.template.json file that defines our Lambda resource looks something like the following code:

{
  "Resources": {
    "MyGoFunction0AB33E85": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": {
            "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
          },
          "S3Key": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip"
        },
        "Handler": "main",
        ...
      }
    },
    ...
  }
}

The S3Bucket and S3Key match the bucketName and objectKey from the assets.json file. By default, the S3Key is generated by calculating a hash of the folder location that you pass to lambda.Code.fromAsset() (for this post, folder-containing-source-code). This means that any time we update our source code, this calculated hash changes and a new Lambda function deployment is triggered.

Nuxt.js static site

In this section, I walk through building a static site using the Nuxt.js framework. You can apply the same logic to any static site framework that requires you to run a build step prior to deploying.

To deploy this static site, we use the BucketDeployment construct. This is a construct that allows you to populate an S3 bucket with the contents of .zip files from other S3 buckets or from a local disk.

Typically, we simply tell the BucketDeployment construct where to find the files that it needs to deploy to the S3 bucket. See the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-directory')),
  ],
  destinationBucket: myBucket
});

To deploy a static site built with a framework like Nuxt.js, we need to first run a build step to compile the site into something that can be deployed. For Nuxt.js, we run the following two commands:

  • yarn install – Installs all our dependencies
  • yarn generate – Builds the application and generates every route as an HTML file (used for static hosting)

This creates a dist directory, which you can deploy to Amazon S3.

Just like with the Golang Lambda example, we can perform these steps as part of the AWS CDK through either local or Docker bundling.

Building inside a Docker container

To build inside a Docker container, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [
          'bash', '-c', [
            'yarn install',
            'yarn generate',
            'cp -r /asset-input/dist/* /asset-output/',
          ].join(' && '),
        ],
      },
    }),
  ],
  ...
});

For this post, we build inside the publicly available node:lts image hosted on DockerHub. Inside the container, we run our build commands yarn install && yarn generate, and copy the generated dist directory to our output directory (the cloud assembly).

The parameters are the same as described in the Golang example we walked through earlier.

Building locally

To build locally, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        local: {
          tryBundle(outputDir: string) {
            try {
              spawnSync('yarn --version');
            } catch {
              return false
            }

            spawnSync('yarn install && yarn generate');

            fs.copySync(path.join(__dirname, 'path-to-nuxtjs-project', 'dist'), outputDir);
            return true
          },
        },
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [],
      },
    }),
  ],
  ...
});

Building locally works the same as the Golang example we walked through earlier, with one exception. We have one additional command to run that copies the generated dist folder to our output directory (cloud assembly).

Conclusion

This post showed how you can easily compile your backend and front-end applications using the AWS CDK. You can find the example code for this post in this GitHub repo. If you have any questions or comments, please comment on the GitHub repo. If you have any additional examples you want to add, we encourage you to create a Pull Request with your example!

Our code also contains examples of deploying the applications using CDK Pipelines, so if you’re interested in deploying the example yourself, check out the example repo.

 

About the author

Cory Hall

Cory is a Solutions Architect at Amazon Web Services with a passion for DevOps and is based in Charlotte, NC. Cory works with enterprise AWS customers to help them design, deploy, and scale applications to achieve their business goals.

Field Notes: Integrating IoT and ITSM using AWS IoT Greengrass and AWS Secrets Manager – Part 1

Post Syndicated from Gary Emmerton original https://aws.amazon.com/blogs/architecture/field-notes-integrating-iot-and-itsm-using-aws-iot-greengrass-and-aws-secrets-manager-part-1/

IT Security is a hot topic in every organization, and in a hyper connected world the need to integrate thousands of IoT devices securely with many different systems at scale is critical.

AWS Secrets Manager helps customers manage their system credentials securely in the AWS Cloud, and with its integration with AWS IoT Greengrass, that capability now extends out to your edge-connected IoT devices.

In this two part blog post, I will walk through the steps to use this integration to give edge devices the capability to connect to and log incidents directly into ServiceNow. The credentials for connecting to ServiceNow are created in AWS Secrets Manager, and deployed locally (encrypted) to the edge device via AWS IoT Greengrass.

Part 1 (this post) gives an overview of the whole solution and the steps for setting up AWS Secrets Manager, creating the required IAM roles and AWS Lambda functions.  In part 2 of the blog we will then set up AWS IoT Greengrass and AWS IoT Core so that we can run the functions and access the secret on our edge device (Raspberry Pi).

Enabling edge devices to automatically raise incidents in your organization’s ITSM toolset ensures that you can use existing workflows and incident escalation paths for your edge devices.  Previously this would have been challenging to integrate. Additionally, by running this capability at the edge, it enables quicker responses and reduces the need to make calls back to the AWS Cloud.

Overview of solution

The solution makes use of a Raspberry Pi running AWS IoT Greengrass. AWS Lambda functions on AWS IoT Greengrass capture temperature and humidity sensor data and make calls directly to ServiceNow when thresholds are breached.  The integration of AWS Secrets Manager with AWS IoT Greengrass makes the credentials required for the ServiceNow API calls available locally; the credentials are deployed encrypted and are available on the Pi for the Lambda function to use.

The sensors for temperature and humidity in this example are on a Raspberry Pi Sense Hat, which illustrates sensors that could be used in an industrial or manufacturing use case.  You can use any type of sensor such as vibration, strain gauges, or other electro-mechanical sensors.

One of the Lambda functions running on AWS IoT Greengrass on the Raspberry Pi captures the sensor readings, and should either threshold be exceeded, it triggers a second Lambda function (again running on AWS IoT Greengrass). This then makes a ‘create incident’ API call to ServiceNow, using the credentials stored in AWS Secrets Manager.

In order to have visibility of the sensor data, and to manage communications between the first and second Lambda functions, all data from the sensors is published to one IoT Topic, and data related to any threshold breaches is published to another IoT Topic.

Following is a high-level diagram for the architecture used in this blog.

[Architecture diagram: high-level architecture for the ServiceNow integration]

Prerequisites

To complete the steps in this blog, you need:

  • An AWS Account
  • A ServiceNow developer instance or other test ServiceNow instance that you can access – you can sign up for a free ServiceNow developer account
  • A Raspberry Pi (I used a Pi 3B with Raspbian Buster)
  • A Raspberry Pi SenseHat
  • A workstation with the latest AWS Command Line Interface (CLI) installed

Additionally, ensure that you have Python 3.7.x installed with the following Python modules on the Raspberry Pi (Raspbian Buster includes Python 3.7 by default). Install the following packages as the root user (sudo pip):

  • greengrasssdk
  • boto3
  • requests
  • sense_hat
  • datetime

I have taken the approach of having these modules installed on the Pi in order to simplify the creation of the Lambda functions.  This allows the functions to run locally on the Pi without having to build all of the Python modules into the Lambda deployment packages and zip them to run in an AWS IoT Greengrass container.

Walkthrough

The steps in the walkthrough can be achieved from the AWS console. I have focused on the command line approach, using the AWS CLI, as this will give a more detailed view of what is happening and the dependencies between the different components.  The overall sequence for the steps is:

  • Create secret in AWS Secrets Manager – this will contain the credentials required to access ServiceNow for Lambda running on AWS IoT Greengrass
  • Create IAM role – provides permissions for AWS IoT Greengrass to other AWS services, including AWS Secrets Manager
  • Create Lambda functions – the functions that will capture sensor data and create the Service Now ticket
  • Configure and Deploy IoT Core – deploy our configuration to the Raspberry Pi, covered in Part 2 of this blog

I’ve structured the order of the steps in a logical sequence so that any dependencies of later steps are created first.  There are a number of places where a value in the output of one command (such as an ARN) needs to be noted because it is required for subsequent commands. At times the steps may seem counter-intuitive, but while developing this blog I found this sequence to be the most effective.

Create Secret

First, we create our ‘Secret’ in AWS Secrets Manager.  This consists of a secret string containing a JSON object for the username and password required for authentication to the API of my ServiceNow developer instance.  For the purposes of this example, we will use the default encryption key for the AWS Secrets Manager service.

The following command, from the AWS CLI, is used to create the new secret, entering the relevant username and password for the ServiceNow instance.

IMPORTANT: the name of the secret must start with “greengrass-” (specified in the command after the --name parameter) because the IAM Greengrass managed service role (which we will use later) has permission by default to access secrets that start with this text.

aws secretsmanager create-secret --name "greengrass-snow-creds" --description "Credentials for ServiceNow API access" --secret-string '{"username":"<username>","password":"<password>"}'

The successful completion of the preceding command results in output on your terminal screen containing the ARN (Amazon Resource Name) for the new secret.

Create IAM roles

In order for AWS IoT Greengrass to access our new secret and download it securely to the AWS IoT Greengrass Core running on the Pi, we need to give the service an IAM role. If we do not set this up correctly then there will be an error when trying to deploy the AWS IoT Greengrass group later in the walkthrough.

Our new IAM role needs a trust policy document that describes who is allowed to assume the role.  The first step is to create this policy document, which allows only the AWS IoT Greengrass service to assume the new role – to do this we create a text file called assume-role.json containing the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "greengrass.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then we run the following command, referencing the file just created:

aws iam create-role --role-name "IoTGGRole" --assume-role-policy-document file://assume-role.json

And finally we need to attach the IAM managed AWS IoT Greengrass service role to our new custom role:

aws iam attach-role-policy --role-name "IoTGGRole" --policy-arn arn:aws:iam::aws:policy/service-role/AWSGreengrassResourceAccessRolePolicy

We also need a role for our Lambda functions with basic execution permissions; create a text file called lambda-role.json containing the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then we run the following command, referencing the file just created:

aws iam create-role --role-name "IoTLambdaRole" --assume-role-policy-document file://lambda-role.json

And finally we need to attach the IAM managed Lambda basic execution role to our new custom role:

aws iam attach-role-policy --role-name "IoTLambdaRole" --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

Create Lambda Functions

We now create the Lambda functions that will be deployed to Greengrass – these functions manage the interactions between the sensors and ServiceNow.

Lambda Function 1 – Get/Publish Sensor Data

The first Lambda function reads the sensor data and publishes the data to an IoT Topic (IoTBlog/sensorData), every 5 seconds, which can then be used by downstream services for analytics.  This function also determines whether a threshold has been breached and if so, it publishes the data to a separate IoT Topic (IotBlog/anomaly) to which our second Lambda function is subscribed.

import os
import json
from datetime import datetime
import time
import sys
from sense_hat import SenseHat
import greengrasssdk
import boto3

client = greengrasssdk.client('iot-data')
secClient = greengrasssdk.client('secretsmanager')
sense = SenseHat()
sense.clear()

t_threshold = int(os.environ['tempLimit'])
h_threshold = int(os.environ['humidLimit'])

# This function is pinned (long-lived) in AWS IoT Greengrass, so the work
# happens in the loop below rather than in the handler
def lambda_handler(event, context):
    return
 
# Get sensor data and check against thresholds
def getSensorData():
    while True:
        eventTitle = "no event"
        anomaly = False
        ts = time.time()
        dt = datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
        
        temp     = round(sense.get_temperature(),2)
        humidity = round(sense.get_humidity(),2)
        
        time.sleep(5)

        if (temp > int(t_threshold)):
            anomaly = True
            eventTitle = "Temperature breach"
            
        if (humidity > int(h_threshold)):
            anomaly = True
            eventTitle = "Humidity breach"

        sensorData = { 'title': eventTitle,'dt':dt,'ts':ts,'t':temp,'h':humidity }
        publishData(anomaly,sensorData)

# Publish sensor data to IoT topic(s)
def publishData(anomaly,myData):
    response = client.publish(
        topic = 'IoTBlog/sensorData',
        payload = json.dumps(myData) )
    

We create a text file lambda_function.py containing the code above and then compress it into a zip file named lambda_function_1.zip.  Once this is done, we can create the function in AWS using the following command:

aws lambda create-function --function-name "1-IoT-GetSensorData" --runtime python3.7 --zip-file fileb://lambda_function_1.zip --handler lambda_function.lambda_handler –role arn:aws:iam::&lt;aws_account_number&gt;:role/IoTLambdaRole
In order to make use of Lambda functions in AWS IoT Greengrass we then need to publish a version of the function using the following command:
aws lambda publish-version --function-name "1-IoT-GetSensorData"

Lambda Function 2 – Create Anomaly Ticket

The second Lambda function is triggered through its subscription to the anomaly data published to the anomaly IoT Topic by the first Lambda function. This then makes a call to the ServiceNow API to create an incident.  Prior to making the API call, the function obtains the ServiceNow credentials from the secret that has been made available to AWS IoT Greengrass.

Because the secret is configured as a resource within the AWS IoT Greengrass group, it is automatically downloaded to the Raspberry Pi as part of the deployment of the AWS IoT Greengrass Core.

import os
import json
import sys
import greengrasssdk
import requests

client = greengrasssdk.client('iot-data')
secClient = greengrasssdk.client('secretsmanager')
secret_name = "greengrass-snow-creds"
snow_instance = os.environ['snowUrl']
msg = ""

def publishAnomaly(msg,title):
    auth = getLocalSecret()
    createTicket(auth,msg,title)

# Get secret from GG
def getLocalSecret():
    secret = secClient.get_secret_value(SecretId=secret_name)
    rawSecret = secret.get('SecretString')
    return json.loads(str(rawSecret))

# Create ticket in ServiceNow            
def createTicket(auth,eventData,title):
    API_ENDPOINT = snow_instance
    HEADERS = {"Content-Type":"application/json","Accept":"application/json"}
    PARAMS = { 
        "short_description":title,
        "assignment_group":"sensor_team",
        "urgency":"2",
        "impact":"2",
        "comments":eventData
    } 

    request = requests.post(url = API_ENDPOINT, auth=(str(auth["username"]),str(auth["password"])), headers=HEADERS, data = json.dumps(PARAMS))
    print("Response:",request)

def lambda_handler(event, context):
    msg = json.dumps(event)
    msg = json.loads(msg)
    title = "Sensor Threshold - " + msg["title"]
    
    publishAnomaly(msg,title)
    return

We create another text file, lambda_function.py, containing the code above and then compress it into a zip file named lambda_function_2.zip.  Once this is done, we can create the function in AWS using the following command:

aws lambda create-function --function-name "2-IoT-ServiceNow" --runtime python3.7 --zip-file fileb://lambda_function_2.zip --handler lambda_function.lambda_handler –role arn:aws:iam::&lt;account_number&gt;:role/IoTLambdaRole

In order to make use of Lambda functions in Greengrass we then need to publish a version of the function using the following command:

aws lambda publish-version --function-name "2-IoT-ServiceNow"

Conclusion

In this post, I showed you the steps for integrating IoT and ITSM by setting up AWS Secrets Manager, creating the required IAM roles and AWS Lambda functions.  Now you can proceed to part 2 of the blog to set up AWS IoT-Core and AWS IoT Greengrass to make use of the secret and functions that you created in this post.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Agile website delivery with Hugo and AWS Amplify

Post Syndicated from Nigel Harris original https://aws.amazon.com/blogs/devops/agile-website-delivery-with-hugo-and-aws-amplify/

In this post, we show how you can rapidly configure and deploy a website using Hugo, an AWS Cloud9 integrated development environment (IDE) for content editing, AWS CodeCommit for source code control, and AWS Amplify to implement a source code-controlled, automated deployment process.

When hosting a website on AWS, you can choose from several options. One popular option is to use Amazon Simple Storage Service (Amazon S3) to host a static website. If you prefer full access to the infrastructure hosting your website, you can use the NGINX Quick Start to quickly deploy web server infrastructure using AWS CloudFormation.

Static website generators such as Hugo and MkDocs accelerate the website content generation process, and can be a valuable tool when trying to rapidly deliver technical documentation or similar content. Without a generator, content creation typically requires programming in HTML and CSS.

Hugo is written in Go and available under the Apache 2.0 license. It provides several themes (collections of layouts) that accelerate website creation by drastically reducing the need to focus on format. You can author content in Markdown and output in multiple languages and formats (including ebook formats). Excellent examples of public websites built using Hugo include Digital.gov and Kubernetes.io.

 

Solution overview

This solution illustrates how to provision a hosted, source code-controlled Hugo generated website using CodeCommit and Amplify Console. The provisioned website is configured with a custom subdomain and an SSL certificate. We use an AWS Cloud9 IDE to enable content creation in the cloud.

 

Setting up an AWS Cloud9 IDE

Start by provisioning an AWS Cloud9 IDE. AWS Cloud9 environments run using Amazon Elastic Compute Cloud (Amazon EC2). You need to provision your AWS Cloud9 environment into an existing public subnet in an Amazon Virtual Private Cloud (Amazon VPC) within your AWS account. You can complete this in the following steps:

1. Access your AWS account with an identity that has administrative privileges. If you don’t have an AWS account, you can create one.

2. Create a new AWS Cloud9 environment using the wizard on the AWS Cloud9 console.

3. Enter a name for your environment and an optional description.

4. Choose Next step.

Naming your Cloud 9 environment

5. In the Environment settings section, for Environment type, select Create a new EC2 instance for environment (direct access).

6. For Instance type, select your preferred instance type (the default, t2.micro, works for this use case)

7. Under Network settings, for Network (VPC), choose a VPC that you wish to deploy your AWS Cloud9 instance into. You may wish to use your default VPC, which is suitable for the purpose of this tutorial.

8. Choose a public subnet from this VPC for deployment.

Cloud9 Settings

9. Leave all other settings unchanged and choose Next step.
10. Review your choices and choose Create environment.

Environment creation takes a few minutes to complete. When the environment is ready, you receive access to the AWS Cloud9 IDE in your browser. We return to it shortly to develop content for your Hugo website.

Your Cloud9 Desktop

Configuring a source code repository to track content changes

Static website generators enable rapid changes to website content and layout. Source control management (SCM) systems provide a revision history for your code, and allow you to revert to previous versions of a project when unintended changes are introduced. SCM systems become increasingly important as the velocity of change and the number of team members introducing change increases.

You now create a source code repository to track changes to your content. You use CodeCommit, a fully-managed source control service that hosts secure Git-based repositories.

1. In a new browser, sign in to the CodeCommit Console and create a new repository.

2. For Repository name, enter amplify-website.

3. For Description, enter an appropriate description.

4. Choose Create.

Create repository

Repository creation takes just a few moments.

5. In the Connection steps section, choose the appropriate method to connect to your repository based on how you accessed your AWS account.

For this post, I signed in to my AWS account using federated access, so I choose the HTTPS Git Remote CodeCommit (HTTPS-GRC) tab. This is the recommended connection method for this sign-in type. You can also configure a connection to your repository using SSH or Git credentials over HTTPS. SSH and Git credentials over HTTPS are appropriate methods if you have signed in to your AWS account as an AWS Identity and Access Management (IAM) user. The Amazon CodeCommit console provides additional information regarding each of these connection types, including links to supporting documentation.

Connect to Repo

 

Configuring and deploying an example website

You’re now ready to configure and deploy your website.

1. Return to the browser with your AWS Cloud9 IDE and place your cursor in the lower terminal pane of the IDE.

The terminal pane provides Bash shell access on the EC2 instance running AWS Cloud9.

You now create a Hugo website. The website design is based on Hugo-theme-learn. Themes are collections of Hugo layouts that take all the hassle out of building your website. Learn is a multilingual-ready theme authored by Mathieu Cornic, designed for building technical documentation websites.

Hugo provides a variety of themes on their website. Many of the themes include bundled example website content that you can easily adapt by following the accompanying theme documentation.

2. Enter the following code to download an existing example website stored as a .zip file, extract it, and commit the contents into CodeCommit from your AWS Cloud9 IDE:

cd ~/environment
aws s3 cp s3://ee-assets-prod-us-east-1/modules/3c5ba9cb6ff44465b96993d210f67147/v1/example-website.zip ~/environment/example-website.zip
unzip example-website.zip
rm example-website.zip

The following screenshot shows your output.

example website copy commands

 

Next, we run commands to create a directory to host the website and initialize a local Git repository on our AWS Cloud9 instance with a new default branch called main (formerly referred to as the master branch). We then copy files into place from the example website. After adding and committing them locally, we push all our changes to the remote AWS CodeCommit repository.

3. Enter the following code:

mkdir ~/environment/amplify-website/
cd ~/environment/amplify-website/
git init
git remote add origin codecommit::us-east-1://amplify-website
git remote -v
git checkout -b main
cp -rp ~/environment/example-website/* ~/environment/amplify-website/
git add *
git commit -am "first commit"
git push -u origin main

Deployment and hosting is achieved by using Amplify Console, a static web hosting service that accelerates your application release cycle by providing a simple CI/CD workflow for building and deploying static web applications.

4. On the Amplify console, under Deploy, choose Get Started.

Amplify banner

5. On the Get started with the Amplify Console page, select AWS CodeCommit as your source code repository.

6. Choose Continue.

Amplify get started page

7. On the Add repository branch page, for Recently updated repositories, choose your repository.

8. For Branch, choose main.

9. Choose Next.

add branch

On the Configure build settings page, Amplify automatically uses the amplify.yml file for build settings for your deployment. You committed this into your source code repository in the previous step. The amplify.yml file is detected from the root of your website directory structure.

10. Choose Next.

Amplify configure build settings

11. On the review page, choose Save and deploy.

Amplify builds and deploys your Amplify website within minutes, and shows you its progress. When deployment is complete, you can access the website to see the sample content.

amplify website

The following screenshot shows your example website.

sample website

 

Promoting changes to the website

We can now update the line of text in the home page and commit and publish this change.

1. Return to the browser with your AWS Cloud9 IDE and place your cursor in the lower terminal pane of the IDE.

2. On the navigation pane, choose the file ~/environment/amplify-website/workshop/content/_index.en.md.

The contents of the file open under a new tab in the upper pane.

3. Change the string First Line of Text to First Update to Website.

content change

4. From the File menu, choose Save to save the changes you have made to the _index.en.md file.

save content changes

5. Commit the changes and push to CodeCommit by running the following command in the lower terminal pane in AWS Cloud9:

git add *; git commit -am "homepage update"; git push origin main

The output in your AWS Cloud9 terminal should appear similar to the following screenshot.

commit output

6. Return to the Amplify Console and observe how the committed change in CodeCommit is automatically detected. Amplify runs deployment steps to push your changes to the website.

amplify deploy changes

7. Access the URL of your website after this update is complete to verify that the first line of text on your home page has changed.

updated website

You can repeat this process to make source-code controlled, automated changes to your website.

Adding a custom domain

Adding a custom domain to your Amplify configuration makes it easier for clients to access your content. You can register new domains using Amazon Route 53 or, if you have an existing domain registered outside of AWS, you can integrate it with Route 53 and Amplify. For our use case, the domain www.hugoonamplify.com was registered using a third-party registrar (NameCheap). You can manage DNS configurations for domains registered outside of AWS using Route 53.

Start by configuring a public hosted zone in Route 53.

1. On the Route 53 console, choose Hosted zones.

2. Choose Create hosted zone.

hosted zones

3. For Domain name, enter hugoonamplify.com.

4. For Description, enter an appropriate description.

5. For Type, select Public hosted zone.

hosted zones configuration

6. Choose Create hosted zone.

7. Save the addresses of the name servers that respond to client DNS lookup requests for the custom domain.

create hosted zone

8. In a separate browser, access the console of your DNS registrar.

9. Configure the custom DNS name servers setting in the console of the third-party domain name registrar.

This configuration specifies the Route 53 assigned name servers as the authoritative DNS for our custom domain. Propagation of this change may take up to 48 hours.

namecheap console

10. Use https://who.is to verify that the AWS name servers are listed correctly for your custom domain to internet clients.

whois lookup

You can now set up your custom domain in Amplify. Amplify helps you configure DNS and set up SSL for your desired custom domain.

domain management

11. On the Amplify Console, under App settings, choose Domain management.

12. Choose Add domain.

13. For Domain, enter your custom domain name (hugoonamplify.com).

14. Choose Configure domain.

15. For Subdomain, I only want to set up www and choose to exclude the root of my custom domain.

16. Choose Save.

Amplify begins the process of creating the SSL certificates. Amplify sends a notification that it’s issuing an SSL certificate to secure traffic to the custom domain.

ssl domain management

After a few moments, it proceeds to SSL configuration and indicates that ownership of domain is in progress.

ssl domain management configuration

Amplify verifies domain ownership by creating a sample CNAME record in your hosted zone file. When ownership is verified, the domain is propagated onto an Amazon CloudFront distribution managed by the Amplify service, and domain activation is complete.

ssl domain management configured

Clients can now access the website using the custom domain name www.hugoonamplify.com.

access website via custom domain

 

Establishing a subdomain for development

You can create a development website in Amplify that is aligned to a development code branch in CodeCommit, which enables testing changes prior to production release.

1. Access the AWS Cloud9 IDE and use the terminal to enter the following commands to create a development branch and push changes to CodeCommit using the current content from the main branch with a single content change:

git checkout -b development
git branch
git remote -v
git add *; git commit -am "first development commit";
git push -u origin development

2. Open and edit the file ~/environment/amplify-website/workshop/content/_index.en.md and change the string Update to Website to something else.

Alternatively, run the following Unix sed command from the terminal in AWS Cloud9 to make that content change:

sed -i 's/Update to Website/Update to Development/g' ~/environment/amplify-website/workshop/content/_index.en.md

3. Commit and push your change with the following code:

git add *; git commit -am "second development commit"; git push -u origin development

You now configure a subdomain in Amplify to allow developers to review changes.

4. Return to the amplify-website app.

5. Choose Connect branch.

connect branch

6. For Branch, choose the development branch you created and committed code into.

7. Choose Next.

add development branch

Amplify builds a second website based on the contents of the development branch. You can see the instance of your website matched to the development code branch on Amplify Console.

amplify two branches

8. Access the domain management menu item in your Amplify application to add a friendly subdomain.

9. Edit the domain and add a subdomain item with a name of your choice (for example, dev).

10. Associate it to the development branch containing the committed code and content changes.

11. Choose Add.

add dev domain

You can access the subdomain to verify the changes.

verify domain

Controlling access to development

You may wish to restrict access to new content as it’s deployed into the development website.

1. On Amplify Console, choose your application.

2. Choose Access control.

3. Under Access control settings, choose your preferred settings.

You have the option to restrict access globally or on a branch-by-branch basis. For this use case, we create a simple password protection for a user named developer on the development branch and site.

access control settings

 

Cleaning up

Unless you plan to keep the website you have constructed, you can quickly clean up provisioned assets and avoid any unnecessary costs.

1. On Amplify Console, select the app you created.

2. From the Actions drop-down menu, choose Delete app.

3. In the pop-up window, confirm the deletion.

4. On the CodeCommit dashboard, select the repository you created.

5. Choose Delete.

6. In the pop-up window, confirm the deletion.

7. On the AWS Cloud9 dashboard, select the IDE you created.

8. Choose Delete.

9. In the pop-up window, confirm the deletion.

 

Conclusion

Hugo is a powerful tool that enables accelerated delivery of content in a variety of formats including image portfolios, online resume presentation, blogging, and technical documentation. Amplify Console provides a convenient, easy-to-use, static web hosting service that can greatly accelerate delivery of static content.

When combining Hugo with Amplify Console, you can rapidly deploy websites in minutes with features such as friendly URLs, environments matched to code branches, and encryption (SSL). Visit gohugo.io to find out more about Hugo. For more information about how Amplify Console can help you rapidly deploy Hugo and other modern web applications, see the AWS Amplify Console User Guide.

Nigel Harris

Nigel Harris is an Enterprise Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on AWS architectures.

Field Notes: Customizing the AWS Control Tower Account Factory with AWS Service Catalog

Post Syndicated from Remek Hetman original https://aws.amazon.com/blogs/architecture/field-notes-customizing-the-aws-control-tower-account-factory-with-aws-service-catalog/

Many AWS customers who are managing hundreds or thousands of accounts know how complex and time consuming this process can be. To reduce the burden and simplify the process of creating new accounts, last year AWS released a new service, AWS Control Tower.

AWS Control Tower helps you automate the process of setting up a multi-account AWS environment (AWS Landing Zone) that is secure, well-architected, and ready to use. This Landing Zone is created following best practices established through AWS’ experience working with thousands of enterprises as they move to the cloud. This includes the configuration of AWS Organizations, centralized logging, federated access, mandatory guardrails, and networking.

Those elements are a good starting point to cover the initial configuration of the new account. For some organizations, the next step is to baseline a newly vended account to align it with the company policies and compliance requirements. This means creating or deploying the necessary roles, policies, governance controls, security groups, and so on.

In this blog post, I describe a solution that helps achieve consistent governance and compliance requirements across accounts created by AWS Control Tower. The solution uses the AWS Service Catalog as the repository of products that will be deployed into the new accounts.

Prerequisites and assumptions

Before I get into how it works, let’s first review a few key AWS Service Catalog concepts:

  • An AWS Service Catalog product is a blueprint for building your AWS resources that you want to make available for deployment on AWS along with the configuration information.
  • A portfolio is a collection of products, together with the configuration information.
  • A provisioned product is an AWS CloudFormation stack.
  • Constraints control the way users can deploy a product. With launch constraints, you can specify a role that the AWS Service Catalog can assume to launch a product.
  • Review AWS Service Catalog reference blueprints for a quick way to set up and configure AWS Service Catalog portfolios and products.

You need an AWS Service Catalog portfolio with the products that you plan to deploy to the accounts. The portfolio has to be created in the AWS Control Tower primary account. If you don’t have a portfolio, the starting point would be to review the resources listed above.

Solution Overview

This solution supports the following scenarios:

  • Deployment of products to the newly vended AWS Control Tower account (Figure 1)
  • Update and deployment of products to existing accounts (Figure 2)

Figure 1 – Deployment to new account

The architecture diagram in figure 1 shows the process of deploying products to the new account.

  1. AWS Control Tower creates a new account
  2. Once an account is created successfully, Amazon CloudWatch Events trigger an AWS Lambda function
  3. The Lambda function pulls the configuration from an Amazon S3 bucket and:
    • Validates configuration
    • Grants Lambda role access to portfolio(s)
    • Creates StackSet constraints for products
  4. Lambda calls the AWS Step Function
  5. The AWS Step Function orchestrates deployment of the AWS Service Catalog products and monitors progress

Figure 2 – Update or deployment to an existing account

The architecture diagram in figure 2 shows process of updating product or deploying product to the existing account.

  1. User uploads update configuration to an Amazon S3 Bucket
  2. Update triggers AWS Lambda Function
  3. Lambda function reads uploaded configuration and:
    • Validates configuration
    • Grants Lambda role access to portfolio(s)
    • Creates StackSet constraints for products
  4. Calls the AWS Step Function
  5. The AWS Step Function orchestrates the update/deployment of the AWS Service Catalog products and monitors progress

Deployment and configuration

Before proceeding with the deployment, you must install Python3.

Then, follow these steps:

  1. Download the solution from GitHub
  2. Go to the src folder and run the following command: pip3 install -r requirements.txt -t .
  3. Zip the content of the src folder into a file named control-tower-account-factory-solution.zip
  4. Upload zip file to your Amazon S3 bucket created in the AWS Region where you want to deploy the solution
  5. Launch the AWS CloudFormation template

AWS CloudFormation Template Parameters:

  • ConfigurationBucketName – name of the Amazon S3 bucket from which the AWS Lambda function should pull the configuration file. You can provide the name of an existing bucket.
  • CreateConfigurationBucket – set to true if you want to create the bucket specified in the previous parameter. If the bucket already exists, set the value to false.
  • ConfigurationFileName – name of the configuration file for a new account. Default value: config.yml
  • UpdateFileName – name of the configuration file for updates. Default value: update.yml
  • SourceCodeBucketName – name of the bucket where you uploaded the zipped Lambda code
  • SourceCodePackageName – name of the zipped file containing the Lambda code. If the file was uploaded to a folder, include the name of the folder(s) as well. Example: my_folder/control-tower-account-factory-solution.zip
  • BaselineFunctionName – name of the AWS Lambda function. Default value: control-tower-account-factory-lambda
  • BaselinetLambdaRoleName – name of the AWS Lambda function IAM role. Default value: control-tower-account-factory-lambda-role
  • StateMachineName – name of the AWS Step Function. Default value: control-tower-account-factory-state-machine
  • StateMachineRoleName – name of the AWS Step Function IAM role. Default value: control-tower-account-factory-state-machine-role
  • TrackingTableName – name of the DynamoDB table used to track all deployments and updates. Default value: control-tower-account-factory-tracking-table
  • MaxIterations – maximum number of iterations of the AWS Step Function before it reports a time out. Default value: 30. This can be overwritten in the configuration file. For more information, see the Configuration section.
  • TopicName – (optional) name of the SNS topic that will be used for sending notifications. Default value: control-tower-account-factory-account-notification
  • NotificationEmail – (optional) the email address to which notifications are sent. If this isn’t provided, the SNS topic won’t be created.

Configuration Files

The configuration files have to be in YAML format. You can use the following schemas for new or existing accounts:

Configuration schema for new account:

The configuration can apply to all new accounts or can be divided based on the organization unit associated with the new account. Also, you can mix deployments where some products are always deployed and some only to specific organization units.

Example configuration file

Configuration schema for existing accounts

This configuration can be used to update provisioned products or deploy products into existing accounts. You can specify the target location either as account(s), organization unit(s), or both. If you specify an organization unit as a target, the product(s) will be updated/deployed to all accounts associated with that organization unit.

If the product doesn’t exist in the account, the Lambda function attempts to deploy it. You can manage this by setting the parameter ‘deployifnotexist’ to true. If omitted or set to false, Lambda won’t provision the product to an existing account.

Example configuration file

All products deployed by the AWS Service Catalog are provisioned under their own name that has to be unique across all provisioned products. In this example, products are provisioned under the following naming convention:

<account id where product is provision>-<”provision_name” from configuration file>

Example:  123456789-my-product

In the configuration file you can specify dependencies between products. Dependencies need to be provided as a list of provisioned product names, not product names. If provided, deployment of the product is suspended until all of its dependencies have deployed successfully.

The Step Function runs in a loop with an interval of 1 minute, checking whether the dependencies have been deployed. In the event of an error in the dependency naming or configuration, the step function iterates only until it reaches the maximum iterations defined in the configuration file. If the maximum iterations are reached, the step function reports a time out and interrupts the products’ deployment.
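As a conceptual sketch only (the actual implementation lives in the GitHub repository), the dependency gate applied on each iteration could look like the following; the depends_on field name and helper shape are assumptions for illustration.

# Conceptual sketch of the per-iteration dependency check. A product is only
# released for deployment once every provisioned product name it depends on
# has already deployed successfully.

def ready_to_deploy(product, deployed_product_names):
    """product['depends_on'] is assumed to hold provisioned product names."""
    return all(dep in deployed_product_names for dep in product.get("depends_on", []))

# Example: with only "123456789-my-product" deployed so far, a product that also
# depends on "123456789-my-vpc" is held back until a later iteration.
pending = {"provision_name": "app-stack", "depends_on": ["123456789-my-product", "123456789-my-vpc"]}
print(ready_to_deploy(pending, {"123456789-my-product"}))  # False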

Important: if you are updating a product by adding a new AWS Region, you cannot specify the dependency or run updates in existing Regions at the same time. In this scenario you should break the update into two steps:

  • Upload configuration to create dependencies in the new region
  • Upload configuration to create products only in the new region

You can find different examples of configuration files in GitHub.

Note: The name of the configuration file and Amazon S3 location must match the values provided in the AWS CloudFormation template during solution deployment.

AWS Lambda functions deployment considerations

When the product being deployed is a Lambda function, you need to consider two requirements:

  1. The source code of the Lambda function needs to be in the Amazon S3 bucket created in the same AWS region where you are planning to deploy a Lambda function
  2. The destination account needs to have permission to the source Amazon S3 bucket

To accommodate both requirements, the approach is to create an Amazon S3 bucket under the AWS Control Tower primary account. There is one Amazon S3 bucket in each Region where the functions will be deployed.

Each deployment bucket should have attached the following policy:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<S3 bucket name>/*",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "<AWS Organizations Id>"
                },
                "StringLike": {
                    "aws:PrincipalArn": [
                        "arn:aws:iam::*:role/AWSControlTowerExecution",
        "arn:aws:iam::*:role/< name of the Lambda IAM role – default: control-tower-account-factory-lambda-role>"
                    ]
                }
            }
        }
    ]
}

This allows the deployment role to access Lambda’s source code from any account in your AWS Organizations.

Conclusion

In this blog post, I’ve outlined a solution to help you drive consistent governance and compliance requirements across accounts vended through AWS Control Tower. This solution provides enterprises with mechanisms to manage all deployments from a centralized account, and it reduces the need to maintain multiple separate CI/CD pipelines. Therefore you can simplify deployments and reduce deployment time in a multi-account environment.

This solution allows you to keep provisioned products up to date by updating existing products or bringing old accounts to the same compliance level as the new accounts.

Since this solution supports deployment to existing accounts and can be run without AWS Control Tower, it can be used to deploy any AWS Service Catalog product in either single- or multi-account environments. The solution then becomes an integral part of a CI/CD pipeline.

For more information, review the AWS Control Tower documentation and AWS Service Catalog documentation. Also, review the links listed in the “Prerequisites and assumptions” section of this post.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Building event-driven architectures with Amazon SNS FIFO

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-event-driven-architectures-with-amazon-sns-fifo/

This post is courtesy of Christian Mueller, Principal Solutions Architect.

Developers increasingly adopt event-driven architectures to decouple their distributed applications. Often, these events must be propagated in a strictly ordered manner to all subscribed applications. Using Amazon SNS FIFO topics and Amazon SQS FIFO queues, you can address use cases that require end-to-end message ordering, deduplication, filtering, and encryption.

In this blog post, I introduce a sample event-driven architecture. I walk through an implementation based on Amazon SNS FIFO topics and Amazon SQS FIFO queues.

Common requirements in event-driven-architectures

In event-driven architectures, data consistency is a common business requirement. This is often translated into technical requirements such as zero message loss and strict message ordering. For example, if you update your domain object rapidly, you want to be sure that all events are received by each subscriber in exactly the order they occurred. This way, the current domain object state is what each subscriber received as the latest update event. Similarly, all update events should be received after the initial create event.

Before Amazon SNS FIFO, architects had to design applications to check if messages are received out of order before processing.

Comparing SNS and SNS FIFO

Another common challenge is preventing message duplicates when sending events to the messaging service. If an event publisher receives an error, such as a network timeout, the publisher does not know if the messaging service could receive and successfully process the message or not.

The client may retry, as this is the default behavior for some HTTP response codes in AWS SDKs. This can cause duplicate messages.

Before Amazon SNS FIFO, developers had to design receivers to be idempotent. In cases where processing an event is not naturally idempotent, the receiver has to be made idempotent explicitly. Often, this is done by adding a key-value store like Amazon DynamoDB or Amazon ElastiCache for Redis to the service. Using this approach, the receiver can track whether an event has been seen before.

Exactly once processing and message deduplication
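As a minimal sketch of that pre-FIFO approach (not part of the sample application), the following receiver uses a DynamoDB conditional write to record event IDs it has already processed; the ProcessedEvents table name and its eventId key are assumptions for illustration.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
DEDUP_TABLE = "ProcessedEvents"  # hypothetical table with partition key "eventId"

def process_once(event_id, handler, payload):
    """Invoke handler(payload) only if this event ID has not been seen before."""
    try:
        # The conditional put fails if the event ID is already recorded
        dynamodb.put_item(
            TableName=DEDUP_TABLE,
            Item={"eventId": {"S": event_id}},
            ConditionExpression="attribute_not_exists(eventId)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate delivery - skip processing
        raise
    handler(payload)
    return True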

Exploring the recruiting agency example

This sample application models a recruitment agency with a job listings website. The application is composed of multiple services. I explain 3 of them in more detail.

Sample application architecture

A custom service, the anti-corruption service, receives a change data capture (CDC) event stream of changes from a relational database. This service translates the low-level technical database events into meaningful business events for the domain services for easy consumption. These business events are sent to the SNS FIFO “JobEvents.fifo“ topic. Here, interested services subscribe to these events and process them asynchronously.

In this domain, the analytics service is interested in all events. It has an SQS FIFO “AnalyticsJobEvents.fifo” queue subscribed to the SNS FIFO “JobEvents.fifo” topic. It uses SQS FIFO as an event source for AWS Lambda, which processes and stores these events in Amazon S3. Amazon S3 is an object storage service with high scalability, data availability, durability, security, and performance. This allows you to use services like Amazon EMR, AWS Glue, or Amazon Athena to get insights into your data and extract value.

The inventory service owns an SQS FIFO “InventoryJobEvents.fifo” queue, which is subscribed to the SNS FIFO “JobEvents.fifo“ topic. It is only interested in “JobCreated” and “JobDeleted” events, as it only tracks which jobs are currently available and stores this information in a DynamoDB table. Therefore, it uses an SNS filter policy to only receive these events, instead of receiving all events.

This sample application focuses on the SNS FIFO capabilities, so I do not explore other services subscribed to the SNS FIFO topic. This sample follows the SQS best practices and SNS redrive policy recommendations and configures dead-letter queues (DLQ). This is useful in case SNS cannot deliver an event to the subscribed SQS queue. It also helps if the function fails to process an event from the corresponding SQS FIFO queue multiple times. As a requirement in both cases, the attached SQS DLQ must be an SQS FIFO queue.

Deploying the application

To deploy the application using infrastructure as code, it uses the AWS Serverless Application Model (SAM). SAM provides shorthand syntax to express functions, APIs, databases, and event source mappings. It is expanded into AWS CloudFormation syntax during deployment.

To get started, clone the “event-driven-architecture-with-sns-fifo” repository, from here. Alternatively, download the repository as a ZIP file from here and extract it to a directory of your choice.

As a prerequisite, you must have SAM CLI, Python 3, and PIP installed. You must also have the AWS CLI configured properly.

Navigate to the root directory of this project and build the application with SAM. SAM downloads required dependencies and stores them locally. Execute the following commands in your terminal:

git clone https://github.com/aws-samples/event-driven-architecture-with-amazon-sns-fifo.git
cd event-driven-architecture-with-amazon-sns-fifo
sam build

You see the following output:

Deployment output

Now, deploy the application:

sam deploy --guided

Provide arguments for the deployments, such as the stack name and preferred AWS Region:

SAM guided deployment

After a successful deployment, you see the following output:

Successful deployment message

Learning more about the implementation

I explore the three services forming this sample application, and how they use the features of SNS FIFO.

Anti-corruption service

The anti-corruption service owns the SNS FIFO “JobEvents.fifo” topic, where it publishes business events related to job postings. It uses an SNS FIFO topic, as end-to-end ordering per job ID is required. SNS FIFO is configured not to perform content-based deduplication, as I require a unique message deduplication ID for each event for deduplication. The corresponding definition in the SAM template looks like this:

  JobEventsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: JobEvents.fifo
      FifoTopic: true
      ContentBasedDeduplication: false

For simplicity, the anti-corruption function in the sample application doesn’t consume an external database CDC stream. It uses Amazon CloudWatch Events as an event source to trigger the function every minute.

I provide the SNS FIFO topic Amazon Resource Name (ARN) as an environment variable in the function. This makes this function more portable to deploy in different environments and stages. The function’s AWS Identity and Access Management (IAM) policy grants permissions to publish messages to only this SNS topic:

  AntiCorruptionFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: anti-corruption-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          TOPIC_ARN: !Ref JobEventsTopic
      Policies:
        - SNSPublishMessagePolicy:
            TopicName: !GetAtt JobEventsTopic.TopicName
      Events:
        Trigger:
          Type: Schedule
          Properties:
            Schedule: 'rate(1 minute)'

The anti-corruption function uses features in the SNS publish API, which allows you to define a “MessageDeduplicationId” and a “MessageGroupId”. The “MessageDeduplicationId” is used to filter out duplicate messages, which are sent to SNS FIFO within the 5-minute deduplication interval. The “MessageGroupId” is required, as SNS FIFO processes all job events for the same message group in a strictly ordered manner, isolated from other message groups processed through the same topic.

Another important aspect in this implementation is the use of “MessageAttributes”. We define a message attribute with the name “eventType” and values like “JobCreated”, “JobSalaryUpdated”, and “JobDeleted”. This allows subscribers to define SNS filter policies to only receive certain events they are interested in:

import boto3
from datetime import datetime
import json
import os
import random
import uuid

TOPIC_ARN = os.environ['TOPIC_ARN']

sns = boto3.client('sns')

def lambda_handler(event, context):
    jobId = str(random.randrange(0, 1000))

    send_job_created_event(jobId)
    send_job_updated_event(jobId)
    send_job_deleted_event(jobId)
    return

def send_job_created_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f'Job {jobId} created',
        MessageDeduplicationId=messageId,
        MessageGroupId=f'JOB-{jobId}',
        Message={...},
        MessageAttributes = {
            'eventType': {
                'DataType': 'String',
                'StringValue': 'JobCreated'
            }
        }
    )
    print('sent message and received response: {}'.format(response))
    return

def send_job_updated_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(...)
    print('sent message and received response: {}'.format(response))
    return

def send_job_deleted_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(...)
    print('sent message and received response: {}'.format(response))
    return

Analytics service

The analytics service owns an SQS FIFO “AnalyticsJobEvents.fifo” queue which is subscribed to the SNS FIFO “JobEvents.fifo” topic. Following best practices, I define redrive policies for the SQS FIFO queue and the SNS FIFO subscription in the template:

  AnalyticsJobEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: AnalyticsJobEvents.fifo
      FifoQueue: true
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt AnalyticsJobEventsQueueDLQ.Arn
        maxReceiveCount: 3

  AnalyticsJobEventsQueueToJobEventsTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt AnalyticsJobEventsQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref JobEventsTopic
      RedrivePolicy: !Sub '{"deadLetterTargetArn": "${AnalyticsJobEventsSubscriptionDLQ.Arn}"}'

The analytics function uses SQS FIFO as an event source for Lambda. The S3 bucket name is an environment variable for the function, which increases the code portability across environments and stages. The IAM policy for this function only grants permissions write objects to this S3 bucket:

  AnalyticsFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: analytics-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          BUCKET_NAME: !Ref AnalyticsBucket
      Policies:
        - S3WritePolicy:
            BucketName: !Ref AnalyticsBucket
      Events:
        Trigger:
          Type: SQS
          Properties:
            Queue: !GetAtt AnalyticsJobEventsQueue.Arn
            BatchSize: 10

View the function implementation at the GitHub repo.
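The linked implementation is the authoritative version. As a rough sketch of the pattern it follows – an SQS FIFO-triggered handler writing each job event to the S3 bucket named by the BUCKET_NAME environment variable – the handler could look similar to this; the object key layout is an assumption for illustration.

import json
import os
import boto3

s3 = boto3.client("s3")
BUCKET_NAME = os.environ["BUCKET_NAME"]

def lambda_handler(event, context):
    # Each record in the batch is one job event delivered through the SQS FIFO queue
    for record in event["Records"]:
        job_event = json.loads(record["body"])
        # Store the raw event under a key derived from the SQS message ID (assumed layout)
        s3.put_object(
            Bucket=BUCKET_NAME,
            Key="job-events/{}.json".format(record["messageId"]),
            Body=json.dumps(job_event).encode("utf-8"),
        )
    return {"processed": len(event["Records"])}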

Inventory service

The inventory service also owns an SQS FIFO “InventoryJobEvents.fifo” queue which is subscribed to the SNS FIFO “JobEvents.fifo” topic. It uses redrive policies for the SQS FIFO queue and the SNS FIFO subscription as well. This service is only interested in certain events, so it uses an SNS filter policy to specify these events:

  InventoryJobEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: InventoryJobEvents.fifo
      FifoQueue: true
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt InventoryJobEventsQueueDLQ.Arn
        maxReceiveCount: 3

  InventoryJobEventsQueueToJobEventsTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt InventoryJobEventsQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref JobEventsTopic
      FilterPolicy: '{"eventType":["JobCreated", "JobDeleted"]}'
      RedrivePolicy: !Sub '{"deadLetterTargetArn": "${InventoryJobEventsQueueSubscriptionDLQ.Arn}"}'

The inventory function also uses SQS FIFO as event source for Lambda. The DynamoDB table name is set as an environment variable, so the function can look up the name during initialization. The IAM policy grants read/write permissions for only this table:

  InventoryFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: inventory-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          TABLE_NAME: !Ref InventoryTable
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref InventoryTable
      Events:
        Trigger:
          Type: SQS
          Properties:
            Queue: !GetAtt InventoryJobEventsQueue.Arn
            BatchSize: 10

View the function implementation at the GitHub repo.
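Again, the repository holds the actual code. A hedged sketch of the approach – reading the eventType message attribute delivered with each SQS record and keeping the DynamoDB table (TABLE_NAME) in sync – might look like the following; the jobId key and message body fields are assumptions for illustration.

import json
import os
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])

def lambda_handler(event, context):
    for record in event["Records"]:
        # With raw message delivery enabled, the SNS message attributes arrive
        # as SQS message attributes on each record
        event_type = record["messageAttributes"]["eventType"]["stringValue"]
        job = json.loads(record["body"])  # assumed to contain a job identifier

        if event_type == "JobCreated":
            # Track the job as currently available
            table.put_item(Item={"jobId": job["jobId"], "details": job})
        elif event_type == "JobDeleted":
            table.delete_item(Key={"jobId": job["jobId"]})
    return {"processed": len(event["Records"])}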

Conclusion

Amazon SNS FIFO topics can simplify the design of event-driven architectures and reduce custom code in building such applications.

By using the native integration with Amazon SQS FIFO queues, you can also build architectures that fan out to thousands of subscribers. This pattern helps achieve data consistency, deduplication, filtering, and encryption in near real time, using managed services.

For information on regional availability and service quotas, see SNS endpoints and quotas and SQS endpoints and quotas. For more information on the FIFO functionality, see SNS FIFO and SQS FIFO in their Developer Guides.

Field Notes: Migrating a Self-managed Kubernetes Cluster on Amazon EC2 to Amazon EKS

Post Syndicated from Ahmed Bham original https://aws.amazon.com/blogs/architecture/field-notes-migrating-a-self-managed-kubernetes-cluster-on-ec2-to-amazon-eks/

AWS customers from startups to enterprises have been successfully running Kubernetes clusters on Amazon EC2 instances since 2015, well before Amazon Elastic Kubernetes Service (Amazon EKS), was launched in 2018. As a fully managed Kubernetes service, Amazon EKS customers can run Kubernetes on AWS without needing to install, operate, and maintain their own Kubernetes control plane. Since its launch, many existing and new customers are building and running their Kubernetes clusters on Amazon EKS.

At re:Invent 2019, AWS announced AWS Fargate for Amazon EKS, which is serverless compute for containers. Then, in January 2020, AWS announced a price reduction of Amazon EKS by 50% to $0.10 per hour, per cluster. These developments, coupled with the realization of the management and cost overhead of Kubernetes control plane operations at scale, made more customers look into migrating their self-managed Kubernetes clusters to Amazon EKS.

The “how” of this migration is the focus of this blog.

Overview of Solution

For most customers migrating from self-managed Kubernetes clusters on Amazon EC2 to Amazon EKS can usually be accomplished with minimal or no downtime. However, for large clusters involving hundreds of nodes and thousands of pods, this requires more planning and testing, and it is recommended to engage AWS Support for guidance.

Kubernetes control plane

Considerations

There are certain considerations to ensure a successful Amazon EKS migration and operational excellence.

Security

  • Access Control
    • Amazon EKS uses AWS Identity and Access Management (IAM) to provide authentication to your Kubernetes cluster, but it still relies on native Kubernetes Role Based Access Control (RBAC) for authorization. It’s important to plan for the creation and governance of IAM users, roles, or groups, for Kubernetes cluster administration.
    • You can enable private access to the Kubernetes API server so that all communication between your nodes and the API server stays within your VPC. You can limit the IP addresses that can access your API server from the internet, or completely disable internet access to the API server.
  • IAM Role for Service Account
    • With IAM roles for service accounts on Amazon EKS clusters, you can associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the containers in any pod that uses that service account. With this feature, you no longer need to provide extended permissions to the node IAM role so that pods on that node can call AWS APIs.
  • Security groups for pods
    • Security groups for pods integrate Amazon EC2 security groups with Kubernetes pods. You can use Amazon EC2 security groups to define rules that allow inbound and outbound network traffic to and from pods. These pods are deployed to nodes running on many Amazon EC2 instance types.

Networking

  • Amazon EKS supports native VPC networking with the Amazon VPC Container Network Interface (CNI) plugin for Kubernetes. Using this plugin allows Kubernetes pods to have the same IP address inside the pod as they do on the VPC network.
  • The VPC CNI plugin uses IP addresses for the pods from the VPC CIDR ranges, specifically from the subnet where the worker node is hosted. Therefore, customers must ensure that the VPC and subnets that will host their Amazon EKS cluster have sufficient IP addresses available for the expected number of pods running at a time. Additionally, IP addresses are allocated to the Elastic Network Interfaces (ENIs) attached to the EC2 instances. The EC2 instance selection for the worker nodes should take into account the number of ENI attachments supported by the instance type, as illustrated in the sketch after this list.
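As a quick sizing aid, the following sketch estimates per-node pod capacity, assuming the standard formula used with the Amazon VPC CNI plugin (ENIs × (IP addresses per ENI − 1) + 2); the example instance values should be verified against the AWS documentation for your instance types.

# Rough sizing helper for worker nodes using the Amazon VPC CNI plugin.
# Assumed formula: max_pods = ENIs x (IPv4 addresses per ENI - 1) + 2

def max_pods(eni_count, ips_per_eni):
    """Approximate pod capacity of a worker node."""
    return eni_count * (ips_per_eni - 1) + 2

# Example values - verify ENI and IP limits for your instance types in the AWS documentation
instance_limits = {
    "m5.large": (3, 10),
    "m5.xlarge": (4, 15),
}

for instance_type, (enis, ips) in instance_limits.items():
    print("{}: ~{} pods per node".format(instance_type, max_pods(enis, ips)))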

Compute Options

Kubernetes Versions

  • Amazon EKS supports four major Kubernetes versions at a time,  which you can review in the available AWS documentation, along with a calendar for future Amazon EKS releases.
  • If you are currently running a non-supported Kubernetes cluster, or would like to migrate to a newer version on Amazon EKS, consider the following:
    1. Review release notes for specific Kubernetes version you want to migrate to, and make necessary updates to Kubernetes manifest files.
    2. Update your Kubernetes add-ons (CNI plugin, CoreDNS, Kube-Proxy) to compatible versions, as listed in the Updating an Amazon EKS Cluster guide.

Prerequisites

  1. Create an IAM Role for the creator and/or administrator of the Amazon EKS cluster. Specify this role when creating the Amazon EKS cluster.
  2. If using an existing VPC and subnets to host Amazon EKS cluster:
    • You will need subnets in at least two Availability Zones
    • All public subnets should have the property MapPublicIpOnLaunch enabled (that is, Auto-assign public IPv4 address in the AWS Management Console) to host self-managed and managed node groups.

3. If your pods are currently accessing AWS resources, and if you would like to use IAM roles for service accounts, then:

    • Create service accounts in Kubernetes to be used by your pods.
    • Follow these steps to create IAM roles, and assign to service account created.
    • Update your pod manifest files to specify the newly created service account and role ARN annotation. Remove any existing code for storing or passing IAM credentials.

4. If you are planning to use AWS Fargate to run your pods, you need to create the appropriate Fargate Profile and pod execution role.

Application and Data Migration

  • For stateless workloads, apply your resource definitions (YAML or Helm) to the new cluster, and make sure everything works as expected. This includes the connection to resources external to the cluster.
  • For stateful workloads:
    1. You will need to carefully plan your migration to avoid data loss or unexpected downtime.
    2. If you are currently using shared persistent file storage based on Amazon EFS or Amazon FSx for Lustre, they can be mounted to Amazon EKS pods concurrently. Just make sure that pods don’t write to the same files concurrently.
    3. For pods using EBS volumes, and for other persistent storage types, you can use a Kubernetes backup and restore tool, Velero.

Traffic Migration

If you have an entire domain migration that you would like to smoothly migrate, you can take advantage of Amazon Route 53’s Weighted Routing (as shown in the following diagram). With Weighted Routing, you are able to have a progressive transition from your existing cluster to the new one with zero downtime by splitting the traffic at the DNS level.

Your customers are slowly being transferred to your new cluster as their cached TTL expires. The split could start with a small share of your customers, for example, 10% being pointed to the new Amazon EKS cluster and 90% still on the old one. As soon as traffic is confirmed to be working as expected on the new cluster, that percentage of clients pointed to the new one can be increased.

mydomain.com

 

This implementation is flexible: it can be tied to Load Balancers, EC2 instances, and even to external on-premises infrastructure.
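As a hedged illustration of the weighted DNS split described above, the following boto3 sketch upserts two weighted records – 90% to the existing cluster and 10% to the new Amazon EKS cluster; the hosted zone ID, record name, and load balancer DNS names are placeholder assumptions.

import boto3

route53 = boto3.client("route53")

# Placeholder values - replace with your hosted zone ID, record name, and cluster endpoints
HOSTED_ZONE_ID = "Z0123456789EXAMPLE"
RECORD_NAME = "www.mydomain.com."

def weighted_record(set_identifier, target, weight):
    """Build a weighted CNAME change pointing at one cluster's load balancer."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": RECORD_NAME,
            "Type": "CNAME",
            "SetIdentifier": set_identifier,
            "Weight": weight,
            "TTL": 60,
            "ResourceRecords": [{"Value": target}],
        },
    }

# Send 90% of traffic to the existing cluster and 10% to the new Amazon EKS cluster
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "Progressive migration to Amazon EKS",
        "Changes": [
            weighted_record("self-managed-cluster", "old-cluster-lb.example.com", 90),
            weighted_record("eks-cluster", "new-eks-lb.example.com", 10),
        ],
    },
)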

Conclusion

In this blog post, we showed how to migrate your live-traffic serving self-hosted Kubernetes Cluster to Amazon EKS. Amazon EKS offers a cost-effective and highly available option for running Kubernetes clusters in the cloud. Since Amazon EKS is upstream Kubernetes compliant, you can migrate existing self-managed Kubernetes workloads to Amazon EKS, with multiple options to minimize or avoid service disruption. To create your first Amazon EKS cluster, visit Getting started with Amazon EKS.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

 

Fire Dynamics Simulation CFD workflow using AWS ParallelCluster, Elastic Fabric Adapter, Amazon FSx for Lustre and NICE DCV

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/fire-dynamics-simulation-cfd-workflow-using-aws-parallelcluster-elastic-fabric-adapter-amazon-fsx-for-lustre-and-nice-dcv/

This post was written by Kevin Tuil, AWS HPC consultant.

Modeling fires is key for many industries, from the design of new buildings, defining evacuation procedures for trains, planes and ships, and even the spread of wildfires. Modeling these fires is complex. It involves both the need to model the three-dimensional unsteady turbulent flow of the fire and the many potential chemical reactions. To achieve this, the fire modeling community has moved to higher-fidelity turbulence modeling approaches such as the Large Eddy Simulation, which requires both significant temporal and spatial resolution. It means that the computational cost for these simulations is typically in the order of days to weeks on a single workstation.

While there are a number of software packages, one of the most popular is the open-source code Fire Dynamics Simulation (FDS), developed by the National Institute of Standards and Technology (NIST).

In this blog, I focus on how AWS High Performance Computing (HPC) resources (for example, AWS ParallelCluster, Amazon FSx for Lustre, Elastic Fabric Adapter (EFA), and Amazon S3) allow FDS users to scale up beyond a single workstation to hundreds of cores, achieving simulation times of hours rather than days or weeks. I also outline the architecture needed, providing scripts and templates to compile FDS and run your simulation.

Service and solution overview

AWS ParallelCluster

AWS ParallelCluster is an open source cluster management tool that simplifies deploying and managing HPC clusters with Amazon FSx for Lustre, EFA, a variety of job schedulers, and the MPI library of your choice. AWS ParallelCluster simplifies cluster orchestration on AWS so that HPC environments become easy-to-use, even if you are new to the cloud. AWS released AWS ParallelCluster 2.9.1 and its user guide – which is the version I use in this blog.

These three AWS HPC resources (AWS ParallelCluster, EFA, and Amazon FSx for Lustre) are optimal for Fire Dynamics Simulation. Together, they provide easy deployment of HPC systems on AWS, low-latency network communication for MPI workloads, and a fast, parallel file system.

Elastic Fabric Adapter

EFA is a critical service that provides low latency and high-bandwidth 100 Gbps network communication. EFA allows applications to scale at the level of on-premises HPC clusters with the on-demand elasticity and flexibility of the AWS Cloud. Computational Fluid Dynamics (CFD), among other tightly coupled applications, is an excellent candidate for the use of EFA.

Amazon FSx for Lustre

Amazon FSx for Lustre is a fully managed, high-performance file system, optimized for fast processing workloads, like HPC. Amazon FSx for Lustre allows users to access and alter data from either Amazon S3 or on-premises seamlessly and exceptionally fast. For example, you can launch and run a file system that provides sub-millisecond latency access to your data. Additionally, you can read and write data at speeds of up to hundreds of gigabytes per second of throughput, and millions of IOPS. This speed and low-latency unleash innovation at an unparalleled pace. This blog post uses the latest version of Amazon FSx for Lustre, which recently added a new API for moving data in and out of Amazon S3. This API also includes POSIX support, which allows files to mount with the same user id. Additionally, the latest version also includes a new backup feature that allows you to back up your files to an S3 bucket.

Solution and steps

The overall solution that I deploy in this blog is represented in the following diagram:

solution overview diagram

Step 1: Access to AWS Cloud9 terminal and upload data

There are two ways to start using AWS ParallelCluster. You can either install AWS CLI or turn on AWS Cloud9, which is a cloud-based integrated development environment (IDE) that includes a terminal. For simplicity, I use AWS Cloud9 to create the HPC cluster. Please refer to this link to proceed to AWS Cloud9 set up and to this link for AWS CLI setup.

Once logged into your AWS Cloud9 instance, the first thing to create is the S3 bucket. This bucket is key to exchanging user data in and out between the corporate data center and the AWS HPC cluster. Please make sure that your bucket name is globally unique, meaning the name must be unique worldwide across all AWS Regions.

aws s3 mb s3://fds-smv-bucket-unique
make_bucket: fds-smv-bucket-unique

Download the latest FDS-SMV Linux version package from the official NIST website. It looks something like: FDS6.7.4_SMV6.7.14_lnx.sh

Rename your geometry file to “geometry.fds” and upload it to your AWS Cloud9 instance or directly to your S3 bucket.

Please note that once the FDS-SMV package has been downloaded locally to the instance, you must upload it to the S3 bucket using the following command.

aws s3 cp FDS6.7.4_SMV6.7.14_lnx.sh s3://fds-smv-bucket-unique
aws s3 cp geometry.fds s3://fds-smv-bucket-unique

You use the same S3 bucket to install FDS-SMV later on with the Amazon FSx for Lustre File System.

Step 2: Set up AWS ParallelCluster

You can install AWS ParallelCluster running the following command from your AWS Cloud9 instance:

sudo pip install aws-parallelcluster

Once it is installed, you can run the following command to check the version:

pcluster version 

At the time of writing this blog, 2.9.1 is the most up-to-date version.

Then use the text editor of your choice and open the configuration file as follows:

vim ~/.parallelcluster/config

Replace the placeholder values in angle brackets, if not yet filled in, with your own information and save the configuration file.

[aws]
aws_region_name = <AWS-REGION>

[global]
sanity_check = true
cluster_template = fds-smv-cluster
update_check = true

[vpc public]
vpc_id = vpc-<VPC-ID>
master_subnet_id = subnet-<SUBNET-ID>

[cluster fds-smv-cluster]
key_name = <Key-Name>
vpc_settings = public
compute_instance_type=c5n.18xlarge
master_instance_type=c5.xlarge
initial_queue_size = 0
max_queue_size = 100
scheduler=slurm
cluster_type = ondemand
s3_read_write_resource=arn:aws:s3:::fds-smv-bucket-unique*
placement_group = DYNAMIC
placement = compute
base_os = alinux2
tags = {"Name" : "fds-smv"}
disable_hyperthreading = true
fsx_settings = fsxshared
enable_efa = compute
dcv_settings = hpc-dcv

[dcv hpc-dcv]
enable = master

[fsx fsxshared]
shared_dir = /fsx
storage_capacity = 1200
import_path = s3://fds-smv-bucket-unique
imported_file_chunk_size = 1024
export_path = s3://fds-smv-bucket-unique

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

Let’s review the different sections of the configuration file and explain their role:

  • scheduler: Supported job schedulers are SGE, TORQUE, SLURM, and AWS Batch. I have selected SLURM for this example.
  • cluster_type: You have the choice between On-Demand (ondemand) or Spot Instances (spot) for your compute instances. With On-Demand, instances are available for use without condition (if available in the selected Region) at a set price per hour with the pay-as-you-go model, meaning that as soon as they are started, they are reserved for your utilization. With Spot Instances, you can take advantage of unused EC2 capacity in the AWS Cloud at up to a 90% discount compared to On-Demand Instance prices. You can use Spot Instances for various stateless, fault-tolerant, or flexible applications such as HPC. For more information about Spot Instances, feel free to visit this webpage.
  • s3_read_write_resource: This parameter allows you to read and write objects directly on your S3 bucket from the cluster you created without additional permissions. It acts as a role for your cluster, allowing you access to your specified S3 bucket.
  • placement_group: Use DYNAMIC to ensure that your instances are located as physically close to one another as possible. Close placement minimizes the latency between compute nodes and takes advantage of EFA’s low latency networking.
  • placement: By selecting compute you only enforce compute instances to be placed within the same placement group, leaving the head node placement free.
  • compute_instance_type: Select c5n.18xlarge because it is optimized for compute-intensive workloads and supports EFA for better scaling of HPC applications. Note that EFA is supported only for specific instance types. Please visit currently supported instances for more information.
  • master_instance_type: This can be any instance type. As traffic between head and compute nodes is relatively small, and the head node runs during the entire lifetime of the cluster, I use c5.xlarge because it is inexpensive and is a good fit for this use case.
  • initial_queue_size: You start with no compute instances after the HPC cluster is up. This means that any new job submitted has some delay (time for the nodes to be powered on) before it is seen as available by the job scheduler. This helps you pay for what you use and keeps costs as low as possible.
  • max_queue_size: Limit the maximum compute fleet to 100 instances. This allows you room to scale your jobs up to a large number of cores, while putting a limit on the number of compute nodes to help control costs.
  • base_os: For this blog, select Amazon Linux 2 (alinux2) as the base OS. Currently we also support Amazon Linux (alinux), CentOS 7 (centos7), Ubuntu 16.04 (ubuntu1604), and Ubuntu 18.04 (ubuntu1804) with EFA.
  • disable_hyperthreading: This setting turns off hyperthreading (true) on your cluster, which is the right configuration for this use case.
  • [fsx fsxshared]: This section contains the settings to define your FSx for Lustre parallel file system, including the location where the shared directory is mounted, the storage capacity for the file system, the chunk size for files to be imported, and the location from which the data will be imported. You can read more about FSx for Lustre here.
  • enable_efa: Set to compute in this use case to enable EFA on the compute nodes, since this is a tightly coupled CFD simulation use case.
  • dcv_settings: With AWS ParallelCluster, you can use NICE DCV to support your remote visualization needs.
  • [dcv hpc-dcv]: This section contains the settings to define your remote visualization setup. You can read more about DCV with AWS ParallelCluster here.
  • import_path: This parameter makes all the objects present on the S3 bucket when the cluster is created visible directly from the FSx for Lustre file system. In this case, you are able to access the FDS-SMV package and the geometry under the /fsx mounted folder.
  • export_path: This parameter is useful for backup purposes using the Data Repository Tasks. I share more details about this in step 7 (optional).

Step 3: Create the HPC cluster and log in

Now, you can create the HPC cluster, named fds-smv. It takes around 10 minutes to complete, and you can see the status changing (going through the different AWS CloudFormation template steps). At the end of creation, two IP addresses are displayed: a public IP and/or a private IP, depending on your network choice.

pcluster create fds-smv
Creating stack named: parallelcluster-fds-smv
Status: parallelcluster-fds-smv - CREATE_COMPLETE                               
MasterPublicIP: X.X.X.X
ClusterUser: ec2-user
MasterPrivateIP: X.X.X.X

In order to log in, you must use the key you specified in the AWS ParallelCluster configuration file before creating the cluster:

pcluster ssh fds-smv -i <Key-Name>

You should now be logged in as an ec2-user (since we are using Amazon Linux 2 base OS).

Step 4: Install FDS-SMV package

Now that the HPC cluster using AWS ParallelCluster is set up, it is time to install the FDS-SMV package.  In the prior steps, you uploaded both the FDS-SMV package and the geometry to your S3 bucket. Since you enabled “import_path” to that bucket, they are already available on the Amazon FSx for Lustre storage under /fsx.

Run the script as follows and select /fsx/fds-smv as final target for installation:

cd /fsx
./FDS6.7.4_SMV6.7.14_lnx.sh
[ec2-user@ip-X-X-X-X fsx]$ ./FDS6.7.4_SMV6.7.14_lnx.sh 

Installing FDS and Smokeview  for Linux

Options:
  1) Press <Enter> to begin installation [default]
  2) Type "extract" to copy the installation files to:
     FDS6.7.4_SMV6.7.14_lnx.tar.gz
 

FDS install options:
  Press 1 to install in /home/ec2-user/FDS/FDS6 [default]
  Press 2 to install in /opt/FDS/FDS6
  Press 3 to install in /usr/local/bin/FDS/FDS6
  Enter a directory path to install elsewhere
/fsx/fds-smv

It is important to source the following scripts from the installed package to check that the installation succeeded with the correct versions. Here is the output you should get:

[ec2-user@ip-X-X-X-X ~]$ source /fsx/fds-smv/bin/SMV6VARS.sh 
[ec2-user@ip-X-X-X-X ~]$ source /fsx/fds-smv/bin/FDS6VARS.sh 
[ec2-user@ip-X-X-X-X ~]$ fds -version
FDS revision       : FDS6.7.4-0-gbfaa110-release
MPI library version: Intel(R) MPI Library 2019 Update 4 for Linux* OS

[ec2-user@ip-10-0-2-233 ~]$ smokeview -version

Smokeview  SMV6.7.14-0-g568693b-release - Mar  9 2020

Revision         : SMV6.7.14-0-g568693b-release
Revision Date    : Wed Mar 4 23:13:42 2020 -0500
Compilation Date : Mar  9 2020 16:31:22
Compiler         : Intel C/C++ 19.0.4.243
Checksum(SHA1)   : e801eace7c6597dc187739e51ba6f546bfde4e48
Platform         : LINUX64

Important notes:

The way FDS-SMV package has been installed is the default installation. Binaries are already compiled and Intel MPI libraries are embedded as part of the installation package. It is what one would call a self-contained application. For further builds and source codes, please visit this webpage.

Step 5: Running the fire dynamics simulation using FDS

Now that everything is installed, it is time to create the SLURM submission script. In this step, you take advantage of the FSx for Lustre File System, the compute-optimized instance, and the EFA network to maximize simulation performance.

cd /fsx/
vi fds-smv.sbatch

Here is the information you should specify in your submission script:

#!/bin/bash
#SBATCH --job-name=fds-smv-job
#SBATCH --ntasks=<Total number of MPI processes>
#SBATCH --ntasks-per-node=36
#SBATCH --output=%x_%j.out

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

module load intelmpi 

export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

cd /fsx/<results>

time mpirun -ppn 36 -np <Total number of MPI processes>  fds geometry.fds

Replace <results> with a directory name of your choice, and don’t forget to copy the geometry.fds file into it before submitting your job. Once ready, save the file and submit the job using the following command:

sbatch fds-smv.sbatch 

If you decided to build your HPC cluster with c5n.18xlarge instances, the number of MPI processes per node is 36, since you turned off hyperthreading and the instance has 36 physical cores. That is the meaning of the “#SBATCH --ntasks-per-node=36” line.

For any run exceeding 36 MPI processes, the job is split among multiple instances and takes advantage of EFA for internode communication.
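For example, assuming an input geometry with 72 meshes (a hypothetical value used only for illustration), the submission script sketch below would spread the run across two c5n.18xlarge compute nodes:

#!/bin/bash
#SBATCH --job-name=fds-smv-job
#SBATCH --ntasks=72
#SBATCH --ntasks-per-node=36
#SBATCH --output=%x_%j.out

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

module load intelmpi

export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

cd /fsx/results

time mpirun -ppn 36 -np 72 fds geometry.fds

Slurm powers on two compute nodes for this job, and the MPI traffic between them goes over EFA.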

It is important to note that FDS only allows the number of MPI processes to be equal to the number of meshes in the input geometry (geometry.fds in this scenario). If the number of meshes in the input geometry cannot be modified, OpenMP threads can be enabled to increase performance, using up to four OpenMP threads across four CPU cores attached to one MPI process.
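As a sketch only (the right mesh and thread counts depend on your geometry), a hybrid MPI/OpenMP run with four OpenMP threads per MPI process on a 36-core node would change the relevant lines of the submission script as follows:

#SBATCH --ntasks-per-node=9
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=4
export I_MPI_PIN_DOMAIN=omp

time mpirun -ppn 9 -np <Total number of MPI processes> fds geometry.fds

With 9 MPI processes and 4 OpenMP threads each, you still use the 36 physical cores of a c5n.18xlarge node.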

Please read best practices provided by NIST for that topic on their user guide.

In order to take advantage of the distributed computing capability of FDS, it is mandatory to work first on the input geometry, and divide it into the appropriate number of meshes. It is also highly advised to evenly distribute the number of cells/elements per mesh across all meshes. This best practice optimizes the load balancing for each CPU core.

Step 6: Visualizing the results using NICE DCV and SMV

In order to visualize results, you must connect to the head node using NICE DCV streaming protocol.

As a reminder, the current instance type for the head node is a c5.xlarge, which is not a graphics-accelerated instance. For heavy and GPU-intensive visualization, it is important to set up a more appropriate instance, such as one from the G4 instance family.

Go back to your AWS Cloud9 instance, open a new terminal side by side to your session connected to your AWS HPC cluster, and enter the following command in the terminal:

pcluster dcv connect fds-smv -k <Key-Name>

You are provided a one-time HTTPS URL available for a short period of time in order to connect to your head node using the NICE DCV protocol.

Once connected, open the terminal inside your session and source the FDS-SMV scripts as before:

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

Navigate to your <results> folder and start SMV with your result.

I have selected one of the geometries named fire_whirl_pool.fds in the Examples folder, part of the default FDS-SMV installation package located here:

/fsx/fds-smv/Examples/Fires/fire_whirl_pool.fds

You can find other scenarios under the Examples folder to run some more use cases if you did not already choose your geometry.fds file.

Now you can run SMV and visualize your results:

smokeview fire_whirl_pool.smv

SMV (Smokeview) takes .smv files as input, so replace the file name with your own. If you have already chosen your geometry.fds, then run the following command:

smokeview geometry.smv

The application then opens as follows, and you can visualize the results. The following image is an output of the SOOT DENSITY of the 3D smoke.

fire simulation picture

Step 7 (optional): Back up your FDS-SMV results to an S3 bucket

First, update the AWS CLI to its most recent version; the Data Repository Tasks feature used below requires version 1.16.309 or later.
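For example, from your AWS Cloud9 terminal (assuming the AWS CLI was installed with pip, which is the usual setup on Cloud9), you can upgrade and verify the version as follows:

sudo pip install --upgrade awscli
aws --version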

After running your FDS-SMV simulation, you can back up your data in /fsx to the S3 bucket you used earlier to upload the installation package, and input files using Data Repository Tasks.

Data Repository Tasks represent bulk operations between your Amazon FSx for Lustre file system and your S3 bucket. One of the jobs is to export your changed file system contents back to its linked S3 bucket.

Open your AWS Cloud9 terminal and exit the HPC head node cluster. Retrieve your Amazon FSx for Lustre ID using:

aws fsx describe-file-systems

It looks something like, fs-0533eebf1148fc8dd. Then create a backup of the data as follows:

aws fsx create-data-repository-task --file-system-id fs-0533eebf1148fc8dd --type EXPORT_TO_REPOSITORY --paths results --report Enabled=true,Scope=FAILED_FILES_ONLY,Format=REPORT_CSV_20191124,Path=s3://fds-smv-bucket-unique/

The following are definitions about the command parameters:

  • file-system-id: Your file system ID.
  • type EXPORT_TO_REPOSITORY: Exports the data back to the S3 bucket.
  • paths results: The directory you want to export to your S3 bucket. If you have more than one folder to back up, use a comma-separated notation such as: results1,results2,…
  • Format=REPORT_CSV_20191124: Note that this is the only report format Amazon FSx for Lustre currently supports, so keep it as is.

You can check the backup status by running:

aws fsx describe-data-repository-tasks

Wait for the copy to complete; once finished, you should see "Lifecycle": "SUCCEEDED" on the Lifecycle line.
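If you prefer to poll only the task status, a --query filter similar to the following sketch narrows the output to the task ID and its lifecycle state:

aws fsx describe-data-repository-tasks \
    --query "DataRepositoryTasks[*].{TaskId:TaskId,Status:Lifecycle}" \
    --output table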

Also go back to your S3 bucket, and your folder(s) should appear with all the files correctly uploaded from your /fsx folder you specified.

In terms of data management, Amazon S3 is an important service. You started by uploading the installation package and geometry files from an external source, such as your laptop or an on-premises system. You then made these files available to the AWS HPC cluster through the Amazon FSx for Lustre file system and ran the simulation. Finally, you backed up the results from Amazon FSx for Lustre to Amazon S3. You can also decide to download the results on Amazon S3 back to your local system if needed.
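For example, from your laptop or on-premises system (assuming AWS credentials are configured there), a single command such as the following would download the backed-up results folder locally:

aws s3 sync s3://fds-smv-bucket-unique/results ./results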

Step 8: Delete your AWS resources created during the deployment of this blog

After your run is completed and your data backed up successfully (Step 7 is optional) on your S3 bucket, you can then delete your cluster by using the following command in your Cloud9 terminal:

pcluster delete fds-smv

Warning:

If you run the preceding command, all resources you created during this blog are automatically deleted, except your Cloud9 session and the data in the S3 bucket you created earlier.

Your S3 bucket still contains your input “geometry.fds” and your installation package “FDS6.7.4_SMV6.7.14_lnx.sh” files.

If you selected to back up your data during Step 7 (optional), then your S3 bucket also contains that data on top of the two previous files mentioned above.

If you want to delete your S3 bucket and all the data mentioned above, go to the AWS Management Console, open the S3 service, select your S3 bucket, and choose Delete.

If you want to terminate your Cloud9 session, go to the AWS Management Console, open the Cloud9 service, select your environment, and choose Delete.

After performing these operations, there will be no more resources running on AWS related to this blog.

Conclusion

I showed that AWS ParallelCluster, Amazon FSx for Lustre, EFA, and Amazon S3 are key AWS services and features for HPC workloads such as CFD and in particular for FDS.

You can achieve simulation times of hours on AWS rather than days or weeks on a single workstation.

Please visit this workshop for a more in-depth tutorial on running Fire Dynamics Simulation on AWS, as well as our dedicated HPC homepage.


Event-driven architecture for using third-party Git repositories as source for AWS CodePipeline

Post Syndicated from Kirankumar Chandrashekar original https://aws.amazon.com/blogs/devops/event-driven-architecture-for-using-third-party-git-repositories-as-source-for-aws-codepipeline/

In the post Using Custom Source Actions in AWS CodePipeline for Increased Visibility for Third-Party Source Control, we demonstrated using custom actions in AWS CodePipeline and a worker that periodically polls for jobs and processes further to get the artifact from the Git repository. In this post, we discuss using an event-driven architecture to trigger an AWS CodePipeline pipeline that has a third-party Git repository within the source stage that is part of a custom action.

Instead of using a worker to periodically poll for available jobs across all pipelines, we can define a custom source action on a particular pipeline to trigger an Amazon CloudWatch Events rule when the webhook on CodePipeline receives an event and puts it into an In Progress state. This works exactly like how CodePipeline works with natively supported Git repositories like AWS CodeCommit or GitHub as a source.

Solution architecture

The following diagram shows how you can use an event-driven architecture with a custom source stage that is associated with a third-party Git repository that isn’t supported by CodePipeline natively. For our use case, we use GitLab, but you can use any Git repository that supports Git webhooks.

Architecture diagram for the event-driven third-party Git source solution

The architecture includes the following steps:

1. A user commits code to a Git repository.

2. The commit invokes a Git webhook.

3. This invokes a CodePipeline webhook.

4. The CodePipeline source stage is put into In Progress status.

5. The source stage action triggers a CloudWatch Events rule that indicates the stage started.

6. The CloudWatch event triggers an AWS Lambda function.

7. The function polls for the job details of the custom action.

8. The function also triggers AWS CodeBuild and passes all the job-related information.

9. CodeBuild gets the public SSH key stored in AWS Secrets Manager (or user name and password, if using HTTPS Git access).

10. CodeBuild clones the repository for a particular branch.

11. CodeBuild zips and uploads the archive to the CodePipeline artifact store Amazon Simple Storage Service (Amazon S3) bucket.

12. A Lambda function sends a success message to the CodePipeline source stage so it can proceed to the next stage.

Similarly, with the same setup, if you choose Release change for a pipeline that has a custom source stage, a CloudWatch event is triggered, which invokes the Lambda function, and the same process repeats until the artifact is retrieved from the Git repository.

Solution overview

To set up the solution, you complete the following steps:

1. Create an SSH key pair for authenticating to the Git repository.

2. Publish the key to Secrets Manager.

3. Launch the AWS CloudFormation stack to provision resources.

4. Deploy a sample CodePipeline and test the custom action type.

5. Retrieve the webhook URL.

6. Create a webhook and add the webhook URL.

Creating an SSH key pair
You first create an SSH key pair to use for authenticating to the Git repository using ssh-keygen on your terminal. See the following code:

ssh-keygen -t rsa -b 4096 -C "[email protected]"

Follow the prompt from ssh-keygen and give a name for the key, for example codepipeline_git_rsa. This creates two new files in the current directory: codepipeline_git_rsa and codepipeline_git_rsa.pub.

Make a note of the contents of codepipeline_git_rsa.pub and add it as an authorized key for your Git user. For instructions, see Adding an SSH key to your GitLab account.

Publishing the key
Publish this key to Secrets Manager using the AWS Command Line Interface (AWS CLI):

export SecretsManagerArn=$(aws secretsmanager create-secret --name codepipeline_git \
--secret-string file://codepipeline_git_rsa --query ARN --output text)

Make a note of the ARN, which is required later.

Alternatively, you can create the secret on the Secrets Manager console.

Make sure that the line breaks in the private key codepipeline_git are preserved when the secret value is added.
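To double-check that the stored value matches your local private key, you can retrieve the secret and inspect its first lines, for example:

aws secretsmanager get-secret-value \
    --secret-id codepipeline_git \
    --query SecretString \
    --output text | head -n 2

The output should start with the same -----BEGIN ... PRIVATE KEY----- header as your local codepipeline_git file.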

Launching the CloudFormation stack

Clone the Git repository aws-codepipeline-third-party-git-repositories, which contains the AWS CloudFormation templates and AWS Lambda function code, using the following command:

git clone https://github.com/aws-samples/aws-codepipeline-third-party-git-repositories.git .

You should now have the following files in the cloned repository:

cfn/
|--sample_pipeline_custom.yaml
`--third_party_git_custom_action.yaml
lambda/
`--lambda_function.py

Launch the CloudFormation stack using the template third_party_git_custom_action.yaml from the cfn directory. The main resources created by this stack are:

1. CodePipeline Custom Action Type. ResourceType: AWS::CodePipeline::CustomActionType
2. Lambda Function. ResourceType: AWS::Lambda::Function
3. CodeBuild Project. ResourceType: AWS::CodeBuild::Project
4. Lambda Execution Role. ResourceType: AWS::IAM::Role
5. CodeBuild Service Role. ResourceType: AWS::IAM::Role

These resources implement the logic for connecting to the Git repository, which for this post is GitLab.

Upload the Lambda function code to any S3 bucket in the same Region where the stack is being deployed. To create a new S3 bucket, use the following code (make sure to provide a unique name):

export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export S3_BUCKET_NAME=codepipeline-git-custom-action-${ACCOUNT_ID} 
aws s3 mb s3://${S3_BUCKET_NAME} --region us-east-1

Then zip the contents of the function and upload to the S3 bucket (substitute the appropriate bucket name):

export ZIP_FILE_NAME="codepipeline_git.zip"
zip -jr ${ZIP_FILE_NAME} ./lambda/lambda_function.py && \
aws s3 cp codepipeline_git.zip \
s3://${S3_BUCKET_NAME}/${ZIP_FILE_NAME}

If you don’t have a VPC and subnets that Lambda and CodeBuild require, you can create those by launching the following CloudFormation stack.

Run the following AWS CLI command to deploy the third-party Git source solution stack:

export vpcId="vpc-123456789"
export subnetId1="subnet-12345"
export subnetId2="subnet-54321"
export GIT_SOURCE_STACK_NAME="thirdparty-codepipeline-git-source"
aws cloudformation create-stack \
--stack-name ${GIT_SOURCE_STACK_NAME} \
--template-body file://$(pwd)/cfn/third_party_git_custom_action.yaml \
--parameters ParameterKey=SourceActionVersion,ParameterValue=1 \
ParameterKey=SourceActionProvider,ParameterValue=CustomSourceForGit \
ParameterKey=GitPullLambdaSubnet,ParameterValue=${subnetId1}\\,${subnetId2} \
ParameterKey=GitPullLambdaVpc,ParameterValue=${vpcId} \
ParameterKey=LambdaCodeS3Bucket,ParameterValue=${S3_BUCKET_NAME} \
ParameterKey=LambdaCodeS3Key,ParameterValue=${ZIP_FILE_NAME} \
--capabilities CAPABILITY_IAM

Alternatively, launch the stack by choosing Launch Stack:

Launch Stack button

For more information about the VPC requirements for Lambda and CodeBuild, see Internet and service access for VPC-connected functions and Use AWS CodeBuild with Amazon Virtual Private Cloud, respectively.

A custom source action type is now available on the account where you deployed the stack. You can check this on the CodePipeline console by attempting to create a new pipeline. You can see your source type listed under Source provider.

CodePipeline source provider drop-down showing the custom source action

Testing the pipeline

We now deploy a sample pipeline and test the custom action type using the template sample_pipeline_custom.yaml from the cfn directory. You can run the following AWS CLI command to deploy the CloudFormation stack:

Note: Set the SSH_URL environment variable to a GitLab repository URL that you have access to, or create a new GitLab project and repository. The example url "[email protected]:kirankumar15/test.git" is for illustration purposes only.

export SSH_URL="[email protected]:kirankumar15/test.git"
export SAMPLE_STACK_NAME="third-party-codepipeline-git-source-test"
aws cloudformation create-stack \
--stack-name ${SAMPLE_STACK_NAME} \
--template-body file://$(pwd)/cfn/sample_pipeline_custom.yaml \
--parameters ParameterKey=Branch,ParameterValue=master \
ParameterKey=GitUrl,ParameterValue=${SSH_URL} \
ParameterKey=SourceActionVersion,ParameterValue=1 \
ParameterKey=SourceActionProvider,ParameterValue=CustomSourceForGit \
ParameterKey=CodePipelineName,ParameterValue=sampleCodePipeline \
ParameterKey=SecretsManagerArnForSSHPrivateKey,ParameterValue=${SecretsManagerArn} \
ParameterKey=GitWebHookIpAddress,ParameterValue=34.74.90.64/28 \
--capabilities CAPABILITY_IAM

Alternatively, choose Launch stack:

Launch Stack button

Retrieving the webhook URL
When the stack creation is complete, retrieve the CodePipeline webhook URL from the stack outputs. Use the following AWS CLI command:

aws cloudformation describe-stacks \
--stack-name ${SAMPLE_STACK_NAME} \
--output text \
--query "Stacks[].Outputs[?OutputKey=='CodePipelineWebHookUrl'].OutputValue"

For more information about stack outputs, see Outputs.
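If you prefer to capture the URL in a shell variable so you can print and copy it easily in the next step (CODEPIPELINE_WEBHOOK_URL is just a local name used here), something like the following works:

export CODEPIPELINE_WEBHOOK_URL=$(aws cloudformation describe-stacks \
    --stack-name ${SAMPLE_STACK_NAME} \
    --output text \
    --query "Stacks[].Outputs[?OutputKey=='CodePipelineWebHookUrl'].OutputValue")
echo ${CODEPIPELINE_WEBHOOK_URL}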

Creating a webhook

You can use an existing GitLab repository or create a new one, then follow these steps to add a webhook to it.
To create your webhook, complete the following steps:

1. Navigate to the Webhooks Settings section on the GitLab console for the repository that you want to have as a source for CodePipeline.

2. For URL, enter the CodePipeline webhook URL you retrieved in the previous step.

3. Select Push events and optionally enter a branch name.

4. Select Enable SSL verification.

5. Choose Add webhook.

GitLab webhook settings page

For more information about webhooks, see Webhooks.

We’re now ready to test the solution.

Testing the solution

To test the solution, we make changes to the branch that we passed as the branch parameter in the GitLab repository. This should trigger the pipeline. On the CodePipeline console, you can see the Git Commit ID on the source stage of the pipeline when it succeeds.

Note: Use a GitLab repository URL that you have access to, or create a new GitLab repository, and make sure it contains a buildspec.yml for the AWS CodeBuild project to run in the Build stage. The example url "[email protected]:kirankumar15/test.git" is for illustration purposes only.

Enter the following code to clone your repository:

git clone [email protected]:kirankumar15/test.git .

Add a sample file to the repository with the name sample_text_file.txt, then commit and push it to the repository:

echo "adding a sample file" >> sample_text_file.txt
git add ./
git commit -m "added sample_text_file.txt to the repository"
git push -u origin master

The pipeline should show the status In Progress.

CodePipeline source stage in In Progress status

After a few minutes, it changes to Succeeded status and you see the Git commit message on the source stage.

CodePipeline source stage in Succeeded status

You can also view the Git Commit message by choosing the execution ID of the pipeline, navigating to the timeline section, and choosing the source action. You should see the Commit message and Commit ID that correlates with the Git repository.

CodePipeline source action details showing the commit message and commit ID

Troubleshooting

If the pipeline fails, check the Lambda function logs for the function with the name GitLab-CodePipeline-Source-${ACCOUNT_ID}. For instructions on checking logs, see Accessing Amazon CloudWatch logs for AWS Lambda.

If the Lambda logs contain the CodeBuild build ID, check the CodeBuild run logs for that build ID for errors. For instructions, see View detailed build information.
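As a sketch, the following commands can pull those logs from the terminal. They assume the Lambda log group follows the standard /aws/lambda/<function-name> naming convention, that you are using AWS CLI v2 (which provides aws logs tail), and that you substitute your own CodeBuild build ID:

export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws logs tail /aws/lambda/GitLab-CodePipeline-Source-${ACCOUNT_ID} --since 1h

aws codebuild batch-get-builds --ids <BUILD_ID> \
    --query "builds[].logs.deepLink" --output text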

Cleaning up

Delete the CloudFormation stacks that you created. You can use the following AWS CLI commands:

aws cloudformation delete-stack --stack-name ${SAMPLE_STACK_NAME}

aws cloudformation delete-stack --stack-name ${GIT_SOURCE_STACK_NAME}

Alternatively, delete the stack on the AWS CloudFormation console.

Additionally, empty the S3 bucket and delete it. Locate the bucket in the ${SAMPLE_STACK_NAME} stack. Then use the following AWS CLI command:

aws s3 rb s3://${S3_BUCKET_NAME} --force

You can also delete and empty the bucket on the Amazon S3 console.

Conclusion

You can use the architecture in this post for any Git repository that supports webhooks. This solution also works if the repository is reachable only from on premises, and if the endpoints can be accessed from a VPC. This event-driven architecture works just like using any natively supported source for CodePipeline.


About the Author

Kirankumar Chandrashekar is a DevOps consultant at AWS Professional Services. He focuses on leading customers in architecting DevOps technologies. Kirankumar is passionate about Infrastructure as Code and DevOps. He enjoys music, as well as cooking and travelling.


Building a cross-account CI/CD pipeline for single-tenant SaaS solutions

Post Syndicated from Rafael Ramos original https://aws.amazon.com/blogs/devops/cross-account-ci-cd-pipeline-single-tenant-saas/

With the increasing demand from enterprise customers for a pay-as-you-go consumption model, more and more independent software vendors (ISVs) are shifting their business model towards software as a service (SaaS). Usually this kind of solution is architected using a multi-tenant model, which means that the infrastructure resources and applications are shared across multiple customers, with mechanisms in place to isolate their environments from each other. However, you may not want to, or may not be able to, share resources for security or compliance reasons, in which case you need a single-tenant environment.

To achieve this higher level of segregation across the tenants, it’s recommended to isolate the environments on the AWS account level. This strategy brings benefits, such as no network overlapping, no account limits sharing, and simplified usage tracking and billing, but it comes with challenges from an operational standpoint. Whereas multi-tenant solutions require management of a single shared production environment, single-tenant installations consist of dedicated production environments for each customer, without any shared resources across the tenants. When the number of tenants starts to grow, delivering new features at a rapid pace becomes harder to accomplish, because each new version needs to be manually deployed on each tenant environment.

This post describes how to automate this deployment process to deliver software quickly, securely, and with fewer errors for each existing tenant. I demonstrate all the steps to build and configure a CI/CD pipeline using AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS CloudFormation. For each new version, the pipeline automatically deploys the same application version on the multiple tenant AWS accounts.

There are different caveats to build such cross-account CI/CD pipelines on AWS. Because of that, I use AWS Command Line Interface (AWS CLI) to manually go through the process and demonstrate in detail the various configuration aspects you have to handle, such as artifact encryption, cross-account permission granting, and pipeline actions.

Single-tenancy vs. multi-tenancy

One of the first aspects to consider when architecting your SaaS solution is its tenancy model. Each model brings its own benefits and architectural challenges. On multi-tenant installations, each customer shares the same set of resources, including databases and applications. With this mode, you can use the servers’ capacity more efficiently, which generally leads to significant cost-saving opportunities. On the other hand, you have to carefully secure your solution to prevent a customer from accessing sensitive data from another. Designing for high availability becomes even more critical on multi-tenant workloads, because more customers are affected in the event of downtime.

Because the environments are by definition isolated from each other, single-tenant solutions are simpler to design when it comes to security, networking isolation, and data segregation. Likewise, you can customize the applications per customer, and have different versions for specific tenants. You also have the advantage of eliminating the noisy-neighbor effect, and can plan the infrastructure for the customer’s scalability requirements. As a drawback, in comparison with multi-tenant, the single-tenant model is operationally more complex because you have more servers and applications to maintain.

Which tenancy model to choose depends ultimately on whether you can meet your customer needs. They might have specific governance requirements, be bound to a certain industry regulation, or have compliance criteria that influences which model they can choose. For more information about modeling your SaaS solutions, see SaaS on AWS.

Solution overview

To demonstrate this solution, I consider a fictitious single-tenant ISV with two customers: Unicorn and Gnome. It uses one central account where the tools reside (Tooling account), and two other accounts, each representing a tenant (Unicorn and Gnome accounts). As depicted in the following architecture diagram, when a developer pushes code changes to CodeCommit, Amazon CloudWatch Events triggers the CodePipeline CI/CD pipeline, which automatically deploys a new version on each tenant’s AWS account. It ensures that the fictitious ISV doesn’t have the operational burden to manually redeploy the same version for each end customer.

Architecture diagram of a CI/CD pipeline for single-tenant SaaS solutions

For illustration purposes, the sample application I use in this post is an AWS Lambda function that returns a simple JSON object when invoked.

Prerequisites

Before getting started, you must have the following prerequisites:

Setting up the Git repository

Your first step is to set up your Git repository.

  1. Create a CodeCommit repository to host the source code.

The CI/CD pipeline is automatically triggered every time new code is pushed to that repository.

  2. Make sure Git is configured to use IAM credentials to access AWS CodeCommit via HTTP by running the following command from the terminal:
git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true
  3. Clone the newly created repository locally, and add two files in the root folder: index.js and application.yaml.

The first file is the JavaScript code for the Lambda function that represents the sample application. For our use case, the function returns a JSON response object with statusCode: 200 and the body Hello!\n. See the following code:

exports.handler = async (event) => {
    const response = {
        statusCode: 200,
        body: `Hello!\n`,
    };
    return response;
};

The second file is where the infrastructure is defined using AWS CloudFormation. The sample application consists of a Lambda function, and we use AWS Serverless Application Model (AWS SAM) to simplify the resources creation. See the following code:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: Sample Application.

Parameters:
    S3Bucket:
        Type: String
    S3Key:
        Type: String
    ApplicationName:
        Type: String
        
Resources:
    SampleApplication:
        Type: 'AWS::Serverless::Function'
        Properties:
            FunctionName: !Ref ApplicationName
            Handler: index.handler
            Runtime: nodejs12.x
            CodeUri:
                Bucket: !Ref S3Bucket
                Key: !Ref S3Key
            Description: Hello Lambda.
            MemorySize: 128
            Timeout: 10
  4. Push both files to the remote Git repository.

Creating the artifact store encryption key

By default, CodePipeline uses server-side encryption with an AWS Key Management Service (AWS KMS) managed customer master key (CMK) to encrypt the release artifacts. Because the Unicorn and Gnome accounts need to decrypt those release artifacts, you need to create a customer managed CMK in the Tooling account.

From the terminal, run the following command to create the artifact encryption key:

aws kms create-key --region <YOUR_REGION>

This command returns a JSON object with the key ARN property if run successfully. Its format is similar to arn:aws:kms:<YOUR_REGION>:<TOOLING_ACCOUNT_ID>:key/<KEY_ID>. Record this value to use in the following steps.
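If you prefer to capture the ARN directly in a shell variable rather than copying it from the JSON output, you can run the creation command this way instead (use it in place of the previous command, not in addition to it, because each call creates a new key):

export KMS_KEY_ARN=$(aws kms create-key --region <YOUR_REGION> \
    --query KeyMetadata.Arn --output text)
echo ${KMS_KEY_ARN}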

The encryption key has been created manually for educational purposes only, but it’s considered a best practice to have it as part of the Infrastructure as Code (IaC) bundle.

Creating an Amazon S3 artifact store and configuring a bucket policy

Our use case uses Amazon Simple Storage Service (Amazon S3) as artifact store. Every release artifact is encrypted and stored as an object in an S3 bucket that lives in the Tooling account.

To create and configure the artifact store, follow these steps in the Tooling account:

  1. From the terminal, create an S3 bucket and give it a unique name:
aws s3api create-bucket \
    --bucket <BUCKET_UNIQUE_NAME> \
    --region <YOUR_REGION> \
    --create-bucket-configuration LocationConstraint=<YOUR_REGION>
  2. Configure the bucket to use the customer managed CMK created in the previous step. This makes sure the objects stored in this bucket are encrypted using that key, replacing <KEY_ARN> with the ARN property from the previous step:
aws s3api put-bucket-encryption \
    --bucket <BUCKET_UNIQUE_NAME> \
    --server-side-encryption-configuration \
        '{
            "Rules": [
                {
                    "ApplyServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "aws:kms",
                        "KMSMasterKeyID": "<KEY_ARN>"
                    }
                }
            ]
        }'
  3. The artifacts stored in the bucket need to be accessed from the Unicorn and Gnome accounts. Configure the bucket policies to allow cross-account access:
aws s3api put-bucket-policy \
    --bucket <BUCKET_UNIQUE_NAME> \
    --policy \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "s3:GetBucket*",
                        "s3:List*"
                    ],
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": [
                            "arn:aws:iam::<UNICORN_ACCOUNT_ID>:root",
                            "arn:aws:iam::<GNOME_ACCOUNT_ID>:root"
                        ]
                    },
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>"
                    ]
                },
                {
                    "Action": [
                        "s3:GetObject*"
                    ],
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": [
                            "arn:aws:iam::<UNICORN_ACCOUNT_ID>:root",
                            "arn:aws:iam::<GNOME_ACCOUNT_ID>:root"
                        ]
                    },
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>/CrossAccountPipeline/*"
                    ]
                }
            ]
        }' 

This S3 bucket has been created manually for educational purposes only, but it’s considered a best practice to have it as part of the IaC bundle.

Creating a cross-account IAM role in each tenant account

Following the security best practice of granting least privilege, each action declared on CodePipeline should have its own IAM role.  For this use case, the pipeline needs to perform changes in the Unicorn and Gnome accounts from the Tooling account, so you need to create a cross-account IAM role in each tenant account.

Repeat the following steps for each tenant account to allow CodePipeline to assume role in those accounts:

  1. Configure a named CLI profile for the tenant account to allow running commands using the correct access keys.
  2. Create an IAM role that can be assumed from another AWS account, replacing <TENANT_PROFILE_NAME> with the profile name you defined in the previous step:
aws iam create-role \
    --role-name CodePipelineCrossAccountRole \
    --profile <TENANT_PROFILE_NAME> \
    --assume-role-policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::<TOOLING_ACCOUNT_ID>:root"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }'
  3. Create an IAM policy that grants access to the artifact store S3 bucket and to the artifact encryption key:
aws iam create-policy \
    --policy-name CodePipelineCrossAccountArtifactReadPolicy \
    --profile <TENANT_PROFILE_NAME> \
    --policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "s3:GetBucket*",
                        "s3:ListBucket"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>"
                    ],
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "s3:GetObject*",
                        "s3:Put*"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>/CrossAccountPipeline/*"
                    ],
                    "Effect": "Allow"
                },
                {
                    "Action": [ 
                        "kms:DescribeKey", 
                        "kms:GenerateDataKey*", 
                        "kms:Encrypt", 
                        "kms:ReEncrypt*", 
                        "kms:Decrypt" 
                    ], 
                    "Resource": "<KEY_ARN>",
                    "Effect": "Allow"
                }
            ]
        }'
  4. Attach the CodePipelineCrossAccountArtifactReadPolicy IAM policy to the CodePipelineCrossAccountRole IAM role:
aws iam attach-role-policy \
    --profile <TENANT_PROFILE_NAME> \
    --role-name CodePipelineCrossAccountRole \
    --policy-arn arn:aws:iam::<TENANT_ACCOUNT_ID>:policy/CodePipelineCrossAccountArtifactReadPolicy
  5. Create an IAM policy that allows passing the IAM role CloudFormationDeploymentRole to CloudFormation and performing CloudFormation actions on the application stack:
aws iam create-policy \
    --policy-name CodePipelineCrossAccountCfnPolicy \
    --profile <TENANT_PROFILE_NAME> \
    --policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "iam:PassRole"
                    ],
                    "Resource": "arn:aws:iam::<TENANT_ACCOUNT_ID>:role/CloudFormationDeploymentRole",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "cloudformation:*"
                    ],
                    "Resource": "arn:aws:cloudformation:<YOUR_REGION>:<TENANT_ACCOUNT_ID>:stack/SampleApplication*/*",
                    "Effect": "Allow"
                }
            ]
        }'
  6. Attach the CodePipelineCrossAccountCfnPolicy IAM policy to the CodePipelineCrossAccountRole IAM role:
aws iam attach-role-policy \
    --profile <TENANT_PROFILE_NAME> \
    --role-name CodePipelineCrossAccountRole \
    --policy-arn arn:aws:iam::<TENANT_ACCOUNT_ID>:policy/CodePipelineCrossAccountCfnPolicy

Additional configuration is needed in the Tooling account to allow access, which you complete later on.

Creating a deployment IAM role in each tenant account

After CodePipeline assumes the CodePipelineCrossAccountRole IAM role into the tenant account, it triggers AWS CloudFormation to provision the infrastructure based on the template defined in the application.yaml file. For that, AWS CloudFormation needs to assume an IAM role that grants privileges to create resources into the tenant AWS account.

Repeat the following steps for each tenant account to allow AWS CloudFormation to create resources in those accounts:

  1. Create an IAM role that can be assumed by AWS CloudFormation:
aws iam create-role \
    --role-name CloudFormationDeploymentRole \
    --profile <TENANT_PROFILE_NAME> \
    --assume-role-policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "cloudformation.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }'
  2. Create an IAM policy that grants permissions to create AWS resources:
aws iam create-policy \
    --policy-name CloudFormationDeploymentPolicy \
    --profile <TENANT_PROFILE_NAME> \
    --policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "iam:PassRole",
                    "Resource": "arn:aws:iam::<TENANT_ACCOUNT_ID>:role/*",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "iam:GetRole",
                        "iam:CreateRole",
                        "iam:DeleteRole",
                        "iam:AttachRolePolicy",
                        "iam:DetachRolePolicy"
                    ],
                    "Resource": "arn:aws:iam::<TENANT_ACCOUNT_ID>:role/*",
                    "Effect": "Allow"
                },
                {
                    "Action": "lambda:*",
                    "Resource": "*",
                    "Effect": "Allow"
                },
                {
                    "Action": "codedeploy:*",
                    "Resource": "*",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "s3:GetObject*",
                        "s3:GetBucket*",
                        "s3:List*"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>",
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>/*"
                    ],
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "kms:Decrypt",
                        "kms:DescribeKey"
                    ],
                    "Resource": "<KEY_ARN>",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "cloudformation:CreateStack",
                        "cloudformation:DescribeStack*",
                        "cloudformation:GetStackPolicy",
                        "cloudformation:GetTemplate*",
                        "cloudformation:SetStackPolicy",
                        "cloudformation:UpdateStack",
                        "cloudformation:ValidateTemplate"
                    ],
                    "Resource": "arn:aws:cloudformation:<YOUR_REGION>:<TENANT_ACCOUNT_ID>:stack/SampleApplication*/*",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "cloudformation:CreateChangeSet"
                    ],
                    "Resource": "arn:aws:cloudformation:<YOUR_REGION>:aws:transform/Serverless-2016-10-31",
                    "Effect": "Allow"
                }
            ]
        }'

The granted permissions in this IAM policy depend on the resources your application needs to be provisioned. Because the application in our use case consists of a simple Lambda function, the IAM policy only needs permissions over Lambda. The other permissions declared are to access and decrypt the Lambda code from the artifact store, use AWS CodeDeploy to deploy the function, and create and attach the Lambda execution role.

  3. Attach the IAM policy to the IAM role:
aws iam attach-role-policy \
    --profile <TENANT_PROFILE_NAME> \
    --role-name CloudFormationDeploymentRole \
    --policy-arn arn:aws:iam::<TENANT_ACCOUNT_ID>:policy/CloudFormationDeploymentPolicy

Configuring an artifact store encryption key

Even though the IAM roles created in the tenant accounts declare permissions to use the CMK encryption key, that’s not enough to have access to the key. To access the key, you must update the CMK key policy.

From the terminal, run the following command to attach the new policy:

aws kms put-key-policy \
    --key-id <KEY_ARN> \
    --policy-name default \
    --region <YOUR_REGION> \
    --policy \
        '{
             "Id": "TenantAccountAccess",
             "Version": "2012-10-17",
             "Statement": [
                {
                    "Sid": "Enable IAM User Permissions",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::<TOOLING_ACCOUNT_ID>:root"
                    },
                    "Action": "kms:*",
                    "Resource": "*"
                },
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": [
                            "arn:aws:iam::<GNOME_ACCOUNT_ID>:role/CloudFormationDeploymentRole",
                            "arn:aws:iam::<GNOME_ACCOUNT_ID>:role/CodePipelineCrossAccountRole",
                            "arn:aws:iam::<UNICORN_ACCOUNT_ID>:role/CloudFormationDeploymentRole",
                            "arn:aws:iam::<UNICORN_ACCOUNT_ID>:role/CodePipelineCrossAccountRole"
                        ]
                    },
                    "Action": [
                        "kms:Decrypt",
                        "kms:DescribeKey"
                    ],
                    "Resource": "*"
                }
             ]
         }'

Provisioning the CI/CD pipeline

Each CodePipeline workflow consists of two or more stages, which are composed by a series of parallel or serial actions. For our use case, the pipeline is made up of four stages:

  • Source – Declares CodeCommit as the source control for the application code.
  • Build – Using CodeBuild, it installs the dependencies and builds deployable artifacts. In this use case, the sample application is too simple and this stage is used for illustration purposes.
  • Deploy_Dev – Deploys the sample application on a sandbox environment. At this point, the deployable artifacts generated at the Build stage are used to create a CloudFormation stack and deploy the Lambda function.
  • Deploy_Prod – Similar to Deploy_Dev, at this stage the sample application is deployed on the tenant production environments. For that, it contains two actions (one per tenant) that are run in parallel. CodePipeline uses CodePipelineCrossAccountRole to assume a role on the tenant account, and from there, CloudFormationDeploymentRole is used to effectively deploy the application.

To provision your resources, complete the following steps from the terminal:

  1. Download the CloudFormation pipeline template:
curl -LO https://cross-account-ci-cd-pipeline-single-tenant-saas.s3.amazonaws.com/pipeline.yaml
  2. Deploy the CloudFormation stack using the pipeline template:
aws cloudformation deploy \
    --template-file pipeline.yaml \
    --region <YOUR_REGION> \
    --stack-name <YOUR_PIPELINE_STACK_NAME> \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides \
        ArtifactBucketName=<BUCKET_UNIQUE_NAME> \
        ArtifactEncryptionKeyArn=<KMS_KEY_ARN> \
        UnicornAccountId=<UNICORN_TENANT_ACCOUNT_ID> \
        GnomeAccountId=<GNOME_TENANT_ACCOUNT_ID> \
        SampleApplicationRepositoryName=<YOUR_CODECOMMIT_REPOSITORY_NAME> \
        RepositoryBranch=<YOUR_CODECOMMIT_MAIN_BRANCH>

This is the list of the required parameters to deploy the template:

    • ArtifactBucketName – The name of the S3 bucket where the deployment artifacts are to be stored.
    • ArtifactEncryptionKeyArn – The ARN of the customer managed CMK to be used as artifact encryption key.
    • UnicornAccountId – The AWS account ID for the first tenant (Unicorn) where the application is to be deployed.
    • GnomeAccountId – The AWS account ID for the second tenant (Gnome) where the application is to be deployed.
    • SampleApplicationRepositoryName – The name of the CodeCommit repository where source changes are detected.
    • RepositoryBranch – The name of the CodeCommit branch where source changes are detected. The default value is master in case no value is provided.
  3. Wait for AWS CloudFormation to create the resources.

When stack creation is complete, the pipeline starts automatically.
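To follow the execution from the terminal, you can query the pipeline state. The pipeline name below is a placeholder; you can find the actual name in the stack resources or on the CodePipeline console:

aws codepipeline get-pipeline-state \
    --name <YOUR_PIPELINE_NAME> \
    --region <YOUR_REGION> \
    --query "stageStates[].{Stage:stageName,Status:latestExecution.status}" \
    --output table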

For each existing tenant, an action is declared within the Deploy_Prod stage. The following code is a snippet of how these actions are configured to deploy the application on a different account:

RoleArn: !Sub arn:aws:iam::${UnicornAccountId}:role/CodePipelineCrossAccountRole
Configuration:
    ActionMode: CREATE_UPDATE
    Capabilities: CAPABILITY_IAM,CAPABILITY_AUTO_EXPAND
    StackName: !Sub SampleApplication-unicorn-stack-${AWS::Region}
    RoleArn: !Sub arn:aws:iam::${UnicornAccountId}:role/CloudFormationDeploymentRole
    TemplatePath: CodeCommitSource::application.yaml
    ParameterOverrides: !Sub | 
        { 
            "ApplicationName": "SampleApplication-Unicorn",
            "S3Bucket": { "Fn::GetArtifactAtt" : [ "ApplicationBuildOutput", "BucketName" ] },
            "S3Key": { "Fn::GetArtifactAtt" : [ "ApplicationBuildOutput", "ObjectKey" ] }
        }

The code declares two IAM roles. The first one is the IAM role assumed by the CodePipeline action to access the tenant AWS account, whereas the second is the IAM role used by AWS CloudFormation to create AWS resources in the tenant AWS account. The ParameterOverrides configuration declares where the release artifact is located. The S3 bucket and key are in the Tooling account and encrypted using the customer managed CMK. That’s why it was necessary to grant access from external accounts using a bucket and KMS policies.

Besides the CI/CD pipeline itself, this CloudFormation template declares IAM roles that are used by the pipeline and its actions. The main IAM role is named CrossAccountPipelineRole, which is used by the CodePipeline service. It contains permissions to assume the action roles. See the following code:

{
    "Action": "sts:AssumeRole",
    "Effect": "Allow",
    "Resource": [
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineSourceActionRole>",
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineApplicationBuildActionRole>",
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineDeployDevActionRole>",
        "arn:aws:iam::<UNICORN_ACCOUNT_ID>:role/CodePipelineCrossAccountRole",
        "arn:aws:iam::<GNOME_ACCOUNT_ID>:role/CodePipelineCrossAccountRole"
    ]
}

When you have more tenant accounts, you must add their cross-account roles to this list, as in the following example.
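For example, onboarding a hypothetical third tenant (the <NEW_TENANT_ACCOUNT_ID> placeholder below is illustrative, not part of the original walkthrough) means extending the Resource list with that tenant's cross-account role and duplicating the Deploy_Prod action shown earlier for the new account:

{
    "Action": "sts:AssumeRole",
    "Effect": "Allow",
    "Resource": [
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineSourceActionRole>",
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineApplicationBuildActionRole>",
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineDeployDevActionRole>",
        "arn:aws:iam::<UNICORN_ACCOUNT_ID>:role/CodePipelineCrossAccountRole",
        "arn:aws:iam::<GNOME_ACCOUNT_ID>:role/CodePipelineCrossAccountRole",
        "arn:aws:iam::<NEW_TENANT_ACCOUNT_ID>:role/CodePipelineCrossAccountRole"
    ]
}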

After CodePipeline runs successfully, test the sample application by invoking the Lambda function on each tenant account:

aws lambda invoke --function-name SampleApplication --profile <TENANT_PROFILE_NAME> --region <YOUR_REGION> out

The output should be:

{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}

Cleaning up

Follow these steps to delete the components and avoid incurring future charges:

  1. Delete the production application stack from each tenant account:
aws cloudformation delete-stack --profile <TENANT_PROFILE_NAME> --region <YOUR_REGION> --stack-name SampleApplication-<TENANT_NAME>-stack-<YOUR_REGION>
  2. Delete the dev application stack from the Tooling account:
aws cloudformation delete-stack --region <YOUR_REGION> --stack-name SampleApplication-dev-stack-<YOUR_REGION>
  3. Delete the pipeline stack from the Tooling account:
aws cloudformation delete-stack --region <YOUR_REGION> --stack-name <YOUR_PIPELINE_STACK_NAME>
  4. Delete the customer managed CMK from the Tooling account:
aws kms schedule-key-deletion --region <YOUR_REGION> --key-id <KEY_ARN>
  5. Delete the S3 bucket from the Tooling account:
aws s3 rb s3://<BUCKET_UNIQUE_NAME> --force
  6. Optionally, delete the IAM roles and policies you created in the tenant accounts, as shown in the example below.
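The following commands are a sketch of that optional cleanup. They assume the cross-account roles keep the names used throughout this post and have inline policies attached; if you attached managed policies instead, use list-attached-role-policies and detach-role-policy. Repeat for each tenant profile.

aws iam list-role-policies --role-name CodePipelineCrossAccountRole --profile <TENANT_PROFILE_NAME>
aws iam delete-role-policy --role-name CodePipelineCrossAccountRole --policy-name <INLINE_POLICY_NAME> --profile <TENANT_PROFILE_NAME>
aws iam delete-role --role-name CodePipelineCrossAccountRole --profile <TENANT_PROFILE_NAME>

aws iam list-role-policies --role-name CloudFormationDeploymentRole --profile <TENANT_PROFILE_NAME>
aws iam delete-role-policy --role-name CloudFormationDeploymentRole --policy-name <INLINE_POLICY_NAME> --profile <TENANT_PROFILE_NAME>
aws iam delete-role --role-name CloudFormationDeploymentRole --profile <TENANT_PROFILE_NAME>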

Conclusion

This post demonstrated what it takes to build a CI/CD pipeline for single-tenant SaaS solutions isolated on the AWS account level. It covered how to grant cross-account access to artifact stores on Amazon S3 and artifact encryption keys on AWS KMS using policies and IAM roles. This approach is less error-prone because it avoids human errors when manually deploying the exact same application for multiple tenants.

For this use case, we performed most of the steps manually to better illustrate all the steps and components involved. For even more automation, consider using the AWS Cloud Development Kit (AWS CDK) and its pipeline construct to create your CI/CD pipeline and have everything as code. Moreover, for production scenarios, consider having integration tests as part of the pipeline.

Rafael Ramos

Rafael Ramos

Rafael is a Solutions Architect at AWS, where he helps ISVs on their journey to the cloud. He spent over 13 years working as a software developer, and is passionate about DevOps and serverless. Outside of work, he enjoys playing tabletop RPG, cooking and running marathons.

Integrating AWS CloudFormation Guard into CI/CD pipelines

Post Syndicated from Sergey Voinich original https://aws.amazon.com/blogs/devops/integrating-aws-cloudformation-guard/

In this post, we discuss and build a managed continuous integration and continuous deployment (CI/CD) pipeline that uses AWS CloudFormation Guard to automate and simplify pre-deployment compliance checks of your AWS CloudFormation templates. This enables your teams to define a single source of truth for what constitutes valid infrastructure definitions, stay compliant with your company guidelines, and streamline the deployment lifecycle of your AWS resources.

We use the following AWS services and open-source tools to set up the pipeline:

  • AWS CodeCommit
  • AWS CodeBuild
  • AWS CodePipeline
  • AWS CloudFormation
  • AWS CloudFormation Guard (open source)

Solution overview

The CI/CD workflow includes the following steps:

  1. A code change is committed and pushed to the CodeCommit repository.
  2. CodePipeline automatically triggers a CodeBuild job.
  3. CodeBuild spins up a compute environment and runs the phases specified in the buildspec.yml file:
       • Clone the code from the CodeCommit repository (CloudFormation template, rule set for CloudFormation Guard, buildspec.yml file).
       • Clone the code from the CloudFormation Guard repository on GitHub.
       • Provision the build environment with the necessary components (rust, cargo, git, build-essential).
       • Download the CloudFormation Guard release from GitHub.
       • Run a validation check of the CloudFormation template.
  4. If the validation is successful, pass control over to CloudFormation and deploy the stack. If the validation fails, stop the build job and print a summary to the build job log.

The following diagram illustrates this workflow.

Architecture Diagram

Architecture Diagram of CI/CD Pipeline with CloudFormation Guard

Prerequisites

For this walkthrough, complete the following prerequisites:

  • An AWS account
  • The AWS Command Line Interface (AWS CLI) installed and configured with appropriate credentials

Creating your CodeCommit repository

Create your CodeCommit repository by running a create-repository command in the AWS CLI:

aws codecommit create-repository --repository-name cfn-guard-demo --repository-description "CloudFormation Guard Demo"

The following screenshot indicates that the repository has been created.

CodeCommit Repository

CodeCommit Repository has been created

Populating the CodeCommit repository

Populate your repository with the following artifacts:

  1. A buildspec.yml file. Modify the following code as per your requirements:
version: 0.2
env:
  variables:
    # Defining the CloudFormation template and rule set as variables - part of the code repo
    CF_TEMPLATE: "cfn_template_file_example.yaml"
    CF_ORG_RULESET:  "cfn_guard_ruleset_example"
phases:
  install:
    commands:
      - apt-get update
      - apt-get install build-essential -y
      - apt-get install cargo -y
      - apt-get install git -y
  pre_build:
    commands:
      - echo "Setting up the environment for AWS CloudFormation Guard"
      - echo "More info https://github.com/aws-cloudformation/cloudformation-guard"
      - echo "Install Rust"
      - curl https://sh.rustup.rs -sSf | sh -s -- -y
  build:
    commands:
       - echo "Pull GA release from github"
       - echo "More info https://github.com/aws-cloudformation/cloudformation-guard/releases"
       - wget https://github.com/aws-cloudformation/cloudformation-guard/releases/download/1.0.0/cfn-guard-linux-1.0.0.tar.gz
       - echo "Extract cfn-guard"
       - tar xvf cfn-guard-linux-1.0.0.tar.gz .
  post_build:
    commands:
       - echo "Validate CloudFormation template with cfn-guard tool"
       - echo "More information https://github.com/aws-cloudformation/cloudformation-guard/blob/master/cfn-guard/README.md"
       - cfn-guard-linux/cfn-guard check --rule_set $CF_ORG_RULESET --template $CF_TEMPLATE --strict-checks
artifacts:
  files:
    - cfn_template_file_example.yaml
  name: guard_templates
  2. An example of a rule set file (cfn_guard_ruleset_example) for CloudFormation Guard. Modify the following code as per your requirements:
#CFN Guard rules set example

#List of multiple references
let allowed_azs = [us-east-1a,us-east-1b]
let allowed_ec2_instance_types = [t2.micro,t3.nano,t3.micro]
let allowed_security_groups = [sg-08bbcxxc21e9ba8e6,sg-07b8bx98795dcab2]

#EC2 Policies
AWS::EC2::Instance AvailabilityZone IN %allowed_azs
AWS::EC2::Instance ImageId == ami-0323c3dd2da7fb37d
AWS::EC2::Instance InstanceType IN %allowed_ec2_instance_types
AWS::EC2::Instance SecurityGroupIds == ["sg-07b8xxxsscab2"]
AWS::EC2::Instance SubnetId == subnet-0407a7casssse558

#EBS Policies
AWS::EC2::Volume AvailabilityZone == us-east-1a
AWS::EC2::Volume Encrypted == true
AWS::EC2::Volume Size == 50 |OR| AWS::EC2::Volume Size == 100
AWS::EC2::Volume VolumeType == gp2
  3. An example of a CloudFormation template file (.yaml). Modify the following code as per your requirements:
AWSTemplateFormatVersion: "2010-09-09"
Description: "EC2 instance with encrypted EBS volume for AWS CloudFormation Guard Testing"

Resources:

  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: 'ami-0323c3dd2da7fb37d'
      AvailabilityZone: 'us-east-1a'
      KeyName: "your-ssh-key"
      InstanceType: 't3.micro'
      SubnetId: 'subnet-0407a7xx68410e558'
      SecurityGroupIds:
        - 'sg-07b8b339xx95dcab2'
      Volumes:
        - Device: '/dev/sdf'
          VolumeId: !Ref EBSVolume
      Tags:
        - Key: Name
          Value: cfn-guard-ec2

  EBSVolume:
    Type: AWS::EC2::Volume
    Properties:
      Size: 100
      AvailabilityZone: 'us-east-1a'
      Encrypted: true
      VolumeType: gp2
      Tags:
        - Key: Name
          Value: cfn-guard-ebs
    DeletionPolicy: Snapshot

Outputs:
  InstanceID:
    Description: The Instance ID
    Value: !Ref EC2Instance
  Volume:
    Description: The Volume ID
    Value: !Ref EBSVolume
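One way to get these three files into the repository is to clone it over HTTPS with the AWS CLI credential helper and push them. This is a sketch and assumes your IAM user or role is allowed to call CodeCommit in <YOUR_REGION>:

git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true
git clone https://git-codecommit.<YOUR_REGION>.amazonaws.com/v1/repos/cfn-guard-demo
cd cfn-guard-demo
# Copy buildspec.yml, cfn_guard_ruleset_example, and cfn_template_file_example.yaml into this folder, then:
git add buildspec.yml cfn_guard_ruleset_example cfn_template_file_example.yaml
git commit -m "Add CloudFormation Guard demo artifacts"
git push origin master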
The following screenshot shows a potential CodeCommit repository structure.

AWS CodeCommit

Optional CodeCommit Repository Structure

Creating a CodeBuild project

Our CodeBuild project is built around CloudFormation Guard and runs validation checks of our CloudFormation templates as a phase of the CI process.

  1. On the CodeBuild console, choose Build projects.
  2. Choose Create build project.
  3. For Project name, enter your project name.
  4. For Description, enter a description.
AWS CodeBuild

Create CodeBuild Project

  5. For Source provider, choose AWS CodeCommit.
  6. For Repository, choose the CodeCommit repository you created in the previous step.
AWS CodeBuild

Define the source for your CodeBuild Project

To set up the CodeBuild environment, we use a managed image based on Ubuntu 18.04.

  7. For Environment Image, select Managed image.
  8. For Operating system, choose Ubuntu.
  9. For Service role, select New service role.
  10. For Role name, enter your service role name.
CodeBuild Environment

Setup the environment, the OS image and other settings for the CodeBuild

  11. Leave the default settings for additional configuration, buildspec, batch configuration, artifacts, and logs.

You can also use CodeBuild with custom build environments to help you optimize billing and improve the build time.

Creating IAM roles and policies

Our CI/CD pipeline needs two AWS Identity and Access Management (IAM) roles to run properly: one role for CodePipeline to work with other resources and services, and one role for AWS CloudFormation to run the deployments that passed the validation check in the CodeBuild phase.

Creating permission policies

Create your permission policies first. The following code is the policy in JSON format for CodePipeline:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "codecommit:UploadArchive",
                "codecommit:CancelUploadArchive",
                "codecommit:GetCommit",
                "codecommit:GetUploadArchiveStatus",
                "codecommit:GetBranch",
                "codestar-connections:UseConnection",
                "codebuild:BatchGetBuilds",
                "codedeploy:CreateDeployment",
                "codedeploy:GetApplicationRevision",
                "codedeploy:RegisterApplicationRevision",
                "codedeploy:GetDeploymentConfig",
                "codedeploy:GetDeployment",
                "codebuild:StartBuild",
                "codedeploy:GetApplication",
                "s3:*",
                "cloudformation:*",
                "ec2:*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "*",
            "Condition": {
                "StringEqualsIfExists": {
                    "iam:PassedToService": [
                        "cloudformation.amazonaws.com",
                        "ec2.amazonaws.com"
                    ]
                }
            }
        }
    ]
}

To create your policy for CodePipeline, run the following CLI command:

aws iam create-policy --policy-name CodePipeline-Cfn-Guard-Demo --policy-document file://CodePipelineServiceRolePolicy_example.json

Capture the policy ARN that you get in the output to use in the next steps.

The following code is the policy in JSON format for AWS CloudFormation:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:AWSServiceName": [
                        "autoscaling.amazonaws.com",
                        "ec2scheduled.amazonaws.com",
                        "elasticloadbalancing.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:GetObjectAcl",
                "s3:GetObject",
                "cloudwatch:*",
                "ec2:*",
                "autoscaling:*",
                "s3:List*",
                "s3:HeadBucket"
            ],
            "Resource": "*"
        }
    ]
}

Create the policy for AWS CloudFormation by running the following CLI command:

aws iam create-policy --policy-name CloudFormation-Cfn-Guard-Demo --policy-document file://CloudFormationRolePolicy_example.json

Capture the policy ARN that you get in the output to use in the next steps.

Creating roles and trust policies

The following code is the trust policy for CodePipeline in JSON format:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "codepipeline.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create your role for CodePipeline with the following CLI command:

aws iam create-role --role-name CodePipeline-Cfn-Guard-Demo-Role --assume-role-policy-document file://RoleTrustPolicy_CodePipeline.json

Capture the role name for the next step.

The following code is the trust policy for AWS CloudFormation in JSON format:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudformation.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create your role for AWS CloudFormation with the following CLI command:

aws iam create-role --role-name CF-Cfn-Guard-Demo-Role --assume-role-policy-document file://RoleTrustPolicy_CloudFormation.json

Capture the role name for the next step.

 

Finally, attach the permissions policies created in the previous step to the IAM roles you created:

aws iam attach-role-policy --role-name CodePipeline-Cfn-Guard-Demo-Role --policy-arn "arn:aws:iam::<AWS_ACCOUNT_ID>:policy/CodePipeline-Cfn-Guard-Demo"

aws iam attach-role-policy --role-name CF-Cfn-Guard-Demo-Role --policy-arn "arn:aws:iam::<AWS_ACCOUNT_ID>:policy/CloudFormation-Cfn-Guard-Demo"

Creating a pipeline

We can now create our pipeline to assemble all the components into one managed, continuous mechanism.

  1. On the CodePipeline console, choose Pipelines.
  2. Choose Create new pipeline.
  3. For Pipeline name, enter a name.
  4. For Service role, select Existing service role.
  5. For Role ARN, choose the service role you created in the previous step.
  6. Choose Next.
CodePipeline Setup

Setting Up CodePipeline environment

  7. In the Source section, for Source provider, choose AWS CodeCommit.
  8. For Repository name, enter your repository name.
  9. For Branch name, choose master.
  10. For Change detection options, select Amazon CloudWatch Events.
  11. Choose Next.
AWS CodePipeline Source

Adding CodeCommit to CodePipeline

  12. In the Build section, for Build provider, choose AWS CodeBuild.
  13. For Project name, choose the CodeBuild project you created.
  14. For Build type, select Single build.
  15. Choose Next.
CodePipeline Build Stage

Adding Build Project to Pipeline Stage

Now we will create a deploy stage in our CodePipeline to deploy CloudFormation templates that passed the CloudFormation Guard inspection in the CI stage.

  16. In the Deploy section, for Deploy provider, choose AWS CloudFormation.
  17. For Action mode, choose Create or update stack.
  18. For Stack name, choose any stack name.
  19. For Artifact name, choose BuildArtifact.
  20. For File name, enter the name of the CloudFormation template in your CodeCommit repository (in our demo, cfn_template_file_example.yaml).
  21. For Role name, choose the role you created earlier for CloudFormation.
CodePipeline - Deploy Stage

Adding deploy stage to CodePipeline

  22. In the next step, review your selections for the pipeline to be created. The stages and action providers in each stage are shown in the order that they will be created. Choose Create pipeline. Our CodePipeline is ready.

Validating the CI/CD pipeline operation

Our CodePipeline has two basic flows and outcomes. If the CloudFormation template complies with our CloudFormation Guard rule set file, the resources in the template deploy successfully (in our use case, we deploy an EC2 instance with an encrypted EBS volume).

CloudFormation Deployed

CloudFormation Console

If our CloudFormation template doesn’t comply with the policies specified in our CloudFormation Guard rule set file, our CodePipeline stops at the CodeBuild step and you see an error in the build job log indicating the resources that are non-compliant:

[EBSVolume] failed because [Encrypted] is [false] and the permitted value is [true]
[EC2Instance] failed because [t3.2xlarge] is not in [t2.micro,t3.nano,t3.micro] for [InstanceType]
Number of failures: 2

Note: To demonstrate this functionality, I changed my CloudFormation template to use an unencrypted EBS volume and switched the EC2 instance type to t3.2xlarge, which do not adhere to the rules that we specified in the Guard rule set file.
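To reproduce this result locally before pushing a change, you can run the same check the build job runs. This sketch assumes a Linux workstation and the same cfn-guard 1.0.0 release used in the buildspec:

wget https://github.com/aws-cloudformation/cloudformation-guard/releases/download/1.0.0/cfn-guard-linux-1.0.0.tar.gz
tar xvf cfn-guard-linux-1.0.0.tar.gz
# Same command as the post_build phase of buildspec.yml
cfn-guard-linux/cfn-guard check --rule_set cfn_guard_ruleset_example --template cfn_template_file_example.yaml --strict-checks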

Cleaning up

To avoid incurring future charges, delete the resources that we have created during the walkthrough:

  • CloudFormation stack resources that were deployed by the CodePipeline
  • CodePipeline that we have created
  • CodeBuild project
  • CodeCommit repository

Conclusion

In this post, we covered how to integrate CloudFormation Guard into CodePipeline and fully automate pre-deployment compliance checks of your CloudFormation templates. This allows your teams to have an end-to-end automated CI/CD pipeline with minimal operational overhead and stay compliant with your organizational infrastructure policies.

Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

Post Syndicated from Junjie Tang original https://aws.amazon.com/blogs/architecture/field-notes-building-an-autonomous-driving-and-adas-data-lake-on-aws/

Customers developing self-driving car technology are continuously challenged by the amount of data captured and created during the development lifecycle. This is accelerated by the need to design and launch incremental feature improvements on advanced driver-assistance systems (ADAS). Efforts to advance ADAS functionality have led to new approaches for storing, cataloging, and analyzing driving data captured from vehicles on the road today. Combining data from connected vehicle fleets transmitted over cellular networks in combination with manually ingesting data from vehicle data loggers requires a complex architecture and elastic data lake capability that only AWS provides.

This blog explains how to build an Autonomous Driving Data Lake using this Reference Architecture. We cover the workflow from how to ingest the data, prepare it for machine learning, catalog the output from ADAS systems and vehicle sensors, label it, automatically detect scenarios, and manage the various workflows required for moving it into an organized data lake construct. The AWS Autonomous Driving and ADAS Data Lake Reference Architecture was developed after working with numerous customers on the challenges they faced in achieving this. Using multiple AWS services and solution best practices, we outlined an approach we found helpful to others.

In the blog post, Autonomous Vehicle and ADAS development on AWS Part 1: Achieving Scale, we outlined the benefits of having a data lake in the cloud, as opposed to building your own on-premises data lake solution.

Before we dive into the details of the reference architecture, let’s review the typical workflow of autonomous and ADAS development. The diagram below shows the following steps, much of which are common in any machine learning project:

  • data acquisition and ingest,
  • data processing and analytics,
  • labeling, map development,
  • model and algorithm development,
  • simulation and validation, and
  • orchestration and deployment.

This blog focuses on data ingest, data processing and analytics, labeling, and the data lake itself, as shown in the following workflow diagram:

driving dev workflow

Why build a data lake for ADAS and Autonomous Driving System development?

At AWS re:Invent 2019, BMW spoke at this session (AUT306): Creating a data-driven, cloud-native ecosystem at BMW Group, where they explained the motivation for developing an enterprise-wide cloud data lake, or the Cloud Data Hub as they call it. The Cloud Data Hub ingests information from multiple lines of business, including sources like manufacturing systems, logistics, customer service, after-sales, and connected vehicle sensor telemetry data.

BMW created a global organization to drive DevOps culture with a strong emphasis on tooling, data quality, and end-to-end data lineage. Their journey began in the Hadoop eco-system (Hive, HBase) with an on-premises, heterogeneous, and difficult-to-scale environment, then moved to cloud-native building blocks with a focus on data quality. Examples of benefits derived from the AWS Cloud implementation are multi-region deployments, high security standards, and compliance with local regulations. The goal is to democratize the most valuable information assets so they can be used across a broad community globally.

The BMW Cloud Data Hub is just one example of how customers manage information assets for sharing across the enterprise. Using a similar approach, the AWS Autonomous Driving and ADAS Data Lake Reference Architecture extends the data lake pattern to address specific challenges around:

  1. metadata and the data catalog including automated scenario detection;
  2. data lineage from source to semantic layer;
  3. data sharing with external consumers and third party service providers via Amazon SageMaker Ground Truth.

 

Now let’s review the details of the Autonomous Driving Data Lake Reference Architecture:

1. Ingest data from autonomous fleet with AWS Outposts for local data processing. 

Sensor data is captured and written to data loggers containing multiple SSD hard drives. Once at the garage or customer facility, the hard drives are removed and inserted into copy stations. From there, the data is copied to Amazon S3 or to a local storage system and AWS Outposts for pre-processing. AWS Outposts is a fully managed service that extends AWS infrastructure with AWS services like Amazon Elastic Kubernetes Service, Amazon Relational Database Service, and Amazon EMR on Amazon S3 (EMR supports EMRFS and HDFS, which both natively support S3). These services are used to run data integrity checks, compress the data to remove redundant information, and prepare the data for downstream AD workloads.

Using AWS DataSync, the data is synchronized at high data rates and securely between on-premises network attached storage (NAS) sources and Amazon S3. This is done over a high bandwidth connection provided by AWS Direct Connect.
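As an illustration, a DataSync transfer between an on-premises NFS share and the drive-data bucket can be wired up with a few CLI calls. The hostnames, ARNs, and bucket name below are placeholders, not values from the reference architecture:

# Register the on-premises NAS share and the target S3 bucket as DataSync locations
aws datasync create-location-nfs \
    --server-hostname <NAS_HOSTNAME> \
    --subdirectory /drive-logs \
    --on-prem-config AgentArns=<DATASYNC_AGENT_ARN>

aws datasync create-location-s3 \
    --s3-bucket-arn arn:aws:s3:::<DRIVE_DATA_BUCKET> \
    --s3-config BucketAccessRoleArn=<DATASYNC_S3_ACCESS_ROLE_ARN>

# Create a task between the two locations and start a transfer over Direct Connect
aws datasync create-task \
    --source-location-arn <NFS_LOCATION_ARN> \
    --destination-location-arn <S3_LOCATION_ARN> \
    --name drive-log-sync

aws datasync start-task-execution --task-arn <TASK_ARN>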

2. Ingest vehicle telemetry data in real time using AWS IoT Core and Amazon Kinesis Data Firehose.

Vehicle telemetry is captured and published to the cloud using a number of different technologies, typically over HTTPS or MQTT. In this architecture, AWS IoT Greengrass provides an intelligent edge runtime in the vehicle with application logic running in Lambda functions deployed locally to filter vehicle network signals like CAN data, GPS location, ADAS system output, road condition metadata derived from cameras, and other vehicle sensor information.

AWS IoT Greengrass allows customers to deploy containers and machine learning inference models, create multiple data streams, and prioritize them based on business logic you define. Eventually, the data ends up in S3 to be combined with the sensor data captured from the data logger process defined in step one above.
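A minimal sketch of such edge filtering logic is shown below, using the Greengrass Core SDK (v1). The topic name and signal fields are illustrative assumptions, not part of the reference architecture:

# Greengrass Lambda sketch: keep only the signals of interest and republish them to AWS IoT Core
import json
import greengrasssdk

client = greengrasssdk.client('iot-data')
SIGNALS_OF_INTEREST = {'vehicle_speed', 'gps_lat', 'gps_lon', 'adas_state'}

def handler(event, context):
    # 'event' is assumed to be a decoded vehicle network message (a dict of signal names to values)
    filtered = {k: v for k, v in event.items() if k in SIGNALS_OF_INTEREST}
    if filtered:
        client.publish(topic='vehicle/telemetry/filtered', payload=json.dumps(filtered))
    return {'forwarded_fields': len(filtered)}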

3. Remove and transform low quality data.

Autonomous vehicles produce terabytes of data per hour. In this trove of information, there may be redundant as well as corrupted data coming from the vehicle telemetry stream and raw sensor stack. This data needs to be normalized for optimal downstream processing. Customers use a number of technologies to do this. For example, Amazon EMR provides a runtime for high-volume, complex data processing using open-source Apache big data processing engines like Spark. There are a few common steps in the data transformation, including:

  • checking if the driving is complete by combining batch files and streaming data;
  • parsing the log files based on recording formats (rosbag, mdf4, etc.);
  • decoding the signals from binary formats to readable text;
  • filtering inconsistent data files; and
  • synchronizing the timestamps of the signals.

EMR Launch is an open-source framework developed by AWS Labs that is available for customers to accelerate and simplify defining, deploying, managing, and using Amazon EMR clusters, with the following features:

  • separating the definition of cluster security configurations (EMR Profile) and cluster resource configurations (Cluster Configuration) into reusable and shareable constructs; and
  • providing a suite of tools to simplify the construction of orchestration pipelines using AWS Step Functions (EMR Launch Function).

4. Schedule the extract, transform, load (ETL) jobs using Apache Airflow.

To create trustworthy insights and empower your next machine learning use case with a reliable data foundation you need to bring the data creation process under design control. Fundamental to this approach is a centralized, governed workflow system powered by Apache Airflow. With Airflow you can establish trust in your data processing pipelines by making the workflow part of your code base to enable transparent, repeatable, pipeline executions.

The following solution diagram shows how radar and video data processing in MDF4 format achieves the highest scalability by leveraging AWS Fargate for Amazon ECS.

  • To ensure data pipeline integrity, the solution is deployed securely in an Amazon Virtual Private Cloud with end-to-end TLS and is only accessed from a private bastion host.
  • The containers for Airflow Webserver, Scheduler and Worker are deployed in multiple Availability Zones for high availability.
  • The communication between the components are decoupled via Amazon ElastiCache for Redis.
  • The status of running jobs is stored in Amazon Aurora.
  • To learn more about how to leverage complex workflows and model training jobs on AWS, a future blog post will describe the detailed architecture of Apache Airflow.

radar and video data processing in MDF4 format

5. Enrich data with map information and weather conditions based on GPS location and timestamp.

Data-sets are enriched using Amazon EMR with map or weather information from external geospatial and weather service providers and stored in Amazon S3, or database services like Amazon DynamoDB. Sensors like cameras and LiDAR could malfunction or even fail in adverse weather conditions. Advanced sensor fusing could perform the weather perception in real-time. The result of the weather perception could be verified with the real weather conditions.

6. Extract metadata into Amazon DynamoDB and Amazon Elasticsearch Service. 

Using drive logs that contain the telemetry, telematics, perception, and sensor data, a catalog is built to create searchable quantitative metadata that includes speed, turning angles, and location, as well as simple and complex semantic descriptions of scene snippets such as “high velocity,” “left turn,” or “pedestrian.” The data lake catalog is updated with scenario data and indexed in Amazon Elasticsearch Service for discovery by analysts and ADS engineers. The extraction process is ideally fully automated, but many higher-order behavioral descriptions may require human annotations from later processing steps.

The majority of systems leverage quantitative and simple semantic descriptions for the drive data, with a clear trend towards needing higher order behaviors extracted in the data lake catalog.  These more complex behaviors are ideal for enhanced searching capabilities and better validating coverage mapping.  ASAM OpenSCENARIO defines a scenario description language that provides a common ontology and hierarchy for detailing the behavior of the vehicle and surroundings.

This standard provides an open approach for describing complex, synchronized maneuvers that involve multiple entities including other vehicles, vulnerable road users (VRUs) like pedestrians, bikers, construction workers, and other traffic participants. The description of a maneuver may be based on driver actions like performing a lane change by the ego, or based on the actions of others in the scenario such as a cut in from another driver. OpenSCENARIO also accounts for the appearance/description of the participant in the scene.

7. Store data lineage in Amazon Neptune and catalog data using AWS Glue Data Catalog.

Amazon Neptune is a fully managed graph database service. It’s useful to catalog data lineage in a graph model to visualize file and object dependencies. AWS Glue is a fully managed service that provides a data catalog making assets in the data lake discoverable. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

The following diagram shows how we parse data from the Ford Autonomous Vehicle Dataset in Rosbag format using Amazon EMR. We store it in Amazon S3 in parquet format, use AWS Glue Crawler to read the file schema and create the tables in the AWS Glue Data Catalog, and finally use Amazon Athena to query the velocity data.

Ford Autonomous Vehicle Dataset
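As a sketch of that last step, the crawler can be started and the catalog queried from the CLI. The crawler, database, table, and column names below are hypothetical and depend on how your parquet output is laid out:

aws glue start-crawler --name <DRIVE_DATA_CRAWLER>

aws athena start-query-execution \
    --query-string "SELECT utc_time, velocity FROM <GLUE_DATABASE>.<DRIVE_DATA_TABLE> WHERE velocity > 20 LIMIT 100" \
    --result-configuration OutputLocation=s3://<ATHENA_RESULTS_BUCKET>/athena-results/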

8. Process drive data and perform deep signal validation. 

Deploy your drive data signal validation code in Amazon Elastic Kubernetes Service (Amazon EKS). EKS is a managed service that makes it easy for you to run Kubernetes without needing to stand up or maintain your own Kubernetes control plane. Only a subset of the signals from the Rosbag or MDF4 files will be extracted for KPI calculation and aggregation which potentially reduces the stored data volume from gigabytes to megabytes.

9. Perform automated labeling using Amazon SageMaker Ground Truth.

Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning. Ground Truth offers automatic data labeling and/or annotation which uses a machine learning model to label your data.

In addition, the service helps you create custom workflows for data labeling that leverage human workers from Amazon Mechanical Turk, the AWS Partner Network, or your own private workforce to improve the automated labeling accuracy. Ground Truth now supports 3D Point Cloud Labeling for task types like Object Detection, Object Tracking and Semantic Segmentation. This blog shows how to use the service with open data sets from Audi A2D2 and KITTI. Alternatively, customers can run a custom container on Amazon EKS for Ground Truth generation and labeling.

10. Provide a search function for particular scenarios using AWS AppSync.

Developers and data scientists can search for a particular scenario and all of the associated metadata related to it. AWS AppSync is a managed service that uses GraphQL to make it easy for applications to get data from a range of data sources such as Amazon DynamoDB, Amazon ES and AWS Lambda.

Additional Aspects to consider

  • China: The collection of raw data that includes video, lidar, radar, and GPS data is defined as a controlled activity by the government (geographic information surveying and mapping). This is a regulated activity and must be done under the governance of local certified map providers with navigation surveying licenses.
  • Data Encryption and Anonymization: Some ADS/ADAS use cases include sensitive or personal information. AWS Key Management Service (KMS) supports Customer Master Keys (CMKs). Vehicle Identification Numbers (VIN) can be anonymized by Amazon EMR jobs. This blog shows how to anonymize personal data like faces from the video using Amazon Rekognition.
  • Exchange Data with partners: AWS Data Exchange is a service that makes it easy for AWS customers to securely exchange file-based data sets in the AWS Cloud. Providers in AWS Data Exchange have a secure, transparent, and reliable channel to reach AWS customers and grant existing customers their subscriptions more efficiently.
  • Data Lake as code: AWS provides a full stack of DevOps tooling including AWS CodeCommit, AWS CodeBuild and AWS CodePipeline to simplify provisioning and managing infrastructure, deploy application code, automate software release processes, and monitor application and infrastructure performance. Third party CI/CD tools like Jenkins and Zuul can be integrated as well.

Conclusion

In this post, we discussed the steps outlined in this Reference architecture to build an Autonomous Driving and ADAS Data Lake on AWS. We hope you found this interesting and helpful and invite your comments on the architecture.

Also, check out the Automotive issue of the AWS Architecture Monthly Magazine.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Implementing Hardware-in-the-Loop for Autonomous Driving Development on AWS

Post Syndicated from Bryan Berezdivin original https://aws.amazon.com/blogs/architecture/field-notes-implementing-hardware-in-the-loop-for-autonomous-driving-development-on-aws/

Automotive customers use AWS as their platform for advanced driving assistance systems (ADAS) and autonomous driving (AD) development to accelerate their development cycles and experience faster time-to-market.  In the blog post, Autonomous Vehicle and ADAS development on AWS Part 1: Achieving Scale, we illustrated how software in the loop (SiL) and hardware in the loop (HiL) simulations are part of the workflow used to develop and validate safe AD and ADAS functionality. In this post, I run through some of the more common questions and patterns for implementing HiL on AWS, while looking at some of the differences from running your development all on-premises.

HiL simulations leverage test drive data and derived synthetic data to develop and validate various functions in the AD software stack.  Test drive log data is ingested and stored in Amazon S3 for use for HiL simulations in parallel with other AD development workloads including visualization, processing, labeling, analysis, and model and algorithm development.

As such, we see customers operating in a hybrid context, with their HiL workloads running on-premises to support customized equipment. For ADAS and ADS customers, this poses a few questions and considerations:

  • What are the recommendations to deploy HiL for AD on AWS?
  • How is this different from what customers were used to on-premises?

For hybrid customers, there are assumptions and misconceptions:

  • Do I need to replicate the test drive data locally, and if so, what are the considerations and consequences?

For the purposes of brevity, the remainder of this blog post will use the term AD development to encompass ADAS and AD unless specifically called out.

HiL Building Blocks

Simulations and validations make up an important aspect of AD development. According to Rand’s analysis, there is a need to demonstrate safe driving on billions of miles for an autonomous vehicle to have a lower failure rate than a human driver. While this analysis is statistically derived for fully autonomous driving (SAE Level 5), it demonstrates a need for further validation on millions of miles. This pattern is reflected in most ADAS and AD development projects, where software in the loop (SiL) and HiL are used for verification and validation.

The following diagram is an illustration of the ISO 26262 V-Model, a product development approach for matching requirements with corresponding tests, where HiL simulation is required for much of system level testing and validation phases on the right side of the overlaid V-Model.

Figure 1: V-Model as defined by ISO-26262

Figure 1: V-Model as defined by ISO-26262

HiL simulations require a few key elements. The main component is the device under test (DUT), such as one or more electronic control units (ECUs) running the AD software stack. HiL simulations allow customers to put the device under test (DUT) under the rigor of real-world signals found in a vehicle. By providing more accurate environments and scenarios for the DUT, it can be fine-tuned for key performance indicators (KPIs) such as power utilization, response time, and accuracy.

The DUT is connected to a “HIL Rig,” a high performance server with multiple expansion boards to connect to various components of the AD system. The various interfaces are identified in Figure 2:  High Level Hardware-in-the-loop (HiL) System and Interfaces and include Controller Area Network (CAN), Automotive Ethernet, Low Voltage Differential Signal (LVDS), and PCIExpress. These interfaces emulate the vehicular topology for testing purposes and allow system level validation of the DUT.

The HiL Rigs and the corresponding software tooling are offered by companies like Elektrobit, dSpace, National Instruments, and Opal-RT. The HiL systems facilitate time synchronized inputs and outputs to the DUT and measures system performance. These solutions have optimizations for large-scale operations aligned with faster validation cycles for customers. This latter point is relevant when a project requires validation across a large number of miles. Elektrobit provides the ability to orchestrate and deploy large HiL server farms that can be deployed to work in parallel. The larger server farms allow parallel HiL simulations to reduce the time to validate thousands of miles of drive time and assess the key performance indicators (KPIs) for the feature sets. Results can be acted on more quickly and reduce the overall development time.

Figure 2: High Level Hardware-in-the-loop (HiL) System and Interfaces

Figure 2: High Level Hardware-in-the-loop (HiL) System and Interfaces

The HIL system loads sensor data derived from test drive logs. These logs vary in format, but often are captured and stored as MDF4, ADTF, rosbag, or other data logger proprietary formats. These are then processed for HIL simulations to implement open-loop and closed-loop simulations.

  • Open loop simulations refer to replay of log data from test drives.
  • Closed-loop simulations rely on the behavior of the system as inputs vary based on new outputs of the simulation.

Both open loop and closed loop simulations are part of autonomous driving development, but open-loop simulations require the largest datasets (multiple petabytes on average) due to reliance on the log data from test drives making them a primary concern for deploying HiL in a hybrid manner.

Overview of Solution

Architectures for supporting HiL simulations with AWS for AD development vary based primarily on the networking available at HiL locations. A common pattern for AWS customers is to have the HiL systems directly interfacing with Amazon S3 over high-bandwidth network links leveraging AWS Direct Connect. This is the simplest approach to deploying HiL and avoids hybrid data management between the petabytes of data in Amazon S3 and a local storage system.

AWS Direct Connect provides customers options to deploy their HiL rigs at their data center or in AWS colocation facilities with low latency connections. AWS has the largest number of Direct Connect locations and points-of-presence (POPs) to enable low latency connectivity to any of the >24 AWS Regions. The following diagram illustrates a reference architecture leveraging a direct interface from the HiL systems to Amazon S3.

Figure 3 : Reference Architecture for Hardware-in-the-Loop (HiL) Direct to Amazon S3

As shown in Figure 3, we illustrate the common interfaces, topology, and AWS services used for autonomous driving customers.

  • Amazon S3 is used to store and analyze the test drive logs used by the HiL simulations and also the results from the simulation runs for further analysis.
  • Metadata of test drive data is populated in various database and analytics services, referred to as the data catalogues, with metadata crawlers and processing pipelines that extract from the drive log and test result data on Amazon S3.
  • The data catalogues provide flexible search interfaces for developers and validation engineers or advanced analytics tools. These systems provide keyword search in Amazon Elasticsearch, SQL queries in Amazon Redshift or Amazon RDS, and NoSQL interfaces using Amazon DynamoDB. AWS Partner Network solutions for these database and analysis tools are common as well, such as those in AWS Marketplace.
  • Validation engineer, data scientists, or developers use these data catalogues to find scenarios for testing. These personas also use the HiL management interfaces to configure and orchestrate the HiL simulation runs on the scenarios identified and ensure traceability.
  • HiL management systems control the HiL Rigs that interface to the DUT and implement the HiL simulations using the test drive logs. The HiL management system then writes results back to S3 for further analysis via various tool chains.

A common question AWS customers have is how to determine an optimal hybrid architecture using this approach. The primary factors are properly sized network links to accommodate data sets used by the HiL simulations as well as low latency network links between Amazon S3 and the HiL rigs. As a result, a key factor is ensuring use of an AWS region for your AWS storage that is in close proximity to your HiL testing site(s).

Based on current HiL implementations, open-loop simulations can sustain latencies of 30-50 ms RTT. AWS has numerous AWS Direct Connect locations in co-location facilities with latencies <5ms RTT. Sizing for these network links can be calculated based on the expected dataset sizes and the interval of time targeted for a simulation run. The following is a basic formula used for network sizing.

Average_Throughput (Gbps) = Average_Dataset_Size(GB)*8 / Time_Interval (seconds)

As an example, for a scenario where an average of 20PB is needed by the HIL rig every 2 weeks, we require ~200Gbps for the AWS Direct Connect bandwidth.

Figure 4 shows an example of a high-level architecture supported by Elektrobit with multiple EB 9101 test racks grouped together. This architecture supports multiple ECUs to be tested at once, leveraging drive log data in Amazon S3. This system is controlled with a central management software that allows optimal orchestration to keep the Elektrobit HiL system running optimally.

Use cases include:

  • The automated replay of all relevant sensor data with high time precision to ECU
  • Capture of ECU responses including debug data
  • Integration of customer components inside the HiL rack for visualization or post-processing.

Another common question from AWS customers is whether this architecture is supported by their HiL implementation. Many HiL providers are adding AWS functionality to their software and hardware stacks in response to customers transitioning to the cloud for their development platforms. Some vendors still need to add Amazon S3 as a supported interface in their HiL rigs. The work needed to accommodate Amazon S3 is usually a small level of effort for any developer using the AWS SDKs on the HiL rig software stack. If there is a project where this is needed, contact the AWS account teams and your HiL vendors to ensure a successful and cost-efficient project implementation.

Figure 4: Elektrobit HiL Architecture with AWS

Figure 4: Elektrobit HiL Architecture with AWS

An alternative HiL solution, shown in Figure 5, includes Amazon S3 as the primary storage for drive log data, with a scale-out NAS storage system located on premises operating as a cache for the HiL rigs. This is common when the networking options at the HiL site are limited in bandwidth or latency relative to the target datasets and time windows.

AWS customers calculate the size of the cache needed to transfer the entire dataset over the intended time interval. The following simple calculation demonstrates this.

Cache_Size(GB) = Average_Dataset_Size(GB) - Average_Throughput (Gbps) /8 * Time_Interval (seconds)

In this example, a customer has 40 Gbps of AWS Direct Connect bandwidth available and a 10 PB dataset needed for HiL simulations every 2 weeks. Using the preceding formula, there is a need for a local cache of 4 PB capable of high read rates.
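The two sizing formulas in this post can be captured in a short helper for planning purposes. This is a sketch of the raw averages only; real deployments typically add headroom for protocol overhead and non-uniform transfer windows.

# Sizing helpers for the formulas above (GB, Gbps, seconds)
def average_throughput_gbps(dataset_size_gb, interval_seconds):
    # Average_Throughput (Gbps) = Average_Dataset_Size (GB) * 8 / Time_Interval (seconds)
    return dataset_size_gb * 8 / interval_seconds

def cache_size_gb(dataset_size_gb, link_gbps, interval_seconds):
    # Cache_Size (GB) = Average_Dataset_Size (GB) - Average_Throughput (Gbps) / 8 * Time_Interval (seconds)
    return dataset_size_gb - link_gbps / 8 * interval_seconds

two_weeks = 14 * 24 * 3600
# 10 PB dataset over a 40 Gbps link every two weeks -> roughly a 4 PB local cache
print(round(cache_size_gb(10_000_000, 40, two_weeks) / 1_000_000, 1), "PB cache")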

Figure 5: Reference Architecture for Hardware-in-the Loop (HiL) with Local Cache to Amazon S3

Figure 5: Reference Architecture for Hardware-in-the Loop (HiL) with Local Cache to Amazon S3

In this hybrid architecture there is a need to orchestrate the data movement in line with the needs of the HIL simulation data set. This requires third party software generally or built in functionality into workflow orchestration tools like Apache Airflow. At CES 2019, Dell EMC and AWS illustrated a solution for this hybrid architecture documented in this short solution brief using Isilon as the scale out NAS storage system and DataIQ as the data movement and orchestration mechanism.

Any of these architectures can be cost-optimized, and AWS has programs and pricing options for AWS Direct Connect as well as the other AWS services involved. There are Enterprise Agreements and Migration Acceleration Programs (MAP) in line with the holistic AD development platform needs that reduce the costs for the hybrid architecture functionality needed in the HiL solutions. One common need is support for the AWS Direct Connect “flat rate” pricing option to accommodate the data transfer out (DTO) needs for the HiL workload. If you need details on these programs for your AD development project, contact your AWS account team.

Conclusion

In this blog post, we discussed two common architectural patterns for supporting HiL simulations for ADAS and Autonomous Driving development. These help customers decide on the right networking, storage, and hybrid topologies for these systems.

HiL systems directly interfacing with Amazon S3 are the most common pattern, as seen with the Elektrobit HiL solutions, but for customers with limited network links, the use of a local cache is an option. Autonomous driving customers looking to increase velocity in their SAE Level 2-5 development programs with HiL simulations have achieved success with AWS as the development platform using these patterns. AWS has a team dedicated to autonomous driving, so contact your AWS account team to get a more prescriptive solution for your HiL or related ADAS and AD development needs.

Also, check out the Automotive issue of the AWS Architecture Monthly Magazine.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Powering the Connected Vehicle with Amazon Alexa

Post Syndicated from Amit Kumar original https://aws.amazon.com/blogs/architecture/field-notes-powering-the-connected-vehicle-with-amazon-alexa/

Alexa has improved the in-home experience and has the potential to greatly enhance the in-car experience. This blog is a continuation of my previous blog: Field Notes: Implementing a Digital Shadow of a Connected Vehicle with AWS IoT. Multiple OEMs (Original Equipment Manufacturers) have showcased this capability during CES 2020. Use cases include: a person sitting in the rear seat can play a song, control HVAC (heating, ventilation, and air conditioning), or pay for gas or coffee, all while using Alexa. In this blog, I cover how you create a connected vehicle that uses Alexa to initiate a command, such as ‘Alexa, open my trunk’.

Solution Architecture

“Alexa, open my trunk”

The preceding architecture shows a message flowing in the following example:

  1. A user of a connected vehicle wants to open their trunk using an Alexa voice command. Alexa identifies the right intent based on the utterance and invokes a Lambda function. The Lambda function updates the device shadow with the desired state (desired: {“trunk”: “open”}).
  2. The vehicle TCU has registered the callback function shadowRegisterDeltaCallback() and listens to the delta topics for the device shadow. Whenever there is a difference between the desired and reported state, the registered callback is called with the delta payload, so the update performed in #1 is received in the delta callback.
  3. Now, the vehicle must act on the desired state. In this case, it acts on the trunk status change. After performing the required action for the trunk change, the vehicle TCU updates the device shadow with the reported state (reported: {“trunk”: “open”}).
  4. The web/mobile app is subscribed to the topic $aws/things/tcu/shadow/update/accepted. Therefore, as soon as the vehicle TCU updates the shadow, the web/mobile app receives the update and synchronizes the UI state.

As part of the previous blog, we implemented #2, #3, and #4. Let’s implement #1 and incorporate it into the solution.

The source code (vehicle-command) of this blog is available in this code repository.

The Alexa voice command requires implementation in three key areas:

  1. Configure Alexa – which listens to utterances, identifies the right intent, and invokes a Lambda function.
  2. Set up the Lambda function – which interprets the command and invokes the AWS IoT Core device shadow API.
  3. Handle the command at the vehicle TCU and app – the vehicle TCU must register shadowRegisterDeltaCallback() so that any update to the device shadow triggers a callback, allowing the vehicle to perform the actual command and synchronize the state with the web/mobile app.

Let’s ‘open a trunk’ using an Alexa voice command. First, set up the environment:

  • Open the AWS Cloud9 IDE created in an earlier lab and complete the following setup.

Set up permanent credentials. Note: the ASK command line interface (CLI) doesn’t work with temporary credentials, so configure it with permanent credentials.

  1. Open the Cloud9 Preferences by clicking AWS Cloud9 > Preferences, or by clicking the “gear” icon in the upper right corner of the Cloud9 window
  2. Select “AWS Settings”
  3. Disable “AWS managed temporary credentials”
  4. $ aws configure
  5. Enter the Access Key and Secret Access Key of a user that has the required access credentials
  6. Use us-east-1 as the region. It will be stored in ~/.aws/config

Verify that everything worked by examining the file ~/.aws/credentials. It should resemble the following:

[default]
 aws_access_key_id = <access_key>
 aws_secret_access_key = <secrect_key>
 aws_session_token=

*Remove aws_session_token line from credentials file.

Next, install the Alexa CLI:

$ npm install ask-cli --global

Initialize ASK CLI by issuing the following command. This will initialize the ASK CLI with a profile associated with your Amazon developer credentials.

$ ask configure --no-browser

Confirm that you are linking your AWS account with Alexa:

Do you want to link your AWS account in order to host your Alexa skills? Yes

#At the end output should look as follows:

------------------------- Initialization Complete -------------------------
Here is the summary for the profile setup:
ASK Profile: default
AWS Profile: default
Vendor ID: MXXXXXXXXXX

As part of the previous blog, you have already cloned the following git repository in the AWS Cloud9 IDE. It contains baseline code to jump-start your implementation.

$ git clone

Configure Alexa Skills

The Alexa Developer Console GUI could be used, but we are doing it programmatically so it can be done at scale and supports versioning.

1. Open connected-vehicle-lab/vehicle-command/skill-package/skill.json. Two locales, en-US and en-IN, are defined in the base code for the Alexa command. Let’s add the en-GB locale in the JSON file located at “manifest”/”publishingInformation”/”locales”. Similarly, you can add a locale for your preferred language:

"en-GB": {
"name": "vehicle-command",
"summary": "Control Vehicle using voice command",
"description": "Allow you to control vehicle using voice command",
"examplePhrases": [
    "Alexa open genie",
    "ask genie to lower window",
    "window up"
    ],
"keywords": []
}

If you are inserting it into the middle of the list, make sure it is separated by a comma.

2. Let’s create a copy of the model connected-vehicle-lab/vehicle-command/skill-package/interactionModels/custom/en-US.json, rename it to en-GB.json, and add our intent.

  • We have “invocationName”: “genie”. Here, we are using “genie” as the command to invoke our Alexa skill. You can change it if needed.
  • The key elements in this JSON file are intents, slots, sample utterances, and slot types. Let’s define the slot type t_action_type with the values ‘open’, ‘close’, ‘lock’, and ‘unlock’ under “types”: [].
        {
        "name": "t_action_type",
        "values": [
            {
                "name": {
                "value": "unlock"
                }
            },
            {
                "name": {
                "value": "lock"
                }
            },
            {
                "name": {
                "value": "close"
                }
            },
            {
                "name": {
                "value": "open"
                }
            }
          ]
        }
  • Let’s add an intent for the trunk, ‘TrunkCommandIntent’, under “intents”: [] and define sample utterances like ‘lock my trunk’ and ‘open trunk’. We are using slot types to simplify the utterances and understand the operation requested by a user.
        {
            "name": "TrunkCommandIntent",
            "slots": [
            {
                "name": "t_action",
                "type": "t_action_type"
            }
            ],
            "samples": [
                "{t_action} trunk",
                "trunk {t_action}",
                "{t_action} my trunk",
                "{t_action} trunk"
            ]
}
  • Now add the same intent, slots, slot type, and sample utterances to the other locale files (en-US.json and en-IN.json) as well.

3. Let’s add the response messages under languageString.js (available at /connected-vehicle-lab/vehicle-command/lambda/custom).

TRUNK_OPEN: 'Trunk Open',
TRUNK_CLOSE: 'Trunk Close' 

If you are inserting these entries into the middle of the list, make sure they are separated by commas.

Set up the Lambda function

1. Add a Lambda function that is invoked by Alexa. This Lambda function handles the intent, invokes the AWS IoT Core Device Shadow API, and executes the actual command, such as ‘trunk open/unlock’ or ‘trunk lock/close’.

  • Open /connected-vehicle-lab/vehicle-command/lambda/custom/index.js and add our TrunkCommandIntentHandler:
const TrunkCommandIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'TrunkCommandIntent';
    },
    handle(handlerInput) {
        // Read the requested action (open/close/lock/unlock) from the slot value.
        var t_action_value = handlerInput.requestEnvelope.request.intent.slots.t_action.value;
        console.log(t_action_value);
        var speakOutput;
        const obj = "trunk";
        if (t_action_value == "unlock" || t_action_value == "open")
        {
            updateDeviceShadow(obj, "open");
            speakOutput = handlerInput.t('TRUNK_OPEN');
        }
        else
        {
            updateDeviceShadow(obj, "close");
            speakOutput = handlerInput.t('TRUNK_CLOSE');
        }
        console.log(speakOutput);
        return handlerInput.responseBuilder
            .speak(speakOutput)
            //.reprompt('add a reprompt if you want to keep the session open for the user to respond')
            .getResponse();
    }
};
  • We have an updateDeviceShadow(“vehicle_part”, “command”) function, which actually invokes the AWS IoT Core Device Shadow API:
function updateDeviceShadow(obj, command)
{
    shadowMessage.state.desired[obj] = command;
    var iotdata = new AWS.IotData({endpoint: ioT_EndPoint});
    var params = {
        payload: JSON.stringify(shadowMessage), /* required */
        thingName: deviceName /* required */
    };
    iotdata.updateThingShadow(params, function(err, data) {
        if (err)
            console.log(err, err.stack); // an error occurred
        else
            console.log(data);
        // reset the shadow
        shadowMessage.state.desired = {};
    });
}

2. Update the value of ioT_EndPoint with the custom endpoint shown under AWS IoT Core > Settings > Custom endpoint. The optional sketch below shows one way to look this endpoint up programmatically.
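If you prefer not to copy the endpoint from the console, a minimal Python (boto3) sketch such as the following can retrieve it. The call is standard boto3; treat the snippet as an illustration rather than part of the original sample code.

import boto3

iot = boto3.client("iot")

# 'iot:Data-ATS' returns the ATS data endpoint recommended by AWS IoT Core.
response = iot.describe_endpoint(endpointType="iot:Data-ATS")
print(response["endpointAddress"])  # use this value for ioT_EndPoint in index.js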

3. Add TrunkCommandIntentHandler to the request handlers:

exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        LaunchRequestHandler,
        WindowCommandIntentHandler,
        DoorCommandIntentHandler,
        TrunkCommandIntentHandler,

4. Deploy the Alexa skill:

$ cd ~/environment/connected-vehicle-lab/vehicle-command
$ ask deploy 

Handle the Command at the Vehicle TCU and App

For more detail on this section, refer to part 1 of this blog: Field Notes: Implementing a Digital Shadow of a Connected Vehicle with AWS IoT.

At the vehicle TCU – tcuShadowRead.py has a trunk_handle() function to receive a message from the device shadow:

def trunk_handle(status):
  if status is not None:
    shadowClient.reportedShadowMessage['state']['reported']['trunk'] = status
    print ('Perform action on trunk status change : ' + str(status))

At the web app – demo-car/js/websocket.js has a handleTrunkCommand function that receives a callback message as soon as any update happens on the device shadow:

//this function will be called by onMessageArrive
function handleTrunkCommand(trunkStatus) {
    obj = document.getElementsByClassName("action trunk")[0];
    obj.checked = trunkStatus == "open" ? true : false;
    console.log(obj.getAttribute("data-text") + " : " + obj.checked);
}

demo-car/js/demo-car.js has a handleTrunkCommand function to handle UI input and invoke the AWS IoT Core Device Gateway API to update the desired state:

//this function will be called when user will click on trunk checkbox
    handleTrunkCommand: function(obj) {
        obj.checked ? demoCar.shadowMessage.state.desired.trunk = "open" : demoCar.shadowMessage.state.desired.trunk = "close";
        console.log(obj.getAttribute("data-text") + " : " + demoCar.shadowMessage.state.desired.trunk);
        demoCar.accessIoTDevice();
    },

Use the Alexa skill to invoke a command

Let’s test our command ‘Alexa, open my trunk’. We can use the command line and execute:

$ ask dialog --locale "en-GB"

Using the Alexa GUI provides an interesting visualization, as shown in the following screenshot.

  1. Open the Alexa GUI, select the ‘vehicle command’ skill, and select the Test tab. Allow “developer.amazon.com” to use your microphone when prompted.
  2. Open the demo.html web app side by side with the Alexa GUI to check that the actual operation happened at the vehicle TCU and that the status is synchronized with the virtual car model.
  3. Now test the Alexa skill. You can use a voice command as well; say or type ‘ask genie’.

Alexa developer console

Clean Up

What a fun exploration this has been! Now clean up the AWS resources created for this and the previous post to avoid incurring any future AWS service costs. Resources created by the CDK can be deleted by deleting the stack on the CloudFormation console. Resources created manually need to be deleted individually.

Conclusion

In this blog post, I showed how you can enable voice commands for a connected vehicle and enhance the in-vehicle user experience. Similarly, you can extend this solution to use cases like ‘Alexa, open my garage’. The AWS IoT Core Device Shadow API does all the heavy lifting in this case: any update to the device shadow allows both the device and the user application to act. The Alexa skill acts as an interface that captures the user command and invokes the Lambda function.

Because these are all serverless services, this implementation can scale without any change to the application, and you only pay when someone invokes a command. Creating an engaging, high-quality interaction with Alexa in the vehicle is critical. You can refer to the Alexa Automotive documentation for an Alexa Built-in automotive experience.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Field Notes: Implementing a Digital Shadow of a Connected Vehicle with AWS IoT

Post Syndicated from Amit Sinha original https://aws.amazon.com/blogs/architecture/field-notes-implementing-a-digital-shadow-of-a-connected-vehicle-with-aws-iot/

Innovations in connected vehicle technology are expected to improve the quality and speed of vehicle communications and create a safer driving experience. As connected vehicles are becoming part of the mainstream, OEMs (Original Equipment Manufacturers) are broadening the capabilities of their products and dramatically improving the in-vehicle experience for customers.

An important feature in a connected vehicle is its ability to execute a remote command and synchronize the state of the vehicle between a web/mobile app in real time.

This blog demonstrates how to:

  • secure two-way communication between a device (vehicle telematics control unit) and the AWS Cloud using AWS IoT
  • execute a command at the vehicle
  • execute a remote command
  • and test with a virtual vehicle model

You can watch a quick animation of a remote command execution in the following GIF:

Animated car GIF

Solution Overview

In a traditional connected vehicle approach, there are many processes running on multiple servers. These processes subscribe to one another, coordinate with each other, and poll for updates. This makes scalability and availability a challenge. We use AWS IoT Core and the AWS IoT Device Shadow service as the primary components for this solution.

This solution has three building blocks:

  1. a vehicle TCU (telematics control unit),
  2. the AWS Cloud (with connection via AWS IoT Core) and
  3. a virtual model (for example, a web/mobile app to send/receive commands to the TCU). These three building blocks together reflect the current state of a vehicle.

Alexa Solution Overview

The preceding diagram shows a message flow in the following example:

  1. A user of a connected vehicle wants to open their door using a web/mobile app. The app updates the device shadow with the desired state (desired: {“door”: “open”}). The app always requests the vehicle to execute the command; therefore, it always updates the device shadow with the desired state.
  2. The vehicle TCU has registered the callback function shadowRegisterDeltaCallback(), which listens on the delta topics for the device shadow by subscribing to them. Whenever there is a difference between the desired and reported state, the registered callback is called and the delta payload is available in the callback. The update performed in step 1 is received in the delta callback.
  3. Now, the vehicle needs to act on the desired state. In this case, ‘act on’ means the door status change. After performing the required action for the door change, the vehicle TCU updates the device shadow with the reported state (reported: {“door”: “open”}).
  4. Now, the vehicle is closing the door. The vehicle always performs the action; therefore, it always updates the device shadow with the reported state (reported: {“door”: “close”}).
  5. The web/mobile app is subscribed to the topic $aws/things/tcu/shadow/update/accepted. Therefore, as soon as the vehicle TCU updates the shadow, the web/mobile app receives the update and synchronizes the UI state.
  6. You can also build an Amazon Alexa skill to control your vehicle (“Alexa, raise my window”). After identifying the utterance, Alexa can invoke the Lambda function to update the device shadow and perform the requested action.

Note: For production web/mobile app development, it is recommended to use AWS AppSync and the AWS Amplify SDK to build a flexible application that is decoupled from the API. Refer to this code sample for more detail.

Implementation

First, you need to set up the code. Refer to the directions in this code sample.

Create device

In AWS IoT Core, name a device ‘TCU’ (created by the connected-vehicle-app-cdk-stack). Create a new certificate (download the files) and attach the policy generated by the CDK.

create a certificate

Next, deploy the certificate key and pem file on your device so it can connect with the AWS Cloud using the X.509 certificate. For more detail, refer to the directions in the code sample.

Execute Command at Vehicle

AWS IoT Device Shadow is an important feature of AWS IoT Core for remote command execution because it allows you to decouple the vehicle from the app which controls and commands the vehicle. A device’s shadow is a JSON document that is used to store and retrieve current state information for a device. Primarily, we use the state.desired and state.reported properties of a device’s shadow document.

The device shadow (device SDK and APIs) enables applications to interact with devices even when they are offline and allows:

  • Cloud representation of device state
  • Query last known state for offline devices
  • Real-time state changes
  • Track last known device state
  • Control devices via change of state
  • Automatic synchronization once devices connect to the cloud
  • APIs for applications to discover and interact with devices

The rich features of a device shadow allow the app to interact with the vehicle TCU even when there is no connectivity. Once connectivity is established, the device gateway pushes the changes to the device and vice versa.

We need to deploy a program (tcuShadowWrite.py) on the vehicle TCU device to update the device shadow and send the update to the AWS Cloud. This program is available in this code repository.

Let’s assume that after reaching their home, the vehicle’s user closes the door, switches off the headlights, and rolls up the windows. The same state of the vehicle should be reflected on their web/mobile app in real time. The vehicle TCU has to update the “reported” state in the device shadow JSON document.

shadow message

The AWSIoTMQTTShadowClient library has a method called shadowUpdate that needs to be called from the vehicle TCU to update the device shadow. Essentially, it publishes the shadow’s reported state on the topic $aws/things/<thingName>/shadow/update.
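As a simplified Python sketch of what a tcuShadowWrite.py-style script does (the endpoint, certificate paths, and state values are placeholders; refer to the code sample for the actual implementation):

import json
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTShadowClient

def on_shadow_update(payload, response_status, token):
    # Called when AWS IoT accepts or rejects the shadow update.
    print(response_status, payload)

# Endpoint and certificate paths are placeholders.
shadow_client = AWSIoTMQTTShadowClient("tcu-client")
shadow_client.configureEndpoint("<your-iot-endpoint>", 8883)
shadow_client.configureCredentials("rootCA.pem", "private.pem.key", "certificate.pem.crt")
shadow_client.connect()

# Report the state of the vehicle after the user closed the door,
# switched off the headlights, and rolled up the windows.
handler = shadow_client.createShadowHandlerWithName("tcu", True)
reported = {"state": {"reported": {"door": "close", "light": "off", "window": "up"}}}
handler.shadowUpdate(json.dumps(reported), on_shadow_update, 5)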

If you run the tcuShadowWrite.py script, you should see the output described in the following image.

tcushadowscript

  • Open the AWS IoT Core console.
  • Select Manage -> Things -> tcu, and then choose Shadow. You should see the shadow message sent from the device, as described in the following image.

shadow document

Execute Remote Command

We need to deploy a program (tcuShadowRead.py) on the vehicle TCU to receive updates from the AWS Cloud. It is available in this code sample.

Let’s assume the vehicle owner uses the mobile app to open the door, switch on the headlights, and roll down the windows. The vehicle TCU should receive this command and instruct the Electronic Control Unit (ECU) to execute it. The web/mobile app will update the “desired” state in the device shadow JSON document.

shadow message2

In tcuShadowRead.py, AWSIoTMQTTShadowClient has a method called shadowRegisterDeltaCallback. It listens on the delta topics for this device shadow by subscribing to them. Whenever there is a difference between the desired and reported state, the registered callback is called and the delta payload is available in the callback.

callback

The callback function has code to handle the state change request. In an actual implementation, a function like door_handle() would call the ECU to execute the door open command. A simplified sketch of this delta handling follows the screenshot below.

door open command
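As a simplified Python sketch of the delta handling (assuming the same placeholder endpoint and certificates as before, and a hypothetical door_handle() helper), the registration could look as follows:

import json
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTShadowClient

def door_handle(status):
    # In an actual implementation this would instruct the ECU.
    print("Perform action on door status change: " + str(status))

def on_shadow_delta(payload, response_status, token):
    # Fires whenever desired and reported state differ; payload contains only the delta.
    desired = json.loads(payload)["state"]
    if "door" in desired:
        door_handle(desired["door"])

shadow_client = AWSIoTMQTTShadowClient("tcu-client")
shadow_client.configureEndpoint("<your-iot-endpoint>", 8883)
shadow_client.configureCredentials("rootCA.pem", "private.pem.key", "certificate.pem.crt")
shadow_client.connect()

handler = shadow_client.createShadowHandlerWithName("tcu", True)
handler.shadowRegisterDeltaCallback(on_shadow_delta)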

If you make changes to the device shadow in AWS IoT for the tcu device, you should see the output shown in the following image.

Device shadow

Test with a Virtual Vehicle Model

To help you test this solution, you can deploy the virtual vehicle model shown in the following image. Detailed steps for the deployment of the virtual vehicle are available in this code sample.

virtual vehicle model

Any changes in the model state should be reflected on the virtual demo vehicle and vice versa.

Here, we use the open-source Paho MQTT library. Developers can use it to write JavaScript applications that access AWS IoT using MQTT or MQTT over the WebSocket protocol without using the AWS IoT SDK. This implementation is made simpler by using the AWS IoT Device SDK for JavaScript v2 (see its README).

Review the JavaScript file named webSocketApp.js:

websocket app

  • The onMessageArrived() function is invoked whenever the device changes the shadow state.
  • The handle<object>Command functions (such as handleDoorCommand) are called with the current state whenever the device reports a status change.

We have another JavaScript file demo-car.js in the demo-car folder. This includes the functions that our simulated vehicle will use in order to change the device shadow.

Let’s review the following code:

democar javascript

  • We have three handle command functions defined (for example, handleDoorCommand) to take the user’s input and access AWS IoT Core services.
  • connectDevice is the function that invokes the updateThingsShadow function to send the desired state.
  • accessIoTDevice uses Amazon Cognito Identity to get the authenticated identities to access AWS IoT Core securely without exposing the access key or secret key.

Now, keep demo.html side by side with your code and run the tcuShadowRead.py script. Any change made in the virtual model is reflected in the command output. Similarly, any change made by tcuShadowWrite.py is reflected as a state update in the virtual model.

Conclusion

In this blog, we showed how to implement a digital shadow of a connected vehicle using AWS IoT. This solution removes the complexity of running multiple processes in parallel and ensures a successful outcome. AWS IoT Core enables scalable, secure, low-latency, low-overhead, bi-directional communication between connected devices, the AWS Cloud, and customer-facing applications, and it can tolerate and recover from slow or brittle connections.

The Device Shadow in AWS IoT Core enables the AWS Cloud and applications to easily and accurately receive data from connected vehicles and send commands to the vehicles. The Device Shadow’s uniform and always-available interface simplifies the implementation of time-sensitive use cases. These include remote command execution and two-way state synchronization between a device and an app, with the cloud acting as a broker. This solution enables you to shift the operational responsibilities of a connected vehicle infrastructure to the AWS Cloud while paying only for what you use, with no minimum fees or mandatory service usage.

For more information about how AWS can help you build connected vehicle solutions, refer to the AWS Connected Vehicle solution page.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Building resilient serverless patterns by combining messaging services

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-resilient-no-code-serverless-patterns-by-combining-messaging-services/

In “Choosing between messaging services for serverless applications”, I explain the features and differences between the core AWS messaging services. Amazon SQS, Amazon SNS, and Amazon EventBridge provide queues, publish/subscribe, and event bus functionality for your applications. Individually, these are robust, scalable services that are fundamental building blocks of serverless architectures.

However, you can also combine these services to solve specific challenges in distributed architectures. By doing this, you can use specific features of each service to build sophisticated patterns with little code. These combinations can make your applications more resilient and scalable, and reduce the amount of custom logic and architecture in your workload.

In this blog post, I highlight several important patterns for serverless developers. I also show how you use and deploy these integrations with the AWS Serverless Application Model (AWS SAM).

Examples in this post refer to code that can be downloaded from this GitHub repo. The README.md file explains how to deploy and run each example.

SNS to SQS: Adding resilience and throttling to message throughput

SNS has a robust retry policy that results in up to 100,010 delivery attempts over 23 days. If a downstream service is unavailable, it may be overwhelmed by retries when it comes back online. You can solve this issue by adding an SQS queue.

Adding an SQS queue between the SNS topic and its subscriber has two benefits. First, it adds resilience to message delivery, since the messages are durably stored in a queue. Second, it throttles the rate of messages to the consumer, helping smooth out traffic bursts caused by the service catching up with missed messages.

To build this in an AWS SAM template, you first define the two resources, and the SNS subscription:

  MySqsQueue:
    Type: AWS::SQS::Queue

  MySnsTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: sqs
          Endpoint: !GetAtt MySqsQueue.Arn

Finally, you provide permission to the SNS topic to publish to the queue, using the AWS::SQS::QueuePolicy resource:

  SnsToSqsPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: "Allow SNS publish to SQS"
            Effect: Allow
            Principal: "*"
            Resource: !GetAtt MySqsQueue.Arn
            Action: SQS:SendMessage
            Condition:
              ArnEquals:
                aws:SourceArn: !Ref MySnsTopic
      Queues:
        - Ref: MySqsQueue

To test this, you can publish a message to the SNS topic and then inspect the SQS queue length using the AWS CLI:

aws sns publish --topic-arn "arn:aws:sns:us-east-1:123456789012:sns-sqs-MySnsTopic-ABC123ABC" --message "Test message"
aws sqs get-queue-attributes --queue-url "https://sqs.us-east-1.amazonaws.com/123456789012/sns-sqs-MySqsQueue-ABC123ABC" --attribute-names ApproximateNumberOfMessages

This results in the following output:

CLI output

Another usage of this pattern is when you want to filter messages in architectures using an SQS queue. By placing the SNS topic in front of the queue, you can use the message filtering capabilities of SNS. This ensures that only the messages you need are published to the queue. To use message filtering in AWS SAM, use the AWS::SNS::Subscription resource:

  QueueSubscription:
    Type: 'AWS::SNS::Subscription'
    Properties:
      TopicArn: !Ref MySnsTopic
      Endpoint: !GetAtt MySqsQueue.Arn
      Protocol: sqs
      FilterPolicy:
        type:
        - orders
        - payments 
      RawMessageDelivery: 'true'
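To see the filter policy in action, a small Python (boto3) sketch like the one below can publish two test messages; the topic ARN is a placeholder and the attribute name matches the filter policy above. Only the first message should arrive in the queue.

import boto3

sns = boto3.client("sns")
topic_arn = "arn:aws:sns:us-east-1:123456789012:MySnsTopic"  # placeholder ARN

# Matches the filter policy (type = orders), so it is delivered to the queue.
sns.publish(
    TopicArn=topic_arn,
    Message='{"orderId": "1234"}',
    MessageAttributes={"type": {"DataType": "String", "StringValue": "orders"}},
)

# Does not match the filter policy (type = users), so the subscription drops it.
sns.publish(
    TopicArn=topic_arn,
    Message='{"event": "signup"}',
    MessageAttributes={"type": {"DataType": "String", "StringValue": "users"}},
)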

EventBridge to SNS: combining features of both services

SNS and EventBridge have different characteristics in terms of targets and integration with broader features. The following list compares the major differences between the two services:

  • Number of targets – SNS: 10 million (soft); EventBridge: 5 per rule
  • Limits – SNS: 100,000 topics, 12,500,000 subscriptions per topic; EventBridge: 100 event buses, 300 rules per event bus
  • Input transformation – SNS: No; EventBridge: Yes – see details
  • Message filtering – SNS: Yes – see details; EventBridge: Yes, including IP address matching – see details
  • Format – SNS: Raw or JSON; EventBridge: JSON
  • Receive events from AWS CloudTrail – SNS: No; EventBridge: Yes
  • Targets – SNS: HTTP(S), SMS, SNS mobile push, email/email-JSON, SQS, Lambda functions; EventBridge: 15 targets including AWS Lambda, Amazon SQS, Amazon SNS, AWS Step Functions, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose
  • SaaS integration – SNS: No; EventBridge: Yes – see integration partners
  • Schema Registry integration – SNS: No; EventBridge: Yes – see details
  • Dead-letter queues supported – SNS: Yes; EventBridge: No
  • Public visibility – SNS: Can create public topics; EventBridge: Cannot create public buses
  • Cross-Region – SNS: You can subscribe your AWS Lambda functions to an Amazon SNS topic in any Region; EventBridge: Targets must be in the same Region, but you can publish across Regions to another event bus

In this pattern, you configure an SNS topic as a target of an EventBridge rule:

SNS topic as a target for an EventBridge rule

In the AWS SAM template, you declare the resources in the preceding diagram as follows:

Resources:
  MySnsTopic:
    Type: AWS::SNS::Topic

  EventRule: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "EventRule"
      EventPattern: 
        account: 
          - !Sub '${AWS::AccountId}'
        source:
          - "demo.cli"
      Targets: 
        - Arn: !Ref MySnsTopic
          Id: "SNStopic"

The default bus already exists in every AWS account, so there is no need to declare it. For the event bus to publish matching events to the SNS topic, you define permissions using the AWS::SNS::TopicPolicy resource:

  EventBridgeToToSnsPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties: 
      PolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: sns:Publish
          Resource: !Ref MySnsTopic
      Topics: 
        - !Ref MySnsTopic       
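To verify the rule, you can put a test event on the default bus whose source matches the EventPattern above. The following boto3 sketch is one way to do that; the detail payload is arbitrary.

import json
import boto3

events = boto3.client("events")

response = events.put_events(
    Entries=[
        {
            "Source": "demo.cli",                 # must match the rule's EventPattern
            "DetailType": "demo-event",
            "Detail": json.dumps({"message": "hello from EventBridge"}),
            # EventBusName is omitted, so the default event bus is used.
        }
    ]
)
print(response["FailedEntryCount"])  # 0 means the event was accepted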

EventBridge has a limit of five targets per rule. In cases where you must send events to hundreds or thousands of targets, publishing to SNS first and then subscribing those targets to the topic works around this limit. Both services have different targets, and this pattern allows you to deliver EventBridge events to SMS, HTTP(s), email and SNS mobile push.

You can transform and filter the message using these services, often without needing an AWS Lambda function. SNS does not support input transformation but you can do this in an EventBridge rule. Message filtering is possible in both services but EventBridge provides richer content filtering capabilities.

AWS CloudTrail can log and monitor activity across services in your AWS account. It can be a useful source for events, allowing you to respond dynamically to objects in Amazon S3 or react to changes in your environment, for example. This natively integrates with EventBridge, allowing you to ingest events at scale from dozens of services.

Using EventBridge enables you to source events from outside your AWS account, offering integrations with a list of software as a service (SaaS) providers. This capability allows you to receive events from your accounts with SaaS providers like Zendesk, PagerDuty, and Auth0. These events are delivered to a partner event bus in your account, and can then be filtered and routed to an SNS topic.

Additionally, this pattern allows you to deliver events to Lambda functions in other AWS accounts and in other AWS Regions. You can invoke Lambda from SNS topics in other Regions and accounts. It’s also possible to make SNS topics publicly read-only, making them extensible endpoints that other third parties can consume from. SNS has comprehensive access control, which you can incorporate into this pattern.

Cross-account publishing

EventBridge to SQS: Building fault-tolerant microservices

EventBridge can route events to targets such as microservices. In the case of downstream failures, the service retries events for up to 24 hours. For workloads where you need a longer period of time to store and retry messages, you can deliver the events to an SQS queue in each microservice. This durably stores those events until the downstream service recovers. Additionally, this pattern protects the microservice from large bursts of traffic by throttling the delivery of messages.

Fault-tolerant microservices architecture

The resources declared in the AWS SAM template are similar to the previous examples, but it uses the AWS::SQS::QueuePolicy resource to grant the appropriate permission to EventBridge:

  EventBridgeToToSqsPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: SQS:SendMessage
          Resource:  !GetAtt MySqsQueue.Arn
      Queues:
        - Ref: MySqsQueue
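On the microservice side, a worker can then consume the buffered events at its own pace. The following Python sketch shows a minimal long-polling consumer; the queue URL is a placeholder and the processing step is only a print statement.

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/MySqsQueue"  # placeholder

while True:
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling reduces empty receives
    )
    for message in response.get("Messages", []):
        print("Processing event:", message["Body"])
        # Delete the message only after successful processing.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])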

Conclusion

You can combine these services in your architectures to implement patterns that solve complex challenges, often with little code required. This blog post shows three examples that implement message throttling and queueing, integrating SNS and EventBridge, and building fault tolerant microservices.

To learn more about building decoupled architectures, see this Learning Path series on EventBridge. For more serverless learning resources, visit https://serverlessland.com.

Custom logging with AWS Batch

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/custom-logging-with-aws-batch/

This post was written by Christian Kniep, Senior Developer Advocate for HPC and AWS Batch. 

For HPC workloads, visibility into job logs is important, both to debug a job that failed and to gain insights into a running job: you can track its trajectory to influence the configuration of the next job, or terminate the job because it went off track.

With AWS Batch, customers are able to run batch workloads at scale, reliably and with ease, as this managed service takes care of the undifferentiated heavy lifting. The customer can then focus on submitting jobs and getting work done. Customers told us that at a certain scale, the single logging driver available within AWS Batch made it hard to separate logs, as they were all ending up in the same log group in Amazon CloudWatch.

With the new release of custom logging driver support, customers are now able to adjust how the job output is logged. You can not only customize the Amazon CloudWatch settings, but also enable the use of external logging frameworks such as splunk, fluentd, json-file, syslog, gelf, and journald.

This allows AWS Batch jobs to use the existing systems they are accustomed to, with fine-grained control of the log data for debugging and access control purposes.

In this blog, I show the benefits of custom logging with AWS Batch by adjusting the log targets for jobs. The first example customizes the Amazon CloudWatch log group, and the second logs to Splunk, an external logging service.

Example setup

To showcase this new feature, I use the AWS Command Line Interface (CLI) to set up the following:

  1. IAM roles, policies, and profiles to grant access and permissions
  2. A compute environment to provide the compute resources to run jobs
  3. A job queue, which supervises the job execution and schedules jobs on a compute environment
  4. A job definition, which uses a simple job to demonstrate how the new configuration can be applied

Once those tasks are completed, I submit a job and send logs to a customized CloudWatch log-group and Splunk.

Prerequisite

To make things easier, I first set a couple of environment variables to have the information handy for later use. I use the following code to set up the environment variables.

# in case it is not already installed
sudo yum install -y jq 
export MD_URL=http://169.254.169.254/latest/meta-data
export IFACE=$(curl -s ${MD_URL}/network/interfaces/macs/)
export SUBNET_ID=$(curl -s ${MD_URL}/network/interfaces/macs/${IFACE}/subnet-id)
export VPC_ID=$(curl -s ${MD_URL}/network/interfaces/macs/${IFACE}/vpc-id)
export AWS_REGION=$(curl -s ${MD_URL}/placement/availability-zone | sed 's/[a-z]$//')
export AWS_ACCT_ID=$(curl -s ${MD_URL}/identity-credentials/ec2/info |jq -r .AccountId)
export AWS_SG_DEFAULT=$(aws ec2 describe-security-groups \
--filters Name=group-name,Values=default \
|jq -r '.SecurityGroups[0].GroupId')

IAM

When using the AWS Management Console, you must create IAM roles manually.

Trust Policies

IAM roles are defined to be used by a certain service. In the simplest case, you want a role to be used by Amazon EC2 – the service that provides the compute capacity in the cloud. The definition of which entity is able to use an IAM role is called a trust policy. To set up a trust policy for an IAM role, use the following code snippet.

cat > ec2-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "ec2.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
EOF

Instance role

With the IAM trust policy, I now create an ecsInstanceRole and attach the pre-defined policy AmazonEC2ContainerServiceforEC2Role. This allows an instance to interact with Amazon ECS.

aws iam create-role --role-name ecsInstanceRole \
 --assume-role-policy-document file://ec2-trust-policy.json
aws iam create-instance-profile --instance-profile-name ecsInstanceProfile
aws iam add-role-to-instance-profile \
    --instance-profile-name ecsInstanceProfile \
    --role-name ecsInstanceRole
aws iam attach-role-policy --role-name ecsInstanceRole \
 --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

Service Role

The AWS Batch service uses a role to interact with different services. The trust relationship reflects that the AWS Batch service is going to assume this role.  You can set up this role with the following logic.

cat > svc-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "batch.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
EOF
aws iam create-role --role-name AWSBatchServiceRole \
--assume-role-policy-document file://svc-trust-policy.json
aws iam attach-role-policy --role-name AWSBatchServiceRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole

In addition to dealing with Amazon ECS, the instance role must be able to create and write to Amazon CloudWatch log groups. To control which log group names are used, a condition is attached.

While the compute environment is coming up, let us create and attach a policy that makes creating a new log group possible.

cat > policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "logs:CreateLogGroup"
    ],
    "Resource": "*",
    "Condition": {
      "StringEqualsIfExists": {
        "batch:LogDriver": ["awslogs"],
        "batch:AWSLogsGroup": ["/aws/batch/custom/*"]
      }
    }
  }]
}
EOF
aws iam create-policy --policy-name batch-awslog-policy \
    --policy-document file://policy.json
aws iam attach-role-policy --policy-arn arn:aws:iam::${AWS_ACCT_ID}:policy/batch-awslog-policy --role-name ecsInstanceRole

At this point, I have created the IAM roles and policies so that the instances and the service are able to interact with the AWS APIs, including trust policies to define which services are meant to use them: EC2 for the ecsInstanceRole, and the AWS Batch service itself for the AWSBatchServiceRole.

Compute environment

Now, I am going to create a compute environment, which is going to spin up an instance (one vCPU target) to run the example job in.

cat > compute-environment.json << EOF
{
  "computeEnvironmentName": "od-ce",
  "type": "MANAGED",
  "state": "ENABLED",
  "computeResources": {
    "type": "EC2",
    "allocationStrategy": "BEST_FIT_PROGRESSIVE",
    "minvCpus": 1,
    "maxvCpus": 8,
    "desiredvCpus": 1,
    "instanceTypes": ["m5.xlarge"],
    "subnets": ["${SUBNET_ID}"],
    "securityGroupIds": ["${AWS_SG_DEFAULT}"],
    "instanceRole": "arn:aws:iam::${AWS_ACCT_ID}:instance-profile/ecsInstanceRole",
    "tags": {"Name": "aws-batch-compute"},
    "bidPercentage": 0
  },
  "serviceRole": "arn:aws:iam::${AWS_ACCT_ID}:role/AWSBatchServiceRole"
}
EOF
aws batch create-compute-environment --cli-input-json file://compute-environment.json  

Once this section is complete, a compute environment is spun up in the background. This takes a moment. You can use the following command to check on the status of the compute environment.

aws batch describe-compute-environments

Once it is ENABLED and VALID, we can continue by setting up the job queue.
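If you prefer to wait for that state in a script rather than re-running the CLI command, a small boto3 sketch like the following polls the compute environment until it is ready (the environment name matches the one created above).

import time
import boto3

batch = boto3.client("batch")

while True:
    ce = batch.describe_compute_environments(
        computeEnvironments=["od-ce"]
    )["computeEnvironments"][0]
    print(ce["state"], ce["status"])
    if ce["state"] == "ENABLED" and ce["status"] == "VALID":
        break
    time.sleep(15)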

Job Queue

Now that I have a compute environment up and running, I will create a job queue which accepts job submissions and schedules the jobs on the compute environment.

cat > job-queue.json << EOF
{
  "jobQueueName": "jq",
  "state": "ENABLED",
  "priority": 1,
  "computeEnvironmentOrder": [{
    "order": 0,
    "computeEnvironment": "od-ce"
  }]
}
EOF
aws batch create-job-queue --cli-input-json file://job-queue.json

Job definition

The job definition is used as a template for jobs. This example runs a plain container and prints the environment variables. With the new release of AWS Batch, the logging driver awslogs now allows you to change the log group configuration within the job definition.

cat > job-definition.json << EOF
{
  "jobDefinitionName": "alpine-env",
  "type": "container",
  "containerProperties": {
  "image": "alpine",
  "vcpus": 1,
  "memory": 128,
  "command": ["env"],
  "readonlyRootFilesystem": true,
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": { 
      "awslogs-region": "${AWS_REGION}", 
      "awslogs-group": "/aws/batch/custom/env-queue",
      "awslogs-create-group": "true"}
    }
  }
}
EOF
aws batch register-job-definition --cli-input-json file://job-definition.json

Job Submission

Using the above job definition, you can now submit a job.

aws batch submit-job \
  --job-name test-$(date +"%F_%H-%M-%S") \
  --job-queue arn:aws:batch:${AWS_REGION}:${AWS_ACCT_ID}:job-queue/jq \
  --job-definition arn:aws:batch:${AWS_REGION}:${AWS_ACCT_ID}:job-definition/alpine-env:1

Now you can check the log group in CloudWatch. Go to the CloudWatch console and find the ‘Log groups’ section on the left.

log groups in cloudwatch

Now, click on the log group defined above, and you should see the output of the job. This allows you to debug issues if something within the container went wrong, or to process the logs and create alarms and reports.

cloudwatch log events
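If you want to pull these job logs from the command line instead of the console, a boto3 sketch such as the following reads the latest stream of the custom log group (the log group name matches the job definition above).

import boto3

logs = boto3.client("logs")
log_group = "/aws/batch/custom/env-queue"

# Find the most recently written log stream in the custom log group.
streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
    limit=1,
)
latest_stream = streams["logStreams"][0]["logStreamName"]

# Print the job output, line by line.
events = logs.get_log_events(logGroupName=log_group, logStreamName=latest_stream)
for event in events["events"]:
    print(event["message"])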

Splunk

Splunk is an established log engine for a broad set of customers. You can use the Docker container to set up a Splunk server quickly. More information can be found in the Splunk documentation. You need to configure the HTTP Event Collector, which provides you with a link and a token.

To send logs to Splunk, create an additional job-definition with the Splunk token and URL. Please adjust the splunk-url and splunk-token to match your Splunk setup.

{
  "jobDefinitionName": "alpine-splunk",
  "type": "container",
  "containerProperties": {
    "image": "alpine",
    "vcpus": 1,
    "memory": 128,
    "command": ["env"],
    "readonlyRootFilesystem": false,
    "logConfiguration": {
      "logDriver": "splunk",
      "options": {
        "splunk-url": "https://<splunk-url>",
        "splunk-token": "XXX-YYY-ZZZ"
      }
    }
  }
}

This forwards the logs to Splunk, as you can see in the following image.

forward to splunk

Conclusion

This blog post showed you how to apply custom logging to AWS Batch using the awslogs and Splunk logging drivers. While these are two important logging drivers, please head over to the documentation to find out about fluentd, syslog, json-file, and other drivers to find the best driver to match your current logging infrastructure.

 

Field Notes: Gaining Insights into Labeling Jobs for Machine Learning

Post Syndicated from Michael Graumann original https://aws.amazon.com/blogs/architecture/field-notes-gaining-insights-into-labeling-jobs-for-machine-learning/

In an era where more and more data is generated, it becomes critical for businesses to derive value from it. With the help of supervised learning, it is possible to generate models to automatically make predictions or decisions by leveraging historical data. For example, image recognition for self-driving cars, predicting anomalies on X-rays, fraud detection in finance and more. With supervised learning, these models learn from labeled data. The success of those models is highly dependent on readily available, high quality labeled data.

However, you might encounter cases where a high percentage of your pre-existing data is unlabeled. In these situations, providing correct labeling to previously unlabeled data points would directly translate to higher model accuracy.

Amazon SageMaker Ground Truth helps you with exactly that. It lets you build highly accurate training datasets for machine learning quickly. SageMaker Ground Truth provides your labelers with built-in workflows and interfaces for common labeling tasks. This process could take several hours or more depending on the size of your unlabeled dataset, and you might have a need to track the progress easily, preferably in the form of a dashboard.

In this blog post, we show how to gain deep insights into the progress of labeling and the performance of the workers by using Amazon Athena and Amazon QuickSight. We use Amazon Athena to set up several views with specific insights into the labeling progress. Finally, we reference these views in Amazon QuickSight to visualize the data in a dashboard.

This approach also works for combining multiple AWS services in general. AWS provides many building blocks that you can mix and match to create a unique, integrated solution with cohesive insights. In this blog post, we use data produced by one service (Ground Truth), prepare it with another (Athena), and visualize it with a third (QuickSight). The following diagram shows this architecture.

Solution Architecture

ML Solution Architecture

Mapping a JSON structure to a table structure

Ground Truth creates several directories in your Amazon S3 output path. These directories contain the results of your labeling job and other artifacts of the job. The top-level directory for a labeling job has the same name as your labeling job, while the output directories are placed inside it. We will create all insights from what SageMaker Ground Truth calls worker responses.

All respective JSON files reside in the path s3://bucket/<job-name>/annotations/worker-response/.

To analyze the labeling data with Amazon Athena we need to understand the structure of the underlying JSON files. Let’s review the example below. For each item that was labeled, we see the label itself, followed by the submission time and a workerId pointing to an identity. This identity lives in Amazon Cognito, a fully managed service that provides the user directory for our labelers.

{
    "answers": [
        {
            "answerContent": {
                "crowd-classifier": {
                    "label": "Compute"
                }
            },
            "submissionTime": "2020-03-27T10:31:04.210Z",
            "workerId": "private.eu-west-1.1111111111111111",
            "workerMetadata": {
                "identityData": {
                    "identityProviderType": "Cognito",
                    "issuer": "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_111111111",
                    "sub": "11111111-1111-1111-1111-111111111111"
                }
            }
        },
        ...
    ]
}

Although the data is stored in Amazon S3 object storage, we are able to use SQL to access it by using Amazon Athena. Since we now understand the JSON structure shown in the preceding code, we use Athena to define how to interpret the data that is relevant to us. We do so by first creating a database using the Athena Query Editor:

CREATE DATABASE analyze_labels_db;

Once inside the database, we add the table schema. The actual files remain on Amazon S3, but using the metadata catalog, Athena knows where the data lies and how to interpret it. The AWS Glue Data Catalog is a central repository to store structural and operational metadata for all your data assets. For a given dataset, you can store its table definition and physical location, add business-relevant attributes, and track how this data has changed over time. Besides Athena, the AWS Glue Data Catalog also provides out-of-the-box integration with Amazon EMR and Amazon Redshift Spectrum. Once you add your table definitions to the Glue Data Catalog, they are available for ETL. They are also readily available for querying in Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum, so that you can have a common view of your data between these services.

When going from JSON to SQL, we are crossing format boundaries. To make the JSON-formatted data easier to read, we use SerDe properties to replace the hyphen in crowd-classifier with an underscore due to DDL constraints. Finally, we point the location to our Amazon S3 bucket containing the individual worker responses. Notice in the following script that we translate the nested structure of the JSON file itself into a hierarchical, nested data structure in the schema definition. Also, we could leave out workerMetadata, as we don’t need it at this time. The data would still stay in the files on Amazon S3, so that we could later add the workerMetadata STRUCT to the table definition for our analysis.

CREATE EXTERNAL TABLE annotations_raw (
  answers array<
    struct<answercontent: 
      struct<crowd_classifier: 
        struct<label: string>
      >,
      submissionTime: string,
      workerId: string,
      workerMetadata: 
        struct<identityData: 
          struct<identityProviderType: string, issuer: string, sub: string>
        >
    >
  >
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.crowd_classifier"="crowd-classifier" 
) 
LOCATION 's3://<YOUR_BUCKET>/<JOB_NAME>/annotations/worker-response/'

Creating Views in Athena

Now, we have nested data in our annotations_raw table. For many use cases, especially for analytical uses, representing data in a tabular fashion—as rows—is more natural. This is also the standard way when using SQL and business intelligence tools. To unnest the hierarchical data into flattened rows, we create the following view which will serve as foundation for the other views we create. For an in-depth look into unnesting data with Amazon Athena, read this blog post.

Some of the information we’re interested in might not be part of the document, but is encoded in the path. We use a trick in Athena by using the $path variable from the Presto Hive Connector. This determines which Amazon S3 file contains data that is returned by a specific row in an Athena table. This way we can find out which data object an annotation belongs to. Since Athena is built on top of Presto, we are able to use Presto’s built-in regexp_extract function to find out the iteration as well as the data object id per labeling result. We also cast the submission time in date format to later determine the labeling progress per day.

CREATE OR REPLACE VIEW annotations_view AS
SELECT 
  regexp_extract("$path", 'iteration-[0-9]*') as iteration,
  regexp_extract("$path", '(iteration-[0-9]*\/([0-9]*))',2) as dataRecord,
  answer.answercontent.crowd_classifier.label,
  cast(from_iso8601_timestamp(answer.submissionTime) as timestamp) as submissionTime,
  cast(from_iso8601_timestamp(answer.submissionTime) as date) as submissionDay,
  answer.workerId,
  answer.workerMetadata.identityData.identityProviderType,
  answer.workerMetadata.identityData.issuer,
  answer.workerMetadata.identityData.sub,
  "$path" path
FROM 
  annotations_raw
CROSS JOIN UNNEST(answers) AS t(answer)

This view, annotations_view, will be the starting point for the other views we create further on in this post.
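You can also query the view outside of the Athena console, for example from a notebook or script. The following boto3 sketch is a minimal example of running a sanity query against annotations_view; the result bucket is a placeholder and the query itself is only an illustration.

import time
import boto3

athena = boto3.client("athena")

execution = athena.start_query_execution(
    QueryString="SELECT label, COUNT(*) AS annotations FROM annotations_view GROUP BY label",
    QueryExecutionContext={"Database": "analyze_labels_db"},
    ResultConfiguration={"OutputLocation": "s3://<YOUR_BUCKET>/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Wait for the query to finish, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
    print([col.get("VarCharValue") for col in row["Data"]])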

Visualizing with QuickSight

In this section, we explore a way to visualize the views we build in Athena by pointing Amazon QuickSight to the respective view. Amazon QuickSight lets you create and publish interactive dashboards that include ML Insights. Dashboards can then be accessed from any device, and embedded into your applications, portals, and websites.

Thanks to the tight integration between Athena and QuickSight, we are able to map one dataset in QuickSight to one Athena view. In order to further optimize the performance of the dashboard, we can optionally import the datasets into the in-memory optimized calculation engine for Amazon QuickSight called SPICE. With the datasets in place we can now create an analysis in order to interact with the visuals we’re going to add. You can think of an analysis as a container for a set of related visuals. You can use multiple datasets in an analysis, although any given visual can only use one of those datasets. After you create an analysis and an initial visual, you can expand the analysis. You can do this for example by adding datasets and visuals.

Let’s start with our first insight.

Annotations per worker

We’d like to gain insights not only into the total number of labeled items but also into the level of contribution of each individual worker. This could give us an indication of whether the labels were created by a diverse crowd of labelers or by a few productive ones. A largely disproportionate number of contributions from a handful of workers could introduce the biases they may have brought along.

SageMaker Ground Truth calls a labeled data object an annotation, which is the result of a single worker’s labeling task.

Luckily we encapsulated all the heavy lifting of format conversion in the annotations_view, so that it is now easy to create a view for the annotations per user:

CREATE OR REPLACE VIEW annotations_per_user AS
SELECT COUNT(sub) AS LabeledItems,
sub AS User
FROM annotations_view
GROUP BY sub
ORDER BY LabeledItems DESC

Next we visualize this view in QuickSight. We add a visual to our analysis, select the respective dataset for the view and use the AutoGraph feature, which chooses the most appropriate visual type. Since we already arranged our view in Athena by the number of labeled items in descending order, there is no need now to sort the data in QuickSight. In the following screenshot, worker c4ef78e4... contributed more labels compared to their peers.

Annotations per worker

This view gives you an indicator to check for a bias that the leading worker might have brought along.

Annotations per label

One thing we want to be aware of is potential imbalances between classes in our dataset. This is especially a concern for simple machine learning models, which may learn to frequently predict a label that is massively overrepresented in the dataset. If we can identify an imbalance, we can apply mitigation actions such as upsampling data of underrepresented classes. With the following view, we list the total number of annotations per label.

CREATE OR REPLACE VIEW annotations_per_label AS
SELECT Count(dataRecord) AS TotalLabels, label As Label 
FROM annotations_view
GROUP BY label
ORDER BY TotalLabels DESC, Label;

As before, we create a dataset in QuickSight pointing to the annotations_per_label view, open the analysis, add a new visual and leverage the AutoGraph functionality. The result is the following visual representation:

Annotations per worker 2

One can clearly see that the Analytics & AI/ML class is massively underrepresented. At this point, you might want to try getting more data or think about upsampling data for that class.

Annotations per day

Seeing the total number of annotations per label and per worker is good, but we are also interested in how the labeling progress changes over time. This way we might see spikes related to labeler activations. We can also estimate how long it takes to reach a certain goal of annotations given the current pace. For this purpose we create the following view aggregating the total annotations per day.

CREATE OR REPLACE VIEW annotations_per_day AS
SELECT COUNT(datarecord) AS LabeledItems,
submissionDay
FROM annotations_view
GROUP BY submissionDay
ORDER BY submissionDay, LabeledItems DESC

This time the QuickSight AutoGraph provides us with the following line chart. You might have noticed that the axis labels do not match the column names in Athena. That is because we renamed them in QuickSight for better readability.

Total annotations per day

In the preceding chart we see that there is no consistent pace of labeling, which makes it hard to predict when a certain amount of labeled data will be reached. In this example, after starting strong, the progress immediately went down. Knowing this, we might want to take action to motivate our workers to contribute more and validate the effectiveness of these actions with the help of this chart. The spikes indicate an effective short-term action.

Distribution of total annotations by user

We already have insights into annotations per worker, per label, and per day. Let us now see what insights we can get from aggregating some of this information.

The bigger your labeling workforce gets, the harder it can become to see the whole picture. For that reason we will now create a histogram consisting of five buckets. Each bucket represents an interval of total annotations (for example, 0-25 annotations) mapped to the number of users whose amount of total annotations lies in that interval. This allows us to get a sense of what kind of bias might be introduced by the majority of annotations being contributed by a small number of workers.

To do that, we use the Presto function width_bucket, which assigns each user’s number of labeled data objects to one of the five buckets we defined, each with a size of 25. We define these buckets by creating an array with five elements that specify the boundaries.

CREATE OR REPLACE VIEW users_per_bucket_annotations AS
SELECT 
bucket,numberOfUsers,
CASE
   WHEN bucket=5 THEN 'B' || cast(bucket AS VARCHAR(10)) || ': ' || cast(((bucket-1) * 25) AS VARCHAR(10)) || '+'
   ELSE 'B' || cast(bucket AS VARCHAR(10)) || ': ' || cast(((bucket-1) * 25) AS VARCHAR(10)) || '-' || cast((bucket * 25) AS VARCHAR(10))
END AS NumberOfAnnotations
FROM
(SELECT width_bucket(labeleditems,ARRAY[0,25,50,75,100]) AS bucket,
 count(user) AS numberOfUsers
FROM annotations_per_user
GROUP BY 1
ORDER BY bucket)

A SELECT * FROM users_per_bucket_annotations produces the following result:

A SELECT FROM users_per_bucket_annotations

Let’s now investigate the same data via QuickSight:

Annotations per User in buckets of Size 25

Now that we can look at the data visually it becomes clear that we have a bimodal distribution, with many labelers having done very little, and many labelers doing quite a lot. This may warrant interviewing some labelers to find out if there is something holding back users from progressing, or if we can keep engagement high over time.

Putting it all together in QuickSight

Since we created all the previous visuals in one analysis, we can now use it as a central place to consume our insights in a user-friendly way. Moreover, we can share our insights with others as a read-only snapshot, which QuickSight calls a dashboard. Users who are dashboard viewers can view and filter the dashboard data as shown below:

Groundtruth dashboard

Furthermore, you can generate a report and let QuickSight send it either once or on a schedule (daily, weekly or monthly) to your peers. This way users do not have to sign in and they can get reminders to check the progress of the labeling job. Lastly, sending out those reports is an opportunity to stay in touch with the labelers and keep the engagement high.

Conclusion

In this blog post, we have shown one example of combining multiple AWS services in order to build a solution tailored to your needs. We took the Amazon S3 output generated by SageMaker Ground Truth and showed how it can be further processed and analyzed with Athena. Finally, we created a central place to consume our insights in a user-friendly way with QuickSight. By putting it all together in a dashboard, we were able to share our insights with our peers.

You can take the same pattern and apply it to other situations: take some of the many building blocks AWS provides and mix-and-match them to create a unique, integrated solution with cohesive insights just as we did with Ground Truth, Athena, and QuickSight.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Monitoring the Java Virtual Machine Garbage Collection on AWS Lambda

Post Syndicated from Steffen Grunwald original https://aws.amazon.com/blogs/architecture/field-notes-monitoring-the-java-virtual-machine-garbage-collection-on-aws-lambda/

When you want to optimize your Java application on AWS Lambda for performance and cost, the general steps are: build, measure, then optimize! To accomplish this, you need a solid monitoring mechanism. Amazon CloudWatch and AWS X-Ray are well suited for this task since they already provide lots of data about your AWS Lambda function. This includes overall memory consumption, initialization time, and duration of your invocations. To examine the Java Virtual Machine (JVM) memory, you require garbage collection logs from your functions. Instances of an AWS Lambda function have a short lifecycle compared to a long-running Java application server. It can be challenging to process the logs from tens or hundreds of these instances.

In this post, you learn how to emit and collect data to monitor the JVM garbage collector activity. Having this data, you can visualize out-of-memory situations of your applications in a Kibana dashboard like in the following screenshot. You gain actionable insights into your application’s memory consumption on AWS Lambda for troubleshooting and optimization.

The lifecycle of a JVM application on AWS Lambda

Let’s first revisit the lifecycle of the AWS Lambda Java runtime and its JVM:

  1. A Lambda function is invoked.
  2. AWS Lambda launches an execution context. This is a temporary runtime environment based on the configuration settings you provide, like permissions, memory size, and environment variables.
  3. AWS Lambda creates a new log stream in Amazon CloudWatch Logs for each instance of the execution context.
  4. The execution context initializes the JVM and your handler’s code.

You typically see the initialization of a fresh execution context when a Lambda function is invoked for the first time, after it has been updated, or it scales up in response to more incoming events.

AWS Lambda maintains the execution context for some time in anticipation of another Lambda function invocation. In effect, the service freezes the execution context after a Lambda function completes. It thaws the execution context when the Lambda function is invoked again if AWS Lambda chooses to reuse it.

During invocations, the JVM also maintains garbage collection as usual. Outside of invocations, the JVM and its maintenance processes like garbage collection are also frozen.

Garbage collection and indicators for your application’s health

The purpose of JVM garbage collection is to clean up objects in the JVM heap, which is the space for an application’s objects. It finds objects which are unreachable and deletes them. This frees heap space for other objects.

You can make the JVM log garbage collection activities to get insights into the health of your application. One example is the free heap after each garbage collection: if this metric keeps shrinking, it is an indicator of a memory leak that eventually turns into an OutOfMemoryError. If there is not enough free heap, the JVM might be too busy with garbage collection instead of running your application code. Conversely, a heap that is much larger than needed indicates potential to decrease the memory configuration of your AWS Lambda function. A right-sized heap keeps garbage collection pauses low and provides a consistent response time.

The garbage collection logging can be configured via an environment variable as part of the AWS Lambda function configuration. The environment variable JAVA_TOOL_OPTIONS is considered by both the Java 8 and Java 11 JVMs. You use it to pass options that you would usually add to the command line when launching the JVM. The options to configure garbage collection logging, and the resulting output, are specific to the Java version.

Java 11 uses the Unified Logging System (JEP 158 and JEP 271), which was introduced in Java 9. Logging can be configured with the environment variable:

JAVA_TOOL_OPTIONS=-Xlog:gc+metaspace,gc+heap,gc:stdout:time,tags

The Serial Garbage Collector will produce log output like the following:

[<TIMESTAMP>][gc] GC(4) Pause Full (Allocation Failure) 9M->9M(11M) 3.941ms (D)
[<TIMESTAMP>][gc,heap] GC(3) DefNew: 3063K->234K(3072K) (A)
[<TIMESTAMP>][gc,heap] GC(3) Tenured: 6313K->9127K(9152K) (B)
[<TIMESTAMP>][gc,metaspace] GC(3) Metaspace: 762K->762K(52428K) (C)
[<TIMESTAMP>][gc] GC(3) Pause Young (Allocation Failure) 9M->9M(21M) 23.559ms (D)

Prior to Java 9, including Java 8, you configure the garbage collection logging as follows:

JAVA_TOOL_OPTIONS=-XX:+PrintGCDetails -XX:+PrintGCDateStamps

The Serial garbage collector output in Java 8 is structured differently:

<TIMESTAMP>: [GC (Allocation Failure)
    <TIMESTAMP>: [DefNew: 131042K->131042K(131072K), 0.0000216 secs] (A)
    <TIMESTAMP>: [Tenured: 235683K->291057K(291076K), 0.2213687 secs] (B)
    366725K->365266K(422148K), (D)
    [Metaspace: 3943K->3943K(1056768K)], (C)
    0.2215370 secs]
    [Times: user=0.04 sys=0.02, real=0.22 secs]
<TIMESTAMP>: [Full GC (Allocation Failure)
    <TIMESTAMP>: [Tenured: 297661K->36658K(297664K), 0.0434012 secs] (B)
    431575K->36658K(431616K), (D)
    [Metaspace: 3943K->3943K(1056768K)], 0.0434680 secs] (C)
    [Times: user=0.02 sys=0.00, real=0.05 secs]

Independent of the Java version, the garbage collection activities are logged to standard out (stdout) or standard error (stderr), and the log entries appear in the AWS Lambda function’s log stream in Amazon CloudWatch Logs. The log contains the size of the memory used for:

  • A: the young generation
  • B: the old generation
  • C: the metaspace
  • D: the entire heap

The notation is before-gc -> after-gc (committed heap). For example, DefNew: 3063K->234K(3072K) means that the young generation shrank from 3063 KB to 234 KB during the collection and has 3072 KB of committed space. Read the JVM Garbage Collection Tuning Guide for more details.

Visualizing the logs in Amazon Elasticsearch Service

It is hard to fully understand the garbage collection log by just reading it in Amazon CloudWatch Logs. You must visualize it to gain more insight. This section describes a solution to achieve this.

Solution Overview

Java Solution Overview

Amazon CloudWatch Logs provides a feature to stream log data to Amazon Elasticsearch Service via an AWS Lambda function. The AWS Lambda function for log transformation is subscribed to the log group of your application’s AWS Lambda function. The subscription filters for a pattern that matches the garbage collection log entries. The log transformation function processes the log messages and puts them into a search cluster. To make the data easy to digest for the search cluster, you add code to transform and convert the messages to JSON. Having the data in a search cluster, you can visualize it with Kibana dashboards.
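The published application contains the actual transformation code; the following Java sketch only illustrates the kind of parsing such a function performs on a Java 11 unified log line. The regular expression and field names here are assumptions for illustration, not the ones used by the sample:

import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogLineParser {

    // Matches lines like:
    // [2020-09-03T15:12:34.567+0000][gc] GC(3) Pause Young (Allocation Failure) 9M->9M(21M) 23.559ms
    private static final Pattern GC_LINE = Pattern.compile(
        "\\[(?<timestamp>[^\\]]+)\\]\\[gc\\] GC\\(\\d+\\) "
            + "(?<type>Pause \\w+)(?: \\((?<cause>[^)]+)\\))? "
            + "(?<before>\\d+)M->(?<after>\\d+)M\\((?<size>\\d+)M\\) (?<duration>[\\d.]+)ms");

    // Converts one garbage collection log line into fields for the search cluster.
    public static Optional<Map<String, String>> parse(String line) {
        Matcher matcher = GC_LINE.matcher(line);
        if (!matcher.find()) {
            return Optional.empty();
        }
        return Optional.of(Map.of(
            "@timestamp", matcher.group("timestamp"),
            "@gc_type", matcher.group("type"),
            "@gc_cause", matcher.group("cause") == null ? "" : matcher.group("cause"),
            "@heap_before_gc", matcher.group("before"),
            "@heap_after_gc", matcher.group("after"),
            "@heap_size_gc", matcher.group("size"),
            "@gc_duration", matcher.group("duration")));
    }
}

Each parsed line becomes a JSON document similar to the one you post manually to the Kibana Developer Console later in this walkthrough.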

Get Started

To start, launch the solution architecture described above as a prepackaged application from the AWS Serverless Application Repository. It contains all resources needed to visualize the garbage collection logs for your Java 11 AWS Lambda functions in a Kibana dashboard. The search cluster consists of a single t2.small.elasticsearch instance with 10 GB of EBS storage. It is protected with Amazon Cognito User Pools, so you only need to add your user(s). Note that the T2 instance types do not support encryption of data at rest.

Read the source code for the application in the aws-samples repository.

1. Spin up the application from the AWS Serverless Application Repository:

launch stack button

2. Once the application is fully deployed, the outputs of the AWS CloudFormation stack provide the links for the next steps. You will find two URLs in the AWS CloudFormation console, called createUserUrl and kibanaUrl.

search stack

3. Use the createUserUrl link from the outputs, or navigate to the Amazon Cognito user pool in the console to create a new user in the pool.

a. Enter an email address as username and email. Enter a temporary password of your choice with at least 8 characters.

b. Leave the phone number empty and uncheck the checkbox to mark the phone number as verified.

c. If necessary, you can check the checkboxes to send an invitation to the new user or to make the user verify the email address.

d. Choose Create user.

create user dialog of Amazon Cognito User Pools

4. Access the Kibana dashboard with the kibanaUrl link from the AWS CloudFormation stack outputs, or navigate to the Kibana link displayed in the Amazon Elasticsearch Service console.

a. In Kibana, choose the Dashboard icon in the left menu bar.

b. Open the Lambda GC Activity dashboard.

You can test that new events appear by using the Kibana Developer Console:

POST gc-logs-2020.09.03/_doc
{
  "@timestamp": "2020-09-03T15:12:34.567+0000",
  "@gc_type": "Pause Young",
  "@gc_cause": "Allocation Failure",
  "@heap_before_gc": "2",
  "@heap_after_gc": "1",
  "@heap_size_gc": "9",
  "@gc_duration": "5.432",
  "@owner": "123456789012",
  "@log_group": "/aws/lambda/myfunction",
  "@log_stream": "2020/09/03/[$LATEST]123456"
}

5. When you go to the Lambda GC Activity dashboard, you can see the new event. You must select the right timeframe with the Show dates link.

Lambda GC activity

The dashboard consists of six tiles:

  • In the Filters tile, you can optionally select the log group and filter for a specific AWS Lambda function execution context by the name of its log stream.
  • In the GC Activity Count by Execution Context tile, you see a heatmap of all filtered execution contexts by garbage collection activity count.
  • The GC Activity Metrics tile displays a graph of the metrics for all filtered execution contexts.
  • The GC Activity Count tile shows the number of garbage collection activities that are currently displayed.
  • The GC Duration tile shows the sum of the durations of all displayed garbage collection activities.
  • The GC Activity Raw Data tile at the bottom displays the raw items as ingested into the search cluster for further drill-down.

Configure your AWS Lambda function for garbage collection logging

1. The application that you want to monitor needs to log garbage collection activities. Currently, the solution supports logs from Java 11. Add the following environment variable to your AWS Lambda function to activate the logging (a programmatic sketch follows after these steps).

JAVA_TOOL_OPTIONS=-Xlog:gc:stderr:time,tags

The environment variables of your function must reflect this setting, as in the following screenshot:

environment variables

2. Go to the streamLogs function that the stack created in the AWS Lambda console, and subscribe it to the log group of the function you want to monitor.

streamlogs function

3. Select Add Trigger.

4. Select CloudWatch Logs as Trigger Configuration.

5. Input a Filter name of your choice.

6. Input "[gc" (including quotes) as the Filter pattern to match all garbage collection log entries.

7. Select the Log Group of the function you want to monitor. The following screenshot subscribes to the logs of the application’s function resize-lambda-ResizeFn-[...].

add trigger

8. Select Add.

9. Execute the AWS Lambda function you want to monitor.

10. Refresh the dashboard in Amazon Elasticsearch Service and see the data point that you added manually before appear in the graph, along with the new garbage collection events from your function.
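If you prefer to set the environment variable from step 1 programmatically instead of in the console, a sketch with the AWS SDK for Java 2.x could look like the following. The function name is a placeholder, and note that this call replaces all existing environment variables of the function, so merge in any variables you already use:

import java.util.Map;

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.Environment;
import software.amazon.awssdk.services.lambda.model.UpdateFunctionConfigurationRequest;

public class EnableGcLogging {
    public static void main(String[] args) {
        try (LambdaClient lambda = LambdaClient.create()) {
            lambda.updateFunctionConfiguration(UpdateFunctionConfigurationRequest.builder()
                .functionName("my-function-to-monitor") // placeholder for your function's name
                .environment(Environment.builder()
                    // Caution: this replaces the function's existing environment variables.
                    .variables(Map.of("JAVA_TOOL_OPTIONS", "-Xlog:gc:stderr:time,tags"))
                    .build())
                .build());
        }
    }
}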

Troubleshooting examples

Let’s look at an example function and draw some useful insights from the Java garbage collection log. The following diagrams show the Sample Amazon S3 function code for Java from the AWS Lambda documentation running in a Java 11 function with 512 MB of memory.

  • An S3 event from a newly uploaded image triggers this function.
  • The function loads the image from S3, resizes it, and puts the resized version back to S3.
  • The file size of the example image is close to 2.8 MB.
  • The application is called 100 times with a pause of 1 second between invocations.

Memory leak

For the demonstration of a memory leak, the function has been changed to keep all source images in memory as a class variable.
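The exact modification is not shown here; a hypothetical sketch of such a change, in which every downloaded image stays reachable from a static field, could look like this:

import java.util.ArrayList;
import java.util.List;

public class LeakyResizeHandler {

    // Class variable: it survives across invocations of the same execution
    // context, so the referenced source images are never garbage collected.
    private static final List<byte[]> sourceImages = new ArrayList<>();

    void process(byte[] sourceImage) {
        // ... download, resize, and upload the image as in the sample code ...
        sourceImages.add(sourceImage); // the leak: every image stays on the heap
    }
}

Hence the memory of the function keeps growing when processing more images: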

GC activity metrics

In the diagram, the heap size drops to zero at timestamp 12:34:00. The Amazon CloudWatch Logs of the function reveal an error before the next call to your code, which runs in the same AWS Lambda execution context but with a fresh JVM:

Java heap space: java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
 at java.desktop/java.awt.image.DataBufferByte.<init>(Unknown Source)
[...]

The JVM crashed and was restarted because of the error. You primarily rely on the Amazon CloudWatch Logs of your function to detect such errors. The garbage collection log and its visualization provide additional information for root cause analysis:

Did the JVM run out of memory because a single image to resize was too large?

Or was the memory issue growing over time?

The latter could be an indication that you have a memory leak in your code.

The heap size is too small

For the demonstration of a heap that was chosen too small, the memory leak from the preceding example has been resolved, but the function was configured with 128 MB of memory. From the baseline of the heap to the maximum heap size, only approximately 5 MB are used.

GC activity metrics

This results in a high management overhead for your JVM. You should experiment with a higher memory configuration to find the optimal performance, also taking cost into account. Check out the AWS Lambda Power Tuning open source tool to do this in an automated fashion.

Finetuning the initial heap size

If you review the development of the heap size at the start of an execution context, you can see that the heap size is continuously increased. Each heap size change is an expensive operation that consumes time of your function. The heap size also keeps changing later in the lifetime of the execution context. The garbage collector logs 502 activities, which take almost 17 seconds overall.

GC activity metrics

This on-demand scaling is useful on a local workstation where the physical memory is shared with other applications. On AWS Lambda, the configured memory is dedicated to your function, so you can use it to its full extent.

You can do so by setting the minimum and maximum heap size to a fixed value, appending the -Xms and -Xmx parameters to the environment variable introduced before.

The heap is not the only part of the JVM that consumes memory, so you must experiment with this setting and closely monitor the performance.

Start with the heap size that you observe to be working from the garbage collection log. If you set the heap size too large, your function will not initialize at all or break unexpectedly. Remember that the ability to tweak JVM parameters might change with future service features.

Let’s set the heap to 400 MB of the 512 MB function memory and examine the results:

JAVA_TOOL_OPTIONS=-Xlog:gc:stderr:time,tags -Xms400m -Xmx400m

GC activity metrics

The preceding dashboard shows that the overall garbage collection duration was reduced by about 95%. The garbage collector had 80% fewer activities.

The garbage collection log entries displayed in the dashboard reveal that exclusively minor garbage collection (Pause Young) activities were triggered instead of major garbage collections (Pause Full). This is expected, as the images are immediately discarded after the download, resize, and upload operations. The effect on the overall function duration of 100 invocations is a 5% decrease on average in this specific case.

Lambda duration

Cost estimation and clean up

Cost for the processing and transformation of your function’s Amazon CloudWatch Logs is incurred when your function is called. This cost depends on your application and how often garbage collection activities are triggered. Read an estimate of the monthly cost for the search cluster. If you do not need the garbage collection monitoring anymore, delete the subscription filter from the log group of your AWS Lambda function(s). Also, delete the stack of the solution in the AWS CloudFormation console to clean up resources.

Conclusion

In this post, we examined further sources of data to gain insights about the health of your Java application. We also demonstrated a pipeline to ingest, transform, and visualize this information continuously in a Kibana dashboard. As a next step, launch the application from the AWS Serverless Application Repository and subscribe it to your applications’ logs. Feel free to submit enhancements to the application in the aws-samples repository or provide feedback in the comments.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.