Tag Archives: Technical How-to

Custom logging with AWS Batch

2020-10-02 Emma White

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/custom-logging-with-aws-batch/

This post was written by Christian Kniep, Senior Developer Advocate for HPC and AWS Batch.

For HPC workloads, visibility into job logs is important, both to debug a job that failed and to gain insight into a running job. Tracking a job's trajectory lets you adjust the configuration of the next job, or terminate the job because it went off track.

With AWS Batch, customers are able to run batch workloads at scale, reliably and with ease, as this managed service takes out the undifferentiated heavy lifting. The customer can then focus on submitting jobs and getting work done. Customers told us that at a certain scale, the single logging driver available within AWS Batch made it hard to separate logs as they were all ending up in the same log group in Amazon CloudWatch.

With the new release of custom logging driver support, customers are now able to adjust how the job output is logged. They can not only customize the Amazon CloudWatch settings, but also use external logging frameworks such as splunk, fluentd, json-file, syslog, gelf, and journald.

This allows AWS Batch jobs to use the existing systems customers are accustomed to, with fine-grained control of the log data for debugging and access control purposes.

In this blog, I show the benefits of custom logging with AWS Batch by adjusting the log targets for jobs. The first example will customize the Amazon CloudWatch log group, the second will log to Splunk, an external logging service.

Example setup

To showcase this new feature, I use the AWS Command Line Interface (CLI) to set up the following:

IAM roles, policies, and profiles to grant access and permissions
A compute environment to provide the compute resources to run jobs
A job queue, which supervises the job execution and schedules jobs on a compute environment
A job definition, which uses a simple job to demonstrate how the new configuration can be applied

Once those tasks are completed, I submit a job and send logs to a customized CloudWatch log-group and Splunk.

Prerequisite

To make things easier, I first set a couple of environment variables to have the information handy for later use. I use the following code to set up the environment variables.

# in case it is not already installed
sudo yum install -y jq 
export MD_URL=http://169.254.169.254/latest/meta-data
export IFACE=$(curl -s ${MD_URL}/network/interfaces/macs/)
export SUBNET_ID=$(curl -s ${MD_URL}/network/interfaces/macs/${IFACE}/subnet-id)
export VPC_ID=$(curl -s ${MD_URL}/network/interfaces/macs/${IFACE}/vpc-id)
export AWS_REGION=$(curl -s ${MD_URL}/placement/availability-zone | sed 's/[a-z]$//')
export AWS_ACCT_ID=$(curl -s ${MD_URL}/identity-credentials/ec2/info |jq -r .AccountId)
export AWS_SG_DEFAULT=$(aws ec2 describe-security-groups \
--filters Name=group-name,Values=default \
|jq -r '.SecurityGroups[0].GroupId')
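The AWS_REGION line above derives the Region by stripping the trailing zone letter from the Availability Zone name. As a small illustration of what that sed expression does, here is a sketch in Python. The function name region_from_az is made up for this example, and the approach only holds for standard zone names such as eu-west-1a.

```python
import re


def region_from_az(availability_zone: str) -> str:
    """Drop one trailing lowercase letter, like sed 's/[a-z]$//'."""
    return re.sub(r"[a-z]$", "", availability_zone)


# Example zone name chosen for illustration only.
print(region_from_az("eu-west-1a"))
```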

IAM

When using the AWS Management Console, you must create IAM roles manually.

Trust Policies

IAM roles are defined to be used by a certain service. In the simplest case, you want a role to be used by Amazon EC2, the service that provides the compute capacity in the cloud. The policy that defines which entity is able to assume an IAM role is called a trust policy. To set up a trust policy for an IAM role, use the following code snippet.

cat > ec2-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "ec2.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
EOF
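The trust policies in this post differ only in the service principal. If you prefer to generate them rather than write them by hand, a minimal sketch in Python could look like the following. The helper name trust_policy is an assumption for illustration; the document structure mirrors the JSON shown above.

```python
import json


def trust_policy(service_principal: str) -> dict:
    """Build a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service_principal},
            "Action": "sts:AssumeRole",
        }],
    }


# Print the EC2 variant; swap in batch.amazonaws.com for the service role.
print(json.dumps(trust_policy("ec2.amazonaws.com"), indent=2))
```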

Instance role

With the IAM trust policy, I now create an ecsInstanceRole and attach the pre-defined policy AmazonEC2ContainerServiceforEC2Role. This allows an instance to interact with Amazon ECS.

aws iam create-role --role-name ecsInstanceRole \
 --assume-role-policy-document file://ec2-trust-policy.json
aws iam create-instance-profile --instance-profile-name ecsInstanceProfile
aws iam add-role-to-instance-profile \
    --instance-profile-name ecsInstanceProfile \
    --role-name ecsInstanceRole
aws iam attach-role-policy --role-name ecsInstanceRole \
 --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

Service Role

The AWS Batch service uses a role to interact with different services. The trust relationship reflects that the AWS Batch service is going to assume this role. You can set up this role with the following logic.

cat > svc-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "batch.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
EOF
aws iam create-role --role-name AWSBatchServiceRole \
--assume-role-policy-document file://svc-trust-policy.json
aws iam attach-role-policy --role-name AWSBatchServiceRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole

In addition to dealing with Amazon ECS, the instance role can create and write to Amazon CloudWatch log groups. To control which log group names are used, a condition is attached.

Let us create and attach a policy that makes creating a new log group possible.

cat > policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "logs:CreateLogGroup"
    ],
    "Resource": "*",
    "Condition": {
      "StringEqualsIfExists": {
        "batch:LogDriver": ["awslogs"],
        "batch:AWSLogsGroup": ["/aws/batch/custom/*"]
      }
    }
  }]
}
EOF
aws iam create-policy --policy-name batch-awslog-policy \
    --policy-document file://policy.json
aws iam attach-role-policy --policy-arn arn:aws:iam::${AWS_ACCT_ID}:policy/batch-awslog-policy --role-name ecsInstanceRole

At this point, I have created the IAM roles and policies so that the instance and service are able to interact with the AWS APIs, including trust policies that define which services are meant to use them: EC2 for the ecsInstanceRole, and AWS Batch for the AWSBatchServiceRole.

Compute environment

Now, I am going to create a compute environment, which is going to spin up an instance (one vCPU target) to run the example job in.

cat > compute-environment.json << EOF
{
  "computeEnvironmentName": "od-ce",
  "type": "MANAGED",
  "state": "ENABLED",
  "computeResources": {
    "type": "EC2",
    "allocationStrategy": "BEST_FIT_PROGRESSIVE",
    "minvCpus": 1,
    "maxvCpus": 8,
    "desiredvCpus": 1,
    "instanceTypes": ["m5.xlarge"],
    "subnets": ["${SUBNET_ID}"],
    "securityGroupIds": ["${AWS_SG_DEFAULT}"],
    "instanceRole": "arn:aws:iam::${AWS_ACCT_ID}:instance-profile/ecsInstanceProfile",
    "tags": {"Name": "aws-batch-compute"},
    "bidPercentage": 0
  },
  "serviceRole": "arn:aws:iam::${AWS_ACCT_ID}:role/AWSBatchServiceRole"
}
EOF
aws batch create-compute-environment --cli-input-json file://compute-environment.json

Once this command completes, the compute environment is spun up in the background. This will take a moment. You can use the following command to check on the status of the compute environment.

aws batch describe-compute-environments

Once it is enabled and valid, we can continue by setting up the job queue.
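If you capture the JSON output of the describe command, the readiness check can be automated. The following Python sketch assumes the response carries a computeEnvironments list whose entries have computeEnvironmentName, state, and status fields. The helper name and the trimmed sample response are hypothetical.

```python
def compute_environment_ready(response: dict, name: str) -> bool:
    """Return True when the named environment is ENABLED and VALID."""
    for env in response.get("computeEnvironments", []):
        if env.get("computeEnvironmentName") == name:
            return env.get("state") == "ENABLED" and env.get("status") == "VALID"
    return False  # environment not found in the response


# Trimmed, made-up response for illustration.
sample = {
    "computeEnvironments": [
        {"computeEnvironmentName": "od-ce", "state": "ENABLED", "status": "CREATING"}
    ]
}
print(compute_environment_ready(sample, "od-ce"))
```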

Job Queue

Now that I have a compute environment up and running, I will create a job queue which accepts job submissions and schedules the jobs on the compute environment.

cat > job-queue.json << EOF
{
  "jobQueueName": "jq",
  "state": "ENABLED",
  "priority": 1,
  "computeEnvironmentOrder": [{
    "order": 0,
    "computeEnvironment": "od-ce"
  }]
}
EOF
aws batch create-job-queue --cli-input-json file://job-queue.json

Job definition

The job definition is used as a template for jobs. This example runs a plain container and prints the environment variables. With the new release of AWS Batch, the logging driver awslogs now allows you to change the log group configuration within the job definition.

cat > job-definition.json << EOF
{
  "jobDefinitionName": "alpine-env",
  "type": "container",
  "containerProperties": {
    "image": "alpine",
    "vcpus": 1,
    "memory": 128,
    "command": ["env"],
    "readonlyRootFilesystem": true,
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-region": "${AWS_REGION}",
        "awslogs-group": "/aws/batch/custom/env-queue",
        "awslogs-create-group": "true"
      }
    }
  }
}
EOF
aws batch register-job-definition --cli-input-json file://job-definition.json
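When several job definitions should each log to their own group, generating the JSON can help keep the group names consistent with the IAM condition created earlier. The sketch below is one possible approach in Python. The function names are made up, and the prefix check is only my reading of the intent behind the /aws/batch/custom/* condition, not an official validation rule.

```python
import json

# Prefix taken from the IAM policy condition earlier in this post.
ALLOWED_PREFIX = "/aws/batch/custom/"


def is_allowed_log_group(log_group: str) -> bool:
    """Check that the group name sits under the custom prefix."""
    return log_group.startswith(ALLOWED_PREFIX) and len(log_group) > len(ALLOWED_PREFIX)


def awslogs_job_definition(name: str, log_group: str, region: str) -> dict:
    """Build a job definition like the one above with a custom log group."""
    if not is_allowed_log_group(log_group):
        raise ValueError(f"log group must start with {ALLOWED_PREFIX}")
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": "alpine",
            "vcpus": 1,
            "memory": 128,
            "command": ["env"],
            "readonlyRootFilesystem": True,
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-region": region,
                    "awslogs-group": log_group,
                    "awslogs-create-group": "true",
                },
            },
        },
    }


# The Region value here is an example, not taken from the post.
definition = awslogs_job_definition("alpine-env", "/aws/batch/custom/env-queue", "eu-west-1")
print(json.dumps(definition, indent=2))
```

The printed JSON can be saved to a file and passed to register-job-definition in the same way as the hand-written version.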

Job Submission

Using the above job definition, you can now submit a job.

aws batch submit-job \
  --job-name test-$(date +"%F_%H-%M-%S") \
  --job-queue arn:aws:batch:${AWS_REGION}:${AWS_ACCT_ID}:job-queue/jq \
  --job-definition arn:aws:batch:${AWS_REGION}:${AWS_ACCT_ID}:job-definition/alpine-env:1

Now, you can check the ‘Log Group’ in CloudWatch. Go to the CloudWatch console and find the ‘Log Group’ section on the left.

log groups in cloudwatch

Now, click on the log group defined above, and you should see the output of the job. You can use it to debug if something within the container went wrong, or process the logs to create alarms and reports.

cloudwatch log events

Splunk

Splunk is an established log engine for a broad set of customers. You can use the Docker container to set up a Splunk server quickly. More information can be found in the Splunk documentation. You need to configure the HTTP Event Collector, which provides you with a link and a token.

To send logs to Splunk, create an additional job-definition with the Splunk token and URL. Please adjust the splunk-url and splunk-token to match your Splunk setup.

{
  "jobDefinitionName": "alpine-splunk",
  "type": "container",
  "containerProperties": {
    "image": "alpine",
    "vcpus": 1,
    "memory": 128,
    "command": ["env"],
    "readonlyRootFilesystem": false,
    "logConfiguration": {
      "logDriver": "splunk",
      "options": {
        "splunk-url": "https://<splunk-url>",
        "splunk-token": "XXX-YYY-ZZZ"
      }
    }
  }
}

This forwards the logs to Splunk, as you can see in the following image.

forward to splunk

Conclusion

This blog post showed you how to apply custom logging to AWS Batch using the awslogs and Splunk logging drivers. While these are two important logging drivers, please head over to the documentation to find out about fluentd, syslog, json-file and other drivers to find the best driver to match your current logging infrastructure.

Field Notes: Gaining Insights into Labeling Jobs for Machine Learning

2020-09-30 Michael Graumann

Post Syndicated from Michael Graumann original https://aws.amazon.com/blogs/architecture/field-notes-gaining-insights-into-labeling-jobs-for-machine-learning/

In an era where more and more data is generated, it becomes critical for businesses to derive value from it. With the help of supervised learning, it is possible to generate models to automatically make predictions or decisions by leveraging historical data. Examples include image recognition for self-driving cars, predicting anomalies on X-rays, and fraud detection in finance. With supervised learning, these models learn from labeled data. The success of those models is highly dependent on readily available, high-quality labeled data.

However, you might encounter cases where a high percentage of your pre-existing data is unlabeled. In these situations, providing correct labeling to previously unlabeled data points would directly translate to higher model accuracy.

Amazon SageMaker Ground Truth helps you with exactly that. It lets you build highly accurate training datasets for machine learning quickly. SageMaker Ground Truth provides your labelers with built-in workflows and interfaces for common labeling tasks. This process could take several hours or more depending on the size of your unlabeled dataset, and you might have a need to track the progress easily, preferably in the form of a dashboard.

In this blog post, we show how to gain deep insights into the progress of labeling and the performance of the workers by using Amazon Athena and Amazon QuickSight. We first use Amazon Athena to set up several views with specific insights into the labeling progress. Finally, we reference these views in Amazon QuickSight to visualize the data in a dashboard.

This approach also works for combining multiple AWS services in general. AWS provides many building blocks that you can mix-and-match to create a unique, integrated solution with cohesive insights. In this blog post we use data produced by one service (Ground Truth), prepare it with another (Athena) and visualize with a third (QuickSight). The following diagram shows this architecture.

Solution Architecture

ML Solution Architecture

Mapping a JSON structure to a table structure

Ground Truth creates several directories in your Amazon S3 output path. These directories contain the results of your labeling job and other artifacts of the job. The top-level directory for a labeling job has the same name as your labeling job, while the output directories are placed inside it. We will create all insights from what SageMaker Ground Truth calls worker responses.

All respective JSON files reside in the path s3://bucket/<job-name>/annotations/worker-response/.

To analyze the labeling data with Amazon Athena we need to understand the structure of the underlying JSON files. Let’s review the example below. For each item that was labeled, we see the label itself, followed by the submission time and a workerId pointing to an identity. This identity lives in Amazon Cognito, a fully managed service that provides the user directory for our labelers.

{
    "answers": [
        {
            "answerContent": {
                "crowd-classifier": {
                    "label": "Compute"
                }
            },
            "submissionTime": "2020-03-27T10:31:04.210Z",
            "workerId": "private.eu-west-1.1111111111111111",
            "workerMetadata": {
                "identityData": {
                    "identityProviderType": "Cognito",
                    "issuer": "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_111111111",
                    "sub": "11111111-1111-1111-1111-111111111111"
                }
            }
        },
        ...
    ]
}
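To get a feeling for this structure before mapping it in Athena, you can flatten a worker response locally. The following Python sketch reads a copy of the example above (without the elided entries) and turns each answer into one flat row. The function name and the choice of output columns are assumptions for illustration.

```python
import json

# Copy of the example document above, limited to the one answer shown.
SAMPLE = """
{
  "answers": [
    {
      "answerContent": {"crowd-classifier": {"label": "Compute"}},
      "submissionTime": "2020-03-27T10:31:04.210Z",
      "workerId": "private.eu-west-1.1111111111111111",
      "workerMetadata": {
        "identityData": {
          "identityProviderType": "Cognito",
          "issuer": "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_111111111",
          "sub": "11111111-1111-1111-1111-111111111111"
        }
      }
    }
  ]
}
"""


def flatten_answers(document: dict) -> list:
    """Turn the nested answers array into a list of flat rows."""
    rows = []
    for answer in document.get("answers", []):
        identity = answer.get("workerMetadata", {}).get("identityData", {})
        rows.append({
            "label": answer["answerContent"]["crowd-classifier"]["label"],
            "submissionTime": answer["submissionTime"],
            "workerId": answer["workerId"],
            "sub": identity.get("sub"),
        })
    return rows


for row in flatten_answers(json.loads(SAMPLE)):
    print(row)
```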

Although the data is stored in Amazon S3 object storage, we are able to use SQL to access the data by using Amazon Athena. Since we now understand the JSON structure shown in the preceding code, we use Athena to define how to interpret the data that is relevant to us. We do so by first creating a database using the Athena Query Editor:

CREATE DATABASE analyze_labels_db;

Once inside the database, we add the table schema. The actual files remain on Amazon S3, but using the metadata catalog, Athena then knows where the data lies and how to interpret it. The AWS Glue Data Catalog is a central repository to store structural and operational metadata for all your data assets. For a given dataset, you can store its table definition and physical location, add business-relevant attributes, and track how this data has changed over time. Besides Athena, the AWS Glue Data Catalog also provides out-of-the-box integration with Amazon EMR and Amazon Redshift Spectrum. Once you add your table definitions to the Glue Data Catalog, they are available for ETL. They are also readily available for querying in Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum so that you can have a common view of your data between these services.

When going from JSON to SQL, we are crossing format boundaries. To make the JSON-formatted data easier to read, we use SerDe properties to replace the hyphen in crowd-classifier with an underscore due to DDL constraints. Finally we point the location to our Amazon S3 bucket containing the single worker responses. Notice in the following script that we translate the nested structure of the JSON file itself into a hierarchical, nested data structure in the schema definition. Also, we could leave out the workerMetadata as we don’t need it at this time. The data would still stay in the files on Amazon S3, so that we could later change and add the workerMetadata STRUCT into the table definition for our analysis.

CREATE EXTERNAL TABLE annotations_raw (
  answers array<
    struct<answercontent: 
      struct<crowd_classifier: 
        struct<label: string>
      >,
      submissionTime: string,
      workerId: string,
      workerMetadata: 
        struct<identityData: 
          struct<identityProviderType: string, issuer: string, sub: string>
        >
    >
  >
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.crowd_classifier"="crowd-classifier" 
) 
LOCATION 's3://<YOUR_BUCKET>/<JOB_NAME>/annotations/worker-response/'

Creating Views in Athena

Now, we have nested data in our annotations_raw table. For many use cases, especially for analytical uses, representing data in a tabular fashion—as rows—is more natural. This is also the standard way when using SQL and business intelligence tools. To unnest the hierarchical data into flattened rows, we create the following view which will serve as foundation for the other views we create. For an in-depth look into unnesting data with Amazon Athena, read this blog post.

Some of the information we’re interested in might not be part of the document, but is encoded in the path. We use a trick in Athena by using the $path variable from the Presto Hive Connector. This determines which Amazon S3 file contains data that is returned by a specific row in an Athena table. This way we can find out which data object an annotation belongs to. Since Athena is built on top of Presto, we are able to use Presto’s built-in regexp_extract function to find out the iteration as well as the data object id per labeling result. We also cast the submission time in date format to later determine the labeling progress per day.

CREATE OR REPLACE VIEW annotations_view AS
SELECT 
  regexp_extract("$path", 'iteration-[0-9]*') as iteration,
  regexp_extract("$path", '(iteration-[0-9]*\/([0-9]*))',2) as dataRecord,
  answer.answercontent.crowd_classifier.label,
  cast(from_iso8601_timestamp(answer.submissionTime) as timestamp) as submissionTime,
  cast(from_iso8601_timestamp(answer.submissionTime) as date) as submissionDay,
  answer.workerId,
  answer.workerMetadata.identityData.identityProviderType,
  answer.workerMetadata.identityData.issuer,
  answer.workerMetadata.identityData.sub,
  "$path" path
FROM 
  annotations_raw
CROSS JOIN UNNEST(answers) AS t(answer)

This view, annotations_view, will be the starting point for the other views we create later in this post.
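The two regexp_extract calls in the view can be tried out locally before running them in Athena. The Python sketch below applies equivalent patterns to a single path. The bucket name, job name, and file name in the example path are invented, so treat the layout as an assumption rather than the exact Ground Truth output.

```python
import re


def parse_path(path: str):
    """Extract the iteration and the data record id from an object path."""
    iteration = re.search(r"iteration-[0-9]*", path)
    record = re.search(r"(iteration-[0-9]*/([0-9]*))", path)
    return (
        iteration.group(0) if iteration else None,
        record.group(2) if record else None,  # group 2 mirrors the ",2" argument
    )


# Hypothetical object path for illustration.
example = "s3://example-bucket/example-job/annotations/worker-response/iteration-1/42/response.json"
print(parse_path(example))
```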

Visualizing with QuickSight

In this section, we explore a way to visualize the views we build in Athena by pointing Amazon QuickSight to the respective view. Amazon QuickSight lets you create and publish interactive dashboards that include ML Insights. Dashboards can then be accessed from any device, and embedded into your applications, portals, and websites.

Thanks to the tight integration between Athena and QuickSight, we are able to map one dataset in QuickSight to one Athena view. In order to further optimize the performance of the dashboard, we can optionally import the datasets into the in-memory optimized calculation engine for Amazon QuickSight called SPICE. With the datasets in place we can now create an analysis in order to interact with the visuals we’re going to add. You can think of an analysis as a container for a set of related visuals. You can use multiple datasets in an analysis, although any given visual can only use one of those datasets. After you create an analysis and an initial visual, you can expand the analysis. You can do this for example by adding datasets and visuals.

Let’s start with our first insight.

Annotations per worker

We’d like to gain insights not only into the total number of labeled items but also into the level of contribution of each individual worker. This could give us an indication whether the labels were created by a diverse crowd of labelers or by a few productive ones. A largely disproportionate amount of contributions from a handful of workers may mean that those workers brought along their biases.

SageMaker Ground Truth calls labeled data objects annotations. An annotation is the result of a single worker’s labeling task.

Luckily we encapsulated all the heavy lifting of format conversion in the annotations_view, so that it is now easy to create a view for the annotations per user:

CREATE OR REPLACE VIEW annotations_per_user AS
SELECT COUNT(sub) AS LabeledItems,
sub AS User
FROM annotations_view
GROUP BY sub
ORDER BY LabeledItems DESC

Next we visualize this view in QuickSight. We add a visual to our analysis, select the respective dataset for the view and use the AutoGraph feature, which chooses the most appropriate visual type. Since we already arranged our view in Athena by the number of labeled items in descending order, there is no need now to sort the data in QuickSight. In the following screenshot, worker c4ef78e4... contributed more labels compared to their peers.

Annotations per worker

This view gives you an indicator to check for a bias that the leading worker might have brought along.

Annotations per label

One thing we want to be aware of is potential imbalances between classes in our dataset. Simple machine learning models in particular may learn to frequently predict a label that is massively overrepresented in the dataset. If we can identify an imbalance, we can apply mitigation actions such as upsampling data of underrepresented classes. With the following view we list the total number of annotations per label.

CREATE OR REPLACE VIEW annotations_per_label AS
SELECT COUNT(dataRecord) AS TotalLabels, label AS Label
FROM annotations_view
GROUP BY label
ORDER BY TotalLabels DESC, Label;

As before, we create a dataset in QuickSight pointing to the annotations_per_label view, open the analysis, add a new visual and leverage the AutoGraph functionality. The result is the following visual representation:

Annotations per worker 2

One can clearly see that the Analytics & AI/ML class is massively underrepresented. At this point, you might want to try getting more data or think about upsampling data for that class.

Annotations per day

Seeing the total number of annotations per label and per worker is good, but we are also interested in how the labeling progress changes over time. This way we might see spikes related to labeler activations. We can also estimate how long it takes to reach a certain goal of annotations given the current pace. For this purpose we create the following view aggregating the total annotations per day.

CREATE OR REPLACE VIEW annotations_per_day AS
SELECT COUNT(datarecord) AS LabeledItems,
submissionDay
FROM annotations_view
GROUP BY submissionDay
ORDER BY submissionDay, LabeledItems DESC

This time the QuickSight AutoGraph provides us with the following line chart. You might have noticed that the axis labels do not match the column names in Athena. That is because we renamed them in QuickSight for better readability.

Total annotations per day

In the preceding chart we see that there is no consistent pace of labeling, which makes it hard to predict when a certain amount of labeled data will be reached. In this example, after starting strong, the progress immediately went down. Knowing this, we might want to take action to motivate our workers to contribute more, and validate the effectiveness of these actions with the help of this chart. The spikes indicate an effective short-term action.
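For a rough estimate of the time to reach a labeling goal, the daily counts from annotations_per_day can be extrapolated. The Python sketch below uses a plain average over the observed days, which is a simplifying assumption; with an inconsistent pace like the one in this example, the result is only a ballpark figure. The function name and the daily counts are made up.

```python
import math
from typing import Optional, Sequence


def days_to_goal(labeled_per_day: Sequence[int], goal: int) -> Optional[int]:
    """Estimate remaining days from the average daily pace so far."""
    done = sum(labeled_per_day)
    if done >= goal:
        return 0
    if done == 0:
        return None  # no pace to extrapolate from
    pace = done / len(labeled_per_day)
    return math.ceil((goal - done) / pace)


# Hypothetical daily totals: 200 labels in 4 days, goal of 500.
print(days_to_goal([120, 30, 10, 40], 500))
```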

Distribution of total annotations by user

We already have insights into annotations per worker, per label and per day. Let us now see what insights we can get from aggregating some of this information.

The bigger your labeling workforce gets, the harder it can become to see the whole picture. For that reason we will now create a histogram consisting of five buckets. Each bucket represents an interval of total annotations (for example, 0-25 annotations) mapped to the number of users whose amount of total annotations lies in that interval. This allows us to get a sense of what kind of bias might be introduced by the majority of annotations being contributed by a small number of workers.

To do that, we use the Presto function width_bucket, which returns the number of the bucket that a user's count of labeled data objects falls into. We define the buckets by creating an array with five elements that specify the lower boundaries. The first four buckets have a size of 25 each, and the fifth collects everything from 100 upward.

CREATE OR REPLACE VIEW users_per_bucket_annotations AS
SELECT 
bucket,numberOfUsers,
CASE
   WHEN bucket=5 THEN 'B' || cast(bucket AS VARCHAR(10)) || ': ' || cast(((bucket-1) * 25) AS VARCHAR(10)) || '+'
   ELSE 'B' || cast(bucket AS VARCHAR(10)) || ': ' || cast(((bucket-1) * 25) AS VARCHAR(10)) || '-' || cast((bucket * 25) AS VARCHAR(10))
END AS NumberOfAnnotations
FROM
(SELECT width_bucket(labeleditems,ARRAY[0,25,50,75,100]) AS bucket,
 count(user) AS numberOfUsers
FROM annotations_per_user
GROUP BY 1
ORDER BY bucket)
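To see which bucket a given count lands in, the bucketing and labeling logic of the view can be mimicked locally. The Python sketch below emulates width_bucket with a binary search, based on my understanding that a value equal to a boundary belongs to the upper bucket. The function names are assumptions for illustration.

```python
from bisect import bisect_right

# Same boundaries as ARRAY[0,25,50,75,100] in the view.
BOUNDARIES = [0, 25, 50, 75, 100]


def width_bucket(value: int, bins: list) -> int:
    """Return the 1-based bucket number; values below bins[0] give 0."""
    return bisect_right(bins, value)


def bucket_label(bucket: int, size: int = 25, last: int = 5) -> str:
    """Rebuild the NumberOfAnnotations label from the CASE expression."""
    lower = (bucket - 1) * size
    if bucket == last:
        return f"B{bucket}: {lower}+"
    return f"B{bucket}: {lower}-{bucket * size}"


# Sample counts chosen to touch the bucket boundaries.
for count in (3, 25, 99, 140):
    bucket = width_bucket(count, BOUNDARIES)
    print(count, bucket_label(bucket))
```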

A SELECT * FROM users_per_bucket_annotations produces the following result:

A SELECT FROM users_per_bucket_annotations

Let’s now investigate the same data via QuickSight:

Annotations per User in buckets of Size 25

Now that we can look at the data visually it becomes clear that we have a bimodal distribution, with many labelers having done very little, and many labelers doing quite a lot. This may warrant interviewing some labelers to find out if there is something holding back users from progressing, or if we can keep engagement high over time.

Putting it all together in QuickSight

Since we created all previous visuals in one analysis, we can now utilize it as a central place to consume our insights in a user-friendly way. Moreover, we can share our insights with others as a read-only snapshot which QuickSight calls a dashboard. Users who are dashboard viewers can view and filter the dashboard data as below:

Groundtruth dashboard

Furthermore, you can generate a report and let QuickSight send it either once or on a schedule (daily, weekly or monthly) to your peers. This way users do not have to sign in and they can get reminders to check the progress of the labeling job. Lastly, sending out those reports is an opportunity to stay in touch with the labelers and keep the engagement high.

Conclusion

In this blogpost, we have shown one example of combining multiple AWS services in order to build a solution tailored to your needs. We took the Amazon S3 output generated by SageMaker Ground Truth and showed how it can be further processed and analyzed with Athena. Finally, we created a central place to consume our insights in a user-friendly way with QuickSight. By putting it all together in a dashboard we were able to share our insights with our peers.

You can take the same pattern and apply it to other situations: take some of the many building blocks AWS provides and mix-and-match them to create a unique, integrated solution with cohesive insights just as we did with Ground Truth, Athena, and QuickSight.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Monitoring the Java Virtual Machine Garbage Collection on AWS Lambda

2020-09-23 Steffen Grunwald

Post Syndicated from Steffen Grunwald original https://aws.amazon.com/blogs/architecture/field-notes-monitoring-the-java-virtual-machine-garbage-collection-on-aws-lambda/

When you want to optimize your Java application on AWS Lambda for performance and cost, the general steps are: build, measure, then optimize! To accomplish this, you need a solid monitoring mechanism. Amazon CloudWatch and AWS X-Ray are well suited for this task since they already provide lots of data about your AWS Lambda function. This includes overall memory consumption, initialization time, and duration of your invocations. To examine the Java Virtual Machine (JVM) memory you require garbage collection logs from your functions. Instances of an AWS Lambda function have a short lifecycle compared to a long-running Java application server. It can be challenging to process the logs from tens or hundreds of these instances.

In this post, you learn how to emit and collect data to monitor the JVM garbage collector activity. Having this data, you can visualize out-of-memory situations of your applications in a Kibana dashboard like in the following screenshot. You gain actionable insights into your application’s memory consumption on AWS Lambda for troubleshooting and optimization.

The lifecycle of a JVM application on AWS Lambda

Let’s first revisit the lifecycle of the AWS Lambda Java runtime and its JVM:

A Lambda function is invoked.
AWS Lambda launches an execution context. This is a temporary runtime environment based on the configuration settings you provide, like permissions, memory size, and environment variables.
AWS Lambda creates a new log stream in Amazon CloudWatch Logs for each instance of the execution context.
The execution context initializes the JVM and your handler’s code.

You typically see the initialization of a fresh execution context when a Lambda function is invoked for the first time, after it has been updated, or when it scales up in response to more incoming events.

AWS Lambda maintains the execution context for some time in anticipation of another Lambda function invocation. In effect, the service freezes the execution context after a Lambda function completes. It thaws the execution context when the Lambda function is invoked again if AWS Lambda chooses to reuse it.

During invocations, the JVM also maintains garbage collection as usual. Outside of invocations, the JVM and its maintenance processes like garbage collection are also frozen.

Garbage collection and indicators for your application’s health

The purpose of JVM garbage collection is to clean up objects in the JVM heap, which is the space for an application’s objects. It finds objects which are unreachable and deletes them. This frees heap space for other objects.

You can make the JVM log garbage collection activities to get insights into the health of your application. One example of this is the free heap after each garbage collection. If this metric keeps shrinking, it is an indicator of a memory leak, which eventually turns into an OutOfMemoryError. If there is not enough free heap, the JVM might be too busy with garbage collection instead of running your application code. Conversely, a heap that is too big indicates that there’s potential to decrease the memory configuration of your AWS Lambda function. A well-sized heap keeps garbage collection pauses low and provides a consistent response time.

The garbage collection logging can be configured via an environment variable as part of the AWS Lambda function configuration. The environment variable JAVA_TOOL_OPTIONS is considered by both the Java 8 and 11 JVMs. You use it to pass options that you would usually add to the command line when launching the JVM. The options to configure garbage collection logging and the output are specific to the Java version.

Java 11 uses the Unified Logging System (JEP 158 and JEP 271), which was introduced in Java 9. Logging can be configured with the environment variable:

JAVA_TOOL_OPTIONS=-Xlog:gc+metaspace,gc+heap,gc:stdout:time,tags

The Serial Garbage Collector will output the logs:

[<TIMESTAMP>][gc] GC(4) Pause Full (Allocation Failure) 9M->9M(11M) 3.941ms (D)
[<TIMESTAMP>][gc,heap] GC(3) DefNew: 3063K->234K(3072K) (A)
[<TIMESTAMP>][gc,heap] GC(3) Tenured: 6313K->9127K(9152K) (B)
[<TIMESTAMP>][gc,metaspace] GC(3) Metaspace: 762K->762K(52428K) (C)
[<TIMESTAMP>][gc] GC(3) Pause Young (Allocation Failure) 9M->9M(21M) 23.559ms (D)

Prior to Java 9, including Java 8, you configure the garbage collection logging as follows:

JAVA_TOOL_OPTIONS=-XX:+PrintGCDetails -XX:+PrintGCDateStamps

The Serial garbage collector output in Java 8 is structured differently:

<TIMESTAMP>: [GC (Allocation Failure)
    <TIMESTAMP>: [DefNew: 131042K->131042K(131072K), 0.0000216 secs] (A)
    <TIMESTAMP>: [Tenured: 235683K->291057K(291076K), 0.2213687 secs] (B)
    366725K->365266K(422148K), (D)
    [Metaspace: 3943K->3943K(1056768K)], (C)
    0.2215370 secs]
    [Times: user=0.04 sys=0.02, real=0.22 secs]
<TIMESTAMP>: [Full GC (Allocation Failure)
    <TIMESTAMP>: [Tenured: 297661K->36658K(297664K), 0.0434012 secs] (B)
    431575K->36658K(431616K), (D)
    [Metaspace: 3943K->3943K(1056768K)], 0.0434680 secs] (C)
    [Times: user=0.02 sys=0.00, real=0.05 secs]

Independent of the Java version, the garbage collection activities are logged to standard out (stdout) or standard error (stderr). Logs appear in the AWS Lambda function’s log stream of Amazon CloudWatch Logs. The log contains the size of memory used for:

A: the young generation
B: the old generation
C: the metaspace
D: the entire heap

The notation is before-gc -> after-gc (committed heap). Read the JVM Garbage Collection Tuning Guide for more details.
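To make the notation concrete, here is a minimal Python sketch that reads the heap figures out of a Java 11 summary line like the ones shown earlier. The regular expression and the output key names are illustrative assumptions. They are not taken from the solution's source code, and they only cover the `Pause ...` summary lines reported in megabytes.

```python
import re

# Matches Java 11 unified-logging summary lines such as:
#   [<TIMESTAMP>][gc] GC(3) Pause Young (Allocation Failure) 9M->9M(21M) 23.559ms
GC_LINE = re.compile(
    r"GC\((?P<id>\d+)\)\s+"
    r"(?P<type>Pause \w+)\s+"
    r"\((?P<cause>[^)]*)\)\s+"
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<committed>\d+)M\)\s+"
    r"(?P<duration_ms>\d+(?:\.\d+)?)ms"
)


def parse_gc_line(line):
    """Return the heap figures of a GC summary line, or None if it does not match."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    before, after, committed = (
        int(match[key]) for key in ("before", "after", "committed")
    )
    return {
        "gc_id": int(match["id"]),
        "gc_type": match["type"],
        "gc_cause": match["cause"],
        "heap_before_mb": before,        # before-gc
        "heap_after_mb": after,          # after-gc
        "heap_committed_mb": committed,  # committed heap
        "free_after_mb": committed - after,
        "duration_ms": float(match["duration_ms"]),
    }


sample = "[<TIMESTAMP>][gc] GC(3) Pause Young (Allocation Failure) 9M->9M(21M) 23.559ms"
parsed = parse_gc_line(sample)
```

The derived `free_after_mb` value is the free heap after a collection, the metric described above as an indicator of a memory leak when it keeps shrinking.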

Visualizing the logs in Amazon Elasticsearch Service

It is hard to fully understand the garbage collection log by just reading it in Amazon CloudWatch Logs. You must visualize it to gain more insight. This section describes the solution to achieve this.

Solution Overview

Java Solution Overview

Amazon CloudWatch Logs has a feature to stream CloudWatch Logs data to Amazon Elasticsearch Service via an AWS Lambda function. The AWS Lambda function for log transformation is subscribed to the log group of your application’s AWS Lambda function. The subscription filter uses a pattern that matches the garbage collection log entries. The log transformation function processes the log messages and puts them into a search cluster. To make the data easy to digest for the search cluster, you add code to transform and convert the messages to JSON. With the data in a search cluster, you can visualize it with Kibana dashboards.
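As a rough illustration of the transformation step, the following Python sketch decodes the payload that CloudWatch Logs delivers to a subscribed Lambda function (base64-encoded, gzip-compressed JSON under `awslogs.data`) and turns the garbage collection messages into JSON-ready dictionaries. It is not the code of the packaged application. The function names and the `@message` key are assumptions, and `@timestamp` is left as epoch milliseconds rather than converted to a date string.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda function."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def gc_documents(event):
    """Yield one JSON-ready dict per garbage collection log message."""
    payload = decode_subscription_event(event)
    for log_event in payload["logEvents"]:
        if "[gc" not in log_event["message"]:
            continue  # defensive check; the subscription filter already selects these
        yield {
            "@timestamp": log_event["timestamp"],  # epoch milliseconds
            "@message": log_event["message"],      # hypothetical field name
            "@owner": payload["owner"],
            "@log_group": payload["logGroup"],
            "@log_stream": payload["logStream"],
        }
```

A real transformation function would additionally parse each message into separate numeric fields before indexing it, so that the search cluster can aggregate on them.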

Get Started

To start, launch the solution architecture described above as a prepackaged application from the AWS Serverless Application Repository. It contains all resources ready to visualize the garbage collection logs for your Java 11 AWS Lambda functions in a Kibana dashboard. The search cluster consists of a single t2.small.elasticsearch instance with 10GB of EBS storage. It is protected with Amazon Cognito User Pools so you only need to add your user(s). The T2 instance types do not support encryption of data at rest.

Read the source code for the application in the aws-samples repository.

1. Spin up the application from the AWS Serverless Application Repository:

2. As soon as the application is deployed completely, the outputs of the AWS CloudFormation stack provide the links for the next steps. You will find two URLs in the AWS CloudFormation console called createUserUrl and kibanaUrl.

search stack

3. Use the createUserUrl link from the outputs, or navigate to the Amazon Cognito user pool in the console to create a new user in the pool.

a. Enter an email address as username and email. Enter a temporary password of your choice with at least 8 characters.

b. Leave the phone number empty and uncheck the checkbox to mark the phone number as verified.

c. If necessary, you can check the checkboxes to send an invitation to the new user or to make the user verify the email address.

d. Choose Create user.

create user dialog of Amazon Cognito User Pools

4. Access the Kibana dashboard with the kibanaUrl link from the AWS CloudFormation stack outputs, or navigate to the Kibana link displayed in the Amazon Elasticsearch Service console.

a. In Kibana, choose the Dashboard icon in the left menu bar

b. Open the Lambda GC Activity dashboard.

You can test that new events appear by using the Kibana Developer Console:

POST gc-logs-2020.09.03/_doc
{
  "@timestamp": "2020-09-03T15:12:34.567+0000",
  "@gc_type": "Pause Young",
  "@gc_cause": "Allocation Failure",
  "@heap_before_gc": "2",
  "@heap_after_gc": "1",
  "@heap_size_gc": "9",
  "@gc_duration": "5.432",
  "@owner": "123456789012",
  "@log_group": "/aws/lambda/myfunction",
  "@log_stream": "2020/09/03/[$LATEST]123456"
}

5. When you go to the Lambda GC Activity dashboard you can see the new event. You must select the right timeframe with the Show dates link.

Lambda GC activity

The dashboard consists of six tiles:

In the Filters you optionally select the log group and filter for a specific AWS Lambda function execution context by the name of its log stream.
In the GC Activity Count by Execution Context you see a heatmap of all filtered execution contexts by garbage collection activity count.
The GC Activity Metrics display a graph for the metrics for all filtered execution contexts.
The GC Activity Count shows the number of garbage collection activities that are currently displayed.
The GC Duration shows the sum of the duration of all displayed garbage collection activities.
The GC Activity Raw Data at the bottom displays the raw items as ingested into the search cluster for a further drill down.

Configure your AWS Lambda function for garbage collection logging

1. The application that you want to monitor needs to log garbage collection activities. Currently the solution supports logs from Java 11. Add the following environment variable to your AWS Lambda function to activate the logging.

JAVA_TOOL_OPTIONS=-Xlog:gc:stderr:time,tags

The environment variables must reflect this parameter like the following screenshot:

environment variables

2. Go to the streamLogs function in the AWS Lambda console that has been created by the stack, and subscribe it to the log group of the function you want to monitor.

streamlogs function

3. Select Add Trigger.

4. Select CloudWatch Logs as Trigger Configuration.

5. Input a Filter name of your choice.

6. Input "[gc" (including quotes) as the Filter pattern to match all garbage collection log entries.

7. Select the Log Group of the function you want to monitor. The following screenshot subscribes to the logs of the application’s function resize-lambda-ResizeFn-[...].

add trigger

8. Select Add.

9. Execute the AWS Lambda function you want to monitor.

10. Refresh the dashboard in Amazon Elasticsearch Service and see the data point that you added manually earlier appear in the graph.

Troubleshooting examples

Let’s look at an example function and draw some useful insights from the Java garbage collection log. The following diagrams show the Sample Amazon S3 function code for Java from the AWS Lambda documentation running in a Java 11 function with 512 MB of memory.

An S3 event from a new uploaded image triggers this function.
The function loads the image from S3, resizes it, and puts the resized version to S3.
The file size of the example image is close to 2.8MB.
The application is called 100 times with a pause of 1 second.

Memory leak

For the demonstration of a memory leak, the function has been changed to keep all source images in memory as a class variable. Hence the memory of the function keeps growing when processing more images:

GC activity metrics

In the diagram, the heap size drops to zero at timestamp 12:34:00. The Amazon CloudWatch Logs of the function reveal an error before the next call to your code in the same AWS Lambda execution context with a fresh JVM:

Java heap space: java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
 at java.desktop/java.awt.image.DataBufferByte.<init>(Unknown Source)
[...]

The JVM crashed and was restarted because of the error. You primarily use the Amazon CloudWatch Logs of your function to detect errors. The garbage collection log and its visualization provide additional information for root cause analysis:

Did the JVM run out of memory because a single image to resize was too large?

Or was the memory issue growing over time?

The latter could be an indication that you have a memory leak in your code.
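One way to tell the two cases apart is to look at the trend of the heap size after each collection within one execution context. The sketch below is an illustrative heuristic, not part of the solution. It fits a least-squares line through the heap-after-GC values, and the threshold of 1 MB per collection is an arbitrary assumption that you would tune for your workload.

```python
def heap_growth_per_gc(heap_after_mb):
    """Least-squares slope of heap-after-GC values, in MB per collection."""
    n = len(heap_after_mb)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(heap_after_mb) / n
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(heap_after_mb)
    )
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


def looks_like_leak(heap_after_mb, threshold_mb_per_gc=1.0):
    """Flag a series whose heap after GC grows faster than the threshold."""
    return heap_growth_per_gc(heap_after_mb) > threshold_mb_per_gc
```

A steadily rising series such as `[10, 20, 30, 40]` is flagged, while a series that oscillates around a stable baseline is not. A single oversized image would instead show up as one isolated spike just before the error.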

The Heap size is too small

For the demonstration of a heap that was chosen too small, the memory leak from the preceding example has been resolved, but the function was configured to 128MB of memory. From the baseline of the heap to the maximum heap size, there are only approximately 5 MB used.

GC activity metrics

This results in a high management overhead for your JVM. You should experiment with a higher memory configuration to find the optimal performance, also taking cost into account. Check out the AWS Lambda Power Tuning open source tool to do this in an automated fashion.

Finetuning the initial heap size

Reviewing the development of the heap size at the start of an execution context shows that the heap size is continuously increased. Each heap size change is an expensive operation that consumes time of your function. Over time, the heap size is changed as well. The garbage collector logs 502 activities, which take almost 17 seconds overall.

GC activity metrics

This on-demand scaling is useful on a local workstation where the physical memory is shared with other applications. On AWS Lambda, the configured memory is dedicated to your function, so you can use it to its full extent.

You can do so by setting the minimum and maximum heap size to a fixed value by appending the -Xms and -Xmx parameters to the environment variable we introduced before.

The heap is not the only part of the JVM that consumes memory, so you must experiment with this setting and closely monitor the performance.

Start with the heap size that you observe to be working from the garbage collection log. If you set the heap size too large, your function will not initialize at all or break unexpectedly. Remember that the ability to tweak JVM parameters might change with future service features.

Let’s set 400 MB of the 512 MB memory and examine the results:

JAVA_TOOL_OPTIONS=-Xlog:gc:stderr:time,tags -Xms400m -Xmx400m

GC activity metrics

The preceding dashboard shows that the overall garbage collection duration was reduced by about 95%. The garbage collector had 80% fewer activities.

The garbage collection log entries displayed in the dashboard reveal that exclusively minor garbage collection (Pause Young) activities were triggered instead of major garbage collections (Pause Full). This is expected as the images are immediately discarded after the download, resize, upload operation. The effect on the overall function durations of 100 invocations is a 5% decrease on average in this specific case.

Lambda duration

Cost estimation and clean up

Cost for the processing and transformation of your function’s Amazon CloudWatch Logs is incurred when your function is called. This cost depends on your application and how often garbage collection activities are triggered. Read an estimate of the monthly cost for the search cluster. If you do not need the garbage collection monitoring anymore, delete the subscription filter from the log group of your AWS Lambda function(s). Also, delete the stack of the solution above in the AWS CloudFormation console to clean up resources.

Conclusion

In this post, we examined further sources of data to gain insights about the health of your Java application. We also demonstrated a pipeline to ingest, transform, and visualize this information continuously in a Kibana dashboard. As a next step, launch the application from the AWS Serverless Application Repository and subscribe it to your applications’ logs. Feel free to submit enhancements to the application in the aws-samples repository or provide feedback in the comments.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Creating an EC2 instance in the AWS Wavelength Zone

2020-09-22 Bala Thekkedath

Post Syndicated from Bala Thekkedath original https://aws.amazon.com/blogs/compute/creating-an-ec2-instance-in-the-aws-wavelength-zone/

This blog post is contributed by Saravanan Shanmugam, Lead Solution Architect, AWS Wavelength

AWS announced Wavelength at re:Invent 2019 in partnership with Verizon in the US, SK Telecom in South Korea, KDDI in Japan, and Vodafone in the UK and Europe. Following the re:Invent 2019 announcement, on August 6, 2020, AWS announced GA of one Wavelength Zone with Verizon in Boston connected to the US East (N. Virginia) Region and one in San Francisco connected to the US West (Oregon) Region.

In this blog, I walk you through the steps required to create an Amazon EC2 instance in an AWS Wavelength Zone from the AWS Management Console. I also address the questions asked by our customers regarding the different protocol traffic allowed into and out of AWS Wavelength Zones.

Customers who want to access AWS Wavelength Zones and deploy their applications to the Wavelength Zone can sign up using this link. Customers that opted in to access the AWS Wavelength Zone can confirm the status on the EC2 console Account Attribute section as shown in the following image.

Services and features

AWS Wavelength Zones are Availability Zones inside the Carrier Service Provider network closer to the Edge of the Mobile Network. Wavelength Zones bring the AWS core compute and storage services like Amazon EC2 and Amazon EBS that can be used by other services like Amazon EKS and Amazon ECS. We look at Wavelength Zone(s) as a hub and spoke model, where developers can deploy latency sensitive, high-bandwidth applications at the Edge and non-latency sensitive and data persistent applications in the Region.

Wavelength Zones support three Nitro-based Amazon EC2 instance families: t3 (t3.medium, t3.xlarge), r5 (r5.2xlarge), and g4 (g4dn.2xlarge), with the gp2 EBS volume type. Customers can also use Amazon ECS and Amazon EKS to deploy container applications at the Edge. Other AWS Services, like AWS CloudFormation templates, CloudWatch, IAM resources, and Organizations, continue to work as expected, providing you a consistent experience. You can also leverage the full suite of services like Amazon S3 in the parent Region over AWS’s private network backbone. Now that we have reviewed AWS Wavelength and the services and features associated with it, let us talk about the steps to launch an EC2 instance in the AWS Wavelength Zone.

Creating a Subnet in the Wavelength Zone

Once the Wavelength Zone is enabled for your AWS Account, you can extend your existing VPC from the parent Region to a Wavelength Zone by creating a new VPC subnet assigned to the AWS Wavelength Zone. Customers can also create a new VPC and then a subnet to deploy their applications in the Wavelength Zone. The following image shows the subnet creation step, where you pick the Wavelength Zone as the Availability Zone for the subnet.

Carrier Gateway

We have introduced a new gateway type called Carrier Gateway, which allows you to route traffic from the Wavelength Zone subnet to the CSP network and to the Internet. Carrier Gateways are similar to the Internet gateway in the Region. The Carrier Gateway is also responsible for NAT’ing the traffic to and from the Wavelength Zone subnets, mapping it to the carrier IP address assigned to the instances.

Creating a Carrier Gateway

In the VPC console, you can now create a Carrier Gateway and attach it to your VPC.

You select the VPC to which the Carrier Gateway must be attached. There is also an option to select “Route subnet traffic to the Carrier Gateway” in the Carrier Gateway creation step. By selecting this option, you can pick the Wavelength subnets whose default route should point to the Carrier Gateway. This option automatically deletes the existing route table for the subnets, creates a new route table, creates a default route entry, and attaches the new route table to the subnets you selected. The following picture captures the input required while creating a Carrier Gateway.

Creating an EC2 instance in a Wavelength Zone with Private IP Address

Once a VPC subnet is created for the AWS Wavelength Zone, you can launch an EC2 instance with a Private address using the EC2 Launch Wizard. In the configure instance details step, you can select the Wavelength Zone Subnet that you created in the “Creating a Subnet” section.

Attach an IAM instance profile that includes an SSM role, which allows you to connect to the instance through SSM. This is a recommended practice for Wavelength Zone instances as there is no direct SSH access allowed from the public internet.

Creating an EC2 instance in a Wavelength Zone with Carrier IP Address

The instances running in the Wavelength Zone subnets can obtain a Carrier IP address, which is allocated from a pool of IP addresses called Network Border Group (NBG). To create an EC2 instance in a Wavelength Zone subnet with a carrier routable IP address, you can use the AWS CLI. Note the additional network interface (NIC) option “AssociateCarrierIpAddress” in the run-instances command that follows.

aws ec2 --region us-west-2 run-instances --network-interfaces '[{"DeviceIndex":0, "AssociateCarrierIpAddress": true, "SubnetId": "<subnet-0d3c2c317ac4a262a>"}]' --image-id <ami-0a07be880014c7b8e> --instance-type t3.medium --key-name <san-francisco-wavelength-sample-key>

*To use the “AssociateCarrierIpAddress” option in the ec2 run-instances command, use the latest AWS CLI v2.

The carrier IP assigned to the EC2 instance can be obtained by running the following command.

aws ec2 describe-instances --instance-ids <replace-with-your-instance-id> --region us-west-2
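The describe-instances output is a nested JSON document, so picking out the carrier IP by eye can be tedious. The Python sketch below assumes the carrier IP is reported as `CarrierIp` inside the `Association` block of each network interface. Verify that against the output of your CLI version. The function name is illustrative.

```python
def carrier_ips(describe_instances_response):
    """Collect (instance ID, carrier IP) pairs from a describe-instances response."""
    pairs = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            for nic in instance.get("NetworkInterfaces", []):
                # Assumption: the carrier IP sits in the interface's Association block.
                carrier_ip = nic.get("Association", {}).get("CarrierIp")
                if carrier_ip:
                    pairs.append((instance["InstanceId"], carrier_ip))
    return pairs
```

To use it, save the command output as JSON (for example with `--output json`), load it with `json.load`, and pass the resulting dictionary to `carrier_ips`.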

Make the necessary changes to the default security group that is attached to the EC2 instance after running the run-instances command to allow the necessary protocol traffic. If you allow ICMP traffic to your EC2 instance, you can test ICMP connectivity to your instance from the public internet.

The different protocols allowed in and out of the Wavelength Zone are captured in the following table.

TCP Connection FROM	TCP Connection TO	Result*
Region Zones	WL Zones	Allowed
Wavelength Zones	Region	Allowed
Wavelength Zones	Internet	Allowed
Internet (TCP SYN)	WL Zones	Blocked
Internet (TCP EST)	WL Zones	Allowed
Wavelength Zones	UE (Radio)	Allowed
UE(Radio)	WL Zones	Allowed

UDP Packets FROM	UDP Packets TO	Result*
Wavelength Zones	WL Zones	Allowed
Wavelength Zones	Region	Allowed
Wavelength Zones	Internet	Allowed
Internet	WL	Blocked
Wavelength Zones	UE (Radio)	Allowed
UE(Radio)	WL Zones	Allowed

ICMP FROM	ICMP TO	Result*
Wavelength Zones	WL Zones	Allowed
Wavelength Zones	Region	Allowed
Wavelength Zones	Internet	Allowed
Internet	WL	Allowed
Wavelength Zones	UE (Radio)	Allowed
UE(Radio)	WL Zones	Allowed
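The rows that differ between the three tables are the ones for traffic arriving from the Internet. As a compact restatement, the following Python sketch encodes only those rows as a lookup. The dictionary and function names are illustrative, and the values simply mirror the tables above.

```python
# Internet -> Wavelength Zone rows from the tables above.
INTERNET_TO_WAVELENGTH = {
    ("tcp", "syn"): False,         # Internet (TCP SYN) -> WL Zones: Blocked
    ("tcp", "established"): True,  # Internet (TCP EST) -> WL Zones: Allowed
    ("udp", None): False,          # Internet -> WL: Blocked
    ("icmp", None): True,          # Internet -> WL: Allowed
}


def internet_to_wavelength_allowed(protocol, tcp_state=None):
    """Look up whether Internet traffic into a Wavelength Zone is allowed."""
    key = (protocol.lower(), tcp_state.lower() if tcp_state else None)
    if key not in INTERNET_TO_WAVELENGTH:
        raise ValueError(f"no table row for {key}")
    return INTERNET_TO_WAVELENGTH[key]
```

For example, `internet_to_wavelength_allowed("tcp", "syn")` returns `False`, while `internet_to_wavelength_allowed("icmp")` returns `True`. This matches the earlier note that ICMP connectivity can be tested from the public internet.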

Conclusion

We have covered how to create and run an EC2 instance in the AWS Wavelength Zone, the core foundation for application deployments. We will continue to publish blogs helping customers to create ECS and EKS clusters in the AWS Wavelength Zones and deploy container applications at the Mobile Carriers Edge. We are really looking forward to seeing what all you can do with them. AWS would love to get your advice on additional local services/features or other interesting use cases, so feel free to leave us your comments!

Field Notes: Requirements for Successfully Installing CloudEndure

2020-09-10 Daniel Covey

Post Syndicated from Daniel Covey original https://aws.amazon.com/blogs/architecture/field-notes-requirements-for-successfully-installing-cloudendure/

Customers have been using CloudEndure for their Migration and Disaster Recovery needs for many years. In 2019, AWS acquired CloudEndure and made CloudEndure licenses available to all users free of charge for migration. Since then, AWS has identified the requirements for replication to complete successfully after the initial agent install. Customers can use the following tips to facilitate a smooth transition to AWS.

In this blog, we look at four sections of the CloudEndure configuration process required for a successful installation:

CloudEndure Port configuration
CloudEndure JSON Policy Options
CloudEndure Staging Area Configuration
CloudEndure Configuration for Proxies

Required CloudEndure Ports

There are 2 required ports that CloudEndure uses. TCP Ports 1500 and 443 have particular configurations based on source or staging area. TCP 1500 is used for replication of data, and 443 is for agent communication with the CloudEndure Console.

Architecture Overview

The following graphic is a high level overview of the required ports for CloudEndure, both from the source infrastructure, and the staging subnet you will be replicating to.

network architecture

Steps

1. Is 443 outbound open to console.cloudendure.com on the source infrastructure?

Check OS level firewall
Check proxy settings
Ensure there is no SSL intercept or Deep Packet Inspection being done to packets from that machine

2. Is 443 outbound open to console.cloudendure.com in the AWS Security Group assigned to the replication subnet?

Check no NACLs are in place to prevent SSL traffic outbound from the subnet
Check the machines on this subnet can reach the EC2 endpoint for the region.
If you have any restrictions on accessing Amazon S3 buckets, you can have CloudEndure use a CloudFront distribution instead.
Review the CloudEndure documentation for how to do this.

3. Is 1500 outbound open to the Staging Subnet from the source infrastructure?

Check OS level firewall
Check proxy settings

4. Is 1500 inbound from the source infrastructure open on the Security Group assigned to the replication subnet?

Check no NACLs are in place preventing traffic.
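The outbound checks in steps 1 and 3 can be approximated from a source machine with a short Python sketch like the one below. It only attempts a direct TCP connection, so it does not account for proxies or SSL interception. The `staging_host` argument is a hypothetical placeholder for the address of a replication server in your staging subnet.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cloudendure_ports(staging_host, console_host="console.cloudendure.com"):
    """Check outbound reachability of TCP 443 (console) and TCP 1500 (staging)."""
    return {
        f"{console_host}:443": tcp_port_open(console_host, 443),
        f"{staging_host}:1500": tcp_port_open(staging_host, 1500),
    }
```

A `False` entry only tells you that the connection could not be established from that machine. The OS firewall, proxy settings, Security Groups, and NACLs listed above are the places to look next.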

CloudEndure JSON Policies

CloudEndure uses one of these JSON policies attached to IAM Users. These policies give CloudEndure specific access to your AWS account resources so that it can launch the resources needed for the tool to work properly. CloudEndure JSON policies use tag filtering to limit the creation and deletion of resources.

For the JSON policy, CloudEndure expects a specific set of permissions, even in the case where we may not be using them. CloudEndure does a policy check first, to ensure all permissions are available. It is not recommended to change the JSON policies, as it can cause CloudEndure to fail initial replication configuration. Use one of the following three policies.

AWS to AWS

Best policy to use if you are doing Inter-AWS replication, such as Region-to-Region, or AZ-to-AZ replication

Default

Default JSON policy. Allows for access to any of the resources needed by CloudEndure

Tagging based

A more restrictive policy, for customers that need a more secure solution.

Staging Area Configuration

CloudEndure replicates to a “Staging Area”, where you control the Replication Server and the AWS EBS volumes attached to that server. You define which VPC and Subnet you want CloudEndure to replicate to, with the following considerations.

staging area

1. Default Subnet

- You designate your specific AWS Subnet to use for replication here. Leaving the option as “Default” uses the default subnet for the VPC, which is usually deleted by customers when first configuring their VPCs.

2. Default Security group

- This is often created by the CloudEndure tool, cannot be changed, and will be added if replication disconnects. Any changes made to this SG will be reverted to the default rules.
- If utilizing a proxy, it is advised to add a Security Group that also allows access to the proxy

Proxy Servers

Some customers utilize Proxy servers within their environment. Review the following guidance regarding specific changes to configurations within your environment needed for CloudEndure to operate effectively.

1. Make sure to set the proxy in the replication settings

- This can be either an IP address or an FQDN

2. Note the following for either Windows or Linux

- Windows – CloudEndure agent runs as System, so please ensure the System account is part of the allow list in the proxy.
- Linux – CloudEndure Agent creates a linux user to run commands (named cloudendure), so this user will need to be part of the allow list in the proxy

3. Make sure environment variables are set for the machines

Windows Steps
- Control Panel > System and Security > System > Advanced system settings.
- In the Advanced Tab of the System Properties dialog box, select Environment Variables
- On the System Variables section of the Environment Variables pane, select New to add the https_proxy environment variable or Edit if the variable already exists.
- Enter https://PROXY_ADDR:PROXY_PORT/ in the Variable value field. Select OK.
- If the agent was already installed, restart the service
Alternatively, you can open CMD as Administrator and enter the following command:

setx https_proxy https://<proxy ip>:<proxy port>/ /m

Linux Steps
- Run one of the following lines in the terminal. Include the trailing /.
- $ export http_proxy=http://server-ip:port/
- $ export http_proxy=http://127.0.0.1:3128/
- $ export http_proxy=http://proxy-server.mycorp.com:3128/

Cleaning Up

After you have finished utilizing the CloudEndure tool, remove any resources you may no longer need.

Conclusion

In conclusion, I have shown how best to prepare your environment for installation of the CloudEndure tool. CloudEndure is utilized to protect your business and mitigate downtime during your move to the cloud. By following the preceding steps, you set up the configuration for success. Visit the AWS landing page for CloudEndure to get a deeper understanding of the tool, get started with CloudEndure, or take the free online technical training. Should you need assistance with other configurations, visit the CloudEndure Documentation Library, which covers every aspect of the tooling and includes a helpful FAQ.


Using Lambda layers to simplify your development process

2020-09-08 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-lambda-layers-to-simplify-your-development-process/

Serverless developers frequently import libraries and dependencies into their AWS Lambda functions. While you can zip these dependencies as part of the build and deployment process, in many cases it’s easier to use layers instead. In this post, I explain how layers work, and how you can build and include layers in your own applications.

This blog post references the Happy Path application, which shows how to build a flexible backend to a photo-processing web application. To learn more, refer to Using serverless backends to iterate quickly on web apps – part 1. The code in this post is available at this GitHub repo.

Overview of Lambda layers

A Lambda layer is an archive containing additional code, such as libraries, dependencies, or even custom runtimes. When you include a layer in a function, the contents are extracted to the /opt directory in the execution environment. You can include up to five layers per function, which count towards the standard Lambda deployment size limits.

Layers are deployed as immutable versions, and the version number increments each time you publish a new layer. When you include a layer in a function, you specify the layer version you want to use. Layers are automatically set as private, but they can be shared with other AWS accounts, or shared publicly. Permissions only apply to a single version of a layer.

Using layers can make it faster to deploy applications with the AWS Serverless Application Model (AWS SAM) or the Serverless framework. Moving runtime dependencies from your function code to a layer can help reduce the overall size of the archive uploaded during a deployment.

Creating a layer containing the AWS SDK

The AWS SDK allows you to interact programmatically with AWS services using one of the supported runtimes. The Lambda service includes the AWS SDK so you can use it without explicitly including it in your deployment package.

However, there is no guarantee of the version provided in the execution environment. The SDK is upgraded frequently to support new AWS services and features. As a result, the version may change at any time. You can see the current version used by Lambda by declaring an instance of the SDK in a function and logging its version.

For production workloads, it’s best practice to lock the version of the AWS SDK used in your functions. You can achieve this by including the SDK with your code package. Once you include this library, your code always uses the version in the deployment package and not the version included in the Lambda service.

A serverless application may consist of many functions, which all use a common SDK version. Instead of bundling the SDK with each function deployment, you can create a layer containing the SDK. The effect of this is to reduce the size of the uploaded archive, which makes your deployments faster.

To create an AWS SDK layer:

First, clone this blog post’s GitHub repo. From a terminal window, execute:
git clone https://github.com/aws-samples/aws-lambda-layers-aws-sam-examples
cd ./aws-lambda-layers-aws-sam-examples/aws-sdk-layer
This directory contains an AWS SAM template and Node.js package.json file. Install the package.json contents:
npm install
Create the layer directory defined in the AWS SAM template and the nodejs directory required by Lambda. Next, move the node_modules directory:
mkdir -p ./layer/nodejs
mv ./node_modules ./layer/nodejs
Next, deploy the AWS SAM template to create the layer:
sam deploy --guided
For the Stack name, enter “aws-sdk-layer”. Enter your preferred AWS Region and accept the other defaults.
After the deployment completes, the new Lambda layer is available to use. Run this command to see the available layers:
aws lambda list-layers

After adding a layer to a function, you can use console.log to log the AWS SDK version. This shows that the function is now using the SDK version in the layer instead of the version provided by the Lambda service.

Creating layers with OS-specific binaries

Many code libraries include binaries that are operating-system specific. When you build packages on your local development machine, by default the binaries for that operating system are used. These may not be the right binaries for Lambda, which runs on Amazon Linux. If you are not using a compatible operating system, you must ensure you include Linux binaries in the layer.

The simplest way to package these libraries correctly is to use AWS Cloud9. This is an IDE in the AWS Cloud, which runs on Amazon EC2. After creating an environment, you can clone a git repository directly to the local storage of the instance, and run the necessary build scripts.

The Happy Path application resizes images using the Sharp npm library. This library uses libvips, which is written in C, so the compilation is operating system-specific. By creating a layer containing this library, it simplifies the packaging and deployment of the consuming Lambda function.

To create a Sharp layer using AWS Cloud9:

Navigate to the AWS Cloud9 console.
Choose Create environment.
Enter the name “My IDE” and choose Next step.
Accept all the defaults and choose Next step.
Review the settings and choose Create environment.
In the terminal panel, clone this post’s GitHub repo and install the dependencies:
git clone https://github.com/aws-samples/aws-lambda-layers-aws-sam-examples
cd ./aws-lambda-layers-aws-sam-examples/sharp-layer
npm install
Still in the sharp-layer directory, create the layer directory and move the node_modules directory into it:
mkdir -p ./layer/nodejs
mv ./node_modules ./layer/nodejs
Next, deploy the AWS SAM template to create the layer:
sam deploy --guided
For the Stack name, enter “sharp-layer”. Enter your preferred AWS Region and accept the other defaults. After the deployment completes, the new Lambda layer is available to use.

In some runtimes, you can specify a local set of packages for development, and another set for production. For example, in Node.js, the package.json file allows you to specify two sections for dependencies. If your development machine uses a different operating system to Lambda, and therefore uses different binaries, you can use package.json to resolve this. In the Happy Path Resizer function, which uses the Sharp layer, the package.json refers to a local binary for development.

AWS SAM defines Lambda functions with the AWS::Serverless::Function resource. Layers are defined as a property of functions, as a list of layer ARNs including the version:

  MyLambdaFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: myFunction/
      Handler: app.handler
      MemorySize: 128
      Layers:
        - !Ref SharpLayerARN

Sharing a layer

Layers are private to your account by default but you can optionally share with other AWS accounts or make a layer public. You cannot share layers via the AWS Management Console but instead use the AWS CLI.

To share a layer, use add-layer-version-permission, specifying the layer name, version, AWS Region, and principal:

aws lambda add-layer-version-permission \
  --layer-name node-sharp \
  --principal '*' \
  --action lambda:GetLayerVersion \
  --version-number 3 \
  --statement-id public \
  --region us-east-1

In the principal parameter, specify an individual account ID or use an asterisk to make the layer public. The CLI responds with a RevisionId containing the current revision of the policy:

You can check the permissions associated with a layer version by calling get-layer-version-policy with the layer name and version:

aws lambda get-layer-version-policy \
  --layer-name node-sharp \
  --version-number 3 \
  --region us-east-1
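
As a rough illustration of how the returned policy could be inspected in a script, the following Python sketch checks whether a layer version has been shared publicly. It assumes the response carries the policy as a JSON string under a Policy key next to the RevisionId; the sample response and the helper name layer_is_public are hypothetical and were not captured from a real account.

```python
import json


def layer_is_public(response: dict) -> bool:
    """Return True if any Allow statement grants access to every principal ('*')."""
    # Assumption: the policy document arrives as a JSON-encoded string.
    policy = json.loads(response["Policy"])
    for statement in policy.get("Statement", []):
        if statement.get("Effect") == "Allow" and statement.get("Principal") == "*":
            return True
    return False


# Illustrative response only; the account ID and revision ID are placeholders.
sample_response = {
    "Policy": json.dumps({
        "Version": "2012-10-17",
        "Id": "default",
        "Statement": [{
            "Sid": "public",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "lambda:GetLayerVersion",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:layer:node-sharp:3",
        }],
    }),
    "RevisionId": "example-revision-id",
}

print(layer_is_public(sample_response))
```

A check like this could run in a deployment pipeline to flag layers that were made public unintentionally.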

Similarly, you can delete permissions associated with a layer version by calling remove-layer-version-permission with the layer name, statement ID, and version:

aws lambda remove-layer-version-permission \
  --layer-name node-sharp \
  --statement-id public \
  --version-number 3

Once the permissions are removed, calling get-layer-version-policy results in an error:

Conclusion

Lambda layers provide a convenient and effective way to package code libraries for sharing with Lambda functions in your account. Using layers can help reduce the size of uploaded archives and make it faster to deploy your code.

Layers can contain packages using OS-specific binaries, providing a convenient way to distribute these to developers. While layers are private by default, you can share with other accounts or make a layer public. Layers are published as immutable versions, and deleting a layer has no effect on deployed Lambda functions already using that layer.

To learn more about using Lambda layers, visit the documentation, or see how layers are used in the Happy Path web application.

Federated multi-account access for AWS CodeCommit

2020-09-04 Steven David

Post Syndicated from Steven David original https://aws.amazon.com/blogs/devops/federated-multi-account-access-for-aws-codecommit/

As a developer working in a large enterprise or for a group that supports multiple products, you may often find yourself accessing Git repositories from different organizations. Currently, to securely access multiple Git repositories in other popular tools, you need SSH keys, GPG keys, a Git credential helper, and a significant amount of setup by the developer hoping to commit to the repository. In addition, administrators must be aware of the various ways to remove all the permissions granted to the developer.

AWS CodeCommit is a managed source control service. Combined with AWS Single Sign-On (AWS SSO) and git-remote-codecommit, you can quickly and easily switch between repositories owned by different groups or even managed in separate AWS accounts. You can control those permissions with AWS Identity and Access Management (IAM) roles to allow for the automated removal of the user’s permission as part of their off-boarding procedure for the company.

This post demonstrates how to grant access to various CodeCommit repositories without access keys.

Solution overview

In this solution, the user’s access is controlled with federated login via AWS SSO. You can grant that access using AWS native authentication, which eliminates the need for a Git credential helper, SSH, and GPG keys. In addition, this allows the administrator to control access by adding or removing the user’s IAM role access.

The following diagram shows the code access pattern you can achieve by using AWS SSO and git-remote-codecommit to access CodeCommit across multiple accounts.

git-remote-codecommit overview diagram

Prerequisites

To complete this tutorial, you must have the following prerequisites:

CodeCommit repositories in two separate accounts. For instructions, see Create an AWS CodeCommit repository.
AWS SSO set up to handle access federation. For instructions, see Enable AWS SSO.
Python 3.6 or higher installed on the developer’s local machine. To download and install the latest version of Python, see the Python website.
- On a Mac, it can be difficult to ensure that you’re using Python 3.6, because 2.7 is installed and required by the OS. For more information about checking your version of Python, see the following GitHub repo.
Git installed on your local machine. To download Git, see Git Downloads.
PIP version 9.0.3 or higher installed on your local machine. For instructions, see Installation on the PIP website.

Configuring AWS SSO role permissions

As your first step, you should make sure each AWS SSO role has the correct permissions to access the CodeCommit repositories.

On the AWS SSO console, choose AWS Accounts.
On the Permissions Sets tab, choose Create permission set.
On the Create a new permission set page, select Create a custom permission set.
For Name, enter CodeCommitDeveloperAccess.
For Description, enter This permission set gives the user access to work with CodeCommit for common developer tasks.
For Session duration, choose 12 hours.

Create new permissions

For Relay state, leave blank.
For What policies do you want to include in your permissions set?, select Create a custom permissions policy.
Use the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CodeCommitDeveloperAccess",
            "Effect": "Allow",
            "Action": [
                "codecommit:GitPull",
                "codecommit:GitPush",
                "codecommit:ListRepositories"
            ],
            "Resource": "*"
        }
    ]
}

The preceding policy grants access to all the repositories in the account. You can limit access to a specific list of repositories, if needed.

Choose Create.
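
If you prefer to limit the permission set to specific repositories, one possible approach is sketched below in Python. It generates a policy document in which GitPull and GitPush are scoped to named repository ARNs, while ListRepositories stays on "*" because that action is not scoped to a single repository. The function name, Region, account ID, and repository names are placeholders chosen for illustration, not values from this walkthrough.

```python
import json


def build_codecommit_policy(region: str, account_id: str, repo_names: list) -> dict:
    """Build a policy that allows pull/push only on the listed repositories."""
    # CodeCommit repository ARNs have the form
    # arn:aws:codecommit:<region>:<account-id>:<repository-name>
    repo_arns = [
        f"arn:aws:codecommit:{region}:{account_id}:{name}" for name in repo_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CodeCommitDeveloperAccess",
                "Effect": "Allow",
                "Action": ["codecommit:GitPull", "codecommit:GitPush"],
                "Resource": repo_arns,
            },
            {
                # Listing repositories is kept account-wide.
                "Sid": "CodeCommitListRepositories",
                "Effect": "Allow",
                "Action": "codecommit:ListRepositories",
                "Resource": "*",
            },
        ],
    }


# Placeholder values for illustration only.
policy = build_codecommit_policy("us-east-1", "123456789123", ["MyDemoRepo"])
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the custom permissions policy field in place of the broader policy.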

Creating your AWS SSO group

Next, we need to create the AWS SSO group to which we want to assign the permissions.

On the AWS SSO console, choose Groups.
Choose Create group.
For Group name, enter CodeCommitAccessGroup.
For Description, enter Users assigned to this group will have access to work with CodeCommit.

Create Group

Choose Create.

Assigning your group and permission sets to your accounts

Now that we have our group and permission sets created, we need to assign them to the accounts with the CodeCommit repositories.

On the AWS SSO console, choose AWS Accounts.
Choose the account you want to use in your new group.
On the account Details page, choose Assign Users.
On the Select users or groups page, choose Group.
Select CodeCommitAccessGroup.
Choose NEXT: Permission Sets.
Choose the CodeCommitDeveloperAccess permission set and choose Finish.

Assign Users

Choose Proceed to Accounts to return to the AWS SSO console.
Repeat these steps for each account that has a CodeCommit repository.

Assigning a user to the group

To wrap up our AWS SSO configuration, we need to assign the user to the group.

On the AWS SSO console, choose Groups.
Choose CodeCommitAccessGroup.
Choose Add user.
Select all the users you want to add to this group.
Choose Add user(s).
From the navigation pane, choose Settings.
Record the user portal URL to use later.

Enabling AWS SSO login

The second main feature we want to enable is AWS SSO login from the AWS Command Line Interface (AWS CLI) on our local machine.

Run the following command from the AWS CLI. You need to enter the user portal URL from the previous step and tell the CLI what Region has your AWS SSO deployment. The following code example has AWS SSO deployed in us-east-1:

aws configure sso 
SSO start URL [None]: https://my-sso-portal.awsapps.com/start 
SSO region [None]: us-east-1

You’re redirected to your default browser.

When you return to the CLI, you must choose your account. See the following code:

There are 2 AWS accounts available to you.
> DeveloperResearch, [email protected] (123456789123)
DeveloperTrading, [email protected] (123456789444)

Choose the account with your CodeCommit repository.

Next, you see the permission sets available to you in the account you just picked. See the following code:

Using the account ID 123456789123
There are 2 roles available to you.
> ReadOnly
CodeCommitDeveloperAccess

Choose the CodeCommitDeveloperAccess permissions.

You now see the options for the profile you’re creating for these AWS SSO permissions:

CLI default client Region [None]: us-west-2<ENTER>
CLI default output format [None]: json<ENTER>
CLI profile name [123456789011_ReadOnly]: DevResearch-profile<ENTER>

Repeat these steps for each AWS account you want to access.

For example, I create DevResearch-profile for my DeveloperResearch account and DevTrading-profile for the DeveloperTrading account.
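
To make the result of these prompts more concrete, the following Python sketch renders the kind of profile section that ends up in the AWS CLI config file. It assumes the standard SSO profile keys (sso_start_url, sso_region, sso_account_id, sso_role_name, region, output); the helper name render_sso_profile is hypothetical, and the values are the sample values used in this post.

```python
import configparser
import io


def render_sso_profile(profile_name, start_url, sso_region, account_id,
                       role_name, region, output="json"):
    """Return the text of one AWS CLI config section for an SSO-backed profile."""
    config = configparser.ConfigParser()
    config[f"profile {profile_name}"] = {
        "sso_start_url": start_url,
        "sso_region": sso_region,
        "sso_account_id": account_id,  # must be a string, not an int
        "sso_role_name": role_name,
        "region": region,
        "output": output,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Sample values taken from the prompts shown above.
print(render_sso_profile(
    profile_name="DevResearch-profile",
    start_url="https://my-sso-portal.awsapps.com/start",
    sso_region="us-east-1",
    account_id="123456789123",
    role_name="CodeCommitDeveloperAccess",
    region="us-west-2",
))
```

Generating the sections this way can be convenient when many accounts need the same profile layout, although running aws configure sso interactively remains the approach this post follows.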

Installing git-remote-codecommit

Finally, we want to install the recently released git-remote-codecommit and start working with our Git repositories.

Install git-remote-codecommit with the following code:

pip install git-remote-codecommit

With some operating systems, you might need to run the following code instead:

sudo pip install git-remote-codecommit

Clone the code from one of your repositories. For this use case, my CodeCommit repository is named MyDemoRepo. See the following code:

git clone codecommit://DevResearch-profile@MyDemoRepo my-demo-repo

After that solution is cloned locally, you can copy code from another federated profile by simply changing to that profile and referencing the repository in that account named MyDemoRepo2. See the following code:

git clone codecommit://DevTrading-profile@MyDemoRepo2 my-demo-repo2
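
The clone URLs above follow a simple pattern, which the following Python sketch reproduces: the profile name, an @ sign, and the repository name after the codecommit:// scheme. The optional Region form (codecommit::region://) is included for repositories outside the profile's default Region. The helper name codecommit_clone_url is hypothetical.

```python
from typing import Optional


def codecommit_clone_url(profile: str, repository: str,
                         region: Optional[str] = None) -> str:
    """Build a git-remote-codecommit URL for a named AWS CLI profile."""
    # With a Region the scheme becomes codecommit::<region>://
    prefix = f"codecommit::{region}://" if region else "codecommit://"
    return f"{prefix}{profile}@{repository}"


print(codecommit_clone_url("DevResearch-profile", "MyDemoRepo"))
# prints codecommit://DevResearch-profile@MyDemoRepo
print(codecommit_clone_url("DevTrading-profile", "MyDemoRepo2", region="us-east-1"))
# prints codecommit::us-east-1://DevTrading-profile@MyDemoRepo2
```

Because the profile is part of the URL, each local clone keeps using the federated role it was cloned with.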

Cleaning up

At the end of this tutorial, complete the following steps to undo the changes you made to your local system and AWS:

On the AWS SSO console, remove the user from the group you created, so any future access requests fail.
To remove the AWS SSO login profiles, open the local config file with your preferred tool and remove the profile.
1. The config file is located at %UserProfile%/.aws/config for Windows and $HOME/.aws/config for Linux or Mac.
To remove git-remote-codecommit, run the PIP uninstall command:

pip uninstall git-remote-codecommit

With some operating systems, you might need to run the following code instead:

sudo pip uninstall git-remote-codecommit

Conclusion

This post reviewed an approach to securely switch between repositories and work without concerns about one Git repository’s security credentials interfering with the other Git repository. User access is controlled by the permissions assigned to the profile via federated roles from AWS SSO. This allows for access control to CodeCommit without needing access keys.

Field Notes: Deploying UiPath RPA Software on AWS

2020-09-03 Yuchen Lin

Post Syndicated from Yuchen Lin original https://aws.amazon.com/blogs/architecture/field-notes-deploying-uipath-rpa-software-on-aws/

Running UiPath RPA software on AWS leverages the elasticity of the AWS Cloud to set up, operate, and scale robotic process automation. It provides cost-efficient and resizable capacity, and scales the robots to meet your business workload. This reduces the need for administration tasks, such as hardware provisioning, environment setup, and backups. It frees you to focus on business process optimization by automating more processes.

This blog post guides you in deploying UiPath robotic process automation (RPA) software on AWS. RPA software uses the user interface to capture data and manipulate applications just like humans do. It runs as a software robot to interpret and trigger responses, as well as communicate with other systems to perform a variety of repetitive tasks.

UiPath Enterprise RPA Platform provides the full automation lifecycle including discover, build, manage, run, engage, and measure with different products. This blog post focuses on the Platform’s core products: build with UiPath Studio, manage with UiPath Orchestrator and run with UiPath Robots.

About UiPath software

UiPath Enterprise RPA Platform’s core products are:

UiPath Studio provides a visual designer with hundreds of activity templates and ready-made components to automate the processes ready for robots. This includes pre-built integration components for Amazon S3, Amazon EC2, Amazon Textract, and Amazon Rekognition.
UiPath Robot runs the automations created in UiPath Studio.
UiPath Orchestrator is the centralized robot management tool where you can deploy, secure, and manage your UiPath Robots at scale.

UiPath Studio and UiPath Robot are individual products; you can deploy each on a standalone machine.

UiPath Orchestrator contains Web Servers, SQL Server, and Indexer Server (Elasticsearch). You can use a Single Machine deployment or a Multi-Node deployment, depending on the workload capacity and availability requirements.

For information on UiPath platform offerings, review UiPath platform products.

UiPath on AWS

You can deploy all UiPath products on AWS.

UiPath Studio is needed for automation design jobs and runs on a single machine. You deploy it with Amazon EC2.
UiPath Robots are needed for automation tasks. Each Robot runs on a single machine, and the number of Robots scales with the business workload. You deploy them with Amazon EC2 and scale with Amazon EC2 Auto Scaling.
UiPath Orchestrator is needed for automation administration jobs and contains three logical components that run on multiple machines. You deploy Web Server with Amazon EC2, SQL Server with Amazon RDS, and Indexer Server with Amazon Elasticsearch Service. For Multi-Node deployment, you deploy High Availability Add-On with Amazon EC2.

The architecture of UiPath Enterprise RPA Platform on AWS looks like the following diagram:

Figure 1 – UiPath Enterprise RPA Platform on AWS

By deploying the UiPath Enterprise RPA Platform on AWS, you can set up, operate, and scale workloads. This helps you control infrastructure cost while meeting process automation workloads.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
AWS resources
UiPath Enterprise RPA Platform software
Basic knowledge of Amazon EC2, EC2 Auto Scaling, Amazon RDS, Amazon Elasticsearch Service.
Basic knowledge to set up Windows Server, IIS, SQL Server, Elasticsearch.
Basic knowledge of Redis Enterprise to set up High Availability Add-on.
Basic knowledge of UiPath Studio, UiPath Robot, UiPath Orchestrator.

Deployment Steps

Deploy UiPath Studio
UiPath Studio deploys on a single machine. Amazon EC2 instances provide secure and resizable compute capacity in the cloud, and the ability to launch applications when needed without upfront commitments.

Download the UiPath Enterprise RPA Platform. UiPath Studio is integrated in the installation package.
Launch an EC2 instance with a Windows OS-based Amazon Machine Image (AMI) that meets the UiPath Studio hardware requirements and software requirements.
Install the UiPath Studio software. For UiPath Studio installation steps, review the UiPath Studio Guide.

Optionally, you can save the installation and pre-configuration work completed for UiPath Studio as a custom Amazon Machine Image (AMI). Then, you can launch more UiPath Studio instances from this AMI. For details, visit Launch an EC2 instance from a custom AMI tutorial.

UiPath Robot Deployment

Each UiPath Robot deploys on a single machine with Amazon EC2. Amazon EC2 Auto Scaling helps you add or remove Robots to meet automation workload changes in demand.

Download the UiPath Enterprise RPA Platform. The UiPath Robot is integrated in the installation package.
Launch an EC2 instance with a Windows OS-based Amazon Machine Image (AMI) that meets the UiPath Robot hardware requirements and software requirements.
Install the business application (Microsoft Office, SAP, etc.) required for your business processes. Alternatively, select the business application AMI from the AWS Marketplace.
Install the UiPath Robot software. For UiPath Robot installation steps, review Installing the Robot.

Optionally, you can save the installation and pre-configuration work completed for UiPath Robot as a custom Amazon Machine Image (AMI). Then you can create launch templates with the instance configuration information, create Auto Scaling groups from those launch templates, and scale the Robots.

Scale the Robots’ Capacity

Amazon EC2 Auto Scaling groups help you use scaling policies to scale compute capacity based on resource use. By monitoring the process queue and creating a customized scaling policy, the UiPath Robot can automatically scale based on the workload. For details, review Scaling the size of your Auto Scaling group.
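
As a minimal sketch of the arithmetic behind such a customized policy, the following Python function derives a desired Robot count from the number of queued jobs. The function name, the jobs-per-robot target, and the minimum and maximum bounds are assumptions made for illustration; the post does not prescribe a specific formula, and a real policy would be driven by a metric you publish for the process queue.

```python
import math


def desired_robot_count(queue_length: int, jobs_per_robot: int,
                        min_robots: int, max_robots: int) -> int:
    """Size the Robot fleet so each Robot handles at most jobs_per_robot jobs."""
    if jobs_per_robot <= 0:
        raise ValueError("jobs_per_robot must be positive")
    if min_robots > max_robots:
        raise ValueError("min_robots must not exceed max_robots")
    # Round up so a partial batch of jobs still gets a Robot.
    needed = math.ceil(queue_length / jobs_per_robot)
    # Clamp to the Auto Scaling group's minimum and maximum size.
    return max(min_robots, min(max_robots, needed))


print(desired_robot_count(queue_length=25, jobs_per_robot=10,
                          min_robots=1, max_robots=5))
# prints 3
```

The clamping step mirrors how an Auto Scaling group never scales outside its configured minimum and maximum size.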

Use the Robot Logs

UiPath Robot generates multiple diagnostic and execution logs. Amazon CloudWatch provides the log collection, storage, and analysis, and enables the complete visibility of the Robots and automation tasks. For CloudWatch agent setup on Robot, review Quick Start: Enable Your Amazon EC2 Instances Running Windows Server to Send logs to CloudWatch Logs.

Monitor the Automation Jobs

UiPath Robot uses the user interface to capture data and manipulate applications. When UiPath Robot runs, it is important to capture processing screens for troubleshooting and auditing usage. This screen capture activity can be integrated into the process using UiPath Studio.

Amazon S3 provides cost-effective storage for retaining all Robot logs and processing screen captures. Amazon S3 Object Lifecycle Management automates the transition between different storage classes, and helps you manage the screenshots so that they are stored cost effectively throughout their lifecycle. For lifecycle policy creation, review How Do I Create a Lifecycle Policy for an S3 Bucket?.
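
For illustration, a lifecycle configuration for screen captures could look like the following Python sketch, which builds the rule as a plain dictionary in the shape S3 lifecycle configurations use (Rules, Filter, Transitions, Expiration). The prefix, rule ID, day thresholds, and function name are assumptions, not recommendations from this post.

```python
import json


def screenshot_lifecycle_rule(prefix: str, ia_after_days: int = 30,
                              glacier_after_days: int = 90,
                              expire_after_days: int = 365) -> dict:
    """Build a lifecycle configuration that tiers and then expires screenshots."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("thresholds must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": "robot-screenshots-lifecycle",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# "screenshots/" is a placeholder prefix for where Robots store captures.
print(json.dumps(screenshot_lifecycle_rule("screenshots/"), indent=2))
```

A dictionary like this can be supplied when configuring the bucket through the console's JSON view, the AWS CLI, or an SDK.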

UiPath Orchestrator Deployment

Deployment Components
UiPath Orchestrator Server Platform has many logical components, grouped in three layers:

presentation layer
web service layer
persistence layer

The presentation layer and web service layer are built into one ASP.NET website. The persistence layer contains SQL Server and Elasticsearch. There are three deployment components to be set up:

web application
SQL Server
Elasticsearch

The Web Server, SQL Server, and Elasticsearch Server each require a different environment. Review the hardware requirements and software requirements for more details.

Note: set up the Web Server, SQL Server, Elasticsearch Server environments before running the UiPath Enterprise Platform installation wizard.

Set up Web Server with Amazon EC2

UiPath Orchestrator Web Server deploys on Windows Server with IIS 7.5 or later. For details, review the software requirements.

AWS provides various AMIs for Windows Server that can help you set up the environment required for the Web Server.

The Microsoft Windows Server 2019 Base AMI includes most prerequisites for installation, except for some Web Server (IIS) features that must be enabled. For configuration steps, review Server Roles and Features.

The Web Server should be placed in the correct subnet (public or private) and have a proper security group (allowing HTTPS visits) according to the business requirements. Review Allow user to connect EC2 on HTTP or HTTPS.

Set up SQL Server with Amazon RDS

Amazon Relational Database Service (Amazon RDS) provides you with a managed database service. With a few clicks, you can set up, operate, and scale a relational database in the AWS Cloud.

Amazon RDS supports the SQL Server engine. For UiPath Orchestrator, both Standard Edition and Enterprise Edition are supported. For details, review software requirements.

Amazon RDS can be set up in multiple Availability Zones to meet requirements for high availability.

UiPath Orchestrator can connect to the created Amazon RDS database with SQL Server Authentication.

Set up Elasticsearch Server with Amazon Elasticsearch Service (Amazon ES)

Amazon ES is a fully managed service for you to deploy, secure, and operate Elasticsearch at scale with generally zero down time.

Elasticsearch Service provides a managed ELK stack, with no upfront costs or usage requirements, and without the operational overhead.

All messages logged by UiPath Robots are sent through the Logging REST endpoint to the Indexer Server where they are indexed for future utilization.

Install UiPath Orchestrator on the Web Server

After the Web Server, SQL Server, and Elasticsearch Server environments are ready, download the UiPath Enterprise RPA Platform, and install it on the Web Server.

The UiPath Enterprise Platform installation wizard guides you in configuring and setting up each environment, including connecting to SQL Server and configuring the Elasticsearch API URL.

After you complete setup, the UiPath Orchestrator Portal is available for you to visit and manage processes, jobs, and robots.

The UiPath Orchestrator dashboard looks like the following screenshot:

Figure 2 – UiPath Orchestrator Portal

Set up Orchestrator High Availability Architecture

One Orchestrator can handle many robots in a typical configuration, but any product running on a single server is vulnerable to failure if something happens to that server.

The High Availability add-on (HAA) enables you to add a second Orchestrator server to your environment that is generally fully synchronized with the first server.

To set up multi-node deployment, launch Amazon EC2 instances with a Linux OS-based Amazon Machine Image (AMI) that meets the HAA hardware and software requirements. Follow the installation guide to set up HAA.

Elastic Load Balancing automatically distributes incoming application traffic across multiple targets. A Network Load Balancer should be set up to allow Robots to communicate with multi-node Orchestrators.

Cleaning up

To avoid incurring future charges, delete all the resources.

Conclusion

In this post, I showed you how to deploy the UiPath Enterprise RPA Platform on AWS to further optimize and automate your business processes. AWS services like Amazon EC2, Amazon RDS, and Amazon Elasticsearch Service help you set up the environment with high availability. This reduces the maintenance effort for backend services and makes it easier to scale Orchestrator capabilities. Amazon EC2 Auto Scaling helps you add or remove robots to meet automation workload changes in demand.

To learn more about how to integrate UiPath with AWS services, check out The UiPath and AWS partnership.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Building a Shared Account Structure Using AWS Organizations

2020-09-02 Abhijit Vaidya

Post Syndicated from Abhijit Vaidya original https://aws.amazon.com/blogs/architecture/field-notes-building-a-shared-account-structure-using-aws-organizations/

For customers considering the AWS Solution Provider Program, there are challenges to mitigate when building a shared account model with SI partners. AWS Organizations makes it possible to build the right account structure to support a resale arrangement. In this engagement model, the end customer gets an AWS invoice from an AWS authorized partner instead of AWS directly.

Partners and customers who want to engage in this service resale arrangement need to build a new account structure. This process includes linking or transferring existing customer accounts to the partner master account. This is so that all the billing data from customer accounts is consolidated into the partner master account.

While linking or transferring existing customer accounts to the new master account, the partner must check the new master account structure. It should not compromise any customer security controls and continue to provide full control of linked accounts to the customer. The new account structure must fulfill the following requirements for both the AWS customer and partner:

The customer maintains full access to the AWS organization and is able to perform all critical security-related tasks except access to billing data.
The Partner is able to control only billing information and is not able to perform any other task in the root account (master payer account) without approval from the customer.
In case of contract breach or termination, the customer is able to gain back full control of all accounts, including the master account.

In this post, you will learn about how partners can create a master account with shared ownership. We also show how to link or transfer customer organization accounts to the new organization master account and set up policies that would provide appropriate access to both partner and customer.

Account Structure

The following architecture represents the account structure setup that can fulfill customer and partner requirements as a part of a service resale arrangement.

As illustrated in the preceding diagram, the following list includes the key architectural components:

Architectural Components

As a part of resale arrangement, the customer’s existing AWS organization and related accounts are linked to the partner’s master payer account. The customer can continue to maintain their existing master root account, while all child accounts are linked to the master account (as shown in the list).

Customers may have valid concerns about linking/transferring accounts to the partner-owned master payee account and may come up with many ‘what-if’ scenarios, for example, “What if the partner shuts down the environment/servers?” or “What if the partner blocks access to child accounts?”.

This account structure provides the right controls to both customer and partner that would address customer concerns around the security of their accounts. This includes the following benefits:

Starting with access to the root account, neither customer nor partner can access the root account without the other party’s involvement.
The partner controls the ID/password for the root account while the customer maintains the MFA token for the account. The customer also controls the phone number and security questions associated with the root account. That way, the partner cannot replace the MFA token on their own.
The partner only has billing access and does not control any other parts of account including child accounts. Anytime the root account access is needed, both customer and partner team need to collaborate and access the root account.
The customer or partner cannot assign new IAM roles to themselves, therefore protecting the initial account setup.

Security considerations in the shared account setup

The following table highlights both customer and partner responsibilities and access controls provided by the architecture in the previous section.

The following points highlight security recommendations to provide adequate access rights to both partner and customers.

The new master payer/root account is jointly owned by the Partner and the Customer.
The AWS account root user credentials (user ID/password) are held by the Partner, and the MFA (multi-factor authentication) device is held by the Customer.
An IAM (AWS Identity and Access Management) role is created under the master payer with the policies “FullOrganizationAccess”, “Amazon S3” (Amazon Simple Storage Service), “CloudTrail” (AWS CloudTrail), and “CloudWatch” (Amazon CloudWatch) for the Customer security team to log in and manage security of the account. Additional permissions can be added to this role as needed in the future. This role does not have any billing permissions.
Only the Partner has access to AWS billing and usage data.
An IAM role/user is created under the master payer with only billing permissions for the Partner team to log in and download all invoices. This role does not have any other permissions except billing and usage reports.
Any root login attempt to the master payer triggers a notification to the Customer’s SOC team and the Partner’s Customer account management team.
The Partner’s email address is used to create the account so invoices can be emailed to the Partner. The Customer cannot see these invoices.
The Customer’s phone number is used to create the master account, and the Customer maintains the security questions/answers. This prevents the Partner team from replacing the MFA token without informing the Customer. The Customer does not need the Partner’s help or permission to log in and manage security.
No member of the Partner team can log in to the master payer/root account without the Customer providing an MFA token.

Setting up the shared master AWS account structure

Create a playbook for the account transfer activity based on the following tasks. For each task, identify the owner. Make sure that owners have the right permissions to perform the tasks.

Part I – Setting up new partner master account

Create new Partner Master payee Account
Update payment details section with the required details for payment in Partner Master payee Account
Enable MFA in the Partner Master payee Account
Update contact for security and operations in the Partner Master payee Account
Update demographics (address and contact details) in the Partner Master payee Account
Create an IAM role for the Customer team in the Partner Master Payee account. The IAM role is created under the master payer with “FullOrganizationAccess”, “Amazon S3”, “CloudTrail”, “CloudWatch”, and “CloudFormationFullAccess” for the Customer SOC team to log in and manage security of the account. Additional permissions can be added to this role in the future if needed.

Select the roles:

create role

7. Create an IAM role/user for Partner billing role in the Partner Master Payee account.

Part II – Setting up customer master account

1. Create an IAM user in the customer’s master account. This user assumes role into the new master payer/root account.

aws management console

2. Confirm that the IAM user from the customer account can assume a role in the new master account, and that the user does not have billing access.

Billing and cost management dashboard

Part III – Creating an organization structure in partner account

Create an Organization in the Partner Master Payee Account
Create Multiple Organizational Units (OU) in the Partner Master Payee Account

3. Enable Service Control Policies from AWS Organization’s Policies menu.

service control policies

5. Create/copy multiple service control policies into the Partner Master Payee Account from the Customer root account. Any service control policies from the customer root account should be manually copied to the new partner account.

6. If the customer root account has any special software installed (for example, security software), install the same software in the Partner Master Payee Account.

7. Set alerts in Partner Master Payee root account. Any login to the root account would send alerts to customer and partner teams.

8. It is recommended to keep a copy of all billing history invoices for all accounts to be transferred to partner organization. This could be achieved by either downloading CSV or printing all invoices and storing files in Amazon S3 for long term archival. Billing history and invoices are found by clicking Orders and Invoices on Billing & Cost Management Dashboard. After accounts are transferred to new organization, historic billing data will not be available for those accounts.

9. Remove all the Member Accounts from the current Customer Root Account/Organization. This step is performed by the customer account admin and is required before an account can be transferred to the Partner Account organization.

10. Send an invite from the Partner Master Payee Account to the delinked Member Account

master payee account

11. Have each Member Account accept the invite from the Partner Master Payee Account.

invitations

12. Move the Customer member account to the appropriate OU in the Partner Master Payee Account.
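One way to implement the root-login alert from step 7 is an Amazon EventBridge rule whose event pattern selects console sign-ins by the root user. The sketch below is an assumption about how the alert could be wired, not something the steps above prescribe. The `matches` function is a simplified stand-in for the service's pattern matching, and the sample events are hypothetical and heavily trimmed.

```python
# Event pattern in the shape used by Amazon EventBridge rules.
ROOT_SIGN_IN_PATTERN = {
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: nested dicts recurse, lists mean 'any of these values'."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical sample events, reduced to the fields the pattern inspects.
root_login = {
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"userIdentity": {"type": "Root"}},
}
iam_login = {
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"userIdentity": {"type": "IAMUser"}},
}

print(matches(ROOT_SIGN_IN_PATTERN, root_login))  # True
print(matches(ROOT_SIGN_IN_PATTERN, iam_login))   # False
```

A rule with this pattern could then target a notification topic that both the customer and partner teams subscribe to.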

Setting the shared security model between partner and customer contact

While setting up the master account, three contacts need to be updated for notification.

Billing – this is owned by the Partner
Operations – this is owned by the Customer
Security – this is owned by the Customer.

This will trigger a notification of any activity on the root account. The contact details contain Name, Title, Email Address and Phone number. It is recommended to use the Customer’s SOC team distribution email for security and operations, and a phone number that belongs to the organization, and not the individual.

Alternate contacts

Additionally, before any root account activity takes place, AWS Support will verify using the security challenge questionnaire. These questions and answers are owned by the Customer’s SOC team.

security challenge questions

If a customer is not able to access the AWS account, alternate support options are available at Contact us by expanding the “I’m an AWS customer and I’m looking for billing or account support” menu. While contacting AWS Support, all the details that are listed on the account are needed, including full name, phone number, address, email address, and the last four digits of the credit card.

Clean Up

After recovering the account, the Customer should close any accounts that are not in use. It’s a good idea not to have open accounts in your name that could result in charges. For more information, review Closing an Account in the Billing and Cost Management User Guide.

The Shared master root account should be only used for selected activities referred to in the following document.

Conclusion

In this post, you learned how AWS Organizations features can be used to create a shared master account structure. This helps both customer and partner engage in a service resale business engagement. Using AWS Organizations and cross account access, this solution allows customers to control all key aspects of managing the AWS Organization (Security / Logging / Monitoring) and also allows partners to control any billing related data.

Additional Resources

Cross Account Access

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Integrating a Multi-forest Source Environment with AWS SSO

2020-08-26 Sudhir Amin

Post Syndicated from Sudhir Amin original https://aws.amazon.com/blogs/architecture/field-notes-integrating-a-multi-forest-source-environment-with-aws-sso/

During re:Invent 2019, AWS announced a new way to integrate external identity sources such as Azure Active Directory with automatic provisioning of identities and groups in AWS Single Sign-On (AWS SSO). In March 2020, AWS SSO added the ability for customers to connect their Okta Identity Cloud to AWS SSO in order to manage access to AWS centrally.

The AWS Single Sign-On service helps to centralize access management for multiple AWS accounts, in some cases tying access back to corporate identities. This provides ready access to business applications and services. With this feature, companies can use AWS Single Sign-On to allow federated access to multiple AWS accounts and cloud applications.

In this blog post, I discuss the challenges faced by customers running multi-forest environments or multiple Azure tenant subscriptions with this feature. I also provide a different approach to solving this challenge with a brief overview of each solution presented.

Large enterprises often require their security team to build centralized identity solutions that work across different Active Directory forests. This is commonly due to a merger, acquisition, or partnership. Challenges include complex networking with different IP routes, DNS forwarding configurations, and firewall rules to enable trust relationships between the forests, all to support a single identity for managing the account lifecycle and password policies. This becomes even more challenging when your organization works across multiple cloud platforms within a centralized identity solution, with hybrid network connectivity.

Customer Example

To illustrate my point, I use the following example of a real life customer scenario, under the fictitious name of ‘Acme Corporation’.

Acme Corporation is a capital wealth management company operating in three countries: USA, Canada, and Brazil. Business is growing and they are exploring cloud services.

Their corporate headquarters is located in NY, USA and they have established offices (branches) in Canada and Brazil. The organization operates in a decentralized model, which involves different governance over their identity structure. An Active Directory forest is established per Region with a cross-forest trust relationship. The company is looking to adopt cloud technologies and needs a common identity solution across on-premises and cloud services with Azure Active Directory and AWS.

We’ve outlined the solution in the following diagram:

Figure 1 – Solution Overview

Options to source identities into AWS Single Sign-On

AWS Single Sign-On offers the following three options for an identity source:

AWS SSO
Active Directory
External Identity Provider

Figure 2 – Identity Source Options

The first option, “AWS SSO”, is the default native identity store. You can create and delete users and groups.

The second option, “Active Directory”, allows administrators to source users and groups from an on-premises Active Directory or an Active Directory running in EC2 (using AD Connector as the directory gateway), or from an AWS Managed Microsoft AD directory hosted in the AWS Cloud.

The third option, “External Identity Provider”, enables administrators to provision users and groups from external identity providers (IdPs) through the Security Assertion Markup Language (SAML) 2.0 standard.

Note: AWS Single Sign-On allows only one identity source at any given time. In this post, we focus on two options that help integrate a multi-forest environment with AWS Single Sign-On and Azure Active Directory.

Solution

Option 1. Federating with Active Directory

In the hub-and-spoke model, the AWS Managed Microsoft Active Directory is the hub and the Active Directory forests are the spokes.

Provision an AWS Managed Microsoft Active Directory.
- If you already have an AWS Managed Microsoft Active Directory for a hub, continue to the next step.
Set up hybrid network connectivity and firewall rules that allow trust traffic.
Configure DNS conditional forwarding so that the trusting forests can be resolved. You need an outbound endpoint with forwarding rules to the different forests so that the VPC resolves their names, and an inbound endpoint so that the forests can resolve the AWS Managed Microsoft AD names.
Check that name resolution works for the hybrid environment.
Establish a Forest trust relationship and validate the trust.
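The name resolution check in the steps above can be scripted before you attempt the trust. The following is a minimal sketch that uses only the Python standard library. The function names are illustrative, and `localhost` is used only so the example runs anywhere; substitute the DNS names of your own forests and run it from a host inside the VPC.

```python
import socket
from typing import Callable, Dict, List, Optional


def resolve(name: str) -> List[str]:
    """Resolve a host name to a sorted list of unique IP addresses."""
    infos = socket.getaddrinfo(name, None)
    return sorted({info[4][0] for info in infos})


def check_names(
    names: List[str],
    resolver: Callable[[str], List[str]] = resolve,
) -> Dict[str, Optional[List[str]]]:
    """Map each name to its addresses, or to None if resolution fails."""
    results: Dict[str, Optional[List[str]]] = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


# Placeholder name; replace with the domain names of each forest.
for name, addresses in check_names(["localhost"]).items():
    print(name, "->", addresses if addresses else "NOT RESOLVED")
```

Running the same check from a domain controller in each forest against the AWS Managed Microsoft AD domain name covers the inbound direction.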

The following snapshot shows how your trust relationship will be displayed on the console.

Figure 3 - Trust Relationships

Note 1: You cannot use the transitive trust relationship of a child domain in a forest or cross forest relationship. In that case, you have to create an explicit trust or a domain trust to the AWS Managed Microsoft AD domain for AWS Single Sign-On. This enables you to see the user and groups required to provision the permission sets and Accounts.

Note 2: AWS Managed Microsoft Active Directory in this example does not require you to host any users or groups, as this domain is only being used for the domain trust relationships. In short, this can be an empty forest.

Configure AWS Single Sign-On to use your AWS Managed Microsoft Active Directory as the Active Directory identity source.

The following snapshot shows how to assign a group to an account in preparation for AWS Single Sign-On enablement.

Figure 4 - Selecting Users or Groups

The following snapshot shows how to assign a group to an account in preparation for AWS Single Sign-On enablement and selecting a group.

The following is a conceptual diagram of Acme corporation, after successful integration.

Figure 3 – Option 1 – Conceptual Diagram

Option 2a. Federating with Azure Active Directory Single Tenant

If you have multiple forests and would like to use a single tenant, here are the steps:

Set up a single Azure AD Connect in any forest, to consolidate users from different forests to a single Azure tenant.
- Review the requirements under the section “Multiple Forests, Single Azure Active Directory Tenant”.
Configure AWS Single Sign-On to use your single Azure Active Directory tenant as the External Identity Provider identity source.

The following is a conceptual diagram of Acme corporation, after successful integration.

Figure 5 – Option 2a – Conceptual Diagram

Option 2b. Federating with Azure Active Directory Multiple Tenants

If option 2a is not feasible and you are using multiple Azure AD Connect sync servers and multiple Azure Active Directory tenants (as per the following diagram), then you can nominate one of the Azure Active Directory tenants to connect with AWS SSO. Through B2B invitation, selectively invite users from the other tenants into the nominated tenant.

Note: This is not a scalable solution, as it requires administrative overhead. It is best suited to a small set of users who require access to the AWS API or console for administrative work.

Follow the Microsoft B2B model.
Tutorial: Bulk invite Azure Active Directory B2B collaboration users

The following is a conceptual diagram of Acme corporation, after successful integration.

Figure 6 – Option 2b – Conceptual Diagram

Conclusion

In this post, we discussed the options for connecting AWS SSO to your preferred Identity Provider, with a multi-forest infrastructure. Customers running multi-forest environments or multiple Azure tenant subscriptions now have a guide to offer their users a continued way of centralizing management and enforcing least privilege access on cloud resources. To learn more, review our AWS Single Sign-On service content.

Field Notes: Serverless Container-based APIs with Amazon ECS and Amazon API Gateway

2020-08-24 Simone Pomata

Post Syndicated from Simone Pomata original https://aws.amazon.com/blogs/architecture/field-notes-serverless-container-based-apis-with-amazon-ecs-and-amazon-api-gateway/

A growing number of organizations choose to build their APIs with Docker containers. For hosting and exposing these container-based APIs, they need a solution that supports HTTP request routing, autoscaling, and high availability. In some cases, user authorization is also needed.

For this purpose, many organizations are orchestrating their containerized services with Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), while hosting their containers on Amazon EC2 or AWS Fargate. Then, they can add scalability and high availability with Service Auto Scaling (in Amazon ECS) or the Horizontal Pod Autoscaler (in Amazon EKS), and they expose the services through load balancers (for example, the AWS Application Load Balancer).

When you use Amazon ECS as an orchestrator (with EC2 or Fargate launch type), you also have the option to expose your services with Amazon API Gateway and AWS Cloud Map instead of a load balancer. AWS Cloud Map is used for service discovery: no matter how Amazon ECS tasks scale, AWS Cloud Map service names would point to the right set of Amazon ECS tasks. Then, API Gateway HTTP APIs can be used to define API routes and point them to the corresponding AWS Cloud Map services.

API Gateway and AWS Cloud Map could be a good fit if you want to leverage the capabilities provided by API Gateway HTTP APIs. For example, you could import/export your API as an OpenAPI definition file. You could configure the following features, either on the whole API or – more granularly – at route level: throttling, detailed metrics, or OAuth 2.0 / OIDC user authorization. You could also deploy your API at different stages over time. Or you could easily configure CORS for your API or for any route, instead of handling OPTIONS preflight requests yourself.

If you don’t need the capabilities of API Gateway HTTP APIs or if those of Elastic Load Balancing are a better fit, then you can use the latter. For example, the capabilities of the Application Load Balancer include: content-based routing (not only by path and HTTP method, but also by HTTP header, query-string parameter, source IP, etc.), redirects, fixed responses, and others. Additionally, the Network Load Balancer provides layer 4 load balancing capabilities. Ultimately, there are overlaps and differences between the features of Elastic Load Balancing and those of API Gateway HTTP APIs: so you may want to compare them to choose the right option for your use case.

This blog post guides you through the details of the option based on API Gateway and AWS Cloud Map, and how to implement it: first you learn how the different components (Amazon ECS, AWS Cloud Map, API Gateway, etc.) work together, then you launch and test a sample container-based API.

Architecture Overview

The following diagram shows the architecture of the sample API that you are going to launch.

Figure 1 – Architecture Diagram

This example API exposes two services: “Food store” to PUT and GET foods, and “Pet store” to PUT and GET pets. Unauthenticated users can only GET, while authenticated users can also PUT.

The following building blocks are used:

Amazon Cognito User Pools: for user authentication. In this example API, Amazon Cognito is used for user authentication, but you could use any other OAuth 2.0 / OIDC identity provider instead. When the user authenticates with Amazon Cognito, user pool tokens are granted, including a JWT access token that is used for authorizing requests to the container APIs.
API Gateway HTTP APIs: for exposing the containerized services to the user. API routes and the respective integrations are defined in API Gateway. A route is the combination of a path and a method. An integration is the backend service which is invoked by that route. In this API, private integrations point to AWS Cloud Map services, which in turn resolve to private Amazon ECS services (more about AWS Cloud Map in the next paragraph). As Amazon ECS services are private resources in a Virtual Private Cloud (VPC), API Gateway uses a VPC link to connect to them in a private way. A VPC link is a set of elastic network interfaces in the VPC, assigned to and managed by API Gateway, so that API Gateway can talk privately with other resources in the VPC. This way, Amazon ECS services can be launched in private subnets and don’t need a public IP. In this sample application, JWT authorization is configured in API Gateway for PUT routes. API Gateway performs request authorization based on validation of the JWT provided and, optionally, the scopes in the token. This way, you don’t need additional code in your containers for authorization.
AWS Cloud Map: for service discovery of the containerized services. API Gateway needs a way to find physical addresses of the backend services, and AWS Cloud Map provides this capability. To enable this functionality, service discovery should be configured on Amazon ECS services. Amazon ECS performs periodic health checks on tasks in Amazon ECS services and registers the healthy tasks to the respective AWS Cloud Map service. AWS Cloud Map services can then be resolved either via DNS queries or by calling the DiscoverInstances API (API Gateway uses the API). AWS Cloud Map supports different DNS record types (including A, AAAA, CNAME, and SRV); at the time of writing, API Gateway can only retrieve SRV records from AWS Cloud Map, so SRV records are used in this sample application. With SRV records, each AWS Cloud Map service returns a combination of IP addresses and port numbers of all the healthy tasks in the service. Consider that AWS Cloud Map would perform round-robin routing (with equal weighting to the targets): for this reason, to avoid hot spots, all tasks in each service should be homogeneous (in terms of container images, vCPU, memory, and other settings).
Amazon ECS: for hosting the containerized services. Amazon ECS is a highly scalable and high-performance container orchestrator. In this blog post, the Fargate launch type is used, so that containers are launched on the Fargate serverless compute engine, and you don’t have to provision or manage any EC2 instances. In this sample API, service auto scaling is also enabled, so that the number of containers in each service can scale up and down automatically based on % CPU usage. Containers will be launched across multiple Availability Zones in the AWS Region, to get high availability.
Amazon DynamoDB: for persisting the data. Amazon DynamoDB is a key-value and document database that provides single-digit millisecond performance at any scale. In a real-world scenario, you could still use DynamoDB or another data store, such as Amazon Relational Database Service (RDS).
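To make the round-robin point in the AWS Cloud Map item concrete, here is a minimal sketch of equal-weight rotation over SRV-style targets. The IP addresses and port are hypothetical, and the code only illustrates the distribution pattern, not how AWS Cloud Map or API Gateway implement it.

```python
from itertools import cycle
from typing import Iterator, List, Tuple

Target = Tuple[str, int]  # (IP address, port), as carried by an SRV-style answer


def round_robin(targets: List[Target]) -> Iterator[Target]:
    """Yield targets in turn, giving each one equal weight."""
    return cycle(targets)


# Hypothetical healthy tasks registered for one service.
healthy_tasks = [("10.0.1.15", 8080), ("10.0.2.27", 8080), ("10.0.3.9", 8080)]

picker = round_robin(healthy_tasks)
first_six = [next(picker) for _ in range(6)]
for ip, port in first_six:
    print(f"{ip}:{port}")
```

Each task receives the same share of requests regardless of its size, which is why a task with less vCPU or memory than its peers would become a hot spot.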

All the code of this blog post is publicly available in this GitHub repository. You can explore the CloudFormation template used to define the sample application as code. You can view the source code of the two containerized services: Food store repository and Pet store repository. You can also explore the code of the sample web app that you’ll use to test the API (this web app has been developed with the Amplify framework). Note that the code provided is intended for testing purposes and not for production usage.

Walkthrough

In this section, you will deploy the sample application and test it.

Prerequisites

To launch the sample API, you first need an AWS user that has access to the AWS Management Console and has the IAM permissions to launch the AWS CloudFormation stack.

Deploying the sample application

Now it’s time to launch the sample API:

Select Launch Stack
In the page for quick stack creation, do the following:
- Select the capability “I acknowledge that AWS CloudFormation might create IAM resources”.
- Keep the rest as default.
- Choose Create Stack.
Wait until the status of the stack transitions to “CREATE_COMPLETE”.

Testing the sample application

In this section, you test the API from a sample web application client that I created. Open the sample web application:

From the page of the stack, choose Outputs.
Open the URL for the “APITestPage” output.
On the opened page, choose Proceed.

The web page should state that you are not signed-in. In this sample API, any user can GET items, but only authenticated users can PUT items. Sign up to the sample web application:

Choose Go To Sign In.
Choose Create account.
Complete the sign-up procedure (you will be asked for a valid email address, which will be registered into your Amazon Cognito User Pool).

The application should state that you are signed-in. Test the API as an authenticated user:

Try to PUT an item. You would see that the operation succeeds. The item has been persisted by the containerized service to the DynamoDB table.

DynamoDB table

2. Try to GET the same item that you previously PUT. You would see that the same JSON is returned. This JSON is retrieved by the containerized service from the DynamoDB table.

Test the API as an unauthenticated user:

Choose Sign Out.
Try to GET the same item that you previously PUT. You would see that the same JSON is returned. This JSON is retrieved by the containerized service from the DynamoDB table.
Try to PUT any item. You would get a 401 Unauthorized error from the API. This behavior is expected because only signed-in users have a JWT token, and you configured API Gateway to only authorize PUT requests that provide a valid token.
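The behavior you just tested can be summarized in a few lines of code. The sketch below is a simplified model of the rule "GET is open, PUT needs a valid token", not API Gateway's implementation. It decodes the token payload without verifying the signature, which a real authorizer must do, and the issuer URL, helper names, and unsigned tokens are assumptions made for illustration.

```python
import base64
import json
from typing import Optional

PROTECTED_METHODS = {"PUT"}            # in the sample API, only PUT routes need a token
ISSUER = "https://issuer.example.com"  # hypothetical issuer URL


def _b64url(data: dict) -> str:
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def make_unsigned_token(claims: dict) -> str:
    """Build a JWT-shaped string with an empty signature, for demonstration only."""
    return _b64url({"alg": "none"}) + "." + _b64url(claims) + "."


def decode_claims(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if not isinstance(claims, dict):
        raise ValueError("JWT payload is not a JSON object")
    return claims


def authorize(method: str, token: Optional[str], now: float) -> int:
    """Return the HTTP status that this simplified check would produce."""
    if method not in PROTECTED_METHODS:
        return 200
    if not token:
        return 401
    try:
        claims = decode_claims(token)
    except (IndexError, ValueError):
        return 401
    exp = claims.get("exp")
    if claims.get("iss") != ISSUER or not isinstance(exp, (int, float)) or exp <= now:
        return 401
    return 200


NOW = 1_600_000_000  # fixed timestamp so the example is deterministic
signed_in = make_unsigned_token({"iss": ISSUER, "exp": NOW + 3600})

print(authorize("GET", None, NOW))       # 200
print(authorize("PUT", None, NOW))       # 401
print(authorize("PUT", signed_in, NOW))  # 200
```

Because the check runs before the request reaches the containers, the Food store and Pet store services themselves contain no authorization code.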

DynamoDB table

Exploring the resources of the sample application

You can also explore the resources launched as part of the CloudFormation stack. To list all of them, from the page of your CloudFormation stack, choose Resources.

To see the Amazon ECS services, go to the Amazon ECS console, choose your cluster, and you would see that 2 services are running, one for the Foodstore and another for the Petstore, as shown in the following image.

Notice that the services use the Fargate launch type, meaning that they are running on serverless compute capacity, so you don’t have to launch or maintain any EC2 instances to run them.

Cluster demo

To see the details of a service, go to the Amazon ECS cluster page and choose a service. You land on the service page, where you can see the running tasks, the service events, and other details.

To view the service auto scaling configuration, choose Auto Scaling. You can notice that Amazon ECS is set to automatically scale the number of tasks according to the value of a metric. In this sample application, the metric is the average CPU utilization of the service (ECSServiceAverageCPUUtilization), but you could use another metric.

Auto scaling

The scaling policy of each service uses two Amazon CloudWatch Alarms, one for scaling out and one for scaling in. An alarm is triggered when the target metric deviates from the target value, which in turn is used to trigger the scaling action. To see the alarms, go to the CloudWatch Alarms console.

CloudWatch Alarms
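As a rough mental model of scaling on average CPU utilization, the sketch below applies a proportional rule: the task count grows or shrinks by the ratio of the observed metric to the target. This is a deliberate simplification and an assumption on my part, since the exact algorithm the service uses is not described here. The target value and task limits are hypothetical.

```python
import math


def desired_task_count(
    current_tasks: int,
    average_cpu: float,
    target_cpu: float,
    min_tasks: int,
    max_tasks: int,
) -> int:
    """Proportional rule: scale the task count by how far CPU is from target."""
    if current_tasks <= 0 or target_cpu <= 0:
        raise ValueError("current_tasks and target_cpu must be positive")
    proposed = math.ceil(current_tasks * average_cpu / target_cpu)
    # Clamp to the configured limits of the service.
    return max(min_tasks, min(max_tasks, proposed))


# Hypothetical settings: 50% target, between 1 and 10 tasks.
print(desired_task_count(2, 90.0, 50.0, 1, 10))   # scale out: 4
print(desired_task_count(4, 20.0, 50.0, 1, 10))   # scale in: 2
print(desired_task_count(8, 100.0, 50.0, 1, 10))  # capped at max: 10
```

The two alarms map onto the two directions of this rule: one fires when the metric stays above the target (scale out), the other when it stays below (scale in).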

To see the service discovery entries, go to the AWS Cloud Map console, choose your namespace (see the parameter “PrivateDNSNamespaceName” in the CloudFormation stack), and you would see that two services are defined. If you choose one of these services, you would see that multiple service instances are registered, each representing a single Amazon ECS task (in this sample application, each Amazon ECS task is a single container). If you choose one of these service instances, you would see the details about the task, including the private IP, the port, and the health status. API Gateway retrieves these entries to discover your services.

Service Instance

To see the API configuration, go to the API Gateway console and choose your API.

Then, from the left side of the screen select either Routes, Authorization, Integrations, or any other option.

Integrations

Cleaning up

To clean up the resources, simply delete the CloudFormation stack that you deployed as part of this blog post.

Conclusion

You have learned how API Gateway HTTP APIs can be used together with AWS Cloud Map to expose Amazon ECS services as APIs. You have deployed a sample API that also uses Amazon Cognito for authentication and DynamoDB for data persistence.

API Gateway HTTP APIs provides a number of features that you can leverage, such as OpenAPI import/export, throttling, OAuth 2.0 / OIDC user authorization, detailed metrics, and stages deployment. That said, API Gateway is not the only way to expose your ECS services; if you don’t need the features of API Gateway HTTP APIs or if those of Elastic Load Balancing are a better fit, then you can use the latter service. The recommended approach is to compare them to choose the most suitable for your use case.

Using serverless backends to iterate quickly on web apps – part 2

2020-08-24 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-serverless-backends-to-iterate-quickly-on-web-apps-part-2/

This series is about building flexible solutions that can adapt as user requirements change. One of the challenges of building modern web applications is that requirements can change quickly. This is especially true for new applications that are finding their product-market fit. Many development teams start building a product with one set of requirements, and quickly find they must build a product with different features.

For both start-ups and enterprises, it’s often important to find a development methodology and architecture that allows flexibility. This is the surest way to keep up with feature requests in evolving products and innovate to delight your end-users. In this post, I show how to build sophisticated workflows using minimal custom code.

Part 1 introduces the Happy Path application that allows park visitors to share maps and photos with other users. In that post, I explain the functionality, how to deploy the application, and walk through the backend architecture.

The Happy Path application accepts photo uploads from users’ smartphones. The application architecture must support 100,000 monthly active users. These binary uploads are typically 3–9 MB in size and must be resized and optimized for efficient distribution.

Using a serverless approach, you can develop a robust low-code solution that can scale to handle millions of images. Additionally, the solution shown here is designed to handle complex changes that are introduced in subsequent versions of the software. The code and instructions for this application are available in the GitHub repo.

Architecture overview

After installing the backend in the previous post, the architecture looks like this:

In this design, the API, storage, and notification layers exist as one application, and the business logic layer is a separate application. These two applications are deployed using AWS Serverless Application Model (AWS SAM) templates. This architecture uses Amazon EventBridge to pass events between the two applications.

In the business logic layer:

The workflow starts when events are received from EventBridge. Each time a new object is uploaded by an end-user, the PUT event in the Amazon S3 Upload bucket triggers this process.
After the workflow is completed successfully, processed images are stored in the Distribution bucket. Related metadata for the object is also stored in the application’s Amazon DynamoDB table.

By separating the architecture into two independent applications, you can replace the business logic layer as needed. Provided that the workflow accepts incoming events and then stores processed images in the S3 bucket and DynamoDB table, the workflow logic becomes interchangeable. Using this pattern, the workflow can be upgraded to handle new functionality.

Introducing AWS Step Functions for workflow management

One of the challenges in building distributed applications is coordinating components. These systems are composed of separate services, which makes orchestrating workflows more difficult than working with a single monolithic application. As business logic grows more complex, if you attempt to manage this in custom code, it can quickly become convoluted. This is especially true if the code handles retries and error handling, which also makes it hard to test and maintain.

AWS Step Functions is designed to coordinate and manage these workflows in distributed serverless applications. To do this, you create state machine diagrams using Amazon States Language (ASL). Step Functions renders a visualization of your state machine, which makes it simpler to see the flow of data from one service to another.

Each state machine consists of a series of steps. Each step takes an input and produces an output. Using ASL, you define how this data progresses through the state machine. The flow from step to step is called a transition. All state machines transition from a Start state towards an End state.

The Step Functions service manages the state of individual executions. The service also supports versioning, which makes it easier to modify state machines in production systems. Executions continue to use the version of the state machine that was in place when they were started, so it’s possible to have active executions on multiple versions.

For developers using VS Code, the AWS Toolkit extension provides support for writing state machines using ASL. It also renders visualizations of those workflows. Combined with AWS Serverless Application Model (AWS SAM) templates, this provides a powerful way to deploy and maintain applications based on Step Functions. I refer to this IDE and AWS SAM in this walkthrough.

Version 1: Image resizing

The Happy Path application uses Step Functions to manage the image-processing part of the backend. The first version of this workflow resizes the uploaded image.

To see this workflow:

In VS Code, open the workflows/statemachines folder in the Explorer panel.
Choose the v1.asl.json file.
Choose the Render graph option in the CodeLens. This opens the workflow visualization.

In this basic workflow, the state machine starts at the Resizer step, then progresses to the Publish step before ending:

In the top-level attributes in the definition, StartAt sets the Resizer step as the first action.
The Resizer step is defined as a task with an ARN of a Lambda function. The Next attribute determines that the Publish step is next.
In the Publish step, this task defines a Lambda function using an ARN reference. It sets the input payload as the entire JSON payload. This step is set as the End of the workflow.
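The structure described above can be sketched as a Python dictionary that mirrors the shape of an ASL definition, together with a small helper that follows the transitions. The ARNs are hypothetical placeholders and the definition is a reduced illustration, not a copy of the v1 file in the repo.

```python
# Hypothetical ARNs stand in for the real Lambda functions.
definition = {
    "StartAt": "Resizer",
    "States": {
        "Resizer": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111111111111:function:resizer",
            "Next": "Publish",
        },
        "Publish": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111111111111:function:publish",
            "End": True,
        },
    },
}


def execution_order(asl: dict) -> list:
    """Follow StartAt/Next until a state marked End (linear workflows only)."""
    order = []
    name = asl["StartAt"]
    while True:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        order.append(name)
        state = asl["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


print(execution_order(definition))  # ['Resizer', 'Publish']
```

The helper ignores branching states such as Choice or Parallel. It is only meant to show how StartAt, Next, and End chain the two steps together.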

Deploying the Step Functions workflow

To deploy the state machine:

In the terminal window, change directory to the workflows/templates/v1 folder in the repo.
Execute these commands to build and deploy the AWS SAM template:
```
sam build
sam deploy --guided
```
The deploy process prompts you for several parameters. Enter happy-path-workflow-v1 as the Stack Name. The other values are the outputs from the backend deployment process, detailed in the repo’s README. Enter these to complete the deployment.

Testing and inspecting the deployed workflow

Now that the workflow is deployed, you can perform an integration test directly from the frontend application.

To test the deployed v1 workflow:

Open the frontend application at https://localhost:8080 in your browser.
Select a park location, choose Show Details, and then choose Upload images.
Select an image from the sample photo dataset.
After a few seconds, you see a pop-up message confirming that the image has been added:
Select the same park location again, and the information window now shows the uploaded image:

To see how the workflow processed this image:

Navigate to the Step Functions console.
Here you see the v1StateMachine with one execution in the Succeeded column.
Choose the state machine to display more information about the start and end time.
Select the execution ID in the Executions panel to open details of this single instance of the workflow.

This view shows important information that’s useful for understanding and debugging an execution. Under Input, you see the event passed into Step Functions by EventBridge:

This contains details about the S3 object event, such as the bucket name and key, together with the placeId, which identifies the location on the map. Under Output, you see the final result from the state machine, which shows a successful StatusCode (200) and other metadata:

Using AWS SAM to define and deploy Step Functions state machines

The AWS SAM template defines the state machine, the trigger for executions, and the permissions needed for Step Functions to execute. The AWS SAM resource for a Step Functions definition is AWS::Serverless::StateMachine.

In this example:

DefinitionUri refers to an external ASL definition, instead of embedding the JSON in the AWS SAM template directly.
DefinitionSubstitutions allow you to use tokens in the ASL definition that refer to resources created in the AWS SAM template. For example, the token ${ResizerFunctionArn} refers to the ARN of the resizer Lambda function.
Events define how the state machine is invoked. Here it defines an EventBridge rule. If an event matches this source and detail-type, it triggers an execution.
Policies: the Step Functions service must have permission to invoke the services that perform tasks in the state machine. AWS SAM policy templates provide a convenient shorthand for common execution policies, such as invoking a Lambda function.
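
To make two of these properties more concrete, here is a minimal Python sketch that runs locally, without AWS. It imitates how a ${ResizerFunctionArn} token in an ASL definition could be replaced with a value, and how an event might be checked against a rule's source and detail-type. The ASL fragment, the ARN, the source and detail-type values, and both helper functions are assumptions for illustration. AWS SAM and EventBridge perform these steps for you, and real EventBridge patterns support more operators than the exact-match check shown here.

```python
import json
import re

# Hypothetical ASL fragment containing one substitution token.
ASL_TEMPLATE = """
{
  "StartAt": "Resizer",
  "States": {
    "Resizer": {
      "Type": "Task",
      "Resource": "${ResizerFunctionArn}",
      "End": true
    }
  }
}
"""

TOKEN = re.compile(r"\$\{(\w+)\}")


def apply_substitutions(definition: str, substitutions: dict) -> str:
    """Replace each ${Name} token; raise KeyError if a token has no value."""
    def replace(match):
        name = match.group(1)
        if name not in substitutions:
            raise KeyError(f"no substitution provided for {name}")
        return substitutions[name]
    return TOKEN.sub(replace, definition)


def matches_rule(event: dict, pattern: dict) -> bool:
    """Exact-match check: each pattern field must list the event's value."""
    return all(event.get(field) in allowed for field, allowed in pattern.items())


if __name__ == "__main__":
    # Made-up ARN, used only to show the substitution.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:example-resizer"
    rendered = apply_substitutions(ASL_TEMPLATE, {"ResizerFunctionArn": arn})
    print(json.loads(rendered)["States"]["Resizer"]["Resource"])

    # Made-up source and detail-type values.
    pattern = {"source": ["example.source"], "detail-type": ["ExampleObjectCreated"]}
    event = {"source": "example.source", "detail-type": "ExampleObjectCreated"}
    print(matches_rule(event, pattern))
```

Keeping the definition in its own file with tokens means the same ASL can be deployed to different stacks, because the resource ARNs are only filled in at deployment time.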

This workflow application is separate from the main backend template. As more functionality is added to the workflow, you deploy the subsequent AWS SAM templates in the same way.

Conclusion

Using AWS SAM, you can specify serverless resources, configure permissions, and define substitutions for the ASL template. You can deploy a standalone Step Functions-based application using the AWS SAM CLI, separately from other parts of your application. This makes it easier to decouple and maintain larger applications. You can visualize these workflows directly in the VS Code IDE in addition to the AWS Management Console.

In part 3, I show how to build progressively more complex workflows and how to deploy these in-place without affecting the other parts of the application.

To learn more about building serverless web applications, see the Ask Around Me series.
