Tag Archives: AWS Lambda

Serverless architecture for optimizing Amazon Connect call-recording archival costs

2022-06-24 Brian Maguire

Post Syndicated from Brian Maguire original https://aws.amazon.com/blogs/architecture/serverless-architecture-for-optimizing-amazon-connect-call-recording-archival-costs/

In this post, we provide a serverless solution to cost-optimize the storage of contact-center call recordings. The solution automates the scheduling, storage-tiering, and resampling of call-recording files, resulting in immediate cost savings. The solution is an asynchronous architecture built using AWS Step Functions, Amazon Simple Queue Service (Amazon SQS), and AWS Lambda.

Amazon Connect provides an omnichannel cloud contact center with the ability to maintain call recordings for compliance and gaining actionable insights using Contact Lens for Amazon Connect and AWS Contact Center Intelligence Partners. The storage required for call recordings can quickly increase as customers meet compliance retention requirements, often spanning six or more years. This can lead to hundreds of terabytes in long-term storage.

Solution overview

When an agent completes a customer call, Amazon Connect sends the call recording to an Amazon Simple Storage Solution (Amazon S3) bucket with: a date and contact ID prefix, the file stored in the .WAV format and encoded using bitrate 256 kb/s, pcm_s16le, 8000 Hz, two channels, and 256 kb/s. The call-recording files are approximately 2 Mb/minute optimized for high-quality processing, such as machine learning analysis (see Figure 1).

Figure 1. Asynchronous architecture for batch resampling for call-recording files on Amazon S3

When a call recording is sent to Amazon S3, downstream post-processing is often performed to generate analytics reports for agents and quality auditors. The downstream processing can include services that provide transcriptions, quality-of-service metrics, and sentiment analysis to create reports and trigger actionable events.

While this processing is often completed within minutes, the downstream applications could require processing retries. As audio resampling reduces the quality of the audio files, it is essential to delay resampling until after processing is completed. As processed call recordings are infrequently accessed days after a call is completed, with only a small percentage accessed by agents and call quality auditors, call recordings can benefit from resampling and transitioning to long-term Amazon S3 storage tiers.

In Figure 2, multiple AWS services work together to provide an end-to-end cost-optimization solution for your contact center call recordings.

Figure 2. AWS Step Function orchestrates the batch resampling of call recordings

An Amazon EventBridge schedule rule triggers the step function to perform the batch resampling process for all call recordings from the previous 7 days.

In the first step function task, the Lambda function task iterates the S3 bucket using the ListObjectsV2 API, obtaining the call recordings (1000 objects per iteration) with the date prefix from 7 days ago.

The next task invokes a Lambda function inserting the call recording objects into the Amazon SQS queue. The audio-conversion Lambda function receives the Amazon SQS queue events via the event source mapping Lambda integration. Each concurrent Lambda invocation downloads a stored call recording from Amazon S3, resampling the .WAV with ffmpeg and tagging the S3 object with a “converted=True” tag.

Finally, the conversion function uploads the resampled file to Amazon S3, overwriting the original call recording with the resampled recording using a cost-optimized storage class, such as S3 Glacier Instant Retrieval. S3 Glacier Instant Retrieval provides the lowest cost for long-lived data that is rarely accessed and requires milliseconds retrieval, such as for contact-center call-recording playback. By default, Amazon Connect stores call recordings with S3 Versioning enabled, maintaining the original file as a version. You can use lifecycle policies to delete object versions from a version-enabled bucket to permanently remove the original version, as this will minimize the storage of the original call recording.

This solution captures failures within the step function workflow with logging and a dead-letter queue, such as when an error occurs with resampling a recording file. A Step Function task monitors the Amazon SQS queue using the AWS Step Function integration with AWS SDK with SQS and ending the workflow when the queue is emptied. Table 1 demonstrates the default and resampled formats.

Figure 3. Detailed AWS Step Functions state machine diagram

Resampling

Table 1. Default and resampled call recording audio formats

Audio sampling formats	File size/minute	Notes
Bitrate 256 kb/s, pcm_s16le, 8000 Hz, 2 channels, 256 kb/s	2 MB	The default for Amazon Connect call recordings. Sampled for audio quality and call analytics processing.
Bitrate 64 kb/s, pcm_alaw, 8000 Hz, 1 channel, 64 kb/s	0.5 MB	Resampled to mono channel 8 bit. This resampling is not reversible and should only be performed after all call analytics processing has been completed.

Cost assessment

For pricing information for the primary services used in the solution, visit:

The costs incurred by the solution are based on usage and are AWS Free Tier eligible. After the AWS Free Tier allowance is consumed, usage costs are approximately $0.11 per 1000 minutes of call recordings. S3 Standard starts at $0.023 per GB/month; and S3 Glacier Instant Retrieval is $0.004 per GB/month, with $0.003 per GB of data retrieval. During a 6-year compliance retention term, the schedule-based resampling and storage tiering results in significant cost savings.

In the 6-year example detailed in Table 2, the S3 Standard storage costs would be approximately $356,664 for 3 million call-recording minutes/month. The audio resampling and S3 Glacier Instant Retrieval tiering reduces the 6-year cost to approximately $41,838.

Table 2. Multi-year costs savings scenario (3 million minutes/month) in USD

Year	Total minutes (3 million/month)	Total storage (TB)	Cost of storage, S3 Standard (USD)	Cost of running the resampling (USD)	Cost of resampling solution with S3 Glacier Instant Retrieval (USD)
1	36,000,000	72	10,764	3,960	4,813
2	72,000,000	108	30,636	3,960	5,677
3	108,000,000	144	50,508	3,960	6,541
4	144,000,000	180	70,380	3,960	7,405
5	180,000,000	216	90,252	3,960	8,269
6	216,000,000	252	110,124	3,960	9,133
Total	1,008,000,000	972	356,664	23,760	41,838

To explore PCA costs for yourself, use AWS Cost Explorer or choose Bill Details on the AWS Billing Dashboard to see your month-to-date spend by service.

Deploying the solution

The code and documentation for this solution are available by cloning the git repository and can be deployed with AWS Cloud Development Kit (AWS CDK).

Bash
# clone repository
git clone https://github.com/aws-samples/amazon-connect-call-recording-cost-optimizer.git
# navigate the project directory
cd amazon-connect-call-recording-cost-optimizer

Modify the cdk.context.json with your environment’s configuration setting, such as the bucket_name. Next, install the AWS CDK dependencies and deploy the solution:

:# ensure you are in the root directory of the repository

./cdk-deploy.sh

Once deployed, you can test the resampling solution by waiting for the EventBridge schedule rule to execute based on the num_days_age setting that is applied. You can also manually run the AWS Step Function with a specified date, for example {"specific_date":"01/01/2022"}.

The AWS CDK deployment creates the following resources:

AWS Step Function
AWS Lambda function
Amazon SQS queues
Amazon EventBridge rule

The solution handles the automation of transitioning a storage tier, such as S3 Glacier Instant Retrieval. In addition, Amazon S3 Lifecycles can be set manually to transition the call recordings after resampling to alternative Amazon S3 Storage Classes.

Cleanup

When you are finished experimenting with this solution, cleanup your resources by running the command:

cdk destroy

This command deletes the AWS CDK-deployed resources. However, the S3 bucket containing your call recordings and CloudWatch log groups are retained.

Conclusion

This call recording resampling solution offers an automated, cost-optimized, and scalable architecture to reduce long-term compliance call recording archival costs.

Build Health Aware CI/CD Pipelines

2022-06-20 sangusah

Post Syndicated from sangusah original https://aws.amazon.com/blogs/devops/build-health-aware-ci-cd-pipelines/

Everything fails all the time — Werner Vogels, AWS CTO

At the moment of imminent failure, you want to avoid an unlucky deployment. I’ll start here with a short story that demonstrates the purpose of this post.

The DevOps team has just started a database upgrade with a planned outage of 30 minutes. The team automated the entire upgrade flow, triggered a CI/CD pipeline with no human intervention, and the upgrade is progressing smoothly. Then, 20 minutes in, the pipeline is stuck, and your upgrade isn’t progressing. The maintenance window has expired and customers can’t transact. You’ve created a support case, and the AWS engineer confirmed that the upgrade is failing because of a running AWS Health incident in the us-west-2 Region. The engineer has directed the DevOps team to continue monitoring the status.aws.amazon.com page for updates regarding incident resolution. The event continued running for three hours, during which time customers couldn’t transact. Once resolved, the DevOps team retried the failed pipeline, and it completed successfully.

After the incident, the DevOps team explored the possibilities for avoiding these types of incidents in the future. The team was made aware of AWS Health API that provides programmatic access to AWS Health information. In this post, we’ll help the DevOps team make the most of the AWS Health API to proactively prevent unintended outages.

AWS provides Business and Enterprise Support customers with access to the AWS Health API. Customers can have access to running events in the AWS infrastructure that may impact their service usage. Incidents could be Regional, AZ-specific, or even account specific. During these incidents, it isn’t recommended to deploy or change services that are impacted by the event.

In this post, I will walk you through how to embed AWS Health API insights into your CI/CD pipelines to automatically stop deployments whenever an AWS Health event is reported in a Region that you’re operating in. Furthermore, I will demonstrate how you can automate detection and remediation.

The Demo

In this demo, I will use AWS CodePipeline to demonstrate the idea. I will build a simple pipeline that demonstrates the concept without going into the build, test, and deployment specifics.

CodePipeline Flow

The CodePipeline flow consists of three steps:

Source stage that downloads a CloudFormation template from AWS CodeCommit. The template will be deployed in the last stage.
Custom stage that invokes the AWS Lambda function to evaluate the AWS Health. The Lambda function calls the AWS Health API, evaluates the health risk, and calls back CodePipeline with the assessment result.
Deploy stage that deploys the CloudFormation templates downloaded from CodeCommit in the first stage.

The CodePipeline flow consists of 3 steps. First, "source stage" that downloads a CloudFormation template from CodeCommit. The template will be deployed in the last stage. Step 2 is a "custom stage" that invokes the Lambda function to evaluate AWS Health. The Lambda function calls the AWS Health API, evaluates the health risk and calls back CodePipeline with the assessment result. Finally, step 3 is a "deploy stage" that deploys the CloudFormation template downloaded from CodeCommit in the first stage. If a health is detected in step 2, the workflow will retry after a predefined timeout.

Figure 1. CodePipeline workflow.

Lambda evaluation logic

The Lambda function evaluates whether or not a running AWS Health event may be impacted by the deployment. In this case, the following criteria must be met to consider it as safe to deploy:

Deployment will take place in the North Virginia Region and accordingly the Lambda function will filter on the us-east-1 Region.
A closed event is irrelevant. The Lambda function will filter events with only the open status.
AWS Health API can return different event types that may not be relevant, such as: Scheduled Maintenance, and Account and Billing notifications. The Lambda function will filter only “Issue” type events.

The AWS Health API follows a multi-Region application architecture and has two regional endpoints in an active-passive configuration. To support active-passive DNS failover, AWS Health provides a global endpoint. The Python code is available on GitHub with more information in the README on how to build the Lambda code package.

The Lambda function requires the following AWS Identity and Access Management (IAM) permissions to access AWS Health API, CodePipeline, and publish logs to CloudWatch:

{
  "Version": "2012-10-17", 
  "Statement": [
    {
      "Action": [ 
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "logs:PutLogEvents"
      ],
      "Effect": "Allow", 
      "Resource": "arn:aws:logs:us-east-1:replaceWithAccountNumber:*"
    },
    {
      "Action": [
        "codepipeline:PutJobSuccessResult",
        "codepipeline:PutJobFailureResult"
        ],
        "Effect": "Allow",
        "Resource": "*"
     },
     {
        "Effect": "Allow",
        "Action": "health:DescribeEvents",
        "Resource": "*"
    }
  ]
}

Solution architecture

Figure 2. Solution architecture diagram.

In CodePipeline, create a new stage with a single action to asynchronously invoke a Lambda function. The function will call AWS Health DescribeEvents API to retrieve the list of active health incidents. Then, the function will complete the event analysis and decide whether or not it may impact the running deployment. Finally, the function will call back CodePipeline with the evaluation results through either PutJobSuccessResult or PutJobFailureResult API operations.

If the Lambda evaluation succeeds, then it will call back the pipeline with a PutJobSuccessResult API. In turn, the pipeline will mark the step as successful and complete the execution.

AWS Code Pipeline workflow execution snapshot from the AWS Console. The first step, Source is a success after completing source code download from AWS CodeCommit service. The second step, check the AWS service health is a success as well.

Figure 3. AWS Code Pipeline workflow successful execution.

If the Lambda evaluation fails, then it will call back the pipeline with a PutJobFailureResult API specifying a failure message. Once the DevOps team is made aware that the event has been resolved, select the Retry button to re-evaluate the health status.

Figure 4. AWS CodePipeline workflow failed execution.

Your DevOps team must be aware of failed deployments. Therefore, it’s a good idea to configure alerts to notify concerned stakeholders with failed stage executions. Create a notification rule that posts a Slack message if a stage fails. For detailed steps, see Create a notification rule – AWS CodePipeline. In case of failure, a Slack notification will be sent through AWS Chatbot.

A Slack UI snapshot showing the notification to be sent if a deployment fails to execute. The notification shows a title of "AWS CodePipeline Notification". The notification indicates that one action has failed in the stage aws-health-check. The notification also shows that the failure reason is that there is an Incident In Progress. The notification also mentions the Pipeline name as well as the failed stage name.

Figure 5. Slack UI snapshot notification for a failed deployment.

A more elegant solution involves pushing the notification to an SNS topic that in turns calls a Lambda function to retry the failed stage. The Lambda function extracts the pipeline failed stage identifier, and then calls the RetryStageExecution CodePipeline API.

Conclusion

We’ve learned how to create an automation that evaluates the risk associated with proceeding with a deployment in conjunction with a running AWS Health event. Then, the automation decides whether to proceed with the deployment or block the progress to avoid unintended downtime. Accordingly, this results in the improved availability of your application.

This solution isn’t exclusive to CodePipeline. However, the pattern can be applied to other CI/CD tools that your DevOps team uses.

Author:

Adding approval notifications to EC2 Image Builder before sharing AMIs

2022-06-15 Sheila Busser

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/adding-approval-notifications-to-ec2-image-builder-before-sharing-amis/

This blog post was written by, Glenn Chia Jin Wee, Associate Cloud Architect at AWS and Randall Han, Associate Professional Services Consultant at AWS.

In some situations, you may be required to manually validate the Amazon Machine Image (AMI) built from an Amazon Elastic Compute Cloud (Amazon EC2) Image Builder pipeline before sharing this AMI to other AWS accounts or to an AWS Organization. Currently, Image Builder provides an end-to-end pipeline that automatically shares AMIs after they’ve been built.

In this post, we will walk through the steps to enable approval notifications before AMIs are shared with other AWS accounts. Having a manual approval step could be useful if you would like to verify the AMI configurations before it is shared to other AWS accounts or an AWS Organization. This reduces the possibility of incorrectly configured AMIs being shared to other teams which in turn could lead to downstream issues if applications are installed using this AMI. This solution uses serverless resources to send an email with a link that automatically shares the AMI with the specified AWS accounts. Users select this link after they’ve verified that the AMI is built according to specifications.

Overview

In this solution, an Image Builder Pipeline is run that builds a Golden AMI in Account A. After the AMI is built, Image Builder publishes data about the AMI to an Amazon Simple Notification Service (Amazon SNS) topic.
This SNS Topic passes the data to an AWS Lambda function that subscribes to it.
The Lambda function that subscribes to this topic retrieves the data, formats it, and sends a customized email to another SNS Topic.
The second SNS Topic has an email subscription with the Approver’s email. The approver will receive the customized email with a URL that interacts with the next set of Serverless resources.
Selecting the URL makes a GET request to Amazon API Gateway, thereby passing the AMI ID in the query string.
API Gateway then triggers another Lambda function and passes the AMI ID to it.
The Lambda function obtains the AMI ID from the query string parameter of the API Gateway request, and then shares it with the provided target account.

Prerequisites

For this walkthrough, you will need the following:

Two AWS accounts – one to host solution resources, and the second with which to share the built AMI.
AWS Serverless Application Model (AWS SAM) installed, and AWS credentials configured for Account A. Refer to AWS docs: Installing the AWS SAM CLI for the installation and configuration steps.
A new Amazon Virtual Private Cloud (Amazon VPC) will be created from the stack. Make sure that you have fewer than five VPCs in the selected Region.

Walkthrough

In this section, we will guide you through the steps required to deploy the Image Builder solution that utilizes Serverless resources. The solution is deployed with AWS SAM.

In this scenario, we deploy the solution within the approver’s account. The approval email will be sent to a predefined email address for manual approval, before the newly created AMI is shared to target accounts.

Once the approver selects the approval link, an email notification will be sent to the predefined target account email address, notifying that the AMI has been successfully shared.

The high-level steps we will follow are:

In Account A, deploy the provided AWS SAM template. This includes an example Image Builder Pipeline, Amazon SNS topics, API Gateway, and Lambda functions.
Approve the SNS subscription from your supplied email address.
Run the pipeline from the Amazon EC2 Image Builder Console.
[Optional] After the pipeline runs, launch an Amazon EC2 instance from the built AMI to conduct manual tests
An Amazon SNS email will be sent to you with an API Gateway URL. When clicked, an AWS Lambda function shares the AMI to the Account B.
Log in to Account B and verify that the AMI has been shared.

Step 1: Launch the AWS SAM template

Clone the SAM templates from this GitHub repository.
Run the following command to deploy the templates via SAM. Replace <approver email> with the Approver’s email and <AWS Account B ID> with the AWS Account ID of your second AWS Account.

sam deploy \

–template-file template.yaml \

–stack-name ec2-image-builder-approver-notifications \

–capabilities CAPABILITY_IAM \

–resolve-s3 \

–parameter-overrides \

ApproverEmail=<approver email> \

TargetAccountEmail=<target account email> \

TargetAccountlds=<AWS Account B ID>

Step 2: Verify your email address

After running the deployment, you will receive an email prompting you to confirm the Subscription at the approver email address. Choose Confirm subscription.

This leads to the following screen, which shows that your subscription is confirmed.

Repeat the previous 2 steps for the target email address.

Step 3: Run the pipeline from the Image Builder console

In the Image Builder console, under Image pipelines, select the checkbox next to the Pipeline created, choose Actions, and select Run pipeline.

Note that the pipeline takes approximately 20 to 30 minutes to complete.

Step 4: [Optional] Launch an Amazon EC2 instance from the built AMI

There could be a requirement to manually validate the AMI before sharing it to other AWS accounts or to the AWS organization. With this requirement, approvers will launch an Amazon EC2 instance from the built AMI and conduct manual tests on the EC2 instance to make sure that it is functional.

In the Amazon EC2 console, under Images, choose AMIs. Validate that the AMI is created.

Follow AWS docs: Launching an EC2 instances from a custom AMI for steps on how to launch an Amazon EC2 instance from the AMI.

Step 5: Select the approval URL in the email sent

When the pipeline is run successfully, you will receive another email with a URL to share the AMI.

2. Selecting this URL results in the following screen which shows that the AMI share is successful.

Step 6: Verify that the AMI is shared to Account B

Log in to Account B.
In the Amazon EC2 console, under Images, choose AMIs. Then, in the dropdown, choose Private images. Validate that the AMI is shared.

3. Verify that a success email notification was sent to the target account email address provided.

Clean up

This section provides the necessary information for deleting various resources created as part of this post.

1. Deregister the AMIs created and shared.

a. Log in to Account A and follow the steps at AWS documentation: Deregister your Linux AMI.

2. Delete the SAM stack with the following command. Replace <region> with the Region of choice.

sam delete –stack-name ec2-image-builder-approver-notifications –no-prompts –region <region>

3. Delete the CloudWatch log groups for the Lambda functions. You’ll identify it with the name `/aws/lambda/ec2-image-builder-approve*`.

4. Consider deleting the Amazon S3 bucket used to store the packaged Lambda artifact.

Conclusion

In this post, we explained how to use Serverless resources to enable approval notifications for an Image Builder pipeline before AMIs are shared to other accounts. This solution can be extended to share to more than one AWS account or even to an AWS organization. With this solution, you will be notified when new golden images are created, allowing you to verify the correctness of their configuration before sharing them to for wider use. This reduces the possibility of sharing AMIs with misconfigurations that the written tests may not have identified.

We invite you to experiment with different AMIs created using Image Builder, and with different Image Builder components. Check out this GitHub repository for various examples that use Image Builder. Also check out this blog on Image builder integrations with EC2 Auto Scaling Instance Refresh. Let us know your questions and findings in the comments, and have fun!

Optimize Federated Query Performance using EXPLAIN and EXPLAIN ANALYZE in Amazon Athena

2022-06-14 Nishchai JM

Post Syndicated from Nishchai JM original https://aws.amazon.com/blogs/big-data/optimize-federated-query-performance-using-explain-and-explain-analyze-in-amazon-athena/

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In 2019, Athena added support for federated queries to run SQL queries across data stored in relational, non-relational, object, and custom data sources.

In 2021, Athena added support for the EXPLAIN statement, which can help you understand and improve the efficiency of your queries. The EXPLAIN statement provides a detailed breakdown of a query’s run plan. You can analyze the plan to identify and reduce query complexity and improve its runtime. You can also use EXPLAIN to validate SQL syntax prior to running the query. Doing so helps prevent errors that would have occurred while running the query.

Athena also added EXPLAIN ANALYZE, which displays the computational cost of your queries alongside their run plans. Administrators can benefit from using EXPLAIN ANALYZE because it provides a scanned data count, which helps you reduce financial impact due to user queries and apply optimizations for better cost control.

In this post, we demonstrate how to use and interpret EXPLAIN and EXPLAIN ANALYZE statements to improve Athena query performance when querying multiple data sources.

Solution overview

To demonstrate using EXPLAIN and EXPLAIN ANALYZE statements, we use the following services and resources:

Five database tables from the Major League Baseball dataset
AWS CloudFormation to provision AWS resources
Amazon S3, Amazon DynamoDB, and Amazon Aurora MySQL-Compatible Edition as data storage

Athena uses the AWS Glue Data Catalog to store and retrieve table metadata for the Amazon S3 data in your AWS account. The table metadata lets the Athena query engine know how to find, read, and process the data that you want to query. We use Athena data source connectors to connect to data sources external to Amazon S3.

Prerequisites

To deploy the CloudFormation template, you must have the following:

An AWS account
An AWS Identity and Access Management (IAM) user with access to the following services:
- Amazon Athena
- AWS CloudFormation
- Amazon DynamoDB
- AWS Glue
- AWS Lambda
- Amazon Relational Database Service (Amazon RDS)
- Amazon S3
- Amazon Virtual Private Cloud (Amazon VPC)

Provision resources with AWS CloudFormation

To deploy the CloudFormation template, complete the following steps:

Choose Launch Stack:

Follow the prompts on the AWS CloudFormation console to create the stack.
Note the key-value pairs on the stack’s Outputs tab.

You use these values when configuring the Athena data source connectors.

The CloudFormation template creates the following resources:

S3 buckets to store data and act as temporary spill buckets for Lambda
AWS Glue Data Catalog tables for the data in the S3 buckets
A DynamoDB table and Amazon RDS for MySQL tables, which are used to join multiple tables from different sources
A VPC, subnets, and endpoints, which are needed for Amazon RDS for MySQL and DynamoDB

The following figure shows the high-level data model for the data load.

Create the DynamoDB data source connector

To create the DynamoDB connector for Athena, complete the following steps:

On the Athena console, choose Data sources in the navigation pane.
Choose Create data source.
For Data sources, select Amazon DynamoDB.
Choose Next.

For Data source name, enter DDB.

For Lambda function, choose Create Lambda function.

This opens a new tab in your browser.

For Application name, enter AthenaDynamoDBConnector.
For SpillBucket, enter the value from the CloudFormation stack for AthenaSpillBucket.
For AthenaCatalogName, enter dynamodb-lambda-func.
Leave the remaining values at their defaults.
Select I acknowledge that this app creates custom IAM roles and resource policies.
Choose Deploy.

You’re returned to the Connect data sources section on the Athena console.

Choose the refresh icon next to Lambda function.
Choose the Lambda function you just created (dynamodb-lambda-func).

Choose Next.
Review the settings and choose Create data source.
If you haven’t already set up the Athena query results location, choose View settings on the Athena query editor page.

Choose Manage.
For Location of query result, browse to the S3 bucket specified for the Athena spill bucket in the CloudFormation template.
Add Athena-query to the S3 path.
Choose Save.

In the Athena query editor, for Data source, choose DDB.
For Database, choose default.

You can now explore the schema for the sportseventinfo table; the data is the same in DynamoDB.

Choose the options icon for the sportseventinfo table and choose Preview Table.

Create the Amazon RDS for MySQL data source connector

Now let’s create the connector for Amazon RDS for MySQL.

On the Athena console, choose Data sources in the navigation pane.
Choose Create data source.
For Data sources, select MySQL.
Choose Next.

For Data source name, enter MySQL.

For Lambda function, choose Create Lambda function.

For Application name, enter AthenaMySQLConnector.
For SecretNamePrefix, enter AthenaMySQLFederation.
For SpillBucket, enter the value from the CloudFormation stack for AthenaSpillBucket.
For DefaultConnectionString, enter the value from the CloudFormation stack for MySQLConnection.
For LambdaFunctionName, enter mysql-lambda-func.
For SecurityGroupIds, enter the value from the CloudFormation stack for RDSSecurityGroup.
For SubnetIds, enter the value from the CloudFormation stack for RDSSubnets.
Select I acknowledge that this app creates custom IAM roles and resource policies.
Choose Deploy.

On the Lambda console, open the function you created (mysql-lambda-func).
On the Configuration tab, under Environment variables, choose Edit.

Choose Add environment variable.
Enter a new key-value pair:
- For Key, enter MYSQL_connection_string.
- For Value, enter the value from the CloudFormation stack for MySQLConnection.
Choose Save.

Return to the Connect data sources section on the Athena console.
Choose the refresh icon next to Lambda function.
Choose the Lambda function you created (mysql-lamdba-function).

Choose Next.
Review the settings and choose Create data source.
In the Athena query editor, for Data Source, choose MYSQL.
For Database, choose sportsdata.

Choose the options icon by the tables and choose Preview Table to examine the data and schema.

In the following sections, we demonstrate different ways to optimize our queries.

Optimal join order using EXPLAIN plan

A join is a basic SQL operation to query data on multiple tables using relations on matching columns. Join operations affect how much data is read from a table, how much data is transferred to the intermediate stages through networks, and how much memory is needed to build up a hash table to facilitate a join.

If you have multiple join operations and these join tables aren’t in the correct order, you may experience performance issues. To demonstrate this, we use the following tables from difference sources and join them in a certain order. Then we observe the query runtime and improve performance by using the EXPLAIN feature from Athena, which provides some suggestions for optimizing the query.

The CloudFormation template you ran earlier loaded data into the following services:

AWS Storage	Table Name	Number of Rows
Amazon DynamoDB	sportseventinfo	657
Amazon S3	person	7,025,585
Amazon S3	ticketinfo	2,488

Let’s construct a query to find all those who participated in the event by type of tickets. The query runtime with the following join took approximately 7 mins to complete:

SELECT t.id AS ticket_id, 
e.eventid, 
p.first_name 
FROM 
"DDB"."default"."sportseventinfo" e, 
"AwsDataCatalog"."athenablog"."person" p, 
"AwsDataCatalog"."athenablog"."ticketinfo" t 
WHERE 
t.sporting_event_id = cast(e.eventid as double) 
AND t.ticketholder_id = p.id

Now let’s use EXPLAIN on the query to see its run plan. We use the same query as before, but add explain (TYPE DISTRIBUTED):

EXPLAIN (TYPE DISTRIBUTED)
SELECT t.id AS ticket_id, 
e.eventid, 
p.first_name 
FROM 
"DDB"."default"."sportseventinfo" e, 
"AwsDataCatalog"."athenablog"."person" p, 
"AwsDataCatalog"."athenablog"."ticketinfo" t 
WHERE 
t.sporting_event_id = cast(e.eventid as double) 
AND t.ticketholder_id = p.id

The following screenshot shows our output

Notice the cross-join in Fragment 1. The joins are converted to a Cartesian product for each table, where every record in a table is compared to every record in another table. Therefore, this query takes a significant amount of time to complete.

To optimize our query, we can rewrite it by reordering the joining tables as sportseventinfo first, ticketinfo second, and person last. The reason for this is because the WHERE clause, which is being converted to a JOIN ON clause during the query plan stage, doesn’t have the join relationship between the person table and sportseventinfo table. Therefore, the query plan generator converted the join type to cross-joins (a Cartesian product), which less efficient. Reordering the tables aligns the WHERE clause to the INNER JOIN type, which satisfies the JOIN ON clause and runtime is reduced from 7 minutes to 10 seconds.

The code for our optimized query is as follows:

SELECT t.id AS ticket_id, 
e.eventid, 
p.first_name 
FROM 
"DDB"."default"."sportseventinfo" e, 
"AwsDataCatalog"."athenablog"."ticketinfo" t, 
"AwsDataCatalog"."athenablog"."person" p 
WHERE 
t.sporting_event_id = cast(e.eventid as double) 
AND t.ticketholder_id = p.id

The following is the EXPLAIN output of our query after reordering the join clause:

EXPLAIN (TYPE DISTRIBUTED) 
SELECT t.id AS ticket_id, 
e.eventid, 
p.first_name 
FROM 
"DDB"."default"."sportseventinfo" e, 
"AwsDataCatalog"."athenablog"."ticketinfo" t, 
"AwsDataCatalog"."athenablog"."person" p 
WHERE t.sporting_event_id = cast(e.eventid as double) 
AND t.ticketholder_id = p.id

The following screenshot shows our output.

The cross-join changed to INNER JOIN with join on columns (eventid, id, ticketholder_id), which results in the query running faster. Joins between the ticketinfo and person tables converted to the PARTITION distribution type, where both left and right tables are hash-partitioned across all worker nodes due to the size of the person table. The join between the sportseventinfo table and ticketinfo are converted to the REPLICATED distribution type, where one table is hash-partitioned across all worker nodes and the other table is replicated to all worker nodes to perform the join operation.

For more information about how to analyze these results, refer to Understanding Athena EXPLAIN statement results.

As a best practice, we recommend having a JOIN statement along with an ON clause, as shown in the following code:

SELECT t.id AS ticket_id, 
e.eventid, 
p.first_name 
FROM 
"AwsDataCatalog"."athenablog"."person" p 
JOIN "AwsDataCatalog"."athenablog"."ticketinfo" t ON t.ticketholder_id = p.id 
JOIN "ddb"."default"."sportseventinfo" e ON t.sporting_event_id = cast(e.eventid as double)

Also as a best practice when you join two tables, specify the larger table on the left side of join and the smaller table on the right side of the join. Athena distributes the table on the right to worker nodes, and then streams the table on the left to do the join. If the table on the right is smaller, then less memory is used and the query runs faster.

In the following sections, we present examples of how to optimize pushdowns for filter predicates and projection filter operations for the Athena data source using EXPLAIN ANALYZE.

Pushdown optimization for the Athena connector for Amazon RDS for MySQL

A pushdown is an optimization to improve the performance of a SQL query by moving its processing as close to the data as possible. Pushdowns can drastically reduce SQL statement processing time by filtering data before transferring it over the network and filtering data before loading it into memory. The Athena connector for Amazon RDS for MySQL supports pushdowns for filter predicates and projection pushdowns.

The following table summarizes the services and tables we use to demonstrate a pushdown using Aurora MySQL.

Table Name	Number of Rows	Size in KB
player_partitioned	5,157	318.86
sport_team_partitioned	62	5.32

We use the following query as an example of a filtering predicate and projection filter:

SELECT full_name,
name 
FROM "sportsdata"."player_partitioned" a 
JOIN "sportsdata"."sport_team_partitioned" b ON a.sport_team_id=b.id 
WHERE a.id='1.0'

This query selects the players and their team based on their ID. It serves as an example of both filter operations in the WHERE clause and projection because it selects only two columns.

We use EXPLAIN ANALYZE to get the cost for the running this query:

EXPLAIN ANALYZE 
SELECT full_name,
name 
FROM "MYSQL"."sportsdata"."player_partitioned" a 
JOIN "MYSQL"."sportsdata"."sport_team_partitioned" b ON a.sport_team_id=b.id 
WHERE a.id='1.0'

The following screenshot shows the output in Fragment 2 for the table player_partitioned, in which we observe that the connector has a successful pushdown filter on the source side, so it tries to scan only one record out of the 5,157 records in the table. The output also shows that the query scan has only two columns (full_name as the projection column and sport_team_id and the join column), and uses SELECT and JOIN, which indicates the projection pushdown is successful. This helps reduce the data scan when using Athena data source connectors.

Now let’s look at the conditions in which a filter predicate pushdown doesn’t work with Athena connectors.

LIKE statement in filter predicates

We start with the following example query to demonstrate using the LIKE statement in filter predicates:

SELECT * 
FROM "MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name LIKE '%Aar%'

We then add EXPLAIN ANALYZE:

EXPLAIN ANALYZE 
SELECT * 
FROM "MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name LIKE '%Aar%'

The EXPLAIN ANALYZE output shows that the query performs the table scan (scanning the table player_partitioned, which contains 5,157 records) for all the records even though the WHERE clause only has 30 records matching the condition %Aar%. Therefore, the data scan shows the complete table size even with the WHERE clause.

We can optimize the same query by selecting only the required columns:

EXPLAIN ANALYZE 
SELECT sport_team_id,
full_name 
FROM "MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name LIKE '%Aar%'

From the EXPLAIN ANALYZE output, we can observe that the connector supports the projection filter pushdown, because we select only two columns. This brought the data scan size down to half of the table size.

OR statement in filter predicates

We start with the following query to demonstrate using the OR statement in filter predicates:

SELECT id,
first_name 
FROM "MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name = 'Aaron' OR id ='1.0'

We use EXPLAIN ANALYZE with the preceding query as follows:

EXPLAIN ANALYZE 
SELECT * 
FROM 
"MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name = 'Aaron' OR id ='1.0'

Similar to the LIKE statement, the following output shows that query scanned the table instead of pushing down to only the records that matched the WHERE clause. This query outputs only 16 records, but the data scan indicates a complete scan.

Pushdown optimization for the Athena connector for DynamoDB

For our example using the DynamoDB connector, we use the following data:

Table	Number of Rows	Size in KB
sportseventinfo	657	85.75

Let’s test the filter predicate and project filter operation for our DynamoDB table using the following query. This query tries to get all the events and sports for a given location. We use EXPLAIN ANALYZE for the query as follows:

EXPLAIN ANALYZE 
SELECT EventId,
Sport 
FROM "DDB"."default"."sportseventinfo" 
WHERE Location = 'Chase Field'

The output of EXPLAIN ANALYZE shows that the filter predicate retrieved only 21 records, and the project filter selected only two columns to push down to the source. Therefore, the data scan for this query is less than the table size.

Now let’s see where filter predicate pushdown doesn’t work. In the WHERE clause, if you apply the TRIM() function to the Location column and then filter, predicate pushdown optimization doesn’t apply, but we still see the projection filter optimization, which does apply. See the following code:

EXPLAIN ANALYZE 
SELECT EventId,
Sport 
FROM "DDB"."default"."sportseventinfo" 
WHERE trim(Location) = 'Chase Field'

The output of EXPLAIN ANALYZE for this query shows that the query scans all the rows but is still limited to only two columns, which shows that the filter predicate doesn’t work when the TRIM function is applied.

We’ve seen from the preceding examples that the Athena data source connector for Amazon RDS for MySQL and DynamoDB do support filter predicates and projection predicates for pushdown optimization, but we also saw that operations such as LIKE, OR, and TRIM when used in the filter predicate don’t support pushdowns to the source. Therefore, if you encounter unexplained charges in your federated Athena query, we recommend using EXPLAIN ANALYZE with the query and determine whether your Athena connector supports the pushdown operation or not.

Please note that running EXPLAIN ANALYZE incurs cost because it scans the data.

Conclusion

In this post, we showcased how to use EXPLAIN and EXPLAIN ANALYZE to analyze Athena SQL queries for data sources on AWS S3 and Athena federated SQL query for data source like DynamoDB and Amazon RDS for MySQL. You can use this as an example to optimize queries which would also result in cost savings.

About the Authors

Nishchai JM is an Analytics Specialist Solutions Architect at Amazon Web services. He specializes in building Big-data applications and help customer to modernize their applications on Cloud. He thinks Data is new oil and spends most of his time in deriving insights out of the Data.

Varad Ram is Senior Solutions Architect in Amazon Web Services. He likes to help customers adopt to cloud technologies and is particularly interested in artificial intelligence. He believes deep learning will power future technology growth. In his spare time, he like to be outdoor with his daughter and son.

Extending PowerShell on AWS Lambda with other services

2022-06-02 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/extending-powershell-on-aws-lambda-with-other-services/

This post expands on the functionality introduced with the PowerShell custom runtime for AWS Lambda. The previous blog explains how the custom runtime approach makes it easier to run Lambda functions written in PowerShell.

You can add additional functionality to your PowerShell serverless applications by importing PowerShell modules, which are shareable packages of code. Build your own modules or import from the wide variety of existing vendor modules to manage your infrastructure and applications.

You can also take advantage of the event-driven nature of Lambda, which allows you to run Lambda functions in response to events. Events can include an object being uploaded to Amazon S3, a message placed on an Amazon SQS queue, a scheduled task using Amazon EventBridge, or an HTTP request from Amazon API Gateway. Lambda functions support event triggers from over 200 AWS services and software as a service (SaaS) applications.

Adding PowerShell modules

You can add PowerShell modules from a number of locations. These can include modules from the AWS Tools for PowerShell, from the PowerShell Gallery, or your own custom modules. Lambda functions access these PowerShell modules within specific folders within the Lambda runtime environment.

You can include PowerShell modules via Lambda layers, within your function code package, or container image. When using .zip archive functions, you can use layers to package and share modules to use with your functions. Layers reduce the size of uploaded deployment archives and can make it faster to deploy your code. You can attach up to five layers to your function, one of which must be the PowerShell custom runtime layer. You can include multiple modules per layer.

The custom runtime configures PowerShell’s PSModulePath environment variable, which contains the list of folder locations to search to find modules. The runtime searches the folders in the following order:

1. User supplied modules as part of function package

You can include PowerShell modules inside the published Lambda function package in a /modules subfolder.

2. User supplied modules as part of Lambda layers

You can publish Lambda layers that include PowerShell modules in a /modules subfolder. This allows you to share modules across functions and accounts. Lambda extracts layers to /opt within the Lambda runtime environment so the modules are located in /opt/modules. This is the preferred solution to use modules with multiple functions.

3. Default/user supplied modules supplied with PowerShell

You can also include additional default modules and add them within a /modules folder within the PowerShell custom runtime layer.

For example, the following function includes four Lambda layers. One layer includes the custom runtime. Three additional layers include further PowerShell modules; the AWS Tools for PowerShell, your own custom modules, and third-party modules. You can also include additional modules with your function code.

Lambda layers

Within your PowerShell code, you can load modules during the function initialization (init) phase. This initializes the modules before the handler function runs, which speeds up subsequent warm-start invocations.

Adding modules from the AWS Tools for PowerShell

This post shows how to use the AWS Tools for PowerShell to manage your AWS services and resources. The tools are packaged as a set of PowerShell modules that are built on the functionality exposed by the AWS SDK for .NET. You can follow similar packaging steps to add other modules to your functions.

The AWS Tools for PowerShell are available as three distinct packages:

The AWS.Tools package is the preferred modularized version, which allows you to load only the modules for the services you want to use. This reduces package size and function memory usage. The AWS.Tools cmdlets support auto-importing modules without having to call Import-Module first. However, specifically importing the modules during the function init phase is more efficient and can reduce subsequent invoke duration. The AWS.Tools.Common module is required and provides cmdlets for configuration and authentication that are not service specific.

The accompanying GitHub repository contains the code for the custom runtime, along with a number of example applications. There are also module build instructions for adding a number of common PowerShell modules as Lambda layers, including AWS.Tools.

Building an event-driven PowerShell function

The repository contains an example of an event-driven demo application that you can build using serverless services.

A clothing printing company must manage its t-shirt size and color inventory. The printers store t-shirt orders for each day in a CSV file. The inventory service is one service that must receive the CSV file. It parses the file and, for each order, records the details to manage stock deliveries.

The stores upload the files to S3. This automatically invokes a PowerShell Lambda function, which is configured to respond to the S3 ObjectCreated event. The Lambda function receives the S3 object location as part of the $LambdaInput event object. It uses the AWS Tools for PowerShell to download the file from S3. It parses the contents and, for each line in the CSV file, sends the individual order details as an event to an EventBridge event bus.

In this example, there is a single rule to log the event to Amazon CloudWatch Logs to show the received event. However, you could route each order, depending on the order details, to different targets. For example, you can send different color combinations to SQS queues, which the dyeing service can use to order dyes. You could send particular size combinations to another Lambda function that manages cloth orders.

Example event-driven application

The previous blog post shows how to use the AWS Serverless Application Model (AWS SAM) to build a Lambda layer, which includes only the AWS.Tools.Common module to run Get-AWSRegion. To build a PowerShell application to process objects from S3 and send events to EventBridge, you can extend this functionality by also including the AWS.Tools.S3 and AWS.Tools.EventBridge modules in a Lambda layer.

Lambda layers, including S3 and EventBridge

Building the AWS Tools for PowerShell layer

You could choose to add these modules and rebuild the existing layer. However, the example in this post creates a new Lambda layer to show how you can have different layers for different module combinations of AWS.Tools. The example also adds the Lambda layer Amazon Resource Name (ARN) to AWS Systems Manager Parameter Store to track deployed layers. This allows you to reference them more easily in infrastructure as code tools.

The repository includes build scripts for both Windows and non-Windows developers. Windows does not natively support Makefiles. When using Windows, you can use either Windows Subsystem for Linux (WSL), Docker Desktop, or native PowerShell.

When using Linux, macOS, WSL, or Docker, the Makefile builds the Lambda layers. After downloading the modules, it also extracts the additional AWS.Tools.S3 and AWS.Tools.EventBridge modules.

# Download AWSToolsLayer module binaries
curl -L -o $(ARTIFACTS_DIR)/AWS.Tools.zip https://sdk-for-net.amazonwebservices.com/ps/v4/latest/AWS.Tools.zip
mkdir -p $(ARTIFACTS_DIR)/modules

# Extract select AWS.Tools modules (AWS.Tools.Common required)
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.Common/**/*' -d $(ARTIFACTS_DIR)/modules/
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.S3/**/*' -d $(ARTIFACTS_DIR)/modules/
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.EventBridge/**/*' -d $(ARTIFACTS_DIR)/modules/

When using native PowerShell on Windows to build the layer, the build-AWSToolsLayer.ps1 script performs the same file copy functionality as the Makefile. You can use this option for Windows without WSL or Docker.

### Extract entire AWS.Tools modules to stage area but only move over select modules
…
Move-Item "$PSScriptRoot\stage\AWS.Tools.Common" "$PSScriptRoot\modules\" -Force
Move-Item "$PSScriptRoot\stage\AWS.Tools.S3" "$PSScriptRoot\modules\" -Force
Move-Item "$PSScriptRoot\stage\AWS.Tools.EventBridge" "$PSScriptRoot\modules\" -Force

The Lambda function code imports the required modules in the function init phase.

Import-Module "AWS.Tools.Common"
Import-Module "AWS.Tools.S3"
Import-Module "AWS.Tools.EventBridge"

For other combinations of AWS.Tools, amend the example build-AWSToolsLayer.ps1 scripts to add the modules you require. You can use a similar download and copy process, or PowerShell’s Save-Module to build layers for modules from other locations.

Building and deploying the event-driven serverless application

Follow the instructions in the GitHub repository to build and deploy the application.

The demo application uses AWS SAM to deploy the following resources:

PowerShell custom runtime.
Additional Lambda layer containing the AWS.Tools.Common, AWS.Tools.S3, and AWS.Tools.EventBridge modules from AWS Tools for PowerShell. The layer ARN is stored in Parameter Store.
S3 bucket to store CSV files.
Lambda function triggered by S3 upload.
Custom EventBridge event bus and rule to send events to CloudWatch Logs.

Testing the event-driven application

Use the AWS CLI or AWS Tools for PowerShell to copy the sample CSV file to S3. Replace BUCKET_NAME with your S3 SourceBucket Name from the AWS SAM outputs.

AWS CLI

aws s3 cp .\test.csv s3://BUCKET_NAME

AWS Tools for PowerShell

Write-S3Object -BucketName BUCKET_NAME -File .\test.csv

The S3 file copy action generates an S3 notification event. This invokes the PowerShell Lambda function, passing the S3 file location details as part of the function $LambdaInput event object.

The function downloads the S3 CSV file, parses the contents, and sends the individual lines to EventBridge, which logs the events to CloudWatch Logs.

Navigate to the CloudWatch Logs group /aws/events/demo-s3-lambda-eventbridge.

You can see the individual orders logged from the CSV file.

EventBridge logs showing CSV lines

Conclusion

You can extend PowerShell Lambda applications to provide additional functionality.

This post shows how to import your own or vendor PowerShell modules and explains how to build Lambda layers for the AWS Tools for PowerShell.

You can also take advantage of the event-driven nature of Lambda to run Lambda functions in response to events. The demo application shows how a clothing printing company builds a PowerShell serverless application to manage its t-shirt size and color inventory.

See the accompanying GitHub repository, which contains the code for the custom runtime, along with additional installation options and additional examples.

Start running PowerShell on Lambda today.

For more serverless learning resources, visit Serverless Land.

Optimizing your AWS Lambda costs – Part 2

2022-06-02 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-your-aws-lambda-costs-part-2/

This post is written by Chris Williams, Solutions Architect and Thomas Moore, Solutions Architect, Serverless.

Part 1 of this blog series looks at optimizing AWS Lambda costs through right-sizing a function’s memory, and code tuning. We also explore how using Graviton2, Provisioned Concurrency and Compute Savings Plans can offer a reduction in per-millisecond billing.

Part 2 continues to explore cost optimization techniques for Lambda with a focus on architectural improvements and cost-effective logging.

Event filtering

A common serverless architecture pattern is Lambda reading events from a queue or a stream, such as Amazon SQS or Amazon Kinesis Data Streams. This uses an event source mapping, which defines how the Lambda service handles incoming messages or records from the event source.

Sometimes you don’t want to process every message in the queue or stream because the data is not relevant. For example, if IoT vehicle data is sent to a Kinesis Stream and you only want to process events where tire_pressure is < 32, then the Lambda code may look like this:

def lambda_handler(event, context):
   if(event[“tire_pressure”] >=32):
      return

# business logic goes here

This is inefficient as you are paying for Lambda invocations and execution time when there is no business value beyond filtering.

Lambda now supports the ability to filter messages before invocation, simplifying your code and reducing costs. You only pay for Lambda when the event matches the filter criteria and triggers an invocation.

Filtering is supported for Kinesis Streams, Amazon DynamoDB Streams and SQS by specifying filter criteria when setting up the event source mapping. For example, using the following AWS CLI command:

aws lambda create-event-source-mapping \
--function-name fleet-tire-pressure-evaluator \
--batch-size 100 \
--starting-position LATEST \
--event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/fleet-telemetry \
--filter-criteria '{"Filters": [{"Pattern": "{\"tire_pressure\": [{\"numeric\": [\"<\", 32]}]}"}]}'

After applying the filter, Lambda is only invoked when tire_pressure is less than 32 in messages received from the Kinesis Stream. In this example, it may indicate a problem with the vehicle and require attention.

For more information on how to create filters, refer to examples of event pattern rules in EventBridge, as Lambda filters messages in the same way. Event filtering is explored in greater detail in the Lambda event filtering launch blog.

Avoid idle wait time

Lambda function duration is one dimension used for calculating billing. When function code makes a blocking call, you are billed for the time that it waits to receive a response.

This idle wait time can grow when Lambda functions are chained together, or a function is acting as an orchestrator for other functions. For customers who have workflows such as batch operations or order delivery systems, this adds management overhead. Additionally, it may not be possible to complete all workflow logic and error handling within the maximum Lambda timeout of 15 minutes.

Instead of handling this logic in function code, re-architect your solution to use AWS Step Functions as an orchestrator of the workflow. When using a standard workflow, you are billed for each state transition within the workflow rather than the total duration of the workflow. In addition, you can move support for retries, wait conditions, error workflows and callbacks into the state condition allowing your Lambda functions to focus on business logic.

The following example shows an example Step Functions state machine, where a single Lambda function is split into multiple states. During the wait period, there is no charge. You are only billed on state transition.

Direct integrations

If a Lambda function is not performing custom logic when it integrates with other AWS services, it may be unnecessary and could be replaced by a lower-cost direct integration.

For example, you may be using API Gateway together with a Lambda function to read from a DynamoDB table:

This could be replaced using a direct integration, removing the Lambda function:

API Gateway supports transformations to return the output response in a format the client expects. This avoids having to use a Lambda function to do the transformation. You can find more detailed instructions on creating an API Gateway with an AWS service integration in the documentation.

You can also benefit from direct integration when using Step Functions. Today, Step Functions supports over 200 AWS services and 9,000 API actions. This gives greater flexibility for direct service integration and in many cases removes the need for a proxy Lambda function. This can simplify Step Function workflows and may reduce compute costs.

Reduce logging output

Lambda automatically stores logs that the function code generates through Amazon CloudWatch Logs. This may be useful for understanding what is happening within your application in near real-time. CloudWatch Logs includes a charge for the total data ingested throughout the month. Therefore, reducing output to include only necessary information can help reduce costs.

When you deploy workloads into production, review the logging level of your application. For example, in a pre-production environment, debug logs can be beneficial in providing additional information to tune the function. Within your production workloads, you may disable debug level logs and use a logging library (such as the Lambda Powertools Python Logger). You can define a minimum logging level to output by using an environment variable, which allows configuration outside of the function code.

Structuring your log format enforces a standard set of information through a defined schema, instead of allowing variable formats or large volumes of text. Defining structures such as error codes and adding accompanying metrics leads to a reduction in the volume of text that repeats throughout your logs. This also improves the ability to filter logs for specific error types and reduces the risk of a mistyped character in a log message.

Use Cost-Effective Storage for Logs

Once CloudWatch Logs ingests data, by default it is persisted forever with a per-GB monthly storage fee. As log data ages, it typically becomes less valuable in the immediate time frame, and is instead reviewed historically on an ad-hoc basis. However, the storage pricing within CloudWatch Logs remains the same.

To avoid this, set retention policies on your CloudWatch Logs log groups to delete old log data automatically. This retention policy applies to both your existing and future log data.

Some applications logs may need to persist for months or years for compliance or regulatory requirements. Instead of keeping the logs in CloudWatch Logs, export them to Amazon S3. By doing this, you can take advantage of lower-cost storage object classes while factoring in any expected usage patterns for how or when data is accessed.

Conclusion

Cost optimization is an important part of creating well-architected solutions and this is no different when using serverless. This blog series explores some best practice techniques to help reduce your Lambda bill.

If you are already running AWS Lambda applications in production today, some techniques are easier to implement than others. For example, you can purchase Savings Plans with zero code or architecture changes, whereas avoiding idle wait time will require new services and code changes. Evaluate which technique is right for your workload in a development environment before applying changes to production.

If you are still in the design and development stages, use this blog series as a reference to incorporate these cost optimization techniques at an early stage. This ensures that your solutions are optimized from day one.

To get hands-on implementing some of the techniques discussed, take the Serverless Optimization Workshop.

For more serverless learning resources, visit Serverless Land.

Optimizing your AWS Lambda costs – Part 1

2022-06-02 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-your-aws-lambda-costs-part-1/

This post is written by Chris Williams, Solutions Architect and Thomas Moore, Solutions Architect, Serverless.

When you develop and architect solutions, cost-optimization should always be part of the process. This is no different for serverless applications that have been developed using AWS Lambda.

Your workloads may vary in terms of complexity, usage patterns, and technology. However, the following advice is applicable to all customers when deciding how to prioritize cost optimization with the tradeoffs of developing the application:

Efficient code makes better use of resources.
Consider the downstream services in architectural decisions.
Optimization should be a continuous cycle of improvement.
Prioritize changes that make the greatest improvements first.

The Optimizing your AWS Lambda costs blog series reviews operational and architectural guidance. It can be applied to existing Lambda functions and those that you develop in the future.

Introduction to Lambda pricing

Lambda pricing is calculated as a combination of:

Total number of requests
Total duration of invocations
Configured memory allocated

When you optimize Lambda functions, each of these components impacts the total monthly cost. This pricing applies after you exceed the AWS Free Tier that is offered for Lambda.

Right-Sizing

Right-sizing is a good starting point in the process of cost optimization. This exercise helps to identify the lowest cost for applications without affecting performance or requiring code changes.

For Lambda, this is accomplished by configuring the memory for a function, ranging anywhere from 128 MB up to 10,240 MB (10 GB). By adjusting the memory configuration, you also adjust the amount of vCPU that is available to the function during invocation. Tuning these settings provides memory- or CPU-bound applications with access to additional resources during the execution, which may lead to an overall reduced duration of invocation.

Identifying the optimal configuration for your Lambda functions may be manually intensive, especially if changes are made frequently. The AWS Lambda Power Tuning tool provides a solution powered by AWS Step Functions that can help identify the appropriate configuration. It analyzes a set of memory configurations against an example payload.

In the following example, as the memory increases for this Lambda function, the total invocation time improves. This leads to a reduction in the cost for the total execution without affecting the original performance of the function. For this function, the optimal memory configuration for the function is 512 MB, as this is where the resource utilization is most efficient for the total cost of each invocation. This varies per function, and using the tool on your Lambda functions can identify if they benefit from right-sizing.

This exercise should be completed regularly, especially if code is released frequently. Once you have identified the appropriate memory setting for your Lambda functions, you should add right-sizing to your processes. The AWS Lambda Power Tuning tool generates programmatic output that can be used by your CI/CD workflows during the release of new code, allowing the automation of memory configuration.

Performance efficiency

A key component to Lambda pricing is the total duration of the invocation. The longer the function takes to run, the more it costs and the higher the latency in your application. For this reason, it is important to ensure the code you write is as efficient as possible and follows the Lambda best practices.

At a high level, to optimize code :

Minimize deployment package size to its runtime necessities. This reduces the amount of time it takes for the package to be downloaded and unpacked.
Minimize the complexity of dependencies. Simpler frameworks often load faster.
Take advantage of execution reuse. Initialize SDK clients and database connections outside the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations can reuse open connections and resources in memory and in /tmp.
Follow general coding performance best practices for your chosen language and runtime.

To help visualize the components of your application and identify performance bottlenecks, use AWS X-Ray with Lambda. You can enable X-Ray active tracing on new and existing functions by editing the function configuration. For example, with the AWS CLI:

aws lambda update-function-configuration --function-name my-function \
--tracing-config Mode=Active

The AWS X-Ray SDK can be used to trace all AWS SDK calls inside Lambda functions. This helps to identify any bottlenecks in the application performance. The X-Ray SDK for Python can be used to capture data for other libraries such as requests, sqlite3, and httplib, as shown in the following example:

Amazon CodeGuru Profiler is another tool that can help with code optimization. It uses machines learning algorithms to help find the most expensive lines of code and suggests ways to improve efficiency. It can be enabled on existing Lambda functions by enabling code profiling in the function configuration.

The CodeGuru console shows the results as a series of visualizations and recommendations.

Use these tools together with the documented best practices to evaluate your code’s performance when developing your serverless applications. Efficient code often means faster applications and reduced costs.

AWS Graviton2

In September 2021, Lambda functions powered by Arm-based AWS Graviton2 processors became generally available. Graviton2 functions are designed to deliver up to 19% better performance at 20% lower cost than x86. In addition to the lower billing cost when using Arm, you could also see a reduction in the function duration due to the CPU performance improvement, reducing costs even further.

You can configure both new and existing functions to target the AWS Graviton2 processor. Functions are invoked in the same way and integrations with services, applications and tools are not affected by the architecture change. Many functions only need the configuration change to take advantage of the price/performance of Graviton2. Others may require repackaging to use Arm-specific dependencies.

It’s always recommended to test your workloads before making the change. To see how much your code benefits from using Graviton2, use the Lambda Power Tuning tool to compare against x86. The tool allows you to compare two results on the same chart:

Provisioned concurrency

Where customers are looking to reduce their Lambda function cold starts or avoid burst throttling, provisioned concurrency provides execution environments that are ready to be invoked. It can also reduce total Lambda cost when there is a consistent volume of traffic. This is because the provisioned concurrency pricing model offers a lower total price when it is fully used.

Similar to the standard Lambda function pricing, there are price components for total requests, total duration, and memory configuration. In addition, there is the cost of each provisioned concurrency environment (based on its memory configuration). When this execution environment is fully utilized, the combined cost of the invocation and the execution environment can offer up to 16% savings on duration cost when compared to the regular on-demand pricing.

If you cannot maximize usage in an execution environment, provisioned concurrency can still offer a lower total price per invocation. In the following example, once it’s consumed for more than 60% of the available time, it becomes cheaper than using the on-demand pricing model. The savings increase in line with capacity usage.

To identify the invocation baseline of a Lambda function, look at the average concurrent execution metrics per hour over the previous 24 hours. This helps you to find a consistent baseline throughout the day where you are consistently using multiple execution environments.

For Lambda functions where peak invocation levels are only expected during particular windows of time, take advantage of a scheduled scaling action. Where traffic patterns are not easy to determine, you can implement Application Auto Scaling to adjust based on the current level of utilization.

Compute savings plans

AWS Savings Plans is a flexible pricing model offering lower prices compared to on-demand pricing, in exchange for a specific usage commitment (measured in $/hour) for a one- or three-year period.

Compute Savings Plans include Amazon EC2, AWS Fargate and Lambda. Lambda usage for duration and provisioned concurrency are charged at a discounted Savings Plans rate of up to 17% for a 1- or 3-year term.

You can implement Savings Plans without any function code or configuration changes. They can be a simpler way to save money for Lambda-based workloads. Before deciding to use a savings plan, analyze previous patterns to understand any variations in your month to month usage.

Conclusion

This blog post explains how Lambda pricing works and how right-sizing applications and tuning them for performance efficiency offers a more cost-efficient utilization model. The results can also reduce latency, creating a better experience for your end users.

The post explores how architectural changes such as moving to Graviton2 and configuring provisioned concurrency can provide cost reductions for the same operations. Finally, you can use Compute Savings Plans to add an additional cost reduction once you establish a baseline of usage per month.

Part 2 introduces further optimization opportunities for reducing Lambda invocations, moving to an asynchronous model, and reducing logging costs.

For more serverless learning resources, visit Serverless Land.

Monitoring and alerting break-glass access in an AWS Organization

2022-05-27 Haresh Nandwani

Post Syndicated from Haresh Nandwani original https://aws.amazon.com/blogs/architecture/monitoring-and-alerting-break-glass-access-in-an-aws-organization/

Organizations building enterprise-scale systems require the setup of a secure and governed landing zone to deploy and operate their systems. A landing zone is a starting point from which your organization can quickly launch and deploy workloads and applications with confidence in your security and infrastructure environment as described in What is a landing zone?. Nationwide Building Society (Nationwide) is the world’s largest building society. It is owned by its 16 million members and exists to serve their needs. The Society is one of the UK’s largest providers for mortgages, savings and current accounts, as well as being a major provider of ISAs, credit cards, personal loans, insurance, and investments.

For one of its business initiatives, Nationwide utilizes AWS Control Tower to build and operate their landing zone which provides a well-established pattern to set up and govern a secure, multi-account AWS environment. Nationwide operates in a highly regulated industry and our governance assurance requires adequate control of any privileged access to production line-of-business data or to resources which have access to them. We chose for this specific business initiative to deploy our landing zone using AWS Organizations, to benefit from ongoing account management and governance as aligned with AWS implementation best practices. We also utilized AWS Single Sign-On (AWS SSO) to create our workforce identities in AWS once and manage access centrally across our AWS Organization. In this blog, we describe the integrations required across AWS Control Tower and AWS SSO to implement a break-glass mechanism that makes access reporting publishable to system operators as well as to internal audit systems and processes. We will outline how we used AWS SSO for our setup as well as the three architecture options we considered, and why we went with the chosen solution.

Sourcing AWS SSO access data for near real-time monitoring

In our setup, we have multiple AWS Accounts and multiple trails on each of these accounts. Users will regularly navigate across multiple accounts as they operate our infrastructure, and their journeys are marked across these multiple trails. Typically, AWS CloudTrail would be our chosen resource to clearly and unambiguously identify account or data access. The key challenge in this scenario was to design an efficient and cost-effective solution to scan these trails to help identify and report on break-glass user access to account and production data. To address this challenge, we developed the following two architecture design options.

Option 1: A decentralized approach that uses AWS CloudFormation StackSets, Amazon EventBridge and AWS Lambda

Our solution entailed a decentralized approach by deploying a CloudFormation StackSet to create, update, or delete stacks across multiple accounts and AWS Regions with a single operation. The Stackset created Amazon EventBridge rules and target AWS Lambda functions. These functions post to EventBridge in our audit account. Our audit account has a set of Lambda functions running off EventBridge to initiate specific events, format the event message and post to Slack, our centralized communication platform for this implementation. Figure 1 depicts the overall architecture for this option.

Figure 1. De-centralized logging using Amazon EventBridge and AWS Lambda

Option 2: Use an organization trail in the Organization Management account

This option uses the centralized organization trail in the Organization Management account to source audit data. Details of how to create an organization trail can be found in the AWS CloudTrail User Guide. CloudTrail was configured to send log events to CloudWatch Logs. These events are then sent via Lambda functions to Slack using webhooks. We used a public terraform module in this GitHub repository to build this Lambda Slack integration. Figure 2 depicts the overall architecture for this option.

Figure 2. Centralized logging pattern using Amazon CloudWatch

This was our preferred option and is the one we finally implemented.

We also evaluated a third option which was to use centralized logging and auditing feature enabled by Control Tower. Users authenticate and federate to target accounts from a central location so it seemed possible to source this info from the centralized logs. These log events arrive as .gz compressed json objects, which meant having to expand these archives repeatedly for inspection. We therefore decided against this option.

A centralized, economic, extensible solution to alert of SSO break-glass

Our requirement was to identify break-glass access across any of the access mechanisms supported by AWS, including CLI and User Portal access. To ensure we have comprehensive coverage across all access mechanisms, we identified all the events initiated for each access mechanism:

User Portal/AWS Console access events
- Authenticate
- ListApplications
- ListApplicationProfiles
- Federate – this event contains the role that the user is federating into
CLI access events
- CreateToken
- ListAccounts
- ListAccountRoles
- GetRoleCredentials – this event contains the role that the user is federating into

EventBridge is able to initiate actions after events only when the event is trying to perform changes (when the “readOnly” attribute on the event record body equals “false”).

The AWS support team was aware of this attribute and recommended that we, change the data flow we were using to one able to initiate actions after any kind of event, regardless of the value on its readOnly attribute. The solution in our case was to send the CloudTrail logs to CloudWatch Logs. This then and initiates the Lambda function through a filter subscription that detects the desired event names on the log content.

The filter used is as follows:

{($.eventSource = sso.amazonaws.com) && ($.eventName = Federate||$.eventName = GetRoleCredentials)}

Due to the query size in the CloudWatch Log queries we had to remove the subscription filters and do the parsing of the content of the log lines inside the lambda function. In order to determine what accounts would initiate the notifications, we sent the list of accounts and roles to it as an environment variable at runtime.

Considerations with cross-account SSO access

With direct federation users get an access token. This is most obvious in AWS single sign on at the chiclet page as “Command line or programmatic access”. SSO tokens have a limited lifetime (we use the default 1-hour). A user does not have to get a new token to access a target resource until the one they are using is expired. This means that a user may repeatedly access a target account using the same token during its lifetime. Although the token is made available at the chiclet page, the GetRoleCredentials event does not occur until it is used to authenticate an API call to the target AWS account.

Conclusion

In this blog, we discussed how AWS Control Tower and AWS Single Sign-on enabled Nationwide to build and govern a secure, multi-account AWS environment for one of their business initiatives and centralize access management across our implementation. The integration was important for us to accurately and comprehensively identify and audit break-glass access for our implementation. As a result, we were able to satisfy our security and compliance audit requirements for privileged access to our AWS accounts.

Use the AWS Toolkit for Azure DevOps to automate your deployments to AWS

2022-05-24 Mahmoud Abid

Post Syndicated from Mahmoud Abid original https://aws.amazon.com/blogs/devops/use-the-aws-toolkit-for-azure-devops-to-automate-your-deployments-to-aws/

Many developers today seek to improve productivity by finding better ways to collaborate, enhance code quality and automate repetitive tasks. We hear from some of our customers that they would like to leverage services such as AWS CloudFormation, AWS CodeBuild and other AWS Developer Tools to manage their AWS resources while continuing to use their existing CI/CD pipelines which they are familiar with. These services range from popular open-source solutions, such as Jenkins, to paid commercial solutions, such as Azure DevOps Server (formerly Team Foundation Server (TFS)).

In this post, I will walk you through an example to leverage the AWS Toolkit for Azure DevOps to deploy your Infrastructure as Code templates, i.e. AWS CloudFormation stacks, directly from your existing Azure DevOps build pipelines.

The AWS Toolkit for Azure DevOps is a free-to-use extension for hosted and on-premises Microsoft Azure DevOps that makes it easy to manage and deploy applications using AWS. It integrates with many AWS services, including Amazon S3, AWS CodeDeploy, AWS Lambda, AWS CloudFormation, Amazon SQS and others. It can also run commands using the AWS Tools for Windows PowerShell module as well as the AWS CLI.

Solution Overview

The solution described in this post consists of leveraging the AWS Toolkit for Azure DevOps to manage resources on AWS via Infrastructure as Code templates with AWS CloudFormation:

Figure 1. Solution high-level overview

Prerequisites and Assumptions

You will need to go through three main steps in order to set up your environment, which are summarized here and detailed in the toolkit’s user guide:

Install the toolkit into your Azure DevOps account or choose Download to install it on an on-premises server (Figure 2).
Create an IAM User and download its keys. Keep the principle of least privilege in mind when associating the policy to your user.
Create a Service Connection for your project in Azure DevOps. Service connections are how the Azure DevOps tooling manages connecting and providing access to Azure resources. The AWS Toolkit also provides a user interface to configure the AWS credentials used by the service connection (Figure 3).

In addition to the above steps, you will need a sample AWS CloudFormation template to use for testing the deployment such as this sample template creating an EC2 instance. You can find more samples in the Sample Templates page or get started with authoring your own templates.

Figure 2. AWS Toolkit for Azure DevOps in the Visual Studio Marketplace

Figure 3. A new Service Connection of type “AWS” will appear after installing the extension

Model your CI/CD Pipeline to Automate Your Deployments on AWS

One common DevOps model is to have a CI/CD pipeline that deploys an application stack from one environment to another. This model typically includes a Development (or integration) account first, then Staging and finally a Production environment. Let me show you how to make some changes to the service connection configuration to apply this CI/CD model to an Azure DevOps pipeline.

We will create one service connection per AWS account we want to deploy resources to. Figure 4 illustrates the updated solution to showcase multiple AWS Accounts used within the same Azure DevOps pipeline.

Figure 4. Solution overview with multiple target AWS accounts

Each service connection will be configured to use a single, target AWS account. This can be done in two ways:

Create an IAM User for every AWS target account and supply the access key ID and secret access key for that user.
Alternatively, create one central IAM User and have it assume an IAM Role for every AWS deployment target. The AWS Toolkit extension enables you to select an IAM Role to assume. This IAM Role can be in the same AWS account as the IAM User or in a different accounts as depicted in Figure 5.

Figure 5. Use a single IAM User to access all other accounts

Define Your Pipeline Tasks

Once a service connection for your AWS Account is created, you can now add a task to your pipeline that references the service connection created in the previous step. In the example below, I use the CloudFormation Create/Update Stack task to deploy a CloudFormation stack using a template file named my-aws-cloudformation-template.yml:

- task: CloudFormationCreateOrUpdateStack@1
  displayName: 'Create/Update Stack: Development-Deployment'
  inputs:
    awsCredentials: 'development-account'
    regionName:     'eu-central-1'
    stackName:      'my-stack-name'
    useChangeSet:   true
    changeSetName:  'my-stack-name-change-set'
    templateFile:   'my-aws-cloudformation-template.yml'
    templateParametersFile: 'development/parameters.json'
    captureStackOutputs: asVariables
    captureAsSecuredVars: false

I used the service connection that I’ve called development-account and specified the other required information such as the templateFile path for the AWS CloudFormation template. I also specified the optional templateParametersFile path because I used template parameters in my template.

A template parameters file is particularly useful if you need to use custom values in your CloudFormation templates that are different for each stack. This is a common case when deploying the same application stack to different environments (Development, Staging, and Production).

The task below will to deploy the same template to a Staging environment:

- task: CloudFormationCreateOrUpdateStack@1
  displayName: 'Create/Update Stack: Staging-Deployment'
  inputs:
    awsCredentials: 'staging-account'
    regionName:     'eu-central-1'
    stackName:      'my-stack-name'
    useChangeSet:   true
    changeSetName:  'my-stack-name-changeset'
    templateFile:   'my-aws-cloudformation-template.yml'
    templateParametersFile: 'staging/parameters.json'
    captureStackOutputs: asVariables
    captureAsSecuredVars: false

The differences between Development and Staging deployment tasks are the service connection name and template parameters file path used. Remember that each service connection points to a different AWS account and the corresponding parameter values are specific to the target environment.

Use Azure DevOps Parameters to Switch Between Your AWS Accounts

Azure DevOps lets you define reusable contents via pipeline templates and pass different variable values to them when defining the build tasks. You can leverage this functionality so that you easily replicate your deployment steps to your different environments.

In the pipeline template snippet below, I use three template parameters that are passed as input to my task definition:

# File pipeline-templates/my-application.yml

parameters:
  deploymentEnvironment: ''         # development, staging, production, etc
  awsCredentials:        ''         # service connection name
  region:                ''         # the AWS region

steps:

- task: CloudFormationCreateOrUpdateStack@1
  displayName: 'Create/Update Stack: Staging-Deployment'
  inputs:
    awsCredentials: '${{ parameters.awsCredentials }}'
    regionName:     '${{ parameters.region }}'
    stackName:      'my-stack-name'
    useChangeSet:   true
    changeSetName:  'my-stack-name-changeset'
    templateFile:   'my-aws-cloudformation-template.yml'
    templateParametersFile: '${{ parameters.deploymentEnvironment }}/parameters.json'
    captureStackOutputs: asVariables
    captureAsSecuredVars: false

This template can then be used when defining your pipeline with steps to deploy to the Development and Staging environments. The values passed to the parameters will control the target AWS Account the CloudFormation stack will be deployed to :

# File development/pipeline.yml

container: amazon/aws-cli

trigger:
  branches:
    include:
    - master
    
steps:
- template: ../pipeline-templates/my-application.yml  
  parameters:
    deploymentEnvironment: 'development'
    awsCredentials:        'deployment-development'
    region:                'eu-central-1'
    
- template: ../pipeline-templates/my-application.yml  
  parameters:
    deploymentEnvironment: 'staging'
    awsCredentials:        'deployment-staging'
    region:                'eu-central-1'

Putting it All Together

In the snippet examples below, I defined an Azure DevOps pipeline template that builds a Docker image, pushes it to Amazon ECR (using the ECR Push Task) , creates/updates a stack from an AWS CloudFormation template with a template parameter files, and finally runs a AWS CLI command to list all Load Balancers using the AWS CLI Task.

The template below can be reused across different AWS accounts by simply switching the value of the defined parameters as described in the previous section.

Define a template containing your AWS deployment steps:

# File pipeline-templates/my-application.yml

parameters:
  deploymentEnvironment: ''         # development, staging, production, etc
  awsCredentials:        ''         # service connection name
  region:                ''         # the AWS region

steps:

# Build a Docker image
  - task: Docker@1
    displayName: 'Build docker image'
    inputs:
      dockerfile: 'Dockerfile'
      imageName: 'my-application:${{parameters.deploymentEnvironment}}'

# Push Docker Image to Amazon ECR
  - task: ECRPushImage@1
    displayName: 'Push image to ECR'
    inputs:
      awsCredentials: '${{ parameters.awsCredentials }}'
      regionName:     '${{ parameters.region }}'
      sourceImageName: 'my-application'
      repositoryName: 'my-application'
  
# Deploy AWS CloudFormation Stack
- task: CloudFormationCreateOrUpdateStack@1
  displayName: 'Create/Update Stack: My Application Deployment'
  inputs:
    awsCredentials: '${{ parameters.awsCredentials }}'
    regionName:     '${{ parameters.region }}'
    stackName:      'my-application'
    useChangeSet:   true
    changeSetName:  'my-application-changeset'
    templateFile:   'cfn-templates/my-application-template.yml'
    templateParametersFile: '${{ parameters.deploymentEnvironment }}/my-application-parameters.json'
    captureStackOutputs: asVariables
    captureAsSecuredVars: false
         
# Use AWS CLI to perform commands, e.g. list Load Balancers 
 - task: AWSShellScript@1
    displayName: 'AWS CLI: List Elastic Load Balancers'
    inputs:
    awsCredentials: '${{ parameters.awsCredentials }}'
    regionName:     '${{ parameters.region }}'
    scriptType:     'inline'
    inlineScript:   'aws elbv2 describe-load-balancers'

Define a pipeline file for deploying to the Development account:

# File development/azure-pipelines.yml

container: amazon/aws-cli

variables:
- name:  deploymentEnvironment
  value: 'development'
- name:  awsCredentials
  value: 'deployment-development'
- name:  region
  value: 'eu-central-1'  

trigger:
  branches:
    include:
    - master
    - dev
  paths:
    include:
    - "${{ variables.deploymentEnvironment }}/*"  
    
steps:
- template: ../pipeline-templates/my-application.yml  
  parameters:
    deploymentEnvironment: ${{ variables.deploymentEnvironment }}
    awsCredentials:        ${{ variables.awsCredentials }}
    region:                ${{ variables.region }}

(Optionally) Define a pipeline file for deploying to the Staging and Production accounts

<p># File staging/azure-pipelines.yml</p>
container: amazon/aws-cli

variables:
- name:  deploymentEnvironment
  value: 'staging'
- name:  awsCredentials
  value: 'deployment-staging'
- name:  region
  value: 'eu-central-1'  

trigger:
  branches:
    include:
    - master
  paths:
    include:
    - "${{ variables.deploymentEnvironment }}/*"  
    
    
steps:
- template: ../pipeline-templates/my-application.yml  
  parameters:
    deploymentEnvironment: ${{ variables.deploymentEnvironment }}
    awsCredentials:        ${{ variables.awsCredentials }}
    region:                ${{ variables.region }}

# File production/azure-pipelines.yml

container: amazon/aws-cli

variables:
- name:  deploymentEnvironment
  value: 'production'
- name:  awsCredentials
  value: 'deployment-production'
- name:  region
  value: 'eu-central-1'  

trigger:
  branches:
    include:
    - master
  paths:
    include:
    - "${{ variables.deploymentEnvironment }}/*"  
    
    
steps:
- template: ../pipeline-templates/my-application.yml  
  parameters:
    deploymentEnvironment: ${{ variables.deploymentEnvironment }}
    awsCredentials:        ${{ variables.awsCredentials }}
    region:                ${{ variables.region }}

Cleanup

After you have tested and verified your pipeline, you should remove any unused resources by deleting the CloudFormation stacks to avoid unintended account charges. You can delete the stack manually from the AWS Console or use your Azure DevOps pipeline by adding a CloudFormationDeleteStack task:

- task: CloudFormationDeleteStack@1
  displayName: 'Delete Stack: My Application Deployment'
  inputs:
    awsCredentials: '${{ parameters.awsCredentials }}'
    regionName:     '${{ parameters.region }}'
    stackName:      'my-application'

Conclusion

In this post, I showed you how you can easily leverage the AWS Toolkit for AzureDevOps extension to deploy resources to your AWS account from Azure DevOps and Azure DevOps Server. The story does not end here. This extension integrates directly with others services as well, making it easy to build your pipelines around them:

AWSCLI – Interact with the AWSCLI (Windows hosts only)
AWS Powershell Module – Interact with AWS through powershell (Windows hosts only)
Beanstalk – Deploy ElasticBeanstalk applications
CodeDeploy – Deploy with CodeDeploy
CloudFormation – Create/Delete/Update CloudFormation stacks
ECR – Push an image to an ECR repository
Lambda – Deploy from S3, .net core applications, or any other language that builds on Azure DevOps
S3 – Upload/Download to/from S3 buckets
Secrets Manager – Create and retrieve secrets
SQS – Send SQS messages
SNS – Send SNS messages
Systems manager – Get/set parameters and run commands

The toolkit is an open-source project available in GitHub. We’d love to see your issues, feature requests, code reviews, pull requests, or any positive contribution coming up.

Author:

Disaster recovery with AWS managed services, Part 2: Multi-Region/backup and restore

2022-05-23 Dhruv Bakshi

Post Syndicated from Dhruv Bakshi original https://aws.amazon.com/blogs/architecture/disaster-recovery-with-aws-managed-services-part-ii-multi-region-backup-and-restore/

In part I of this series, we introduced a disaster recovery (DR) concept that uses managed services through a single AWS Region strategy. In part two, we introduce a multi-Region backup and restore approach. With this approach, you can deploy a DR solution in multiple Regions, but it will be associated with longer RPO/RTO. Using a backup and restore strategy will safeguard applications and data against large-scale events as a cost-effective solution, but will result in longer downtimes and greater loss of data in the event of a disaster as compared to other strategies as shown in Figure 1.

Figure 1. DR Strategies

Implementing the multi-Region/backup and restore strategy

Using multiple Regions ensures resiliency in the most serious, widespread outages. A secondary Region protects workloads against being unable to run within a given Region, because they are wide and geographically dispersed.

Architecture overview

The application diagram presented in Figures 2.1 and 2.2 refers to an application that processes payment transactions, which was modernized to utilize managed services in the AWS Cloud. In this post, we’ll show you which AWS services it uses and how they work to maintain multi-Region/backup and restore strategy.

These figures show how to successfully implement the backup and restore strategy and successfully fail over your workload. The following sections list the components of the example application presented in the figures, which works as follows:

Amazon Route 53 health checks monitor application endpoints
If the Route 53 health check fails, an Amazon CloudWatch alarm prompts an Amazon Simple Notification Service (Amazon SNS) topic
This SNS topic invokes an AWS Lambda function, which will invoke the infrastructure pipeline to initiate cluster provision in the secondary Region.

Figure 2.1. Multi-Region backup

Figure 2.2. Multi-Region restore

Route 53

Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources. Health checks are necessary for configuring DNS failover within Route 53. Once an application or resource becomes unhealthy, you’ll need to initiate a manual failover process to create resources in the secondary Region. In our architecture, we use CloudWatch alarms to automate notifications of changes in health status.

Please check out the Creating Disaster Recovery Mechanisms Using Amazon Route 53 blog post for additional DR mechanisms using Amazon Route 53.

Amazon EKS control plane

Amazon Elastic Kubernetes Service (Amazon EKS) automatically scales control plane instances based on load, automatically detects and replaces unhealthy control plane instances, and restarts them across the Availability Zones within the Region as needed. Because on-demand clusters are provisioned in the secondary Region, AWS also manages the control plane the same way.

Amazon EKS data plane

It is a best practice to create worker nodes using Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling groups instead of creating individual EC2 instances and joining them to the cluster. This is because Amazon EC2 Auto Scaling groups automatically replace any terminated or failed nodes, which ensures that the cluster always has the capacity to run your workload.

The Amazon EKS control plane and data plane will be created on demand in the secondary Region during an outage via Infrastructure-as-a-Code (IaaC) such as AWS CloudFormation, Terraform, etc. You should pre-stage all networking requirements like virtual private cloud (VPC), subnets, route tables, gateways and deploy the Amazon EKS cluster during an outage in the primary Region.

As shown in the Backup and restore your Amazon EKS cluster resources using Velero blog post, you may use a third-party tool like Velero for managing snapshots of persistent volumes. These snapshots can be stored in an Amazon Simple Storage Service (Amazon S3) bucket in the primary Region, which will be replicated to an S3 bucket in another Region via cross-Region replication.

During an outage in the primary Region, you can use the tool in the secondary Region to restore volumes from snapshots in the standby cluster.

OpenSearch Service

For domains running Amazon OpenSearch Service, OpenSearch Service takes hourly automated snapshots and retains up to 336 for 14 days. These snapshots can only be used for cluster recovery within the same Region as the primary OpenSearch cluster.

You can use OpenSearch APIs to create a manual snapshot of an OpenSearch cluster, which can be stored in a registered repository like Amazon S3. You can do this manually or create a scheduled Lambda function based on their RPO, which prompts creation of a manual snapshot that will be stored in an S3 bucket. Amazon S3 cross-Region replication will then automatically and asynchronously copy objects across S3 buckets.

You can restore OpenSearch Service clusters by creating the cluster on demand via CloudFormation and using OpenSearch APIs to restore the snapshot from an S3 bucket.

Amazon RDS Postgres

Amazon Relational Database Service (Amazon RDS) can copy continuous backups cross-Region. You can configure your Amazon RDS database instance to replicate snapshots and transaction logs to a destination Region of your choice.

If a continuous backup rule also specifies a cross-account or cross-Region copy, AWS Backup takes a snapshot of the continuous backup, copies that snapshot to the destination vault, and then deletes the source snapshot. For continuous backup of Amazon RDS, AWS Backup creates a snapshot every 24 hours and stores transaction logs every 5 minutes in-Region. The Backup Frequency setting only applies to cross-Region backups of these continuous backups. Backup Frequency determines how often AWS Backup:

Creates a snapshot at that point in time from the existing snapshot plus all transaction logs up to that point
Copies snapshots to the other Region(s)
Deletes snapshots (because it only was created to be copied)

For more information, refer to the Point-in-time recovery and continuous backup for Amazon RDS with AWS Backup blog post.

ElastiCache

You can export and import backup and copy API calls for Amazon ElastiCache to develop a snapshot and restore strategy in a secondary Region. You can either prompt a manual backup and copy of that backup to S3 bucket or create a pair of Lambda functions to run at a schedule to meet the RPO requirements. The Lambda functions will prompt a manual backup, which creates a .rdb to an S3 bucket. Amazon S3 cross-Region replication will then handle asynchronous copy of the backup to an S3 bucket in a secondary Region.

You can use CloudFormation to create an ElastiCache cluster on demand and use CloudFormation properties such as SnapshotArns and SnapshotName to point to the desired ElastiCache backup stored in Amazon S3 to seed the cluster in the secondary Region.

Amazon Redshift

Amazon Redshift takes automatic, incremental snapshots of your data periodically and saves them to Amazon S3. Additionally, you can take manual snapshots of your data whenever you want.

To precisely control when snapshots are taken, you can create a snapshot schedule and attach it to one or more clusters. You can also configure cross-Region snapshot copy, which will automatically copy all your automated and manual snapshots to another Region.

During an outage, you can create the Amazon Redshift cluster on demand via CloudFormation and use CloudFormation properties such as SnapshotIdentifier to restore the new cluster from that snapshot.

Note: You can add an additional layer of protection to your backups through AWS Backup Vault Lock, S3 Object Lock, and Encrypted Backups.

Conclusion

With greater adoption of managed services within the cloud, there is a need to think of creative ways to implement a cost-effective DR solution. This backup and restore approach offered in this post will lower costs through more lenient RPO/RTO requirements, while providing a solution to utilize AWS managed services.

In the next post, we will discuss a multi-Region active/active strategy for the same application stack illustrated in this post.

Related information

Disaster Recovery (DR) Architecture on AWS, Part II: Backup and Restore with Rapid Recovery

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

AWS Week in Review – May 16, 2022

2022-05-16 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-week-in-review-may-16-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

I had been on the road for the last five weeks and attended many of the AWS Summits in Europe. It was great to talk to so many of you in person. The Serverless Developer Advocates are going around many of the AWS Summits with the Serverlesspresso booth. If you attend an event that has the booth, say “Hi ” to my colleagues, and have a coffee while asking all your serverless questions. You can find all the upcoming AWS Summits in the events section at the end of this post.

Last week’s launches
Here are some launches that got my attention during the previous week.

AWS Step Functions announced a new console experience to debug your state machine executions – Now you can opt-in to the new console experience of Step Functions, which makes it easier to analyze, debug, and optimize Standard Workflows. The new page allows you to inspect executions using three different views: graph, table, and event view, and add many new features to enhance the navigation and analysis of the executions. To learn about all the features and how to use them, read Ben’s blog post.

Example on how the Graph View looks

AWS Lambda now supports Node.js 16.x runtime – Now you can start using the Node.js 16 runtime when you create a new function or update your existing functions to use it. You can also use the new container image base that supports this runtime. To learn more about this launch, check Dan’s blog post.

AWS Amplify announces its Android library designed for Kotlin – The Amplify Android library has been rewritten for Kotlin, and now it is available in preview. This new library provides better debugging capacities and visibility into underlying state management. And it is also using the new AWS SDK for Kotlin that was released last year in preview. Read the What’s New post for more information.

Three new APIs for batch data retrieval in AWS IoT SiteWise – With this new launch AWS IoT SiteWise now supports batch data retrieval from multiple asset properties. The new APIs allow you to retrieve current values, historical values, and aggregated values. Read the What’s New post to learn how you can start using the new APIs.

AWS Secrets Manager now publishes secret usage metrics to Amazon CloudWatch – This launch is very useful to see the number of secrets in your account and set alarms for any unexpected increase or decrease in the number of secrets. Read the documentation on Monitoring Secrets Manager with Amazon CloudWatch for more information.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other launches and news that you may have missed:

IBM signed a deal with AWS to offer its software portfolio as a service on AWS. This allows customers using AWS to access IBM software for automation, data and artificial intelligence, and security that is built on Red Hat OpenShift Service on AWS.

Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish. This week’s episode introduces you to Amazon DynamoDB and shares stories on how different customers use this database service. You can listen to all the episodes directly from your favorite podcast app or the podcast web page.

AWS Open Source News and Updates – Ricardo Sueiras, my colleague from the AWS Developer Relation team, runs this newsletter. It brings you all the latest open-source projects, posts, and more. Read edition #112 here.

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

May 18, AWS Summit Tel Aviv (in-person)
May 18, AWS Summit ASEAN (online)
May 18–19, AWS Summit Australia and New Zealand (online)
May 18–19, AWS Summit Atlanta (in-person)
May 23–25, AWS Summit Washington, DC (in-person)

You can register for re:MARS to get fresh ideas on topics such as machine learning, automation, robotics, and space. The conference will be in person in Las Vegas, June 21–24.

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia

Use direct service integrations to optimize your architecture

2022-05-16 Jerome Van Der Linden

Post Syndicated from Jerome Van Der Linden original https://aws.amazon.com/blogs/architecture/use-direct-service-integrations-to-optimize-your-architecture/

When designing an application, you must integrate and combine several AWS services in the most optimized way for an effective and efficient architecture:

Optimize for performance by reducing the latency between services
Optimize for costs operability and sustainability, by avoiding unnecessary components and reducing workload footprint
Optimize for resiliency by removing potential point of failures
Optimize for security by minimizing the attack surface

As stated in the Serverless Application Lens of the Well-Architected Framework, “If your AWS Lambda function is not performing custom logic while integrating with other AWS services, chances are that it may be unnecessary.” In addition, Amazon API Gateway, AWS AppSync, AWS Step Functions, Amazon EventBridge, and Lambda Destinations can directly integrate with a number of services. These optimizations can offer you more value and less operational overhead.

This blog post will show how to optimize an architecture with direct integration.

Workflow example and initial architecture

Figure 1 shows a typical workflow for the creation of an online bank account. The customer fills out a registration form with personal information and adds a picture of their ID card. The application then validates ID and address, and scans if there is already an existing user by that name. If everything checks out, a backend application will be notified to create the account. Finally, the user is notified of successful completion.

Figure 1. Bank account application workflow

The workflow architecture is shown in Figure 2 (click on the picture to get full resolution).

Figure 2. Initial account creation architecture

This architecture contains 13 Lambda functions. If you look at the code on GitHub, you can see that:

Five of these Lambda functions are basic and perform simple operations:

One function to execute the Step Functions state machine
Two functions to perform create, read, update, and delete (CRUD) operations on Amazon DynamoDB
Two functions to send messages on Amazon Simple Queue Service (Amazon SQS) or EventBridge

Additional Lambda functions perform other tasks, such as verification and validation:

One function generates a presigned URL to upload ID card pictures to Amazon Simple Storage Service (Amazon S3)
One function uses the Amazon Textract API to extract information from the ID card
One function verifies the identity of the user against the information extracted from the ID card
One function performs simple HTTP request to a third-party API to validate the address

Finally, four functions concern the websocket (connect, message, and disconnect) and notifications to the user.

Opportunities for improvement

If you further analyze the code of the five basic functions (see startWorkflow on GitHub, for example), you will notice that there are actually three lines of fundamental code that start the workflow. The others 38 lines involve imports, input validation, error handling, logging, and tracing. Remember that all this code must be tested and maintained.

import os
import json
import boto3
from aws_lambda_powertools import Tracer
from aws_lambda_powertools import Logger
import re

logger = Logger()
tracer = Tracer()

sfn = boto3.client('stepfunctions')

PATTERN = re.compile(r"^arn:(aws[a-zA-Z-]*)?:states:[a-z]{2}((-gov)|(-iso(b?)))?-[a-z]+-\d{1}:\d{12}:stateMachine:[a-zA-Z0-9-_]+$")

if ('STATE_MACHINE_ARN' not in os.environ
    or os.environ['STATE_MACHINE_ARN'] is None
    or not PATTERN.match(os.environ['STATE_MACHINE_ARN'])):
    raise RuntimeError('STATE_MACHINE_ARN env var is not set or incorrect')

STATE_MACHINE_ARN = os.environ['STATE_MACHINE_ARN']

@logger.inject_lambda_context
@tracer.capture_lambda_handler
def handler(event, context):
    try:
        event['requestId'] = context.aws_request_id

        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps(event)
        )

        return {
            'requestId': event['requestId']
        }
    except Exception as error:
        logger.exception(error)
        raise RuntimeError('Internal Error - cannot start the creation workflow') from error

After running this workflow several times and reviewing the AWS X-Ray traces (Figure 3), we can see that it takes about 2–3 seconds when functions are warmed:

Figure 3. X-Ray traces when Lambda functions are warmed

But the process takes around 10 seconds with cold starts, as shown in Figure 4:

Figure 4. X-Ray traces when Lambda functions are cold

We use an asynchronous architecture to avoid waiting time for the user, as this can be a long process. We also use WebSockets to notify the user when it’s finished. This adds some complexity, new components, and additional costs to the architecture. Now let’s look at how we can optimize this architecture.

Improving the initial architecture

Direct integration with Step Functions

Step Functions can directly integrate with some AWS services, including DynamoDB, Amazon SQS, and EventBridge, and more than 10,000 APIs from 200+ AWS services. With these integrations, you can replace Lambda functions when they do not provide value. We recommend using Lambda functions to transform data, not to transport data from one service to another.

In our bank account creation use case, there are four Lambda functions we can replace with direct service integrations (see large arrows in Figure 5):

Query a DynamoDB table to search for a user
Send a message to an SQS queue when the extraction fails
Create the user in DynamoDB
Send an event on EventBridge to notify the backend

Figure 5. Lambda functions that can be replaced

It is not as clear that we need to replace the other Lambda functions. Here are some considerations:

To extract information from the ID card, we use Amazon Textract. It is available through the SDK integration in Step Functions. However, the API’s response provides too much information. We recommend using a library such as amazon-textract-response-parser to parse the result. For this, you’ll need a Lambda function.
The identity cross-check performs a simple comparison between the data provided in the web form and the one extracted in the ID card. We can perform this comparison in Step Functions using a Choice state and several conditions. If the business logic becomes more complex, consider using a Lambda function.
To validate the address, we query a third-party API. Step Functions cannot directly call a third-party HTTP endpoint, but because it’s integrated with API Gateway, we can create a proxy for this endpoint.

If you only need to retrieve data from an API or make a simple API call, use the direct integration. If you need to implement some logic, use a Lambda function.

Direct integration with API Gateway

API Gateway also provides service integrations. In particular, we can start the workflow without using a Lambda function. In the console, select the integration type “AWS Service”, the AWS service “Step Functions”, the action “StartExecution”, and “POST” method, as shown in Figure 6.

Figure 6. API Gateway direct integration with Step Functions

After that, use a mapping template in the integration request to define the parameters as shown here:

{
"stateMachineArn":"arn:aws:states:eu-central-1:123456789012:stateMachine: accountCreationWorkflow",
"input":"$util.escapeJavaScript($input.json('$'))"
}

We can go further and remove the websockets and associated Lambda functions connect, message, and disconnect. By using Synchronous Express Workflows and the StartSyncExecution API, we can start the workflow and wait for the result in a synchronous fashion. API Gateway will then directly return the result of the workflow to the client.

Final optimized architecture

After applying these optimizations, we have the updated architecture shown in Figure 7. It uses only two Lambda functions out of the initial 13. The rest have been replaced by direct service integrations or implemented in Step Functions.

Figure 7. Final optimized architecture

We were able to remove 11 Lambda functions and their associated fees. In this architecture, the cost is mainly driven by Step Functions, and the main price difference will be your use of Express Workflows instead of Standard Workflows. If you need to keep some Lambda functions, use AWS Lambda Power Tuning to configure your function correctly and benefit from the best price/performance ratio.

One of the main benefits of this architecture is performance. With the final workflow architecture, it now takes about 1.5 seconds when the Lambda function is warmed and 3 seconds on cold starts (versus up to 10 seconds previously), see Figure 8:

Figure 8. X-Ray traces for the final architecture

The process can now be synchronous. It reduces the complexity of the architecture and vastly improves the user experience.

An added benefit is that by reducing the overall complexity and removing the unnecessary Lambda functions, we have also reduced the risk of failures. These can be errors in the code, memory or timeout issues due to bad configuration, lack of permissions, network issues between components, and more. This increases the resiliency of the application and eases its maintenance.

Testing

Testability is an important consideration when building your workflow. Unit testing a Lambda function is straightforward, and you can use your preferred testing framework and validate methods. Adopting a hexagonal architecture also helps remove dependencies to the cloud.

When removing functions and using an approach with direct service integrations, you are by definition directly connected to the cloud. You still must verify that the overall process is working as expected, and validate these integrations.

You can achieve this kind of tests locally using Step Functions Local, and the recently announced Mocked Service Integrations. By mocking service integrations, for example, retrieving an item in DynamoDB, you can validate the different paths of your state machine.

You also have to perform integration tests, but this is true whether you use direct integrations or Lambda functions.

Conclusion

This post describes how to simplify your architecture and optimize for performance, resiliency, and cost by using direct integrations in Step Functions and API Gateway. Although many Lambda functions were reduced, some remain useful for handling more complex business logic and data transformation. Try this out now by visiting the GitHub repository.

For further reading:

Node.js 16.x runtime now available in AWS Lambda

2022-05-12 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/node-js-16-x-runtime-now-available-in-aws-lambda/

This post is written by Dan Fox, Principal Specialist Solutions Architect, Serverless.

You can now develop AWS Lambda functions using the Node.js 16 runtime. This version is in active LTS status and considered ready for general use. To use this new version, specify a runtime parameter value of nodejs16.x when creating or updating functions or by using the appropriate container base image.

The Node.js 16 runtime includes support for ES modules and top-level await that was added to the Node.js 14 runtime in January 2022. This is especially useful when used with Provisioned Concurrency to reduce cold start times.

This runtime version is supported by functions running on either Arm-based AWS Graviton2 processors or x86-based processors. Using the Graviton2 processor architecture option allows you to get up to 34% better price performance.

We recognize that customers have been waiting for some time for this runtime release. We hear your feedback and plan to release the next Node.js runtime version in a timelier manner.

AWS SDK for JavaScript

The Node.js 16 managed runtimes and container base images bundle the AWS JavaScript SDK version 2. Using the bundled SDK is convenient for a few use cases. For example, developers writing short functions via the Lambda console or inline functions via CloudFormation templates may find it useful to reference the bundled SDK.

In general, however, including the SDK in the function’s deployment package is good practice. Including the SDK pins a function to a specific minor version of the package, insulating code from SDK API or behavior changes. To include the SDK, refer to the SDK package and version in the dependency object of the package.json file. Use a package manager like npm or yarn to download and install the library locally before building and deploying your deployment package to the AWS Cloud.

Customers who take this route should consider using the JavaScript SDK, version 3. This version of the SDK contains modular packages. This can reduce the size of your deployment package, improving your function’s performance. Additionally, version 3 contains improved TypeScript compatibility, and using it will maximize compatibility with future runtime releases.

Language updates

With this release, Lambda customers can take advantage of new Node.js 16 language features, including:

Toolchain and compiler upgrades, including prebuilt binaries for Apple Silicon.
Stable timers promises API.
RegExp match indices.
Faster calls with argument size mismatch.
An update to V8 release 9.4

Prebuilt binaries for Apple Silicon

Node.js 16 is the first runtime release to ship prebuilt binaries for Apple Silicon. Customers using M1 processors in Apple computers may now develop Lambda functions locally using this runtime version.

Stable timers promises API

The timers promises API offers timer functions that return promise objects, improving the functionality for managing timers. This feature is available for both ES modules and CommonJS.

You may designate your function as an ES module by changing the file name extension of your handler file to .mjs, or by specifying “type” as “module” in the function’s package.json file. Learn more about using Node.js ES modules in AWS Lambda.


  // index.mjs
  
  import { setTimeout } from 'timers/promises';

  export async function handler() {
    await setTimeout(2000);
    return;
  }

RegExp match indices

The RegExp match indices feature allows developers to get an array of the start and end indices of the captured string in a regular expression. Use the “/d” flag in your regular expression to access this feature.

  // handler.js
  
  exports.lambdaHandler = async () => {
    const matcher = /(AWS )(Lambda)/d.exec('AWS Lambda');
    console.log("match: " + matcher.indices[0]) // 0,10
    console.log("first capture group: " + matcher.indices[1]) // 0,4
    console.log("second capture group: " + matcher.indices[2]) // 4,10
  }

Working with TypeScript

Many developers using Node.js runtimes in Lambda develop their code using TypeScript. To better support TypeScript developers, we have recently published new documentation on using TypeScript with Lambda, and added beta TypeScript support to the AWS SAM CLI.

We are also working on a TypeScript version of Lambda PowerTools. This is a suite of utilities for Lambda developers to simplify the adoption of best practices, such as tracing, structured logging, custom metrics, and more. Currently, AWS Lambda Powertools for TypeScript is in beta developer preview.

Runtime updates

To help keep Lambda functions secure, AWS will update Node.js 16 with all minor updates released by the Node.js community when using the zip archive format. For Lambda functions packaged as a container image, pull, rebuild, and deploy the latest base image from DockerHub or the Amazon ECR Public Gallery.

Amazon Linux 2

The Node.js 16 managed runtime, like Node.js 14, Java 11, and Python 3.9, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.

Conclusion

Lambda now supports Node.js 16. Get started building with Node.js 16 by specifying a runtime parameter value of nodejs16.x when creating your Lambda functions using the zip archive packaging format.

You can also build Lambda functions in Node.js 16 by deploying your function code as a container image using the Node.js 16 AWS base image for Lambda. You can read about the Node.js programming model in the AWS Lambda documentation to learn more about writing functions in Node.js 16.

For existing Node.js functions, review your code for compatibility with Node.js 16 including deprecations, then migrate to the new runtime by changing the function’s runtime configuration to nodejs16.x.

For more serverless learning resources, visit Serverless Land.

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 2

2022-05-09 Nick Choi

Post Syndicated from Nick Choi original https://aws.amazon.com/blogs/architecture/throttling-a-tiered-multi-tenant-rest-api-at-scale-using-api-gateway-part-2/

In Part 1 of this blog series, we demonstrated why tiering and throttling become necessary at scale for multi-tenant REST APIs, and explored tiering strategy and throttling with Amazon API Gateway.

In this post, Part 2, we will examine tenant isolation strategies at scale with API Gateway and extend the sample code from Part 1.

Enhancing the sample code

To enable this functionality in the sample code (Figure 1), we will make manual changes. First, create one API key for the Free Tier and five API keys for the Basic Tier. Currently, these API keys are private keys for your Amazon Cognito login, but we will make a further change in the backend business logic that will promote them to pooled resources. Note that all of these modifications are specific to this sample code’s implementation; the implementation and deployment of a production code may be completely different (Figure 1).

Figure 1. Cloud architecture of the sample code

Next, in the business logic for thecreateKey(), find the AWS Lambda function in lambda/create_key.js. It appears like this:

function createKey(tableName, key, plansTable, jwt, rand, callback) {
  const pool = getPoolForPlanId( key.planId ) 
  if (!pool) {
    createSiloedKey(tableName, key, plansTable, jwt, rand, callback);
  } else {
    createPooledKey(pool, tableName, key, jwt, callback);
  }
}

The getPoolForPlanId() function does a search for a pool of keys associated with the usage plan. If there is a pool, we “create” a kind of reference to the pooled resource, rather than a completely new key that is created by the API Gateway service directly. The lambda/api_key_pools.js should be empty.

exports.apiKeyPools = [];

In effect, all usage plans were considered as siloed keys up to now. To change that, populate the data structure with values from the six API keys that were created manually. You will have to look up the IDs of the API keys and usage plans that were created in API Gateway (Figures 2 and 3). Using the AWS console to navigate to API Gateway is the most intuitive.

Figure 2. A view of the AWS console when inspecting the ID for the Basic usage plan

Figure 3. A view of the AWS Console when looking up the API key value (not the ID)

When done, your code in lambda/api_key_pools.js should be the following, but instead of ellipses (…), the IDs for the user plans and API keys specific to your environment will appear.

exports.apiKeyPools = [{
    planName: "FreePlan"
    planId: "...",
    apiKeys: [ "..." ]
  },
 {
    planName: "BasicPlan"
    planId: "...",
    apiKeys: [ "...", "...", "...", "...", "..." ]
  }];

After making the code changes, run cdk deploy from the command line to update the Lambda functions. This change will only affect key creation and deletion because of the system implementation. Updates affect only the user’s specific reference to the key, not the underlying resource managed by API Gateway.

When the web application is run now, it will look similar to before—tenants should not be aware what tiering strategy they have been assigned to. The only way to notice the difference would be to create two Free Tier keys, test them, and note that the value of the X-API-KEY header is unchanged between the two.

Now, you have a virtually unlimited number of users who can have API keys in the Free or Basic Tier. By keeping the Premium Tier siloed, you are subject to the 10,000-API-key maximum (less any keys allocated for the lower tiers). You may consider additional techniques to continue to scale, such as replicating your service in another AWS account.

Other production considerations

The sample code is minimal, and it illustrates just one aspect of scaling a Software-as-a-service (SaaS) application. There are many other aspects be considered in a production setting that we explore in this section.

The throttled endpoint, GET /api rely only on API key for authorization for demonstration purpose. For any production implementation consider authentication options for your REST APIs. You may explore and extend to require authentication with Cognito similar to /admin/* endpoints in the sample code.

One API key for Free Tier access and five API keys for Basic Tier access are illustrative in a sample code but not representative of production deployments. Number of API keys with service quota into consideration, business and technical decisions may be made to minimize noisy neighbor effect such as setting blast radius upper threshold of 0.1% of all users. To satisfy that requirement, each tier would need to spread users across at least 1,000 API keys. The number of keys allocated to Basic or Premium Tier would depend on market needs and pricing strategies. Additional allocations of keys could be held in reserve for troubleshooting, QA, tenant migrations, and key retirement.

In the planning phase of your solution, you will decide how many tiers to provide, how many usage plans are needed, and what throttle limits and quotas to apply. These decisions depend on your architecture and business.

To define API request limits, examine the system API Gateway is protecting and what load it can sustain. For example, if your service will scale up to 1,000 requests per second, it is possible to implement three tiers with a 10/50/40 split: the lowest tier shares one common API key with a 100 request per second limit; an intermediate tier has a pool of 25 API keys with a limit of 20 requests per second each; and the highest tier has a maximum of 10 API keys, each supporting 40 requests per second.

Metrics play a large role in continuously evolving your SaaS-tiering strategy (Figure 4). They provide rich insights into how tenants are using the system. Tenant-aware and SaaS-wide metrics on throttling and quota limits can be used to: assess tiering in-place, if tenants’ requirements are being met, and if currently used tenant usage profiles are valid (Figure 5).

Figure 4. Tiering strategy example with 3 tiers and requests allocation per tier

Figure 5. An example SaaS metrics dashboard

API Gateway provides options for different levels of granularity required, including detailed metrics, and execution and access logging to enable observability of your SaaS solution. Granular usage metrics combined with underlying resource consumption leads to managing optimal experience for your tenants with throttling levels and policies per method and per client.

Cleanup

To avoid incurring future charges, delete the resources. This can be done on the command line by typing:

cd ${TOP}/cdk
cdk destroy

cd ${TOP}/react
amplify delete

${TOP} is the topmost directory of the sample code. For the most up-to-date information, see the README.md file.

Conclusion

In this two-part blog series, we have reviewed the best practices and challenges of effectively guarding a tiered multi-tenant REST API hosted in AWS API Gateway. We also explored how throttling policy and quota management can help you continuously evaluate the needs of your tenants and evolve your tiering strategy to protect your backend systems from being overwhelmed by inbound traffic.

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 1

2022-05-06 Nick Choi

Post Syndicated from Nick Choi original https://aws.amazon.com/blogs/architecture/throttling-a-tiered-multi-tenant-rest-api-at-scale-using-api-gateway-part-1/

Many software-as-a-service (SaaS) providers adopt throttling as a common technique to protect a distributed system from spikes of inbound traffic that might compromise reliability, reduce throughput, or increase operational cost. Multi-tenant SaaS systems have an additional concern of fairness; excessive traffic from one tenant needs to be selectively throttled without impacting the experience of other tenants. This is also known as “the noisy neighbor” problem. AWS itself enforces some combination of throttling and quota limits on nearly all its own service APIs. SaaS providers building on AWS should design and implement throttling strategies in all of their APIs as well.

In this two-part blog series, we will explore tiering and throttling strategies for multi-tenant REST APIs and review tenant isolation models with hands-on sample code. In part 1, we will look at why a tiering and throttling strategy is needed and show how Amazon API Gateway can help by showing sample code. In part 2, we will dive deeper into tenant isolation models as well as considerations for production.

We selected Amazon API Gateway for this architecture since it is a fully managed service that helps developers to create, publish, maintain, monitor, and secure APIs. First, let’s focus on how Amazon API Gateway can be used to throttle REST APIs with fine granularity using Usage Plans and API Keys. Usage Plans define the thresholds beyond which throttling should occur. They also enable quotas, which sets a maximum usage per a day, week, or month. API Keys are identifiers for distinguishing traffic and determining which Usage Plans to apply for each request. We limit the scope of our discussion to REST APIs because other protocols that API Gateway supports — WebSocket APIs and HTTP APIs — have different throttling mechanisms that do not employ Usage Plans or API Keys.

SaaS providers must balance minimizing cost to serve and providing consistent quality of service for all tenants. They also need to ensure one tenant’s activity does not affect the other tenants’ experience. Throttling and quotas are a key aspect of a tiering strategy and important for protecting your service at any scale. In practice, this impact of throttling polices and quota management is continuously monitored and evaluated as the tenant composition and behavior evolve over time.

Architecture Overview

Figure 1. Cloud Architecture of the sample code.

Figure 1 – Architecture of the sample code

To get a firm foundation of the basics of throttling and quotas with API Gateway, we’ve provided sample code in AWS-Samples on GitHub. Not only does it provide a starting point to experiment with Usage Plans and API Keys in the API Gateway, but we will modify this code later to address complexity that happens at scale. The sample code has two main parts: 1) a web frontend and, 2) a serverless backend. The backend is a serverless architecture using Amazon API Gateway, AWS Lambda, Amazon DynamoDB, and Amazon Cognito. As Figure I illustrates, it implements one REST API endpoint, GET /api, that is protected with throttling and quotas. There are additional APIs under the /admin/* resource to provide Read access to Usage Plans, and CRUD operations on API Keys.

All these REST endpoints could be tested with developer tools such as curl or Postman, but we’ve also provided a web application, to help you get started. The web application illustrates how tenants might interact with the SaaS application to browse different tiers of service, purchase API Keys, and test them. The web application is implemented in React and uses AWS Amplify CLI and SDKs.

Prerequisites

To deploy the sample code, you should have the following prerequisites:

AWS Account
AWS CLI
AWS CDK
Amplify CLI
An AWS CLI profile with permissions to deploy the architecture

For clarity, we’ll use the environment variable, ${TOP}, to indicate the top-most directory in the cloned source code or the top directory in the project when browsing through GitHub.

Detailed instructions on how to install the code are in ${TOP}/INSTALL.md file in the code. After installation, follow the ${TOP}/WALKTHROUGH.md for step-by-step instructions to create a test key with a very small quota limit of 10 requests per day, and use the client to hit that limit. Search for HTTP 429: Too Many Requests as the signal your client has been throttled.

Figure 2: The web application (with browser developer tools enabled) shows that a quick succession of API calls starts returning an HTTP 429 after the quota for the day is exceeded.

Responsibilities of the Client to support Throttling

The Client must provide an API Key in the header of the HTTP request, labelled, “X-Api-Key:”. If a resource in API Gateway has throttling enabled and that header is missing or invalid in the request, then API Gateway will reject the request.

Important: API Keys are simple identifiers, not authorization tokens or cryptographic keys. API keys are for throttling and managing quotas for tenants only and not suitable as a security mechanism. There are many ways to properly control access to a REST API in API Gateway, and we refer you to the AWS documentation for more details as that topic is beyond the scope of this post.

Clients should always test for the response to any network call, and implement logic specific to an HTTP 429 response. The correct action is almost always “try again later.” Just how much later, and how many times before giving up, is application dependent. Common approaches include:

Retry – With simple retry, client retries the request up to defined maximum retry limit configured
Exponential backoff – Exponential backoff uses progressively larger wait time between retries for consecutive errors. As the wait time can become very long quickly, maximum delay and a maximum retry limits should be specified.
Jitter – Jitter uses a random amount of delay between retry to prevent large bursts by spreading the request rate.

AWS SDK is an example client-responsibility implementation. Each AWS SDK implements automatic retry logic that uses a combination of retry, exponential backoff, jitter, and maximum retry limit.

SaaS Considerations: Tenant Isolation Strategies at Scale

While the sample code is a good start, the design has an implicit assumption that API Gateway will support as many API Keys as we have number of tenants. In fact, API Gateway has a quota on available per region per account. If the sample code’s requirements are to support more than 10,000 tenants (or if tenants are allowed multiple keys), then the sample implementation is not going to scale, and we need to consider more scalable implementation strategies.

This is one instance of a general challenge with SaaS called “tenant isolation strategies.” We highly recommend reviewing this white paper ‘SasS Tenant Isolation Strategies‘. A brief explanation here is that the one-resource-per-customer (or “siloed”) model is just one of many possible strategies to address tenant isolation. While the siloed model may be the easiest to implement and offers strong isolation, it offers no economy of scale, has high management complexity, and will quickly run into limits set by the underlying AWS Services. Other models besides siloed include pooling, and bridged models. Again, we recommend the whitepaper for more details.

Figure 3- Tiered multi-tenant architectures often employ different tenant isolation strategies at different tiers. Our example is specific to API Keys, but the technique generalizes to storage, compute, and other resources.

In this example, we implement a range of tenant isolation strategies at different tiers of service. This allows us to protect against “noisy-neighbors” at the highest tier, minimize outlay of limited resources (namely, API-Keys) at the lowest tier, and still provide an effective, bounded “blast radius” of noisy neighbors at the mid-tier.

A concrete development example helps illustrate how this can be implemented. Assume three tiers of service: Free, Basic, and Premium. One could create a single API Key that is a pooled resource among all tenants in the Free Tier. At the other extreme, each Premium customer would get their own unique API Key. They would protect Premium tier tenants from the ‘noisy neighbor’ effect. In the middle, the Basic tenants would be evenly distributed across a set of fixed keys. This is not complete isolation for each tenant, but the impact of any one tenant is contained within “blast radius” defined.

In production, we recommend a more nuanced approach with additional considerations for monitoring and automation to continuously evaluate tiering strategy. We will revisit these topics in greater detail after considering the sample code.

Conclusion

In this post, we have reviewed how to effectively guard a tiered multi-tenant REST API hosted in Amazon API Gateway. We also explored how tiering and throttling strategies can influence tenant isolation models. In Part 2 of this blog series, we will dive deeper into tenant isolation models and gaining insights with metrics.

If you’d like to know more about the topic, the AWS Well-Architected SaaS Lens Performance Efficiency pillar dives deep on tenant tiers and providing differentiated levels of performance to each tier. It also provides best practices and resources to help you design and reduce impact of noisy neighbors your SaaS solution.

To learn more about Serverless SaaS architectures in general, we recommend the AWS Serverless SaaS Workshop and the SaaS Factory Serverless SaaS reference solution that inspired it.

Let’s Architect! Serverless architecture on AWS

2022-05-04 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-serverless-architecture-on-aws/

Serverless architecture and computing allow you and your teams to focus on delivering business value in place of investing time tweaking the infrastructure characteristics. AWS is not only providing serverless computing as a service, but share that half of our new applications built by Amazon are using AWS Lambda, as noted by Andy Jassy in his 2020 re:Invent keynote.

In this post, we share insights into reimagining a serverless environment.

I Build Applications – Event-driven Architecture

Event-driven architecture is common in modern applications built with microservices, and it is the cornerstone for designing serverless workloads. It uses events to trigger and communicate between decoupled services.

With this video, you can learn how to start with a prototype then scale to mass adoption using decoupled systems that run when responding to, without needing to redesign. Danilo Poccia, Chief Evangelist at AWS, begins the session with the APIs, then gives an example on how to build an event-driven architecture using Amazon EventBridge. The session closes with how to understand what is happening in this exchange of events.

Event-driven communication with asynchronous invocation

Building modern cloud applications? Think integration

This re:Invent 2021 session explains modern cloud applications based on serverless or microservices, and how connections between components define important characteristics, like scalability, availability, and coupling.

How your systems are interconnected describes your system’s essential properties, such as resiliency and changeability. Gregor Hohpe, AWS Enterprise Strategist, shares tips on what to consider when integrating different services, such as lifecycle, level of control over the systems you are integrating, and how integration becomes an integral part of your software delivery cycle. The goal is to use the same method to integrate at the same speed as software deployment.

Integration approaches with Gregor Hohpe

Serverless architectural patterns and best practices

Serverless architectures require a mindset shift: existing patterns need to be revisited, and new patterns created using the new architecture style. For each pattern created by AWS, we provide operational, security, and reliability best practices and discuss potential challenges. We also demonstrate some patterns in reference architecture diagrams.

This session helps you identify services and applications to create serverless architectures and understand areas of potential savings, increased agility, and reliability in your organization. Heitor Lessa, Principal Solutions Architect at AWS, starts the session identifying the benefits of Lambda Power Tuning: he details setting up memory when there are hundreds of functions, then follows with best practices for the pattern created.

Best practices for serverless architecture

Best practices of advanced serverless developers

This session is an overview of architectural best practices, optimizations, and handy codes that can be used to build secure, scalable, and high-performance serverless applications.

Julian Wood, Senior Developer Advocate at AWS, provides the recommended practices for implementing serverless applications inside your company, such as Lambda, to transform and not transport, avoid monolithic services and functions, orchestrate workflow with step functions, choreograph events. Julian also touches on understanding different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Three types of AWS Lambda invocation models

Building next-gen applications with event-driven architectures

Maintaining data consistency across multiple services can be challenging. It can also be difficult to work with large amounts of data in different data stores and locations. Teams building microservices architectures often find that integration with other applications and external services can make their workloads more monolithic and tightly coupled.

In this session, you can learn how to use event-based architectures to decouple and decentralize application components. Coupling is not one-dimensional, and it’s a trade-off to balance and optimize over time. This video demonstrates patterns based on message queues and events: for each pattern you can learn the advantages, the disadvantages, and the options for building it on AWS.

Sam Dengler, Principal Solutions Architect at AWS, explains the mental models to apply while designing choreography and orchestration in a scenario with microservices. The strategy adopted by Taco Bell for identifying their bounded contexts is also detailed, as well as the architecture built on Lambda for running the business logic and on AWS Step Functions for orchestration.

Choreography and orchestration are two modes of interaction in a microservices architecture

See you next time!

Thanks for joining our discussion on serverless architecting! If you want to deep dive into the topic, read all about Serverless on AWS!

See you in a couple of weeks when we discuss architecting for resilience!

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Build a custom Java runtime for AWS Lambda

2022-04-26 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/build-a-custom-java-runtime-for-aws-lambda/

This post is written by Christian Müller, Principal AWS Solutions Architect and Maximilian Schellhorn, AWS Solutions Architect

When running applications on AWS Lambda, you have the option to use either one of the managed runtime versions that AWS provides or bring your own custom runtime. The following blog post provides a walkthrough of how you can create and optimize a custom runtime for Java based Lambda functions.

Builders might rely on customized or experimental runtime behavior when creating solutions in the cloud. The Java ecosystem fosters innovation and encourages experiments with the current six-month release schedule for the latest runtime versions.

However, Lambda focuses on providing stable long-term support (LTS) versions. The official Lambda runtimes are built around a combination of operating system, programming language, and software libraries that are subject to maintenance and security updates. For example, the Lambda runtime for Java supports the LTS versions Java 8 Corretto and Java 11 Corretto as of April 2022. The Java 17 Corretto version is pending. In addition, there is no provided runtime for non LTS versions like Java 15 Corretto, Java 16 Corretto, or Java 18 Corretto.

To use other language versions, Lambda allows you to create custom runtimes. Custom runtimes allow builders to provide and configure their own runtimes for running their application code. To enable communication between your custom runtime and Lambda, you can use the runtime interface client library in Java.

With the introduction of modular runtime images in Java 9 (JEP 220), it is possible to include only the Java runtime modules that your application depends on. This reduces the overall runtime size and increases performance, especially during cold-starts. In addition, there are other techniques in Java, like class data sharing and tiered compilation, which allow you to reduce the startup time of your application even further.

To combine those capabilities, this blog post provides an overview for creating and deploying a minified Java runtime on Lambda by using Java 18 Corretto. For step-by-step instructions and prerequisites, refer to the official GitHub example.

Overview of the example

In the following example, you build a custom runtime for a basic Java application that writes request headers to Amazon DynamoDB and is fronted by Amazon API Gateway.

The following diagram summarizes the steps to create the application and the custom runtime:

Download the preferred Java version and take advantage of jdeps, jlink and class data sharing to create a minified and optimized Java runtime based on the application code (function.jar).
Create a bootstrap file with optimized starting instructions for the application.
Package the application code, the optimized Java runtime, and the bootstrap file as a zip file.
Deploy the runtime, including the app, to Lambda. For example, using the AWS Cloud Development Kit (CDK)

Steps 1–3 are automated and abstracted via Docker. The following section provides a high-level walkthrough of the build and deployment process. For the full version, see the Dockerfile in the GitHub example.

Creating the optimized Java runtime

1. Download the desired Java version and copy the local application code to the Docker environment and build it with Maven:

FROM amazonlinux:2

...

# Update packages and install Amazon Corretto 18, Maven and Zip
RUN yum -y update
RUN yum install -y java-18-amazon-corretto-devel maven zip

...

# Copy the software folder to the image and build the function
COPY software software
WORKDIR /software/example-function
RUN mvn clean package

2. This step results in an uber-jar (function.jar) that you can use as an input argument for jdeps. The output is a file containing all the Java modules that the function depends on:

RUN jdeps -q \
    --ignore-missing-deps \
    --multi-release 18 \
    --print-module-deps \
    target/function.jar > jre-deps.info

3. Create an optimized Java runtime based on those application modules with jlink. Remove unnecessary information from the runtime, for example header files or man-pages:

RUN jlink --verbose \
    --compress 2 \
    --strip-java-debug-attributes \
    --no-header-files \
    --no-man-pages \
    --output /jre18-slim \
    --add-modules $(cat jre-deps.info)

4. This creates your own custom Java 18 runtime in the /jre18-slim folder. You can apply additional optimization techniques such as Class-Data-Sharing (CDS) to generate a classes.jsa file to accelerate the class loading time of the JVM.

RUN /jre18-slim/bin/java -Xshare:dump

Adding optimized starting instructions

You must tell the Lambda execution environment how to start the application. You can achieve that with a bootstrap file that includes the necessary instructions. In addition, you can define parameters to improve the performance further. For example, you could use tiered compilation and SerialGC.

The following snippet represents an example of a bootstrap file:

#!/bin/sh

$LAMBDA_TASK_ROOT/jre18-slim/bin/java \
    --add-opens java.base/java.util=ALL-UNNAMED \
    -XX:+TieredCompilation \
    -XX:TieredStopAtLevel=1 \
    -XX:+UseSerialGC \
    -jar function.jar "$_HANDLER"

Packaging the components

Combine the bootstrap file, the custom Java runtime, and the application code in a zip file for later use as the deployment package:

RUN zip -r runtime.zip \
    bootstrap \
    function.jar \
    /jre18-slim

The GitHub example provides a build.sh script to run the above-mentioned process via Docker. This results in a runtime.zip that you can then use as a deployment package.

Deploying the application with the custom runtime

To deploy the custom runtime, use AWS CDK. This allows you to define the needed infrastructure as code more easily in your favorite programming language.

The following code snippet shows how to create a Lambda function from a custom runtime:

Function customJava18Function = new Function(this, "LambdaCustomRuntimeJava18", FunctionProps.builder()
        .functionName("custom-runtime-java-18")
.handler("com.amazon.aws.example.ExampleDynamoDbHandler::handleRequest")
        .runtime(Runtime.PROVIDED_AL2)
        .code(Code.fromAsset("../runtime.zip"))
        .memorySize(512)
        .environment(Map.of("TABLE_NAME", exampleTable.getTableName()))
        .timeout(Duration.seconds(20))
        .logRetention(RetentionDays.ONE_WEEK)
        .build());

To deploy the application and output the necessary API Gateway URL to invoke the Lambda function, use the following command or use the provided provision_infrastructure.sh script:

cdk deploy --outputs-file target/outputs.json

Testing the application and validating the example results

After deployment, you can load test the application with the open-source software project Artillery.

The following command creates 120 concurrent invocations of the Lambda function for a duration of 60 seconds. It uses the API Gateway URL that is exported after the AWS CDK successfully deployed the application:

artillery run -t $(cat infrastructure/target/outputs.json | jq -r '.LambdaCustomRuntimeMinimalJRE18InfrastructureStack.apiendpoint') -v '{ "url": "/custom-runtime" }' infrastructure/loadtest.yml

Use CloudWatch Log Insights to query the Lambda logs and gather information about the cold start (initDuration) and duration percentiles:

filter @type = "REPORT"
    | parse @log /\d+:\/aws\/lambda\/(?<function>.*)/
    | stats
    count(*) as invocations,
    pct(@duration+coalesce(@initDuration,0), 0) as p0,
    pct(@duration+coalesce(@initDuration,0), 25) as p25,
    pct(@duration+coalesce(@initDuration,0), 50) as p50,
    pct(@duration+coalesce(@initDuration,0), 75) as p75,
    pct(@duration+coalesce(@initDuration,0), 90) as p90,
    pct(@duration+coalesce(@initDuration,0), 95) as p95,
    pct(@duration+coalesce(@initDuration,0), 99) as p99,
    pct(@duration+coalesce(@initDuration,0), 100) as p100
    group by function, ispresent(@initDuration) as coldstart
    | sort by coldstart, function

The results provide an indication of how your application performs with the custom runtime. This is especially helpful when comparing different versions.

Invocation time (@duration) for both cold and warm starts plus function initialization time (@initDuration) if it is a cold start:

Function initialization time (@initDuration) only:

Conclusion

In this blog post, you learn how to create your own optimized Java runtime for AWS Lambda by using a variety of Java optimization techniques. This allows you to tailor your Java runtime to your application needs.

See the full example on GitHub and make use of your own preferred Java version. Add additional optimization steps in the Dockerfile or tune the parameters in the bootstrap file to optimize the start of the Java virtual machine.

In case you want to re-use your custom runtime in multiple Lambda functions, you can also distribute it via a Lambda layer.

For more serverless learning resources, visit Serverless Land.

Let’s Architect! Using open-source technologies on AWS

2022-04-20 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-using-open-source-technologies-on-aws/

With open-source technology, authors make software available to the public, who can view, use, or change it and add new features or support new capabilities. Open-source technology promotes collaboration across different teams, organizations, and people because the process often includes different perspectives and ideas, which typically results a stronger solution.

It can be difficult to create a multi-use solution when building to solve for a specific challenge. With an open-source project or an initiative, multiple teams work together, which prevents coupling and makes the solution easier to generalize.

In this edition of Let’s Architect!, we show you some open-source technologies built with AWS and options for running well-known, open-source projects on AWS.

Firecracker: Secure and Fast microVMs for Serverless Computing

Firecracker was developed at AWS to improve the customer experience of services like AWS Lambda and AWS Fargate. This technology is used to deploy workloads in lightweight virtual machines (VMs), called microVMs. For example, when a new Lambda function is triggered in response to an event, AWS Lambda provisions a microVM (if none already exists) to handle the request. Behind the scenes, this is powered by Firecracker.

This video introduces Firecracker and the concept of virtual machine monitor as a technology to create and manage microVMs. This talk explains Firecracker’s foundation, the minimal device model, and how it interacts with various containers. You’ll learn about the performance, security, and utilization improvements enabled by Firecracker and how Firecracker is used for Lambda and Fargate.

An example host running Firecracker microVMs

Deep dive into AWS Cloud Development Kit

AWS Cloud Development Kit (CDK) is an open-source software development framework that allows you to define your cloud application resources using familiar programming languages. It uses object-oriented design to create resources and build an end-to-end process for application development from infrastructure and software-development perspectives.

This video introduces AWS CDK core concepts and demonstrates how to create custom resources and deploy them to the cloud. With AWS CDK, you can make deployments repeatable, automate operations through infrastructure as code, and use the software design patterns while coding your architecture.

AWS CDK is an open-source software development framework for defining cloud infrastructure as code

Using Apollo Server on AWS Lambda with Amazon EventBridge for real-time, event-driven streaming

Apollo Server is an open-source, spec-compliant GraphQL server that’s compatible with any GraphQL client. This blog posts covers how you can architect Apollo Server on AWS Lambda in an event-driven architecture. It shows you how to use the Apollo Server on AWS Lambda, integrate it with REST and WebSocket APIs and communicate asynchronously via event bus.

Sample application: a chat app that receives a text message from the client and responds with French and German translations of the message

Observability the open-source way

Removing the undifferentiated heavy lifting for implementing open-source software can allow you to plug-and-play your favorite solutions with existing AWS services. This video addresses best practices and real-world use cases for Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry to gain observability. Observability is fundamental to collect and analyze data coming from your architecture, understand the status of your system, and take action to improve application performance.

Setting up Amazon Managed Service for Prometheus

See you next time!

See you in a couple of weeks when we discuss strategies for running serverless applications on AWS!

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Handling Lambda functions idempotency with AWS Lambda Powertools

2022-04-20 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/handling-lambda-functions-idempotency-with-aws-lambda-powertools/

This post is written by Jerome Van Der Linden, Solutions Architect Builder and Dariusz Osiennik, Sr Cloud Application Architect.

One of the advantages of using AWS Lambda is its integration with messaging services like Amazon SQS or Amazon EventBridge. The integration is managed and can also handle the retrying of failed messages. If there’s an error within the Lambda function, the failed message is sent again and the function is re-invoked.

This feature increases the resilience of the application but also means that a message can be processed multiple times by the function. This is important when managing orders, payments, or any kind of transaction that must be handled only once.

As mentioned in the design principles of Lambda, “since the same event may be received more than once, functions should be designed to be idempotent”. This article explains what idempotency is and how to implement it more easily with Lambda Powertools.

Understanding idempotency

Idempotency is the property of an operation whereby it can be applied multiple times without changing the result beyond the initial application. You can run an idempotent operation safely multiple times without any side effects like duplicates or inconsistent data. For example, this is a key principle for infrastructure as code, where you don’t want to double the number of resources each time you apply a template.

Applied to Lambda, a function is idempotent when it can be invoked multiple times with the same event with no risk of side effects. To make a function idempotent, it must first identify that an event has already been processed. Therefore, it must extract a unique identifier, called an “idempotency key”.

This identifier may be in the event payload (for example, orderId), a combination of multiple fields in the payload (for example, customerId, and orderId), or even a hash of the full payload. If using the full payload, fields such as dates, timestamps, or random elements may affect the hash and lead to changing values.

The function then checks in a persistence layer (for example, Amazon DynamoDB or Amazon ElastiCache):

If the key is not there, then the Lambda function can proceed normally, perform the transaction, and save the idempotency key in the persistence layer. You can potentially add the result of the function in the persistence layer too, so that subsequent calls can retrieve this result directly.
If the key is there, then the function can return and avoid applying the transaction again.

The following diagram shows the sequence of events with this idempotency scenario:

There are edge cases in this example:

You can invoke the function twice with the same event within a few milliseconds. In that case, each function acts as if it’s the first time this event is received and processes it, resulting in inconsistencies.
The function may perform several operations that are not idempotent. If the first operation is successful and then an error happens, the idempotency key won’t be saved. Subsequent calls redo the first operation, resulting in inconsistencies.

You can guard against these edge cases by inserting a lock as soon as the event is received:

There are other questions and edge cases that you must consider when implementing idempotency on your Lambda functions. Read Making retries safe with idempotent APIs from the Builder’s Library to dive into the details. You can choose to implement idempotency by yourself or you can use a library that handles it and takes care of these edge cases for you. This is what Lambda Powertools (for Python and Java) proposes.

Idempotency with Lambda Powertools

Lambda Powertools is a library, available in Python, Java, and TypeScript. It provides utilities for Lambda functions to ease the adoption of best practices and to reduce the amount of code to perform recurring tasks. In particular, it provides a module to handle idempotency (in the Java and Python versions).

This post shows examples using the Java version. To get started with the Lambda Powertools idempotency module, you must install the library and configure it within your build process. For more details, follow AWS Lambda Powertools documentation.

Next, you must configure a persistence storage layer where the idempotency feature can store its state. You can use the built-in support for DynamoDB or you can create your own implementation for the database of your choice. This example creates a new table in DynamoDB.

The following AWS Serverless Application Model (AWS SAM) template creates a suitable table to store the state:

Resources:
  IdempotencyTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
      TimeToLiveSpecification:
        AttributeName: expiration
        Enabled: true
      BillingMode: PAY_PER_REQUEST

In this definition:

The table is multi-tenant and can be reused by multiple Lambda functions that use the Powertools idempotency module.
The DynamoDB time-to-live configuration helps keep idempotency limited in time. You can configure the duration, which is 1 hour by default.

Configure the idempotency module’s behavior in the init phase of the function’s lifecycle, before the handleRequest method gets called:

public class SubscriptionHandler implements RequestHandler<Subscription, SubscriptionResult> {

  public SubscriptionHandler() {
    Idempotency.config().withPersistenceStore(
      DynamoDBPersistenceStore.builder()
        .withTableName(System.getenv("TABLE_NAME"))
        .build()
      ).configure();
  }
}

Lambda Powertools follows the paradigm of convention over configuration and provides default values for many parameters. The persistence store is the only required element. To use the DynamoDB implementation, you must specify a table name. In the previous sample, the name is provided by the environment variable TABLE_NAME.

Adding the @Idempotent annotation to the handleRequest method enables the idempotency functionality. It uses a hash of the Subscription event as the idempotency key.

@Idempotent
  public SubscriptionResult handleRequest(final Subscription event, final Context context) {
    SubscriptionPayment payment = createSubscriptionPayment(
      event.getUsername(),
      event.getProductId()
    );

    return new SubscriptionResult(payment.getId(), "success", 200);
  }

Creating orders

The example is about creating an order for a user for a list of products. Orders should not be duplicated if the client repeats the request. API consumers can safely retry a create order request in case of issues (such as a timeout or networking disruption). The application should also allow the user to buy the same products in a short period of time if that is the user’s intention.

The following architecture diagram consists of an Amazon API Gateway REST API, the idempotent Lambda function, and a DynamoDB table for storing orders.

The Orders API allows creating a new order by calling its POST /orders endpoint with the following sample payload:

{
  "requestToken": "260d2efe-af84-11ec-b909-0242ac120002",
  "userId": "user1",  
  "items": [
    {
      "productId": "product1",      
      "price": 6.50,      
      "quantity": 5    
    },    
    {
      "productId": "product2",      
      "price": 13.50,      
      "quantity": 2    
    }
  ],  
  "comment": "AWSome Order"
}

Lambda Powertools uses JMESPath to extract the important fields from the request that uniquely identify it. It then calculates a hash of these fields to constitute the idempotency key.

In the example, the important fields are the userId and the items, to avoid duplicated orders. But the user can also buy the same list of products in a short period of time. To allow this, the API consumer can generate a client-side token and assign its value to the requestToken field. For each unique order, the token has a different value. If a request is retried by the client, it uses the same token.

This leads to the following configuration for the idempotency key:

Idempotency.config()
  .withConfig( 
    IdempotencyConfig.builder()
    .withEventKeyJMESPath("powertools_json(body).[requestToken,userId,items]")
    .build())

If the same request is sent more than once, only the first call results in a new order created in the DynamoDB table. The same order identifier is returned by the endpoint for all the subsequent calls. In this way, the API consumer can safely retry the requests without worrying about duplicating the order.

You can find the source code of the example on GitHub.

Processing payments

This example shows asynchronous batch processing of payment messages from a queue. Messages must not be processed more than once to avoid charging users multiple times for the same order. You must consider edge cases like at-least-once message delivery, an error response returned by the third party payment API or retrying the batch of messages.

The following architecture diagram shows an Amazon SQS queue, the idempotent Lambda function, and a third-party API that the function calls for payment.

This is the body of a single payment SQS message:

{
    "orderId": "order1",
    "userId": "user1",
    "amount": "50.25"
}

In this example, the process method is annotated as idempotent, not the handleRequest. The method is responsible for processing a single payment record from the SQS batch. It uses @IdempotencyKey annotation to specify which parameter to use as the idempotency key.

@Override
public List<String> handleRequest(SQSEvent sqsEvent, Context context) {
return sqsEvent.getRecords()
  .stream()
  .map(record -> process(record.getMessageId(),record.getBody()))
  .collect(Collectors.toList());
}

@Idempotent
private String process(String messageId, @IdempotencyKey String messageBody) {
    logger.info("Processing messageId: {}", messageId);
    PaymentRequest request = 
extractDataFrom(messageBody).as(PaymentRequest.class);
    return paymentService.process(request);
}

If an SQS record with the same payload is received more than once, the third-party API is not called multiple times. All the subsequent calls return before calling the process method.

If an exception is thrown from the process method, the idempotency feature does not store the idempotency state in the persistence layer. The payment is treated as unprocessed and can be retried safely. This may happen if the third-party API returns a server-side error.

By default, if one message from a batch fails, all the messages in the batch are retried. Lambda Powertools also offers the SQS Batch Processing module which can help in handling partial failures.

You can find the source code of this example in the GitHub repo.

Conclusion

Idempotency is a critical piece of serverless architectures and can be difficult to implement. If not done correctly, it can lead to inconsistent data and other issues. This post shows how you can use Lambda Powertools to make Lambda functions idempotent and ensure that critical transactions are handled only once.

For more details about the Lambda Powertools idempotency feature and its configuration options, refer to the full documentation.

For more serverless learning resources, visit Serverless Land.

How MarketAxess® uses AWS Developer Tools to create scalable and secure CI/CD pipelines

2022-04-18 Aaron Lima

Post Syndicated from Aaron Lima original https://aws.amazon.com/blogs/devops/how-marketaxess-uses-aws-developer-tools-to-create-scalable-and-secure-ci-cd-pipelines/

Very often, enterprise organizations strive to adopt modern DevOps practices, tofocus on governance and security without sacrificing development velocity. In this guest post, Prashant Joshi, Senior Cloud Engineer at MarketAxess, explains how they use the AWS Cloud Development Kit (AWS CDK), AWS CodePipeline, and AWS CodeBuild to simplify the developer experience by dynamically provisioning pipelines and maintaining governance at MarketAxess.

Problem Statement

MarketAxess is a financial technology company that operates an e-trading platform, for institutional credit markets. As MarketAxess adopted DevOps firm-wide, we struggled to ensure pipeline consistency. We had developers using static code analysis and linting, but it wasn’t enforced. As more teams began to adopt DevOps practices, the importance of providing consistency over code quality, security scanning, and artifact management grew. However, we were challenged with increasing our engineering workforce and implementing best practices in the various pipelines. As a small team, we needed a way to reliably manage and scale pipelines while reducing engineering overhead. We thought about the DevOps tenets, as well as the importance of automation, and we decided to build automation that would provision pipelines for development teams. These pipelines included best practices for Continuous Integration and Continuous Deployment (CI/CD). We wanted to build this automation with self-service, so that teams can get started developing a solution to a business problem, without having to spend too much time around the CI/CD aspects of their projects.

We chose the AWS CDK to deploy AWS CodePipeline, AWS CodeBuild, and AWS Identity and Access Management (IAM) resources, and used an API webhook using AWS Lambda and Amazon API Gateway for integration. In this post, we provide an example of how these services can be used to create dynamic cross account CI/CD pipelines.

Solution

In developing our solution, we wanted to accomplish three main goals:

Standardization and Governance of Pipelines – We wanted to ensure consistent practices in each team’s pipeline to make sure of code quality and security.
Simplified Developer Interaction – We wanted developers to focus mainly on interacting with the code repository for their project.
Improve Management of Dynamically Provisioned Pipelines – Knowing that we would need to make changes, improvements, and enhancements, we wanted tools and a process that was flexible.

We achieved these goals using AWS CDK to automate the creation of CodePipeline and define mandatory actions in the pipeline. We also created a webhook using API Gateway to integrate with our Bitbucket repositories to automatically trigger the automation. The pipelines can dynamically be provisioned or updated based on the YAML manifest file submitted to the repository. We process the manifest file with Amazon Elastic Container Service (Amazon ECS) Fargate tasks, because we had containerized the processing components using Docker. However, with the release of container support in Lambda, we are now considering this as a potential replacement. These pipelines run CI stages based on the programing language defined by development teams in the manifest file, and they deploy a tested versioned artifact to the corresponding environments via standard Software Defined Lifecycle (SDLC) practices. As a part of CI stages, we semantically version our code and tag our commits accordingly. This lets us trace commit to pipeline execution. The following architecture diagram shows a CloudFormation pipeline generated via AWS CDK.

CloudFormation Pipeline Architecture Diagram

The process flow is as follows:

Developer pushes a change to the repository.
A webhook is triggered when the Pull Request is merged that creates or modifies the pipeline based on the manifest file submitted to the repository.
This triggers a Lambda function that performs the following:
1. Clones the repository from Internally hosted BitBucket repos.
2. Uploads the repository to the source Amazon Simple Storage Service (Amazon S3) bucket, which is encrypted using Customer Managed Keys (CMK) with the AWS Key Management Service (KMS).
3. An ECS Task is run, and a manifest file is passed which gives the project parameters. Pipelines are built according to these project parameters.
An ECS Task processes the metadata file and runs cdk Logic, finally it triggers the pipeline.
1. As source code is progressed through the pipeline, the build stage output to the artifact bucket. Pipeline artifacts are encrypted with a CMK. The IAM roles in the target account only have access to this bucket.

Additionally, through the power of the IAM integration with CodePipeline, the team could implement session tags with IAM roles and Okta to make sure that independent teams only approve pipelines, which are owned by respective teams. Furthermore, we use attribute-based tags to protect the production environment from unauthorized actions, so that deployment to production can only come through the pipeline.

The AWS CDK-based pipelines let MarketAxess enable teams to independently build and obtain immediate feedback, while still centrally governing CI and CD patterns. The solution took six months of two DevOps engineers working full time to build the cdk structure and support for the core languages and their corresponding CI and CD stages. We continue to iterate on the cdk code base and pipelines, incorporating feedback from our development community to ensure developer satisfaction.

Simplified Developer Interaction

Although we were enforcing standards via the automation, we still wanted to give development teams autonomy through a simple mechanism. We wanted developers to interact with our pipeline creation process through a pipeline manifest file that they submitted to their repository. An example of the manifest file schema is in the following screenshot:

Manifest File Schema

As shown above, the manifest lets developers define custom application configurations, while preserving consistent quality gates. This manifest is checked in to source control, and upon a commit to the code repository it triggers our automation. This lets our pipelines mutate on manifest file changes, and it makes sure that the latest commit goes through the latest quality gates. Each repository gets its own pipeline, and, to maintain the security of the pipeline, we used IAM Session Tags with Okta. We tag each pipeline and its associated resources with a unique attribute that is mapped to the development team so that they only have access to their pipelines, and only authorized individuals may approve production deployments.

Using AWS CDK, AWS CodePipeline, and other AWS Services, we have been able to improve the stability and quality of the code being delivered. CodePipeline and AWS CDK have helped us develop a cloud native pipeline solution that meets our governance best practices and compliance requirements. We met our three goals, and we can iterate and change easily moving forward.

Conclusion

Organizations that achieve the automation and self-service ideals of DevOps can build, release, and deploy features and apps to users faster and at higher levels of quality. In this post, we saw a real-life example of using Infrastructure as Code with AWS CDK to build a service that helps maintain governance and helps developers get work done. Here are two other posts that demonstrate using AWS Service Catalog to create secure DevOps pipelines or DevOps pipelines that deploy containerized applications.