Tag Archives: Amazon DynamoDB

Integrating Salesforce with AWS DynamoDB using Amazon AppFlow bi-directionally

Post Syndicated from Abhijit Vaidya original https://aws.amazon.com/blogs/architecture/integrating-salesforce-with-aws-dynamodb-using-amazon-appflow-bi-directionally/

In this blog post, we demonstrate how to integrate Salesforce Lightning with Amazon DynamoDB by using Amazon AppFlow and Amazon EventBridge services bi-directionally. This is an event-driven, serverless-based microservice, allowing Salesforce users to update configuration data stored in DynamoDB tables without giving AWS account access from AWS Command Line Interface or AWS Management Console.

This architecture describes a contact center application that utilizes DynamoDB database to store configuration data, including holiday data, across different global regions. Updating this data in a table is a manual process, which includes creating a support ticket and waiting for completion of the support request. The architecture herein allows authorized business users to update the configuration data in DynamoDB directly from the Salesforce screen and not relying on manual update process.

Solution overview

As demonstrated in Figure 1, Amazon AppFlow consumes events payload from salesforce and Lambda updates the DynamoDB table after processing the payload. In parallel, a scheduled based flow sends response back to salesforce object as a notification and sends a SNS email notification to users. Contact center application (Amazon Connect) reads data from DynamoDB table.

Architecture overview

Figure 1. Architecture overview

  1. Event (data add or delete) occurring on a particular object in Salesforce is captured by Amazon AppFlow (event-based flow); event payload displays as “Input” in AWS environment.
  2. An EventBridge bus receives the “Input” payload.
  3. The EventBridge bus triggers the AWS Lambda (Lambda-1).
  4. Lambda-1 processes the “Input” payload, then performs a write operation (data add or delete) on a DynamoDB table.
  5. A write operation on the table triggers DynamoDB streams.
  6. DynamoDB stream triggers Lambda (Lambda-2).
  7. Lambda-2 processes the payload from DynamoDB streams. It saves the results in .csv file format, with a “Success” or “Fail” flag. This .csv file is uploaded to an S3 bucket.
  8. A second flow (schedule-based) is configured in Amazon AppFlow. This reads the S3 bucket at a regular interval of time to find new .csv files created by Lambda-2.
  9. A schedule-based flow transfers the .csv file data as an event-status output to the Salesforce object, notifying the user that their event has been successfully handled in DynamoDB table.
  10. In parallel, DynamoDB streams trigger Lambda (Lambda-3), which processes the payload and initiates an Amazon Simple Notification Service (Amazon SNS) notification based on the status of the event request.
  11. Amazon SNS sends an email to the user detailing that the event created in the Salesforce object was successfully completed in DynamoDB table.
  12. If any of the two flows break due to any unforeseen events, their statuses are instantly captured and streamed to an EventBridge bus.
  13. The EventBridge bus triggers Lambda (Lambda-5).
  14. Lambda-5 tries to analyze the error causing failure and tries to get the flow to a working state again. If Lambda-5 fails, it initiates an Amazon SNS email to the solution support team to manually fix the Amazon AppFlow error and get the solution active again.
  15. Meanwhile, whenever an Amazon Connect contact center receives a call, it triggers Lambda (Lambda-4), which holds read-only access to DynamoDB table.
  16. Lambda-4 fetches the data stored by Lambda-1 in DynamoDB table and provides that data to a contact center in Amazon Connect.


  • An AWS account with permissions to access all the required
  • An active Salesforce account with administration-level permissions set
  • Python 3.X version setup for developing Lambda-1 to -5

Implementation and development

1. Development steps on the Salesforce end

Note: We recommend working with a Salesforce expert. These steps can help initiate the development from Salesforce end.

a. Use the Platform Event feature to allow the data flow from Salesforce environment to AWS environment and then back to Salesforce from AWS environment using Amazon AppFlow.
b. Salesforce Connected Apps and web server flow used for integrating Amazon AppFlow with Salesforce.
c. New permissions set and new integration users are created to restrict the access to custom object.

2. Development steps on the AWS end

a. Create a new connection in Amazon AppFlow with “Salesforce” as the source
The “Client ID” and “Client Secret” of Salesforce Managed Connected App are stored in AWS Secrets Manager, which is encrypted by custom keys generated in AWS Key Management Service.

To setup an Amazon AppFlow connection with Salesforce, a stand-alone Lambda function is created using Python Boto3 API. This creates a connection in Amazon AppFlow using the “Client ID” and “Client Secret” of Salesforce connected app.

For more details regarding setting up the Salesforce connection in Amazon AppFlow, refer to the Amazon AppFlow User Guide.

b. Create new event-based input flow in Amazon AppFlow, using Salesforce as the source with the newly created connection
Select the “Salesforce events” option as per the business use case. For representation in this blog, “Telephony” is chosen as salesforce events. Select Amazon EventBridge as the destination. Create new “partner event source” for successful creation of input flow, as in Figure 2.

Configure an event-based flow in Amazon AppFlow

Figure 2. Configure an event-based flow in Amazon AppFlow

c. Handling large size input flow event payloads
Post successful creation of “Partner event source”, specify the S3 bucket for events that are larger than 256 KB, Amazon AppFlow sends a URL of the S3 object to the event bus instead of the event payload.

d. Configuring details for input flow
Select trigger pattern of input flow as “Run flow on event”, as shown in Figure 3.

Trigger pattern for creating input flow

Figure 3. Trigger pattern for creating input flow

As displayed in Figure 4, we can configure data field mapping, validation rules, and filters with Amazon AppFlow. This enables us to enrich and modify event data before it is sent to the event bus. Post this, input flow create action is complete.

Mapping Salesforce object fields with Amazon EventBridge bus

Figure 4. Mapping Salesforce object fields with Amazon EventBridge bus

e. Associating Amazon AppFlow generated partner event source with the event bus in the Amazon EventBridge dashboard
Before activating the flow, access the Amazon EventBridge console to associate the AppFlow generated partner-event-source with the event bus (Figure 5).

Associating input flow partner event source with the Amazon EventBridge bus

Figure 5. Associating input flow partner event source with the Amazon EventBridge bus

f. Post Amazon EventBridge bus association, activate the input flow
After associating the bus with input flow, navigate back to Amazon AppFlow console and click the “Activate flow” button for the input flow. Once active, input flow is ready to consume the event payload from Salesforce.

g. Amazon EventBridge bus triggering Lambda function
The EventBridge bus receives an event payload from the input flow and triggers Lambda (Lambda-1) to process that raw event payload and write the output results in a designated DynamoDB table. A sample of event input payload is in Figure 6. This payload content depends on the use case for which developer is working.

Input event payload sample from Salesforce

Figure 6. Input event payload sample from Salesforce

Lambda-1 adds the record-ID in the DynamoDB table, which is a unique event ID for each Salesforce event as shown in Figure 7.

Data added in Amazon DynamoDB table by Lambda-1

Figure 7. Data added in Amazon DynamoDB table by Lambda-1

h. Configuring DynamoDB streams
Within “Export and Streams” option in the DynamoDB table, enable the “DynamoDB stream details” and in the trigger section click on “Create trigger” option and select  Lambda-2 and Lambda -3, as detailed in Figure 8.

Configuring Amazon DynamoDB streams

Figure 8. Configuring Amazon DynamoDB streams

Lambda-2 stores the event output results and a success flag value in a .csv file that is created for every unique event; the .csv file is uploaded to an S3 bucket with the file name structure “Salesforce event recorded-event action-timestamp.csv”. For example:

Example 1: “abcd1234-event-created- 2022-05-19-11_23_51.csv” (data added)
Example 2: “abcd1234-event-deleted- 2022-05-19-11_24_50.csv” (data deleted)

i. Create new schedule-based flow (output flow) in Amazon AppFlow
The source is the S3 bucket folder where the .csv file is uploaded by Lambda-2. Select the destination as “Salesforce”, and choose the Salesforce object that is used to create the input flow (Figure 9). Revisit Step b for reference.

Configuring a schedule-based output flow

Figure 9. Configuring a schedule-based output flow

Output flow sends a response back to the same Salesforce object from which data addition/deletion event request was made. This confirms to the user that a data addition/deletion event created in Salesforce was successfully invoked in Dynamo DB table as well.

j. Error handling in output flow
In case output flow fails to write the response back to the Salesforce object, choose the option “Stop the current flow run”.

k. Configuring output flow as a run-on schedule
Output flow is a schedule-based flow that runs at specific time. Within the flow trigger window, select “Run flow on schedule”. Update the other fields, such as “Repeats”, “Every”, “Start date”, and “Starting at” per your specific needs. Within “Transfer mode”, select “Incremental transfer”. Refer to Figure 10.

Trigger pattern for schedule-based output flow

Figure 10. Trigger pattern for schedule-based output flow

l. Amazon AppFlow to Salesforce object mapping for output flow
Select Mapping method as “Manually map fields” and Destination record preference as “Upsert records”, as in Figure 11.

Output Flow will update the event record status in the Salesforce object, with success status flag value in the .csv file based on the unique “Record ID” that every Salesforce event payload contains. Once the field mapping is completed, output flow is active.

Manually mapping data fields with Salesforce

Figure 11. Manually mapping data fields with Salesforce

m. Real-time monitoring and failure handling
In case input/output flow breaks for unforeseen reasons, a rule is configured in EventBridge bus console that invokes Lambda (Lambda-5). Lambda-5 tries to handle the error and reactivate the flow. In case this action fails, Lambda sends an Amazon SNS email to the solution support team informing of the failure in Amazon AppFlow and the cause.

n. Integrating DynamoDB with Amazon Connect
Lambda (Lambda-4) is configured with contact center in Amazon Connect. As the call comes to contact center, Lambda-4 fetches the relevant data from the DynamoDB table. Amazon Connect operates per this data.


To avoid incurring future charges, delete any AWS resources that are no longer needed.


This post demonstrates the approach for developing an event-driven, serverless application that integrates DynamoDB with Salesforce using Amazon AppFlow bi-directionally. The contact center is based in Amazon Connect and functions dependent on the real-time data in a DynamoDB table—without manual intervention. The manual process can take a minimum of 24 hours, but the same action can be automatically completed using a self-service, UI-based form in the user’s Salesforce account.

This solution can be tailored depending on the business or technical requirement, differing how the data is consumed by multiple AWS services.

Extending your SaaS platform with AWS Lambda

Post Syndicated from Hasan Tariq original https://aws.amazon.com/blogs/architecture/extending-your-saas-platform-with-aws-lambda/

Software as a service (SaaS) providers continuously add new features and capabilities to their products to meet their growing customer needs. As enterprises adopt SaaS to reduce the total cost of ownership and focus on business priorities, they expect SaaS providers to enable customization capabilities.

Many SaaS providers allow their customers (tenants) to provide customer-specific code that is triggered as part of various workflows by the SaaS platform. This extensibility model allows customers to customize system behavior and add rich integrations, while allowing SaaS providers to prioritize engineering resources on the core SaaS platform and avoid per-customer customizations.

To simplify experience for enterprise developers to build on SaaS platforms, SaaS providers are offering the ability to host tenant’s code inside the SaaS platform. This blog provides architectural guidance for running custom code on SaaS platforms using AWS serverless technologies and AWS Lambda without the overhead of managing infrastructure on either the SaaS provider or customer side.

Vendor-hosted extensions

With vendor-hosted extensions, the SaaS platform runs the customer code in response to events that occur in the SaaS application. In this model, the heavy-lifting of managing and scaling the code launch environment is the responsibility of the SaaS provider.

To host and run custom code, SaaS providers must consider isolating the environment that runs untrusted custom code from the core SaaS platform, as detailed in Figure 1. This introduces additional challenges to manage security, cost, and utilization.

Distribution of responsibility between Customer and SaaS platform with vendor-hosted extensions

Figure 1. Distribution of responsibility between Customer and SaaS platform with vendor-hosted extensions

Using AWS serverless services to run custom code

Using AWS serverless technologies removes the tasks of infrastructure provisioning and management, as there are no servers to manage, and SaaS providers can take advantage of automatic scaling, high availability, and security, while only paying for value.

Example use case

Let’s take an example of a simple SaaS to-do list application that supports the ability to initiate custom code when a new to-do item is added to the list. This application is used by customers who supply custom code to enrich the content of newly added to-do list items. The requirements for the solution consist of:

  • Custom code provided by each tenant should run in isolation from all other tenants and from the SaaS core product
  • Track each customer’s usage and cost of AWS resources
  • Ability to scale per customer

Solution overview

The SaaS application in Figure 2 is the core application used by customers, and each customer is considered a separate tenant. For the sake of brevity, we assume that the customer code was already stored in an Amazon Simple Storage Service (Amazon S3) bucket as part of the onboarding. When an eligible event is generated in the SaaS application as a result of user action, like a new to-do item added, it gets propagated down to securely launch the associated customer code.

Example use case architecture

Figure 2. Example use case architecture

Walkthrough of custom code run

Let’s detail the initiation flow of custom code when a user adds a new to-do item:

  1. An event is generated in the SaaS application when a user performs an action, like adding a new to-do list item. To extend the SaaS application’s behavior, this event is linked to the custom code. Each event contains a tenant ID and any additional data passed as part of the payload. Each of these events is an “initiation request” for the custom code Lambda function.
  2. Amazon EventBridge is used to decouple the SaaS Application from event processing implementation specifics. EventBridge makes it easier to build event-driven applications at scale and keeps the future prospect of adding additional consumers. In case of unexpected failure in any downstream service, EventBridge retries sending events a set number of times.
  3. EventBridge sends the event to an Amazon Simple Queue Service (Amazon SQS) queue as a message that is subsequently picked up by a Lambda function (Dispatcher) for further routing. Amazon SQS enables decoupling and scaling of microservices and also provides a buffer for the events that are awaiting processing.
  4. The Dispatcher polls the messages from SQS queue and is responsible for routing the events to respective tenants for further processing. The Dispatcher retrieves the tenant ID from the message and performs a lookup in the database (we recommend Amazon DynamoDB for low latency), retrieves tenant SQS Amazon Resource Name (ARN) to determine which queue to route the event. To further improve performance, you can cache the tenant-to-queue mapping.
  5. The tenant SQS queue acts as a message store buffer and is configured as an event source for a Lambda function. Using Amazon SQS as an event source for Lambda is a common pattern.
  6. Lambda executes the code uploaded by the tenant to perform the desired operation. Common utility and management code (including logging and telemetry code) is kept in Lambda layers that get added to every custom code Lambda function provisioned.
  7. After performing the desired operation on data, custom code Lambda returns a value back to the SaaS application. This completes the run cycle.

This architecture allows SaaS applications to create a self-managed queue infrastructure for running custom code for tenants in parallel.

Tenant code upload

The SaaS platform can allow customers to upload code either through a user interface or using a command line interface that the SaaS provider provides to developers to facilitate uploading custom code to the SaaS platform. Uploaded code is saved in the custom code S3 bucket in .zip format that can be used to provision Lambda functions.

Custom code Lambda provisioning

The tenant environment includes a tenant SQS queue and a Lambda function that polls initiation requests from the queue. This Lambda function serves several purposes, including:

  1. It polls messages from the SQS queue and constructs a JSON payload that will be sent an input to custom code.
  2. It “wraps” the custom code provided by the customer using boilerplate code, so that custom code is fully abstracted from the processing implementation specifics. For example, we do not want custom code to know that the payload it is getting is coming from Amazon SQS or be aware of the destination where launch results will be sent.
  3. Once custom code initiation is complete, it sends a notification with launch results back to the SaaS application. This can be done directly via EventBridge or Amazon SQS.
  4. This common code can be shared across tenants and deployed by the SaaS provider, either as a library or as a Lambda layer that gets added to the Lambda function.

Each Lambda function execution environment is fully isolated by using a combination of open-source and proprietary isolation technologies, it helps you to address the risk of cross-contamination. By having a separate Lambda function provisioned per-tenant, you achieve the highest level of isolation and benefit from being able to track per-tenant costs.


In this blog post, we explored the need to extend SaaS platforms using custom code and why AWS serverless technologies—using Lambda and Amazon SQS—can be a good fit to accomplish that. We also looked at a solution architecture that can provide the necessary tenant isolation and is cost-effective for this use case.

For more information on building applications with Lambda, visit Serverless Land. For best practices on building SaaS applications, visit SaaS on AWS.

AWS Week in Review – August 22, 2022

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-week-in-review-august-22-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

I’m back from my summer holidays and ready to get up to date with the latest AWS news from last week!

Last Week’s Launches
Here are some launches that got my attention during the previous week.

Amazon CloudFront now supports HTTP/3 requests over QUIC. The main benefits of HTTP/3 are faster connection times and fewer round trips in the handshake process. HTTP/3 is available in all 410+ CloudFront edge locations worldwide, and there is no additional charge for using this feature. Read Channy’s blog post about this launch to learn more about it and how to enable it in your applications.

Using QUIC in HTTP3 vs HTTP2

Amazon Chime has announced a couple of really cool features for their SDK. Now you can compose video by concatenating video with multiple attendees, including audio, content and transcriptions. Also, Amazon Chime SDK launched the live connector pipelines that send real-time video from your applications to streaming platforms such as Amazon Interactive Video Service (IVS) or AWS Elemental MediaLive. Now building real-time streaming applications becomes easier.

AWS Cost Anomaly Detection has launched a simplified interface for anomaly exploration. Now it is easier to monitor spending patterns to detect and alert anomalous spend.

Amazon DynamoDB now supports bulk imports from Amazon S3 to a new table. This new launch makes it easier to migrate and load data into a new DynamoDB table. This is a great use for migrations, to load test data into your applications, thereby simplifying disaster recovery, among other things.

Amazon MSK Serverless, a new capability from Amazon MSK launched in the spring of this year, now has support for AWS CloudFormation and Terraform. This allows you to describe and provision Amazon MSK Serverless clusters using code.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you may have missed:

This week there were a couple of stories that caught my eye. The first one is about Grillo, a social impact enterprise focused on seismology, and how they used AWS to build a low-cost earthquake early warning system. The second one is from the AWS Localization team about how they use Amazon Translate to scale their localization in order to remove language barriers and make AWS content more accessible.

Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish, and every other week there is a new episode. The podcast is meant for builders, and it shares stories about how customers implemented and learned to use AWS services, how to architect applications, and how to use new services. You can listen to all the episodes directly from your favorite podcast app or at AWS Podcast en español.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS Summits – Registration is open for upcoming in-person AWS Summits. Find the one closest to you: Chicago (August 28), Canberra (August 31), Ottawa (September 8), New Delhi (September 9), Mexico City (September 21–22), Bogota (October 4), and Singapore (October 6).

GOTO EDA Day 2022 – Registration is open for the in-person event about Event Driven Architectures (EDA) hosted in London on September 1. There will be a great line of speakers talking about the best practices for building EDA with serverless services.

AWS Virtual Workshop – Registration is open for the free virtual workshop about Amazon DocumentDB: Getting Started and Business Continuity Planning on August 24.

AWS .NET Enterprise Developer Days 2022Registration for this free event is now open. This is a 2-day, in-person event on September 7-8 at the Palmer Events Center in Austin, Texas, and a 2-day virtual event on September 13-14.

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia

How to track AWS account metadata within your AWS Organizations

Post Syndicated from Jonathan Nguyen original https://aws.amazon.com/blogs/architecture/how-to-track-aws-account-metadata-within-your-aws-organizations/

United States Automobile Association (USAA) is a San Antonio-based insurance, financial services, banking, and FinTech company supporting millions of military members and their families. USAA has partnered with Amazon Web Services (AWS) to digitally transform and build multiple USAA solutions that help keep members safe and save members’ money and time.

Why build an AWS account metadata solution?

The USAA Cloud Program developed a centralized solution for collecting all AWS account metadata to facilitate core enterprise functions, such as financial management, remediation of vulnerable and insecure configurations, and change release processes for critical application and infrastructure changes.

Companies without centralized metadata solutions may have distributed documents and wikis that contain account metadata, which has to be updated manually. Manually inputting/updating information generally leads to outdated or incorrect metadata and, in addition, requires individuals to reach out to multiple resources and teams to collect specific information.

Solution overview

USAA utilizes AWS Organizations and a series of GitLab projects to create, manage, and baseline all AWS accounts and infrastructure within the organization, including identity and access management, security, and networking components. Within their GitLab projects, each deployment uses a GitLab baseline version that determines what version of the project was provisioned within the AWS account.

During the creation and onboarding of new AWS accounts, which are created for each application team and use-case, there is specific data that is used for tracking and governance purposes, and applied across the enterprise. USAA’s Public Cloud Security team took an opportunity within a hackathon event to develop the solution depicted in Figure 1.

  1. AWS account is created conforming to a naming convention and added to AWS Organizations.

Metadata tracked per AWS account includes:

    • AWS account name
    • Points of contact
    • Line of business (LOB)
    • Cost center #
    • Application ID #
    • Status
    • Cloud governance record #
    • GitLab baseline version
  1. Amazon EventBridge rule invokes AWS Step Functions when new AWS accounts are created.
  2. Step Functions invoke an AWS Lambda function to pull AWS account metadata and load into a centralized Amazon DynamoDB table with Streams enabled to support automation.
  3. A private Amazon API Gateway is exposed to USAA’s internal network, which queries the DynamoDB table and provides AWS account metadata.
Overview of USAA architecture automation workflow to manage AWS account metadata

Figure 1. Overview of USAA architecture automation workflow to manage AWS account metadata

After the solution was deployed, USAA teams leveraged the data in multiple ways:

  1. User interface: a front-end user-interface querying the API Gateway to allow internal users on the USAA network to filter and view metadata for any AWS accounts within AWS Organizations.
  2. Event-driven automation: DynamoDB streams for any changes in the table that would invoke a Lambda function, which would check the most recent version from GitLab and the GitLab baseline version in the AWS account. For any outdated deployments, the Lambda function invokes the CI/CD pipeline for that AWS account to deploy a standardized set of IAM, infrastructure, and security resources and configurations.
  3. Incident response: the Cyber Threat Response team reduces mean-time-to-respond by developing automation to query the API Gateway to append points-of-contact, environment, and AWS account name for custom detections as well as Security Hub and Amazon GuardDuty findings.
  4. Financial management: Internal teams have integrated workflows to their applications to query the API Gateway to return cost center, LOB, and application ID to assist with financial reporting and tracking purposes. This replaces manually reviewing the AWS account metadata from an internal and manually updated wiki page.
  5. Compliance and vulnerability management: automated notification systems were developed to send consolidated reports to points-of-contact listed in the AWS account from the API Gateway to remediate non-compliant resources and configurations.


In this post, we reviewed how USAA enabled core enterprise functions and teams to collect, store, and distribute AWS account metadata by developing a secure and highly scalable serverless application natively in AWS. The solution has been leveraged for multiple use-cases, including internal application teams in USAA’s production AWS environment.

Amazon Prime Day 2022 – AWS for the Win!

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-prime-day-2022-aws-for-the-win/

As part of my annual tradition to tell you about how AWS makes Prime Day possible, I am happy to be able to share some chart-topping metrics (check out my 2016, 2017, 2019, 2020, and 2021 posts for a look back).

My purchases this year included a first aid kit, some wood brown filament for my 3D printer, and a non-stick frying pan! According to our official news release, Prime members worldwide purchased more than 100,000 items per minute during Prime Day, with best-selling categories including Amazon Devices, Consumer Electronics, and Home.

Powered by AWS
As always, AWS played a critical role in making Prime Day a success. A multitude of two-pizza teams worked together to make sure that every part of our infrastructure was scaled, tested, and ready to serve our customers. Here are a few examples:

Amazon Aurora – On Prime Day, 5,326 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed 288 billion transactions, stored 1,849 terabytes of data, and transferred 749 terabytes of data.

Amazon EC2 – For Prime Day 2022, Amazon increased the total number of normalized instances (an internal measure of compute power) on Amazon Elastic Compute Cloud (Amazon EC2) by 12%. This resulted in an overall server equivalent footprint that was only 7% larger than that of Cyber Monday 2021 due to the increased adoption of AWS Graviton2 processors.

Amazon EBS – For Prime Day, the Amazon team added 152 petabytes of EBS storage. The resulting fleet handled 11.4 trillion requests per day and transferred 532 petabytes of data per day. Interestingly enough, due to increased efficiency of some of the internal Amazon services used to run Prime Day, Amazon actually used about 4% less EBS storage and transferred 13% less data than it did during Prime Day last year. Here’s a graph that shows the increase in data transfer during Prime Day:

Amazon SES – In order to keep Prime Day shoppers aware of the deals and to deliver order confirmations, Amazon Simple Email Service (SES) peaked at 33,000 Prime Day email messages per second.

Amazon SQS – During Prime Day, Amazon Simple Queue Service (SQS) set a new traffic record by processing 70.5 million messages per second at peak:

Amazon DynamoDB – DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 105.2 million requests per second.

Amazon SageMaker – The Amazon Robotics Pick Time Estimator, which uses Amazon SageMaker to train a machine learning model to predict the amount of time future pick operations will take, processed more than 100 million transactions during Prime Day 2022.

Package Planning – In North America, and on the highest traffic Prime 2022 day, package-planning systems performed 60 million AWS Lambda invocations, processed 17 terabytes of compressed data in Amazon Simple Storage Service (Amazon S3), stored 64 million items across Amazon DynamoDB and Amazon ElastiCache, served 200 million events over Amazon Kinesis, and handled 50 million Amazon Simple Queue Service events.

Prepare to Scale
Every year I reiterate the same message: rigorous preparation is key to the success of Prime Day and our other large-scale events. If you are preparing for a similar chart-topping event of your own, I strongly recommend that you take advantage of AWS Infrastructure Event Management (IEM). As part of an IEM engagement, my colleagues will provide you with architectural and operational guidance that will help you to execute your event with confidence!


Introducing Amazon CodeWhisperer in the AWS Lambda console (In preview)

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-amazon-codewhisperer-in-the-aws-lambda-console-in-preview/

This blog post is written by Mark Richman, Senior Solutions Architect.

Today, AWS is launching a new capability to integrate the Amazon CodeWhisperer experience with the AWS Lambda console code editor.

Amazon CodeWhisperer is a machine learning (ML)–powered service that helps improve developer productivity. It generates code recommendations based on their code comments written in natural language and code.

CodeWhisperer is available as part of the AWS toolkit extensions for major IDEs, including JetBrains, Visual Studio Code, and AWS Cloud9, currently supporting Python, Java, and JavaScript. In the Lambda console, CodeWhisperer is available as a native code suggestion feature, which is the focus of this blog post.

CodeWhisperer is currently available in preview with a waitlist. This blog post explains how to request access to and activate CodeWhisperer for the Lambda console. Once activated, CodeWhisperer can make code recommendations on-demand in the Lambda code editor as you develop your function. During the preview period, developers can use CodeWhisperer at no cost.

Amazon CodeWhisperer

Amazon CodeWhisperer

Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can trigger Lambda from over 200 AWS services and software as a service (SaaS) applications and only pay for what you use.

With Lambda, you can build your functions directly in the AWS Management Console and take advantage of CodeWhisperer integration. CodeWhisperer in the Lambda console currently supports functions using the Python and Node.js runtimes.

When writing AWS Lambda functions in the console, CodeWhisperer analyzes the code and comments, determines which cloud services and public libraries are best suited for the specified task, and recommends a code snippet directly in the source code editor. The code recommendations provided by CodeWhisperer are based on ML models trained on a variety of data sources, including Amazon and open source code. Developers can accept the recommendation or simply continue to write their own code.

Requesting CodeWhisperer access

CodeWhisperer integration with Lambda is currently available as a preview only in the N. Virginia (us-east-1) Region. To use CodeWhisperer in the Lambda console, you must first sign up to access the service in preview here or request access directly from within the Lambda console.

In the AWS Lambda console, under the Code tab, in the Code source editor, select the Tools menu, and Request Amazon CodeWhisperer Access.

Request CodeWhisperer access in Lambda console

Request CodeWhisperer access in Lambda console

You may also request access from the Preferences pane.

Request CodeWhisperer access in Lambda console preference pane

Request CodeWhisperer access in Lambda console preference pane

Selecting either of these options opens the sign-up form.

CodeWhisperer sign up form

CodeWhisperer sign up form

Enter your contact information, including your AWS account ID. This is required to enable the AWS Lambda console integration. You will receive a welcome email from the CodeWhisperer team upon once they approve your request.

Activating Amazon CodeWhisperer in the Lambda console

Once AWS enables your preview access, you must turn on the CodeWhisperer integration in the Lambda console, and configure the required permissions.

From the Tools menu, enable Amazon CodeWhisperer Code Suggestions

Enable CodeWhisperer code suggestions

Enable CodeWhisperer code suggestions

You can also enable code suggestions from the Preferences pane:

Enable CodeWhisperer code suggestions from Preferences pane

Enable CodeWhisperer code suggestions from Preferences pane

The first time you activate CodeWhisperer, you see a pop-up containing terms and conditions for using the service.

CodeWhisperer Preview Terms

CodeWhisperer Preview Terms

Read the terms and conditions and choose Accept to continue.

AWS Identity and Access Management (IAM) permissions

For CodeWhisperer to provide recommendations in the Lambda console, you must enable the proper AWS Identity and Access Management (IAM) permissions for either your IAM user or role. In addition to Lambda console editor permissions, you must add the codewhisperer:GenerateRecommendations permission.

Here is a sample IAM policy that grants a user permission to the Lambda console as well as CodeWhisperer:

  "Version": "2012-10-17",
  "Statement": [{
      "Sid": "LambdaConsolePermissions",
      "Effect": "Allow",
      "Action": [
      "Resource": "*"
      "Sid": "CodeWhispererPermissions",
      "Effect": "Allow",
      "Action": ["codewhisperer:GenerateRecommendations"],
      "Resource": "*"

This example is for illustration only. It is best practice to use IAM policies to grant restrictive permissions to IAM principals to meet least privilege standards.


To activate and work with code suggestions, use the following keyboard shortcuts:

  • Manually fetch a code suggestion: Option+C (macOS), Alt+C (Windows)
  • Accept a suggestion: Tab
  • Reject a suggestion: ESC, Backspace, scroll in any direction, or keep typing and the recommendation automatically disappears.

Currently, the IDE extensions provide automatic suggestions and can show multiple suggestions. The Lambda console integration requires a manual fetch and shows a single suggestion.

Here are some common ways to use CodeWhisperer while authoring Lambda functions.

Single-line code completion

When typing single lines of code, CodeWhisperer suggests how to complete the line.

CodeWhisperer single-line completion

CodeWhisperer single-line completion

Full function generation

CodeWhisperer can generate an entire function based on your function signature or code comments. In the following example, a developer has written a function signature for reading a file from Amazon S3. CodeWhisperer then suggests a full implementation of the read_from_s3 method.

CodeWhisperer full function generation

CodeWhisperer full function generation

CodeWhisperer may include import statements as part of its suggestions, as in the previous example. As a best practice to improve performance, manually move these import statements to outside the function handler.

Generate code from comments

CodeWhisperer can also generate code from comments. The following example shows how CodeWhisperer generates code to use AWS APIs to upload files to Amazon S3. Write a comment describing the intended functionality and, on the following line, activate the CodeWhisperer suggestions. Given the context from the comment, CodeWhisperer first suggests the function signature code in its recommendation.

CodeWhisperer generate function signature code from comments

CodeWhisperer generate function signature code from comments

After you accept the function signature, CodeWhisperer suggests the rest of the function code.

CodeWhisperer generate function code from comments

CodeWhisperer generate function code from comments

When you accept the suggestion, CodeWhisperer completes the entire code block.

CodeWhisperer generates code to write to S3.

CodeWhisperer generates code to write to S3.

CodeWhisperer can help write code that accesses many other AWS services. In the following example, a code comment indicates that a function is sending a notification using Amazon Simple Notification Service (SNS). Based on this comment, CodeWhisperer suggests a function signature.

CodeWhisperer function signature for SNS

CodeWhisperer function signature for SNS

If you accept the suggested function signature. CodeWhisperer suggest a complete implementation of the send_notification function.

CodeWhisperer function send notification for SNS

CodeWhisperer function send notification for SNS

The same procedure works with Amazon DynamoDB. When writing a code comment indicating that the function is to get an item from a DynamoDB table, CodeWhisperer suggests a function signature.

CodeWhisperer DynamoDB function signature

CodeWhisperer DynamoDB function signature

When accepting the suggestion, CodeWhisperer then suggests a full code snippet to complete the implementation.

CodeWhisperer DynamoDB code snippet

CodeWhisperer DynamoDB code snippet

Once reviewing the suggestion, a common refactoring step in this example would be manually moving the references to the DynamoDB resource and table outside the get_item function.

CodeWhisperer can also recommend complex algorithm implementations, such as Insertion sort.

CodeWhisperer insertion sort.

CodeWhisperer insertion sort.

As a best practice, always test the code recommendation for completeness and correctness.

CodeWhisperer not only provides suggested code snippets when integrating with AWS APIs, but can help you implement common programming idioms, including proper error handling.


CodeWhisperer is a general purpose, machine learning-powered code generator that provides you with code recommendations in real time. When activated in the Lambda console, CodeWhisperer generates suggestions based on your existing code and comments, helping to accelerate your application development on AWS.

To get started, visit https://aws.amazon.com/codewhisperer/. Share your feedback with us at [email protected].

For more serverless learning resources, visit Serverless Land.

How Plugsurfing doubled performance and reduced cost by 70% with purpose-built databases and AWS Graviton

Post Syndicated from Anand Shah original https://aws.amazon.com/blogs/big-data/how-plugsurfing-doubled-performance-and-reduced-cost-by-70-with-purpose-built-databases-and-aws-graviton/

Plugsurfing aligns the entire car charging ecosystem—drivers, charging point operators, and carmakers—within a single platform. The over 1 million drivers connected to the Plugsurfing Power Platform benefit from a network of over 300,000 charging points across Europe. Plugsurfing serves charging point operators with a backend cloud software for managing everything from country-specific regulations to providing diverse payment options for customers. Carmakers benefit from white label solutions as well as deeper integrations with their in-house technology. The platform-based ecosystem has already processed more than 18 million charging sessions. Plugsurfing was acquired fully by Fortum Oyj in 2018.

Plugsurfing uses Amazon OpenSearch Service as a central data store to store 300,000 charging stations’ information and to power search and filter requests coming from mobile, web, and connected car dashboard clients. With the increasing usage, Plugsurfing created multiple read replicas of an OpenSearch Service cluster to meet demand and scale. Over time and with the increase in demand, this solution started to become cost exhaustive and limited in terms of cost performance benefit.

AWS EMEA Prototyping Labs collaborated with the Plugsurfing team for 4 weeks on a hands-on prototyping engagement to solve this problem, which resulted in 70% cost savings and doubled the performance benefit over the current solution. This post shows the overall approach and ideas we tested with Plugsurfing to achieve the results.

The challenge: Scaling higher transactions per second while keeping costs under control

One of the key issues of the legacy solution was keeping up with higher transactions per second (TPS) from APIs while keeping costs low. The majority of the cost was coming from the OpenSearch Service cluster, because the mobile, web, and EV car dashboards use different APIs for different use cases, but all query the same cluster. The solution to achieve higher TPS with the legacy solution was to scale the OpenSearch Service cluster.

The following figure illustrates the legacy architecture.

Legacy Architecture

Plugsurfing APIs are responsible for serving data for four different use cases:

  • Radius search – Find all the EV charging stations (latitude/longitude) with in x km radius from the point of interest (or current location on GPS).
  • Square search – Find all the EV charging stations within a box of length x width, where the point of interest (or current location on GPS) is at the center.
  • Geo clustering search – Find all the EV charging stations clustered (grouped) by their concentration within a given area. For example, searching all EV chargers in all of Germany results in something like 50 in Munich and 100 in Berlin.
  • Radius search with filtering – Filter the results by EV charger that are available or in use by plug type, power rating, or other filters.

The OpenSearch Service domain configuration was as follows:

  • m4.10xlarge.search x 4 nodes
  • Elasticsearch 7.10 version
  • A single index to store 300,000 EV charger locations with five shards and one replica
  • A nested document structure

The following code shows the example document

   "adress":"parking lot 1",
               "plug_type":"Type A"

Solution overview

AWS EMEA Prototyping Labs proposed an experimentation approach to try three high-level ideas for performance optimization and to lower overall solution costs.

We launched an Amazon Elastic Compute Cloud (EC2) instance in a prototyping AWS account to host a benchmarking tool based on k6 (an open-source tool that makes load testing simple for developers and QA engineers) Later, we used scripts to dump and restore production data to various databases, transforming it to fit with different data models. Then we ran k6 scripts to run and record performance metrics for each use case, database, and data model combination. We also used the AWS Pricing Calculator to estimate the cost of each experiment.

Experiment 1: Use AWS Graviton and optimize OpenSearch Service domain configuration

We benchmarked a replica of the legacy OpenSearch Service domain setup in a prototyping environment to baseline performance and costs. Next, we analyzed the current cluster setup and recommended testing the following changes:

  • Use AWS Graviton based memory optimized EC2 instances (r6g) x 2 nodes in the cluster
  • Reduce the number of shards from five to one, given the volume of data (all documents) is less than 1 GB
  • Increase the refresh interval configuration from the default 1 second to 5 seconds
  • Denormalize the full document; if not possible, then denormalize all the fields that are part of the search query
  • Upgrade to Amazon OpenSearch Service 1.0 from Elasticsearch 7.10

Plugsurfing created multiple new OpenSearch Service domains with the same data and benchmarked them against the legacy baseline to obtain the following results. The row in yellow represents the baseline from the legacy setup; the rows with green represent the best outcome out of all experiments performed for the given use cases.

DB Engine Version Node Type Nodes in Cluster Configurations Data Modeling Radius req/sec Filtering req/sec Performance Gain %
Elasticsearch 7.1 m4.10xlarge 4 5 shards, 1 replica Nested 2841 580 0
Amazon OpenSearch Service 1.0 r6g.xlarge 2 1 shards, 1 replica Nested 850 271 32.77
Amazon OpenSearch Service 1.0 r6g.xlarge 2 1 shards, 1 replica Denormalized 872 670 45.07
Amazon OpenSearch Service 1.0 r6g.2xlarge 2 1 shards, 1 replica Nested 1667 474 62.58
Amazon OpenSearch Service 1.0 r6g.2xlarge 2 1 shards, 1 replica Denormalized 1993 1268 95.32

Plugsurfing was able to gain 95% (doubled) better performance across the radius and filtering use cases with this experiment.

Experiment 2: Use purpose-built databases on AWS for different use cases

We tested Amazon OpenSearch Service, Amazon Aurora PostgreSQL-Compatible Edition, and Amazon DynamoDB extensively with many data models for different use cases.

We tested the square search use case with an Aurora PostgreSQL cluster with a db.r6g.2xlarge single node as the reader and a db.r6g.large single node as the writer. The square search used a single PostgreSQL table configured via the following steps:

  1. Create the geo search table with geography as the data type to store latitude/longitude:
CREATE TYPE status AS ENUM ('available', 'inuse', 'out-of-order');

id     serial PRIMARY KEY,
geog   geography(POINT),
status status,
data   text -- Can be used as json data type, or add extra fields as flat json
  1. Create an index on the geog field:
CREATE INDEX global_points_gix ON square_search USING GIST (geog);
  1. Query the data for the square search use case:
SELECT id, ST_AsText(geog), status, datafrom square_search
where geog && ST_MakeEnvelope(32.5,9,32.8,11,4326) limit 100;

We achieved an eight-times greater improvement in TPS for the square search use case, as shown in the following table.

DB Engine Version Node Type Nodes in Cluster Configurations Data modeling Square req/sec Performance Gain %
Elasticsearch 7.1 m4.10xlarge 4 5 shards, 1 replica Nested 412 0
Aurora PostgreSQL 13.4 r6g.large 2 PostGIS, Denormalized Single table 881 213.83
Aurora PostgreSQL 13.4 r6g.xlarge 2 PostGIS, Denormalized Single table 1770 429.61
Aurora PostgreSQL 13.4 r6g.2xlarge 2 PostGIS, Denormalized Single table 3553 862.38

We tested the geo clustering search use case with a DynamoDB model. The partition key (PK) is made up of three components: <zoom-level>:<geo-hash>:<api-key>, and the sort key is the EV charger current status. We examined the following:

  • The zoom level of the map set by the user
  • The geo hash computed based on the map tile in the user’s view port area (at every zoom level, the map of Earth is divided into multiple tiles, where each tile can be represented as a geohash)
  • The API key to identify the API user
Partition Key: String Sort Key: String total_pins: Number filter1_pins: Number filter2_pins: Number filter3_pins: Number
5:gbsuv:api_user_1 Available 100 50 67 12
5:gbsuv:api_user_1 in-use 25 12 6 1
6:gbsuvt:api_user_1 Available 35 22 8 0
6:gbsuvt:api_user_1 in-use 88 4 0 35

The writer updates the counters (increment or decrement) against each filter condition and charger status whenever the EV charger status is updated at all zoom levels. With this model, the reader can query pre-clustered data with a single direct partition hit for all the map tiles viewable by the user at the given zoom level.

The DynamoDB model helped us gain a 45-times greater read performance for our geo clustering use case. However, it also added extra work on the writer side to pre-compute numbers and update multiple rows when the status of a single EV charger is updated. The following table summarizes our results.

DB Engine Version Node Type Nodes in Cluster Configurations Data modeling Clustering req/sec Performance Gain %
Elasticsearch 7.1 m4.10xlarge 4 5 shards, 1 replica Nested 22 0
DynamoDB NA Serverless 0 100 WCU, 500 RCU Single table 1000 4545.45

Experiment 3: Use AWS [email protected] and AWS Wavelength for better network performance

We recommended that Plugsurfing use [email protected] and AWS Wavelength to optimize network performance by shifting some of the APIs at the edge to closer to the user. The EV car dashboard can use the same 5G network connectivity to invoke Plugsurfing APIs with AWS Wavelength.

Post-prototype architecture

The post-prototype architecture used purpose-built databases on AWS to achieve better performance across all four use cases. We looked at the results and split the workload based on which database performs best for each use case. This approach optimized performance and cost, but added complexity on readers and writers. The final experiment summary represents the database fits for the given use cases that provide the best performance (highlighted in orange).

Plugsurfing has already implemented a short-term plan (light green) as an immediate action post-prototype and plans to implement mid-term and long-term actions (dark green) in the future.

DB Engine Node Type Configurations Radius req/sec Radius Filtering req/sec Clustering req/sec Square req/sec Monthly Costs $ Cost Benefit % Performance Gain %
Elasticsearch 7.1 m4.10xlarge x4 5 shards 2841 580 22 412 9584,64 0 0
Amazon OpenSearch Service 1.0 r6g.2xlarge x2

1 shard


1667 474 34 142 1078,56 88,75 -39,9
Amazon OpenSearch Service 1.0 r6g.2xlarge x2 1 shard 1993 1268 125 685 1078,56 88,75 5,6
Aurora PostgreSQL 13.4 r6g.2xlarge x2 PostGIS 0 0 275 3553 1031,04 89,24 782,03
DynamoDB Serverless

100 WCU

500 RCU

0 0 1000 0 106,06 98,89 4445,45
Summary . . 2052 1268 1000 3553 2215,66 76,88 104,23

The following diagram illustrates the updated architecture.

Post Prototype Architecture


Plugsurfing was able to achieve a 70% cost reduction over their legacy setup with two-times better performance by using purpose-built databases like DynamoDB, Aurora PostgreSQL, and AWS Graviton based instances for Amazon OpenSearch Service. They achieved the following results:

  • The radius search and radius search with filtering use cases achieved better performance using Amazon OpenSearch Service on AWS Graviton with a denormalized document structure
  • The square search use case performed better using Aurora PostgreSQL, where we used the PostGIS extension for geo square queries
  • The geo clustering search use case performed better using DynamoDB

Learn more about AWS Graviton instances and purpose-built databases on AWS, and let us know how we can help optimize your workload on AWS.

About the Author

Anand ShahAnand Shah is a Big Data Prototyping Solution Architect at AWS. He works with AWS customers and their engineering teams to build prototypes using AWS Analytics services and purpose-built databases. Anand helps customers solve the most challenging problems using art-of-the-possible technology. He enjoys beaches in his leisure time.

New — Detect and Resolve Issues Quickly with Log Anomaly Detection and Recommendations from Amazon DevOps Guru

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/new-detect-and-resolve-issues-quickly-with-log-anomaly-detection-and-recommendations-from-amazon-devops-guru/

Today, we are announcing a new feature, Log Anomaly Detection and Recommendations for Amazon DevOps Guru. With this feature, you can find anomalies throughout relevant logs within your app, and get targeted recommendations to resolve issues. Here’s a quick look at this feature:

AWS launched DevOps Guru, a fully managed AIOps platform service, in December 2020 to make it easier for developers and operators to improve applications’ reliability and availability. DevOps Guru minimizes the time needed for issue remediation by using machine learning models based on more than 20 years of operational expertise in building, scaling, and maintaining applications for Amazon.com.

You can use DevOps Guru to identify anomalies such as increased latency, error rates, and resource constraints and then send alerts with a description and actionable recommendations for remediation. You don’t need any prior knowledge in machine learning to use DevOps Guru, and only need to activate it in the DevOps Guru dashboard.

New Feature – Log Anomaly Detection and Recommendations

Observability and monitoring are integral parts of DevOps and modern applications. Applications can generate several types of telemetry, one of which is metrics, to reveal the performance of applications and to help identify issues.

While the metrics analyzed by DevOps Guru today are critical to surfacing issues occurring in applications, it is still challenging to find the root cause of these issues. As applications become more distributed and complex, developers and IT operators need more automation to reduce the time and effort spend detecting, debugging, and resolving operational issues. By sourcing relevant logs in conjunction with metrics, developers can now more effectively monitor and troubleshoot their applications.

With this new Log Anomaly Detection and Recommendations feature, you can get insights along with precise recommendations from application logs without manual effort. This feature delivers contextualized log data of anomaly occurrences and provides actionable insights from recommendations integrated inside the DevOps Guru dashboard.

The Log Anomaly Detection and Recommendations feature is able to detect exception keywords, numerical anomalies, HTTP status codes, data format anomalies, and more. When DevOps Guru identifies anomalies from logs, you will find relevant log samples and deep links to CloudWatch Logs on the DevOps Guru dashboard. These contextualized logs are an important component for DevOps Guru to provide further features, namely targeted recommendations to help faster troubleshooting and issue remediation.

Let’s Get Started!

This new feature consists of two things, “Log Anomaly Detection” and “Recommendations.” Let’s explore further into how we can use this feature to find the root cause of an issue and get recommendations. As an example, we’ll look at my serverless API built using Amazon API Gateway, with AWS Lambda integrated with Amazon DynamoDB. The architecture is shown in the following image:

If it’s your first time using DevOps Guru, you’ll need to enable it by visiting the DevOps Guru dashboard. You can learn more by visiting the Getting Started page.

Since I’ve already enabled DevOps Guru I can go to the Insights page, navigate to the Log groups section, and select the Enable log anomaly detection.

Log Anomaly Detection

After a few hours, I can visit the DevOps Guru dashboard to check for insights. Here, I get some findings from DevOps Guru, as seen in the following screenshots:

With Log Anomaly Detection, DevOps Guru will show the findings of my serverless API in the Log groups section, as seen in the following screenshot:

I can hover over the anomaly and get a high-level summary of the contextualized enrichment data found in this log group. It also provides me with additional information, including the number of log records analyzed and the log scan time range. From this information, I know these anomalies are new event types that have not been detected in the past with the keyword ERROR.

To investigate further, I can select the log group link and go to the Detail page. The graph shows relevant events that might have occurred around these log showcases, which is a helpful context for troubleshooting the root cause. This Detail page includes different showcases, each representing a cluster of similar log events, like exception keywords and numerical anomalies, found in the logs at the time of the anomaly.

Looking at the first log showcase, I noticed a ConditionalCheckFailedException error within the AWS Lambda function. This can occur when AWS Lambda fails to call DynamoDB. From here, I learned that there was an error in the conditional check section, and I reviewed the logic on AWS Lambda. I can also investigate related CloudWatch Logs groups by selecting View details in CloudWatch links.

One thing I want to emphasize here is that DevOps Guru identifies significant events related to application performance and helps me to see the important things I need to focus on by separating the signal from the noise.

Targeted Recommendations

In addition to anomaly detection of logs, this new feature also provides precise recommendations based on the findings in the logs. You can find these recommendations on the Insights page, by scrolling down to find the Recommendations section.

Here, I get some recommendations from DevOps Guru, which make it easier for me to take immediate steps to remediate the issue. One recommendation shown in the following image is Check DynamoDB ConditionalExpression, which relates to an anomaly found in the logs derived from AWS Lambda.


You can use DevOps Guru Log Anomaly Detection and Recommendations today at no additional charge in all Regions where DevOps Guru is available, US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

To learn more, please visit Amazon DevOps Guru web site and technical documentation, and get started today.

Happy building
— Donnie

Top 2021 AWS service launches security professionals should review – Part 2

Post Syndicated from Marta Taggart original https://aws.amazon.com/blogs/security/top-2021-aws-service-launches-security-professionals-should-review-part-2/

In Part 1 of this two-part series, we shared an overview of some of the most important 2021 Amazon Web Services (AWS) Security service and feature launches. In this follow-up, we’ll dive deep into additional launches that are important for security professionals to be aware of and understand across all AWS services. There have already been plenty in the first half of 2022, so we’ll highlight those soon, as well.

AWS Identity

You can use AWS Identity Services to build Zero Trust architectures, help secure your environments with a robust data perimeter, and work toward the security best practice of granting least privilege. In 2021, AWS expanded the identity source options, AWS Region availability, and support for AWS services. There is also added visibility and power in the permission management system. New features offer new integrations, additional policy checks, and secure resource sharing across AWS accounts.

AWS Single Sign-On

For identity management, AWS Single Sign-On (AWS SSO) is where you create, or connect, your workforce identities in AWS once and manage access centrally across your AWS accounts in AWS Organizations. In 2021, AWS SSO announced new integrations for JumpCloud and CyberArk users. This adds to the list of providers that you can use to connect your users and groups, which also includes Microsoft Active Directory Domain Services, Okta Universal Directory, Azure AD, OneLogin, and Ping Identity.

AWS SSO expanded its availability to new Regions: AWS GovCloud (US), Europe (Paris), and South America (São Paulo) Regions. Another very cool AWS SSO development is its integration with AWS Systems Manager Fleet Manager. This integration enables you to log in interactively to your Windows servers running on Amazon Elastic Compute Cloud (Amazon EC2) while using your existing corporate identities—try it, it’s fantastic!

AWS Identity and Access Management

For access management, there have been a range of feature launches with AWS Identity and Access Management (IAM) that have added up to more power and visibility in the permissions management system. Here are some key examples.

IAM made it simpler to relate a user’s IAM role activity to their corporate identity. By setting the new source identity attribute, which persists through role assumption chains and gets logged in AWS CloudTrail, you can find out who is responsible for actions that IAM roles performed.

IAM added support for policy conditions, to help manage permissions for AWS services that access your resources. This important feature launch of service principal conditions helps you to distinguish between API calls being made on your behalf by a service principal, and those being made by a principal inside your account. You can choose to allow or deny the calls depending on your needs. As a security professional, you might find this especially useful in conjunction with the aws:CalledVia condition key, which allows you to scope permissions down to specify that this account principal can only call this API if they are calling it using a particular AWS service that’s acting on their behalf. For example, your account principal can’t generally access a particular Amazon Simple Storage Service (Amazon S3) bucket, but if they are accessing it by using Amazon Athena, they can do so. These conditions can also be used in service control policies (SCPs) to give account principals broader scope across an account, organizational unit, or organization; they need not be added to individual principal policies or resource policies.

Another very handy new IAM feature launch is additional information about the reason for an access denied error message. With this additional information, you can now see which of the relevant access control policies (for example, IAM, resource, SCP, or VPC endpoint) was the cause of the denial. As of now, this new IAM feature is supported by more than 50% of all AWS services in the AWS SDK and AWS Command Line Interface, and a fast-growing number in the AWS Management Console. We will continue to add support for this capability across services, as well as add more features that are designed to make the journey to least privilege simpler.

IAM Access Analyzer

AWS Identity and Access Management (IAM) Access Analyzer provides actionable recommendations to set secure and functional permissions. Access Analyzer introduced the ability to preview the impact of policy changes before deployment and added over 100 policy checks for correctness. Both of these enhancements are integrated into the console and are also available through APIs. Access Analyzer also provides findings for external access allowed by resource policies for many services, including a previous launch in which IAM Access Analyzer was directly integrated into the Amazon S3 management console.

IAM Access Analyzer also launched the ability to generate fine-grained policies based on analyzing past AWS CloudTrail activity. This feature provides a great new capability for DevOps teams or central security teams to scope down policies to just the permissions needed, making it simpler to implement least privilege permissions. IAM Access Analyzer launched further enhancements to expand policy checks, and the ability to generate a sample least-privilege policy from past activity was expanded beyond the account level to include an analysis of principal behavior within the entire organization by analyzing log activity stored in AWS CloudTrail.

AWS Resource Access Manager

AWS Resource Access Manager (AWS RAM) helps you securely share your resources across unrelated AWS accounts within your organization or organizational units (OUs) in AWS Organizations. Now you can also share your resources with IAM roles and IAM users for supported resource types. This update enables more granular access using managed permissions that you can use to define access to shared resources. In addition to the default managed permission defined for each shareable resource type, you now have more flexibility to choose which permissions to grant to whom for resource types that support additional managed permissions. Additionally, AWS RAM added support for global resource types, enabling you to provision a global resource once, and share that resource across your accounts. A global resource is one that can be used in multiple AWS Regions; the first example of a global resource is found in AWS Cloud WAN, currently in preview as of this publication. AWS RAM helps you more securely share an AWS Cloud WAN core network, which is a managed network containing AWS and on-premises networks. With AWS RAM global resource sharing, you can use the Cloud WAN core network to centrally operate a unified global network across Regions and accounts.

AWS Directory Service

AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft Active Directory (AD), was updated to automatically provide domain controller and directory utilization metrics in Amazon CloudWatch for new and existing directories. Analyzing these utilization metrics helps you quantify your average and peak load times to identify the need for additional domain controllers. With this, you can define the number of domain controllers to meet your performance, resilience, and cost requirements.

Amazon Cognito

Amazon Cognito identity pools (federated identities) was updated to enable you to use attributes from social and corporate identity providers to make access control decisions and simplify permissions management in AWS resources. In Amazon Cognito, you can choose predefined attribute-tag mappings, or you can create custom mappings using the attributes from social and corporate providers’ access and ID tokens, or SAML assertions. You can then reference the tags in an IAM permissions policy to implement attribute-based access control (ABAC) and manage access to your AWS resources. Amazon Cognito also launched a new console experience for user pools and now supports targeted sign out through refresh token revocation.

Governance, control, and logging services

There were a number of important releases in 2021 in the areas of governance, control, and logging services.

AWS Organizations

AWS Organizations added a number of important import features and integrations during 2021. Security-relevant services like Amazon Detective, Amazon Inspector, and Amazon Virtual Private Cloud (Amazon VPC) IP Address Manager (IPAM), as well as others like Amazon DevOps Guru, launched integrations with Organizations. Others like AWS SSO and AWS License Manager upgraded their Organizations support by adding support for a Delegated Administrator account, reducing the need to use the management account for operational tasks. Amazon EC2 and EC2 Image Builder took advantage of the account grouping capabilities provided by Organizations to allow cross-account sharing of Amazon Machine Images (AMIs) (for more details, see the Amazon EC2 section later in this post). Organizations also got an updated console, increased quotas for tag policies, and provided support for the launch of an API that allows for programmatic creation and maintenance of AWS account alternate contacts, including the very important security contact (although that feature doesn’t require Organizations). For more information on the value of using the security contact for your accounts, see the blog post Update the alternate security contact across your AWS accounts for timely security notifications.

AWS Control Tower

2021 was also a good year for AWS Control Tower, beginning with an important launch of the ability to take over governance of existing OUs and accounts, as well as bulk update of new settings and guardrails with a single button click or API call. Toward the end of 2021, AWS Control Tower added another valuable enhancement that allows it to work with a broader set of customers and use cases, namely support for nested OUs within an organization.

AWS CloudFormation Guard 2.0

Another important milestone in 2021 for creating and maintaining a well-governed cloud environment was the re-launch of CloudFormation Guard as Cfn-Guard 2.0. This launch was a major overhaul of the Cfn-Guard domain-specific language (DSL), a DSL designed to provide the ability to test infrastructure-as-code (IaC) templates such as CloudFormation and Terraform to make sure that they conform with a set of constraints written in the DSL by a central team, such as a security organization or network management team.

This approach provides a powerful new middle ground between the older security models of prevention (which provide developers only an access denied message, and often can’t distinguish between an acceptable and an unacceptable use of the same API) and a detect and react model (when undesired states have already gone live). The Cfn-Guard 2.0 model gives builders the freedom to build with IaC, while allowing central teams to have the ability to reject infrastructure configurations or changes that don’t conform to central policies—and to do so with completely custom error messages that invite dialog between the builder team and the central team, in case the rule is unnuanced and needs to be refined, or if a specific exception needs to be created.

For example, a builder team might be allowed to provision and attach an internet gateway to a VPC, but the team can do this only if the routes to the internet gateway are limited to a certain pre-defined set of CIDR ranges, such as the public addresses of the organization’s branch offices. It’s not possible to write an IAM policy that takes into account the CIDR values of a VPC route table update, but you can write a Cfn-Guard 2.0 rule that allows the creation and use of an internet gateway, but only with a defined and limited set of IP addresses.

AWS Systems Manager Incident Manager

An important launch that security professionals should know about is AWS Systems Manager Incident Manager. Incident Manager provides a number of powerful capabilities for managing incidents of any kind, including operational and availability issues but also security issues. With Incident Manager, you can automatically take action when a critical issue is detected by an Amazon CloudWatch alarm or Amazon EventBridge event. Incident Manager runs pre-configured response plans to engage responders by using SMS and phone calls, can enable chat commands and notifications using AWS Chatbot, and runs automation workflows with AWS Systems Manager Automation runbooks. The Incident Manager console integrates with AWS Systems Manager OpsCenter to help you track incidents and post-incident action items from a central place that also synchronizes with third-party management tools such as Jira Service Desk and ServiceNow. Incident Manager enables cross-account sharing of incidents using AWS RAM, and provides cross-Region replication of incidents to achieve higher availability.

AWS CloudTrail

AWS CloudTrail added some great new logging capabilities in 2021, including logging data-plane events for Amazon DynamoDB and Amazon Elastic Block Store (Amazon EBS) direct APIs (direct APIs allow access to EBS snapshot content through a REST API). CloudTrail also got further enhancements to its machine-learning based CloudTrail Insights feature, including a new one called ErrorRate Insights.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is one of the most important services at AWS, and its steady addition of security-related enhancements is always big news. Here are the 2021 highlights.

Access Points aliases

Amazon S3 introduced a new feature, Amazon S3 Access Points aliases. With Amazon S3 Access Points aliases, you can make the access points backwards-compatible with a large amount of existing code that is programmed to interact with S3 buckets rather than access points.

To understand the importance of this launch, we have to go back to 2019 to the launch of Amazon S3 Access Points. Access points are a powerful mechanism for managing S3 bucket access. They provide a great simplification for managing and controlling access to shared datasets in S3 buckets. You can create up to 1,000 access points per Region within each of your AWS accounts. Although bucket access policies remain fully enforced, you can delegate access control from the bucket to its access points, allowing for distributed and granular control. Each access point enforces a customizable policy that can be managed by a particular workgroup, while also avoiding the problem of bucket policies needing to grow beyond their maximum size. Finally, you can also bind an access point to a particular VPC for its lifetime, to prevent access directly from the internet.

With the 2021 launch of Access Points aliases, Amazon S3 now generates a unique DNS name, or alias, for each access point. The Access Points aliases look and acts just like an S3 bucket to existing code. This means that you don’t need to make changes to older code to use Amazon S3 Access Points; just substitute an Access Points aliases wherever you previously used a bucket name. As a security team, it’s important to know that this flexible and powerful administrative feature is backwards-compatible and can be treated as a drop-in replacement in your various code bases that use Amazon S3 but haven’t been updated to use access point APIs. In addition, using Access Points aliases adds a number of powerful security-related controls, such as permanent binding of S3 access to a particular VPC.

Bucket Keys

Amazon S3 launched support for S3 Inventory and S3 Batch Operations to identify and copy objects to use S3 Bucket Keys, which can help reduce the costs of server-side encryption (SSE) with AWS Key Management Service (AWS KMS).

S3 Bucket Keys were launched at the end of 2020, another great launch that security professionals should know about, so here is an overview in case you missed it. S3 Bucket Keys are data keys generated by AWS KMS to provide another layer of envelope encryption in which the outer layer (the S3 Bucket Key) is cached by S3 for a short period of time. This extra key layer increases performance and reduces the cost of requests to AWS KMS. It achieves this by decreasing the request traffic from Amazon S3 to AWS KMS from a one-to-one model—one request to AWS KMS for each object written to or read from Amazon S3—to a one-to-many model using the cached S3 Bucket Key. The S3 Bucket Key is never stored persistently in an unencrypted state outside AWS KMS, and so Amazon S3 ultimately must always return to AWS KMS to encrypt and decrypt the S3 Bucket Key, and thus, the data. As a result, you still retain control of the key hierarchy and resulting encrypted data through AWS KMS, and are still able to audit Amazon S3 returning periodically to AWS KMS to refresh the S3 Bucket Keys, as logged in CloudTrail.

Returning to our review of 2021, S3 Bucket Keys gained the ability to use Amazon S3 Inventory and Amazon S3 Batch Operations automatically to migrate objects from the higher cost, slightly lower-performance SSE-KMS model to the lower-cost, higher-performance S3 Bucket Keys model.

Simplified ownership and access management

The final item from 2021 for Amazon S3 is probably the most important of all. Last year was the year that Amazon S3 achieved fully modernized object ownership and access management capabilities. You can now disable access control lists to simplify ownership and access management for data in Amazon S3.

To understand this launch, we need to go in time to the origins of Amazon S3, which is one of the oldest services in AWS, created even before IAM was launched in 2011. In those pre-IAM days, a storage system like Amazon S3 needed to have some kind of access control model, so Amazon S3 invented its own: Amazon S3 access control lists (ACLs). Using ACLs, you could add access permissions down to the object level, but only with regard to access by other AWS account principals (the only kind of identity that was available at the time), or public access (read-only or read-write) to an object. And in this model, objects were always owned by the creator of the object, not the bucket owner.

After IAM was introduced, Amazon S3 added the bucket policy feature, a type of resource policy that provides the rich features of IAM, including full support for all IAM principals (users and roles), time-of-day conditions, source IP conditions, ability to require encryption, and more. For many years, Amazon S3 access decisions have been made by combining IAM policy permissions and ACL permissions, which has served customers well. But the object-writer-is-owner issue has often caused friction. The good news for security professionals has been that a deny by either type of access control type overrides an allow by the other, so there were no security issues with this bi-modal approach. The challenge was that it could be administratively difficult to manage both resource policies—which exist at the bucket and access point level—and ownership and ACLs—which exist at the object level. Ownership and ACLs might potentially impact the behavior of only a handful of objects, in a bucket full of millions or billions of objects.

With the features released in 2021, Amazon S3 has removed these points of friction, and now provides the features needed to reduce ownership issues and to make IAM-based policies the only access control system for a specified bucket. The first step came in 2020 with the ability to make object ownership track bucket ownership, regardless of writer. But that feature applied only to newly-written objects. The final step is the 2021 launch we’re highlighting here: the ability to disable at the bucket level the evaluation of all existing ACLs—including ownership and permissions—effectively nullifying all object ACLs. From this point forward, you have the mechanisms you need to govern Amazon S3 access with a combination of S3 bucket policies, S3 access point policies, and (within the same account) IAM principal policies, without worrying about legacy models of ACLs and per-object ownership.

Additional database and storage service features

AWS Backup Vault Lock

AWS Backup added an important new additional layer for backup protection with the availability of AWS Backup Vault Lock. A vault lock feature in AWS is the ability to configure a storage policy such that even the most powerful AWS principals (such as an account or Org root principal) can only delete data if the deletion conforms to the preset data retention policy. Even if the credentials of a powerful administrator are compromised, the data stored in the vault remains safe. Vault lock features are extremely valuable in guarding against a wide range of security and resiliency risks (including accidental deletion), notably in an era when ransomware represents a rising threat to data.

Prior to AWS Backup Vault Lock, AWS provided the extremely useful Amazon S3 and Amazon S3 Glacier vault locking features, but these previous vaulting features applied only to the two Amazon S3 storage classes. AWS Backup, on the other hand, supports a wide range of storage types and databases across the AWS portfolio, including Amazon EBS, Amazon Relational Database Service (Amazon RDS) including Amazon Aurora, Amazon DynamoDB, Amazon Neptune, Amazon DocumentDB, Amazon Elastic File System (Amazon EFS), Amazon FSx for Lustre, Amazon FSx for Windows File Server, Amazon EC2, and AWS Storage Gateway. While built on top of Amazon S3, AWS Backup even supports backup of data stored in Amazon S3. Thus, this new AWS Backup Vault Lock feature effectively serves as a vault lock for all the data from most of the critical storage and database technologies made available by AWS.

Finally, as a bonus, AWS Backup added two more features in 2021 that should delight security and compliance professionals: AWS Backup Audit Manager and compliance reporting.

Amazon DynamoDB

Amazon DynamoDB added a long-awaited feature: data-plane operations integration with AWS CloudTrail. DynamoDB has long supported the recording of management operations in CloudTrail—including a long list of operations like CreateTable, UpdateTable, DeleteTable, ListTables, CreateBackup, and many others. What has been added now is the ability to log the potentially far higher volume of data operations such as PutItem, BatchWriteItem, GetItem, BatchGetItem, and DeleteItem. With this launch, full database auditing became possible. In addition, DynamoDB added more granular control of logging through DynamoDB Streams filters. This feature allows users to vary the recording in CloudTrail of both control plane and data plane operations, at the table or stream level.

Amazon EBS snapshots

Let’s turn now to a simple but extremely useful feature launch affecting Amazon Elastic Block Store (Amazon EBS) snapshots. In the past, it was possible to accidently delete an EBS snapshot, which is a problem for security professionals because data availability is a part of the core security triad of confidentiality, integrity, and availability. Now you can manage that risk and recover from accidental deletions of your snapshots by using Recycle Bin. You simply define a retention policy that applies to all deleted snapshots, and then you can define other more granular policies, for example using longer retention periods based on snapshot tag values, such as stage=prod. Along with this launch, the Amazon EBS team announced EBS Snapshots Archive, a major price reduction for long-term storage of snapshots.

AWS Certificate Manager Private Certificate Authority

2021 was a big year for AWS Certificate Manager (ACM) Private Certificate Authority (CA) with the following updates and new features:

Network and application protection

We saw a lot of enhancements in network and application protection in 2021 that will help you to enforce fine-grained security policies at important network control points across your organization. The services and new capabilities offer flexible solutions for inspecting and filtering traffic to help prevent unauthorized resource access.


AWS WAF launched AWS WAF Bot Control, which gives you visibility and control over common and pervasive bots that consume excess resources, skew metrics, cause downtime, or perform other undesired activities. The Bot Control managed rule group helps you monitor, block, or rate-limit pervasive bots, such as scrapers, scanners, and crawlers. You can also allow common bots that you consider acceptable, such as status monitors and search engines. AWS WAF also added support for custom responses, managed rule group versioning, in-line regular expressions, and Captcha. The Captcha feature has been popular with customers, removing another small example of “undifferentiated work” for customers.

AWS Shield Advanced

AWS Shield Advanced now automatically protects web applications by blocking application layer (L7) DDoS events with no manual intervention needed by you or the AWS Shield Response Team (SRT). When you protect your resources with AWS Shield Advanced and enable automatic application layer DDoS mitigation, Shield Advanced identifies patterns associated with L7 DDoS events and isolates this anomalous traffic by automatically creating AWS WAF rules in your web access control lists (ACLs).

Amazon CloudFront

In other edge networking news, Amazon CloudFront added support for response headers policies. This means that you can now add cross-origin resource sharing (CORS), security, and custom headers to HTTP responses returned by your CloudFront distributions. You no longer need to configure your origins or use custom [email protected] or CloudFront Functions to insert these headers.

CloudFront Functions were another great 2021 addition to edge computing, providing a simple, inexpensive, and yet highly secure method for running customer-defined code as part of any CloudFront-managed web request. CloudFront functions allow for the creation of very efficient, fine-grained network access filters, such the ability to block or allow web requests at a region or city level.

Amazon Virtual Private Cloud and Route 53

Amazon Virtual Private Cloud (Amazon VPC) added more-specific routing (routing subnet-to-subnet traffic through a virtual networking device) that allows for packet interception and inspection between subnets in a VPC. This is particularly useful for highly-available, highly-scalable network virtual function services based on Gateway Load Balancer, including both AWS services like AWS Network Firewall, as well as third-party networking services such as the recently announced integration between AWS Firewall Manager and Palo Alto Networks Cloud Next Generation Firewall, powered by Gateway Load Balancer.

Another important set of enhancements to the core VPC experience came in the area of VPC Flow Logs. Amazon VPC launched out-of-the-box integration with Amazon Athena. This means with a few clicks, you can now use Athena to query your VPC flow logs delivered to Amazon S3. Additionally, Amazon VPC launched three associated new log features that make querying more efficient by supporting Apache Parquet, Hive-compatible prefixes, and hourly partitioned files.

Following Route 53 Resolver’s much-anticipated launch of DNS logging in 2020, the big news for 2021 was the launch of its DNS Firewall capability. Route 53 Resolver DNS Firewall lets you create “blocklists” for domains you don’t want your VPC resources to communicate with, or you can take a stricter, “walled-garden” approach by creating “allowlists” that permit outbound DNS queries only to domains that you specify. You can also create alerts for when outbound DNS queries match certain firewall rules, allowing you to test your rules before deploying for production traffic. Route 53 Resolver DNS Firewall launched with two managed domain lists—malware domains and botnet command and control domains—enabling you to get started quickly with managed protections against common threats. It also integrated with Firewall Manager (see the following section) for easier centralized administration.

AWS Network Firewall and Firewall Manager

Speaking of AWS Network Firewall and Firewall Manager, 2021 was a big year for both. Network Firewall added support for AWS Managed Rules, which are groups of rules based on threat intelligence data, to enable you to stay up to date on the latest security threats without writing and maintaining your own rules. AWS Network Firewall features a flexible rules engine enabling you to define firewall rules that give you fine-grained control over network traffic. As of the launch in late 2021, you can enable managed domain list rules to block HTTP and HTTPS traffic to domains identified as low-reputation, or that are known or suspected to be associated with malware or botnets. Prior to that, another important launch was new configuration options for rule ordering and default drop, making it simpler to write and process rules to monitor your VPC traffic. Also in 2021, Network Firewall announced a major regional expansion following its initial launch in 2020, and a range of compliance achievements and eligibility including HIPAA, PCI DSS, SOC, and ISO.

Firewall Manager also had a strong 2021, adding a number of additional features beyond its initial core area of managing network firewalls and VPC security groups that provide centralized, policy-based control over many other important network security capabilities: Amazon Route 53 Resolver DNS Firewall configurations, deployment of the new AWS WAF Bot Control, monitoring of VPC routes for AWS Network Firewall, AWS WAF log filtering, AWS WAF rate-based rules, and centralized logging of AWS Network Firewall logs.

Elastic Load Balancing

Elastic Load Balancing now supports forwarding traffic directly from Network Load Balancer (NLB) to Application Load Balancer (ALB). With this important new integration, you can take advantage of many critical NLB features such as support for AWS PrivateLink and exposing static IP addresses for applications that still require ALB.

In addition, Network Load Balancer now supports version 1.3 of the TLS protocol. This adds to the existing TLS 1.3 support in Amazon CloudFront, launched in 2020. AWS plans to add TLS 1.3 support for additional services.

The AWS Networking team also made Amazon VPC private NAT gateways available in both AWS GovCloud (US) Regions. The expansion into the AWS GovCloud (US) Regions enables US government agencies and contractors to move more sensitive workloads into the cloud by helping them to address certain regulatory and compliance requirements.


Security professionals should also be aware of some interesting enhancements in AWS compute services that can help improve their organization’s experience in building and operating a secure environment.

Amazon Elastic Compute Cloud (Amazon EC2) launched the Global View on the console to provide visibility to all your resources across Regions. Global View helps you monitor resource counts, notice abnormalities sooner, and find stray resources. A few days into 2022, another simple but extremely useful EC2 launch was the new ability to obtain instance tags from the Instance Metadata Service (IMDS). Many customers run code on Amazon EC2 that needs to introspect about the EC2 tags associated with the instance and then change its behavior depending on the content of the tags. Prior to this launch, you had to associate an EC2 role and call the EC2 API to get this information. That required access to API endpoints, either through a NAT gateway or a VPC endpoint for Amazon EC2. Now, that information can be obtained directly from the IMDS, greatly simplifying a common use case.

Amazon EC2 launched sharing of Amazon Machine Images (AMIs) with AWS Organizations and Organizational Units (OUs). Previously, you could share AMIs only with specific AWS account IDs. To share AMIs within AWS Organizations, you had to explicitly manage sharing of AMIs on an account-by-account basis, as they were added to or removed from AWS Organizations. With this new feature, you no longer have to update your AMI permissions because of organizational changes. AMI sharing is automatically synchronized when organizational changes occur. This feature greatly helps both security professionals and governance teams to centrally manage and govern AMIs as you grow and scale your AWS accounts. As previously noted, this feature was also added to EC2 Image Builder. Finally, Amazon Data Lifecycle Manager, the tool that manages all your EBS volumes and AMIs in a policy-driven way, now supports automatic deprecation of AMIs. As a security professional, you will find this helpful as you can set a timeline on your AMIs so that, if the AMIs haven’t been updated for a specified period of time, they will no longer be considered valid or usable by development teams.

Looking ahead

In 2022, AWS continues to deliver experiences that meet administrators where they govern, developers where they code, and applications where they run. We will continue to summarize important launches in future blog posts. If you’re interested in learning more about AWS services, join us for AWS re:Inforce, the AWS conference focused on cloud security, identity, privacy, and compliance. AWS re:Inforce 2022 will take place July 26–27 in Boston, MA. Registration is now open. Register now with discount code SALxUsxEFCw to get $150 off your full conference pass to AWS re:Inforce. For a limited time only and while supplies last. We look forward to seeing you there!

To stay up to date on the latest product and feature launches and security use cases, be sure to read the What’s New with AWS announcements (or subscribe to the RSS feed) and the AWS Security Blog.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.


Marta Taggart

Marta is a Seattle-native and Senior Product Marketing Manager in AWS Security Product Marketing, where she focuses on data protection services. Outside of work you’ll find her trying to convince Jack, her rescue dog, not to chase squirrels and crows (with limited success).

Mark Ryland

Mark Ryland

Mark is the director of the Office of the CISO for AWS. He has over 30 years of experience in the technology industry and has served in leadership roles in cybersecurity, software engineering, distributed systems, technology standardization and public policy. Previously, he served as the Director of Solution Architecture and Professional Services for the AWS World Public Sector team.

AWS Week in Review – June 27, 2022

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-june-27-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

It’s the beginning of a new week, and I’d like to start with a recap of the most significant AWS news from the previous 7 days. Last week was special because I had the privilege to be at the very first EMEA AWS Heroes Summit in Milan, Italy. It was a great opportunity of mutual learning as this community of experts shared their thoughts with AWS developer advocates, product managers, and technologists on topics such as containers, serverless, and machine learning.

Participants at the EMEA AWS Heroes Summit 2022

Last Week’s Launches
Here are the launches that got my attention last week:

Amazon Connect Cases (available in preview) – This new capability of Amazon Connect provides built-in case management for your contact center agents to create, collaborate on, and resolve customer issues. Learn more in this blog post that shows how to simplify case management in your contact center.

Many updates for Amazon RDS and Amazon AuroraAmazon RDS Custom for Oracle now supports Oracle database 12.2 and 18c, and Amazon RDS Multi-AZ deployments with one primary and two readable standby database instances now supports M5d and R5d instances and is available in more Regions. There is also a Regional expansion for RDS Custom. Finally, PostgreSQL 14, a new major version, is now supported by Amazon Aurora PostgreSQL-Compatible Edition.

AWS WAF Captcha is now generally available – You can use AWS WAF Captcha to block unwanted bot traffic by requiring users to successfully complete challenges before their web requests are allowed to reach resources.

Private IP VPNs with AWS Site-to-Site VPN – You can now deploy AWS Site-to-Site VPN connections over AWS Direct Connect using private IP addresses. This way, you can encrypt traffic between on-premises networks and AWS via Direct Connect connections without the need for public IP addresses.

AWS Center for Quantum Networking – Research and development of quantum computers have the potential to revolutionize science and technology. To address fundamental scientific and engineering challenges and develop new hardware, software, and applications for quantum networks, we announced the AWS Center for Quantum Networking.

Simpler access to sustainability data, plus a global hackathon – The Amazon Sustainability Data Initiative catalog of datasets is now searchable and discoverable through AWS Data Exchange. As part of a new collaboration with the International Research Centre in Artificial Intelligence, under the auspices of UNESCO, you can use the power of the cloud to help the world become sustainable by participating to the Amazon Sustainability Data Initiative Global Hackathon.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
A couple of takeaways from the Amazon re:MARS conference:

Amazon CodeWhisperer (preview) – Amazon CodeWhisperer is a coding companion powered by machine learning with support for multiple IDEs and languages.

Synthetic data generation with Amazon SageMaker Ground TruthGenerate labeled synthetic image data that you can combine with real-world data to create more complete training datasets for your ML models.

Some other updates you might have missed:

AstraZeneca’s drug design program built using AWS wins innovation award – AstraZeneca received the BioIT World Innovative Practice Award at the 20th anniversary of the Bio-IT World Conference for its novel augmented drug design platform built on AWS. More in this blog post.

Large object storage strategies for Amazon DynamoDB – A blog post showing different options for handling large objects within DynamoDB and the benefits and disadvantages of each approach.

Amazon DevOps Guru for RDS under the hoodSome details of how DevOps Guru for RDS works, with a specific focus on its scalability, security, and availability.

AWS open-source news and updates – A newsletter curated by my colleague Ricardo to bring you the latest open-source projects, posts, events, and more.

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

On June 30, the AWS User Group Ukraine is running an AWS Tech Conference to discuss digital transformation with AWS. Join to learn from many sessions including a fireside chat with Dr. Werner Vogels, CTO at Amazon.com.

That’s all from me for this week. Come back next Monday for another Week in Review!


Accelerate Amazon DynamoDB data access in AWS Glue jobs using the new AWS Glue DynamoDB Export connector

Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/accelerate-amazon-dynamodb-data-access-in-aws-glue-jobs-using-the-new-aws-glue-dynamodb-elt-connector/

Modern data architectures encourage the integration of data lakes, data warehouses, and purpose-built data stores, enabling unified governance and easy data movement. With a modern data architecture on AWS, you can store data in a data lake and use a ring of purpose-built data services around the lake, allowing you to make decisions with speed and agility.

To achieve a modern data architecture, AWS Glue is the key service that integrates data over a data lake, data warehouse, and purpose-built data stores. AWS Glue simplifies data movement like inside-out, outside-in, or around the perimeter. A powerful purpose-built data store is Amazon DynamoDB, which is widely used by hundreds of thousands of companies, including Amazon.com. It’s common to move data from DynamoDB to a data lake built on top of Amazon Simple Storage Service (Amazon S3). Many customers move data from DynamoDB to Amazon S3 using AWS Glue extract, transform, and load (ETL) jobs.

Today, we’re pleased to announce the general availability of a new AWS Glue DynamoDB export connector. It’s built on top of the DynamoDB table export feature. It’s a scalable and cost-efficient way to read large DynamoDB table data in AWS Glue ETL jobs. This post describes the benefit of this new export connector and its use cases.

The following are typical use cases to read from DynamoDB tables using AWS Glue ETL jobs:

  • Move the data from DynamoDB tables to different data stores
  • Integrate the data with other services and applications
  • Retain historical snapshots for auditing
  • Build an S3 data lake from the DynamoDB data and analyze the data from various services, such as Amazon Athena, Amazon Redshift, and Amazon SageMaker

The new AWS Glue DynamoDB export connector

The old version of the AWS Glue DynamoDB connector reads DynamoDB tables through the DynamoDB Scan API. Instead, the new AWS Glue DynamoDB export connector reads DynamoDB data from the snapshot, which is exported from DynamoDB tables. This approach has following benefits:

  • It doesn’t consume read capacity units of the source DynamoDB tables
  • The read performance is consistent for large DynamoDB tables

Especially for large DynamoDB tables more than 100 GB, this new connector is significantly faster than the traditional connector.

To use this new export connector, you need to enable point-in-time recovery (PITR) for the source DynamoDB table in advance.

How to use the new connector on AWS Glue Studio Visual Editor

AWS Glue Studio Visual Editor is a graphical interface that makes it easy to create, run, and monitor AWS Glue ETL jobs in AWS Glue. The new DynamoDB export connector is available on AWS Glue Studio Visual Editor. You can choose Amazon DynamoDB as the source.

After you choose Create, you see the visual Directed Acyclic Graph (DAG). Here, you can choose your DynamoDB table that exists in this account or Region. This allows you to select DynamoDB tables (with PITR enabled) directly as a source in AWS Glue Studio. This provides a one-click export from any of your DynamoDB tables to Amazon S3. You can also easily add any data sources and targets or transformations to the DAG. For example, it allows you to join two different DynamoDB tables and export the result to Amazon S3, as shown in the following screenshot.

The following two connection options are automatically added. This location is used to store temporary data during the DynamoDB export phase. You can set S3 bucket lifecycle policies to expire temporary data.

  • dynamodb.s3.bucket – The S3 bucket to store temporary data during DynamoDB export
  • dynamodb.s3.prefix – The S3 prefix to store temporary data during DynamoDB export

How to use the new connector on the job script code

You can use the new export connector when you create an AWS Glue DynamicFrame in the job script code by configuring the following connection options:

  • dynamodb.export – (Required) You need to set this to ddb or s3
  • dynamodb.tableArn – (Required) Your source DynamoDB table ARN
  • dynamodb.unnestDDBJson – (Optional) If set to true, performs an unnest transformation of the DynamoDB JSON structure that is present in exports. The default value is false.
  • dynamodb.s3.bucket – (Optional) The S3 bucket to store temporary data during DynamoDB export
  • dynamodb.s3.prefix – (Optional) The S3 prefix to store temporary data during DynamoDB export

The following is the sample Python code to create a DynamicFrame using the new export connector:

dyf = glue_context.create_dynamic_frame.from_options(
        "dynamodb.export": "ddb",
        "dynamodb.tableArn": "test_source",
        "dynamodb.unnestDDBJson": True,
        "dynamodb.s3.bucket": "bucket name",
        "dynamodb.s3.prefix": "bucket prefix"

The new export connector doesn’t require configurations related to AWS Glue job parallelism, unlike the old connector. Now you no longer need to change the configuration when you scale out the AWS Glue job. It also doesn’t require any configuration regarding DynamoDB table read/write capacity and its capacity mode (on demand or provisioned).

DynamoDB table schema handling

By default, the new export connector reads data in DynamoDB JSON structure that is present in exports. The following is an example schema of the frame using the Amazon Customer Review Dataset:

|-- Item: struct (nullable = true)
| |-- product_id: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- review_id: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- total_votes: struct (nullable = true)
| | |-- N: string (nullable = true)
| |-- product_title: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- star_rating: struct (nullable = true)
| | |-- N: string (nullable = true)
| |-- customer_id: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- marketplace: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- helpful_votes: struct (nullable = true)
| | |-- N: string (nullable = true)
| |-- review_headline: struct (nullable = true)
| | |-- S: string (nullable = true)
| | |-- NULL: boolean (nullable = true)
| |-- review_date: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- vine: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- review_body: struct (nullable = true)
| | |-- S: string (nullable = true)
| | |-- NULL: boolean (nullable = true)
| |-- verified_purchase: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- product_category: struct (nullable = true)
| | |-- S: string (nullable = true)
| |-- year: struct (nullable = true)
| | |-- N: string (nullable = true)
| |-- product_parent: struct (nullable = true)
| | |-- S: string (nullable = true)

To read DynamoDB item columns without handling nested data, you can set dynamodb.unnestDDBJson to True. The following is an example of the schema of the same data where dynamodb.unnestDDBJson is set to True:

|-- product_id: string (nullable = true)
|-- review_id: string (nullable = true)
|-- total_votes: string (nullable = true)
|-- product_title: string (nullable = true)
|-- star_rating: string (nullable = true)
|-- customer_id: string (nullable = true)
|-- marketplace: string (nullable = true)
|-- helpful_votes: string (nullable = true)
|-- review_headline: string (nullable = true)
|-- review_date: string (nullable = true)
|-- vine: string (nullable = true)
|-- review_body: string (nullable = true)
|-- verified_purchase: string (nullable = true)
|-- product_category: string (nullable = true)
|-- year: string (nullable = true)
|-- product_parent: string (nullable = true)

Data freshness

Data freshness is the measure of staleness of the data from the live tables in the original source. In the new export connecor, the option dynamodb.export impacts data freshness.

When dynamodb.export is set to ddb, the AWS Glue job invokes a new export and then reads the export placed in an S3 bucket into DynamicFrame. It reads exports of the live table, so data can be fresh. On the other hand, when dynamodb.export is set to s3, the AWS Glue job skips invoking a new export and directly reads an export already placed in an S3 bucket. It reads exports of the past table, so data can be stale, but you can reduce overhead to trigger the exports.

The following table explains the data freshness and pros and cons of each option.

.. dynamodb.export Config Data Freshness Data Source Pros Cons
New export connector s3 Stale Export of the past table
  • RCU is not consumed
  • Can skip triggering exports
  • Data can be stale
New export connector ddb Fresh Export of the live table
  • Data can be fresh
  • RCU is not consumed
  • Overhead to trigger exports and wait for completion
Old connector N/A Most fresh Scan of the live tables
  • Data can be fresh
  • Read capacity unit (RCU) is consumed


The following benchmark shows the performance improvements between the old version of the AWS Glue DynamoDB connector and the new export connector. The comparison uses the DynamoDB tables storing the TPC-DS benchmark dataset with different scales from 10 MB to 2 TB. The sample Spark job reads from the DynamoDB table and calculates the count of the items. All the Spark jobs are run on AWS Glue 3.0, G.2X, 60 workers.

The following chart compares AWS Glue job duration between the old connector and the new export connector. For small DynamoDB tables, the old connector is faster. For large tables more than 80 GB, the new export connector is faster. In other words, the DynamoDB export connector is recommended for jobs that take the old connector more than 5–10 minutes to run. Also, the chart shows that the duration of the new export connector increases slowly as data size increases, although the duration of the old connector increases rapidly as data size increases. This means that the new export connector is suitable especially for larger tables.

The following chart compares dollar cost between the old connector and the new export connector. It contains the AWS Glue DPU hour cost summed with the cost for reading data from DynamoDB. For the old connector, we include the read request cost. For the new export connector, we include the cost in the DynamoDB data export to Amazon S3. Both are calculated in DynamoDB on-demand capacity mode.

With AWS Glue Auto Scaling

AWS Glue Auto Scaling is a new feature to automatically resize computing resources for better performance at lower cost. You can take advantage of AWS Glue Auto Scaling with the new DynamoDB export connector.

As the following chart shows, with AWS Glue Auto Scaling, the duration of the new export connector is shorter than the old connector when the size of the source DynamoDB table is 100 GB or more. It shows a similar trend without AWS Glue Auto Scaling.

You get the cost benefits as only Spark driver is active for most of the time duration during the DynamoDB export (which is nearly 30% of the total job duration time with the old scan-based connector).


AWS Glue is a key service to integrate with multiple data stores. At AWS, we keep improving the performance and cost-efficiency of our services. In this post, we announced the availability of the new AWS Glue DynamoDB export connector. With this new connector, you can easily integrate your large data on DynamoDB tables with different data stores. It helps you read the large tables faster from AWS Glue jobs at lower cost.

The new AWS Glue DynamoDB export connector is now generally available in all supported Glue Regions. Let’s start using the new AWS Glue DynamoDB export connector today! We are looking forward to your feedback and stories on how you utilize the connector for your needs.

About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts that help customers build data lakes on the cloud.

Neil Gupta is a Software Development Engineer on the AWS Glue team. He enjoys tackling big data problems and learning more about distributed systems.

Andrew Kim is a Software Development Engineer on the AWS Glue team. His passion is to build scalable and effective solutions to challenging problems and working with distributed systems.

Savio Dsouza is a Software Development Manager on the AWS Glue team. His team works on distributed systems for efficiently managing data lakes on AWS and optimizing Apache Spark for performance and reliability.

Optimize Federated Query Performance using EXPLAIN and EXPLAIN ANALYZE in Amazon Athena

Post Syndicated from Nishchai JM original https://aws.amazon.com/blogs/big-data/optimize-federated-query-performance-using-explain-and-explain-analyze-in-amazon-athena/

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In 2019, Athena added support for federated queries to run SQL queries across data stored in relational, non-relational, object, and custom data sources.

In 2021, Athena added support for the EXPLAIN statement, which can help you understand and improve the efficiency of your queries. The EXPLAIN statement provides a detailed breakdown of a query’s run plan. You can analyze the plan to identify and reduce query complexity and improve its runtime. You can also use EXPLAIN to validate SQL syntax prior to running the query. Doing so helps prevent errors that would have occurred while running the query.

Athena also added EXPLAIN ANALYZE, which displays the computational cost of your queries alongside their run plans. Administrators can benefit from using EXPLAIN ANALYZE because it provides a scanned data count, which helps you reduce financial impact due to user queries and apply optimizations for better cost control.

In this post, we demonstrate how to use and interpret EXPLAIN and EXPLAIN ANALYZE statements to improve Athena query performance when querying multiple data sources.

Solution overview

To demonstrate using EXPLAIN and EXPLAIN ANALYZE statements, we use the following services and resources:

Athena uses the AWS Glue Data Catalog to store and retrieve table metadata for the Amazon S3 data in your AWS account. The table metadata lets the Athena query engine know how to find, read, and process the data that you want to query. We use Athena data source connectors to connect to data sources external to Amazon S3.


To deploy the CloudFormation template, you must have the following:

Provision resources with AWS CloudFormation

To deploy the CloudFormation template, complete the following steps:

  1. Choose Launch Stack:

  1. Follow the prompts on the AWS CloudFormation console to create the stack.
  2. Note the key-value pairs on the stack’s Outputs tab.

You use these values when configuring the Athena data source connectors.

The CloudFormation template creates the following resources:

  • S3 buckets to store data and act as temporary spill buckets for Lambda
  • AWS Glue Data Catalog tables for the data in the S3 buckets
  • A DynamoDB table and Amazon RDS for MySQL tables, which are used to join multiple tables from different sources
  • A VPC, subnets, and endpoints, which are needed for Amazon RDS for MySQL and DynamoDB

The following figure shows the high-level data model for the data load.

Create the DynamoDB data source connector

To create the DynamoDB connector for Athena, complete the following steps:

  1. On the Athena console, choose Data sources in the navigation pane.
  2. Choose Create data source.
  3. For Data sources, select Amazon DynamoDB.
  4. Choose Next.

  1. For Data source name, enter DDB.

  1. For Lambda function, choose Create Lambda function.

This opens a new tab in your browser.

  1. For Application name, enter AthenaDynamoDBConnector.
  2. For SpillBucket, enter the value from the CloudFormation stack for AthenaSpillBucket.
  3. For AthenaCatalogName, enter dynamodb-lambda-func.
  4. Leave the remaining values at their defaults.
  5. Select I acknowledge that this app creates custom IAM roles and resource policies.
  6. Choose Deploy.

You’re returned to the Connect data sources section on the Athena console.

  1. Choose the refresh icon next to Lambda function.
  2. Choose the Lambda function you just created (dynamodb-lambda-func).

  1. Choose Next.
  2. Review the settings and choose Create data source.
  3. If you haven’t already set up the Athena query results location, choose View settings on the Athena query editor page.

  1. Choose Manage.
  2. For Location of query result, browse to the S3 bucket specified for the Athena spill bucket in the CloudFormation template.
  3. Add Athena-query to the S3 path.
  4. Choose Save.

  1. In the Athena query editor, for Data source, choose DDB.
  2. For Database, choose default.

You can now explore the schema for the sportseventinfo table; the data is the same in DynamoDB.

  1. Choose the options icon for the sportseventinfo table and choose Preview Table.

Create the Amazon RDS for MySQL data source connector

Now let’s create the connector for Amazon RDS for MySQL.

  1. On the Athena console, choose Data sources in the navigation pane.
  2. Choose Create data source.
  3. For Data sources, select MySQL.
  4. Choose Next.

  1. For Data source name, enter MySQL.

  1. For Lambda function, choose Create Lambda function.

  1. For Application name, enter AthenaMySQLConnector.
  2. For SecretNamePrefix, enter AthenaMySQLFederation.
  3. For SpillBucket, enter the value from the CloudFormation stack for AthenaSpillBucket.
  4. For DefaultConnectionString, enter the value from the CloudFormation stack for MySQLConnection.
  5. For LambdaFunctionName, enter mysql-lambda-func.
  6. For SecurityGroupIds, enter the value from the CloudFormation stack for RDSSecurityGroup.
  7. For SubnetIds, enter the value from the CloudFormation stack for RDSSubnets.
  8. Select I acknowledge that this app creates custom IAM roles and resource policies.
  9. Choose Deploy.

  1. On the Lambda console, open the function you created (mysql-lambda-func).
  2. On the Configuration tab, under Environment variables, choose Edit.

  1. Choose Add environment variable.
  2. Enter a new key-value pair:
    • For Key, enter MYSQL_connection_string.
    • For Value, enter the value from the CloudFormation stack for MySQLConnection.
  3. Choose Save.

  1. Return to the Connect data sources section on the Athena console.
  2. Choose the refresh icon next to Lambda function.
  3. Choose the Lambda function you created (mysql-lamdba-function).

  1. Choose Next.
  2. Review the settings and choose Create data source.
  3. In the Athena query editor, for Data Source, choose MYSQL.
  4. For Database, choose sportsdata.

  1. Choose the options icon by the tables and choose Preview Table to examine the data and schema.

In the following sections, we demonstrate different ways to optimize our queries.

Optimal join order using EXPLAIN plan

A join is a basic SQL operation to query data on multiple tables using relations on matching columns. Join operations affect how much data is read from a table, how much data is transferred to the intermediate stages through networks, and how much memory is needed to build up a hash table to facilitate a join.

If you have multiple join operations and these join tables aren’t in the correct order, you may experience performance issues. To demonstrate this, we use the following tables from difference sources and join them in a certain order. Then we observe the query runtime and improve performance by using the EXPLAIN feature from Athena, which provides some suggestions for optimizing the query.

The CloudFormation template you ran earlier loaded data into the following services:

AWS Storage Table Name Number of Rows
Amazon DynamoDB sportseventinfo 657
Amazon S3 person 7,025,585
Amazon S3 ticketinfo 2,488

Let’s construct a query to find all those who participated in the event by type of tickets. The query runtime with the following join took approximately 7 mins to complete:

SELECT t.id AS ticket_id, 
"DDB"."default"."sportseventinfo" e, 
"AwsDataCatalog"."athenablog"."person" p, 
"AwsDataCatalog"."athenablog"."ticketinfo" t 
t.sporting_event_id = cast(e.eventid as double) 
AND t.ticketholder_id = p.id

Now let’s use EXPLAIN on the query to see its run plan. We use the same query as before, but add explain (TYPE DISTRIBUTED):

SELECT t.id AS ticket_id, 
"DDB"."default"."sportseventinfo" e, 
"AwsDataCatalog"."athenablog"."person" p, 
"AwsDataCatalog"."athenablog"."ticketinfo" t 
t.sporting_event_id = cast(e.eventid as double) 
AND t.ticketholder_id = p.id

The following screenshot shows our output

Notice the cross-join in Fragment 1. The joins are converted to a Cartesian product for each table, where every record in a table is compared to every record in another table. Therefore, this query takes a significant amount of time to complete.

To optimize our query, we can rewrite it by reordering the joining tables as sportseventinfo first, ticketinfo second, and person last. The reason for this is because the WHERE clause, which is being converted to a JOIN ON clause during the query plan stage, doesn’t have the join relationship between the person table and sportseventinfo table. Therefore, the query plan generator converted the join type to cross-joins (a Cartesian product), which less efficient. Reordering the tables aligns the WHERE clause to the INNER JOIN type, which satisfies the JOIN ON clause and runtime is reduced from 7 minutes to 10 seconds.

The code for our optimized query is as follows:

SELECT t.id AS ticket_id, 
"DDB"."default"."sportseventinfo" e, 
"AwsDataCatalog"."athenablog"."ticketinfo" t, 
"AwsDataCatalog"."athenablog"."person" p 
t.sporting_event_id = cast(e.eventid as double) 
AND t.ticketholder_id = p.id

The following is the EXPLAIN output of our query after reordering the join clause:

SELECT t.id AS ticket_id, 
"DDB"."default"."sportseventinfo" e, 
"AwsDataCatalog"."athenablog"."ticketinfo" t, 
"AwsDataCatalog"."athenablog"."person" p 
WHERE t.sporting_event_id = cast(e.eventid as double) 
AND t.ticketholder_id = p.id

The following screenshot shows our output.

The cross-join changed to INNER JOIN with join on columns (eventid, id, ticketholder_id), which results in the query running faster. Joins between the ticketinfo and person tables converted to the PARTITION distribution type, where both left and right tables are hash-partitioned across all worker nodes due to the size of the person table. The join between the sportseventinfo table and ticketinfo are converted to the REPLICATED distribution type, where one table is hash-partitioned across all worker nodes and the other table is replicated to all worker nodes to perform the join operation.

For more information about how to analyze these results, refer to Understanding Athena EXPLAIN statement results.

As a best practice, we recommend having a JOIN statement along with an ON clause, as shown in the following code:

SELECT t.id AS ticket_id, 
"AwsDataCatalog"."athenablog"."person" p 
JOIN "AwsDataCatalog"."athenablog"."ticketinfo" t ON t.ticketholder_id = p.id 
JOIN "ddb"."default"."sportseventinfo" e ON t.sporting_event_id = cast(e.eventid as double)

Also as a best practice when you join two tables, specify the larger table on the left side of join and the smaller table on the right side of the join. Athena distributes the table on the right to worker nodes, and then streams the table on the left to do the join. If the table on the right is smaller, then less memory is used and the query runs faster.

In the following sections, we present examples of how to optimize pushdowns for filter predicates and projection filter operations for the Athena data source using EXPLAIN ANALYZE.

Pushdown optimization for the Athena connector for Amazon RDS for MySQL

A pushdown is an optimization to improve the performance of a SQL query by moving its processing as close to the data as possible. Pushdowns can drastically reduce SQL statement processing time by filtering data before transferring it over the network and filtering data before loading it into memory. The Athena connector for Amazon RDS for MySQL supports pushdowns for filter predicates and projection pushdowns.

The following table summarizes the services and tables we use to demonstrate a pushdown using Aurora MySQL.

Table Name Number of Rows Size in KB
player_partitioned 5,157 318.86
sport_team_partitioned 62 5.32

We use the following query as an example of a filtering predicate and projection filter:

SELECT full_name,
FROM "sportsdata"."player_partitioned" a 
JOIN "sportsdata"."sport_team_partitioned" b ON a.sport_team_id=b.id 
WHERE a.id='1.0'

This query selects the players and their team based on their ID. It serves as an example of both filter operations in the WHERE clause and projection because it selects only two columns.

We use EXPLAIN ANALYZE to get the cost for the running this query:

SELECT full_name,
FROM "MYSQL"."sportsdata"."player_partitioned" a 
JOIN "MYSQL"."sportsdata"."sport_team_partitioned" b ON a.sport_team_id=b.id 
WHERE a.id='1.0'

The following screenshot shows the output in Fragment 2 for the table player_partitioned, in which we observe that the connector has a successful pushdown filter on the source side, so it tries to scan only one record out of the 5,157 records in the table. The output also shows that the query scan has only two columns (full_name as the projection column and sport_team_id and the join column), and uses SELECT and JOIN, which indicates the projection pushdown is successful. This helps reduce the data scan when using Athena data source connectors.

Now let’s look at the conditions in which a filter predicate pushdown doesn’t work with Athena connectors.

LIKE statement in filter predicates

We start with the following example query to demonstrate using the LIKE statement in filter predicates:

FROM "MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name LIKE '%Aar%'


FROM "MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name LIKE '%Aar%'

The EXPLAIN ANALYZE output shows that the query performs the table scan (scanning the table player_partitioned, which contains 5,157 records) for all the records even though the WHERE clause only has 30 records matching the condition %Aar%. Therefore, the data scan shows the complete table size even with the WHERE clause.

We can optimize the same query by selecting only the required columns:

SELECT sport_team_id,
FROM "MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name LIKE '%Aar%'

From the EXPLAIN ANALYZE output, we can observe that the connector supports the projection filter pushdown, because we select only two columns. This brought the data scan size down to half of the table size.

OR statement in filter predicates

We start with the following query to demonstrate using the OR statement in filter predicates:

FROM "MYSQL"."sportsdata"."player_partitioned" 
WHERE first_name = 'Aaron' OR id ='1.0'

We use EXPLAIN ANALYZE with the preceding query as follows:

WHERE first_name = 'Aaron' OR id ='1.0'

Similar to the LIKE statement, the following output shows that query scanned the table instead of pushing down to only the records that matched the WHERE clause. This query outputs only 16 records, but the data scan indicates a complete scan.

Pushdown optimization for the Athena connector for DynamoDB

For our example using the DynamoDB connector, we use the following data:

Table Number of Rows Size in KB
sportseventinfo 657 85.75

Let’s test the filter predicate and project filter operation for our DynamoDB table using the following query. This query tries to get all the events and sports for a given location. We use EXPLAIN ANALYZE for the query as follows:

FROM "DDB"."default"."sportseventinfo" 
WHERE Location = 'Chase Field'

The output of EXPLAIN ANALYZE shows that the filter predicate retrieved only 21 records, and the project filter selected only two columns to push down to the source. Therefore, the data scan for this query is less than the table size.

Now let’s see where filter predicate pushdown doesn’t work. In the WHERE clause, if you apply the TRIM() function to the Location column and then filter, predicate pushdown optimization doesn’t apply, but we still see the projection filter optimization, which does apply. See the following code:

FROM "DDB"."default"."sportseventinfo" 
WHERE trim(Location) = 'Chase Field'

The output of EXPLAIN ANALYZE for this query shows that the query scans all the rows but is still limited to only two columns, which shows that the filter predicate doesn’t work when the TRIM function is applied.

We’ve seen from the preceding examples that the Athena data source connector for Amazon RDS for MySQL and DynamoDB do support filter predicates and projection predicates for pushdown optimization, but we also saw that operations such as LIKE, OR, and TRIM when used in the filter predicate don’t support pushdowns to the source. Therefore, if you encounter unexplained charges in your federated Athena query, we recommend using EXPLAIN ANALYZE with the query and determine whether your Athena connector supports the pushdown operation or not.

Please note that running EXPLAIN ANALYZE incurs cost because it scans the data.


In this post, we showcased how to use EXPLAIN and EXPLAIN ANALYZE to analyze Athena SQL queries for data sources on AWS S3 and Athena federated SQL query for data source like DynamoDB and Amazon RDS for MySQL. You can use this as an example to optimize queries which would also result in cost savings.

About the Authors

Nishchai JM is an Analytics Specialist Solutions Architect at Amazon Web services. He specializes in building Big-data applications and help customer to modernize their applications on Cloud. He thinks Data is new oil and spends most of his time in deriving insights out of the Data.

Varad Ram is Senior Solutions Architect in Amazon Web Services. He likes to help customers adopt to cloud technologies and is particularly interested in artificial intelligence. He believes deep learning will power future technology growth. In his spare time, he like to be outdoor with his daughter and son.

Use direct service integrations to optimize your architecture

Post Syndicated from Jerome Van Der Linden original https://aws.amazon.com/blogs/architecture/use-direct-service-integrations-to-optimize-your-architecture/

When designing an application, you must integrate and combine several AWS services in the most optimized way for an effective and efficient architecture:

  • Optimize for performance by reducing the latency between services
  • Optimize for costs operability and sustainability, by avoiding unnecessary components and reducing workload footprint
  • Optimize for resiliency by removing potential point of failures
  • Optimize for security by minimizing the attack surface

As stated in the Serverless Application Lens of the Well-Architected Framework, “If your AWS Lambda function is not performing custom logic while integrating with other AWS services, chances are that it may be unnecessary.” In addition, Amazon API Gateway, AWS AppSync, AWS Step Functions, Amazon EventBridge, and Lambda Destinations can directly integrate with a number of services. These optimizations can offer you more value and less operational overhead.

This blog post will show how to optimize an architecture with direct integration.

Workflow example and initial architecture

Figure 1 shows a typical workflow for the creation of an online bank account. The customer fills out a registration form with personal information and adds a picture of their ID card. The application then validates ID and address, and scans if there is already an existing user by that name. If everything checks out, a backend application will be notified to create the account. Finally, the user is notified of successful completion.

Figure 1. Bank account application workflow

Figure 1. Bank account application workflow

The workflow architecture is shown in Figure 2 (click on the picture to get full resolution).

Figure 2. Initial account creation architecture

Figure 2. Initial account creation architecture

This architecture contains 13 Lambda functions. If you look at the code on GitHub, you can see that:

Five of these Lambda functions are basic and perform simple operations:

Additional Lambda functions perform other tasks, such as verification and validation:

  • One function generates a presigned URL to upload ID card pictures to Amazon Simple Storage Service (Amazon S3)
  • One function uses the Amazon Textract API to extract information from the ID card
  • One function verifies the identity of the user against the information extracted from the ID card
  • One function performs simple HTTP request to a third-party API to validate the address

Finally, four functions concern the websocket (connect, message, and disconnect) and notifications to the user.

Opportunities for improvement

If you further analyze the code of the five basic functions (see startWorkflow on GitHub, for example), you will notice that there are actually three lines of fundamental code that start the workflow. The others 38 lines involve imports, input validation, error handling, logging, and tracing. Remember that all this code must be tested and maintained.

import os
import json
import boto3
from aws_lambda_powertools import Tracer
from aws_lambda_powertools import Logger
import re

logger = Logger()
tracer = Tracer()

sfn = boto3.client('stepfunctions')

PATTERN = re.compile(r"^arn:(aws[a-zA-Z-]*)?:states:[a-z]{2}((-gov)|(-iso(b?)))?-[a-z]+-\d{1}:\d{12}:stateMachine:[a-zA-Z0-9-_]+$")

if ('STATE_MACHINE_ARN' not in os.environ
    or os.environ['STATE_MACHINE_ARN'] is None
    or not PATTERN.match(os.environ['STATE_MACHINE_ARN'])):
    raise RuntimeError('STATE_MACHINE_ARN env var is not set or incorrect')


def handler(event, context):
        event['requestId'] = context.aws_request_id


        return {
            'requestId': event['requestId']
    except Exception as error:
        raise RuntimeError('Internal Error - cannot start the creation workflow') from error

After running this workflow several times and reviewing the AWS X-Ray traces (Figure 3), we can see that it takes about 2–3 seconds when functions are warmed:

Figure 3. X-Ray traces when Lambda functions are warmed

Figure 3. X-Ray traces when Lambda functions are warmed

But the process takes around 10 seconds with cold starts, as shown in Figure 4:

Figure 4. X-Ray traces when Lambda functions are cold

Figure 4. X-Ray traces when Lambda functions are cold

We use an asynchronous architecture to avoid waiting time for the user, as this can be a long process. We also use WebSockets to notify the user when it’s finished. This adds some complexity, new components, and additional costs to the architecture. Now let’s look at how we can optimize this architecture.

Improving the initial architecture

Direct integration with Step Functions

Step Functions can directly integrate with some AWS services, including DynamoDB, Amazon SQS, and EventBridge, and more than 10,000 APIs from 200+ AWS services. With these integrations, you can replace Lambda functions when they do not provide value. We recommend using Lambda functions to transform data, not to transport data from one service to another.

In our bank account creation use case, there are four Lambda functions we can replace with direct service integrations (see large arrows in Figure 5):

  • Query a DynamoDB table to search for a user
  • Send a message to an SQS queue when the extraction fails
  • Create the user in DynamoDB
  • Send an event on EventBridge to notify the backend
Figure 5. Lambda functions that can be replaced

Figure 5. Lambda functions that can be replaced

It is not as clear that we need to replace the other Lambda functions. Here are some considerations:

  • To extract information from the ID card, we use Amazon Textract. It is available through the SDK integration in Step Functions. However, the API’s response provides too much information. We recommend using a library such as amazon-textract-response-parser to parse the result. For this, you’ll need a Lambda function.
  • The identity cross-check performs a simple comparison between the data provided in the web form and the one extracted in the ID card. We can perform this comparison in Step Functions using a Choice state and several conditions. If the business logic becomes more complex, consider using a Lambda function.
  • To validate the address, we query a third-party API. Step Functions cannot directly call a third-party HTTP endpoint, but because it’s integrated with API Gateway, we can create a proxy for this endpoint.

If you only need to retrieve data from an API or make a simple API call, use the direct integration. If you need to implement some logic, use a Lambda function.

Direct integration with API Gateway

API Gateway also provides service integrations. In particular, we can start the workflow without using a Lambda function. In the console, select the integration type “AWS Service”, the AWS service “Step Functions”, the action “StartExecution”, and “POST” method, as shown in Figure 6.

Figure 6. API Gateway direct integration with Step Functions

Figure 6. API Gateway direct integration with Step Functions

After that, use a mapping template in the integration request to define the parameters as shown here:

  "stateMachineArn":"arn:aws:states:eu-central-1:123456789012:stateMachine: accountCreationWorkflow",

We can go further and remove the websockets and associated Lambda functions connect, message, and disconnect. By using Synchronous Express Workflows and the StartSyncExecution API, we can start the workflow and wait for the result in a synchronous fashion. API Gateway will then directly return the result of the workflow to the client.

Final optimized architecture

After applying these optimizations, we have the updated architecture shown in Figure 7. It uses only two Lambda functions out of the initial 13. The rest have been replaced by direct service integrations or implemented in Step Functions.

Figure 7. Final optimized architecture

Figure 7. Final optimized architecture

We were able to remove 11 Lambda functions and their associated fees. In this architecture, the cost is mainly driven by Step Functions, and the main price difference will be your use of Express Workflows instead of Standard Workflows. If you need to keep some Lambda functions, use AWS Lambda Power Tuning to configure your function correctly and benefit from the best price/performance ratio.

One of the main benefits of this architecture is performance. With the final workflow architecture, it now takes about 1.5 seconds when the Lambda function is warmed and 3 seconds on cold starts (versus up to 10 seconds previously), see Figure 8:

Figure 8. X-Ray traces for the final architecture

Figure 8. X-Ray traces for the final architecture

The process can now be synchronous. It reduces the complexity of the architecture and vastly improves the user experience.

An added benefit is that by reducing the overall complexity and removing the unnecessary Lambda functions, we have also reduced the risk of failures. These can be errors in the code, memory or timeout issues due to bad configuration, lack of permissions, network issues between components, and more. This increases the resiliency of the application and eases its maintenance.


Testability is an important consideration when building your workflow. Unit testing a Lambda function is straightforward, and you can use your preferred testing framework and validate methods. Adopting a hexagonal architecture also helps remove dependencies to the cloud.

When removing functions and using an approach with direct service integrations, you are by definition directly connected to the cloud. You still must verify that the overall process is working as expected, and validate these integrations.

You can achieve this kind of tests locally using Step Functions Local, and the recently announced Mocked Service Integrations. By mocking service integrations, for example, retrieving an item in DynamoDB, you can validate the different paths of your state machine.

You also have to perform integration tests, but this is true whether you use direct integrations or Lambda functions.


This post describes how to simplify your architecture and optimize for performance, resiliency, and cost by using direct integrations in Step Functions and API Gateway. Although many Lambda functions were reduced, some remain useful for handling more complex business logic and data transformation. Try this out now by visiting the GitHub repository.

For further reading:

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 2

Post Syndicated from Nick Choi original https://aws.amazon.com/blogs/architecture/throttling-a-tiered-multi-tenant-rest-api-at-scale-using-api-gateway-part-2/

In Part 1 of this blog series, we demonstrated why tiering and throttling become necessary at scale for multi-tenant REST APIs, and explored tiering strategy and throttling with Amazon API Gateway.

In this post, Part 2, we will examine tenant isolation strategies at scale with API Gateway and extend the sample code from Part 1.

Enhancing the sample code

To enable this functionality in the sample code (Figure 1), we will make manual changes. First, create one API key for the Free Tier and five API keys for the Basic Tier. Currently, these API keys are private keys for your Amazon Cognito login, but we will make a further change in the backend business logic that will promote them to pooled resources. Note that all of these modifications are specific to this sample code’s implementation; the implementation and deployment of a production code may be completely different (Figure 1).

Cloud architecture of the sample code

Figure 1. Cloud architecture of the sample code

Next, in the business logic for thecreateKey(), find the AWS Lambda function in lambda/create_key.js.  It appears like this:

function createKey(tableName, key, plansTable, jwt, rand, callback) {
  const pool = getPoolForPlanId( key.planId ) 
  if (!pool) {
    createSiloedKey(tableName, key, plansTable, jwt, rand, callback);
  } else {
    createPooledKey(pool, tableName, key, jwt, callback);

The getPoolForPlanId() function does a search for a pool of keys associated with the usage plan. If there is a pool, we “create” a kind of reference to the pooled resource, rather than a completely new key that is created by the API Gateway service directly. The lambda/api_key_pools.js should be empty.

exports.apiKeyPools = [];

In effect, all usage plans were considered as siloed keys up to now. To change that, populate the data structure with values from the six API keys that were created manually. You will have to look up the IDs of the API keys and usage plans that were created in API Gateway (Figures 2 and 3). Using the AWS console to navigate to API Gateway is the most intuitive.

A view of the AWS console when inspecting the ID for the Basic usage plan

Figure 2. A view of the AWS console when inspecting the ID for the Basic usage plan

A view of the AWS Console when looking up the API key value (not the ID)

Figure 3. A view of the AWS Console when looking up the API key value (not the ID)

When done, your code in lambda/api_key_pools.js should be the following, but instead of ellipses (), the IDs for the user plans and API keys specific to your environment will appear.

exports.apiKeyPools = [{
    planName: "FreePlan"
    planId: "...",
    apiKeys: [ "..." ]
    planName: "BasicPlan"
    planId: "...",
    apiKeys: [ "...", "...", "...", "...", "..." ]

After making the code changes, run cdk deploy from the command line to update the Lambda functions. This change will only affect key creation and deletion because of the system implementation. Updates affect only the user’s specific reference to the key, not the underlying resource managed by API Gateway.

When the web application is run now, it will look similar to before—tenants should not be aware what tiering strategy they have been assigned to. The only way to notice the difference would be to create two Free Tier keys, test them, and note that the value of the X-API-KEY header is unchanged between the two.

Now, you have a virtually unlimited number of users who can have API keys in the Free or Basic Tier. By keeping the Premium Tier siloed, you are subject to the 10,000-API-key maximum (less any keys allocated for the lower tiers). You may consider additional techniques to continue to scale, such as replicating your service in another AWS account.

Other production considerations

The sample code is minimal, and it illustrates just one aspect of scaling a Software-as-a-service (SaaS) application. There are many other aspects be considered in a production setting that we explore in this section.

The throttled endpoint, GET /api rely only on API key for authorization for demonstration purpose. For any production implementation consider authentication options for your REST APIs. You may explore and extend to require authentication with Cognito similar to /admin/* endpoints in the sample code.

One API key for Free Tier access and five API keys for Basic Tier access are illustrative in a sample code but not representative of production deployments. Number of API keys with service quota into consideration, business and technical decisions may be made to minimize noisy neighbor effect such as setting blast radius upper threshold of 0.1% of all users. To satisfy that requirement, each tier would need to spread users across at least 1,000 API keys. The number of keys allocated to Basic or Premium Tier would depend on market needs and pricing strategies. Additional allocations of keys could be held in reserve for troubleshooting, QA, tenant migrations, and key retirement.

In the planning phase of your solution, you will decide how many tiers to provide, how many usage plans are needed, and what throttle limits and quotas to apply. These decisions depend on your architecture and business.

To define API request limits, examine the system API Gateway is protecting and what load it can sustain. For example, if your service will scale up to 1,000 requests per second, it is possible to implement three tiers with a 10/50/40 split: the lowest tier shares one common API key with a 100 request per second limit; an intermediate tier has a pool of 25 API keys with a limit of 20 requests per second each; and the highest tier has a maximum of 10 API keys, each supporting 40 requests per second.

Metrics play a large role in continuously evolving your SaaS-tiering strategy (Figure 4). They provide rich insights into how tenants are using the system. Tenant-aware and SaaS-wide metrics on throttling and quota limits can be used to: assess tiering in-place, if tenants’ requirements are being met, and if currently used tenant usage profiles are valid (Figure 5).

Tiering strategy example with 3 tiers and requests allocation per tier

Figure 4. Tiering strategy example with 3 tiers and requests allocation per tier

An example SaaS metrics dashboard

Figure 5. An example SaaS metrics dashboard

API Gateway provides options for different levels of granularity required, including detailed metrics, and execution and access logging to enable observability of your SaaS solution. Granular usage metrics combined with underlying resource consumption leads to managing optimal experience for your tenants with throttling levels and policies per method and per client.


To avoid incurring future charges, delete the resources. This can be done on the command line by typing:

cd ${TOP}/cdk
cdk destroy

cd ${TOP}/react
amplify delete

${TOP} is the topmost directory of the sample code. For the most up-to-date information, see the README.md file.


In this two-part blog series, we have reviewed the best practices and challenges of effectively guarding a tiered multi-tenant REST API hosted in AWS API Gateway. We also explored how throttling policy and quota management can help you continuously evaluate the needs of your tenants and evolve your tiering strategy to protect your backend systems from being overwhelmed by inbound traffic.

Further reading:

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 1

Post Syndicated from Nick Choi original https://aws.amazon.com/blogs/architecture/throttling-a-tiered-multi-tenant-rest-api-at-scale-using-api-gateway-part-1/

Many software-as-a-service (SaaS) providers adopt throttling as a common technique to protect a distributed system from spikes of inbound traffic that might compromise reliability, reduce throughput, or increase operational cost. Multi-tenant SaaS systems have an additional concern of fairness; excessive traffic from one tenant needs to be selectively throttled without impacting the experience of other tenants. This is also known as “the noisy neighbor” problem. AWS itself enforces some combination of throttling and quota limits on nearly all its own service APIs. SaaS providers building on AWS should design and implement throttling strategies in all of their APIs as well.

In this two-part blog series, we will explore tiering and throttling strategies for multi-tenant REST APIs and review tenant isolation models with hands-on sample code. In part 1, we will look at why a tiering and throttling strategy is needed and show how Amazon API Gateway can help by showing sample code. In part 2, we will dive deeper into tenant isolation models as well as considerations for production.

We selected Amazon API Gateway for this architecture since it is a fully managed service that helps developers to create, publish, maintain, monitor, and secure APIs. First, let’s focus on how Amazon API Gateway can be used to throttle REST APIs with fine granularity using Usage Plans and API Keys. Usage Plans define the thresholds beyond which throttling should occur. They also enable quotas, which sets a maximum usage per a day, week, or month. API Keys are identifiers for distinguishing traffic and determining which Usage Plans to apply for each request. We limit the scope of our discussion to REST APIs because other protocols that API Gateway supports — WebSocket APIs and HTTP APIs — have different throttling mechanisms that do not employ Usage Plans or API Keys.

SaaS providers must balance minimizing cost to serve and providing consistent quality of service for all tenants. They also need to ensure one tenant’s activity does not affect the other tenants’ experience. Throttling and quotas are a key aspect of a tiering strategy and important for protecting your service at any scale. In practice, this impact of throttling polices and quota management is continuously monitored and evaluated as the tenant composition and behavior evolve over time.

Architecture Overview

Figure 1. Cloud Architecture of the sample code.

Figure 1 – Architecture of the sample code

To get a firm foundation of the basics of throttling and quotas with API Gateway, we’ve provided sample code in AWS-Samples on GitHub. Not only does it provide a starting point to experiment with Usage Plans and API Keys in the API Gateway, but we will modify this code later to address complexity that happens at scale. The sample code has two main parts: 1) a web frontend and, 2) a serverless backend. The backend is a serverless architecture using Amazon API Gateway, AWS Lambda, Amazon DynamoDB, and Amazon Cognito. As Figure I illustrates, it implements one REST API endpoint, GET /api, that is protected with throttling and quotas. There are additional APIs under the /admin/* resource to provide Read access to Usage Plans, and CRUD operations on API Keys.

All these REST endpoints could be tested with developer tools such as curl or Postman, but we’ve also provided a web application, to help you get started. The web application illustrates how tenants might interact with the SaaS application to browse different tiers of service, purchase API Keys, and test them. The web application is implemented in React and uses AWS Amplify CLI and SDKs.


To deploy the sample code, you should have the following prerequisites:

For clarity, we’ll use the environment variable, ${TOP}, to indicate the top-most directory in the cloned source code or the top directory in the project when browsing through GitHub.

Detailed instructions on how to install the code are in ${TOP}/INSTALL.md file in the code. After installation, follow the ${TOP}/WALKTHROUGH.md for step-by-step instructions to create a test key with a very small quota limit of 10 requests per day, and use the client to hit that limit. Search for HTTP 429: Too Many Requests as the signal your client has been throttled.

Figure 2: The web application (with browser developer tools enabled) shows that a quick succession of API calls starts returning an HTTP 429 after the quota for the day is exceeded.

Figure 2: The web application (with browser developer tools enabled) shows that a quick succession of API calls starts returning an HTTP 429 after the quota for the day is exceeded.

Responsibilities of the Client to support Throttling

The Client must provide an API Key in the header of the HTTP request, labelled, “X-Api-Key:”. If a resource in API Gateway has throttling enabled and that header is missing or invalid in the request, then API Gateway will reject the request.

Important: API Keys are simple identifiers, not authorization tokens or cryptographic keys. API keys are for throttling and managing quotas for tenants only and not suitable as a security mechanism. There are many ways to properly control access to a REST API in API Gateway, and we refer you to the AWS documentation for more details as that topic is beyond the scope of this post.

Clients should always test for the response to any network call, and implement logic specific to an HTTP 429 response. The correct action is almost always “try again later.” Just how much later, and how many times before giving up, is application dependent. Common approaches include:

  • Retry – With simple retry, client retries the request up to defined maximum retry limit configured
  • Exponential backoff – Exponential backoff uses progressively larger wait time between retries for consecutive errors. As the wait time can become very long quickly, maximum delay and a maximum retry limits should be specified.
  • Jitter – Jitter uses a random amount of delay between retry to prevent large bursts by spreading the request rate.

AWS SDK is an example client-responsibility implementation. Each AWS SDK implements automatic retry logic that uses a combination of retry, exponential backoff, jitter, and maximum retry limit.

SaaS Considerations: Tenant Isolation Strategies at Scale

While the sample code is a good start, the design has an implicit assumption that API Gateway will support as many API Keys as we have number of tenants. In fact, API Gateway has a quota on available per region per account. If the sample code’s requirements are to support more than 10,000 tenants (or if tenants are allowed multiple keys), then the sample implementation is not going to scale, and we need to consider more scalable implementation strategies.

This is one instance of a general challenge with SaaS called “tenant isolation strategies.” We highly recommend reviewing this white paper ‘SasS Tenant Isolation Strategies‘. A brief explanation here is that the one-resource-per-customer (or “siloed”) model is just one of many possible strategies to address tenant isolation. While the siloed model may be the easiest to implement and offers strong isolation, it offers no economy of scale, has high management complexity, and will quickly run into limits set by the underlying AWS Services. Other models besides siloed include pooling, and bridged models. Again, we recommend the whitepaper for more details.

Figure 3. Tiered multi-tenant architectures often employ different tenant isolation strategies at different tiers. Our example is specific to API Keys, but the technique generalizes to storage, compute, and other resources.

Figure 3- Tiered multi-tenant architectures often employ different tenant isolation strategies at different tiers. Our example is specific to API Keys, but the technique generalizes to storage, compute, and other resources.

In this example, we implement a range of tenant isolation strategies at different tiers of service. This allows us to protect against “noisy-neighbors” at the highest tier, minimize outlay of limited resources (namely, API-Keys) at the lowest tier, and still provide an effective, bounded “blast radius” of noisy neighbors at the mid-tier.

A concrete development example helps illustrate how this can be implemented. Assume three tiers of service: Free, Basic, and Premium. One could create a single API Key that is a pooled resource among all tenants in the Free Tier. At the other extreme, each Premium customer would get their own unique API Key. They would protect Premium tier tenants from the ‘noisy neighbor’ effect. In the middle, the Basic tenants would be evenly distributed across a set of fixed keys. This is not complete isolation for each tenant, but the impact of any one tenant is contained within “blast radius” defined.

In production, we recommend a more nuanced approach with additional considerations for monitoring and automation to continuously evaluate tiering strategy. We will revisit these topics in greater detail after considering the sample code.


In this post, we have reviewed how to effectively guard a tiered multi-tenant REST API hosted in Amazon API Gateway. We also explored how tiering and throttling strategies can influence tenant isolation models. In Part 2 of this blog series, we will dive deeper into tenant isolation models and gaining insights with metrics.

If you’d like to know more about the topic, the AWS Well-Architected SaaS Lens Performance Efficiency pillar dives deep on tenant tiers and providing differentiated levels of performance to each tier. It also provides best practices and resources to help you design and reduce impact of noisy neighbors your SaaS solution.

To learn more about Serverless SaaS architectures in general, we recommend the AWS Serverless SaaS Workshop and the SaaS Factory Serverless SaaS reference solution that inspired it.

ICYMI: Serverless Q1 2022

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q1-2022/

Welcome to the 16th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!


In case you missed our last ICYMI, check out what happened last quarter here.

AWS Lambda

Lambda now offers larger ephemeral storage for functions, up to 10 GB. Previously, the storage was set to 512 MB. There are several common use-cases that can benefit from expanded temporary storage, including extract-transform load (ETL) jobs, machine learning inference, and data processing workloads. To see how to configure the amount of /tmp storage in AWS SAM, deploy this Serverless Land Pattern.

Ephemeral storage settings

For Node.js developers, Lambda now supports ES Modules and top-level await for Node.js 14. This enables developers to use a wider range of JavaScript packages in functions. With top-level await, when used with Provisioned Concurrency, this can improve cold-start performance when using asynchronous initialization.

For .NET developers, Lambda now supports .NET 6 as both a managed runtime and container base image. You can now use new features of the runtime such as improved logging, simplified function definitions using top-level statements, and improved performance using source generators.

The Lambda console now allows you to share test events with other developers in your team, using granular IAM permissions. Previously, test events were only visible to the builder who created them. To learn about creating sharable test events, read this documentation.

Amazon EventBridge

Amazon EventBridge Schema Registry helps you create code bindings from event schemas for use directly in your preferred IDE. You can generate these code bindings for a schema by using the EventBridge console, APIs, or AWS SDK toolkits for Jetbrains (Intellij, PyCharm, Webstorm, Rider) and VS Code. This feature now supports Go, in addition to Java, Python, and TypeScript, and is available at no additional cost.

AWS Step Functions

Developers can test state machines locally using Step Functions Local, and the service recently announced mocked service integrations for local testing. This allows you to define sample output from AWS service integrations and combine them into test cases to validate workflow control. This new feature introduces a robust way to state machines in isolation.

Amazon DynamoDB

Amazon DynamoDB now supports limiting the number of items processed in PartiQL operation, using an optional parameter on each request. The service also increased default Service Quotas, which can help simplify the use of large numbers of tables. The per-account, per-Region quota increased from 256 to 2,500 tables.

AWS AppSync

AWS AppSync added support for custom response headers, allowing you to define additional headers to send to clients in response to an API call. You can now use the new resolver utility $util.http.addResponseHeaders() to configure additional headers in the response for a GraphQL API operation.

Serverless blog posts


Jan 6 – Using Node.js ES modules and top-level await in AWS Lambda

Jan 6 – Validating addresses with AWS Lambda and the Amazon Location Service

Jan 20 – Introducing AWS Lambda batching controls for message broker services

Jan 24 – Migrating AWS Lambda functions to Arm-based AWS Graviton2 processors

Jan 31 – Using the circuit breaker pattern with AWS Step Functions and Amazon DynamoDB

Jan 31 – Mocking service integrations with AWS Step Functions Local


Feb 8 – Capturing client events using Amazon API Gateway and Amazon EventBridge

Feb 10 – Introducing AWS Virtual Waiting Room

Feb 14 – Building custom connectors using the Amazon AppFlow Custom Connector SDK

Feb 22 – Building TypeScript projects with AWS SAM CLI

Feb 24 – Introducing the .NET 6 runtime for AWS Lambda


Mar 6 – Migrating a monolithic .NET REST API to AWS Lambda

Mar 7 – Decoding protobuf messages using AWS Lambda

Mar 8 – Building a serverless image catalog with AWS Step Functions Workflow Studio

Mar 9 – Composing AWS Step Functions to abstract polling of asynchronous services

Mar 10 – Building serverless multi-Region WebSocket APIs

Mar 15 – Using organization IDs as principals in Lambda resource policies

Mar 16 – Implementing mutual TLS for Java-based AWS Lambda functions

Mar 21 – Running cross-account workflows with AWS Step Functions and Amazon API Gateway

Mar 22 – Sending events to Amazon EventBridge from AWS Organizations accounts

Mar 23 – Choosing the right solution for AWS Lambda external parameters

Mar 28 – Using larger ephemeral storage for AWS Lambda

Mar 29 – Using AWS Step Functions and Amazon DynamoDB for business rules orchestration

Mar 31 – Optimizing AWS Lambda function performance for Java

First anniversary of Serverless Land Patterns

Serverless Patterns Collection

The DA team launched the Serverless Patterns Collection in March 2021 as a repository of serverless examples that demonstrate integrating two or more AWS services. Each pattern uses an infrastructure as code (IaC) framework to automate the deployment. These can simplify the creation and configuration of the services used in your applications.

The Serverless Patterns Collection is both an educational resource to help developers understand how to join different services, and an aid for developers that are getting started with building serverless applications.

The collection has just celebrated its first anniversary. It now contains 239 patterns for CDK, AWS SAM, Serverless Framework, and Terraform, covering 30 AWS services. We have expanded example runtimes to include .NET, Java, Rust, Python, Node.js and TypeScript. We’ve served tens of thousands of developers in the first year and we’re just getting started.

Many thanks to our contributors and community. You can also contribute your own patterns.


YouTube: youtube.com/serverlessland

Serverless Office Hours – Tues 10 AM PT

Weekly live virtual office hours. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

YouTube: youtube.com/serverlessland
Twitch: twitch.tv/aws




FooBar Serverless YouTube channel

The Developer Advocate team is delighted to welcome Marcia Villalba onboard. Marcia was an AWS Serverless Hero before joining AWS over two years ago, and she has created one of the most popular serverless YouTube channels. You can view all of Marcia’s videos at https://www.youtube.com/c/FooBar_codes.




AWS Summits

AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. This year, we have restarted in-person Summits at major cities around the world.

The next 4 Summits planned are Paris (April 12), San Francisco (April 20-21), London (April 27), and Madrid (May 4-5). To find and register for your nearest AWS Summit, visit the AWS Summits homepage.

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

How Net at Work built an email threat report system on AWS

Post Syndicated from Florian Mair original https://aws.amazon.com/blogs/architecture/how-net-at-work-built-an-email-threat-report-system-on-aws/

Emails are often used as an entry point for malicious software like trojan horses, rootkits, or encryption-based ransomware. The NoSpamProxy offering developed by Net at Work tackles this threat, providing secure and confidential email communication.

A subservice of NoSpamProxy called 32guards is responsible for threat reports of inbound and outbound emails. With the increasing number of NoSpamProxy customers, 32guards was found to have several limitations. 32guards was previously built on a relational database. But with the growth in traffic, this database was not able to keep up with storage demands and expected query performance. Further, the relational database schema was limiting the possibilities of complex pattern detections, due to performance limitations. The NoSpamProxy team decided to rearchitect the service based on the Lake House approach.

The goal was to move away from a one-size-fits-all approach for data analytics and integrate a data lake with purpose-built data stores, unified governance, and smooth data movement.

This post shows how Net at Work modernized their 32guards service, from a relational database to a fully serverless analytics solution. With adoption of the Well-Architected Analytics Lens best practices and the use of fully managed services, the 32guards team was able to build a production-ready application within six weeks.

Architecture for email threat reports and analytics

This section gives a walkthrough of the solution’s architecture, as illustrated in Figure 1.

Figure 1. 32guards threat reports architecture

Figure 1. 32guards threat reports architecture

1. The entry point is an Amazon API Gateway, which receives email metadata in JSON format from the NoSpamProxy fleet. The message contains information about the email in general, email attachments, and URLs in the email. As an example, a subset of the data is presented in JSON as follows:

  "Attachments": [
      "Sha256Hash": "69FB43BD7CCFD79E162B638596402AD1144DD5D762DEC7433111FC88EDD483FE",
      "Classification": 0,
      "Filename": "test.ods.tar.gz",
      "DetectedMimeType": "application/tar+gzip",
      "Size": 5895
  "Urls": [
      "Url": "http://www.aarhhie.work/",
      "Classification": 0,
    },        {
      "Url": "http://www.netatwork.de/",
      "Classification": 0,
      "Url": "http://aws.amazon.com/",
      "Classification": 0,

2. This JSON message is forwarded to an AWS Lambda function (called “frontend”), which takes care of the further downstream processing. There are two activities the Lambda function initiates:

  • Forwarding the record for real-time analysis/storage
  • Generating a threat report based on the information derived from the data stored in the indicators of compromises (IOCs) Amazon DynamoDB table

IOCs are patterns within the email metadata that are used to determine if emails are safe or not. For example, this could be for a suspicious file attachment or domain.

Threat report for suspicious emails

In the preceding JSON message, the attachments and URLs have been classified with “0” by the email service itself, which indicates that none of them look suspicious. The frontend Lambda function uses the vast number of IOCs stored in the DynamoDB table and heuristics to determine any potential threats within the email. The use of DynamoDB enables fast lookup times to generate a threat report. For the example, the response to the API Gateway in step 2 looks like this:

  "ReportedOnUtc": "2021-10-14T14:33:34.5070945Z",
  "Reason": "realtimeSuspiciousOrganisationalDomain",
  "Identifier": "aarhhie.work",

This threat report shows that the top-level domain “aarhiie.work” has been detected as suspicious. The report is used to determine further actions for the email, such as blocking.

Real-time data processing

3. In the real-time analytics flow, the frontend Lambda function ingests email metadata into a data stream using Amazon Kinesis Data Streams. This is a massively scalable, serverless, and durable real-time data streaming service. Compared to a queue, streaming storage permits more than one consumer of the same data.

4. The first consumer is an Apache Flink application running in Amazon Kinesis Data Analytics. This application generates statistical metrics (for example, occurrences of the top-level domain “.work”). The output is stored in Apache Parquet format on Amazon S3. Parquet is a columnar storage format for row-based files like csv.

The second consumer of the streaming data is Amazon Kinesis Data Firehose. Kinesis Data Firehose is a fully managed solution to reliably load streaming data into data lakes, data stores, and analytics services. Within the 32guards service, Kinesis Data Firehose is used to store all email metadata into Amazon S3. The data is stored in Apache Parquet format, which makes queries more time and cost efficient.

IOC detection

Now that we have shown how data is ingested and threat reports are generated to respond quickly to requests, let’s look at how the IOCs are updated. These IOCs are used for generating the threat report within the “frontend” Lambda function. As attack vectors are changing over time, quickly analyzing the data for new threats, is crucial to provide high-quality reports to the NoSpamProxy service.

The incoming email metadata is stored every few minutes in Amazon S3 by Kinesis Data Firehose. To query data directly in Amazon S3, Amazon Athena is used. Athena is a serverless query service that analyzes data stored in Amazon S3, by using standard SQL syntax.

5. To be able to query data in S3, Amazon Athena uses the AWS Glue Data Catalog, which contains the structure of the email metadata stored in the data lake. The data structure is derived from the data itself using AWS Glue Crawlers. Other external downstream processing services like business intelligence applications, also use Amazon Athena to consume the data.

6. Athena queries are initiated on a predefined schedule to update or generate new IOCs. The results of these queries are stored in the DynamoDB table to enable fast lookup times for the “frontend” Lambda.


In this blog post, we showed how Net at Work modernized their 32guards service within their NoSpamProxy product. The previous architecture used a relational database to ingest and store email metadata. This database was running into performance and storage issues, and must be redesigned into a more performant and scalable architecture.

Amazon S3 is used as the storage layer, which can scale up to exabytes of data. With Amazon Athena as the query engine, there is no need to operate a high-performance database cluster, as compute and storage is separated. By using Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics, valuable insight can be generated in real time, and acted upon more quickly.

As a serverless, fully managed solution, the 32guards service has a lower-cost footprint of as much as 50% and requires less maintenance. By moving away from a relational database model, the query runtimes decrease significantly. You can now conduct analyses that have not been feasible before.

Interested in the NoSpamProxy? Read more about NoSpamProxy or sign up for a free trial.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Using AWS Step Functions and Amazon DynamoDB for business rules orchestration

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-aws-step-functions-and-amazon-dynamodb-for-business-rules-orchestration/

This post is written by Vijaykumar Pannirselvam, Cloud Consultant, Sushant Patil, Cloud Consultant, and Kishore Dhamodaran, Senior Solution Architect.

A Business Rules Engine (BRE) is used in enterprises to manage business-critical decisions. The logic or rules used to make such decisions can vary in complexity. A finance department may have a basic rule to get any purchase over a certain dollar amount to get director approval. A mortgage company may need to run complex rules based on inputs (for example, credit score, debt-to-income ratio, down payment) to make an approval decision for a loan.

Decoupling these rules from application logic provides agility to your rules management, since business rules may often change while your application may not. It can also provide standardization across your enterprise, so every department can communicate with the same taxonomy.

As part of migrating their workloads, some enterprises consider replacing their commercial rules engine with cloud native and open-source alternatives. The motivation for such a move stem from several factors, such as simplifying the architecture, cost, security considerations, or vendor support.

Many of these commercial rules engines come as part of a BPMS offering that provides orchestration capabilities for rules execution. For a successful migration to cloud using an open-source rules engine management system, you need an orchestration capability to manage incoming rule requests, auditing the rules, and tracking exceptions.

This post showcases an orchestration framework that allows you to use an open-source rules engine. It uses Drools rules engine to build a set of rules for calculating insurance premiums based on the properties of Car and Person objects. This uses AWS Step Functions, AWS Lambda, Amazon API Gateway, Amazon DynamoDB, and open-source Drools rules engine to show this. You can swap the rules engine provided you can manage it in the AWS Cloud environment and expose it as an API.

Solution overview

The following diagram shows the solution architecture.

Solution architecture

The solution comprises:

  1. API Gateway – a fully managed service that makes it easier to create, publish, maintain, monitor, and secure APIs at any scale for API consumers. API Gateway helps you manage traffic to backend systems, in this case Step Functions, which orchestrates the execution of tasks. For the REST API use-case, you can also set up a cache with customizable keys and time-to-live in seconds for your API data to avoid hitting your backend services for each request.
  2. Step Functions – a low code service to orchestrate multiple steps involved to accomplish tasks. Step Functions uses the finite-state machine (FSM) model, which uses given states and transitions to complete the tasks. The diagram depicts three states: Audit Request, Execute Ruleset and Audit Response. We execute them sequentially. You can add additional states and transitions, such as validating incoming payloads, and branching out parallel execution of the states.
  3. Drools rules engine Spring Boot application – runtime component of the rule execution. You set the Drools rule engine Spring Boot application as an Apache Maven Docker project with Drools Maven dependencies. You then deploy the Drools rule engine Docker image to an Amazon Elastic Container Registry (Amazon ECR), create an AWS Fargate cluster, and an Amazon Elastic Container Service (Amazon ECS) service. The service launches Amazon ECS tasks and maintains the desired count. An Application Load Balancer distributes the traffic evenly to all running containers.
  4. Lambda – a serverless execution environment giving you an ability to interact with the Drools Engine and a persistence layer for rule execution audit functions. The Lambda component provides the audit function required to persist the incoming requests and outgoing responses in DynamoDB. Apart from the audit function, Lambda is also used to invoke the service exposed by the Drools Spring Boot application.
  5. DynamoDB – a fully managed and highly scalable key/value store, to persist the rule execution information, such as request and response payload information. DynamoDB provides the persistence layer for the incoming request JSON payload and for the outgoing response JSON payload. The audit Lambda function invokes the DynamoDB put_item() method when it receives the request or response event from Step Functions. The DynamoDB table rule_execution_audit has an entry for every request and response associated with the incoming request-id originated by the application (upstream).

Drools rules engine implementation

The Drools rules engine separates the business rules from the business processes. You use DRL (Drools Rule Language) by defining business rules as .drl text files. You define model objects to build the rules.

The model objects are POJO (Plain Old Java Objects) defined using Eclipse, with the Drools plugin installed. You should have some level of knowledge about building rules and executing them using the Drools rules engine. The below diagram describes the functions of this component.

Drools process

You define the following rules in the .drl file as part of the GitHub repo. The purpose of these rules is to evaluate the driver premium based on the input model objects provided as input. The inputs are Car and Driver objects and output is the Policy object, which has the premium calculated based on the certain criteria defined in the rule:

rule "High Risk"
         $car : Car(style == "SPORTS", color == "RED") 
         $policy : Policy() 
         and $driver : Driver ( age < 21 )                             
         System.out.println(drools.getRule().getName() +": rule fired");          
         modify ($policy) { setPremium(increasePremiumRate($policy, 20)) };
 rule "Med Risk"
         $car : Car(style == "SPORTS", color == "RED") 
         $policy : Policy() 
         and $driver : Driver ( age > 21 )                             
         System.out.println(drools.getRule().getName() +": rule fired");          
         modify ($policy) { setPremium(increasePremiumRate($policy, 10)) };
 function double increasePremiumRate(Policy pol, double percentage) {
     return (pol.getPremium() + pol.getPremium() * percentage / 100);

Once the rules are defined, you define a RestController that takes input parameters and evaluates the above rules. The below code snippet is a POST method defined in the controller, which handles the requests and sends the response to the caller.

@PostMapping(value ="/policy/premium", consumes = {MediaType.APPLICATION_JSON_VALUE, MediaType.APPLICATION_XML_VALUE }, produces = {MediaType.APPLICATION_JSON_VALUE, MediaType.APPLICATION_XML_VALUE})
    public ResponseEntity<Policy> getPremium(@RequestBody InsuranceRequest requestObj) {
        System.out.println("handling request...");
        Car carObj = requestObj.getCar();        
        Car carObj1 = new Car(carObj.getMake(),carObj.getModel(),carObj.getYear(), carObj.getStyle(), carObj.getColor());
        Policy policyObj = requestObj.getPolicy();
        Policy policyObj1 = new Policy(policyObj.getId(), policyObj.getPremium());
        Driver driverObj = requestObj.getDriver();
        Driver driverObj1 = new Driver( driverObj.getAge(), driverObj.getName());
        KieSession kieSession = kieContainer.newKieSession();
        return ResponseEntity.ok(policyObj1);


Solution walkthrough

  1. Clone the project GitHub repository to your local machine, do a Maven build, and create a Docker image. The project contains Drools related folders needed to build the Java application.
    git clone https://github.com/aws-samples/aws-step-functions-business-rules-orchestration
    cd drools-spring-boot
    mvn clean install
    mvn docker:build
  2. Create an Amazon ECR private repository to host your Docker image.
    aws ecr create-repository —repository-name drools_private_repo —image-tag-mutability MUTABLE —image-scanning-configuration scanOnPush=false
  3. Tag the Docker image and push it to the Amazon ECR repository.
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <<INSERT ACCOUNT NUMBER>>.dkr.ecr.us-east-1.amazonaws.com
    docker tag drools-rule-app:latest <<INSERT ACCOUNT NUMBER>>.dkr.ecr.us-east-1.amazonaws.com/drools_private_repo:latest
    docker push <<INSERT ACCOUNT NUMBER>>.dkr.ecr.us-east-1.amazonaws.com/drools_private_repo:latest
  4. Deploy resources using AWS SAM:
    cd ..
    sam build
    sam deploy --guided

    SAM deployment output

Verifying the deployment

Verify the business rules execution and the orchestration components:

  1. Navigate to the API Gateway console, and choose the rules-stack API.
    API Gateway console
  2. Under Resources, choose POST, followed by TEST.
    Resource configuration
  3. Enter the following JSON under the Request Body section, and choose Test.

      "context": {
        "request_id": "REQ-99999",
        "timestamp": "2021-03-17 03:31:51:40"
      "request": {
        "driver": {
          "age": "18",
          "name": "Brian"
        "car": {
          "make": "honda",
          "model": "civic",
          "year": "2015",
          "style": "SPORTS",
          "color": "RED"
        "policy": {
          "id": "1231231",
          "premium": "300"
  4. The response received shows results from the evaluation of the business rule “High Risk“, with the premium representing the percentage calculation in the rule definition. Try changing the request input to evaluate a “Medium Risk” rule by modifying the age of the driver to 22 or higher:
    Sample response
  5. Optionally, you can verify the API using Postman. Get the endpoint information by navigating to the rule-stack API, followed by Stages in the navigation pane, then choosing either Dev or Stage.
  6. Enter the payload in the request body and choose Send:
    Postman UI
  7. The response received is results from the evaluation of business rule “High Risk“, with the premium representing the percentage calculation in the rule definition. Try changing the request input to evaluate a “Medium Risk” rule by modifying the age of the driver to 22 or higher.
    Body JSON
  8. Observe the request and response audit logs. Navigate to the DynamoDB console. Under the navigation pane, choose Tables, then choose rule_execution_audit.
    DynamoDB console
  9. Under the Tables section in the navigation pane, choose Explore Items. Observe the individual audit logs by choosing the audit_id.
    Table audit item

Cleaning up

To avoid incurring ongoing charges, clean up the infrastructure by deleting the stack using the following command:

sam delete SAM confirmations

Delete the Amazon ECR repository, and any other resources you created as a prerequisite for this exercise.


In this post, you learned how to leverage an orchestration framework using Step Functions, Lambda, DynamoDB, and API Gateway to build an API backed by an open-source Drools rules engine, running on a container. Try this solution for your cloud native business rules orchestration use-case.

For more serverless learning resources, visit Serverless Land.

Choosing the right solution for AWS Lambda external parameters

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/choosing-the-right-solution-for-aws-lambda-external-parameters/

This post is written by Thomas Moore, Solutions Architect, Serverless.

When using AWS Lambda to build serverless applications, customers often need to retrieve parameters from an external source at runtime. This allows you to share parameter values across multiple functions or microservices, providing a single source of truth for updates. A common example is retrieving database connection details from an external source and then using the retrieved hostname, user name, and password to connect to the database:

Lambda function retrieving database credentials from an external source

Lambda function retrieving database credentials from an external source

AWS provides a number of options to store parameter data, including AWS Systems Manager Parameter Store, AWS AppConfig, Amazon S3, and Lambda environment variables. This blog explores the different parameter data that you may need to store. I cover considerations for choosing the right parameter solution and how to retrieve and cache parameter data efficiently within the Lambda function execution environment.

Common use cases

Common parameter examples include:

  • Securely storing secret data, such as credentials or API keys.
  • Database connection details such as hostname, port, and credentials.
  • Schema data (for example, a structured JSON response).
  • TLS certificate for mTLS or JWT validation.
  • Email template.
  • Tenant configuration in a multitenant system.
  • Details of external AWS resources to communicate with such as an Amazon SQS queue URL, Amazon EventBridge event bus name, or AWS Step Functions ARN.

Key considerations

There are a number of key considerations when choosing the right solution for external parameter data.

  1. Cost – how much does it cost to store the data and retrieve it via an API call?
  2. Security – what encryption and fine-grained access control is required?
  3. Performance – what are the retrieval latency requirements?
  4. Data size – how much data is there to store and retrieve?
  5. Update frequency – how often does the parameter change and how does the function handle stale parameters?
  6. Access scope – do multiple functions or services access the parameter?

These considerations help to determine where to store the parameter data and how often to retrieve it.

For example, a 4KB parameter that updates hourly and is used by hundreds of functions needs to be optimized for low retrieval costs and high performance. Choosing a solution that supports low-cost API GET requests at a high transaction per second (TPS) would be better than one that supports large data.

AWS service options

There are a number of AWS services available to store external parameter data.

Amazon S3

S3 is an object storage service offering 99.999999999% (11 9s) of data durability and virtually unlimited scalability at low cost. Objects can be up to 5 TB in size in any format, making S3 a good solution to store larger parameter data.

Amazon DynamoDB

Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed for single-digit millisecond performance at any scale. Due to the high performance of this service, it’s a great place to store parameters when low retrieval latency is important.

AWS Secrets Manager

AWS Secrets Manager makes it easier to rotate, manage, and retrieve secret data. This makes it the ideal place to store sensitive parameters such as passwords and API keys.

AWS Systems Manager Parameter Store

Parameter Store provides a centralized store to manage configuration data. This data can be plaintext or encrypted using AWS Key Management Service (KMS). Parameters can be tagged and organized into hierarchies for simpler management. Parameter Store is a good default choice for general-purpose parameters in AWS. The standard version (no additional charge) can store parameters up to 4 KB in size and the advanced version (additional charges apply) up to 8 KB.

For a code example using Parameter Store for Lambda parameters, see the Serverless Land pattern.

AWS AppConfig

AppConfig is a capability of AWS Systems Manager to create, manage, and quickly deploy application configurations. AppConfig allows you to validate changes during roll-outs and automatically roll back, if there is an error. AppConfig deployment strategies help to manage configuration changes safely.

AppConfig also provides a Lambda extension to retrieve and locally cache configuration data. This results in fewer API calls and reduced function duration, reducing costs.

AWS Lambda environment variables

You can store parameter data as Lambda environment variables as part of the function’s version-specific configuration. Lambda environment variables are stored during function creation or updates. You can access these variables directly from your code without needing to contact an external source. Environment variables are ideal for parameter values that don’t need updating regularly and help make function code reusable across different environments. However, unlike the other options, values cannot be accessed centrally by multiple functions or services.

Lambda execution lifecycle

It is worth understanding the Lambda execution lifecycle, which has a number of stages. This helps to decide when to handle parameter retrieval within your Lambda code, including cache management.

Lambda execution lifecycle

Lambda execution lifecycle

When a Lambda function is invoked for the first time, or when Lambda is scaling to handle additional requests, an execution environment is created. The first phase in the execution environment’s lifecycle is initialization (Init), during which the code outside the main handler function runs. This is known as a cold start.

The execution environment can then be re-used for subsequent invocations. This means that the Init phase does not need to run again and only the main handler function code runs. This is known as a warm start.

An execution environment can only run a single invocation at a time. Concurrent invocations require additional execution environments. When a new execution environment is required, this starts a new Init phase, which runs the cold start process.

Caching and updates

Retrieving the parameter during Init

Retrieving the parameter during Init

Retrieving the parameter during Init

As Lambda execution environments are re-used, you can improve the performance and reduce the cost of retrieving an external parameter by caching the value. Writing the value to memory or the Lambda /tmp file system allows it to be available during subsequent invokes in the same execution environment.

This approach reduces API calls, as they are not made during every invocation. However, this can cause an out-of-date parameter and potentially different values across concurrent execution environments.

The following Python example shows how to retrieve a Parameter Store value outside the Lambda handler function during the Init phase.

import boto3
ssm = boto3.client('ssm', region_name='eu-west-1')
parameter = ssm.get_parameter(Name='/my/parameter')
def lambda_handler(event, context):
    # My function code...

Retrieving the parameter on every invocation

Retrieving the parameter on every invocation

Retrieving the parameter on every invocation

Another option is to retrieve the parameter during every invocation by making the API call inside the handler code. This keeps the value up to date, but can lead to higher retrieval costs and longer function durations due to the added API call during every invocation.

The following Python example shows this approach:

import boto3
ssm = boto3.client('ssm', region_name='eu-west-1')
def lambda_handler(event, context):
    parameter = ssm.get_parameter(Name='/my/parameter')
    # My function code...

Using AWS AppConfig Lambda extension

Using AWS AppConfig Lambda extension

Using AWS AppConfig Lambda extension

AppConfig allows you to retrieve and cache values from the service using a Lambda extension. The extension retrieves the values and makes them available via a local HTTP server. The Lambda function then queries the local HTTP server for the value. The AppConfig extension refreshes the values at a configurable poll interval, which defaults to 45 seconds. This improves performance and reduces costs, as the function only needs to make a local HTTP call.

The following Python code example shows how to access the cached parameters.

import urllib.request
def lambda_handler(event, context):
    url = f'http://localhost:2772/applications/application_name/environments/environment_name/configurations/configuration_name'
    config = urllib.request.urlopen(url).read()
    # My function code...

For caching secret values using a Lambda extension local HTTP cache and AWS Secrets Manager, see the AWS Prescriptive Guidance documentation.

Using Lambda Powertools for Python or Java

Lambda Powertools for Python or Lambda Powertools for Java contains utilities to manage parameter caching. You can configure the cache interval, which defaults to 5 seconds. Supported parameter stores include Secrets Manager, AWS Systems Manager Parameter Store, AppConfig, and DynamoDB. You also have the option to bring your own provider. The following example shows the Powertools for Python parameters utility retrieving a single value from Systems Manager Parameter Store.

from aws_lambda_powertools.utilities import parameters
def handler(event, context):
    value = parameters.get_parameter("/my/parameter")
    # My function code…


Parameter security is a key consideration. You should evaluate encryption at rest, in-transit, private network access, and fine-grained permissions for each external parameter solution based on the use case.

All services highlighted in this post support server-side encryption at rest, and you can choose to use AWS KMS to manage your own keys. When accessing parameters using the AWS SDK and CLI tools, connections are encrypted in transit using TLS by default. You can force most to use TLS 1.2.

To access parameters from inside an Amazon Virtual Private Cloud (Amazon VPC) without internet access, you can use AWS PrivateLink and create a VPC endpoint for each service. All the services mentioned in this post support AWS PrivateLink connections.

Use AWS Identity and Access Management (IAM) policies to manage which users or roles can access specific parameters.

General guidance

This blog explores a number of considerations to make when using an external source for Lambda parameters. The correct solution is use-case dependent. There are some general guidelines when selecting an AWS service.

  • For general-purpose low-cost parameters, use AWS Systems Manager Parameter Store.
  • For single function, small parameters, use Lambda environment variables.
  • For secret values that require automatic rotation, use AWS Secrets Manager.
  • When you need a managed cache, use the AWS AppConfig Lambda extension or Lambda Powertools for Python/Java.
  • For items larger than 400 KB, use Amazon S3.
  • When access frequency is high, and low latency is required, use Amazon DynamoDB.


External parameters provide a central source of truth across distributed systems, allowing for efficient updates and code reuse. This blog post highlights a number of considerations when using external parameters with Lambda to help you choose the most appropriate solution for your use case.

Consider how you cache and reuse parameters inside the Lambda execution environment. Doing this correctly can help you reduce costs and improve the performance of your Lambda functions.

There are a number of services to choose from to store parameter data. These include DynamoDB, S3, Parameter Store, Secrets Manager, AppConfig, and Lambda environment variables. Each comes with a number of advantages, depending on the use case. This blog guidance, along with the AWS documentation and Service Quotas, can help you select the most appropriate service for your workload.

For more serverless learning resources, visit Serverless Land.

Mainframe offloading and modernization: Using mainframe data to build cloud native services with AWS

Post Syndicated from Malathi Pinnamaneni original https://aws.amazon.com/blogs/architecture/mainframe-offloading-and-modernization-using-mainframe-data-to-build-cloud-native-services-with-aws/

Many companies in the financial services and insurance industries rely on mainframes for their most business-critical applications and data. But mainframe workloads typically lack agility. This is one reason that organizations struggle to innovate, iterate, and pivot quickly to develop new applications or release new capabilities. Unlocking this mainframe data can be the first step in your modernization journey.

In this blog post, we will discuss some typical offloading patterns. Whether your goal is developing new applications using mainframe data or modernizing with the Strangler Fig Application pattern, you might want some guidance on how to begin.

Refactoring mainframe applications to the cloud

Refactoring mainframe applications to cloud-native services on AWS is a common industry pattern and a long-term goal for many companies to remain competitive. But this takes an investment of time, money, and organizational change management to realize the full benefits. We see customers start their modernization journey by offloading data from the mainframe to AWS to reduce risks and create new capabilities.

The mainframe data offloading patterns that we will discuss in this post use software services that facilitate data replication to Amazon Web Services (AWS):

  • File-based data synchronization
  • Change data capture
  • Event-sourced replication

Once data is liberated from the mainframe, you can develop new agile applications for deeper insights using analytics and machine learning (ML). You could create a microservices-based, or voice-based mobile application. For example, if a bank could access their historical mainframe data to analyze customer behavior, they could develop a new solution based on profiles to use for loan recommendations.

The patterns we illustrate can be used as a reference to begin your modernization efforts with reduced risk. The long-term goal is to rewrite the mainframe applications and modernize them workload by workload.

Solution overview: Mainframe offloading and modernization

This figure shows the flow of data being replicated from mainframe using integration services and consumed in AWS

Figure 1. Mainframe offloading and modernization conceptual flow

Mainframe modernization: Architecture reference patterns

File-based batch integration

Modernization scenarios often require replicating files to AWS, or synchronizing between on-premises and AWS. Use cases include:

  • Analyzing current and historical data to enhance business analytics
  • Providing data for further processing on downstream or upstream dependent systems. This is necessary for exchanging data between applications running on the mainframe and applications running on AWS
This diagram shows a file-based integration pattern on how data can be replicated to AWS for interactive data analytics

Figure 2. File-based batch ingestion pattern for interactive data analytics

File-based batch integration – Batch ingestion for interactive data analytics (Figure 2)

  1. Data ingestion. In this example, we show how data can be ingested to Amazon S3 using AWS Transfer Family Services or AWS DataSync. Mainframe data is typically encoded in extended binary-coded decimal interchange code (EBCDIC) format. Prescriptive guidance exists to convert EBCDIC to ASCII format.
  2. Data transformation. Before moving data to AWS data stores, transformation of the data may be necessary to use it for analytics. AWS analytics services like AWS Glue and AWS Lambda can be used to transform the data. For large volume processing, use Apache Spark on AWS Elastic Map Reduce (Amazon EMR), or a custom Spring Boot application running on Amazon EC2 to perform these transformations. This process can be orchestrated using AWS Step Functions or AWS Data Pipeline.
  3. Data store. Data is transformed into a consumable format that can be stored in Amazon S3.
  4. Data consumption. You can use AWS analytics services like Amazon Athena for interactive ad-hoc query access, Amazon QuickSight for analytics, and Amazon Redshift for complex reporting and aggregations.
This diagram shows a file-based integration pattern on how data can be replicated to AWS for further processing by downstream systems

Figure 3. File upload to operational data stores for further processing

File-based batch integration – File upload to operational data stores for further processing (Figure 3)

  1. Using AWS File Transfer Services, upload CSV files to Amazon S3.
  2. Once the files are uploaded, S3’s event notification can invoke AWS Lambda function to load to Amazon Aurora. For low latency data access requirements, you can use a scalable serverless import pattern with AWS Lambda and Amazon SQS to load into Amazon DynamoDB.
  3. Once the data is in data stores, it can be consumed for further processing.

Transactional replication-based integration (Figure 4)

Several modernization scenarios require continuous near-real-time replication of relational data to keep a copy of the data in the cloud. Change Data Capture (CDC) for near-real-time transactional replication works by capturing change log activity to drive changes in the target dataset. Use cases include:

  • Command Query Responsibility Segregation (CQRS) architectures that use AWS to service all read-only and retrieve functions
  • On-premises systems with tightly coupled applications that require a phased modernization
  • Real-time operational analytics
This diagram shows a transaction-based replication (CDC) integration pattern on how data can be replicated to AWS for building reporting and read-only functions

Figure 4. Transactional replication (CDC) pattern

  1. Partner CDC tools in the AWS Marketplace can be used to manage real-time data movement between the mainframe and AWS.
  2. You can use a fan-out pattern to read once from the mainframe to reduce processing requirements and replicate data to multiple data stores based on your requirements:
    • For low latency requirements, replicate to Amazon Kinesis Data Streams and use AWS Lambda to store in Amazon DynamoDB.
    • For critical business functionality with complex logic, use Amazon Aurora or Amazon Relational Database Service (RDS) as targets.
    • To build data lake or use as an intermediary for ETL processing, customers can replicate to S3 as target.
  3. Once the data is in AWS, customers can build agile microservices for read-only functions.

Message-oriented middleware (event sourcing) integration (Figure 5)

With message-oriented middleware (MOM) systems like IBM MQ on mainframe, several modernization scenarios require integrating with cloud-based streaming and messaging services. These act as a buffer to keep your data in sync. Use cases include:

  • Consume data from AWS data stores to enable new communication channels. Examples of new channels can be mobile or voice-based applications and can be innovations based on ML
  • Migrate the producer (senders) and consumer (receivers) applications communicating with on-premises MOM platforms to AWS with an end goal to retire on-premises MOM platform
This diagram shows an event-sourcing integration reference pattern for customers using middleware systems like IBM MQ on-premises with AWS services

Figure 5. Event-sourcing integration pattern

  1. Mainframe transactions from IBM MQ can be read using a connector or a bridge solution. They can then be published to Amazon MQ queues or Amazon Managed Streaming for Apache Kakfa (MSK) topics.
  2. Once the data is published to the queue or topic, consumers encoded in AWS Lambda functions or Amazon compute services can process, map, transform, or filter the messages. They can store the data in Amazon RDS, Amazon ElastiCache, S3, or DynamoDB.
  3. Now that the data resides in AWS, you can build new cloud-native applications and do the following:


Mainframe offloading and modernization using AWS services enables you to reduce cost, modernize your architectures, and integrate your mainframe and cloud-native technologies. You’ll be able to inform your business decisions with improved analytics, and create new opportunities for innovation and the development of modern applications.

More posts for Women’s History Month!

Other ways to participate