Tag Archives: Amazon API Gateway

Implementing header-based API Gateway versioning with Amazon CloudFront

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/implementing-header-based-api-gateway-versioning-with-amazon-cloudfront/

This post is written by Amir Khairalomoum, Sr. Solutions Architect.

In this blog post, I show you how to use [email protected] feature of Amazon CloudFront to implement a header-based API versioning solution for Amazon API Gateway.

Amazon API Gateway is a fully managed service that makes it easier for developers to create, publish, maintain, monitor, and secure APIs at any scale. Amazon CloudFront is a global content delivery network (CDN) service built for high-speed, low-latency performance, security, and developer ease-of-use. [email protected] is a feature of Amazon CloudFront, a compute service that lets you run functions that customize the content that CloudFront delivers.

The example uses the AWS SAM CLI to build, deploy, and test the solution on AWS. The AWS Serverless Application Model (AWS SAM) is an open-source framework that you can use to build serverless applications on AWS. The AWS SAM CLI lets you locally build, test, and debug your applications defined by AWS SAM templates. You can also use the AWS SAM CLI to deploy your applications to AWS, or create secure continuous integration and deployment (CI/CD) pipelines.

After an API becomes publicly available, it is used by customers. As a service evolves, its contract also evolves to reflect new changes and capabilities. It’s safe to evolve a public API by adding new features but it’s not safe to change or remove existing features.

Any breaking changes may impact consumer’s applications and break them at runtime. API versioning is important to avoid breaking backward compatibility and breaking a contract. You need a clear strategy for API versioning to help consumers adopt them.

Versioning APIs

Two of the most commonly used API versioning strategies are URI versioning and header-based versioning.

URI versioning

This strategy is the most straightforward and the most commonly used approach. In this type of versioning, versions are explicitly defined as part of API URIs. These example URLs show how domain name, path, or query string parameters can be used to specify a version:


To deploy an API in API Gateway, the deployment is associated with a stage. A stage is a logical reference to a lifecycle state of your API (for example, dev, prod, beta, v2). As your API evolves, you can continue to deploy it to different stages as different versions of the API.

Header-based versioning

This strategy is another commonly used versioning approach. It uses HTTP headers to specify the desired version. It uses the “Accept” header for content negotiation or uses a custom header (for example, “APIVER” to indicate a version):


This approach allows you to preserve URIs between versions. As a result, you have a cleaner and more understandable set of URLs. It is also easier to add versioning after design. However, you may need to deal with complexity of returning different versions of your resources.

Overview of solution

The target architecture for the solution uses [email protected]. It dynamically routes a request to the relevant API version, based on the provided header:

Architecture overview

Architecture overview

In this architecture:

  1. The user sends a request with a relevant header, which can be either “Accept” or another custom header.
  2. This request reaches the CloudFront distribution and triggers the [email protected] Origin Request.
  3. The [email protected] function uses the provided header value and fetches data from an Amazon DynamoDB table. This table contains mappings for API versions. The function then modifies the Origin and the Host header of the request and returns it back to CloudFront.
  4. CloudFront sends the request to the relevant Amazon API Gateway URL.

In the next sections, I walk you through setting up the development environment and deploying and testing this solution.

Setting up the development environment

To deploy this solution on AWS, you use the AWS Cloud9 development environment.

  1. Go to the AWS Cloud9 web console. In the Region dropdown, make sure you’re using N. Virginia (us-east-1) Region.
  2. Select Create environment.
  3. On Step 1 – Name environment, enter a name for the environment, and choose Next step.
  4. On Step 2 – Configure settings, keep the existing environment settings.

    Console view of configuration settings

    Console view of configuration settings

  5. Choose Next step. Choose Create environment.

Deploying the solution

Now that the development environment is ready, you can proceed with the solution deployment. In this section, you download, build, and deploy a sample serverless application for the solution using AWS SAM.

Download the sample serverless application

The solution sample code is available on GitHub. Clone the repository and download the sample source code to your Cloud9 IDE environment by running the following command in the Cloud9 terminal window:

git clone https://github.com/aws-samples/amazon-api-gateway-header-based-versioning.git ./api-gateway-header-based-versioning

This sample includes:

  • template.yaml: Contains the AWS SAM template that defines your application’s AWS resources.
  • hello-world/: Contains the Lambda handler logic behind the API Gateway endpoints to return the hello world message.
  • edge-origin-request/: Contains the [email protected] handler logic to query the API version mapping and modify the Origin and the Host header of the request.
  • init-db/: Contains the Lambda handler logic for a custom resource to populate sample DynamoDB table

Build your application

Run the following commands in order to first, change into the project directory, where the template.yaml file for the sample application is located then build your application:

cd ~/environment/api-gateway-header-based-versioning/
sam build


Build output

Build output

Deploy your application

Run the following command to deploy the application in guided mode for the first time then follow the on-screen prompts:

sam deploy --guided


Deploy output

Deploy output

The output shows the deployment of the AWS CloudFormation stack.

Testing the solution

This application implements all required components for the solution. It consists of two Amazon API Gateway endpoints backed by AWS Lambda functions. The deployment process also initializes the API Version Mapping DynamoDB table with the values provided earlier in the deployment process.

Run the following commands to see the created mappings:

STACK_NAME=$(grep stack_name ~/environment/api-gateway-header-based-versioning/samconfig.toml | awk -F\= '{gsub(/"/, "", $2); gsub(/ /, "", $2); print $2}')

DDB_TBL_NAME=$(aws cloudformation describe-stacks --region us-east-1 --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`DynamoDBTableName`].OutputValue' --output text) && echo $DDB_TBL_NAME

aws dynamodb scan --table-name $DDB_TBL_NAME


Table scan results

Table scan results

When a user sends a GET request to CloudFront, it routes the request to the relevant API Gateway endpoint version according to the provided header value. The Lambda function behind that API Gateway endpoint is invoked and returns a “hello world” message.

To send a request to the CloudFront distribution, which is created as part of the deployment process, first get its domain name from the deployed AWS CloudFormation stack:

CF_DISTRIBUTION=$(aws cloudformation describe-stacks --region us-east-1 --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`CFDistribution`].OutputValue' --output text) && echo $CF_DISTRIBUTION


Domain name results

Domain name results

You can now send a GET request along with the relevant header you specified during the deployment process to the CloudFront to test the application.

Run the following command to test the application for API version one. Note that if you entered a different value other than the default value provided during the deployment process, change the --header parameter to match your inputs:

curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" --header "Accept:application/vnd.example.v1+json" && echo


Curl results

Curl results

The response shows that CloudFront successfully routed the request to the API Gateway v1 endpoint as defined in the mapping Amazon DynamoDB table. API Gateway v1 endpoint received the request. The Lambda function behind the API Gateway v1 was invoked and returned a “hello world” message.

Now you can change the header value to v2 and run the command again this time to test the API version two:

curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" --header "Accept:application/vnd.example.v2+json" && echo


Curl results after header change

Curl results after header change

The response shows that CloudFront routed the request to the API Gateway v2 endpoint as defined in the mapping DynamoDB table. API Gateway v2 endpoint received the request. The Lambda function behind the API Gateway v2 was invoked and returned a “hello world” message.

This solution requires valid a header value on each individual request, so the application checks and raises an error if the header is missing or the header value is not valid.

You can remove the header parameter and run the command to test this scenario:

curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" && echo


No header causes a 403 error

No header causes a 403 error

The response shows that [email protected] validated the request and raised an error to inform us that the request did not have a valid header.

Mitigating latency

In this solution, [email protected] reads the API version mappings data from the DynamoDB table. Accessing external data at the edge can cause additional latency to the request. In order to mitigate the latency, solution uses following methods:

  1. Cache data in [email protected] memory: As data is unlikely to change across many [email protected] invocations, [email protected] caches API version mappings data in the memory for a certain period of time. It reduces latency by avoiding an external network call for each individual request.
  2. Use Amazon DynamoDB global table: It brings data closer to the CloudFront distribution and reduces external network call latency.

Cleaning up

To clean up the resources provisioned as part of the solution:

  1. Run following command to delete the deployed application:
    sam delete
  2. Go to the AWS Cloud9 web console. Select the environment you created then choose Delete.


Header-based API versioning is a commonly used versioning strategy. This post shows how to use CloudFront to implement a header-based API versioning solution for API Gateway. It uses the AWS SAM CLI to build and deploy a sample serverless application to test the solution in the AWS Cloud.

To learn more about API Gateway, visit the API Gateway developer guide documentation, and for CloudFront, refer to Amazon CloudFront developer guide documentation.

For more serverless learning resources, visit Serverless Land.

Simplifying Multi-account CI/CD Deployments using AWS Proton

Post Syndicated from Marvin Fernandes original https://aws.amazon.com/blogs/architecture/simplifying-multi-account-ci-cd-deployments-using-aws-proton/

Many large enterprises, startups, and public sector entities maintain different deployment environments within multiple Amazon Web Services (AWS) accounts to securely develop, test, and deploy their applications. Maintaining separate AWS accounts for different deployment stages is a standard practice for organizations. It helps developers limit the blast radius in case of failure when deploying updates to an application, and provides for more resilient and distributed systems.

Typically, the team that owns and maintains these environments (the platform team) is segregated from the development team. A platform team performs critical activities. These can include setting infrastructure and governance standards, keeping patch levels up to date, and maintaining security and monitoring standards. Development teams are responsible for writing the code, performing appropriate testing, and pushing code to repositories to initiate deployments. The development teams are focused more on delivering their application and less on the infrastructure and networking that ties them together. The segregation of duties and use of multi-account environments are effective from a regulatory and development standpoint. But monitoring, maintaining, and enabling the safe release to these environments can be cumbersome and error prone.

In this blog, you will see how to simplify multi-account deployments in an environment that is segregated between platform and development teams. We will show how you can use one consistent and standardized continuous delivery pipeline with AWS Proton.

Challenges with multi-account deployment

For platform teams, maintaining these large environments at different stages in the development lifecycle and within separate AWS accounts can be tedious. The platform teams must ensure that certain security and regulatory requirements (like networking or encryption standards) are implemented in each separate account and environment. When working in a multi-account structure, AWS Identity and Access Management (IAM) permissions and cross-account access management can be a challenge for many account administrators. Many organizations rely on specific monitoring metrics and tagging strategies to perform basic functions. The platform team is responsible for enforcing these processes and implementing these details repeatedly across multiple accounts. This is a pain point for many infrastructure administrators or platform teams.

Platform teams are also responsible for ensuring a safe and secure application deployment pipeline. To do this, they isolate deployment and production environments from one another limiting the blast radius in case of failure. Platform teams enforce the principle of least privilege on each account, and implement proper testing and monitoring standards across the deployment pipeline.

Instead of focusing on the application and code, many developers face challenges complying with these rigorous security and infrastructure standards. This results in limited access to resources for developers. Delays come with reliance on administrators to deploy application code into production. This can lead to lags in deployment of updated code.

Deployment using AWS Proton

The ownership for infrastructure lies with the platform teams. They set the standards for security, code deployment, monitoring, and even networking. AWS Proton is an infrastructure provisioning and deployment service for serverless and container-based applications. Using AWS Proton, the platform team can provide their developers with a highly customized and catered “platform as a service” experience. This allows developers to focus their energy on building the best application, rather than spending time on orchestration tools. Platform teams can similarly focus on building the best platform for that application.

With AWS Proton, developers use predefined templates. With only a few input parameters, infrastructure can be provisioned and code deployed in an effective pipeline. This way you can get your application running and updated more quickly, see Figure 1.

Figure 1. Platform and development team roles when using AWS Proton

Figure 1. Platform and development team roles when using AWS Proton

AWS Proton allows you to deploy any serverless or container-based application across multiple accounts. You can define infrastructure standards and effective continuous delivery pipelines for your organization. Proton breaks down the infrastructure into environment and service (“infrastructure as code” templates).

In Figure 2, platform teams provide a service template of a secure environment to host a microservices application on Amazon Elastic Container Service (Amazon ECS) and AWS Fargate. The environment template contains infrastructure that is shared across services. This includes the networking configuration: Amazon Virtual Private Cloud (VPC), subnets, route tables, Internet Gateway, security groups, and ECS cluster definition for the Fargate service.

The service template provides details of the service. It includes the container task definitions, monitoring and logging definitions, and an effective continuous delivery pipeline. Using the environment and service template definitions, development teams can define the microservices that are running on Amazon ECS. They can deploy their code following the continuous integration and continuous delivery (CI/CD) pipeline.

Figure 2. Platform teams provision environment and service infrastructure as code templates in AWS Proton management account

Figure 2. Platform teams provision environment and service infrastructure as code templates in AWS Proton management account

Multi-account CI/CD deployment

For Figures 3 and 4, we used publicly available templates and created three separate AWS accounts: the AWS Proton management account, development account, and production environment accounts. Additional accounts may be added based on your use case and security requirements. As shown in Figure 3, the AWS Proton service account contains the environment, service, and pipeline templates. It also provides the connection to other accounts within the organization. The development and production accounts follow the structure of a development pipeline for a typical organization.

AWS Proton alleviates complicated cross-account policies by using a secure “environment account connection” feature. With environment account connections, platform administrators can give AWS Proton permissions to provision infrastructure in other accounts. They create an IAM role and specify a set of permissions in the target account. This enables Proton to assume the role from the management account to build resources in the target accounts.

AWS Key Management Service (KMS) policies can also be hard to manage in multi-account deployments. Proton reduces managing cross-account KMS permissions. In an AWS Proton management account, you can build a pipeline using a single artifact repository. You can also extend the pipeline to additional accounts from a single source of truth. This feature can be helpful when accounts are located in different Regions, due to regulatory requirements for example.

Figure 3. AWS Proton uses cross-account policies and provisions infrastructure in development and production accounts with environment connection feature

Figure 3. AWS Proton uses cross-account policies and provisions infrastructure in development and production accounts with environment connection feature

Once the environment and service templates are defined in the AWS Proton management account, the developer selects the templates. Proton then provisions the infrastructure, and the continuous delivery pipeline that will deploy the services to each separate account.

Developers commit code to a repository, and the pipeline is responsible for deploying to the different deployment stages. You don’t have to worry about any of the environment connection workflows. Proton allows platform teams to provide a single pipeline definition to deploy the code into multiple different accounts without any additional account level information. This standardizes the deployment process and implements effective testing and staging policies across the organization.

Platform teams can also inject manual approvals into the pipeline so they can control when a release is deployed. Developers can define tests that initiate after a deployment to ensure the validity of releases before moving to a production environment. This simplifies application code deployment in an AWS multi-account environment and allows updates to be deployed more quickly into production. The resulting deployed infrastructure is shown in Figure 4.

Figure 4. AWS Proton deploys service into multi-account environment through standardized continuous delivery pipeline

Figure 4. AWS Proton deploys service into multi-account environment through standardized continuous delivery pipeline


In this blog, we have outlined how using AWS Proton can simplify handling multi-account deployments using one consistent and standardized continuous delivery pipeline. AWS Proton addresses multiple challenges in the segregation of duties between developers and platform teams. By having one uniform resource for all these accounts and environments, developers can develop and deploy applications faster, while still complying with infrastructure and security standards.

For further reading:

Getting started with Proton
Identity and Access Management for AWS Proton
Proton administrative guide

Accepting API keys as a query string in Amazon API Gateway

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/accepting-api-keys-as-a-query-string-in-amazon-api-gateway/

This post was written by Ronan Prenty, Sr. Solutions Architect and Zac Burns, Cloud Support Engineer & API Gateway SME

Amazon API Gateway is a fully managed service that makes it easier for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the front door to applications and allow developers to offload tasks like authorization, throttling, caching, and more.

A common feature requested by customers is the ability to track usage for specific users or services through API keys. API Gateway REST APIs support this feature and, for added security, require that the API key resides in a header or an authorizer.

Developers may also need to pass API keys in the query string parameters. Best practices encourage refactoring the requests at the client level to move API keys to the header. However, this may not be possible during the migration.

This blog explains how to build an API Gateway REST API that temporarily accepts API keys as query string parameters. This post helps customers who have APIs that accept API keys as query string parameters and want to migrate to API Gateway with minimal impact on their clients. The post also discusses increasing security by refactoring the client to send API keys as a header instead of a query string.

There is also an example project for you to test and evaluate. This solution uses a custom authorizer AWS Lambda function to extract the API key from the query string parameter and apply it to a usage plan. The sample application uses the AWS Serverless Application Model (AWS SAM) for deployment.

Key concepts

API keys and usage plans

API keys are alphanumeric strings that are distributed to developers to grant access to an API. API Gateway can generate these on your behalf, or you can import them.

Usage plans let you provide API keys to your customers so that you can track and limit their usage. API keys are not a primary authorization mechanism for your APIs. If multiple APIs are associated with a usage plan, a user with a valid API key can access all APIs in that usage plan. We provide numerous options for securing access to your APIs, including resource policies, Lambda authorizers, and Amazon Cognito user pools.

Usage plans define who can access deployed API stages and methods along with metering their usage. Usage plans use API keys to identify who is making requests and apply throttling and quota limits.

How API Gateway handles API keys

API Gateway supports API keys sent as headers in a request. It does not support API keys sent as a query string parameter. API Gateway only accepts requests over HTTPS, which means that the request is encrypted. When sending API keys as query string parameters, there is still a risk that URLs are logged in plaintext by the client sending requests.

API Gateway has two settings to accept API keys:

  1. Header: The request contains the values as the X-API-Key header. API Gateway then validates the key against a usage plan.
  2. Authorizer: The authorizer includes the API key as part of the authorization response. Once API Gateway receives the API key as part of the response, it validates it against a usage plan.

Solution overview

To accept an API key as a query string parameter temporarily, create a custom authorizer using a Lambda function:

Note: the apiKeySource property of your API must be set to Authorizer instead of Header.

Note: the apiKeySource property of your API must be set to Authorizer instead of Header.

  1. The client sends an HTTP request to the API Gateway endpoint with the API key in the query string.
  2. API Gateway sends the request to a REQUEST type custom authorizer
  3. The custom authorizer function extracts the API Key from the payload. It constructs the response object with the API Key as the value for the `usageIdentifierKey` property
  4. The response gets sent back to API Gateway for validation.
  5. API Gateway validates the API key against a usage plan.
  6. If valid, API Gateway passes the request to the backend.

Deploying the solution


This solution requires no pre-existing AWS resources and deploys everything you need from the template. Deploying the solution requires:

You can find the solution on GitHub using this link.

With the prerequisites completed, deploy the template with the following commands:

git clone https://github.com/aws-samples/amazon-apigateway-accept-apikeys-as-querystring.git
cd amazon-apigateway-accept-apikeys-as-querystring
sam build --use-container
sam deploy --guided

Long term considerations

This temporary solution enables developers to migrate APIs to API Gateway and maintain query string-based API keys. While this solution does work, it does not follow best practices.

In addition to security, there is also a cost factor. Each time the client request contains an API key, the custom authorizer AWS Lambda function will be invoked, increasing the total amount of Lambda invocations you are billed for. To ensure you are billed only for valid requests, you can add an identity source to the custom authorizer meaning that only requests containing this identity source will be sent to the Lambda function. Requests that do not contain this identity source will not be billed by Lambda or API Gateway. Migrating to a header-based API key removes the need for a custom authorizer and the extra Lambda function invocations. You can find out more information on AWS Lambda billing here.

Customer migration process

With this in mind, the structure of the request sent by API clients must change from:

GET /some-endpoint?apiKey=abc123456789


GET /some-endpoint
x-api-key: abc123456789

You can provide clients with a notice period when this temporary solution is operational. After, they must migrate to a new API endpoint using a header to provide the API keys. Once the client migration is complete, they can retire the custom solution.

Developer portal

In addition to migrating API keys to a header-based solution, customers also ask us how to manage customer keys and usage plans. One option is to deploy the API Gateway developer portal.

This portal enables your customers to discover available APIs, browse API documentation, register for API keys, test APIs in the user interface, and monitor their API usage. This portal also allows you to publish non-API Gateway managed APIs by uploading OpenAPI definitions. The serverless developer portal can be customized and branded to suit your organization.


This blog post demonstrates how to use custom authorizers in API Gateway to accept API keys as a query string parameter. It also provides an AWS SAM template to deploy an example application for testing. Finally, it discusses the importance of moving customers to header-based API keys and managing those keys with the developer portal.

For more serverless content, visit Serverless Land.

ICYMI: Serverless Q3 2021

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q3-2021/

Welcome to the 15th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

Q3 calendar

In case you missed our last ICYMI, check out what happened last quarter here.

AWS Lambda

You can now choose next-generation AWS Graviton2 processors in your Lambda functions. This Arm-based processor architecture can provide up to 19% better performance at 20% lower cost. You can configure functions to use Graviton2 in the AWS Management Console, API, CloudFormation, and CDK. We recommend using the AWS Lambda Power Tuning tool to see how your function compare and determine the price improvement you may see.

All Lambda runtimes built on Amazon Linux 2 support Graviton2, with the exception of versions approaching end-of-support. The AWS Free Tier for Lambda includes functions powered by both x86 and Arm-based architectures.

Create Lambda function with new arm64 option

You can also use the Python 3.9 runtime to develop Lambda functions. You can choose this runtime version in the AWS Management Console, AWS CLI, or AWS Serverless Application Model (AWS SAM). Version 3.9 includes a range of new features and performance improvements.

Lambda now supports Amazon MQ for RabbitMQ as an event source. This makes it easier to develop serverless applications that are triggered by messages in a RabbitMQ queue. This integration does not require a consumer application to monitor queues for updates. The connectivity with the Amazon MQ message broker is managed by the Lambda service.

Lambda has added support for up to 10 GB of memory and 6 vCPU cores in AWS GovCloud (US) Regions and in the Middle East (Bahrain), Asia Pacific (Osaka), and Asia Pacific (Hong Kong) Regions.

AWS Step Functions

Step Functions now integrates with the AWS SDK, supporting over 200 AWS services and 9,000 API actions. You can call services directly from the Amazon States Language definition in the resource field of the task state. This allows you to work with services like DynamoDB, AWS Glue Jobs, or Amazon Textract directly from a Step Functions state machine. To learn more, see the SDK integration tutorial.

AWS Amplify

The Amplify Admin UI now supports importing existing Amazon Cognito user pools and identity pools. This allows you to configure multi-platform apps to use the same user pools with different client IDs.

Amplify CLI now enables command hooks, allowing you to run custom scripts in the lifecycle of CLI commands. You can create bash scripts that run before, during, or after CLI commands. Amplify CLI has also added support for storing environment variables and secrets used by Lambda functions.

Amplify Geo is in developer preview and helps developers provide location-aware features to their frontend web and mobile applications. This uses the Amazon Location Service to provide map UI components.

Amazon EventBridge

The EventBridge schema registry now supports discovery of cross-account events. When schema registry is enabled on a bus, it now generates schemes for events originating from another account. This helps organize and find events in multi-account applications.

Amazon DynamoDB

DynamoDB console

The new DynamoDB console experience is now the default for viewing and managing DynamoDB tables. This makes it easier to manage tables from the navigation pane and also provided a new dedicated Items page. There is also contextual guidance and step-by-step assistance to help you perform common tasks more quickly.

API Gateway

API Gateway can now authenticate clients using certificate-based mutual TLS. Previously, this feature only supported AWS Certificate Manager (ACM). Now, customers can use a server certificate issued by a third-party certificate authority or ACM Private CA. Read more about using mutual TLS authentication with API Gateway.

The Serverless Developer Advocacy team built the Amazon API Gateway CORS Configurator to help you configure cross origin resource scripting (CORS) for REST and HTTP APIs. Fill in the information specific to your API and the AWS SAM configuration is generated for you.

Serverless blog posts




Tech Talks & Events

We hold AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page. We also regularly deliver talks at conferences and events around the world, speak on podcasts, and record videos you can find to learn in bite-sized chunks.

Here are some from Q3:


Serverless Land

Serverless Office Hours – Tues 10 AM PT

Weekly live virtual office hours. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.




DynamoDB Office Hours

Are you an Amazon DynamoDB customer with a technical question you need answered? If so, join us for weekly Office Hours on the AWS Twitch channel led by Rick Houlihan, AWS principal technologist and Amazon DynamoDB expert. See upcoming and previous shows

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Using AWS Serverless to Power Event Management Applications

Post Syndicated from Cheryl Joseph original https://aws.amazon.com/blogs/architecture/using-aws-serverless-to-power-event-management-applications/

Most large events have common activities such as event registration, check-in upon arrival, and requesting of amenities. When designing applications, factors such as high availability, low latency, reliability, and security must be considered.

In this blog post, we’d like to show how Amazon Web Services (AWS) can assist you in event planning activities. We’ll share an architecture that follows best practices, and one that can be used in developing other solutions.

Serverless to the Rescue

Serverless architecture enables you to focus on your application development without having to worry about managing servers and runtimes. You can quickly build, fix, and add new features to your applications. A microservices-based approach provides you the ability to scale and optimize each component of your event management application.

Let’s start by looking at some activities that an event guest might perform, and how they might be displayed in a mobile application:

  • Event registration: A guest can register either from a website or from a mobile device, see Figure 1. Events might have heavy traffic initially, or a large push toward the end. This requires building applications that are highly scalable.
Figure 1. Event registration

Figure 1. Event registration

  • Check-In: Check-In can be a manual and cumbersome process – some mobile options are shown in Figure 2. Attendees must queue up to register, pick up badges, receive agendas, and collect other meeting materials.
Figure 2. Guest check-in kiosk

Figure 2. Guest check-in kiosk

  • Guest requests: While the event is underway, a participant might request hand-outs or want to purchase food or beverages, see Figure 3.
Figure 3. Guest requests

Figure 3. Guest requests

  • Session notification: At popular events, there are some sessions that fill up quickly. Guests must queue up to get into the session. Figure 4 shows a notification screen.
Figure 4. Session notification on guest device

Figure 4. Session notification on guest device

Solution overview for event planning

The serverless architecture presented here is highly scalable and provides low latency. It follows the Serverless Application Lens of the AWS Well-Architected Framework. This enables you to build secure, high-performing, resilient, and efficient applications.

Frontend user interface using AWS Amplify

The event website is hosted on AWS Amplify. Amplify provides a fully managed service for deploying and hosting applications with built-in CI/CD workflows. An alternative for hosting the event website could be Amazon Simple Storage Service (S3) or even by provisioning Amazon EC2 instances. However, Amplify is well suited for native mobile apps and JavaScript-based web apps.

The event website uses Amazon Cognito for management of user authentication and authorization. Amazon Cognito is a good choice here as it allows federating with external identity providers.

Backend serverless microservices

The backend of the event management application uses Amazon API Gateway and AWS Lambda. They provide the ability to expose API operations. If the application has a flurry of requests coming in together, the backend serverless microservices will scale up or down seamlessly. However, there are service limits, and it is important to keep these in mind while designing your applications.

Amazon DynamoDB is the NoSQL database, which saves the guest registration data and other event-related information. DynamoDB is a good fit here, as it delivers single-digit millisecond performance at any scale and provides high availability, fault tolerance, and automatic capacity scaling.

Amazon Pinpoint is used to send notifications to guests via email and SMS. Amazon Pinpoint allows your app to connect with customers over channels like email, SMS, push, or voice.

Let’s take a closer look at some of the activities we’ve outlined.

Solution architecture – Event registration and check-in

Figure 5. Event registration and check-in

Figure 5. Event registration and check-in

Numbered items following refer to Figure 5:

  1. Developers upload code to AWS CodeCommit
  2. CodeCommit pushes the code to Amplify
  3. Guests access the website via Amazon Route 53
  4. Route 53 resolves incoming requests and forwards them to Amplify
  5. Guest authentication is performed by Amazon Cognito user pools
  6. Amplify sends the REST API requests to API Gateway
  7. API Gateway uses Amazon Cognito user pools as the authorizer
  8. API Gateway proxies the request to Lambda
  9. Lambda stores guest data in DynamoDB
  10. Lambda uses Amazon Pinpoint to notify the guest

The guest registration process begins with loading the web application hosted on Amplify. The application creates the user in the Amazon Cognito user pool and routes the request to API Gateway to complete the registration process. Amazon Cognito integrates with third-party authentication systems such as Google, Facebook, and Amazon. This allows guests to use their existing social media accounts to register.

The guest check-in process consists of loading a web application onto kiosks. Guest information is saved in a DynamoDB table. Upon registration, a QR code is sent to the guests, then scanned upon arrival at a kiosk. Guest information is then retrieved from a DynamoDB table. This allows guests to print their badges and other event materials.

Well-Architected guidance:

  • Enable active tracing with AWS X-Ray to provide distributed tracing capabilities and visual service maps for faster troubleshooting of the backend APIs.
  • For Lambda functions, follow least-privileged access and only allow the access required to perform a given operation.
  • Throttle API operations to enforce access patterns established by the event management application service contract.
  • Set appropriate logging levels and remove unnecessary logging information to optimize log ingestion. Use environment variables to control application logging level.

Solution architecture – Guest requests

Figure 6. Guest requests

Figure 6. Guest requests

Numbered items refer to Figure 6:

  1. Guests access the website via Route 53
  2. Route 53 resolves incoming requests and forwards them to Amplify
  3. Guest authentication is performed by Amazon Cognito user pools
  4. Amplify sends the REST API requests to API Gateway
  5. API Gateway uses Amazon Cognito user pools as the authorizer
  6. API Gateway proxies the request to Lambda
  7. Lambda validates and stores guest data in DynamoDB
  8. Lambda uses Amazon Pinpoint to notify the guest
  9. Amazon DynamoDB Streams are enabled which triggers a Lambda function
  10. Lambda notifies the employees via Amazon Simple Notification Service (SNS) to fulfill the request

Once a guest request is made for session handouts or food or beverages, it is stored in DynamoDB. DynamoDB Streams are enabled, see Figure 7, which captures a time-ordered sequence of item-level modifications in a DynamoDB table. It durably stores the information for up to 24 hours. This generates an event, which triggers a Lambda function. The Lambda function sends an SNS notification via SMS or email to the event employees who can address the guest requests.

Figure 7. Sample DynamoDB Streams record

Figure 7. Sample DynamoDB Streams record

Well-Architected guidance:

  • Standardize application logging across components, and business outcomes
  • Enable caching on API Gateway to improve application performance
  • Use an On-Demand Instance for DynamoDB when traffic is unpredictable, otherwise use provisioned mode when consistent
  • Amazon DynamoDB Accelerator (DAX) can be used as an in-memory cache to improve read performance

Solution architecture – Session notification

Figure 8. Session notification

Figure 8. Session notification

Numbered items refer to Figure 8:

  1. An Amazon EventBridge rule runs on a schedule and invokes a Lambda function
  2. Lambda retrieves guest and session information from DynamoDB
  3. Lambda notifies the guest via Amazon Pinpoint

Amazon Pinpoint can send notifications to registered guests to let them know when to queue up for the session.


This solution provides a powerful approach for deploying highly scalable applications, while providing low latency and low cost. Build a Serverless Web Application can get you started. Large events require a considerable amount of planning and coordination. We hope the guidance provided here will help you build a scalable and a robust event management application.

Building well-architected serverless applications: Optimizing application costs

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-optimizing-application-costs/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

COST 1. How do you optimize your serverless application costs?

Design, implement, and optimize your application to maximize value. Asynchronous design patterns and performance practices ensure efficient resource use and directly impact the value per business transaction. By optimizing your serverless application performance and its code patterns, you can directly impact the value it provides, while making more efficient use of resources.

Serverless architectures are easier to manage in terms of correct resource allocation compared to traditional architectures. Due to its pay-per-value pricing model and scale based on demand, a serverless approach effectively reduces the capacity planning effort. As covered in the operational excellence and performance pillars, optimizing your serverless application has a direct impact on the value it produces and its cost. For general serverless optimization guidance, see the AWS re:Invent talks, “Optimizing your Serverless applications” Part 1 and Part 2, and “Serverless architectural patterns and best practices”.

Required practice: Minimize external calls and function code initialization

AWS Lambda functions may call other managed services and third-party APIs. Functions may also use application dependencies that may not be suitable for ephemeral environments. Understanding and controlling what your function accesses while it runs can have a direct impact on value provided per invocation.

Review code initialization

I explain the Lambda initialization process with cold and warm starts in “Optimizing application performance – part 1”. Lambda reports the time it takes to initialize application code in Amazon CloudWatch Logs. As Lambda functions are billed by request and duration, you can use this to track costs and performance. Consider reviewing your application code and its dependencies to improve the overall execution time to maximize value.

You can take advantage of Lambda execution environment reuse to make external calls to resources and use the results for subsequent invocations. Use TTL mechanisms inside your function handler code. This ensures that you can prevent additional external calls that incur additional execution time, while preemptively fetching data that isn’t stale.

Review third-party application deployments and permissions

When using Lambda layers or applications provisioned by AWS Serverless Application Repository, be sure to understand any associated charges that these may incur. When deploying functions packaged as container images, understand the charges for storing images in Amazon Elastic Container Registry (ECR).

Ensure that your Lambda function only has access to what its application code needs. Regularly review that your function has a predicted usage pattern so you can factor in the cost of other services, such as Amazon S3 and Amazon DynamoDB.

Required practice: Optimize logging output and its retention

Considering reviewing your application logging level. Ensure that logging output and log retention are appropriately set to your operational needs to prevent unnecessary logging and data retention. This helps you have the minimum of log retention to investigate operational and performance inquiries when necessary.

Emit and capture only what is necessary to understand and operate your component as intended.

With Lambda, any standard output statements are sent to CloudWatch Logs. Capture and emit business and operational events that are necessary to help you understand your function, its integration, and its interactions. Use a logging framework and environment variables to dynamically set a logging level. When applicable, sample debugging logs for a percentage of invocations.

In the serverless airline example used in this series, the booking service Lambda functions use Lambda Powertools as a logging framework with output structured as JSON.

Lambda Powertools is added to the Lambda functions as a shared Lambda layer in the AWS Serverless Application Model (AWS SAM) template. The layer ARN is stored in Systems Manager Parameter Store.

    Type: AWS::SSM::Parameter::Value<String>
    Description: Project shared libraries Lambda Layer ARN
        Type: AWS::Serverless::Function
            FunctionName: !Sub ServerlessAirline-ConfirmBooking-${Stage}
            Handler: confirm.lambda_handler
            CodeUri: src/confirm-booking
                - !Ref SharedLibsLayer
            Runtime: python3.7

The LOG_LEVEL and other Powertools settings are configured in the Globals section as Lambda environment variable for all functions.

                POWERTOOLS_SERVICE_NAME: booking
                POWERTOOLS_METRICS_NAMESPACE: ServerlessAirline
                LOG_LEVEL: INFO 

For Amazon API Gateway, there are two types of logging in CloudWatch: execution logging and access logging. Execution logs contain information that you can use to identify and troubleshoot API errors. API Gateway manages the CloudWatch Logs, creating the log groups and log streams. Access logs contain details about who accessed your API and how they accessed it. You can create your own log group or choose an existing log group that could be managed by API Gateway.

Enable access logs, and selectively review the output format and request fields that might be necessary. For more information, see “Setting up CloudWatch logging for a REST API in API Gateway”.

API Gateway logging

API Gateway logging

Enable AWS AppSync logging which uses CloudWatch to monitor and debug requests. You can configure two types of logging: request-level and field-level. For more information, see “Monitoring and Logging”.

AWS AppSync logging

AWS AppSync logging

Define and set a log retention strategy

Define a log retention strategy to satisfy your operational and business needs. Set log expiration for each CloudWatch log group as they are kept indefinitely by default.

For example, in the booking service AWS SAM template, log groups are explicitly created for each Lambda function with a parameter specifying the retention period.

        Type: Number
        Default: 14
        Description: CloudWatch Logs retention period
        Type: AWS::Logs::LogGroup
            LogGroupName: !Sub "/aws/lambda/${ConfirmBooking}"
            RetentionInDays: !Ref LogRetentionInDays

The Serverless Application Repository application, auto-set-log-group-retention can update the retention policy for new and existing CloudWatch log groups to the specified number of days.

For log archival, you can export CloudWatch Logs to S3 and store them in Amazon S3 Glacier for more cost-effective retention. You can use CloudWatch Log subscriptions for custom processing, analysis, or loading to other systems. Lambda extensions allows you to process, filter, and route logs directly from Lambda to a destination of your choice.

Good practice: Optimize function configuration to reduce cost

Benchmark your function using a different set of memory size

For Lambda functions, memory is the capacity unit for controlling the performance and cost of a function. You can configure the amount of memory allocated to a Lambda function, between 128 MB and 10,240 MB. The amount of memory also determines the amount of virtual CPU available to a function. Benchmark your AWS Lambda functions with differing amounts of memory allocated. Adding more memory and proportional CPU may lower the duration and reduce the cost of each invocation.

In “Optimizing application performance – part 2”, I cover using AWS Lambda Power Tuning to automate the memory testing process to balances performance and cost.

Best practice: Use cost-aware usage patterns in code

Reduce the time your function runs by reducing job-polling or task coordination. This avoids overpaying for unnecessary compute time.

Decide whether your application can fit an asynchronous pattern

Avoid scenarios where your Lambda functions wait for external activities to complete. I explain the difference between synchronous and asynchronous processing in “Optimizing application performance – part 1”. You can use asynchronous processing to aggregate queues, streams, or events for more efficient processing time per invocation. This reduces wait times and latency from requesting apps and functions.

Long polling or waiting increases the costs of Lambda functions and also reduces overall account concurrency. This can impact the ability of other functions to run.

Consider using other services such as AWS Step Functions to help reduce code and coordinate asynchronous workloads. You can build workflows using state machines with long-polling, and failure handling. Step Functions also supports direct service integrations, such as DynamoDB, without having to use Lambda functions.

In the serverless airline example used in this series, Step Functions is used to orchestrate the Booking microservice. The ProcessBooking state machine handles all the necessary steps to create bookings, including payment.

Booking service state machine

Booking service state machine

To reduce costs and improves performance with CloudWatch, create custom metrics asynchronously. You can use the Embedded Metrics Format to write logs, rather than the PutMetricsData API call. I cover using the embedded metrics format in “Understanding application health” – part 1 and part 2.

For example, once a booking is made, the logs are visible in the CloudWatch console. You can select a log stream and find the custom metric as part of the structured log entry.

Custom metric structured log entry

Custom metric structured log entry

CloudWatch automatically creates metrics from these structured logs. You can create graphs and alarms based on them. For example, here is a graph based on a BookingSuccessful custom metric.

CloudWatch metrics custom graph

CloudWatch metrics custom graph

Consider asynchronous invocations and review run away functions where applicable

Take advantage of Lambda’s event-based model. Lambda functions can be triggered based on events ingested into Amazon Simple Queue Service (SQS) queues, S3 buckets, and Amazon Kinesis Data Streams. AWS manages the polling infrastructure on your behalf with no additional cost. Avoid code that polls for third-party software as a service (SaaS) providers. Rather use Amazon EventBridge to integrate with SaaS instead when possible.

Carefully consider and review recursion, and establish timeouts to prevent run away functions.


Design, implement, and optimize your application to maximize value. Asynchronous design patterns and performance practices ensure efficient resource use and directly impact the value per business transaction. By optimizing your serverless application performance and its code patterns, you can reduce costs while making more efficient use of resources.

In this post, I cover minimizing external calls and function code initialization. I show how to optimize logging output with the embedded metrics format, and log retention. I recap optimizing function configuration to reduce cost and highlight the benefits of asynchronous event-driven patterns.

This post wraps up the series, building well-architected serverless applications, where I cover the AWS Well-Architected Tool with the Serverless Lens . See the introduction post for links to all the blog posts.

For more serverless learning resources, visit Serverless Land.


Practical Entity Resolution on AWS to Reconcile Data in the Real World

Post Syndicated from David Amatulli original https://aws.amazon.com/blogs/architecture/practical-entity-resolution-on-aws-to-reconcile-data-in-the-real-world/

This post was co-written with Mamoon Chowdry, Solutions Architect, previously at AWS.

Businesses and organizations from many industries often struggle to ensure that their data is accurate. Data often has to match people or things exactly in the real world, such as a customer name, an address, or a company. Matching our data is important to validate it, de-duplicate it, or link records in different systems together. Know Your Customer (KYC) regulations also mean that we must be confident in who or what our data is referring to. We must match millions of records from different data sources. Some of that data may have been entered manually and contain inconsistencies.

It can often be hard to match data with the entity it is supposed to represent. For example, if a customer enters their details as, “Mr. John Doe, #1a 123 Main St.“ and you have a prior record in your customer database for ”J. Doe, Apt 1A, 123 Main Street“, are they referring to the same or a different person?

In cases like this, we often have to manually update our data to make sure it accurately and consistently matches a real-world entity. You may want to have consistent company names across a list of business contacts. When there isn’t an exact match, we have to reconcile our data with the available facts we know about that entity. This reconciliation is commonly referred to as entity resolution (ER). This process can be labor-intensive and error-prone.

This blog will explore some of the common types of ER. We will share a basic architectural pattern for near real-time ER processing. You will see how ER using fuzzy text matching can reconcile manually entered names with reference data.

Multiple ways to do entity resolution

Entity resolution is a broad and deep topic, and a complete discussion would be beyond the scope of this blog. However, at a high level there are four common approaches to matching ambiguous fields or records, to known entities.

    1. Fuzzy text matching. We might normally compare two strings to see if they are identical. If they don’t exactly match, it is often helpful to find the nearest match. We do this by calculating a similarity score. For example, “John Doe” and “J Doe” may have a similarity score of 80%. A common way to compare the similarity of two strings is to use the Levenshtein distance, which measures the distance between two sequences.

We may also examine more than one field. For example, we may compare a name and address. Is “Mr. J Doe, 123 Main St” likely to be the same person as “Mr John Doe, 123 Main Street”? If we compare multiple fields in a record and analyze all of their similarity scores, this is commonly called Pairwise comparison.

2. Clustering. We can plot records in an n-dimensional space based on values computed from their fields. Their similarity to other reference records is then measured by calculating how close they are to each other in that space. Those that are clustered together are likely to refer to the same entity. Clustering is an effective method for grouping or segmenting data for computer vision, astronomy, or market segmentation. An example of this method is K-means clustering.

3. Graph networks. Graph networks are commonly used to store relationships between entities, such as people who are friends with each other, or residents of a particular address. When we need to resolve an ambiguous record, we can use a graph database to identify potential relationships to other records. For example, “J Doe, 123 Main St,” may be the same as “John Doe, 123 Main St,” because they have the same address and similar names.

Graph networks are especially helpful when dealing with complex relationships over millions of entities. For example, you can build a customer profile using web server logs and other data.

4. Commercial off-the-shelf (COTS) software. Enterprises can also deploy ER software, such as these offerings from the AWS Marketplace and Senzing entity resolution. This is helpful when companies may not have the skill or experience to implement a solution themselves. It is important to mention the role of Master Data Management (MDM) with ER. MDM involves having a single trusted source for your data. Tools, such as Informatica, can help ER with their MDM features.

Our solution (shown in Figure 1) allows us to build a low-cost, streamlined solution using AWS serverless technology. The architecture uses AWS Lambda, which allows you to run code without having to provision or manage servers. This code will be invoked through an API, which is created with Amazon API Gateway. API Gateway is a fully managed service used by developers to create, publish, maintain, monitor, and secure API operations at any scale. Finally, we will store our reference data in Amazon Simple Storage Service (S3).

Entity resolution solution using AWS serverless services

We initially match manually entered strings to a list of reference strings. The strings we will try to match will be names of companies.

Figure 1. Example request dataflow through AWS

Figure 1. Example request dataflow through AWS

  1. Our API takes a string as input
  2. It then invokes the ER Lambda function
  3. This loads the index and data files of our reference dataset
  4. The ER finds the closest match in the list of real-world companies
  5. The closest match is returned

The reference data and index files were created from an export of the fuzzy match algorithm.

The fuzzy match algorithm in detail

The algorithm in the AWS Lambda function works by converting each string to a collection of n-grams. N-grams are smaller substrings that are commonly used for analyzing free-form text.

The n-grams are then converted to a simple vector. Each vector is a numerical statistic that represents the Term Frequency – Inverse Document Frequency (TF-IDF). Both TF-IDF and n-grams are used to prepare text for searching. N-grams of strings that are similar in nature, tend to have similar TF-IDF vectors. We can plot these vectors in a chart. This helps us find similar strings as they are grouped or clustered together.

Comparing vectors to find similar strings can be fairly straightforward. But if you have numerous records, it can be computationally expensive and slow. To solve this, we use the NMSLIB library. This library indexes the vectors for faster similarity searching. It also gives us the degree of similarity between two strings. This is important because we may want to know the accuracy of a match we have found. For example, it can be helpful to filter out weak matches.

The entity resolution Lambda

Using the NMSLIB library, which is loaded using Lambda layers, we initialize an index using Neighborhood APProximation (NAPP).

# initialize the index
newIndex = nmslib.init(method='napp', space='negdotprod_sparse_fast',

Next we imported the index and data files that were created from our reference data.

# load the index file
newIndex.loadIndex(DATA_DIR + 'index_company_names.bin',

The input parameter companyName is then used to query the index to find the approximate nearest neighbor. By using the knnQueryBatch method, we distribute the work over a thread pool, which provides faster querying.

# set the input variable and empty output list
inputString = companyName
outputList = []
# Find the nearest neighbor for our company name
# (K is the number of matches, set to 1)
newQueryMatrix = vectorizer.transform(inputString)
newNbrs = index.knnQueryBatch(newQueryMatrix, k = K, num_threads = numThreads)

The best match is then returned as a JSON response.

# return the match
for i in range(K):
return {
      'statusCode': '200',
      'body': json.dumps(outputList),

Cost estimate for solution

Our solution is a combination of Amazon API GatewayAWS Lambda, and Amazon S3 (hyperlinks are to pricing pages). As an example, let’s assume that the API will receive 10 million requests per month. We can estimate the costs of running the solution as:

Service Description Cost
AWS Lambda 10 million requests and associated compute costs $161.80
Amazon API Gateway HTTP API requests, avg size of request (34 KB), Avg message size (32 KB), requests (10 million/month) $10.00
Amazon S3 S3 Standard storage (including data transfer costs) $7.61
Total $179.41

Table 1. Example monthly cost estimate (USD)


Using AWS services to reconcile your data with real-world entities helps make your data more accurate and consistent. You can automate a manual task that could have been laborious, expensive, and error-prone.

Where can you use ER in your organization? Do you have manually entered or inaccurate data? Have you struggled to match it with real-world entities? You can experiment with this architecture to continue to improve the accuracy of your own data.

Further reading:

Scaling Data Analytics Containers with Event-based Lambda Functions

Post Syndicated from Brian Maguire original https://aws.amazon.com/blogs/architecture/scaling-data-analytics-containers-with-event-based-lambda-functions/

The marketing industry collects and uses data from various stages of the customer journey. When they analyze this data, they establish metrics and develop actionable insights that are then used to invest in customers and generate revenue.

If you’re a data scientist or developer in the marketing industry, you likely often use containers for services like collecting and preparing data, developing machine learning models, and performing statistical analysis. Because the types and amount of marketing data collected are quickly increasing, you’ll need a solution to manage the scale, costs, and number of required data analytics integrations.

In this post, we provide a solution that can perform and scale with dynamic traffic and is cost optimized for on-demand consumption. It uses synchronous container-based data science applications that are deployed with asynchronous container-based architectures on AWS Lambda. This serverless architecture automates data analytics workflows utilizing event-based prompts.

Synchronous container applications

Data science applications are often deployed to dedicated container instances, and their requests are routed by an Amazon API Gateway or load balancer. Typically, an Amazon API Gateway routes HTTP requests as synchronous invocations to instance-based container hosts.

The target of the requests is a container-based application running a machine learning service (SciLearn). The service container is configured with the required dependency packages such as scikit-learn, pandas, NumPy, and SciPy.

Containers are commonly deployed on different targets such as on-premises, Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Compute Cloud (Amazon EC2), and AWS Elastic Beanstalk. These services run synchronously and scale through Amazon Auto Scaling groups and a time-based consumption pricing model.

Figure 1. Synchronous container applications diagram

Figure 1. Synchronous container applications diagram

Challenges with synchronous architectures

When using a synchronous architecture, you will likely encounter some challenges related to scale, performance, and cost:

  • Operation blocking. The sender does not proceed until Lambda returns, and failure processing, such as retries, must be handled by the sender.
  • No native AWS service integrations. You cannot use several native integrations with other AWS services such as Amazon Simple Storage Service (Amazon S3), Amazon EventBridge, and Amazon Simple Queue Service (Amazon SQS).
  • Increased expense. A 24/7 environment can result in increased expenses if resources are idle, not sized appropriately, and cannot automatically scale.

The New for AWS Lambda – Container Image Support blog post offers a serverless, event-based architecture to address these challenges. This approach is explained in detail in the following section.

Benefits of using Lambda Container Image Support

In our solution, we refactored the synchronous SciLearn application that was deployed on instance-based hosts as an asynchronous event-based application running on Lambda. This solution includes the following benefits:

  • Easier dependency management with Dockerfile. Dockerfile allows you to install native operating system packages and language-compatible dependencies or use enterprise-ready container images.
  • Similar tooling. The asynchronous and synchronous solutions use Amazon Elastic Container Registry (Amazon ECR) to store application artifacts. Therefore, they have the same build and deployment pipeline tools to inspect Dockerfiles. This means your team will likely spend less time and effort learning how to use a new tool.
  • Performance and cost. Lambda provides sub-second autoscaling that’s aligned with demand. This results in higher availability, lower operational overhead, and cost efficiency.
  • Integrations. AWS provides more than 200 service integrations to deploy functions as container images on Lambda, without having to develop it yourself so it can be deployed faster.
  • Larger application artifact up to 10 GB. This includes larger application dependency support, giving you more room to host your files and packages in oppose to hard limit of 250 MB of unzipped files for deployment packages.

Scaling with asynchronous events

AWS offers two ways to asynchronously scale processing independently and automatically: Elastic Beanstalk worker environments and asynchronous invocation with Lambda. Both options offer the following:

  • They put events in an SQS queue.
  • They can be designed to take items from the queue only when they have the capacity available to process a task. This prevents them from becoming overwhelmed.
  • They offload tasks from one component of your application by sending them to a queue and process them asynchronously.

These asynchronous invocations add default, tunable failure processing and retry mechanisms through “on failure” and “on success” event destinations, as described in the following section.

Integrations with multiple destinations

“On failure” and “on success” events can be logged in an SQS queue, Amazon Simple Notification Service (Amazon SNS) topic, EventBridge event bus, or another Lambda function. All four are integrated with most AWS services.

“On failure” events are sent to an SQS dead-letter queue because they cannot be delivered to their destination queues. They will be reprocessed them as needed, and any problems with message processing will be isolated.

Figure 2 shows an asynchronous Amazon API Gateway that has placed an HTTP request as a message in an SQS queue, thus decoupling the components.

The messages within the SQS queue then prompt a Lambda function. This runs the machine learning service SciLearn container in Lambda for data analysis workflows, which are integrated with another SQS dead letter queue for failures processing.

Figure 2. Example asynchronous-based container applications diagram

Figure 2. Example asynchronous-based container applications diagram

When you deploy Lambda functions as container images, they benefit from the same operational simplicity, automatic scaling, high availability, and native integrations. This makes it an appealing architecture for our data analytics use cases.

Design considerations

The following can be considered when implementing Docker Container Images with Lambda:

  • Lambda supports container images that have manifest files that follow these formats:
    • Docker image manifest V2, schema 2 (Docker version 1.10 and newer)
    • Open Container Initiative (OCI) specifications (v1.0.0 and up)
  • Storage in Lambda:
  • On create/update, Lambda will cache the image to speed up the cold start of functions during execution. Cold starts occur on an initial request to Lambda, which can lead to longer startup times. The first request will maintain an instance of the function for only a short time period. If Lambda has not been called during that period, the next invocation will create a new instance
  • Fine grain role policies are highly recommended for security purposes
  • Container images can use the Lambda Extensions API to integrate monitoring, security and other tools with the Lambda execution environment.


We were able to architect this synchronous service based on a previously deployed on instance-based hosts and design it to become asynchronous on Amazon Lambda.

By using the new support for container-based images in Lambda and converting our workload into an asynchronous event-based architecture, we were able to overcome these challenges:

  • Performance and security. With batch requests, you can scale asynchronous workloads and handle failure records using SQS Dead Letter Queues and Lambda destinations. Using Lambda to integrate with other services (such as EventBridge and SQS) and using Lambda roles simplifies maintaining a granular permission structure. When Lambda uses an Amazon SQS queue as an event source, it can scale up to 60 more instances per minute, with a maximum of 1,000 concurrent invocations.
  • Cost optimization. Compute resources are a critical component of any application architecture. Overprovisioning computing resources and operating idle resources can lead to higher costs. Because Lambda is serverless, it only incurs costs on when you invoke a function and the resources allocated for each request.

Building well-architected serverless applications: Optimizing application performance – part 2

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-optimizing-application-performance-part-2/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

PERF 1. Optimizing your serverless application’s performance

This post continues part 1 of this security question. Previously, I cover measuring and optimizing function startup time. I explain cold and warm starts and how to reuse the Lambda execution environment to improve performance. I show a number of ways to analyze and optimize the initialization startup time. I explain how only importing necessary libraries and dependencies increases application performance.

Good practice: Design your function to take advantage of concurrency via asynchronous and stream-based invocations

AWS Lambda functions can be invoked synchronously and asynchronously.

Favor asynchronous over synchronous request-response processing.

Consider using asynchronous event processing rather than synchronous request-response processing. You can use asynchronous processing to aggregate queues, streams, or events for more efficient processing time per invocation. This reduces wait times and latency from requesting apps and functions.

When you invoke a Lambda function with a synchronous invocation, you wait for the function to process the event and return a response.

Synchronous invocation

Synchronous invocation

As synchronous processing involves a request-response pattern, the client caller also needs to wait for a response from a downstream service. If the downstream service then needs to call another service, you end up chaining calls that can impact service reliability, in addition to response times. For example, this POST /order request must wait for the response to the POST /invoice request before responding to the client caller.

Example synchronous processing

Example synchronous processing

The more services you integrate, the longer the response time, and you can no longer sustain complex workflows using synchronous transactions.

Asynchronous processing allows you to decouple the request-response using events without waiting for a response from the function code. This allows you to perform background processing without requiring the client to wait for a response, improving client performance. You pass the event to an internal Lambda queue for processing and Lambda handles the rest. An external process, separate from the function, manages polling and retries. Using this asynchronous approach can also make it easier to handle unpredictable traffic with significant volumes.

Asynchronous invocation

Asynchronous invocation

For example, the client makes a POST /order request to the order service. The order service accepts the request and returns that it has been received, without waiting for the invoice service. The order service then makes an asynchronous POST /invoice request to the invoice service, which can then process independently of the order service. If the client must receive data from the invoice service, it can handle this separately via a GET /invoice request.

Example asynchronous processing

Example asynchronous processing

You can configure Lambda to send records of asynchronous invocations to another destination service. This helps you to troubleshoot your invocations. You can also send messages or events that can’t be processed correctly into a dedicated Amazon Simple Queue Service (SQS) dead-letter queue for investigation.

You can add triggers to a function to process data automatically. For more information on which processing model Lambda uses for triggers, see “Using AWS Lambda with other services”.

Asynchronous workflows handle a variety of use cases including data Ingestion, ETL operations, and order/request fulfillment. In these use-cases, data is processed as it arrives and is retrieved as it changes. For example asynchronous patterns, see “Serverless Data Processing” and “Serverless Event Submission with Status Updates”.

For more information on Lambda synchronous and asynchronous invocations, see the AWS re:Invent presentation “Optimizing your serverless applications”.

Tune batch size, batch window, and compress payloads for high throughput

When using Lambda to process records using Amazon Kinesis Data Streams or SQS, there are a number of tuning parameters to consider for performance.

You can configure a batch window to buffer messages or records for up to 5 minutes. You can set a limit of the maximum number of records Lambda can process by setting a batch size. Your Lambda function is invoked whichever comes first.

For high volume SQS standard queue throughput, Lambda can process up to 1000 concurrent batches of records per second. For more information, see “Using AWS Lambda with Amazon SQS”.

For high volume Kinesis Data Streams throughput, there are a number of options. Configure the ParallelizationFactor setting to process one shard of a Kinesis Data Stream with more than one Lambda invocation simultaneously. Lambda can process up to 10 batches in each shard. For more information, see “New AWS Lambda scaling controls for Kinesis and DynamoDB event sources.” You can also add more shards to your data stream to increase the speed at which your function can process records. This increases the function concurrency at the expense of ordering per shard. For more details on using Kinesis and Lambda, see “Monitoring and troubleshooting serverless data analytics applications”.

Kinesis enhanced fan-out can maximize throughput by dedicating a 2 MB/second input/output channel per second per consumer instead of 2 MB per shard. For more information, see “Increasing stream processing performance with Enhanced Fan-Out and Lambda”.

Kinesis stream producers can also compress records. This is at the expense of additional CPU cycles for decompressing the records in your Lambda function code.

Required practice: Measure, evaluate, and select optimal capacity units

Capacity units are a unit of consumption for a service. They can include function memory size, number of stream shards, number of database reads/writes, request units, or type of API endpoint. Measure, evaluate and select capacity units to enable optimal configuration of performance, throughput, and cost.

Identify and implement optimal capacity units.

For Lambda functions, memory is the capacity unit for controlling the performance of a function. You can configure the amount of memory allocated to a Lambda function, between 128 MB and 10,240 MB. The amount of memory also determines the amount of virtual CPU available to a function. Adding more memory proportionally increases the amount of CPU, increasing the overall computational power available. If a function is CPU-, network- or memory-bound, then changing the memory setting can dramatically improve its performance.

Choosing the memory allocated to Lambda functions is an optimization process that balances performance (duration) and cost. You can manually run tests on functions by selecting different memory allocations and measuring the time taken to complete. Alternatively, use the AWS Lambda Power Tuning tool to automate the process.

The tool allows you to systematically test different memory size configurations and depending on your performance strategy – cost, performance, balanced – it identifies what is the most optimum memory size to use. For more information, see “Operating Lambda: Performance optimization – Part 2”.

AWS Lambda Power Tuning report

AWS Lambda Power Tuning report

Amazon DynamoDB manages table processing throughput using read and write capacity units. There are two different capacity modes, on-demand and provisioned.

On-demand capacity mode supports up to 40K read/write request units per second. This is recommended for unpredictable application traffic and new tables with unknown workloads. For higher and predictable throughputs, provisioned capacity mode along with DynamoDB auto scaling is recommended. For more information, see “Read/Write Capacity Mode”.

For high throughput Amazon Kinesis Data Streams with multiple consumers, consider using enhanced fan-out for dedicated 2 MB/second throughput per consumer. When possible, use Kinesis Producer Library and Kinesis Client Library for effective record aggregation and de-aggregation.

Amazon API Gateway supports multiple endpoint types. Edge-optimized APIs provide a fully managed Amazon CloudFront distribution. These are better for geographically distributed clients. API requests are routed to the nearest CloudFront Point of Presence (POP), which typically improves connection time.

Edge-optimized API Gateway deployment

Edge-optimized API Gateway deployment

Regional API endpoints are intended when clients are in the same Region. This helps you to reduce request latency and allows you to add your own content delivery network if necessary.

Regional endpoint API Gateway deployment

Regional endpoint API Gateway deployment

Private API endpoints are API endpoints that can only be accessed from your Amazon Virtual Private Cloud (VPC) using an interface VPC endpoint. For more information, see “Creating a private API in Amazon API Gateway”.

For more information on endpoint types, see “Choose an endpoint type to set up for an API Gateway API”. For more general information on API Gateway, see the AWS re:Invent presentation “I didn’t know Amazon API Gateway could do that”.

AWS Step Functions has two workflow types, standard and express. Standard Workflows have exactly once workflow execution and can run for up to one year. Express Workflows have at-least-once workflow execution and can run for up to five minutes. Consider the per-second rates you require for both execution start rate and the state transition rate. For more information, see “Standard vs. Express Workflows”.

Performance load testing is recommended at both sustained and burst rates to evaluate the effect of tuning capacity units. Use Amazon CloudWatch service dashboards to analyze key performance metrics including load testing results. I cover performance testing in more detail in “Regulating inbound request rates – part 1”.

For general serverless optimization information, see the AWS re:Invent presentation “Serverless at scale: Design patterns and optimizations”.


Evaluate and optimize your serverless application’s performance based on access patterns, scaling mechanisms, and native integrations. You can improve your overall experience and make more efficient use of the platform in terms of both value and resources.

This post continues from part 1 and looks at designing your function to take advantage of concurrency via asynchronous and stream-based invocations. I cover measuring, evaluating, and selecting optimal capacity units.

This well-architected question will continue in part 3 where I look at integrating with managed services directly over functions when possible. I cover optimizing access patterns and applying caching where applicable.

For more serverless learning resources, visit Serverless Land.

Convert and Watermark Documents Automatically with Amazon S3 Object Lambda

Post Syndicated from Joseph Simon original https://aws.amazon.com/blogs/architecture/convert-and-watermark-documents-automatically-with-amazon-s3-object-lambda/

When you provide access to a sensitive document to someone outside of your organization, you likely need to ensure that the document is read-only. In this case, your document should be associated with a specific user in case it is shared.

For example, authors often embed user-specific watermarks into their ebooks. This way, if their ebook gets posted to a file-sharing site, they can prevent the purchaser from downloading copies of the ebook in the future.

In this blog post, we provide you a cost-efficient, scalable, and secure solution to efficiently generate user-specific versions of sensitive documents. This solution helps users track who their documents are shared with. This helps prevent fraud and ensure that private information isn’t leaked. Our solution uses a RESTful API, which uses Amazon S3 Object Lambda to convert documents to PDF and apply a watermark based on the requesting user. It also provides a method for authentication and tracks access to the original document.

Architectural overview

S3 Object Lambda processes and transforms data that is requested from Amazon Simple Storage Service (Amazon S3) before it’s sent back to a client. The AWS Lambda function is invoked inline via a standard S3 GET request. It can return different results from the same document based on parameters, such as who is requesting the document. Figure 1 provides a high-level view of the different components that make up the solution.

Document processing architectural diagram

Figure 1. Document processing architectural diagram

Authenticating users with Amazon Cognito

This architecture defines a RESTful API, but users will likely be using a mobile or web application that calls the API. Thus, the application will first need to authenticate users. We do this via Amazon Cognito, which functions as its own identity provider (IdP). You could also use an external IdP, including those that support OpenID Connect and SAML.

Validating the JSON Web Token with API Gateway

Once the user is successfully authenticated with Amazon Cognito, the application will be sent a JSON Web Token (JWT). This JWT contains information about the user and will be used in subsequent requests to the API.

Now that the application has a token, it will make a request to the API, which is provided by Amazon API Gateway. API Gateway provides a secure, scalable entryway into your application. The API Gateway validates the JWT sent from the client with Amazon Cognito to make sure it is valid. If it is validated, the request is accepted and sent on to the Lambda API Handler. If it’s not, the client gets rejected and sent an error code.

Storing user data with DynamoDB

When the Lambda API Handler receives the request, it parses the JWT to extract the user making the request. It then logs that user, file, and access time into Amazon DynamoDB. Optionally, you may use DynamoDB to store an encoded string that will be used as the watermark, rather than something in plaintext, like user name or email.

Generating the PDF and user-specific watermark

At this point, the Lambda API Handler sends an S3 GET request. However, instead of going to Amazon S3 directly, it goes to a different endpoint that invokes the S3 Object Lambda function. This endpoint is called an S3 Object Lambda Access Point. The S3 GET request contains the original file name and the string that will be used for the watermark.

The S3 Object Lambda function transforms the original file that it downloads from its source S3 bucket. It uses the open-source office suite LibreOffice (and specifically this Lambda layer) to convert the source document to PDF. Once it is converted, a JavaScript library (PDF-Lib) embeds the watermark into the PDF before it’s sent back to the Lambda API Handler function.

The Lambda API Handler stores the converted file in a temporary S3 bucket, generates a presigned URL, and sends that URL back to the client as a 302 redirect. Then the client sends a request to that presigned URL to get the converted file.

To keep the temporary S3 bucket tidy, we use an S3 lifecycle configuration with an expiration policy.

Figure 2. Process workflow for document transformation

Figure 2. Process workflow for document transformation

Alternate approach

Before S3 Object Lambda was available, [email protected] was used. However, there are three main issues with using [email protected] instead of S3 Object Lambda:

  1. It is designed to run code closer to the end user to decrease latency, but in this case, latency is not a major concern.
  2. It requires using an Amazon CloudFront distribution, and the single-download pattern described here will not take advantage of [email protected]’s caching.
  3. It has quotas on memory that don’t lend themselves to complex libraries like OfficeLibre.

Extending this solution

This blog post describes the basic building blocks for the solution, but it can be extended relatively easily. For example, you could add another function to the API that would convert, resize, and watermark images. To do this, create an S3 Object Lambda function to perform those tasks. Then, add an S3 Object Lambda Access Point to invoke it based on a different API call.

API Gateway has many built-in security features, but you may want to enhance the security of your RESTful API. To do this, add enhanced security rules via AWS WAF. Integrating your IdP into Amazon Cognito can give you a single place to manage your users.

Monitoring any solution is critical, and understanding how an application is behaving end to end can greatly benefit optimization and troubleshooting. Adding AWS X-Ray and Amazon CloudWatch Lambda Insights will show you how functions and their interactions are performing.

Should you decide to extend this architecture, follow the architectural principles defined in AWS Well-Architected, and pay particular attention to the Serverless Application Lens.

Example expanded document processing architecture

Figure 3. Example expanded document processing architecture


You can implement this solution in a number of ways. However, by using S3 Object Lambda, you can transform documents without needing intermediary storage. S3 Object Lambda will also decouple your file logic from the rest of the application.

The Serverless on AWS components mentioned in this post allow you to reduce administrative overhead, saving you time and money.

Finally, the extensible nature of this architecture allows you to add functionality easily as your organization’s needs grow and change.

The following links provide more information on how to use S3 Object Lambda in your architectures:

Configuring CORS on Amazon API Gateway APIs

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/configuring-cors-on-amazon-api-gateway-apis/

Configuring cross-origin resource sharing (CORS) settings for a backend server is a typical challenge that developers face when building web applications. CORS is a layer of security enforced by modern browsers and is required when the client domain does not match the server domain. The complexity of CORS often leads developers to abandon it entirely by allowing all-access with the proverbial “*” permissions setting. However, CORS is an essential part of your application’s security posture and should be correctly configured.

This post explains how to configure CORS on Amazon API Gateway resources to enforce the least privileged access to an endpoint using the AWS Serverless Application Model (AWS SAM). I cover the notable CORS differences between REST APIs and HTTP APIs. Finally, I introduce you to the Amazon API Gateway CORS Configurator. This is a tool built by the AWS Serverless Developer Advocacy team to help you configure CORS settings properly.


CORS is a mechanism by which a server limits access through the use of headers. In requests that are not considered simple, the server relies on the browser to make a CORS preflight or OPTIONS request. A full request looks like this:

CORS request flow

CORS request flow

  1. Client application initiates a request
  2. Browser sends a preflight request
  3. Server sends a preflight response
  4. Browser sends the actual request
  5. Server sends the actual response
  6. Client receives the actual response

The preflight request verifies the requirements of the server by indicating the origin, method, and headers to come in the actual request.

OPTIONS preflight request

OPTIONS preflight request

The response from the server differs based on the backend you are using. Some servers respond with the allowed origin, methods, and headers for the endpoint.

OPTIONS preflight response

OPTIONS preflight response

Others only return CORS headers if the requested origin, method, and headers meet the requirements of the server. If the requirements are not met, then the response does not contain any CORS access control headers. The browser verifies the request’s origin, method, and headers against the data returned in the preflight response. If validation fails, the browser throws a CORS error and halts the request. If the validation is successful, the browser continues with the actual request.

Actual request

Actual request

The browser only sends the access-control-allow-origin header to verify the requesting origin during the actual request. The server then responds with the requested data.

Actual response

Actual response

This step is where many developers run into issues. Notice the endpoint of the actual request returns the access-control-allow-origin header. The browser once again verifies this before taking action.

Both the preflight and the actual response require CORS configuration, and it looks different depending on whether you select REST API or HTTP API.

Configuring API Gateway for CORS

While Amazon API Gateway offers several API endpoint types, this post focuses on REST API (v1) and HTTP API (v2). Both types create a representational state transfer (REST) endpoint that proxies an AWS Lambda function and other AWS services or third-party endpoints. Both types process preflight requests. However, there are differences in both the configuration, and the format of the integration response.


Before walking through the configuration examples, it is important to understand some terminology:

  • Resource: A unique identifier for the API path (/customer/reports/{region}). Resources can have subresources that combine to make a unique path.
  • Method: the REST methods (for example, GET, POST, PUT, PATCH) the resource supports. The method is not part of the path but is passed through the headers.
  • Endpoint: A combination of resources and methods to create a unique API URL.


A popular use of API Gateway REST APIs is to proxy one or more Lambda functions to build a serverless backend. In this pattern, API Gateway does not modify the request or response payload. Therefore, REST API manages CORS through a combination of preflight configuration and a properly formed response from the Lambda function.

Preflight requests

Configuring CORS on REST APIs is generally configured in four lines of code with AWS SAM:

  AllowMethods: "'GET, POST, OPTIONS'"
  AllowOrigin: "'http://localhost:3000'"
  AllowHeaders: "'Content-type, x-api-key'"

This code snippet creates a MOCK API resource that processes all preflight requests for that resource. This configuration is an example of the least privileged access to the server. It only allows GET, POST, and OPTIONS methods from a localhost endpoint on port 3000. Additionally, it only allows the Content-type and x-api-key CORS headers.

Notice that the preflight response only allows one origin to call this API. To enable multiple origins with REST APIs, use ‘*’ for the allow-control-allow-origin header. Alternatively, use a Lambda function integration instead of a MOCK integration to set the header dynamically based on the origin of the caller.


When configuring CORS for REST APIs that require authentication, it is important to configure the preflight endpoint without authorization required. The preflight is generated by the browser and does not include the credentials by default. To remove the authorizer from the OPTIONS method add the AddDefaultAuthorizerToCorsPreflight: false setting to the authorization configuration.

  AddDefaultAuthorizerToCorsPreflight: false


In REST APIs proxy configurations, CORS settings only apply to the OPTIONS endpoint and cover only the preflight check by the browser. The Lambda function backing the method must respond with the appropriate CORS information to handle CORS properly in the actual response. The following is an example of a proper response:

  "statusCode": 200,
  "headers": {
    "access-control-allow-origin":" http://localhost:3000",
  "body": {"message": "hello world"}

In this response, the critical parts are the statusCode returned to the user as the response status and the access-control-allow-origin header required by the browser’s CORS validation.


Like REST APIs, Amazon API Gateway HTTP APIs are commonly used to proxy Lambda functions and are configured to handle preflight requests. However, unlike REST APIs, HTTP APIs handle CORS for the actual API response as well.

Preflight requests

The following example shows how to configure CORS on HTTP APIs with AWS SAM:

    - GET
    - POST
    - http://localhost:3000
    - https://myproddomain.com
    - Content-type
    - x-api-key

This template configures HTTP APIs to manage CORS for the preflight requests and the actual requests. Note that the AllowOrigin section allows more than one domain. When the browser makes a request, HTTP APIs checks the list for the incoming origin. If it exists, HTTP APIs adds it to the access-control-allow-origin header in the response.


When configuring CORS for HTTP APIs with authorization configured, HTTP APIs automatically configures the preflight endpoint without authorization required. The only caveat to this is the use of the $default route. When configuring a $default route, all methods and resources are handled by the default route and the integration behind it. This includes the preflight OPTIONS method.

There are two options to handle preflight. First, and recommended, is to break out the routes individually. Create a route specifically for each method and resource as needed. The second is to create an OPTIONS /{proxy+} method to override the $defaut route for preflight requests.


Unlike REST APIs, by default, HTTP APIs modify the response for the actual request by adding the appropriate CORS headers based upon the CORS configuration. The following is an example of a simple response:

"hello world"

HTTP APIs then constructs the complete response with your data, status code, and any required CORS headers:

  "statusCode": 200,
  "headers": {
    "access-control-allow-origin":"[appropriate origin]",
  "body": "hello world"

To set the status code manually, configure your response as follows:

  "statusCode": 201,
  "body": "hello world"

To manage the complete response like in REST APIs, set the payload format to version one. The payload format for HTTP API changes the structure of the payload sent to the Lambda function and the expected response from the Lambda function. By default, HTTP API uses version two, which includes the dynamic CORS settings. For more information, read how the payload version affects the response format in the documentation.

The Amazon API Gateway CORS Configurator

The AWS serverless developer advocacy team built the Amazon API Gateway CORS Configurator to help you configure CORS for your serverless applications.

Amazon API Gateway CORS Configurator

Amazon API Gateway CORS Configurator

Start by entering the information on the left. The CORS Configurator builds the proper snippets to add the CORS settings to your AWS SAM template as you add more information. The utility demonstrates adding the configuration to all APIs in the template by using the Globals section. You can also add to an API’s specific resource to affect only that API.

Additionally, the CORS Configurator constructs an example response based on the API type you are using.

This utility is currently in preview, and we welcome your feedback on how we can improve it. Feel free to open an issue on GitHub at https://github.com/aws-samples/amazon-api-gateway-cors-configurator.


CORS can be challenging. For API Gateway, CORS configuration is the number one question developers ask. In this post, I give an overview of CORS with a link to an in-depth explanation. I then show how to configure API Gateway to create the least privileged access to your server using CORS. I also discuss the differences in how REST APIs and HTTP APIs handle CORS. Finally, I introduced you to the API Gateway CORS Configurator to help you configure CORS using AWS SAM.

I hope to provide you with enough information that you can avoid opening up your servers with the “*” setting for CORS. Take the time to understand your application and limit requests to only methods you support and from only originating hosts you intended.

For more serverless content, go to Serverless Land.

Understanding VPC links in Amazon API Gateway private integrations

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/understanding-vpc-links-in-amazon-api-gateway-private-integrations/

This post is written by Jose Eduardo Montilla Lugo, Security Consultant, AWS.

A VPC link is a resource in Amazon API Gateway that allows for connecting API routes to private resources inside a VPC. A VPC link acts like any other integration endpoint for an API and is an abstraction layer on top of other networking resources. This helps simplify configuring private integrations.

This post looks at the underlying technologies that make VPC links possible. I further describe what happens under the hood when a VPC link is created for both REST APIs and HTTP APIs. Understanding these details can help you better assess the features and benefits provided by each type. This also helps you make better architectural decisions when designing API Gateway APIs.

This article assumes you have experience in creating APIs in API Gateway. The main purpose is to provide a deeper explanation of the technologies that make private integrations possible. For more information on creating API Gateway APIs with private integrations, refer to the Amazon API Gateway documentation.


AWS Hyperplane and AWS PrivateLink

There are two types of VPC links: VPC links for REST APIs and VPC links for HTTP APIs. Both provide access to resources inside a VPC. They are built on top of an internal AWS service called AWS Hyperplane. This is an internal network virtualization platform, which supports inter-VPC connectivity and routing between VPCs. Internally, Hyperplane supports multiple network constructs that AWS services use to connect with the resources in customers’ VPCs. One of those constructs is AWS PrivateLink, which is used by API Gateway to support private APIs and private integrations.

AWS PrivateLink allows access to AWS services and services hosted by other AWS customers, while maintaining network traffic within the AWS network. Since the service is exposed via a private IP address, all communication is virtually local and private. This reduces the exposure of data to the public internet.

In AWS PrivateLink, a VPC endpoint service is a networking resource in the service provider side that enables other AWS accounts to access the exposed service from their own VPCs. VPC endpoint services allow for sharing a specific service located inside the provider’s VPC by extending a virtual connection via an elastic network interface in the consumer’s VPC.

An interface VPC endpoint is a networking resource in the service consumer side, which represents a collection of one or more elastic network interfaces. This is the entry point that allows for connecting to services powered by AWS PrivateLink.

Comparing private APIs and private integrations

Private APIs are different to private integrations. Both use AWS PrivateLink but they are used in different ways.

A private API means that the API endpoint is reachable only through the VPC. Private APIs are accessible only from clients within the VPC or from clients that have network connectivity to the VPC. For example, from on-premises clients via AWS Direct Connect. To enable private APIs, an AWS PrivateLink connection is established between the customer’s VPC and API Gateway’s VPC.

Clients connect to private APIs via an interface VPC endpoint, which routes requests privately to the API Gateway service. The traffic is initiated from the customer’s VPC and flows through the AWS PrivateLink to the API Gateway’s AWS account:

Consumer connected to provider through VPC Link

Consumer connected to provider through VPC Link

When the VPC endpoint for API Gateway is enabled, all requests to API Gateway APIs made from inside the VPC go through the VPC endpoint. This is true for private APIs and public APIs. Public APIs are still accessible from the internet and private APIs are accessible only from the interface VPC endpoint. Currently, you can only configure REST APIs as private.

A private integration means that the backend endpoint resides within a VPC and it’s not publicly accessible. With a private integration, API Gateway service can access the backend endpoint in the VPC without exposing the resources to the public internet.

A private integration uses a VPC link to encapsulate connections between API Gateway and targeted VPC resources. VPC links allow access to HTTP/HTTPS resources within a VPC without having to deal with advanced network configurations. Both REST APIs and HTTP APIs offer private integrations but only VPC links for REST APIs use AWS PrivateLink internally.

VPC links for REST APIs

When you create a VPC link for a REST API, a VPC endpoint service is also created, making the AWS account a service provider. The service consumer in this case is API Gateway’s account. The API Gateway service creates an interface VPC endpoint in their account for the Region where the VPC link is being created. This establishes an AWS PrivateLink from the API Gateway VPC to your VPC. The target of the VPC endpoint service and the VPC link is a Network Load Balancer, which forwards requests to the target endpoints:

VPC Link for REST APIs

VPC Link for REST APIs

Before establishing any AWS PrivateLink connection, the service provider must approve the connection request. Requests from the API Gateway accounts are automatically approved in the VPC link creation process. This is because the AWS accounts that serve API Gateway for each Region are allow-listed in the VPC endpoint service.

When a Network Load Balancer is associated with an endpoint service, the traffic to the targets is sourced from the NLB. The targets receive the private IP addresses of the NLB, not the IP addresses of the service consumers.

This is helpful when configuring the security groups of the instances behind the NLB for two reasons. First, you do not know the IP address range of the VPC that’s connecting to the service. Second, NLB’s elastic network interfaces do not have any security groups attached. This means that they cannot be used as a source in the security groups of the targets. To learn more, read how to find the internal IP addresses assigned to an NLB.

To create a private API with a private integration, two AWS PrivateLink connections are established. The first is from a customer VPC to API Gateway’s VPC so that clients in the VPC can reach the API Gateway service endpoint. The other is from API Gateway’s VPC to the customer VPC so that API Gateway can reach the backend endpoint. Here is an example architecture:

Private API with private integrations

Private API with private integrations

VPC links for HTTP APIs

HTTP APIs are the latest type of API Gateway APIs that are cheaper and faster than REST APIs. VPC links for HTTP APIs do not require the creation of VPC endpoint services so a Network Load Balancer is not necessary. With VPC Links for HTTP APIs, you can now use an ALB or an AWS Cloud Map service to target private resources. This allows for more flexibility and scalability in the configuration required on both sides.

Configuring multiple integration targets is also easier with VPC links for HTTP APIs. For example, VPC links for REST APIs can be associated only with a single NLB. Configuring multiple backend endpoints requires some workarounds such as using multiple listeners on the NLB, associated with different target groups.

In contrast, a single VPC link for HTTP APIs can be associated with multiple backend endpoints without additional configuration. Also, with the new VPC link, customers with containerized applications can use ALBs instead of NLBs and take advantage of layer-7 load-balancing capabilities and other features such as authentication and authorization.

AWS Hyperplane supports multiple types of network virtualization constructs, including AWS PrivateLink. VPC links for REST APIs rely on AWS PrivateLink. However, VPC links for HTTP APIs use VPC-to-VPC NAT, which provides a higher level of abstraction.

The new construct is conceptually similar to a tunnel between both VPCs. These are created via elastic network interface attachments on the provider and consumer ends, which are both managed by AWS Hyperplane. This tunnel allows a service hosted in the provider’s VPC (API Gateway) to initiate communications to resources in a consumer’s VPC. API Gateway has direct connectivity to these elastic network interfaces and can reach the resources in the VPC directly from their own VPC. Connections are permitted according to the configuration of the security groups attached to the elastic network interfaces in the customer side.

Although it seems to provide the same functionality as AWS PrivateLink, these constructs differ in implementation details. A service endpoint in AWS PrivateLink allows for multiple connections to a single endpoint (the NLB), whereas the new approach allows a source VPC to connect to multiple destination endpoints. As a result, a single VPC link can integrate with multiple Application Load Balancers, Network Load Balancers, or resources registered with an AWS Cloud Map service on the customer side:

VPC Link for HTTP APIs

VPC Link for HTTP APIs

This approach is similar to the way that other services such as Lambda access resources inside customer VPCs.


This post explores how VPC links can set up API Gateway APIs with private integrations. VPC links for REST APIs encapsulate AWS PrivateLink resources such as interface VPC endpoints and VPC endpoint services to configure connections from API Gateway’s VPC to customer’s VPC to access private backend endpoints.

VPC links for HTTP APIs use a different construct in the AWS Hyperplane service to provide API Gateway with direct network access to VPC private resources. Understanding the differences between the two is important when adding private integrations as part of your API architecture design.

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Building in resiliency – part 2

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-building-in-resiliency-part-2/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Reliability question REL2: How do you build resiliency into your serverless application?

This post continues part 1 of this reliability question. Previously, I cover managing failures using retries, exponential backoff, and jitter. I explain how DLQs can isolate failed messages. I show how to use state machines to orchestrate long running transactions rather than handling these in application code.

Required practice: Manage duplicate and unwanted events

Duplicate events can occur when a request is retried or multiple consumers process the same message from a queue or stream. A duplicate can also happen when a request is sent twice at different time intervals with the same parameters. Design your applications to process multiple identical requests to have the same effect as making a single request.

Idempotency refers to the capacity of an application or component to identify repeated events and prevent duplicated, inconsistent, or lost data. This means that receiving the same event multiple times does not change the result beyond the first time the event was received. An idempotent application can, for example, handle multiple identical refund operations. The first refund operation is processed. Any further refund requests to the same customer with the same payment reference should not be processes again.

When using AWS Lambda, you can make your function idempotent. The function’s code must properly validate input events and identify if the events were processed before. For more information, see “How do I make my Lambda function idempotent?

When processing streaming data, your application must anticipate and appropriately handle processing individual records multiple times. There are two primary reasons why records may be delivered more than once to your Amazon Kinesis Data Streams application: producer retries and consumer retries. For more information, see “Handling Duplicate Records”.

Generate unique attributes to manage duplicate events at the beginning of the transaction

Create, or use an existing unique identifier at the beginning of a transaction to ensure idempotency. These identifiers are also known as idempotency tokens. A number of Lambda triggers include a unique identifier as part of the event:

You can also create your own identifiers. These can be business-specific, such as transaction ID, payment ID, or booking ID. You can use an opaque random alphanumeric string, unique correlation identifiers, or the hash of the content.

A Lambda function, for example can use these identifiers to check whether the event has been previously processed.

Depending on the final destination, duplicate events might write to the same record with the same content instead of generating a duplicate entry. This may therefore not require additional safeguards.

Use an external system to store unique transaction attributes and verify for duplicates

Lambda functions can use Amazon DynamoDB to store and track transactions and idempotency tokens to determine if the transaction has been handled previously. DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. This helps to limit the storage space used. Base the TTL on the event source. For example, the message retention period for SQS.

Using DynamoDB to store idempotent tokens

Using DynamoDB to store idempotent tokens

You can also use DynamoDB conditional writes to ensure a write operation only succeeds if an item attribute meets one of more expected conditions. For example, you can use this to fail a refund operation if a payment reference has already been refunded. This signals to the application that it is a duplicate transaction. The application can then catch this exception and return the same result to the customer as if the refund was processed successfully.

Third-party APIs can also support idempotency directly. For example, Stripe allows you to add an Idempotency-Key: <key> header to the request. Stripe saves the resulting status code and body of the first request made for any given idempotency key, regardless of whether it succeeded or failed. Subsequent requests with the same key return the same result.

Validate events using a pre-defined and agreed upon schema

Implicitly trusting data from clients, external sources, or machines could lead to malformed data being processed. Use a schema to validate your event conforms to what you are expecting. Process the event using the schema within your application code or at the event source when applicable. Events not adhering to your schema should be discarded.

For API Gateway, I cover validating incoming HTTP requests against a schema in “Implementing application workload security – part 1”.

Amazon EventBridge rules match event patterns. EventBridge provides schemas for all events that are generated by AWS services. You can create or upload custom schemas or infer schemas directly from events on an event bus. You can also generate code bindings for event schemas.

SNS supports message filtering. This allows a subscriber to receive a subset of the messages sent to the topic using a filter policy. For more information, see the documentation.

JSON Schema is a tool for validating the structure of JSON documents. There are a number of implementations available.

Best practice: Consider scaling patterns at burst rates

Load testing your serverless application allows you to monitor the performance of an application before it is deployed to production. Serverless applications can be simpler to load test, thanks to the automatic scaling built into many of the services. For more information, see “How to design Serverless Applications for massive scale”.

In addition to your baseline performance, consider evaluating how your workload handles initial burst rates. This ensures that your workload can sustain burst rates while scaling to meet possibly unexpected demand.

Perform load tests using a burst strategy with random intervals of idleness

Perform load tests using a burst of requests for a short period of time. Also introduce burst delays to allow your components to recover from unexpected load. This allows you to future-proof the workload for key events when you do not know peak traffic levels.

There are a number of AWS Marketplace and AWS Partner Network (APN) solutions available for performance testing, including Gatling FrontLine, BlazeMeter, and Apica.

In regulating inbound request rates – part 1, I cover running a performance test suite using Gatling, an open source tool.

Gatling performance results

Gatling performance results

Amazon does have a network stress testing policy that defines which high volume network tests are allowed. Tests that purposefully attempt to overwhelm the target and/or infrastructure are considered distributed denial of service (DDoS) tests and are prohibited. For more information, see “Amazon EC2 Testing Policy”.

Review service account limits with combined utilization across resources

AWS accounts have default quotas, also referred to as limits, for each AWS service. These are generally Region-specific. You can request increases for some limits while other limits cannot be increased. Service Quotas is an AWS service that helps you manage your limits for many AWS services. Along with looking up the values, you can also request a limit increase from the Service Quotas console.

Service Quotas dashboard

Service Quotas dashboard

As these limits are shared within an account, review the combined utilization across resources including the following:

  • Amazon API Gateway: number of requests per second across all APIs. (link)
  • AWS AppSync: throttle rate limits. (link)
  • AWS Lambda: function concurrency reservations and pool capacity to allow other functions to scale. (link)
  • Amazon CloudFront: requests per second per distribution. (link)
  • AWS IoT Core message broker: concurrent requests per second. (link)
  • Amazon EventBridge: API requests and target invocations limit. (link)
  • Amazon Cognito: API limits. (link)
  • Amazon DynamoDB: throughput, indexes, and request rates limits. (link)

Evaluate key metrics to understand how workloads recover from bursts

There are a number of key Amazon CloudWatch metrics to evaluate and alert on to understand whether your workload recovers from bursts.

  • AWS Lambda: Duration, Errors, Throttling, ConcurrentExecutions, UnreservedConcurrentExecutions. (link)
  • Amazon API Gateway: Latency, IntegrationLatency, 5xxError, 4xxError. (link)
  • Application Load Balancer: HTTPCode_ELB_5XX_Count, RejectedConnectionCount, HTTPCode_Target_5XX_Count, UnHealthyHostCount, LambdaInternalError, LambdaUserError. (link)
  • AWS AppSync: 5XX, Latency. (link)
  • Amazon SQS: ApproximateAgeOfOldestMessage. (link)
  • Amazon Kinesis Data Streams: ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded, GetRecords.IteratorAgeMilliseconds, PutRecord.Success, PutRecords.Success (if using Kinesis Producer Library), GetRecords.Success. (link)
  • Amazon SNS: NumberOfNotificationsFailed, NumberOfNotificationsFilteredOut-InvalidAttributes. (link)
  • Amazon Simple Email Service (SES): Rejects, Bounces, Complaints, Rendering Failures. (link)
  • AWS Step Functions: ExecutionThrottled, ExecutionsFailed, ExecutionsTimedOut. (link)
  • Amazon EventBridge: FailedInvocations, ThrottledRules. (link)
  • Amazon S3: 5xxErrors, TotalRequestLatency. (link)
  • Amazon DynamoDB: ReadThrottleEvents, WriteThrottleEvents, SystemErrors, ThrottledRequests, UserErrors. (link)


This post continues from part 1 and looks at managing duplicate and unwanted events with idempotency and an event schema. I cover how to consider scaling patterns at burst rates by managing account limits and show relevant metrics to evaluate

Build resiliency into your workloads. Ensure that applications can withstand partial and intermittent failures across components that may only surface in production. In the next post in the series, I cover the performance efficiency pillar from the Well-Architected Serverless Lens.

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Regulating inbound request rates – part 2

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-regulating-inbound-request-rates-part-2/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Reliability question REL1: How do you regulate inbound request rates?

This post continues part 1 of this security question. Previously, I cover controlling inbound request rates using throttling. I go through how to use throttling to control steady-rate and burst rate requests. I show some solutions for performance testing to identify the request rates that your workload can sustain before impacting performance.

Good practice: Use, analyze, and enforce API quotas

API quotas limit the maximum number of requests a given API key can submit within a specified time interval. Metering API consumers provides a better understanding of how different consumers use your workload at sustained and burst rates at any point in time. With this information, you can determine fine-grained rate limiting for multiple quota limits. These can be done according to a group of consumer needs, and can adjust their limits on a regular basis.

Segregate API consumers steady-rate requests and their quota into multiple buckets or tiers

Amazon API Gateway usage plans allow your API consumer to access selected APIs at agreed-upon request rates and quotas. These help your consumers meet their business requirements and budget constraints. Create and attach API keys to usage plans to control access to certain API stages. I show how to create usage plans and how to associate them with API keys in “Building well-architected serverless applications: Controlling serverless API access – part 2”.

API key associated with usage plan

API key associated with usage plan

You can extract utilization data from usage plans to analyze API usage on a per-API key basis. In the example, I show how to use usage plans to see how many requests are made.

View API key usage

View API key usage

This allows you to generate billing documents and determine whether your customers need higher or lower limits. Have a mechanism to allow customers to request higher limits preemptively. When customers anticipate greater API usage, they can take action proactively.

API Gateway Lambda authorizers can dynamically associate API keys to a given request. This can be used where you do not control API consumers, or want to associate API keys based on your own criteria. For more information, see the documentation.

You can also visualize usage plans with Amazon QuickSight using enriched API Gateway access logs.

Visualize usage plans with Amazon QuickSight

Visualize usage plans with Amazon QuickSight

Define whether your API consumers are end users or machines

Understanding your API consumers helps you manage how they connect to your API. This helps you define a request access pattern strategy, which can distinguish between end users or machines.

Machine consumers make automated connections to your API, which may require a different access pattern to end users. You may decide to prioritize end user consumers to provide a better experience. Machine consumers may be able to handle request throttling automatically.

Best practice: Use mechanisms to protect non-scalable resources

Limit component throughput by enforcing how many transactions it can accept

AWS Lambda functions can scale faster than traditional resources, such as relational databases and cache systems. Protect your non-scalable resources by ensuring that components that scale quickly do not exceed the throughput of downstream systems. This can prevent system performance degrading. There are a number of ways to achieve this, either directly or via buffer mechanisms such as queues and streams.

For relational databases such as Amazon RDS, you can limit the number of connections per user, in addition to the global maximum number of connections. With Amazon RDS Proxy, your applications can pool and share database connections to improve their ability to scale.

Amazon RDS Proxy

Amazon RDS Proxy

For additional options for using RDS with Lambda, see the AWS Serverless Hero blog post “How To: Manage RDS Connections from AWS Lambda Serverless Functions”.

Cache results and only connect to, and fetch data from databases when needed. This reduces the load on the downstream database. Adjust the maximum number of connections for caching systems. Include a caching expiration mechanism to prevent serving stale records. For more information on caching implementation patterns and considerations, see “Caching Best Practices”.

Lambda provides managed scaling. When a function is first invoked, the Lambda service creates an instance of the function to process the event. This is called a cold start. After completion, the function remains available for a period of time to process subsequent events. These are called warm starts. If other events arrive while the function is busy, Lambda creates more instances of the function to handle these requests concurrently as cold starts. The following example shows 10 events processed in six concurrent requests.

Lambda concurrency

Lambda concurrency

You can control the number of concurrent function invocations to both reserve and limit the maximum concurrency your function can achieve. You can configure reserved concurrency to set the maximum number of concurrent instances for the function. This can protect downstream resources such as a database by ensuring Lambda can only scale up to the number of connections the database can support.

For example, you may have a traditional database or external API that can only support a maximum of 50 concurrent connections. You can set the maximum number of concurrent Lambda functions using the function concurrency settings. Setting the value to 50 ensures that the traditional database or external API is not overwhelmed.

Edit Lambda concurrency

Edit Lambda concurrency

You can also set the Lambda function concurrency to 0, which disables the Lambda function in the event of anomalies.

Another solution to protect downstream resources is to use an intermediate buffer. A buffer can persistently store messages in a stream or queue until a receiver processes them. This helps you control how fast messages are processed, which can protect the load on downstream resources.

Amazon Kinesis Data Streams allows you to collect and process large streams of data records in real time, and can act as a buffer. Streams consist of a set of shards that contain a sequence of data records. When using Lambda to process records, it processes one batch of records at a time from each shard.

Kinesis Data Streams control concurrency at the shard level, meaning that a single shard has a single concurrent invocation. This can reduce downstream calls to non-scalable resources such as a traditional database. Kinesis Data Streams also support batch windows up to 5 minutes and batch record sizes. These can also be used to control how frequent invocations can occur.

To learn how to manage scaling with Kinesis, see the documentation. To learn more how Lambda works with Kinesis, read the blog series “Building serverless applications with streaming data”.

Lambda and Kinesis shards

Lambda and Kinesis shards

Amazon Simple Queue Service (SQS) is a fully managed serverless message queuing service that enables you to decouple and scale microservices. You can offload tasks from one component of your application by sending them to a queue and processing them asynchronously.

SQS can act as a buffer, using a Lambda function to process the messages. Lambda polls the queue and invokes your Lambda function synchronously with an event that contains queue messages. Lambda reads messages in batches and invokes your function once for each batch. When your function successfully processes a batch, Lambda deletes its messages from the queue.

You can protect downstream resources using the Lambda concurrency controls. This limits the number of concurrent Lambda functions that pull messages off the queue. The messages persist in the queue until Lambda can process them. For more information see, “Using AWS Lambda with Amazon SQS

Lambda and SQS

Lambda and SQS


Regulating inbound requests helps you adapt different scaling mechanisms based on customer demand. You can achieve better throughput for your workloads and make them more reliable by controlling requests to a rate that your workload can support.

In this post, I cover using, analyzing, and enforcing API quotas using usage plans and API keys. I show mechanisms to protect non-scalable resources such as using RDS Proxy to protect downstream databases. I show how to control the number of Lambda invocations using concurrency controls to protect downstream resources. I explain how you can use streams and queues as an intermediate buffer to store messages persistently until a receiver processes them.

In the next post in the series, I cover the second reliability question from the Well-Architected Serverless Lens, building resiliency into serverless applications.

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Regulating inbound request rates – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-regulating-inbound-request-rates-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Reliability question REL1: How do you regulate inbound request rates?

Defining, analyzing, and enforcing inbound request rates helps achieve better throughput. Regulation helps you adapt different scaling mechanisms based on customer demand. By regulating inbound request rates, you can achieve better throughput, and adapt client request submissions to a request rate that your workload can support.

Required practice: Control inbound request rates using throttling

Throttle inbound request rates using steady-rate and burst rate requests

Throttling requests limits the number of requests a client can make during a certain period of time. Throttling allows you to control your API traffic. This helps your backend services maintain their performance and availability levels by limiting the number of requests to actual system throughput.

To prevent your API from being overwhelmed by too many requests, Amazon API Gateway throttles requests to your API. These limits are applied across all clients using the token bucket algorithm. API Gateway sets a limit on a steady-state rate and a burst of request submissions. The algorithm is based on an analogy of filling and emptying a bucket of tokens representing the number of available requests that can be processed.

Each API request removes a token from the bucket. The throttle rate then determines how many requests are allowed per second. The throttle burst determines how many concurrent requests are allowed. I explain the token bucket algorithm in more detail in “Building well-architected serverless applications: Controlling serverless API access – part 2

Token bucket algorithm

Token bucket algorithm

API Gateway limits the steady-state rate and burst requests per second. These are shared across all APIs per Region in an account. For further information on account-level throttling per Region, see the documentation. You can request account-level rate limit increases using the AWS Support Center. For more information, see Amazon API Gateway quotas and important notes.

You can configure your own throttling levels, within the account and Region limits to improve overall performance across all APIs in your account. This restricts the overall request submissions so that they don’t exceed the account-level throttling limits.

You can also configure per-client throttling limits. Usage plans restrict client request submissions to within specified request rates and quotas. These are applied to clients using API keys that are associated with your usage policy as a client identifier. You can add throttling levels per API route, stage, or method that are applied in a specific order.

For more information on API Gateway throttling, see the AWS re:Invent presentation “I didn’t know Amazon API Gateway could do that”.

API Gateway throttling

API Gateway throttling

You can also throttle requests by introducing a buffering layer using Amazon Kinesis Data Stream or Amazon SQS. Kinesis can limit the number of requests at the shard level while SQS can limit at the consumer level. For more information on using SQS as a buffer with Amazon Simple Notification Service (SNS), read “How To: Use SNS and SQS to Distribute and Throttle Events”.

Identify steady-rate and burst rate requests that your workload can sustain at any point in time before performance degraded

Load testing your serverless application allows you to monitor the performance of an application before it is deployed to production. Serverless applications can be simpler to load test, thanks to the automatic scaling built into many of the services. During a load test, you can identify quotas that may act as a limiting factor for the traffic you expect and take action.

Perform load testing for a sustained period of time. Gradually increase the traffic to your API to determine your steady-state rate of requests. Also use a burst strategy with no ramp up to determine the burst rates that your workload can serve without errors or performance degradation. There are a number of AWS Marketplace and AWS Partner Network (APN) solutions available for performance testing, Gatling Frontline, BlazeMeter, and Apica.

In the serverless airline example used in this series, you can run a performance test suite using Gatling, an open source tool.

To deploy the test suite, follow the instructions in the GitHub repository perf-tests directory. Uncomment the deploy.perftest line in the repository Makefile.

Perf-test makefile

Perf-test makefile

Once the file is pushed to GitHub, AWS Amplify Console rebuilds the application, and deploys an AWS CloudFormation stack. You can run the load tests locally, or use an AWS Step Functions state machine to run the setup and Gatling load test simulation.

Performance test using Step Functions

Performance test using Step Functions

The Gatling simulation script uses constantUsersPerSec and rampUsersPerSec to add users for a number of test scenarios. You can use the test to simulate load on the application. Once the tests run, it generates a downloadable report.

Gatling performance results

Gatling performance results

Artillery Community Edition is another open-source tool for testing serverless APIs. You configure the number of requests per second and overall test duration, and it uses a headless Chromium browser to run its test flows. For Artillery, the maximum number of concurrent tests is constrained by your local computing resources and network. To achieve higher throughput, you can use Serverless Artillery, which runs the Artillery package on Lambda functions. As a result, this tool can scale up to a significantly higher number of tests.

For more information on how to use Artillery, see “Load testing a web application’s serverless backend”. This runs tests against APIs in a demo application. For example, one of the tests fetches 50,000 questions per hour. This calls an API Gateway endpoint and tests whether the AWS Lambda function, which queries an Amazon DynamoDB table, can handle the load.

Artillery performance test

Artillery performance test

This is a synchronous API so the performance directly impacts the user’s experience of the application. This test shows that the median response time is 165 ms with a p95 time of 201 ms.

Performance test API results

Performance test API results

Another consideration for API load testing is whether the authentication and authorization service can handle the load. For more information on load testing Amazon Cognito and API Gateway using Step Functions, see “Using serverless to load test Amazon API Gateway with authorization”.

API load testing with authentication and authorization

API load testing with authentication and authorization


Regulating inbound requests helps you adapt different scaling mechanisms based on customer demand. You can achieve better throughput for your workloads and make them more reliable by controlling requests to a rate that your workload can support.

In this post, I cover controlling inbound request rates using throttling. I show how to use throttling to control steady-rate and burst rate requests. I show some solutions for performance testing to identify the request rates that your workload can sustain before performance degradation.

This well-architected question will be continued where I look at using, analyzing, and enforcing API quotas. I cover mechanisms to protect non-scalable resources.

For more serverless learning resources, visit Serverless Land.

Data Caching Across Microservices in a Serverless Architecture

Post Syndicated from Irfan Saleem original https://aws.amazon.com/blogs/architecture/data-caching-across-microservices-in-a-serverless-architecture/

Organizations are re-architecting their traditional monolithic applications to incorporate microservices. This helps them gain agility and scalability and accelerate time-to-market for new features.

Each microservice performs a single function. However, a microservice might need to retrieve and process data from multiple disparate sources. These can include data stores, legacy systems, or other shared services deployed on premises in data centers or in the cloud. These scenarios add latency to the microservice response time because multiple real-time calls are required to the backend systems. The latency often ranges from milliseconds to a few seconds depending on size of the data, network bandwidth, and processing logic. In certain scenarios, it makes sense to maintain a cache close to the microservices layer to improve performance by reducing or eliminating the need for the real-time backend calls.

Caches reduce latency and service-to-service communication of microservice architectures. A cache is a high-speed data storage layer that stores a subset of data. When data is requested from a cache, it is delivered faster than if you accessed the data’s primary storage location.

While working with our customers, we have observed use cases where data caching helps reduce latency in the microservices layer. Caching can be implemented in several ways. In this blog post, we discuss a couple of these use cases that customers have built. In both use cases, the microservices layer is created using Serverless on AWS offerings. It requires data from multiple data sources deployed locally in the cloud or on premises. The compute layer is built using AWS Lambda. Though Lambda functions are short-lived, the cached data can be used by subsequent instances of the same microservice to avoid backend calls.

Use case 1: On-demand cache to reduce real-time calls

In this use case, the Cache-Aside design pattern is used for lazy loading of frequently accessed data. This means that an object is only cached when it is requested by a consumer, and the respective microservice decides if the object is worth saving.

This use case is typically useful when the microservices layer makes multiple real-time calls to fetch and process data. These calls can be greatly reduced by caching frequently accessed data for a short period of time.

Let’s discuss a real-world scenario. Figure 1 shows a customer portal that provides a list of car loans, their status, and the net outstanding amount for a customer:

  • The Billing microservice gets a request. It then tries to get required objects (for example, the list of car loans, their status, and the net outstanding balance) from the cache using an object_key. If the information is available in the cache, a response is sent back to the requester using cached data.
  • If requested objects are not available in the cache (a cache miss), the Billing microservice makes multiple calls to local services, applications, and data sources to retrieve data. The result is compiled and sent back to the requester. It also resides in the cache for a short period of time.
  • Meanwhile, if a customer makes a payment using the Payment microservice, the balance amount in the cache must be invalidated/deleted. The Payment microservice processes the payment and invokes an asynchronous event (payment_processed) with the respective object key for the downstream processes that will remove respective objects from the cache.
  • The events are stored in the event store.
  • The CacheManager microservice gets the event (payment_processed) and makes a delete request to the cache for the respective object_key. If necessary, the CacheManager can also refresh cached data. It can call a resource within the Billing service or it can refresh data directly from the source system depending on the data refresh logic.
Reducing latency by caching frequently accessed data on demand

Figure 1. Reducing latency by caching frequently accessed data on demand

Figure 2 shows AWS services for use case 1. The microservices layer (Billing, Payments, and Profile) is created using Lambda. The Amazon API Gateway is exposing Lambda functions as API operations to the internal or external consumers.

Suggested AWS services for implementing use case 1

Figure 2. Suggested AWS services for implementing use case 1

All three microservices are connected with the data cache and can save and retrieve objects from the cache. The cache is maintained in-memory using Amazon ElastiCache. The data objects are kept in cache for a short period of time. Every object has an associated TTL (time to live) value assigned to it. After that time period, the object expires. The custom events (such as payment_processed) are published to Amazon EventBridge for downstream processing.

Use case 2: Proactive caching of massive volumes of data

During large modernization and migration initiatives, not all data sources are colocated for a certain period of time. Some legacy systems, such as mainframe, require a longer decommissioning period. Many legacy backend systems process data through periodic batch jobs. In such scenarios, front-end applications can use cached data for a certain period of time (ranging from a few minutes to few hours) depending on nature of data and its usage. The real-time calls to the backend systems cannot deal with the extensive call volume on the front-end application.

In such scenarios, required data/objects can be identified up front and loaded directly into the cache through an automated process as shown in Figure 3:

  • An automated process loads data/objects in the cache during the initial load. Subsequent changes to the data sources (either in a mainframe database or another system of record) are captured and applied to the cache through an automated CDC (change data capture) pipeline.
  • Unlike use case 1, the microservices layer does not make real-time calls to load data into the cache. In this use case, microservices use data already cached for their processing.
  • However, the microservices layer may create an event if data in the cache is stale or specific objects have been changed by another service (for example, by the Payment service when a payment is made).
  • The events are stored in Event Manager. Upon receiving an event, the CacheManager initiates a backend process to refresh stale data on demand.
  • All data changes are sent directly to the system of record.
Eliminating real-time calls by caching massive data volumes proactively

Figure 3. Eliminating real-time calls by caching massive data volumes proactively

As shown in Figure 4, the data objects are maintained in Amazon DynamoDB, which provides low-latency data access at any scale. The data retrieval is managed through DynamoDB Accelerator (DAX), a fully managed, highly available, in-memory cache. It delivers up to a 10 times performance improvement, even at millions of requests per second.

Suggested AWS services for implementing use case 2

Figure 4. Suggested AWS services for implementing use case 2

The data in DynamoDB can be loaded through different methods depending on the customer use case and technology landscape. API Gateway, Lambda, and EventBridge are providing similar functionality as described in use case 1.

Use case 2 is also beneficial in scenarios where front-end applications must cache data for an extended period of time, such as a customer’s shopping cart.

In addition to caching, the following best practices can also be used to reduce latency and to improve performance within the Lambda compute layer:


The microservices architecture allows you to build several caching layers depending on your use case. In this blog, we discussed data caching within the compute layer to reduce latency when data is retrieved from disparate sources. The information from use case 1 can help you reduce real-time calls to your back-end system by saving frequently used data to the cache. Use case 2 helps you maintain large volumes of data in caches for extended periods of time when real-time calls to the backend system are not possible.

Should I Run my Containers on AWS Fargate, AWS Lambda, or Both?

Post Syndicated from Rob Solomon original https://aws.amazon.com/blogs/architecture/should-i-run-my-containers-on-aws-fargate-aws-lambda-or-both/

Containers have transformed how companies build and operate software. Bundling both application code and dependencies into a single container image improves agility and reduces deployment failures. But what compute platform should you choose to be most efficient, and what factors should you consider in this decision?

With the release of container image support for AWS Lambda functions (December 2020), customers now have an additional option for building serverless applications using their existing container-oriented tooling and DevOps best practices. In addition, a single container image can be configured to run on both of these compute platforms: AWS Lambda (using serverless functions) or AWS Fargate (using containers).

Three key factors can influence the decision of what platform you use to deploy your container: startup time, task runtime, and cost. That decision may vary each time a task is initiated, as shown in the three scenarios following.

Design considerations for deploying a container

Total task duration consists of startup time and runtime. The startup time of a containerized task is the time required to provision the container compute resource and deploy the container. Task runtime is the time it takes for the application code to complete.

Startup time: Some tasks must complete quickly. For example, when a user waits for a web response, or when a series of tasks is completed in sequential order. In those situations, the total duration time must be minimal. While the application code may be optimized to run faster, startup time depends on the chosen compute platform as well. AWS Fargate container startup time typically takes from 60 to 90 seconds. AWS Lambda initial cold start can take up to 5 seconds. Following that first startup, the same containerized function has negligible startup time.

Task runtime: The amount of time it takes for a task to complete is influenced by the compute resources allocated (vCPU and memory) and application code. AWS Fargate lets you select vCPU and memory size. With AWS Lambda, you define the amount of allocated memory. Lambda then provisions a proportional quantity of vCPU. In both AWS Fargate and AWS Lambda uses, increasing the amount of compute resources may result in faster completion time. However, this will depend on the application. While the additional compute resources incur greater cost, the total duration may be shorter, so the overall cost may also be lower.

AWS Lambda has a maximum limit of 15 minutes of runtime. Lambda shouldn’t be used for these tasks to avoid the likelihood of timeout errors.

Figure 1 illustrates the proportion of startup time to total duration. The initial steepness of each line shows a rapid decrease in startup overhead. This is followed by a flattening out, showing a diminishing rate of efficiency. Startup time delay becomes less impactful as the total job duration increases. Other factors (such as cost) become more significant.

Figure 1. Ratio of startup time as a function to overall job duration for each service

Figure 1. Ratio of startup time as a function to overall job duration for each service

Cost: When making the choice between Fargate and Lambda, it is important to understand the different pricing models. This way, you can make the appropriate selection for your needs.

Figure 2 shows a cost analysis of Lambda vs Fargate. This is for the entire range of configurations for a runtime task. For most of the range of configurable memory, AWS Lambda is more expensive per second than even the most expensive configuration of Fargate.

Figure 2. Total cost for both AWS Lambda and AWS Fargate based on task duration

Figure 2. Total cost for both AWS Lambda and AWS Fargate based on task duration

From a cost perspective, AWS Fargate is more cost-effective for tasks running for several seconds or longer. If cost is the only factor at play, then Fargate would be the better choice. But the savings gained by using Fargate may be offset by the business value gained from the shorter Lambda function startup time.

Dynamically choose your compute platform

In the following scenarios, we show how a single container image can serve multiple use cases. The decision to run a given containerized application on either AWS Lambda or AWS Fargate can be determined at runtime. This decision depends on whether cost, speed, or duration are the priority.

In Figure 3, an image-processing AWS Batch job runs on a nightly schedule, processing tens of thousands of images to extract location information. When run as a batch job, image processing may take 1–2 hours. The job pulls images stored in Amazon Simple Storage Service (S3) and writes the location metadata to Amazon DynamoDB. In this case, AWS Fargate provides a good combination of compute and cost efficiency. An added benefit is that it also supports tasks that exceed 15 minutes. If a single image is submitted for real-time processing, response time is critical. In that case, the same image-processing code can be run on AWS Lambda, using the same container image. Rather than waiting for the next batch process to run, the image is processed immediately.

Figure 3. One-off invocation of a typically long-running batch job

Figure 3. One-off invocation of a typically long-running batch job

In Figure 4, a SaaS application uses an AWS Lambda function to allow customers to submit complex text search queries for files stored in an Amazon Elastic File System (EFS) volume. The task should return results quickly, which is an ideal condition for AWS Lambda. However, a small percentage of jobs run much longer than the average, exceeding the maximum duration of 15 minutes.

A straightforward approach to avoid job failure is to initiate an Amazon CloudWatch alarm when the Lambda function times out. CloudWatch alarms can automatically retry the job using Fargate. An alternate approach is to capture historical data and use it to create a machine learning model in Amazon SageMaker. When a new job is initiated, the SageMaker model can predict the time it will take the job to complete. Lambda can use that prediction to route the job to either AWS Lambda or AWS Fargate.

Figure 4. Short duration tasks with occasional outliers running longer than 15 minutes

Figure 4. Short duration tasks with occasional outliers running longer than 15 minutes

In Figure 5, a customer runs a containerized legacy application that encompasses many different kinds of functions, all related to a recurring data processing workflow. Each function performs a task of varying complexity and duration. These can range from processing data files, updating a database, or submitting machine learning jobs.

Using a container image, one code base can be configured to contain all of the individual functions. Longer running functions, such as data preparation and big data analytics, are routed to Fargate. Shorter duration functions like simple queries can be configured to run using the container image in AWS Lambda. By using AWS Step Functions as an orchestrator, the process can be automated. In this way, a monolithic application can be broken up into a set of “Units of Work” that operate independently.

Figure 5. Heterogeneous function orchestration

Figure 5. Heterogeneous function orchestration


If your job lasts milliseconds and requires a fast response to provide a good customer experience, use AWS Lambda. If your function is not time-sensitive and runs on the scale of minutes, use AWS Fargate. For tasks that have a total duration of under 15 minutes, customers must decide based on impacts to both business and cost. Select the service that is the most effective serverless compute environment to meet your requirements. The choice can be made manually when a job is scheduled or by using retry logic to switch to the other compute platform if the first option fails. The decision can also be based on a machine learning model trained on historical data.

Integrating Amazon API Gateway private endpoints with on-premises networks

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/integrating-amazon-api-gateway-private-endpoints-with-on-premises-networks/

This post was written by Ahmed ElHaw, Sr. Solutions Architect

Using AWS Direct Connect or AWS Site-to-Site VPN, customers can establish a private virtual interface from their on-premises network directly to their Amazon Virtual Private Cloud (VPC). Hybrid networking enables customers to benefit from the scalability, elasticity, and ease of use of AWS services while using their corporate network.

Amazon API Gateway can make it easier for developers to interface with and expose other services in a uniform and secure manner. You can use it to interface with other AWS services such as Amazon SageMaker endpoints for real-time machine learning predictions or serverless compute with AWS Lambda. API Gateway can also integrate with HTTP endpoints and VPC links in the backend.

This post shows how to set up a private API Gateway endpoint with a Lambda integration. It uses a Route 53 resolver, which enables on-premises clients to resolve AWS private DNS names.


API Gateway private endpoints allow you to use private API endpoints inside your VPC. When used with Route 53 resolver endpoints and hybrid connectivity, you can access APIs and their integrated backend services privately from on-premises clients.

You can deploy the example application using the AWS Serverless Application Model (AWS SAM). The deployment creates a private API Gateway endpoint with a Lambda integration and a Route 53 inbound endpoint. I explain the security configuration of the AWS resources used. This is the solution architecture:

Private API Gateway with a Hello World Lambda integration.

Private API Gateway with a Hello World Lambda integration.


  1. The client calls the private API endpoint (for example, GET https://abc123xyz0.execute-api.eu-west-1.amazonaws.com/demostage).
  2. The client asks the on-premises DNS server to resolve (abc123xyz0.execute-api.eu-west-1.amazonaws.com). You must configure the on-premises DNS server to forward DNS queries for the AWS-hosted domains to the IP addresses of the inbound resolver endpoint. Refer to the documentation for your on-premises DNS server to configure DNS forwarders.
  3. After the client successfully resolves the API Gateway private DNS name, it receives the private IP address of the VPC Endpoint of the API Gateway.
    Note: Call the DNS endpoint of the API Gateway for the HTTPS certificate to work. You cannot call the IP address of the endpoint directly.
  4. Amazon API Gateway passes the payload to Lambda through an integration request.
  5. If Route 53 Resolver query logging is configured, queries from on-premises resources that use the endpoint are logged.


To deploy the example application in this blog post, you need:

  • AWS credentials that provide the necessary permissions to create the resources. This example uses admin credentials.
  • Amazon VPN or AWS Direct Connect with routing rules that allow DNS traffic to pass through to the Amazon VPC.
  • The AWS SAM CLI installed.
  • Clone the GitHub repository.

Deploying with AWS SAM

  1. Navigate to the cloned repo directory. Alternatively, use the sam init command and paste the repo URL:

    SAM init example

    SAM init example

  2. Build the AWS SAM application:
    sam build
  3. Deploy the AWS SAM application:
    sam deploy –guided

This stack creates and configures a virtual private cloud (VPC) configured with two private subnets (for resiliency) and DNS resolution enabled. It also creates a VPC endpoint with (service name = “com.amazonaws.{region}.execute-api”), Private DNS Name = enabled, and a security group set to allow TCP Port 443 inbound from a managed prefix list. You can edit the created prefix list with one or more CIDR block(s).

It also deploys an API Gateway private endpoint and an API Gateway resource policy that restricts access to the API, except from the VPC endpoint. There is also a “Hello world” Lambda function and a Route 53 inbound resolver with a security group that allows TCP/UDP DNS port inbound from the on-premises prefix list.

A VPC endpoint is a logical construct consisting of elastic network interfaces deployed in subnets. The elastic network interface is assigned a private IP address from your subnet space. For high availability, deploy in at least two Availability Zones.

Private API Gateway VPC endpoint

Private API Gateway VPC endpoint

Route 53 inbound resolver endpoint

Route 53 resolver is the Amazon DNS server. It is sometimes referred to as “AmazonProvidedDNS” or the “.2 resolver” that is available by default in all VPCs. Route 53 resolver responds to DNS queries from AWS resources within a VPC for public DNS records, VPC-specific DNS names, and Route 53 private hosted zones.

Integrating your on-premises DNS server with AWS DNS server requires a Route 53 resolver inbound endpoint (for DNS queries that you’re forwarding to your VPCs). When creating an API Gateway private endpoint, a private DNS name is created by API Gateway. This endpoint is resolved automatically from within your VPC.

However, the on-premises servers learn about this hostname from AWS. For this, create a Route 53 inbound resolver endpoint and point your on-premises DNS server to it. This allows your corporate network resources to resolve AWS private DNS hostnames.

To improve reliability, the resolver requires that you specify two IP addresses for DNS queries. AWS recommends configuring IP addresses in two different Availability Zones. After you add the first two IP addresses, you can optionally add more in the same or different Availability Zone.

The inbound resolver is a logical resource consisting of two elastic network interfaces. These are deployed in two different Availability Zones for resiliency.

Route 53 inbound resolver

Route 53 inbound resolver

Configuring the security groups and resource policy

In the security pillar of the AWS Well-Architected Framework, one of the seven design principles is applying security at all layers: Apply a defense in depth approach with multiple security controls. Apply to all layers (edge of network, VPC, load balancing, every instance and compute service, operating system, application, and code).

A few security configurations are required for the solution to function:

  • The resolver security group (referred to as ‘ResolverSG’ in solution diagram) inbound rules must allow TCP and UDP on port 53 (DNS) from your on-premises network-managed prefix list (source). Note: configure the created managed prefix list with your on-premises network CIDR blocks.
  • The security group of the VPC endpoint of the API Gateway “VPCEndpointSG” must allow HTTPS access from your on-premises network-managed prefix list (source). Note: configure the crated managed prefix list with your on-premises network CIDR blocks.
  • For a private API Gateway to work, a resource policy must be configured. The AWS SAM deployment sets up an API Gateway resource policy that allows access to your API from the VPC endpoint. We are telling API Gateway to deny any request explicitly unless it is originating from a defined source VPC endpoint.
    Note: AWS SAM template creates the following policy:

      "Version": "2012-10-17",
      "Statement": [
              "Effect": "Allow",
              "Principal": "*",
              "Action": "execute-api:Invoke",
              "Resource": "arn:aws:execute-api:eu-west-1:12345678901:dligg9dxuk/DemoStage/GET/hello"
              "Effect": "Deny",
              "Principal": "*",
              "Action": "execute-api:Invoke",
              "Resource": "arn:aws:execute-api:eu-west-1: 12345678901:dligg9dxuk/DemoStage/GET/hello",
              "Condition": {
                  "StringNotEquals": {
                      "aws:SourceVpce": "vpce-0ac4147ba9386c9z7"


The AWS SAM deployment creates a Hello World Lambda. For demonstration purposes, the Lambda function always returns a successful response, conforming with API Gateway integration response.

Testing the solution

To test, invoke the API using a curl command from an on-premises client. To get the API URL, copy it from the on-screen AWS SAM deployment outputs. Alternatively, from the console go to AWS CloudFormation outputs section.

CloudFormation outputs

CloudFormation outputs

Next, go to Route 53 resolvers, select the created inbound endpoint and note of the endpoint IP addresses. Configure your on-premises DNS forwarder with the IP addresses. Refer to the documentation for your on-premises DNS server to configure DNS forwarders.

Route 53 resolver IP addresses

Route 53 resolver IP addresses

Finally, log on to your on-premises client and call the API Gateway endpoint. You should get a success response from the API Gateway as shown.

curl https://dligg9dxuk.execute-api.eu-west-1.amazonaws.com/DemoStage/hello

{"response": {"resultStatus": "SUCCESS"}}

Monitoring and troubleshooting

Route 53 resolver query logging allows you to log the DNS queries that originate in your VPCs. It shows which domain names are queried, the originating AWS resources (including source IP and instance ID) and the responses.

You can log the DNS queries that originate in VPCs that you specify, in addition to the responses to those DNS queries. You can also log DNS queries from on-premises resources that use an inbound resolver endpoint, and DNS queries that use an outbound resolver endpoint for recursive DNS resolution.

After configuring query logging from the console, you can use Amazon CloudWatch as the destination for the query logs. You can use this feature to view and troubleshoot the resolver.

    "version": "1.100000",
    "account_id": "1234567890123",
    "region": "eu-west-1",
    "vpc_id": "vpc-0c00ca6aa29c8472f",
    "query_timestamp": "2021-04-25T12:37:34Z",
    "query_name": "dligg9dxuk.execute-api.eu-west-1.amazonaws.com.",
    "query_type": "A",
    "query_class": "IN",
    "rcode": "NOERROR",
    "answers": [
            "Rdata": "”, API Gateway VPC Endpoint IP#1
            "Type": "A",
            "Class": "IN"
            "Rdata": "", API Gateway VPC Endpoint IP#2
            "Type": "A",
            "Class": "IN"
    "srcaddr": "", ONPREMISES CLIENT
    "srcport": "32843",
    "transport": "UDP",
    "srcids": {
        "resolver_endpoint": "rslvr-in-a7dd746257784e148",
        "resolver_network_interface": "rni-3a4a0caca1d0412ab"

Cleaning up

To remove the example application, navigate to CloudFormation and delete the stack.


API Gateway private endpoints allow use cases for building private API–based services inside your VPCs. You can keep both the frontend to your application (API Gateway) and the backend service private inside your VPC.

I discuss how to access your private APIs from your corporate network through Direct Connect or Site-to-Site VPN without exposing your endpoints to the internet. You deploy the demo using AWS Serverless Application Model (AWS SAM). You can also change the template for your own needs.

To learn more, visit the API Gateway tutorials and workshops page in the API Gateway developer guide.

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Implementing application workload security – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-implementing-application-workload-security-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Security question SEC3: How do you implement application security in your workload?

Review and automate security practices at the application code level, and enforce security code review as part of development workflow. By implementing security at the application code level, you can protect against emerging security threats and reduce the attack surface from malicious code, including third-party dependencies.

Required practice: Review security awareness documents frequently

Stay up to date with both AWS and industry security best practices to understand and evolve protection of your workloads. Having a clear understanding of common threats helps you to mitigate them when developing your workloads.

The AWS Security Blog provides security-specific AWS content. The Open Web Application Security Project (OWASP) Top 10 is a guide for security practitioners to understand the most common application attacks and risks. The OWASP Top 10 Serverless Interpretation provides information specific to serverless applications.

Review and subscribe to vulnerability and security bulletins

Regularly review news feeds from multiple sources that are relevant to the technologies used in your workload. Subscribe to notification services to be informed of critical threats in near-real time.

The Common Vulnerabilities and Exposures (CVE) program identifies, defines, and catalogs publicly disclosed cybersecurity vulnerabilities. You can search the CVE list directly, for example “Python”.

CVE Python search

CVE Python search

The US National Vulnerability Database (NVD) allows you to search by vulnerability type, severity, and impact. You can also perform advanced searching by vendor name, product name, and version numbers. GitHub also integrates with CVE, which allows for advanced searching within the CVEproject/cvelist repository.

AWS Security Bulletins are a notification system for security and privacy events related to AWS services. Subscribe to the security bulletin RSS feed to keep up to date with AWS security announcements.

The US Cybersecurity and Infrastructure Security Agency (CISA) provides alerts about current security issues, vulnerabilities, and exploits. You can receive email alerts or subscribe to the RSS feed.

AWS Partner Network (APN) member Palo Alto Networks provides the “Serverless architectures Security Top 10” list. This is a security awareness and education guide to use while designing, developing, and testing serverless applications to help minimize security risks.

Good practice: Automatically review a workload’s code dependencies/libraries

Regularly reviewing application and code dependencies is a good industry security practice. This helps detect and prevent non-certified application code, and ensure that third-party application dependencies operate as intended.

Implement security mechanisms to verify application code and dependencies before using them

Combine automated and manual security code reviews to examine application code and its dependencies to ensure they operate as intended. Automated tools can help identify overly complex application code, and common security vulnerability exposures that are already cataloged.

Manual security code reviews, in addition to automated tools, help ensure that application code works as intended. Manual reviews can include business contextual information and integrations that automated tools may not capture.

Before adding any code dependencies to your workload, take time to review and certify each dependency to ensure that you are adding secure code. Use third-party services to review your code dependencies on every commit automatically.

OWASP has a code review guide and dependency check tool that attempt to detect publicly disclosed vulnerabilities within a project’s dependencies. The tool has a command line interface, a Maven plugin, an Ant task, and a Jenkins plugin.

GitHub has a number of security features for hosted repositories to inspect and manage code dependencies.

The dependency graph allows you to explore the packages that your repository depends on. Dependabot alerts show information about dependencies that are known to contain security vulnerabilities. You can choose whether to have pull requests generated automatically to update these dependencies. Code scanning alerts automatically scan code files to detect security vulnerabilities and coding errors.

You can enable these features by navigating to the Settings tab, and selecting Security & analysis.

GitHub configure security and analysis features

GitHub configure security and analysis features

Once Dependabot analyzes the repository, you can view the dependencies graph from the Insights tab. In the serverless airline example used in this series, you can view the Loyalty service package.json dependencies.

Serverless airline loyalty dependencies

Serverless airline loyalty dependencies

Dependabot alerts for security vulnerabilities are visible in the Security tab. You can review alerts and see information about how to resolve them.

Dependabot alert

Dependabot alert

Once Dependabot alerts are enabled for a repository, you can also view the alerts when pushing code to the repository from the terminal.

Dependabot terminal alert

Dependabot terminal alert

If you enable security updates, Dependabot can automatically create pull requests to update dependencies.

Dependabot pull requests

Dependabot pull requests

AWS Partner Network (APN) member Snyk has an integration with AWS Lambda to manage the security of your function code. Snyk determines what code and dependencies are currently deployed for Node.js, Ruby, and Java projects. It tests dependencies against their vulnerability database.

If you build your functions using container images, you can use Amazon Elastic Container Registry’s (ECR) image scanning feature. You can manually scan your images, or scan them on each push to your repository.

Elastic Container Registry image scanning example results

Elastic Container Registry image scanning example results

Best practice: Validate inbound events

Sanitize inbound events and validate them against a predefined schema. This helps prevent errors and increases your workload’s security posture by catching malformed events or events intentionally crafted to be malicious. The OWASP Input validation cheat sheet includes guidance for providing input validation security functionality in your applications.

Validate incoming HTTP requests against a schema

Implicitly trusting data from clients could lead to malformed data being processed. Use data type validators or web application frameworks to ensure data correctness. These should include regular expressions, value range, data structure, and data normalization.

You can configure Amazon API Gateway to perform basic validation of an API request before proceeding with the integration request to add another layer of security. This ensures that the HTTP request matches the desired format. Any HTTP request that does not pass validation is rejected, returning a 400 error response to the caller.

The Serverless Security Workshop has a module on API Gateway input validation based on the fictional Wild Rydes unicorn raid hailing service. The example shows a REST API endpoint where partner companies of Wild Rydes can submit unicorn customizations, such as branded capes, to advertise their company. The API endpoint should ensure that the request body follows specific patterns. These include checking the ImageURL is a valid URL, and the ID for Cape is a numeric value.

In API Gateway, a model defines the data structure of a payload, using the JSON schema draft 4. The model ensures that you receive the parameters in the format you expect. You can check them against regular expressions. The CustomizationPost model specifies that the ImageURL and Cape schemas should contain the following valid patterns:

    "imageUrl": {
      "type": "string",
      "title": "The Imageurl Schema",
      "pattern": "^https?:\/\/[[email protected]:%_+.~#?&//=]+$"
    "sock": {
      "type": "string",
      "title": " The Cape Schema ",
      "pattern": "^[0-9]*$"

The model is applied to the /customizations/post method as part of the Method Request. The Request Validator is set to Validate body and the CustomizationPost model is set for the Request Body.

API Gateway request validator

API Gateway request validator

When testing the POST /customizations API with valid parameters using the following input:

   "name":"Cherry-themed unicorn",
   "sock": "1",
   "horn": "2",
   "glasses": "3",
   "cape": "4"

The result is a valid response:


Testing validation to the POST /customizations API using invalid parameters shows the input validation process.

The ImageUrl is not a valid URL:

    "name":"Cherry-themed unicorn",
    "sock": "1" ,
    "horn": "2" ,
    "glasses": "3",
    "cape": "4"

The Cape parameter is not a number, which shows a SQL injection attempt.

    "name":"Orange-themed unicorn",
    "sock": "1",
    "horn": "2",
    "glasses": "3",
    "cape":"2); INSERT INTO Cape (NAME,PRICE) VALUES ('Bad color', 10000.00"

These return a 400 Bad Request response from API Gateway before invoking the Lambda function:

{"message": "Invalid request body"}

To gain further protection, consider adding an AWS Web Application Firewall (AWS WAF) access control list to your API endpoint. The workshop includes an AWS WAF module to explore three AWS WAF rules:

  • Restrict the maximum size of request body
  • SQL injection condition as part of the request URI
  • Rate-based rule to prevent an overwhelming number of requests


AWS WAF also includes support for custom responses and request header insertion to improve the user experience and security posture of your applications.

For more API Gateway security information, see the security overview whitepaper.

Also add further input validation logic to your Lambda function code itself. For examples, see “Input Validation for Serverless”.


Implementing application security in your workload involves reviewing and automating security practices at the application code level. By implementing code security, you can protect against emerging security threats. You can improve the security posture by checking for malicious code, including third-party dependencies.

In this post, I cover reviewing security awareness documentation such as the CVE database. I show how to use GitHub security features to inspect and manage code dependencies. I then show how to validate inbound events using API Gateway request validation.

This well-architected question will be continued where I look at securely storing, auditing, and rotating secrets that are used in your application code.

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Managing application security boundaries – part 2

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-managing-application-security-boundaries-part-2/

This series uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the nine serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Security question SEC2: How do you manage your serverless application’s security boundaries?

This post continues part 1 of this security question. Previously, I cover how to evaluate and define resource policies, showing what policies are available for various serverless services. I show some of the features of AWS Web Application Firewall (AWS WAF) to protect APIs. Then then go through how to control network traffic at all layers. I explain how AWS Lambda functions connect to VPCs, and how to use private APIs and VPC endpoints. I walk through how to audit your traffic.

Required practice: Use temporary credentials between resources and components

Do not share credentials and permissions policies between resources to maintain a granular segregation of permissions and improve the security posture. Use temporary credentials that are frequently rotated and that have policies tailored to the access the resource needs.

Use dynamic authentication when accessing components and managed services

AWS Identity and Access Management (IAM) roles allows your applications to access AWS services securely without requiring you to manage or hardcode the security credentials. When you use a role, you don’t have to distribute long-term credentials such as a user name and password, or access keys. Instead, the role supplies temporary permissions that applications can use when they make calls to other AWS resources. When you create a Lambda function, for example, you specify an IAM role to associate with the function. The function can then use the role-supplied temporary credentials to sign API requests.

Use IAM for authorizing access to AWS managed services such as Lambda or Amazon S3. Lambda also assumes IAM roles, exposing and rotating temporary credentials to your functions. This enables your application code to access AWS services.

Use IAM to authorize access to internal or private Amazon API Gateway API consumers. See this list of AWS services that work with IAM.

Within the serverless airline example used in this series, the loyalty service uses a Lambda function to fetch loyalty points and next tier progress. AWS AppSync acts as the client using an HTTP resolver, via an API Gateway REST API /loyalty/{customerId}/get resource, to invoke the function.

To ensure only AWS AppSync is authorized to invoke the API, IAM authorization is set within the API Gateway method request.

Viewing API Gateway IAM authorization

Viewing API Gateway IAM authorization

The IAM role specifies that appsync.amazonaws.com can perform an execute-api:Invoke on the specific API Gateway resource arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${LoyaltyApi}/*/*/*

For more information, see “Using an IAM role to grant permissions to applications”.

Use a framework such as the AWS Serverless Application Model (AWS SAM) to deploy your applications. This ensures that AWS resources are provisioned with unique per resource IAM roles. For example, AWS SAM automatically creates unique IAM roles for every Lambda function you create.

Best practice: Design smaller, single purpose functions

Creating smaller, single purpose functions enables you to keep your permissions aligned to least privileged access. This reduces the risk of compromise since the function does not require access to more than it needs.

Create single purpose functions with their own IAM role

Single purpose Lambda functions allow you to create IAM roles that are specific to your access requirements. For example, a large multipurpose function might need access to multiple AWS resources such as Amazon DynamoDB, Amazon S3, and Amazon Simple Queue Service (SQS). Single purpose functions would not need access to all of them at the same time.

With smaller, single purpose functions, it’s often easier to identify the specific resources and access requirements, and grant only those permissions. Additionally, new features are usually implemented by new functions in this architectural design. You can specifically grant permissions in new IAM roles for these functions.

Avoid sharing IAM roles with multiple cloud resources. As permissions are added to the role, these are shared across all resources using this role. For example, use one dedicated IAM role per Lambda function. This allows you to control permissions more intentionally. Even if some functions have the same policy initially, always separate the IAM roles to ensure least privilege policies.

Use least privilege access policies with your users and roles

When you create IAM policies, follow the standard security advice of granting least privilege, or granting only the permissions required to perform a task. Determine what users (and roles) must do and then craft policies that allow them to perform only those tasks.

Start with a minimum set of permissions and grant additional permissions as necessary. Doing so is more secure than starting with permissions that are too lenient and then trying to tighten them later. In the unlikely event of misused credentials, credentials will only be able to perform limited interactions.

To control access to AWS resources, AWS SAM uses the same mechanisms as AWS CloudFormation. For more information, see “Controlling access with AWS Identity and Access Management” in the AWS CloudFormation User Guide.

For a Lambda function, AWS SAM scopes the permissions of your Lambda functions to the resources that are used by your application. You add IAM policies as part of the AWS SAM template. The policies property can be the name of AWS managed policies, inline IAM policy documents, or AWS SAM policy templates.

For example, the serverless airline has a ConfirmBooking Lambda function that has UpdateItem permissions to the specific DynamoDB BookingTable resource.

        Type: AWS::SSM::Parameter::Value<String>
        Description: Parameter Name for Booking Table
        Type: AWS::Serverless::Function
            FunctionName: !Sub ServerlessAirline-ConfirmBooking-${Stage}
                - Version: "2012-10-17"
                      Action: dynamodb:UpdateItem
                      Effect: Allow
                      Resource: !Sub "arn:${AWS::Partition}:dynamodb:${AWS::Region}:${AWS::AccountId}:table/${BookingTable}"

One of the fastest ways to scope permissions appropriately is to use AWS SAM policy templates. You can reference these templates directly in the AWS SAM template for your application, providing custom parameters as required.

The serverless patterns collection allows you to build integrations quickly using AWS SAM and AWS Cloud Development Kit (AWS CDK) templates.

The booking service uses the SNSPublishMessagePolicy. This policy gives permission to the NotifyBooking Lambda function to publish a message to an Amazon Simple Notification Service (Amazon SNS) topic.

        Type: AWS::SNS::Topic

        Type: AWS::Serverless::Function
                - SNSPublishMessagePolicy:
                      TopicName: !Sub ${BookingTopic.TopicName}

Auditing permissions and removing unnecessary permissions

Audit permissions regularly to help you identify unused permissions so that you can remove them. You can use last accessed information to refine your policies and allow access to only the services and actions that your entities use. Use the IAM console to view when last an IAM role was used.

IAM last used

IAM last used

Use IAM access advisor to review when was the last time an AWS service was used from a specific IAM user or role. You can view last accessed information for IAM on the Access Advisor tab in the IAM console. Using this information, you can remove IAM policies and access from your IAM roles.

IAM access advisor

IAM access advisor

When creating and editing policies, you can validate them using IAM Access Analyzer, which provides over 100 policy checks. It generates security warnings when a statement in your policy allows access AWS considers overly permissive. Use the security warning’s actionable recommendations to help grant least privilege. To learn more about policy checks provided by IAM Access Analyzer, see “IAM Access Analyzer policy validation”.

With AWS CloudTrail, you can use CloudTrail event history to review individual actions your IAM role has performed in the past. Using this information, you can detect which permissions were actively used, and decide to remove permissions.

AWS CloudTrail

AWS CloudTrail

To work out which permissions you may need, you can generate IAM policies based on access activity. You configure an IAM role with broad permissions while the application is in development. Access Analyzer reviews your CloudTrail logs. It generates a policy template that contains the permissions that the role used in your specified date range. Use the template to create a policy that grants only the permissions needed to support your specific use case. For more information, see “Generate policies based on access activity”.

IAM Access Analyzer

IAM Access Analyzer


Managing your serverless application’s security boundaries ensures isolation for, within, and between components. In this post, I continue from part 1, looking at using temporary credentials between resources and components. I cover why smaller, single purpose functions are better from a security perspective, and how to audit permissions. I show how to use AWS SAM to create per-function IAM roles.

For more serverless learning resources, visit https://serverlessland.com.