Tag Archives: AWS Lambda

Announcing AWS Lambda support for .NET Core 3.1

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/announcing-aws-lambda-supports-for-net-core-3-1/

This post is courtesy of Norm Johanson, Senior Software Development Engineer, AWS SDKs and Tools.

From today, you can develop AWS Lambda functions using .NET Core 3.1. You can deploy to Lambda by setting the runtime parameter value to dotnetcore3.1. A new version of the AWS Toolkit for Visual Studio and version 4.0.0 of the .NET Core Global Tool Amazon.Lambda.Tools are also available today. These make it easy to build and deploy your .NET Core 3.1 Lambda functions.

New features of .NET Core 3.1

.NET Core 3.1 brings many new runtime features to Lambda, including C# 8.0 and F# 4.7 support, .NET Standard 2.1 support, a new JSON serializer, and a new ReadyToRun feature for ahead-of-time compilation. There are also new versions of the .NET Lambda tooling and libraries. These include the Amazon.Lambda.AspNetCoreServer package, which allows you to run ASP.NET Core 3.1 projects as Lambda functions.

New Lambda JSON serializer

.NET Core Lambda functions support JSON serialization of input and return parameters. Use this feature by registering a serializer in your Lambda code. Typically, this is done with an assembly-level LambdaSerializer attribute that registers the JsonSerializer class from the Amazon.Lambda.Serialization.Json NuGet package as the serializer.


Amazon.Lambda.Serialization.Json uses the popular NuGet package Newtonsoft.Json for serialization. Newtonsoft.Json is a powerful serializer with many built-in features. This makes it a large assembly to add to your .NET Core Lambda functions.

Starting with .NET Core 3.0, a new JSON serializer called System.Text.Json is built into the .NET Core framework. This serializer is focused on the core features of serialization and built for performance. To take advantage of this new serializer, use the new NuGet package Amazon.Lambda.Serialization.SystemTextJson. Testing with this new serializer shows significant improvements to Lambda cold start performance. The new Lambda blueprints available in Visual Studio or dotnet new via Amazon.Lambda.Templates default to this new serializer by registering the LambdaJsonSerializer class in the LambdaSerializer assembly attribute.


Most .NET Lambda event packages work with either of the AWS serializer packages. For performance and simplicity reasons, the new versions of Amazon.Lambda.APIGatewayEvents and Amazon.Lambda.AspNetCoreServer only use the newer, faster Amazon.Lambda.Serialization.SystemTextJson for JSON serialization when targeting .NET Core 3.1.

ReadyToRun for better cold start performance

.NET Core 3.0 introduces a new build concept called ReadyToRun, which is also available in .NET Core 3.1. ReadyToRun performs much of the work of the just-in-time compiler at build time, rather than when the .NET runtime first executes your code. If your project contains large amounts of code or large dependencies like the AWS SDK for .NET, this feature can significantly reduce cold start time. It has less effect on small functions using only the .NET Core base library.

To use ReadyToRun for Lambda, you must package your .NET Lambda function on Linux. You can use a Linux environment like an EC2 Linux instance, or CodeBuild, which also recently added .NET Core 3.1 support. Once you are in a Linux environment using .NET Lambda tooling, enable ReadyToRun by setting the --msbuild-parameters switch:

"/p:PublishReadyToRun=true --self-contained false"

For example, to deploy a function with ReadyToRun enabled using the .NET Core Global Tool for Lambda Amazon.Lambda.Tools, use:

dotnet lambda deploy-function R2RExample --msbuild-parameters "/p:PublishReadyToRun=true --self-contained false"

To avoid setting this on the command line, set msbuild-parameters as a property in the aws-lambda-tools-defaults.json file.
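As a minimal sketch of that last step, the following Python snippet adds the property to a dictionary standing in for the contents of aws-lambda-tools-defaults.json. The function name and the sample contents are hypothetical; the property name and value are the ones quoted in this post.

```python
import json

# Value taken from the command-line example in this post.
READY_TO_RUN_PARAMS = "/p:PublishReadyToRun=true --self-contained false"


def with_ready_to_run(defaults: dict) -> dict:
    """Return a copy of the tool defaults with msbuild-parameters set."""
    updated = dict(defaults)
    updated["msbuild-parameters"] = READY_TO_RUN_PARAMS
    return updated


# Hypothetical existing contents of aws-lambda-tools-defaults.json.
example_defaults = {
    "function-runtime": "dotnetcore3.1",
    "framework": "netcoreapp3.1",
}
print(json.dumps(with_ready_to_run(example_defaults), indent=2))
```

Writing the resulting JSON back to the defaults file would make the setting apply to every later deployment from that project.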

Updated AWS Mock .NET Lambda Test Tool

Lambda test tool

With .NET Core 2.1, AWS released the AWS .NET Core Mock Lambda Test Tool. This makes it easy to debug .NET Core Lambda functions. If you are using the AWS Toolkit for Visual Studio, the toolkit automatically installs or updates the test tool, and configures your launchSettings.json file. With the toolkit, you can use F5 debugging when the project opens.

For .NET Core 3.1, this tool offers new features. First, the way the tool loads .NET Lambda code internally is redesigned. Previously, the assemblies in customer code could collide with the test tool’s assemblies. Now the Lambda code is loaded in a separate AssemblyLoadContext, preventing this collision.

The test tool is an ASP.NET Core application that loads and executes the Lambda code. This allows the debugger that is currently attached to the test tool to debug the loaded Lambda code. Pressing F5 opens the web interface, allowing you to select the function, payload, and other parameters. Once everything is set, choose execute to run the code inside the test tool’s process. To improve the debug turnaround cycle, there is a new switch: --no-ui. This skips the web interface after code changes, making it faster to debug your code.

My workflow for this tool is to use the web interface for the initial debug session, then save the request JSON. After the initial debug session, I edit the launchSettings.json file, which looks like this, setting up the port for the web interface:

{
  "profiles": {
    "Mock Lambda Test Tool": {
      "commandName": "Executable",
      "commandLineArgs": "--port 5050",
      "workingDirectory": ".\\bin\\$(Configuration)\\netcoreapp3.1",
      "executablePath": "C:\\Users\\%USERNAME%\\.dotnet\\tools\\dotnet-lambda-test-tool-3.1.exe"
    }
  }
}

I update this to:

{
  "profiles": {
    "Mock Lambda Test Tool": {
      "commandName": "Executable",
      "commandLineArgs": "--no-ui --payload SavedRequest",
      "workingDirectory": ".\\bin\\$(Configuration)\\netcoreapp3.1",
      "executablePath": "C:\\Users\\%USERNAME%\\.dotnet\\tools\\dotnet-lambda-test-tool-3.1.exe"
    }
  }
}

This uses the saved request from the web interface as the input payload, instead of the web interface.

For more information about this feature, see the new Documentation tab in the test tool after launching. Here is a demonstration of my debug workflow:

Lambda debug workflow

Amazon Linux 2

.NET Core 3.1, like Ruby 2.7, Python 3.8, Node.js 10 and 12, and Java 11, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.

Migrate to .NET Core 3.1

To migrate existing .NET Core 2.1 Lambda functions to the new 3.1 runtime, follow the steps below:

  1. Open the csproj or fsproj file.
    • Set the TargetFramework element to netcoreapp3.1.
  2. Open the aws-lambda-tools-defaults.json file.
    • If it exists, set the function-runtime field to dotnetcore3.1
    • If it exists, set the framework field to netcoreapp3.1. If you remove the field, the value is inferred from the project file.
  3. If it exists, open the serverless.template file.
    • For any AWS::Lambda::Function or AWS::Serverless::Function, set the Runtime property to dotnetcore3.1.
  4. Update all Amazon.Lambda.* NuGet package references to the latest versions.
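
Steps 1 and 2 above can be scripted. The following Python sketch is one possible way to do it, assuming an SDK-style project file without an XML namespace; the function names and sample inputs are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def migrate_project_xml(csproj_text: str) -> str:
    """Step 1: point every TargetFramework element at netcoreapp3.1."""
    root = ET.fromstring(csproj_text)
    for element in root.iter("TargetFramework"):
        element.text = "netcoreapp3.1"
    return ET.tostring(root, encoding="unicode")


def migrate_tool_defaults(defaults_text: str) -> str:
    """Step 2: update the runtime and framework fields only if they exist."""
    defaults = json.loads(defaults_text)
    if "function-runtime" in defaults:
        defaults["function-runtime"] = "dotnetcore3.1"
    if "framework" in defaults:
        defaults["framework"] = "netcoreapp3.1"
    return json.dumps(defaults, indent=2)


# Hypothetical minimal project file and defaults file.
sample_csproj = (
    '<Project Sdk="Microsoft.NET.Sdk"><PropertyGroup>'
    "<TargetFramework>netcoreapp2.1</TargetFramework>"
    "</PropertyGroup></Project>"
)
print(migrate_project_xml(sample_csproj))
print(migrate_tool_defaults('{"function-runtime": "dotnetcore2.1"}'))
```

Fields that are absent are left absent, which matches the "if it exists" wording in the steps.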

To use the new JSON serializer, follow these steps:

  1. Remove the NuGet package reference to Amazon.Lambda.Serialization.Json.
  2. Add the NuGet package reference to Amazon.Lambda.Serialization.SystemTextJson.
  3. In your code, where the LambdaSerializer attribute registers the JSON serializer, change the parameter to Amazon.Lambda.Serialization.SystemTextJson.LambdaJsonSerializer.


There is a blueprint in Visual Studio for detecting labels for images uploaded in S3:

Detect image labels

By converting this blueprint to .NET Core 3.1 and using the new JSON serializer and ReadyToRun features, the cold start time is reduced by 40% when using 256 MB of memory. Performance improvements vary, so be sure to try these new features in your Lambda functions.

Start building .NET Core 3.1 Lambda functions with the latest versions of the AWS Toolkit for Visual Studio or the .NET Core Global Tool Amazon.Lambda.Tools. If you are not using .NET Core Lambda tooling, specify dotnetcore3.1 as the runtime value in your preferred tool to deploy Lambda functions.

We would like to hear your feedback for AWS .NET Lambda support. Contact the AWS .NET Team for Lambda questions through our .NET Lambda GitHub repository.

Converting call center recordings into useful data for analytics

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/converting-call-center-recordings-into-useful-data-for-analytics/

Many businesses operate call centers that record conversations with customers for training or regulatory purposes. These vast collections of audio offer unique opportunities for improving customer service. However, since audio data is mostly unsearchable, it’s usually archived in these systems and never analyzed for insights.

Developing machine learning models for accurately understanding and transcribing speech is also a major challenge. These models require large datasets for optimal performance, along with teams of experts to build and maintain the software. This puts it out of reach for the majority of businesses and organizations. Fortunately, you can use AWS services to handle this difficult problem.

In this blog post, I show how you can use a serverless approach to analyze audio data from your call center. You can clone this application from the GitHub repo and modify it to meet your needs. The solution uses Amazon ML services, together with scalable storage and serverless compute. The example application has the following architecture:

The architecture for the call center audio analyzer.

For call center analysis, this application is useful to determine the types of general topics that customers are calling about. It can also detect the sentiment of the conversation, so if the call is a compliment or a complaint, you could take additional action. When combined with other metadata such as caller location or time of day, this can yield important insights to help you improve customer experience. For example, you might discover there are common service issues in a geography at a certain time of day.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file.

How the application works

A key part of the serverless solution is Amazon S3, an object store that scales to meet your storage needs. When new objects are stored, this triggers AWS Lambda functions, which scale to keep pace with S3 usage. The application coordinates activities between the S3 bucket and two managed Machine Learning (ML) services, storing the results in an Amazon DynamoDB table.

The ML services used are:

  • Amazon Transcribe, which transcribes audio data into JSON output, using a process called automatic speech recognition. This can understand 31 languages and dialects, and identify different speakers in a customer support call.
  • Amazon Comprehend, which offers sentiment analysis as one of its core features. This service returns an array of scores to estimate the probability that the input text is positive, negative, neutral, or mixed.
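
To illustrate how those scores can be used, here is a small Python sketch that picks the most likely sentiment from a set of scores. The sample values are made up, and the key names follow the shape of the SentimentScore field in the Comprehend response.

```python
def dominant_sentiment(scores: dict) -> tuple:
    """Return the (label, probability) pair with the highest score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Hypothetical scores for a transcript of a complaint call.
sample_scores = {"Positive": 0.02, "Negative": 0.91, "Neutral": 0.06, "Mixed": 0.01}
print(dominant_sentiment(sample_scores))
```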

Sample application architecture.

  1. An upstream process, such as a call recording system, stores audio data in the application’s S3 bucket.
  2. When the MP3 objects are stored, this triggers the Transcribe function. The function creates a new job in the Amazon Transcribe service.
  3. When the transcription process finishes, Transcribe stores the JSON result in the same S3 bucket.
  4. This JSON object triggers the Sentiment function. The Sentiment function requests a sentiment analysis from the Comprehend service.
  5. After receiving the sentiment scores, this function stores the results in a DynamoDB table.

There is only one bucket used in the application. The two Lambda functions are triggered by the same bucket, using different object suffixes. This is configured in the SAM template, shown here:

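          # Sentiment function: triggered by the JSON output from Transcribe
          Type: S3
          Properties:
            Bucket: !Ref InputS3Bucket
            Events: s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: '.json'

          # Transcribe function: triggered by uploaded MP3 audio
          Type: S3
          Properties:
            Bucket: !Ref InputS3Bucket
            Events: s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: '.mp3'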
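
In the deployed application, S3 applies these suffix filters itself, so each function only receives matching objects. Purely as an illustration of the routing rule, the Python sketch below applies the same suffix test to an S3 event; the function name and the labels it returns are hypothetical.

```python
def route_s3_event(event: dict) -> list:
    """Pair each object key in an S3 event with the function its suffix selects."""
    routed = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        if key.endswith(".mp3"):
            routed.append((key, "transcribe"))
        elif key.endswith(".json"):
            routed.append((key, "sentiment"))
        else:
            # No suffix rule matches, so neither function would be invoked.
            routed.append((key, None))
    return routed


# Hypothetical event containing one audio upload and one transcription result.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "your-bucket-name"}, "object": {"key": "call-1.mp3"}}},
        {"s3": {"bucket": {"name": "your-bucket-name"}, "object": {"key": "call-1.json"}}},
    ]
}
print(route_s3_event(sample_event))
```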

Testing the application

To test the application, you need an MP3 audio file containing spoken text. For example, in my testing, I use audio files of a person reading business reviews representing positive, neutral, and negative experiences.

  1. After cloning the GitHub repo, follow the instructions in the README.md file to deploy the application. Note the name of the S3 bucket output in the deployment.

    SAM deployment CLI output

  2. Upload your test MP3 files using this command in a terminal, replacing your-bucket-name with the deployed bucket name:

    aws s3 cp .\ s3://your-bucket-name --recursive

    Once executed, your terminal shows the uploaded media files:

    Uploading sample media files.

  3. Navigate to the Amazon Transcribe console, and choose Transcription jobs in the left-side menu. The MP3 files you uploaded appear here as separate jobs:

    Amazon Transcribe jobs in progress

  4. Once the Status column shows all pending jobs as Complete, navigate to the DynamoDB console.
  5. Choose Tables from the left-side menu and select the table created by the deployment. Choose the Items tab:

    Sentiment scores in the DynamoDB table

    Each MP3 file appears as a separate item with a sentiment rating and a probability for each sentiment category. It also includes the transcript of the audio.

Handling multiple languages

One of the most useful aspects of serverless architecture is the ability to add functionality easily. For call centers handling multiple languages, ideally you should translate to a common language for sentiment scoring. With this application, it’s easy to add an extra step to the process to translate the transcription language to a base language:

Advanced application architecture

A new Translate Lambda function is invoked by the S3 JSON suffix filter and creates text output in a common base language. The sentiment scoring function is triggered by new objects with the suffix TXT.

In this modified case, when the MP3 audio file is uploaded to S3, you can append the language identifier as metadata to the object. For example, to upload an MP3 with a French language identifier using the AWS CLI:

aws s3 cp .\test-audio-fr.mp3 s3://your-bucket --metadata Content-Language=fr-FR

The first Lambda function passes the language identifier to the Transcribe service. In the Transcribe console, the language appears in the new job:

French transcription job complete

After the job finishes, the JSON output is stored in the same S3 bucket. It shows the transcription from the French language audio:

French transcription output

The new Translate Lambda function passes the transcript value into the Amazon Translate service. This converts the French to English and saves the translation as a text file. The sentiment Lambda function now uses the contents of this text file to generate the sentiment scores.

This approach allows you to accept audio in a wide range of spoken languages but standardize your analytics in one base language.

Developing for extensibility

You might want to take action on phone calls that have a negative sentiment score, or publish scores to other applications in your organization. This architecture makes it simple to extend functionality once DynamoDB saves the sentiment scores. By using DynamoDB Streams, you can invoke a Lambda function each time a record is created or updated in the underlying DynamoDB table:

Adding notifications to the application

In this case, the routing function could trigger an email via Amazon SES where the sentiment score is negative. For example, this could email a manager to follow up with the customer. Alternatively, you may choose to publish all scores and results to any downstream application with Amazon EventBridge. By publishing events to the default event bus, you can allow consuming applications to build new functionality without needing any direct integration.
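
A routing function of this kind could start from something like the following Python sketch, which scans a DynamoDB Streams event for items with a negative sentiment. The attribute names id and Sentiment are assumptions about the table layout, and the sketch assumes the stream is configured to include the new item image.

```python
def negative_items(stream_event: dict) -> list:
    """Return the ids of created or updated items whose sentiment is negative."""
    flagged = []
    for record in stream_event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # Ignore deletions.
        image = record.get("dynamodb", {}).get("NewImage", {})
        # Stream images use typed attribute values, for example {"S": "text"}.
        sentiment = image.get("Sentiment", {}).get("S")
        if sentiment == "NEGATIVE":
            flagged.append(image.get("id", {}).get("S"))
    return flagged


# Hypothetical stream event with a single new item.
sample_stream_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "NewImage": {"id": {"S": "call-1.mp3"}, "Sentiment": {"S": "NEGATIVE"}}
            },
        }
    ]
}
print(negative_items(sample_stream_event))
```

The returned ids could then feed an email notification or an event published to an event bus, as described above.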

Deferred execution in Amazon Transcribe

The services used in the example application are all highly scalable and highly available, and can handle significant amounts of data. Amazon Transcribe allows up to 100 concurrent transcription jobs – see the service limits and quotas for more information.

The service also provides a mechanism for deferred execution, which allows you to hold jobs in a queue. When the number of executing jobs falls below the concurrent execution limit, the service takes the next job from this queue. This effectively means you can submit any number of jobs to the Transcribe service, and it manages the queue and processing automatically.

To use this feature, there are two additional attributes used in the startTranscriptionJob method of the AWS.TranscribeService object. When added to the Lambda handler in the Transcribe function, the code looks like this:

Deferred execution for Amazon Transcribe

After setting AllowDeferredExecution to true, you must also provide an IAM role ARN in the DataAccessRoleArn attribute. For more information on how to use this feature, see the Transcribe documentation for job execution settings.
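
The handler in the sample application is JavaScript. As a rough Python equivalent, the sketch below builds the request parameters for a transcription job with deferred execution enabled. The parameter names follow the StartTranscriptionJob API; the function name, job name, bucket, and role ARN are hypothetical, and the sketch only builds the request without sending it.

```python
def build_transcription_job(job_name, bucket, key, language_code, data_access_role_arn):
    """Build StartTranscriptionJob parameters with deferred execution enabled."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": "mp3",
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        # Store the JSON result in the same bucket as the audio.
        "OutputBucketName": bucket,
        "JobExecutionSettings": {
            "AllowDeferredExecution": True,
            # Required whenever AllowDeferredExecution is true.
            "DataAccessRoleArn": data_access_role_arn,
        },
    }


params = build_transcription_job(
    "call-1",
    "your-bucket-name",
    "call-1.mp3",
    "en-US",
    "arn:aws:iam::123456789012:role/ExampleTranscribeRole",  # placeholder ARN
)
print(params["JobExecutionSettings"])
```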


In this blog post, I show how to transcribe the content of audio files and calculate a sentiment score. This can be useful for organizations wanting to analyze saved audio for customer calls, webinars, or team meetings.

This solution uses Amazon ML services to handle the audio and text analysis, and serverless services like S3 and Lambda to manage the storage and business logic. The serverless application here can scale to handle large amounts of production data. You can also easily extend the application to provide new functionality, built specifically for your organization’s use-case.

To learn more about building serverless applications at scale, visit the AWS Serverless website.

Building a Raspberry Pi telepresence robot using serverless: Part 1

Post Syndicated from Moheeb Zara original https://aws.amazon.com/blogs/compute/building-a-raspberry-pi-telepresence-robot-using-serverless-part-1/

A Pimoroni STS-Pi Robot Kit connected to AWS for remote control and viewing.


A telepresence robot allows you to explore remote environments from the comfort of your home through live stream video and remote control. These types of robots can improve the lives of the disabled, elderly, or those that simply cannot be with their coworkers or loved ones in person. Some are used to explore off-world terrain and others for search and rescue.

This guide walks through building a simple telepresence robot using a Pimoroni STS-PI Raspberry Pi robot kit. A Raspberry Pi is a small low-cost device that runs Linux. Add-on modules for Raspberry Pi are called “hats”. You can substitute this kit with any mobile platform that uses two motors wired to an Adafruit Motor Hat or a Pimoroni Explorer Hat.

The sample serverless application uses AWS Lambda and Amazon API Gateway to create a REST API for driving the robot. A Python application running on the robot uses AWS IoT Core to receive drive commands and authenticate with Amazon Kinesis Video Streams with WebRTC using an IoT Credentials Provider. In the next blog I walk through deploying a web frontend to both view the livestream and control the robot via the API.


You need the following to complete the project:

A Pimoroni STS-Pi robot kit, Explorer Hat, Raspberry Pi, camera, and battery.


Estimated Cost: $120

There are three major parts to this project. First deploy the serverless backend using the AWS Serverless Application Repository. Then assemble the robot and run an installer on the Raspberry Pi. Finally, configure and run the Python application on the robot to confirm it can be driven through the API and is streaming video.

Deploy the serverless application

In this section, use the Serverless Application Repository to deploy the backend resources for the robot. The resources to deploy are defined using the AWS Serverless Application Model (SAM), an open-source framework for building serverless applications using AWS CloudFormation. To understand in more depth how this application is built, look at the SAM template in the GitHub repository.

An architecture diagram of the AWS IoT and Amazon Kinesis Video Stream resources of the deployed application.

The Python application that runs on the robot requires permissions to connect as an IoT Thing and subscribe to messages sent to a specific topic on the AWS IoT Core message broker. The following policy is created in the SAM template:

      Type: "AWS::IoT::Policy"
      Properties:
        PolicyName: !Sub "${RobotName}Policy"
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - iot:Connect
                - iot:Subscribe
                - iot:Publish
                - iot:Receive
              Resource:
                - !Sub "arn:aws:iot:*:*:topicfilter/${RobotName}/action"
                - !Sub "arn:aws:iot:*:*:topic/${RobotName}/action"
                - !Sub "arn:aws:iot:*:*:topic/${RobotName}/telemetry"
                - !Sub "arn:aws:iot:*:*:client/${RobotName}"

To transmit video, the Python application runs the amazon-kinesis-video-streams-webrtc-sdk-c sample in a subprocess. Instead of using separate credentials to authenticate with Kinesis Video Streams, a Role Alias policy is created so that IoT credentials can be used.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["iot:AssumeRoleWithCertificate"],
      "Resource": "arn:aws:iot:Region:AccountID:rolealias/robot-camera-streaming-role-alias",
      "Effect": "Allow"
    }
  ]
}

When the above policy is attached to a certificate associated with an IoT Thing, it can assume the following role:

      Type: 'AWS::IAM::Role'
      Properties:
        AssumeRolePolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: 'Allow'
            Principal:
              Service: 'credentials.iot.amazonaws.com'
            Action: 'sts:AssumeRole'
        Policies:
        - PolicyName: !Sub "KVSIAMPolicy-${AWS::StackName}"
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
            - Effect: Allow
              Action:
                - kinesisvideo:ConnectAsMaster
                - kinesisvideo:GetSignalingChannelEndpoint
                - kinesisvideo:CreateSignalingChannel
                - kinesisvideo:GetIceServerConfig
                - kinesisvideo:DescribeSignalingChannel
              Resource: "arn:aws:kinesisvideo:*:*:channel/${credentials-iot:ThingName}/*"

This role grants access to connect and transmit video over WebRTC using the Kinesis Video Streams signaling channel deployed by the serverless application.

An architecture diagram of the API endpoint in the deployed application.

A deployed API Gateway endpoint, when called with valid JSON, invokes a Lambda function that publishes to an IoT message topic, RobotName/action. The Python application on the robot subscribes to this topic and drives the motors based on any received message that maps to a command.

  1. Navigate to the aws-serverless-telepresence-robot application in the Serverless Application Repository.
  2. Choose Deploy.
  3. On the next page, under Application Settings, fill out the parameter, RobotName.
  4. Choose Deploy.
  5. Once complete, choose View CloudFormation Stack.
  6. Select the Outputs tab. Copy the ApiURL and the EndpointURL for use when configuring the robot.

Create and download the AWS IoT device certificate

The robot requires an AWS IoT root CA (fetched by the install script), certificate, and private key to authenticate with AWS IoT Core. The certificate and private key are not created by the serverless application since they can only be downloaded on creation. Create a new certificate and attach the IoT policy and Role Alias policy deployed by the serverless application.

  1. Navigate to the AWS IoT Core console.
  2. Choose Manage, Things.
  3. Choose the Thing that corresponds with the name of the robot.
  4. Under Security, choose Create certificate.
  5. Choose Activate.
  6. Download the Private Key and Thing Certificate. Save these securely, as this is the only time you can download this certificate.
  7. Choose Attach Policy.
  8. Two policies are created and must be attached: the IoT policy and the Role Alias policy deployed by the serverless application. Select both from the list.
  9. Choose Done.

Flash an operating system to an SD card

The Raspberry Pi single-board Linux computer uses an SD card as the main file system storage. Raspbian Buster Lite is an officially supported Debian Linux operating system that must be flashed to an SD card. Balena.io has created an application called balenaEtcher for the sole purpose of accomplishing this safely.

  1. Download the latest version of Raspbian Buster Lite.
  2. Download and install balenaEtcher.
  3. Insert the SD card into your computer and run balenaEtcher.
  4. Choose the Raspbian image. Choose Flash to burn the image to the SD card.
  5. When flashing is complete, balenaEtcher dismounts the SD card.

Configure Wi-Fi and SSH headless

Typically, a keyboard and monitor are used to configure Wi-Fi or to access the command line on a Raspberry Pi. Since the Raspberry Pi is on a mobile platform, configure it headless instead: add configuration files to the SD card so that it connects to a Wi-Fi network and enables remote access.

  1. Re-insert the SD card into your computer so that it appears as a volume named boot.
  2. Create a file in the boot volume of the SD card named wpa_supplicant.conf.
  3. Paste in the following contents, substituting your Wi-Fi credentials.
    ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
    country=<Insert country code here>

    network={
     ssid="<Name of your WiFi>"
     psk="<Password for your WiFi>"
    }

  4. Create an empty file without a file extension in the boot volume named ssh. At boot, the Raspbian operating system looks for this file and enables remote access if it exists. This can be done from a command line:
    cd path/to/volume/boot
    touch ssh

  5. Safely eject the SD card from your computer.
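
If you prepare SD cards often, steps 2 to 4 can be scripted. The following Python sketch writes both files to a given directory; the function name is hypothetical, and the demonstration uses a temporary directory rather than a real boot volume.

```python
import tempfile
from pathlib import Path

# Doubled braces are literal braces in str.format templates.
WPA_TEMPLATE = """ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
country={country}

network={{
    ssid="{ssid}"
    psk="{psk}"
}}
"""


def write_headless_config(boot_dir: str, country: str, ssid: str, psk: str) -> None:
    """Write wpa_supplicant.conf and an empty ssh file into the boot volume."""
    boot = Path(boot_dir)
    (boot / "wpa_supplicant.conf").write_text(
        WPA_TEMPLATE.format(country=country, ssid=ssid, psk=psk)
    )
    # An empty file named ssh enables remote access at boot.
    (boot / "ssh").touch()


# Demonstration against a temporary directory with placeholder credentials.
with tempfile.TemporaryDirectory() as demo_boot:
    write_headless_config(demo_boot, "US", "ExampleNetwork", "example-password")
    print((Path(demo_boot) / "wpa_supplicant.conf").read_text())
```

To use it for real, pass the path where your computer mounts the boot volume instead of the temporary directory.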

Assemble the robot

For this section, you can use the Pimoroni STS-Pi robot kit with a Pimoroni Explorer Hat, along with a Raspberry Pi Model 3 B+ or newer, and a camera module. Alternatively, you can use any two motor robot platform that uses the Explorer Hat or Adafruit Motor Hat.

  1. Follow the instructions in this video to assemble the Pimoroni STS-Pi robot kit.
  2. Place the SD card in the Raspberry Pi.
  3. Since the installation may take some time, power the Raspberry Pi using a USB 5V power supply connected to a wall plug rather than a battery.

Connect remotely using SSH

Use your computer to gain remote command line access of the Raspberry Pi using SSH. Both devices must be on the same network.

  1. Open a terminal application with SSH installed. SSH is built into Linux and macOS; to enable SSH on Windows, follow these instructions.
  2. Enter the following to begin a secure shell session as user pi on the default local hostname raspberrypi, which resolves to the IP address of the device using mDNS:
    ssh pi@raspberrypi.local
  3. If prompted to add an SSH key to the list of known hosts, type yes.
  4. When prompted for a password, type raspberry. This is the default password and can be changed using the raspi-config utility.
  5. Upon successful login, you now have shell access to your Raspberry Pi device.

Enable the camera using raspi-config

A built-in utility, raspi-config, provides an easy to use interface for configuring Raspbian. You must enable the camera module, along with I2C, a serial bus used for communicating with the motor driver.

  1. In an open SSH session, type the following to open the raspi-config utility:
    sudo raspi-config

  2. Using the arrows, choose Interfacing Options.
  3. Choose Camera. When prompted, choose Yes to enable the camera module.
  4. Repeat the process to enable the I2C interface.
  5. Select Finish and reboot.

Run the install script

An installer script is provided for building and installing the Kinesis Video Stream WebRTC producer, AWSIoTPythonSDK and Pimoroni Explorer Hat Python libraries. Upon completion, it creates a directory with the following structure:

├── /home/pi/Projects/robot
│  └── main.py // The main Python application
│  └── config.json // Parameters used by main.py
│  └── kvsWebrtcClientMasterGstSample //Kinesis Video Stream producer
│  └── /certs
│     └── cacert.pem // Amazon SFSRootCAG2 Certificate Authority
│     └── certificate.pem // AWS IoT certificate placeholder
│     └── private.pem.key // AWS IoT private key placeholder
  1. Open an SSH session on the Raspberry Pi.
  2. (Optional) If using the Adafruit Motor Hat, run this command, otherwise the script defaults to the Pimoroni Explorer Hat.
    export MOTOR_DRIVER=adafruit  

  3. Run the following command to fetch and execute the installer script.
    wget -O - https://raw.githubusercontent.com/aws-samples/aws-serverless-telepresence-robot/master/scripts/install.sh | bash

  4. While the script installs, proceed to the next section.

Configure the code

The Python application on the robot subscribes to AWS IoT Core to receive messages. It requires the certificate and private key created for the IoT thing to authenticate. These files must be copied to the directory where the Python application is stored on the Raspberry Pi.

The application also requires the IoT credentials endpoint, added to the config.json file, to assume the permissions necessary to transmit video to Amazon Kinesis Video Streams.

  1. Open an SSH session on the Raspberry Pi.
  2. Open the certificate.pem file with the nano text editor and paste in the contents of the certificate downloaded earlier.
    nano certificate.pem

  3. Press CTRL+X and then Y to save the file.
  4. Repeat the process with the private.pem.key file.
    nano private.pem.key

  5. Open the config.json file.
    nano config.json

  6. Provide the following information:
    IOT_THINGNAME: The name of your robot, as set in the serverless application.
    IOT_CORE_ENDPOINT: This is found under the Settings page in the AWS IoT Core console.
    IOT_GET_CREDENTIAL_ENDPOINT: Provided by the serverless application.
    ROLE_ALIAS: This is already set to match the Role Alias deployed by the serverless application.
    AWS_DEFAULT_REGION: Corresponds to the Region the application is deployed in.
  7. Save the file using CTRL+X and Y.
  8. To start the robot, run the command:
    python3 main.py

  9. To stop the script, press CTRL+C.
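
Before starting the robot, it can help to confirm that every setting in config.json is filled in. The following Python sketch checks for the five keys listed in step 6; the function name and the sample values are hypothetical.

```python
REQUIRED_KEYS = (
    "IOT_THINGNAME",
    "IOT_CORE_ENDPOINT",
    "IOT_GET_CREDENTIAL_ENDPOINT",
    "ROLE_ALIAS",
    "AWS_DEFAULT_REGION",
)


def missing_settings(config: dict) -> list:
    """Return the required keys that are absent or empty in the config."""
    missing = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Hypothetical partially completed configuration.
sample_config = {
    "IOT_THINGNAME": "ExampleRobot",
    "IOT_CORE_ENDPOINT": "",
    "ROLE_ALIAS": "robot-camera-streaming-role-alias",
    "AWS_DEFAULT_REGION": "us-east-1",
}
print(missing_settings(sample_config))
```

Loading the real file with json.load and passing the result to missing_settings gives the same check on the Raspberry Pi.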

View the Kinesis video stream

The following steps create a WebRTC connection with the robot to view the live stream.

  1. Navigate to the Amazon Kinesis Video Streams console.
  2. Choose Signaling channels from the left menu.
  3. Choose the channel that corresponds with the name of your robot.
  4. Open the Media Playback card.
  5. After a moment, a WebRTC peer to peer connection is negotiated and live video is displayed.
    An animated gif demonstrating a live video stream from the robot.

Sending drive commands

The serverless backend includes an Amazon API Gateway REST endpoint that publishes JSON messages to the Python script on the robot.

The robot expects a message:

{ "action": <direction> }

Where direction can be "forward", "backwards", "left", or "right".
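
As a sketch of how a received message might map to a drive command, the following Python function parses the JSON payload and looks up a pair of motor speeds. The speed values and the function name are hypothetical and are not taken from the robot's main.py.

```python
import json

# Hypothetical (left, right) motor speeds for each supported direction.
MOTOR_SPEEDS = {
    "forward": (100, 100),
    "backwards": (-100, -100),
    "left": (-100, 100),
    "right": (100, -100),
}


def motor_speeds_for(payload: str):
    """Return the motor speeds for a message, or None if it maps to no command."""
    try:
        message = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(message, dict):
        return None
    action = message.get("action")
    if not isinstance(action, str):
        return None
    return MOTOR_SPEEDS.get(action)


print(motor_speeds_for('{"action": "forward"}'))
print(motor_speeds_for('{"action": "jump"}'))
```

Returning None for anything unrecognized means malformed or unexpected messages leave the motors untouched.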

  1. While the Python script is running on the robot, open another terminal window.
  2. Run this command to tell the robot to drive forward. Replace <API-URL> using the endpoint listed under Outputs in the CloudFormation stack for the serverless application.
    curl -d '{"action":"forward"}' -H "Content-Type: application/json" -X POST https://<API-URL>/publish

    An animated gif demonstrating the robot being driven from a REST request.
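
The same request can be prepared from Python using only the standard library. In this sketch, the function name is hypothetical and https://example.invalid stands in for the API URL from the stack outputs; the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_drive_request(api_url: str, direction: str) -> urllib.request.Request:
    """Build the POST request that asks the robot to drive in one direction."""
    body = json.dumps({"action": direction}).encode("utf-8")
    return urllib.request.Request(
        api_url.rstrip("/") + "/publish",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_drive_request("https://example.invalid", "forward")
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```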


Conclusion

In this post, I show how to build and program a telepresence robot with remote control and a live video feed in the cloud. I do this by installing a Python application on a Raspberry Pi robot and deploying a serverless application.

The Python application uses AWS IoT credentials to receive remote commands from the cloud and transmit live video using Kinesis Video Streams with WebRTC. The serverless application deploys a REST endpoint using API Gateway and a Lambda function. Any application that can connect to the endpoint can drive the robot.

In part two, I build on this project by deploying a web interface for the robot using AWS Amplify.

A preview of the web frontend built in the next blog.



Use AWS Lambda authorizers with a third-party identity provider to secure Amazon API Gateway REST APIs

Post Syndicated from Bryant Bost original https://aws.amazon.com/blogs/security/use-aws-lambda-authorizers-with-a-third-party-identity-provider-to-secure-amazon-api-gateway-rest-apis/

Note: This post focuses on Amazon API Gateway REST APIs used with OAuth 2.0 and custom AWS Lambda authorizers. API Gateway also offers HTTP APIs, which provide native OAuth 2.0 features. For more information about which is right for your organization, see Choosing Between HTTP APIs and REST APIs.

Amazon API Gateway is a fully managed AWS service that simplifies the process of creating and managing REST APIs at any scale. If you are new to API Gateway, check out Amazon API Gateway Getting Started to get familiar with core concepts and terminology. In this post, I will demonstrate how an organization using a third-party identity provider can use AWS Lambda authorizers to implement a standard token-based authorization scheme for REST APIs that are deployed using API Gateway.

In the context of this post, a third-party identity provider refers to an entity that exists outside of AWS and that creates, manages, and maintains identity information for your organization. This identity provider issues cryptographically signed tokens to users containing information about the user identity and their permissions. In order to use these non-AWS tokens to control access to resources within API Gateway, you will need to define custom authorization code using a Lambda function to “map” token characteristics to API Gateway resources and permissions.

Defining custom authorization code is not the only way to implement authorization in API Gateway and ensure resources can only be accessed by the correct users. In addition to Lambda authorizers, API Gateway offers several “native” options that use existing AWS services to control resource access and do not require any custom code. To learn more about the established practices and authorization mechanisms, see Controlling and Managing Access to a REST API in API Gateway.

Lambda authorizers are a good choice for organizations that use third-party identity providers directly (without federation) to control access to resources in API Gateway, or organizations requiring authorization logic beyond the capabilities offered by “native” authorization mechanisms.

Benefits of using third-party tokens with API Gateway

Using a Lambda authorizer with third-party tokens in API Gateway can provide the following benefits:

  • Integration of third-party identity provider with API Gateway: If your organization has already adopted a third-party identity provider, building a Lambda authorizer allows users to access API Gateway resources by using their third-party credentials without having to configure additional services, such as Amazon Cognito. This can be particularly useful if your organization is using the third-party identity provider for single sign-on (SSO).
  • Minimal impact to client applications: If your organization has an application that is already configured to sign in to a third-party identity provider and issue requests using tokens, then minimal changes will be required to use this solution with API Gateway and a Lambda authorizer. By using credentials from your existing identity provider, you can integrate API Gateway resources into your application in the same manner that non-AWS resources are integrated.
  • Flexibility of authorization logic: Lambda authorizers allow for the additional customization of authorization logic, beyond validation and inspection of tokens.

Solution overview

The following diagram shows the authentication/authorization flow for using third-party tokens in API Gateway:

Figure 1: Example Solution Architecture

  1. After a successful login, the third-party identity provider issues an access token to a client.
  2. The client issues an HTTP request to API Gateway and includes the access token in the HTTP Authorization header.
  3. The API Gateway resource forwards the token to the Lambda authorizer.
  4. The Lambda authorizer authenticates the token with the third-party identity provider.
  5. The Lambda authorizer executes the authorization logic and creates an identity management policy.
  6. API Gateway evaluates the identity management policy against the API Gateway resource that the user requested and either allows or denies the request. If allowed, API Gateway forwards the user request to the API Gateway resource.


Prerequisites

To build the architecture described in the solution overview, you will need the following:

  • An identity provider: Lambda authorizers can work with any type of identity provider and token format. The post uses a generic OAuth 2.0 identity provider and JSON Web Tokens (JWT).
  • An API Gateway REST API: You will eventually configure this REST API to rely on the Lambda authorizer for access control.
  • A means of retrieving tokens from your identity provider and calling API Gateway resources: This can be a web application, a mobile application, or any application that relies on tokens for accessing API resources.

For the REST API in this example, I use API Gateway with a mock integration. To create this API yourself, you can follow the walkthrough in Create a REST API with a Mock Integration in Amazon API Gateway.

You can use any type of client to retrieve tokens from your identity provider and issue requests to API Gateway, or you can consult the documentation for your identity provider to see if you can retrieve tokens directly and issue requests using a third-party tool such as Postman.

Before you proceed to building the Lambda authorizer, you should be able to retrieve tokens from your identity provider and issue HTTP requests to your API Gateway resource with the token included in the HTTP Authorization header. This post assumes that the identity provider issues OAuth JWT tokens, and the example below shows a raw HTTP request addressed to the mock API Gateway resource with an OAuth JWT access token in the HTTP Authorization header. This request should be sent by the client application that you are using to retrieve your tokens and issue HTTP requests to the mock API Gateway resource.

# Example HTTP Request using a Bearer token
GET /dev/my-resource/?myParam=myValue HTTP/1.1
Host: rz8w6b1ik2.execute-api.us-east-1.amazonaws.com
Authorization: Bearer eyJraWQiOiJ0ekgtb1Z5eEpPSF82UDk3...

Building a Lambda authorizer

When you configure a Lambda authorizer to serve as the authorization source for an API Gateway resource, the Lambda authorizer is invoked by API Gateway before the resource is called. Check out the Lambda Authorizer Authorization Workflow for more details on how API Gateway invokes and exchanges information with Lambda authorizers. The core functionality of the Lambda authorizer is to generate a well-formed identity management policy that dictates the allowed actions of the user, such as which APIs the user can access. The Lambda authorizer will use information in the third-party token to create the identity management policy based on “permissions mapping” documents that you define — I will discuss these permissions mapping documents in greater detail below.

After the Lambda authorizer generates an identity management policy, the policy is returned to API Gateway and API Gateway uses it to evaluate whether the user is allowed to invoke the requested API. You can optionally configure a setting in API Gateway to automatically cache the identity management policy so that subsequent API invocations with the same token do not invoke the Lambda authorizer, but instead use the identity management policy that was generated on the last invocation.

In this post, you will build your Lambda authorizer to receive an OAuth access token and validate its authenticity with the token issuer, then implement custom authorization logic to use the OAuth scopes present in the token to create an identity management policy that dictates which APIs the user is allowed to access. You will also configure API Gateway to cache the identity management policy that is returned by the Lambda authorizer. These patterns provide the following benefits:

  • Leverage third-party identity management services: Validating the token with the third party allows for consolidated management of services such as token verification, token expiration, and token revocation.
  • Cache to improve performance: Caching the token and identity management policy in API Gateway removes the need to call the Lambda authorizer for each invocation. Caching a policy can improve performance; however, this increased performance comes with additional security considerations. These considerations are discussed below.
  • Limit access with OAuth scopes: Using the scopes present in the access token, along with custom authorization logic, to generate an identity management policy and limit resource access is a familiar OAuth practice and serves as a good example of customizable authentication logic. Refer to Defining Scopes for more information on OAuth scopes and how they are typically used to control resource access.

The Lambda authorizer is invoked with the following object as the event parameter when API Gateway is configured to use a Lambda authorizer with the token event payload; refer to Input to an Amazon API Gateway Lambda Authorizer for more information on the types of payloads that are compatible with Lambda authorizers. Since you are using a token-based authorization scheme, you will use the token event payload. This payload contains the methodArn, which is the Amazon Resource Name (ARN) of the API Gateway resource that the request was addressed to. The payload also contains the authorizationToken, which is the third-party token that the user included with the request.

# Lambda Token Event Payload
{
  type: 'TOKEN',
  methodArn: 'arn:aws:execute-api:us-east-1:2198525...',
  authorizationToken: 'Bearer eyJraWQiOiJ0ekgt...'
}
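
To make the two payload fields concrete, the sketch below splits a token event into the bare token and the components of the method ARN, which follows the layout arn:aws:execute-api:{region}:{account}:{apiId}/{stage}/{verb}/{resource}. The function name parse_token_event and the returned key names are hypothetical. The authorizer in this post only needs the token, so the ARN parsing is shown for illustration.

```python
def parse_token_event(event):
    """Split a TOKEN authorizer event into the bare token and the ARN parts."""
    token = event["authorizationToken"]
    if token.startswith("Bearer "):
        token = token[len("Bearer "):]

    # The first five colon-separated fields are fixed; the sixth holds
    # {apiId}/{stage}/{verb}/{resource...}.
    arn_parts = event["methodArn"].split(":", 5)
    api_id, stage, verb, *resource = arn_parts[5].split("/")
    return {
        "token": token,
        "region": arn_parts[3],
        "account_id": arn_parts[4],
        "api_id": api_id,
        "stage": stage,
        "http_verb": verb,
        "resource": "/".join(resource),
    }
```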

Upon receiving this event, your Lambda authorizer will issue an HTTP POST request to your identity provider to validate the token, and use the scopes present in the third-party token with a permissions mapping document to generate and return an identity management policy that contains the allowed actions of the user within API Gateway. Lambda authorizers can be written in any Lambda-supported language. You can explore some starter code templates on GitHub. The example function in this post uses Node.js 10.x.

The Lambda authorizer code in this post uses a static permissions mapping document. This document is represented by apiPermissions. For a complex or highly dynamic permissions document, this document can be decoupled from the Lambda authorizer and exported to Amazon Simple Storage Service (Amazon S3) or Amazon DynamoDB for simplified management. The static document contains the ARN of the deployed API, the API Gateway stage, the API resource, the HTTP method, and the allowed token scope. The Lambda authorizer then generates an identity management policy by evaluating the scopes present in the third-party token against those present in the document.

The fragment below shows an example permissions mapping. Under this mapping, an HTTP GET request to the my-resource resource in the dev stage of the API with the ARN arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2 is allowed only if it carries a valid token that contains the email scope.

# Example permissions document
{
  "arn": "arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2",
  "resource": "my-resource",
  "stage": "dev",
  "httpVerb": "GET",
  "scope": "email"
}
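
The way a mapping entry turns into a policy statement can be sketched in a few lines. The Python below is an illustrative port of the idea, not the post's implementation (which is the Node.js function further down). The names API_PERMISSIONS and statements_for_scopes are hypothetical.

```python
# Same shape as the permissions document above.
API_PERMISSIONS = [
    {
        "arn": "arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2",
        "resource": "my-resource",
        "stage": "dev",
        "httpVerb": "GET",
        "scope": "email",
    }
]


def statements_for_scopes(scopes, permissions=API_PERMISSIONS):
    """Return Allow statements for each mapping entry whose scope the token holds."""
    statements = []
    for entry in permissions:
        if entry["scope"] in scopes:
            # Resource layout: {apiArn}/{stage}/{verb}/{resource}/
            resource = "/".join(
                [entry["arn"], entry["stage"], entry["httpVerb"], entry["resource"], ""]
            )
            statements.append(
                {"Action": "execute-api:Invoke", "Effect": "Allow", "Resource": resource}
            )
    if not statements:
        # No scope matched: fall back to an explicit deny-all statement.
        statements.append(
            {"Action": "execute-api:Invoke", "Effect": "Deny", "Resource": "*/*/*/*/"}
        )
    return statements
```

A token with the scopes openid and email yields one Allow statement for my-resource, while a token with only openid yields the deny-all statement.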

The logic to create the identity management policy can be found in the generateIAMPolicy() method of the Lambda function. This method serves as a good general example of the extent of customization possible in Lambda authorizers. While the method in the example relies solely on token scopes, you can also use additional information such as request context, user information, source IP address, user agents, and so on, to generate the returned identity management policy.

Upon invocation, the Lambda authorizer below performs the following procedure:

  1. Receive the token event payload, and isolate the token string (trim “Bearer ” from the token string, if present).
  2. Verify the token with the third-party identity provider.

    Note: This Lambda function does not include this functionality. The method, verifyAccessToken(), will need to be customized based on the identity provider that you are using. This code assumes that the verifyAccessToken() method returns a Promise that resolves to the decoded token in JSON format.

  3. Retrieve the scopes from the decoded token. This code assumes these scopes can be accessed as an array at claims.scp in the decoded token.
  4. Iterate over the scopes present in the token and create identity and access management (IAM) policy statements based on entries in the permissions mapping document that contain the scope in question.
  5. Create a complete, well-formed IAM policy using the generated IAM policy statements. Refer to IAM JSON Policy Elements Reference for more information on programmatically building IAM policies.
  6. Return complete IAM policy to API Gateway.

    /*
     * Sample Lambda Authorizer to validate tokens originating from
     * 3rd Party Identity Provider and generate an IAM Policy
     */
    const apiPermissions = [
      {
        "arn": "arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2", // NOTE: Replace with your API Gateway API ARN
        "resource": "my-resource", // NOTE: Replace with your API Gateway Resource
        "stage": "dev", // NOTE: Replace with your API Gateway Stage
        "httpVerb": "GET",
        "scope": "email"
      }
    ];

    var generatePolicyStatement = function (apiName, apiStage, apiVerb, apiResource, action) {
      'use strict';
      // Generate an IAM policy statement
      var statement = {};
      statement.Action = 'execute-api:Invoke';
      statement.Effect = action;
      var methodArn = apiName + "/" + apiStage + "/" + apiVerb + "/" + apiResource + "/";
      statement.Resource = methodArn;
      return statement;
    };

    var generatePolicy = function (principalId, policyStatements) {
      'use strict';
      // Generate a fully formed IAM policy
      var authResponse = {};
      authResponse.principalId = principalId;
      var policyDocument = {};
      policyDocument.Version = '2012-10-17';
      policyDocument.Statement = policyStatements;
      authResponse.policyDocument = policyDocument;
      return authResponse;
    };

    var verifyAccessToken = function (accessToken) {
      'use strict';
      /*
       * Verify the access token with your Identity Provider here (check if your
       * Identity Provider provides an SDK).
       * This example assumes this method returns a Promise that resolves to
       * the decoded token, you may need to modify your code according to how
       * your token is verified and what your Identity Provider returns.
       */
      // Placeholder: reject until verification is implemented, so that the
      // handler below falls through to the deny-all policy.
      return Promise.reject(new Error('verifyAccessToken() is not implemented'));
    };

    var generateIAMPolicy = function (scopeClaims) {
      'use strict';
      // Declare empty policy statements array
      var policyStatements = [];
      // Iterate over API Permissions
      for ( var i = 0; i < apiPermissions.length; i++ ) {
        // Check if the token scopes contain the scope of this API permission
        if ( scopeClaims.indexOf(apiPermissions[i].scope) > -1 ) {
          // User token has appropriate scope, add API permission to policy statements
          policyStatements.push(generatePolicyStatement(apiPermissions[i].arn, apiPermissions[i].stage, apiPermissions[i].httpVerb,
                                                        apiPermissions[i].resource, "Allow"));
        }
      }
      // Check if no policy statements are generated, if so, create default deny all policy statement
      if (policyStatements.length === 0) {
        var policyStatement = generatePolicyStatement("*", "*", "*", "*", "Deny");
        policyStatements.push(policyStatement);
      }
      return generatePolicy('user', policyStatements);
    };

    exports.handler = async function(event, context) {
      // Declare Policy
      var iamPolicy = null;
      // Capture raw token and trim 'Bearer ' string, if present
      var token = event.authorizationToken.replace("Bearer ", "");
      // Validate token
      await verifyAccessToken(token).then(data => {
        // Retrieve token scopes
        var scopeClaims = data.claims.scp;
        // Generate IAM Policy
        iamPolicy = generateIAMPolicy(scopeClaims);
      })
      .catch(err => {
        // Generate default deny all policy statement if there is an error
        var policyStatements = [];
        var policyStatement = generatePolicyStatement("*", "*", "*", "*", "Deny");
        policyStatements.push(policyStatement);
        iamPolicy = generatePolicy('user', policyStatements);
      });
      return iamPolicy;
    };

The following is an example of the identity management policy that is returned from your function.

# Example IAM Policy
{
  "principalId": "user",
  "policyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": "execute-api:Invoke",
        "Effect": "Allow",
        "Resource": "arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2/dev/GET/my-resource/"
      }
    ]
  }
}

It is important to note that the Lambda authorizer above is not considering the method or resource that the user is requesting. This is because you want to generate a complete identity management policy that contains all the API permissions for the user, instead of a policy that only contains allow/deny for the requested resource. By generating a complete policy, this policy can be cached by API Gateway and used if the user invokes a different API while the policy is still in the cache. Caching the policy can reduce API latency from the user perspective, as well as the total amount of Lambda invocations; however, it can also increase vulnerability to Replay Attacks and acceptance of expired/revoked tokens.

Shorter cache lifetimes introduce more latency to API calls (that is, the Lambda authorizer must be called more frequently), while longer cache lifetimes introduce the possibility of a token expiring or being revoked by the identity provider, but still being used to return a valid identity management policy. For example, the following scenario is possible when caching tokens in API Gateway:

  • Identity provider stamps access token with an expiration date of 12:30.
  • User calls API Gateway with access token at 12:29.
  • Lambda authorizer generates identity management policy and API Gateway caches the token/policy pair for 5 minutes.
  • User calls API Gateway with same access token at 12:32.
  • API Gateway evaluates access against policy that exists in the cache, despite original token being expired.

Since tokens are not re-validated by the Lambda authorizer or API Gateway once they are placed in the API Gateway cache, long cache lifetimes may also increase susceptibility to Replay Attacks. Longer cache lifetimes and large identity management policies can increase the performance of your application, but must be evaluated against the trade-off of increased exposure to certain security vulnerabilities.
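
The size of this exposure is simple arithmetic: a cached policy can outlive its token by at most the time between the token's expiry and the cache entry's expiry. The sketch below works through the 12:30 scenario above. The function name stale_window is hypothetical, and it assumes the first call is made while the token is still valid.

```python
from datetime import datetime, timedelta


def stale_window(token_expiry, first_call, cache_ttl):
    """Return how long a cached policy can outlive the token it was built from.

    Assumes first_call happens while the token is still valid.
    """
    cache_expiry = first_call + cache_ttl
    return max(cache_expiry - token_expiry, timedelta(0))


# Scenario from the list above: token expires at 12:30, first call at 12:29,
# policy cached for 5 minutes. The calendar date is arbitrary.
window = stale_window(
    token_expiry=datetime(2020, 1, 1, 12, 30),
    first_call=datetime(2020, 1, 1, 12, 29),
    cache_ttl=timedelta(minutes=5),
)
print(window)  # 0:04:00
```

With a TTL of 1 second, as configured later in this post, the same call returns a zero-length window.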

Deploying the Lambda authorizer

To deploy your Lambda authorizer, you first need to create and deploy a Lambda deployment package containing your function code and dependencies (if applicable). Lambda authorizer functions behave the same as other Lambda functions in terms of deployment and packaging. For more information on packaging and deploying a Lambda function, see AWS Lambda Deployment Packages in Node.js. For this example, you should name your Lambda function myLambdaAuth and use a Node.js 10.x runtime environment.

After the function is created, add the Lambda authorizer to API Gateway.

  1. Navigate to API Gateway and in the navigation pane, under APIs, select the API you configured earlier.
  2. Under your API name, choose Authorizers, then choose Create New Authorizer.
  3. Under Create Authorizer, do the following:
    1. For Name, enter a name for your Lambda authorizer. In this example, the authorizer is named Lambda-Authorizer-Demo.
    2. For Type, select Lambda.
    3. For Lambda Function, select the AWS Region you created your function in, then enter the name of the Lambda function you just created.
    4. Leave Lambda Invoke Role empty.
    5. For Lambda Event Payload, choose Token.
    6. For Token Source, enter Authorization.
    7. For Token Validation, enter:
      ^(Bearer )[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.([a-zA-Z0-9\-_]+)$

      This represents a regular expression for validating that tokens match JWT format (more below).

    8. For Authorization Caching, select Enabled and enter a time to live (TTL) of 1 second.
  4. Select Save.


Figure 2: Create a new Lambda authorizer

This configuration passes the token event payload mentioned above to your Lambda authorizer, and is necessary since you are using tokens (Token Event Payload) for authentication, rather than request parameters (Request Event Payload). For more information, see Use API Gateway Lambda Authorizers.

In this solution, the token source is the Authorization header of the HTTP request. If you know the expected format of your token, you can include a regular expression in the Token Validation field, which automatically rejects any request that does not match the regular expression. Token validations are not mandatory. This example assumes the token is a JWT.

# Regex matching JWT Bearer Tokens  
^(Bearer )[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.([a-zA-Z0-9\-_]+)$
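
You can try the expression locally before entering it in the console. The sketch below runs it through Python's re module, which accepts this pattern unchanged. That API Gateway's regular expression engine behaves identically in every edge case is an assumption. The function name passes_token_validation is hypothetical.

```python
import re

# The Token Validation expression from above, unchanged.
JWT_BEARER_PATTERN = re.compile(
    r"^(Bearer )[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.([a-zA-Z0-9\-_]+)$"
)


def passes_token_validation(header_value):
    """Return True if the Authorization header value has the Bearer JWT shape."""
    return JWT_BEARER_PATTERN.match(header_value) is not None
```

The expression checks only the shape of the header value (the Bearer prefix followed by three dot-separated base64url segments). It does not verify the signature or the claims, which remains the job of the Lambda authorizer.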

Here, you can also configure how long the token/policy pair will be cached in API Gateway. This example enables caching with a TTL of 1 second.

In this solution, you leave the Lambda Invoke Role field empty. This field is used to provide an IAM role that allows API Gateway to execute the Lambda authorizer. If left blank, API Gateway configures a default resource-based policy that allows it to invoke the Lambda authorizer.

The final step is to point your API Gateway resource to your Lambda authorizer. Select the configured API Resource and HTTP method.

  1. Navigate to API Gateway and in the navigation pane, under APIs, select the API you configured earlier.
  2. Select the GET method.

    Figure 3: GET Method Execution

  3. Select Method Request.
  4. Under Settings, edit Authorization and select the authorizer you just configured (in this example, Lambda-Authorizer-Demo).

    Figure 4: Select your API authorizer

Deploy the API to an API Gateway stage that matches the stage configured in the Lambda authorizer permissions document (apiPermissions variable).

  1. Navigate to API Gateway and in the navigation pane, under APIs, select the API you configured earlier.
  2. Select the / resource of your API.
  3. Select Actions, and under API Actions, select Deploy API.
  4. For Deployment stage, select [New Stage] and for the Stage name, enter dev. Leave Stage description and Deployment description blank.
  5. Select Deploy.

    Figure 5: Deploy your API stage

Testing the results

With the Lambda authorizer configured as your authorization source, you are now able to access the resource only if you provide a valid token that contains the email scope.

The following example shows how to issue an HTTP request with curl to your API Gateway resource using a valid token that contains the email scope passed in the HTTP Authorization header. Here, you are able to authenticate and receive an appropriate response from API Gateway.

# HTTP Request (including valid token with "email" scope)
$ curl -X GET \
> 'https://rz8w6b1ik2.execute-api.us-east-1.amazonaws.com/dev/my-resource/?myParam=myValue' \
> -H 'Authorization: Bearer eyJraWQiOiJ0ekgtb1Z5eE...'
{
  "statusCode" : 200,
  "message" : "Hello from API Gateway!"
}

The following JSON object represents the decoded JWT payload used in the previous example. The JSON object captures the token scopes in scp, and you can see that the token contained the email scope.

Figure 6: JSON object that contains the email scope
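
To inspect the scopes in a token of your own, you can decode its payload segment locally. The sketch below uses only the standard library and assumes a standard three-segment JWT whose payload is base64url-encoded JSON. It does not verify the signature, so it is suitable for debugging only and never for authorization decisions. The function name decode_jwt_payload is hypothetical.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode the payload (second) segment of a JWT WITHOUT verifying the signature."""
    payload_segment = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Calling decode_jwt_payload(token)["scp"] on the token from the previous request should list the email scope, assuming your identity provider stores scopes in the scp claim as this post does.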

If you provide a token that is expired, is invalid, or that does not contain the email scope, then you are not able to access the resource. The following example shows a request to your API Gateway resource with a valid token that does not contain the email scope. In this example, the Lambda authorizer rejects the request.

# HTTP Request (including token without "email" scope)
$ curl -X GET \
> 'https://rz8w6b1ik2.execute-api.us-east-1.amazonaws.com/dev/my-resource/?myParam=myValue' \
> -H 'Authorization: Bearer eyJraWQiOiJ0ekgtb1Z5eE...'
{
  "Message" : "User is not authorized to access this resource with an explicit deny"
}

The following JSON object represents the decoded JWT payload used in the above example; it does not include the email scope.

Figure 7: JSON object that does not contain the email scope

If you provide no token, or you provide a token not matching the provided regular expression, then you are immediately rejected by API Gateway without invoking the Lambda authorizer. API Gateway only forwards tokens to the Lambda authorizer that have the HTTP Authorization header and pass the token validation regular expression, if a regular expression was provided. If the request does not pass token validation or does not have an HTTP Authorization header, API Gateway rejects it with a default HTTP 401 response. The following example shows how to issue a request to your API Gateway resource using an invalid token that does not match the regular expression you configured on your authorizer. In this example, API Gateway rejects your request automatically without invoking the authorizer.

# HTTP Request (including a token that is not a JWT)
$ curl -X GET \
> 'https://rz8w6b1ik2.execute-api.us-east-1.amazonaws.com/dev/my-resource/?myParam=myValue' \
> -H 'Authorization: Bearer ThisIsNotAJWT'
{
  "Message" : "Unauthorized"
}

These examples demonstrate how your Lambda authorizer allows and denies requests based on the token format and the token content.


Conclusion

In this post, you saw how Lambda authorizers can be used with API Gateway to implement a token-based authorization scheme using third-party tokens.

Lambda authorizers can provide a number of benefits:

  • Leverage third-party identity management services directly, without identity federation.
  • Implement custom authorization logic.
  • Cache identity management policies to improve performance of authorization logic (while keeping in mind security implications).
  • Minimally impact existing client applications.

For organizations seeking an alternative to Amazon Cognito User Pools and Amazon Cognito identity pools, Lambda authorizers can provide complete, secure, and flexible authentication and authorization services to resources deployed with Amazon API Gateway. For more information about Lambda authorizers, see API Gateway Lambda Authorizers.


Bryant Bost

Bryant Bost is an Application Consultant for AWS Professional Services based out of Washington, DC. As a consultant, he supports customers with architecting, developing, and operating new applications, as well as migrating existing applications to AWS. In addition to web application development, Bryant specializes in serverless and container architectures, and has authored several posts on these topics.

Ingest Excel data automatically into Amazon QuickSight

Post Syndicated from Ying Wang original https://aws.amazon.com/blogs/big-data/ingest-excel-data-automatically-into-amazon-quicksight/

Amazon QuickSight is a fast, cloud-powered, business intelligence (BI) service that makes it easy to deliver insights to everyone in your organization. This post demonstrates how to build a serverless data ingestion pipeline to automatically import frequently changed data into a SPICE (Super-fast, Parallel, In-memory Calculation Engine) dataset of Amazon QuickSight dashboards.

It is sometimes quite difficult to be agile in BI development. For example, end-users that perform self-service analytics may want to add their additional ad hoc data into an existing dataset and have a view of the corresponding updated dashboards and reports in a timely fashion. However, dashboards and reports are usually built on top of a single online analytic processing (OLAP) data warehouse with a rigid schema. Therefore, an end-user (who doesn’t have permission to update the dataset directly) has to go through a complicated and time-consuming procedure to have their data updated in the warehouse. Alternatively, they could open a ticket for you to edit the dataset manually, but it is still a very inconvenient solution that involves a significant amount of repetitive manual effort, especially if they frequently need to update the data.

Therefore, an automated data processing tool that can perform real-time data ingestion is very useful. This post discusses a tool that, when an end-user uploads Excel files into Amazon S3 or any other data file sharing location, performs the following end-to-end process:

  • Cleans the raw data from the Excel files, which might contain a lot of formatting and redundant information.
  • Ingests the cleaned data.
  • Performs a status check to monitor the data cleaning and ingestion process.
  • Sends a notification of the results to the end-user and BI development team.

With the recently launched feature cross data source joins, you can join across all data sources that Amazon QuickSight supports, including file-to-file, file-to-database, and database-to-database joins. For more information, see Joining across data sources on Amazon QuickSight.

In addition to cross data source joins, Amazon QuickSight has also launched new APIs for SPICE ingestion. For more information, see Importing Data into SPICE and Evolve your analytics with Amazon QuickSight’s new APIs and theming capabilities.

This post shows how you can combine these features to build an agile solution that cleans and ingests an Excel file into a SPICE dataset of Amazon QuickSight automatically. In SPICE, the real-time data from Excel joins with the Amazon Redshift OLAP data warehouse, and end-users receive Amazon SNS messages about its status throughout the process.

Solution overview

The following diagram illustrates the workflow of the solution.

The workflow includes the following steps:

  1. An end-user uploads an Excel file into an on-premises shared folder.
  2. The Excel file is uploaded to the Amazon S3 bucket excel-raw-data. Alternatively, the end-user can skip this step and upload the Excel file into this Amazon S3 bucket directly.
  3. This upload event triggers the SNS message Excel-Raw-Data-Uploaded.
  4. Both the end-user and the BI team receive a message about the new upload event.
  5. The upload event also triggers the AWS Lambda function DataClean to process the Excel data.
  6. The Lambda function removes the formatting and redundant information of the Excel file, saves the cleaned data as a CSV file into the S3 bucket autoingestionqs, and publishes an SNS message to notify end-users about the data cleansing status.
  7. This cleaned CSV file is mapped as an Amazon Athena table.
  8. In the Amazon QuickSight SPICE dataset, this table joins with the Amazon Redshift table through the cross data source join functionality.
  9. The CSV file creation event in the S3 bucket autoingestionqs triggers the Lambda function qsAutoIngestion.
  10. This function calls the data ingestion API of Amazon QuickSight and checks the data ingestion status.
  11. When the data ingestion is complete, end-users receive the Ingestion-Finished SNS message.
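Because the DataClean function is subscribed to an SNS topic rather than triggered by S3 directly, the S3 notification reaches it wrapped inside an SNS message. The following is a minimal sketch of how a handler could unwrap that event. It is not the DataClean.py sample from the GitHub repo; the function names and the file-extension filter are assumptions made for illustration.

```python
import json
from urllib.parse import unquote_plus


def uploaded_objects(sns_event):
    """Yield (bucket, key) pairs from an SNS-delivered S3 event notification."""
    for sns_record in sns_event.get("Records", []):
        # The S3 notification arrives as a JSON string in the SNS message body.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        # Test notifications carry no "Records" list, so default to empty.
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys are URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            yield bucket, key


def lambda_handler(event, context):
    # Hypothetical handler body: only lists the Excel files that need cleaning.
    excel_suffixes = (".xlsx", ".xls")
    return [key for _, key in uploaded_objects(event) if key.lower().endswith(excel_suffixes)]
```

Decoding the key matters in practice: a file uploaded as "sales report.xlsx" appears in the notification as "sales+report.xlsx".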


For this walkthrough, you should have the following prerequisites:

Creating resources

Create your resources by launching the following AWS CloudFormation stack:

During the stack creation process, you have to provide a valid email address as the endpoint of Amazon SNS services. After the stack creation is successful, you have three SNS topics, two S3 buckets, and the corresponding IAM policies.


To implement this solution, complete the following steps:

  1. Enable SNS notification of new object creation events in the S3 bucket excel-raw-data. For more information, see How Do I Enable and Configure Event Notifications for an S3 Bucket?

    When an end-user uploads an Excel file into the excel-raw-data S3 bucket, the event triggers an Amazon SNS message. The following screenshot shows the example Excel file that this post uses.

    The following screenshot shows the SNS message Excel-Raw-Data-Uploaded, which includes details of the upload event.

  2. Download the sample code DataClean.py in Python 3.7 from the GitHub repo.
  3. Create a Lambda function named DataClean.
  4. Configure the function to be a subscriber of the SNS topic Excel-Raw-Data-Uploaded.
  5. Edit the SNS topic Cleaning-is-Done, and add the following code to the access policy:
    {
      "Sid": "example-statement-ID",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "SNS:Publish",
      "Resource": "arn:aws:sns:us-east-1:AWS Account ID:SNS Topic Name",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:lambda:us-east-1:AWSAccountID:function:DataClean"
        }
      }
    }

    The policy allows the Lambda function DataClean to trigger the SNS message Cleaning-is-Done.

    The function DataClean saves the CSV file of the cleaned data into the S3 bucket autoingestionqs. You should see the new CSV file in this bucket. See the following screenshot.

    When the Lambda function ends, it triggers the SNS message Cleaning-is-Done. The following screenshot shows the text of the notification message.

  6. Add an event notification into the S3 bucket autoingestionqs to trigger a Lambda function named qsAutoIngestion. This function calls the Amazon QuickSight data API to ingest data into the SPICE dataset.

    The cleaned CSV file in the S3 bucket autoingestionqs is mapped as an Athena table. The following screenshot shows the sample data in the CSV file.

    In the Amazon QuickSight SPICE dataset, the Athena table joins with the Amazon Redshift table through the cross data source join functionality.

  7. Create the SPICE dataset. For more information, see Joining across data sources on Amazon QuickSight.

    The following screenshot shows the Data page in Amazon QuickSight where your data set details appear. The Athena table joins a Redshift table.

    The new object creation event in the Amazon S3 bucket autoingestionqs triggers another Lambda function named qsAutoIngestion. This function calls the data ingestion API of Amazon QuickSight and checks the data ingestion status. If the data ingestion is completed successfully, end-users receive the SNS message Ingestion-Finished. You can download the sample code of qsAutoIngestion from the GitHub repo.
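The "start an ingestion, then check its status" behavior described for qsAutoIngestion can be sketched as follows. This is not the code from the GitHub repo. It assumes the `quicksight` argument behaves like a boto3 QuickSight client (its `create_ingestion` and `describe_ingestion` calls), and the polling interval, poll limit, and `TIMED_OUT` label are choices made for this sketch only.

```python
import time
import uuid

# Ingestion statuses after which polling can stop.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def ingest_and_wait(quicksight, account_id, dataset_id, poll_seconds=5, max_polls=60):
    """Start a SPICE ingestion and poll until it reaches a terminal state."""
    ingestion_id = str(uuid.uuid4())  # each ingestion needs its own ID
    quicksight.create_ingestion(
        AwsAccountId=account_id, DataSetId=dataset_id, IngestionId=ingestion_id
    )
    for _ in range(max_polls):
        response = quicksight.describe_ingestion(
            AwsAccountId=account_id, DataSetId=dataset_id, IngestionId=ingestion_id
        )
        status = response["Ingestion"]["IngestionStatus"]
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    return "TIMED_OUT"  # label used by this sketch only, not a QuickSight status
```

Passing the client in as a parameter keeps the polling logic testable without AWS access. A handler would typically publish the Ingestion-Finished message only when the returned status is `COMPLETED`.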

Cleaning up

To avoid incurring future charges, delete the resources you created: the two Lambda functions, three SNS topics, two S3 buckets, and corresponding IAM policies.


This post discussed how BI developers and architects can use the Amazon QuickSight data ingestion API, Lambda functions, and other AWS services to complete an end-to-end automation process. End-users can have their real-time data ingested and joined with OLAP data warehouse tables, and can visualize it promptly, without waiting for nightly or hourly ETL and without needing to understand the underlying development steps. You should now be equipped to construct a solution in a development environment and demo it to non-technical business end-users.


About the Author

Ying Wang is a Data Visualization Engineer with the Data & Analytics Global Specialty Practice in AWS Professional Services.

Halodoc: Building the Future of Tele-Health One Microservice at a Time

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/halodoc-building-the-future-of-tele-health-one-microservice-at-a-time/

Halodoc, a Jakarta-based healthtech platform, uses tele-health and artificial intelligence to connect patients, doctors, and pharmacies. Join builder Adrian De Luca for this special edition of This is My Architecture as he dives deep into the solutions architecture of this Indonesian healthtech platform that provides healthcare services in one of the most challenging traffic environments in the world.

Explore how the company evolved its monolithic backend into decoupled microservices with Amazon EC2 and Amazon Simple Queue Service (SQS), adopted serverless to cost-effectively support new user functionality with AWS Lambda, and manages the high volume and velocity of data with Amazon DynamoDB, Amazon Relational Database Service (RDS), and Amazon Redshift.

For more content like this, subscribe to our YouTube channels This is My Architecture, This is My Code, and This is My Model, or visit the This is My Architecture AWS website, which has search functionality and the ability to filter by industry, language, and service.

The AWS Serverless Application Repository adds sharing for AWS Organizations

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/the-aws-serverless-application-repository-adds-sharing-for-aws-organizations/

The AWS Serverless Application Repository (SAR) enables builders to package serverless applications and reuse these within their own AWS accounts, or share with a broader audience. Previously, SAR applications could only be shared with specific AWS account IDs or made publicly available to all users. For organizations with large numbers of AWS accounts, this means managing a large list of IDs. It also involves tracking when accounts are added or removed to the organizational group.

This new feature allows developers to share SAR applications with AWS Organizations without specifying a list of AWS account IDs. Additionally, if you add or remove accounts later from AWS Organizations, you do not need to manually maintain a list for sharing your SAR application. This new feature also brings more granular controls over the permissions granted, including the ability for master accounts to unshare applications.

This blog post explains these new features and how you can use them to share your new and existing SAR applications.

Sharing a new SAR application with AWS Organizations

First, find your AWS Organization ID by visiting the AWS Organizations console, and choose Settings from the menu bar. Copy your Organization ID from the Organization details card and save this for later. If you do not currently have an organization configured and want to use this feature, see this tutorial for instructions on how to set up an AWS Organization.

Organization ID

Go to the Serverless Application Repository, choose Publish Application and follow the process for publishing an application to your own account. After the application is published, you will see a new tab on the Application Details page called Sharing.

SAR sharing tab

From the Sharing tab, choose Create Statement in the Application policy statements card. To share the application with an entire organization:

  • Enter a Statement Id, which is a helpful reference for the policy statement.
  • Select “With an organization” from the list of sharing options.
  • Enter the Organization ID from earlier.
  • Check the option to “Enable all actions needed to deploy”.
  • Check the acknowledgment check box.
  • Choose Save.

Statement configuration

Now your application is published to all the account IDs within your AWS Organization. As you add policy statements to define how an application is shared, these appear as cards at the end of the Sharing tab.

Policy statements in the sharing tab

Existing shared SAR applications

If you have previously shared SAR applications with individual accounts, or shared these applications publicly, the policy statements have already been configured. In this case, the automatically generated policy statements reflect the existing scope of sharing, so there is no change in the level of visibility of the application.

For example, for a SAR application that was previously shared with two AWS account IDs, there is now a policy statement showing these two account principals and the permission to deploy the application. A Statement Id value is automatically generated as a random globally unique identifier.

GUID statement ID

To change these automatically generated policy statements, select the policy statement you want to change and choose Edit. The subsequent page allows you to modify the random Statement Id, configure the sharing options, and modify the deployment actions allowed.

Statement configuration

For a SAR application that was previously shared publicly, there is now a policy statement showing all accounts as an asterisk (*), with the permission to deploy. To disable public access, choose Edit in the Public Sharing card, and modify the settings on the panel.

Public sharing

More flexibility for defining permissions

This new feature allows you to define specific API actions allowed in each policy statement, and you can associate different permitted actions with different organizations. This allows you to more precisely control how applications are discovered and used within different accounts.

To learn more, read about the definition of these API actions in the SAR documentation.

Allowed actions

Any changes made to the resource policy sharing statements are effective immediately, and applications shared with AWS Organizations and other AWS accounts are only shared within the Region where the application was initially published. This new ability to share applications with organizations can be used in all Regions where the Serverless Application Repository is available.

In addition to creating and modifying resource policy statements within the AWS Management Console, builders can also choose to manage application sharing via the AWS SDK or AWS CLI.
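As an illustration of the SDK route, the following sketch builds a policy statement that shares an application with an organization. The field names follow the SAR application policy statement shape (`StatementId`, `Principals`, `PrincipalOrgIDs`, `Actions`); the helper name, the default action list, and the ID check are assumptions made for this example.

```python
import re

# Organization IDs are "o-" followed by 10 to 32 lowercase letters or digits.
ORG_ID_PATTERN = re.compile(r"o-[a-z0-9]{10,32}")


def organization_sharing_statement(organization_id, statement_id, allow_unshare=True):
    """Build a SAR policy statement that shares an application with one organization."""
    if not ORG_ID_PATTERN.fullmatch(organization_id):
        raise ValueError(f"not an AWS Organizations ID: {organization_id!r}")
    actions = ["Deploy"]
    if allow_unshare:
        # Lets the organization's master account unshare the application.
        actions.append("UnshareApplication")
    return {
        "StatementId": statement_id,
        "Principals": ["*"],  # the organization ID below narrows this wildcard
        "PrincipalOrgIDs": [organization_id],
        "Actions": actions,
    }
```

With boto3, the resulting dictionary would be passed in the `Statements` list of `put_application_policy` on a `serverlessrepo` client, along with the application ID. That call sets the whole policy, so any statements you want to keep need to be included in the same list.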

Unsharing an application from AWS Organizations

You can also unshare an application from AWS Organizations through the AWS Management Console by doing the following:

  1. From the Serverless Application Repository console, choose Available Applications in the left navigation pane.
  2. In the application’s tile, choose Unshare.
  3. In the unshare confirmation dialog, enter the Organization ID and application name, then choose Save.

To learn more, read about unsharing published applications.


This post shows how to enable sharing for SAR applications across AWS Organizations using application policy statements, and how to modify existing resource policies. Additionally, it covers how existing SAR applications that are already shared now expose a resource policy that reflects the previously selected sharing preferences, and how you can also modify this policy statement.

The new sharing interface in the SAR console continues to support the previous capabilities of sharing applications with other AWS accounts, and making applications public. This new feature makes it much easier for builders to share SAR applications across a large number of accounts within an organization, without needing to manually manage a list of account IDs, and provides more granularity over the access controls granted.

For more information about sharing applications using AWS Organization IDs, visit the SAR documentation.

How to run AWS CloudHSM workloads on AWS Lambda

Post Syndicated from Mohamed AboElKheir original https://aws.amazon.com/blogs/security/how-to-run-aws-cloudhsm-workloads-on-aws-lambda/

AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to generate and use your own encryption keys on the AWS Cloud. With CloudHSM, you can manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs. CloudHSM also automatically manages synchronization, high availability and failover within a cluster.

When the service first launched, many customers ran CloudHSM workloads on Amazon Elastic Compute Cloud (Amazon EC2), which required the CloudHSM client to be installed on the Amazon EC2 instance in order to communicate with the CloudHSM cluster. Today, we see customers who are interested in leveraging CloudHSM for serverless workloads using AWS Lambda, but when using Lambda there is no “instance” to install the CloudHSM client on. This blog post shows a workaround that can be used to satisfy the CloudHSM client installation requirement on Lambda functions to be able to run CloudHSM workloads within these Lambda functions.

The workaround is performed by first packaging the CloudHSM client and its requirements in a Lambda layer, and then running the CloudHSM client in a child process from within the Lambda function code to allow communication with the HSMs in your CloudHSM cluster. By leveraging this approach, you gain the benefits of serverless computing (such as increased scalability and decreased admin overhead), as well as the ability to integrate with other AWS services like Amazon CloudWatch Events, Amazon Simple Storage Service (Amazon S3) and AWS Config.

Why would I want to run CloudHSM workloads on Lambda?

Below are some specific use cases enabled by this solution:

  1. When a file is added to an Amazon S3 bucket, you can trigger a Lambda function to encrypt or decrypt the file using keys stored in CloudHSM.
  2. When a file is added to an Amazon S3 bucket, you can trigger a Lambda function to create a digital signature for the file using a private key stored in CloudHSM. This digital signature can then be used to ensure file integrity.
  3. You can create a custom AWS Config rule that checks to ensure files in a directory or a bucket have not been tampered with by verifying their digital signatures using keys stored in CloudHSM.

Solution overview

This solution shows you how to package the CloudHSM client binary and its dependencies (configuration files and libraries) as well as the CloudHSM Java JCE library to a Lambda layer which is attached to the Lambda function. This enables the function to run the CloudHSM client daemon in the background as a child process, allowing it to connect to the CloudHSM cluster and to perform cryptographic tasks such as encryption and decryption operations.
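The child-process pattern described above is independent of the language used for the function. The sample in this post is written in Java; the following Python sketch only illustrates the lifecycle (start the daemon, do the work, stop the daemon) and assumes a fixed startup delay in place of a real readiness check.

```python
import subprocess
import sys
import time
from contextlib import contextmanager


@contextmanager
def background_process(command, startup_seconds=0.0):
    """Run command as a child process for the duration of the with-block."""
    child = subprocess.Popen(command)
    try:
        # Crude wait; a real client needs a proper readiness check.
        time.sleep(startup_seconds)
        if child.poll() is not None:
            raise RuntimeError(f"child exited early with code {child.returncode}")
        yield child
    finally:
        # Always stop the child, even if the work inside the block failed.
        if child.poll() is None:
            child.terminate()
            try:
                child.wait(timeout=5)
            except subprocess.TimeoutExpired:
                child.kill()
                child.wait()


if __name__ == "__main__":
    # Stand-in for the client daemon: a process that would run for a minute.
    daemon = [sys.executable, "-c", "import time; time.sleep(60)"]
    with background_process(daemon) as child:
        print("daemon running:", child.poll() is None)
```

In the solution, the command would point at the cloudhsm_client binary shipped in the layer, and the body of the with-block would hold the cryptographic operations.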

Using a Lambda layer decouples the code of the Lambda function from the CloudHSM client and the CloudHSM Java JCE library. This way, when a new version of the CloudHSM client and the CloudHSM Java JCE library is released, it can be included in a new Lambda layer version and attached to the Lambda function without needing to rebuild the Lambda function package.

The example solution below includes a complete Java sample for the Lambda function. It uses the CloudHSM Java JCE library to generate a symmetric key on the HSM, and it uses this key to encrypt and decrypt after starting the CloudHSM client. Maven (a build automation tool) will be used to build the Lambda function package.

The solution uses AWS Secrets Manager to store and retrieve the crypto user (CU) credentials that are needed to perform cryptographic operations. If the HSM IPs of the CloudHSM cluster are changed (for example, if the HSMs are deleted and re-created), the Lambda function will automatically update the configuration during runtime.
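The IP check can be pictured as a comparison between the address stored in the client configuration and the addresses reported for the cluster. The sketch below assumes a response shaped like the CloudHSM DescribeClusters output (`Clusters`, `Hsms`, `EniIp`). How the sample reads and rewrites the configuration file is not shown here, so the configured IP is passed in as a plain string and the IP addresses in the usage example are made up.

```python
def hsm_ips(describe_clusters_response):
    """Collect the ENI IP of every HSM in a DescribeClusters-style response."""
    return sorted(
        hsm["EniIp"]
        for cluster in describe_clusters_response.get("Clusters", [])
        for hsm in cluster.get("Hsms", [])
        if "EniIp" in hsm
    )


def needs_update(configured_ip, describe_clusters_response):
    """True when the configured IP no longer belongs to any HSM in the cluster."""
    return configured_ip not in hsm_ips(describe_clusters_response)


if __name__ == "__main__":
    response = {"Clusters": [{"Hsms": [{"EniIp": "10.0.1.10"}, {"EniIp": "10.0.2.10"}]}]}
    print(needs_update("10.0.1.10", response))  # configured IP still valid
    print(needs_update("10.0.9.99", response))  # HSM was re-created
```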


  1. The solution only works with version 2.0.4 or later of the CloudHSM client and CloudHSM Java JCE library.
  2. In this workaround, the client is started at the beginning of each Lambda invocation, and is stopped at the end of the invocation. Due to the way Lambda works, the client can’t persist through multiple invocations.
  3. Secrets Manager uses AWS Key Management Service to secure its data. If your workload requires that all data be secured using HSMs under your sole control, without reliance on IAM credentials, this solution may not be appropriate. You should work with your security or compliance officer to ensure you’re using a method of securing HSM login credentials that meets your application and security needs.


Figure 1: Architectural diagram


Here are the resources you’ll need in order to follow along with the example in Figure 1:

  1. An Amazon Virtual Private Cloud (Amazon VPC) with the following components:
    1. Private subnets in multiple Availability Zones to be used for the HSM’s elastic network interfaces (ENIs).
    2. A public subnet that contains a network address translation (NAT) gateway.
    3. A private subnet with a route table that routes internet traffic to the NAT gateway. You’ll use this subnet to run the Lambda function. The NAT gateway allows you to connect to the CloudHSM, CloudWatch Logs and Secrets Manager endpoints.

    Note: For high availability, you can add multiple instances of the public and private subnets mentioned in Prerequisites 1.b and 1.c. For more information about how to create an Amazon VPC with public and private subnets as well as a NAT gateway, refer to the Amazon VPC user guide.

  2. An active CloudHSM cluster with at least one active HSM. The HSMs should be created in the private subnets mentioned in Prerequisite 1.a. You can follow the Getting Started with AWS CloudHSM guide to create and initialize the CloudHSM cluster.
  3. An Amazon Linux 2 EC2 instance with the CloudHSM client installed and configured to connect to the CloudHSM cluster. The client instance should be launched in the public subnet mentioned in Prerequisite 1.b. You can again refer to Getting Started With AWS CloudHSM to configure and connect the client instance.

    Note: You only need the client instance to build the Lambda function package. You can terminate the instance after the package has been created.

  4. CU credentials. You can create a CU by following the steps in the user guide.
  5. A server/machine with AWS Command Line Interface (AWS CLI) installed and configured. You’ll need this to follow along, as the example uses AWS CLI to create and configure the necessary AWS resources. The IAM user/role should have at minimum the permissions in the below policy attached to it to follow this example. Make sure you replace the <REGION> and <ACCOUNT-ID> tags below with the actual Region and account ID you are using.
        "Version": "2012-10-17",
        "Statement": [
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "secretsmanager:CreateSecret",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "secretsmanager:Name": "CloudHSM_CU"
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                "Resource": [
                "Sid": "VisualEditor3",
                "Effect": "Allow",
                "Action": [
                "Resource": "*"

Step 1: Build the Lambda function package

In this step, you’ll build the Lambda function package using Maven. For more information about using Maven to build an AWS Lambda Java package, refer to the AWS Lambda developer guide.

  1. On your CloudHSM client instance, install the CloudHSM Java JCE library by following the steps in the user guide.
  2. Install OpenJDK 8 and Maven:
    $ sudo yum install -y java maven

  3. Download the sample code, unzip it and move to the created directory. The directory will have the name aws-cloudhsm-on-aws-lambda-sample-master and will include:
    • A file with the name pom.xml that contains the Maven project configuration.
    • A file with the name SymmetricKeys.java which is also available on the AWS CloudHSM Java JCE samples repo. This file contains the function that you’ll use to generate the advanced encryption standard (AES) key.
    • A file with the name AESGCMEncryptDecryptLambda.java, which will run when the Lambda function is invoked:
      $ wget https://github.com/aws-samples/aws-cloudhsm-on-aws-lambda-sample/archive/master.zip
      $ unzip master.zip
      $ cd aws-cloudhsm-on-aws-lambda-sample-master/

  4. Create a Java Archive (JAR) package by running the below commands. This will create the JAR file under the target/ directory with the name cloudhsm_lambda_project-1.0-SNAPSHOT.jar.

    $ export CLOUDHSM_VER=$(ls /opt/cloudhsm/java/ | grep "cloudhsm-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ export LOG4JCORE_VER=$(ls /opt/cloudhsm/java/ | grep "log4j-core-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ export LOG4JAPI_VER=$(ls /opt/cloudhsm/java/ | grep "log4j-api-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ mvn validate && mvn clean package 
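The three export commands in step 4 pull a version number out of each JAR file name so that Maven can reference the matching artifact. The same extraction can be sketched in Python as follows. The file names in the usage example are hypothetical, and the pattern is slightly stricter than the grep expressions because it requires the whole file name to match.

```python
import re


def jar_version(filenames, artifact):
    """Return the version embedded in '<artifact>-<version>.jar', or None."""
    pattern = re.compile(rf"{re.escape(artifact)}-([0-9]+(?:\.[0-9]+)*)\.jar")
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical listing of /opt/cloudhsm/java/.
    listing = ["cloudhsm-3.0.0.jar", "log4j-core-2.8.jar", "log4j-api-2.8.jar"]
    for artifact in ("cloudhsm", "log4j-core", "log4j-api"):
        print(artifact, jar_version(listing, artifact))
```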

Step 2: Create the Lambda layer

In this step, you’ll create the Lambda layer that contains the CloudHSM client and its dependencies and the CloudHSM Java library JARs.

  1. On your CloudHSM client instance, create a directory called “layer” and change directories to it:
    $ mkdir ~/layer && cd ~/layer

  2. Create the following directories, which you’ll use in the next steps to hold the CloudHSM binary and its prerequisites such as configuration files and libraries, and the CloudHSM Java JCE JARs:
    $ mkdir -p lib cloudhsm/bin cloudhsm/etc java/lib

  3. Copy the cloudhsm_client binary and the needed configuration files to the directories you created in the previous step.
    $ cp /opt/cloudhsm/bin/cloudhsm_client cloudhsm/bin
    $ cp -r /opt/cloudhsm/etc/{cloudhsm_client.cfg,customerCA.crt,client.crt,client.key,certs} cloudhsm/etc

  4. Add the necessary libraries by running the commands below. These libraries are needed by the Lambda function to be able to run the cloudhsm_client binary.
    $ cp /opt/cloudhsm/lib/libcaviumjca.so lib/
    $ ldd /opt/cloudhsm/bin/cloudhsm_client | awk '{print $3}' | grep "^/" | xargs -I{} cp {} lib/

  5. Add the CloudHSM Java JCE JARs by running the commands below. These JARs include the classes needed by the Lambda function code to run.
    $ cp /opt/cloudhsm/java/{cloudhsm-[0-9]*.jar,log4j-*-*.jar} java/lib/

  6. Create the Lambda layer ZIP archive by running the command below. This will create the archive with the name layer.zip in the home directory.
    $ zip -r ~/layer.zip * 

  7. Move the ZIP archive (layer.zip) to the server/machine with AWS CLI installed and configured, and run the below command to create the Lambda layer with the name cloudhsm-client-layer.
    $ aws lambda publish-layer-version --layer-name cloudhsm-client-layer --zip-file fileb://layer.zip --compatible-runtimes java8

Step 3: Create a secret to store the CU credentials

In this step, you will use Secrets Manager to create a secret to store your CU credentials. You must perform this step on your server/machine that has AWS CLI installed and configured.

Run the following command to create a secret with the name CloudHSM_CU that contains your CU user name and password (Prerequisite 4). Make sure to replace the user name and password below with your actual CU user name and password.

$ export HSM_USER=<user>
$ export HSM_PASSWORD=<password>
$ aws secretsmanager create-secret --name CloudHSM_CU --secret-string "{ \"HSM_USER\": \"$HSM_USER\", \"HSM_PASSWORD\": \"$HSM_PASSWORD\"}"
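The command above assembles the JSON secret by hand inside a double-quoted shell string, which produces invalid JSON if the password contains a double quote or a backslash. As an alternative sketch, the secret string can be built with a JSON encoder. The helper name is an assumption; the key names match the ones used in the command.

```python
import json


def cu_secret_string(user, password):
    """Serialize CU credentials to the JSON layout used for the CloudHSM_CU secret."""
    return json.dumps({"HSM_USER": user, "HSM_PASSWORD": password})


if __name__ == "__main__":
    # A password with characters that would break hand-written JSON.
    print(cu_secret_string("crypto_user", 'pa"ss\\word'))
```

The output can then be supplied as the value of `--secret-string`, for example by writing it to a file and passing it with the AWS CLI `file://` prefix.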

Step 4: Create an IAM role for the Lambda function

In this step, you’ll create an IAM role that has the permissions necessary for it to be assumed by the Lambda function.

  1. On the server/machine with AWS CLI installed and configured, create a new file with the name trust.json and the following content:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

  2. Create a role named cloudhsm_lambda_example_role using the following AWS CLI command:

    $ aws iam create-role --role-name cloudhsm_lambda_example_role --assume-role-policy-document file://trust.json

  3. Run the commands below to create a new file named policy.json. The policy in this file allows the IAM role to perform the following actions:
    • Writing to CloudWatch Logs. This permission allows the IAM role to write to the CloudWatch Logs of the Lambda function. You can then use the logs for troubleshooting. For more information about accessing CloudWatch Logs for Lambda, refer to this guide.
    • Retrieving the CU secret value from Secrets Manager. The CU credentials stored in the CU secret are needed by the Lambda function to be able to log-in to the CloudHSM cluster.
    • Describing CloudHSM clusters. This permission allows the Lambda function to check the current HSM IPs and update its configuration if the IPs have changed.
    $ export SECRET_ARN=$(aws secretsmanager describe-secret --secret-id "CloudHSM_CU" --query "ARN" --output text)
    $ cat <<EOF> policy.json
        "Version": "2012-10-17",
        "Statement": [
                "Sid": "CWLogs",
                "Effect": "Allow",
                "Action": [
                "Resource": "*"
                "Sid": "SecretsManager",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "$SECRET_ARN"
                "Sid": "CloudHSM",
                "Effect": "Allow",
                "Action": "cloudhsm:DescribeClusters",
                "Resource": "*"

  4. Attach the policy to the IAM role created in step 2 of this section by running the following command:
    $ aws iam put-role-policy --role-name cloudhsm_lambda_example_role --policy-name cloudhsm_lambda_example_policy --policy-document file://policy.json

  5. Attach the AWS managed policy AWSLambdaVPCAccessExecutionRole to the created role by running the command below. This policy allows the IAM role to access the VPC, which is necessary in order to run the Lambda function in a VPC and a subnet.
    $ aws iam attach-role-policy --role-name cloudhsm_lambda_example_role --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole

  6. To make sure the CU secret is only accessible to the Lambda function role, run the below commands to attach a resource-based policy to the secret:
    $ export ROLE_ARN=$(aws iam get-role --role-name cloudhsm_lambda_example_role --query Role.Arn --output text)
    $ export ASSUMED_ROLE_ARN=$(echo $ROLE_ARN | sed -e "s/:iam:/:sts:/" -e "s/:role/:assumed-role/" -e "s/$/\/cloudhsm_lambda_example/")
    $ export ROOT_ARN=$(echo $ROLE_ARN | sed "s/:role.*/:root/")
    $ cat <<EOF> sm_policy.json
    { "Version": "2012-10-17",
    	"Statement": [
    			"Effect": "Deny",
    			"Action": "secretsmanager:GetSecretValue",
    			"NotPrincipal": {"AWS": [
    				"Resource": "*"
    $ aws secretsmanager put-resource-policy --resource-policy file://sm_policy.json --secret-id CloudHSM_CU
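The sed expressions in step 6 derive two further ARNs from the role ARN: the assumed-role ARN that the running function presents, and the account root ARN. The sketch below reproduces that derivation in Python so the transformation is easier to follow. The account ID in the usage example is a placeholder. For a role without a path, the results match what the sed commands produce; using only the last segment of the role resource is an assumption of this sketch.

```python
def derived_arns(role_arn, session_name):
    """Return (assumed_role_arn, root_arn) for an IAM role ARN."""
    # An ARN has six colon-separated parts; the last is the resource.
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "iam" or not parts[5].startswith("role/"):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    partition, account, resource = parts[1], parts[4], parts[5]
    role_name = resource.rsplit("/", 1)[-1]
    assumed_role_arn = (
        f"arn:{partition}:sts::{account}:assumed-role/{role_name}/{session_name}"
    )
    root_arn = f"arn:{partition}:iam::{account}:root"
    return assumed_role_arn, root_arn


if __name__ == "__main__":
    assumed, root = derived_arns(
        "arn:aws:iam::123456789012:role/cloudhsm_lambda_example_role",
        "cloudhsm_lambda_example",
    )
    print(assumed)
    print(root)
```

The session name used in the commands, cloudhsm_lambda_example, is the same as the name of the Lambda function created in Step 5.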

Step 5: Create the Lambda function

In this step, you will create a Lambda function with the necessary settings.

  1. On the server/machine with AWS CLI installed and configured, run the command below to create a security group with the name outbound-443. This security group will be attached to the Lambda function to allow it to connect to the CloudWatch Logs, Secrets Manager and CloudHSM endpoints. Make sure to replace the CLUSTER_ID below with the actual CloudHSM cluster ID of your environment.
    $ export CLUSTER_ID=<cluster-xxxxxxxxxx>
    $ export CLUSTER_VPC=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].VpcId --output text)
    $ export OUTBOUND_SG=$(aws ec2 create-security-group --group-name outbound-443 --description "Allow outbound access to port 443" --vpc-id $CLUSTER_VPC --output text)
    $ aws ec2 authorize-security-group-egress --group-id $OUTBOUND_SG --protocol tcp --port 443 --cidr

  2. Move the JAR package generated in step 4 of the Step 1 section to the current directory on the server/machine that has AWS CLI installed and configured. The file was generated on the CloudHSM client instance under ~/aws-cloudhsm-on-aws-lambda-sample-master/target/cloudhsm_lambda_project-1.0-SNAPSHOT.jar.
  3. Replace the cluster ID and subnet ID below with the CloudHSM cluster ID of your environment, and the ID of the private Lambda subnet in your environment (Prerequisite 1.c), then run the commands below. These commands set environment variables that you’ll need for the next command.
    $ export CLUSTER_ID=<cluster-xxxxxxxxxx>
    $ export SUBNET_ID=<subnet-xxxxxxxx>
    $ export CLUSTER_VPC=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].VpcId --output text)
    $ export OUTBOUND_SG=$(aws ec2 describe-security-groups --filters Name=group-name,Values=outbound-443  --query SecurityGroups[0].GroupId --output text)
    $ export CLUSTER_SG=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].SecurityGroup --output text)
    $ export ROLE_ARN=$(aws iam get-role --role-name cloudhsm_lambda_example_role --query Role.Arn --output text)
    $ export LAYER_ARN=$(aws lambda get-layer-version --layer-name cloudhsm-client-layer --version-number 1 --query LayerVersionArn --output text)

  4. Create a Lambda function with the name cloudhsm_lambda_example by running the below command:
    $ aws lambda create-function --function-name "cloudhsm_lambda_example" \
    --runtime java8 \
    --role $ROLE_ARN \
    --handler "com.amazonaws.cloudhsm.examples.AESGCMEncryptDecryptLambda::myhandler" \
    --timeout 600 \
    --memory-size 512 \
    --vpc-config SubnetIds=$SUBNET_ID,SecurityGroupIds=$CLUSTER_SG,$OUTBOUND_SG \
    --environment "Variables={CLUSTER_ID=$CLUSTER_ID, SECRET_ID=CloudHSM_CU,liquidsecurity_daemon_id=1}" \
    --layers $LAYER_ARN \
    --zip-file fileb://cloudhsm_lambda_project-1.0-SNAPSHOT.jar

The command will create a Lambda function with the following configuration:

  • Runtime: Java8
  • Execution Role: The role you created in the Step 4 section.
  • Handler: The name of the class and the function in the package created in the Step 1 section.
  • Timeout: 10 minutes.
  • Memory size: 512 MB.
  • Subnet: The private Lambda subnet in your environment (Prerequisite 1.c).
  • Security Groups: The CloudHSM cluster security group AND the security group created in step 1 of the Step 5 section for outbound access to port 443 (outbound-443).
  • Code/Package: The JAR package you created in step 4 of the Step 1 section.
  • Layer: The layer created in the Step 2 section.
  • Environment variables:
    • CLUSTER_ID = the CloudHSM cluster ID in your environment
    • SECRET_ID = the ID of the secret you created in the Step 3 section
    • liquidsecurity_daemon_id = 1 (this is needed by the cloudhsm_client binary)

Step 6: Run the Lambda function

In this step, you will invoke the Lambda function and check the logs to view the output.

  1. You can invoke the Lambda function using the following command. This will execute the code in the package you created in Step 1.
    $ aws lambda invoke --function-name cloudhsm_lambda_example out.txt

  2. You can check the function’s CloudWatch Log group with a command like this one:
    $ aws logs filter-log-events --log-group-name "/aws/lambda/cloudhsm_lambda_example" --start-time "`date -d "now -5min" +%s`000" --query events[*].message --output text | sed "s/\t/\n/g" 

    If the Lambda function was successful, the output of the function should look something like the example below:

    START RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a Version: $LATEST
    * Running GetSecretValue to get the CU credentials ...
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    * Running DescribeClusters to get the HSM IP ...
    DescribeClusters returned the HSM IP =
    * Getting the HSM IP inf the configuration file ...
    The configuration file has the HSM IP =
    * Starting the cloudhsm client ...
    * Waiting for the cloudhsm client to start ...
    * cloudhsm client started ...
    * Adding the Cavium provider ...
    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
    * Using credentials to Login to the CloudHSM Cluster ...
    Login successful!
    * Generating AES Key ...
    * Generating Random data to encrypt ...
    Plain Text data = 3B0566E9A3FADA8FED7D6C88FE92ECBE8526922E84489AB48F1F3F3116235E69
    * Encrypting data ...
    Cipher Text data = CA6D80AD34BBADEF34275743F309E6730ABC66BA19C2EADC731899B0FB86564EDDB9F7FC103E1C9C2A6A1E64BF2D2C48
    * Decrypting ciphertext ...
    Decrypted Text data = 3B0566E9A3FADA8FED7D6C88FE92ECBE8526922E84489AB48F1F3F3116235E69
     * Successful decryption
    * Logging out the CloudHSM Cluster
    * Closing client ...
    END RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a
    REPORT RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a
    Duration: 11990.69 ms
    Billed Duration: 12000 ms
    Memory Size: 512 MB
    Max Memory Used: 103 MB

Note: The StatusLogger No log4j2 configuration file found error above is normal and can be ignored. This is related to missing log4j configuration which is normally used to configure logging, but is not needed in this case as the log messages are being written to CloudWatch Logs by default.


This solution demonstrates how to run CloudHSM workloads on Lambda, which not only lets you leverage the flexibility of serverless computing but also helps you meet security and compliance requirements by performing cryptographic tasks such as encryption and decryption. This approach also allows you to integrate with other AWS services like Amazon CloudWatch Events, Amazon Simple Storage Service (Amazon S3), or AWS Config for a seamless experience across your environment.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS CloudHSM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Mohamed AboElKheir

Mohamed AboElKheir is an Application Security Engineer who works with different teams to ensure AWS services, applications, and websites are designed and implemented to the highest security standards. He is a subject matter expert for CloudHSM and is always enthusiastic about assisting CloudHSM customers with advanced issues and use cases. Mohamed is passionate about InfoSec, specifically cryptography, penetration testing (he’s OSCP certified), application security, and cloud security (he’s AWS Security Specialty certified).

Generating REST APIs from data classes in Python

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/generating-rest-apis-from-data-classes-in-python/

This post is courtesy of Robert Enyedi – Senior Research Engineer – AI Labs

Implementing and managing public APIs is greatly simplified by API Gateway. Among the various features of API Gateway, the ability to import API definitions in the Open API format is powerful.

In this post, I show how you can automatically generate REST APIs directly from Python data classes. This method includes a highly automated workflow for exposing Python services as public APIs using the API Gateway. Recent changes in the Python language open the door for full automation of API publishing directly from code.

Open API and API Gateway

The Open API specification is a popular mechanism to declare the structure of REST APIs. It’s language-independent and allows you to determine API operations and their data types. Previously called Swagger, it is a standardization effort with benefits for the service developer and service consumer. It reduces repetitive tasks, increases API quality, and removes the guesswork from calling a service.

Examples shown here use data classes, which are supported in Python 3.7 or higher. There are backports of data classes to Python 3.6 available but they are beyond the scope of this post.

Python standard type annotations

The type hints syntax, introduced by PEP 484 in Python 3.5 and extended to variable annotations by PEP 526 in Python 3.6, allows the declaration of a type for identifiers. This includes local variables, function and method parameters, return types, and class fields. Type hints improve the readability of the code and provide useful information for tools. This allows your IDE to be more effective at auto-completion, semantic error detection, and refactoring.

Code checkers such as Mypy can better catch problems at build time. These are the typical advantages of statically typed languages. With Python, because type annotations are optional and a recent addition to the language, not all the project’s dependencies have types. That’s why tooling is less accurate in detecting all error conditions.

Python data classes

Data classes are an even more recent addition to the language. Described in PEP 557 and introduced in Python 3.7, they allow a simplified declaration of class data structures useful for storing state. Combined with type hints, one can use the @dataclass decorator:

from dataclasses import dataclass

@dataclass
class Person:
  name: str
  age: int

Then the Python implementation can generate:

  1. The constructor:
    Person("Joe", 12)
  2. Comparator methods to allow operations such as:
    Person(name="Joe", age=12) == Person(name="Joe", age=12)
  3. The __repr__() implementation to pretty print the object:
    Person(name='Joe', age=12)
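
The three generated behaviors can be checked with a minimal sketch that uses only the standard library. It assumes Python 3.7 or higher; the variable names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


# 1. The generated constructor accepts positional or keyword arguments.
joe = Person("Joe", 12)

# 2. The generated __eq__ compares instances field by field.
same_person = joe == Person(name="Joe", age=12)

# 3. The generated __repr__ lists every field with its value.
description = repr(joe)

print(same_person)
print(description)
```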

Building an API using data classes

Data classes containing fields with type hints lend themselves to automation of API definitions. This solution uses data classes to generate Open API service definitions with AWS extensions and to create API Gateway configurations.

Similar solutions exist for strictly typed languages like Java, C# or Scala. In Python, this level of automation was not available until version 3.7. This code uses the Dataclasses JSON library to automate the serialization of data classes.

1. Start with the entity definition, in this case a person:

@dataclass_json
@dataclass
class Person:
  name: str
  age: int

2. Create one class for the request and another for the response to help payload serialization:

@dataclass_json
@dataclass
class CreatePersonRequest:
  person: Person

@dataclass_json
@dataclass
class CreatePersonResponse:
  person_id: int

3. Next, implement the route handler (this example uses the Flask Web framework):

OPERATION_CREATE_PERSON: str = 'create-person'
@app.route(f'/{OPERATION_CREATE_PERSON}', methods=['POST'])
def create_person():
    payload = request.get_json()
    logging.info(f"Incoming payload for {OPERATION_CREATE_PERSON}: {payload}")
    person = CreatePersonRequest.from_json(payload)

The payload is deserialized transparently using the schema derived from the data class definition of Person.
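
To illustrate how a schema can be derived from a data class definition, here is a minimal standard-library sketch. It is not the Dataclasses JSON implementation: the from_dict() helper is hypothetical and only handles plain values and nested data classes.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclass
class Person:
    name: str
    age: int


@dataclass
class CreatePersonRequest:
    person: Person


def from_dict(cls: Type[T], data: Dict[str, Any]) -> T:
    """Hypothetical helper: build a data class instance from a parsed JSON dict."""
    kwargs = {}
    for field in fields(cls):
        value = data[field.name]
        # Recurse when the declared field type is itself a data class.
        if is_dataclass(field.type) and isinstance(value, dict):
            value = from_dict(field.type, value)
        kwargs[field.name] = value
    return cls(**kwargs)


payload = {"person": {"name": "Jane Doe", "age": 40}}
create_request = from_dict(CreatePersonRequest, payload)
print(create_request)
```

The field names and types declared on the class are the only schema needed, which is why adding a field to the data class is enough to extend the payload.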

4. To generate a corresponding API definition, enter:

spec = {}


spec_dict = spec.to_dict()

The implementation of generate_operation() makes use of the apispec library to programmatically construct the Open API definition.

With spec_dict containing the Open API specification, you can either create or update the API definition. You can also run any Open API tools on this definition, such as SDK generators, mock servers, or documentation generators. There’s a comprehensive catalog of tools maintained at https://openapi.tools/.
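
As a rough illustration of the idea (not the apispec API and not the post's generate_operation() implementation), the following standard-library sketch derives an Open API schema object from data class fields. The schema_for() helper and the OPENAPI_TYPES mapping are assumptions made for this example.

```python
import json
from dataclasses import dataclass, fields, is_dataclass

# Assumed mapping from Python types to Open API primitive type names.
OPENAPI_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Person:
    name: str
    age: int


@dataclass
class CreatePersonRequest:
    person: Person


def schema_for(cls) -> dict:
    """Hypothetical helper: derive an Open API schema object from a data class."""
    properties = {}
    for field in fields(cls):
        if is_dataclass(field.type):
            # Nested data classes become nested object schemas.
            properties[field.name] = schema_for(field.type)
        else:
            properties[field.name] = {"type": OPENAPI_TYPES[field.type]}
    return {
        "type": "object",
        "properties": properties,
        "required": [field.name for field in fields(cls)],
    }


request_schema = schema_for(CreatePersonRequest)
print(json.dumps(request_schema, indent=2))
```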

As a sensible default, the code generates API operations guarded by API keys supplied with the x-api-key header:

"securitySchemes": {
      "api_key": {
        "type": "apiKey",
        "name": "x-api-key",
        "in": "header"
      }
}

The spec uses API Gateway extensions to include implementation-specific metadata. The most important is the one linking the API definition to the ECS backend:

"x-amazon-apigateway-integration": {
          "passthroughBehavior": "when_no_match",
          "type": "http_proxy",
          "httpMethod": "POST",
          "uri": "http://myecshost-1234567890.us-east-1.elb.amazonaws.com/create-person"
}

You can use a similar pattern to connect the gateway to a different service, such as AWS Lambda:

"x-amazon-apigateway-integration": {
          "uri": "arn:aws:apigateway:...:lambda:path/.../functions/arn:aws:lambda:...:...:function:yourLambdaFunction/invocations",
          "responses": {
            "default": {
              "statusCode": "200"
            }
          },
          "passthroughBehavior": "when_no_match",
          "httpMethod": "POST",
          "contentHandling": "CONVERT_TO_TEXT",
          "type": "aws"
}

For more information on the API Gateway extension to Open API, visit the AWS documentation.

Generating the API using API Gateway

This example uses the boto3 API Gateway API to expose a public API.

1. To create the API, enter the following:

api_definition = json.dumps(spec_dict, indent=2)
api_gateway_client.import_rest_api( body=api_definition )

2. To update the API, merge the changes into a manually modified API definition (mode='merge'), or completely overwrite the API (mode='overwrite'). It is often safer to merge the API, as follows:

api_gateway_client.put_rest_api(body=api_definition, mode='merge', restApiId=find_api_id(api_gateway_client, api_name))

The find_api_id() helper function looks up the API ID based on its name.
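
The post does not show find_api_id(), so the following is only a sketch of how such a helper might look. It assumes the boto3 API Gateway client's get_rest_apis() call with its limit and position paging parameters. FakeApiGatewayClient and the API names and IDs are hypothetical stand-ins so that the example runs without AWS access.

```python
from typing import Optional


def find_api_id(api_gateway_client, api_name: str) -> Optional[str]:
    """Return the ID of the first REST API named api_name, or None if absent."""
    kwargs = {"limit": 500}
    while True:
        page = api_gateway_client.get_rest_apis(**kwargs)
        for api in page.get("items", []):
            if api["name"] == api_name:
                return api["id"]
        position = page.get("position")
        if not position:
            return None
        kwargs["position"] = position  # request the next page of results


class FakeApiGatewayClient:
    """Stand-in for boto3.client('apigateway') so the sketch runs offline."""

    def get_rest_apis(self, limit=25, position=None):
        if position is None:
            return {"items": [{"id": "a1", "name": "other-api"}], "position": "p2"}
        return {"items": [{"id": "b2", "name": "person-api"}]}


print(find_api_id(FakeApiGatewayClient(), "person-api"))
```

API names are not unique in API Gateway, so a production helper may need to decide what to do when several APIs share the same name.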

3. Check the API Gateway dashboard in the AWS Management Console for the new API definition. It shows the API and its resources:

API Gateway dashboard

Now you are ready to issue a test call to the external API to validate its security and functionality. The Open API definition of a manually created or modified API can be exported by various means, including from the stage editor.

Validate the API

The correct way to call the API is shown in test_get_dubbing_job_status_API() from test/ondemand_test_call_service.py:

response = _send_request(secure=True,
                         request=CreatePersonRequest(Person(name='Jane Doe', age=40)),

response_obj = CreatePersonResponse.from_json(response)

assert response_obj.person_id is not None
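
For reference, a client supplies the API key through the x-api-key header described earlier. The sketch below only builds the request with the standard library and does not send it. The endpoint URL, the key value, and the build_create_person_request() helper are placeholders, not values from the sample repository.

```python
import json
import urllib.request

# Hypothetical endpoint and key, shown only to illustrate the header.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/create-person"
API_KEY = "replace-with-your-api-key"


def build_create_person_request(name: str, age: int) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the x-api-key header."""
    body = json.dumps({"person": {"name": name, "age": age}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": API_KEY},
        method="POST",
    )


req = build_create_person_request("Jane Doe", 40)
# Passing req to urllib.request.urlopen() would issue the call.
print(req.get_method(), req.full_url)
```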

If you call the API without the api_key parameter, it returns an HTTP 403 code and an error message.



This post shows how to automatically expose Python services as public APIs directly from the code. With the introduction of Python data classes, it is easy to automate JSON serialization.

Now you can fully automate the API generation and deployment tasks for API Gateway. Introducing a new entity is trivial, and adding a new field to your API requires only writing its definition. You can develop a fully functional API based upon these building blocks.

Learn more from this sample repository, and adapt the code for your projects to achieve a high level of automation for your public APIs.


Savings Plan Update: Save Up to 17% On Your Lambda Workloads

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/savings-plan-update-save-up-to-17-on-your-lambda-workloads/

Late last year I wrote about Savings Plans, and showed you how you could use them to save money when you make a one or three year commitment to use a specified amount (measured in dollars per hour) of Amazon Elastic Compute Cloud (EC2) or AWS Fargate. Savings Plans give you the flexibility to change compute services, instance types, operating systems, and regions while accessing compute power at a lower price.

Now for Lambda
Today I am happy to be able to tell you that Compute Savings Plans now apply to the compute time consumed by your AWS Lambda functions, with savings of up to 17%. If you are already using one or more Savings Plans to save money on your server-based processing, you can enjoy the cost savings while modernizing your applications and taking advantage of a multitude of powerful Lambda features including a simple programming model, automatic function scaling, Step Functions, and more! If your use case includes a constant level of function invocation for microservices, you should be able to make great use of Compute Savings Plans.

AWS Cost Explorer will now take Lambda usage into account when it recommends a Savings Plan. I open AWS Cost Explorer, then click Recommendations within Savings Plans, then review the recommendations. As I am doing this, I can alter the term, payment option, and the time window that is used to make the recommendations:

When I am ready to proceed, I click Add selected Savings Plan(s) to cart, and then View cart to review my selections and submit my order:

The Savings Plan becomes active right away. I can use Cost Explorer’s Utilization and Coverage reports to verify that I am making good use of my plans. The Savings Plan Utilization report shows the percentage of savings plan commitment that is being used to realize savings on compute usage:

The Coverage report shows the percentage of eligible compute spend that is covered by Savings Plans for the selected time period:

When the coverage is less than 100% for an extended period of time, I should think about buying another plan.

Things to Know
Here are a couple of things to know:

Discount Order – If you are using two or more compute services, the plans are applied in order of highest to lowest discount percentage.

Applicability – The discount applies to duration (both on-demand and provisioned concurrency) and to provisioned concurrency charges. It does not apply to Lambda requests.

Available Now
If you already own a Savings Plan or two and are using Lambda, you will receive a discount automatically (unless you are at 100% utilization with EC2 and Fargate).

If you don’t own a plan and are using Lambda, buy a plan today!



How Siemens built a fully managed scheduling mechanism for updates on Amazon S3 data lakes

Post Syndicated from Pedro Bento original https://aws.amazon.com/blogs/big-data/how-siemens-built-a-fully-managed-scheduling-mechanism-for-consistent-updates-on-amazon-s3-data-lakes/

Siemens is a global technology leader with more than 370,000 employees and 170 years of experience. To protect Siemens from cybercrime, the Siemens Cyber Defense Center (CDC) continuously monitors Siemens’ networks and assets. To handle the resulting enormous data load, the CDC built a next-generation threat detection and analysis platform called ARGOS. ARGOS is a hybrid-cloud solution that makes heavy use of fully managed AWS services for streaming, big data processing, and machine learning.

Users such as security analysts, data scientists, threat intelligence teams, and incident handlers continuously access data in the ARGOS platform. Further, various automated components update, extend, and remove data to enrich information, improve data quality, enforce PII requirements, or mutate data due to schema evolution or additional data normalization requirements. Keeping the data always available and consistent presents multiple challenges.

While object-based data lakes are highly beneficial from a cost perspective compared to traditional transactional databases in such scenarios, they hardly allow for atomic updates or require highly complex and costly extensions. To overcome this problem, Siemens designed a solution that enables atomic file updates on Amazon S3-based data lakes without compromising query performance and availability.

This post presents this solution, which is an easy-to-use scheduling service for S3 data update tasks. Siemens uses it for multiple purposes, including pseudonymization, anonymization, and removal of sensitive data. This post demonstrates how to use the solution to remove values from a dataset after a predefined amount of time. Adding further data processing tasks is straightforward because the solution has a well-defined architecture and the whole stack consists of fewer than 200 lines of source code. It is solely based on fully managed AWS services and therefore achieves minimal operational overhead.

Architecture overview

This post uses an S3-based data lake with continuous data ingestion and Amazon Athena as the query mechanism. The goal is to automatically remove certain values a predefined time after ingestion. Applications and users consuming the data via Athena are not impacted (for example, they do not observe downtime or data quality issues like duplication).

The following diagram illustrates the architecture of this solution.

Siemens built the solution with the following services and components:

  1. Scheduling trigger – New data (for example, in JSON format) is continuously uploaded to an S3 bucket.
  2. Task scheduling – As soon as new files land, an AWS Lambda function processes the resulting S3 bucket notification events. As part of the processing, it creates a new item on Amazon DynamoDB that specifies a Time to Live (TTL) and the path to that S3 object.
  3. Task execution trigger – When the TTL expires, the DynamoDB item is deleted from the table and the DynamoDB stream triggers a Lambda function that processes the S3 object at that path.
  4. Task execution – The Lambda function derives meta information (like the relevant S3 path) from the TTL expiration event and processes the S3 object. Finally, the new S3 object replaces the older version.
  5. Data usage – The updated data is available for querying from Athena without further manual processing, and uses S3’s eventual consistency on read operations.

About DynamoDB Streams and TTL

TTL for DynamoDB lets you define when items in a table expire so they can be deleted from the database automatically. TTL comes at no extra cost as a way to reduce storage use and reduce the cost of storing irrelevant data without using provisioned throughput. You can set a timestamp for deletion on a per-item basis, which allows you to limit storage usage to only those records that are relevant, by enabling TTL on a table.

Solution overview

To implement this solution manually, complete the following steps:

  1. Create a DynamoDB table and configure DynamoDB Streams.
  2. Create a Lambda function to insert TTL records.
  3. Configure an S3 event notification on the target bucket.
  4. Create a Lambda function that performs data processing tasks.
  5. Use Athena to query the processed data.

If you want to deploy the solution automatically, you can skip these steps and use the provided AWS CloudFormation template.


To complete this walkthrough, you must have the following:

  • An AWS account with access to the AWS Management Console.
  • A role with access to S3, DynamoDB, Lambda, and Athena.

Creating a DynamoDB table and configuring DynamoDB Streams

Start with the time-based trigger setup. For this, you use S3 notifications, DynamoDB Streams, and a Lambda function to integrate both services. The DynamoDB table stores the items to process after a predefined time.

Complete the following steps:

  1. On the DynamoDB console, create a table.
  2. For Table name, enter objects-to-process.
  3. For Primary key, enter path and choose String.
  4. Select the table and click on Manage TTL next to “Time to live attribute” under table details.
  5. For TTL attribute, enter ttl.
  6. For DynamoDB Streams, choose Enable with view type New and old images.

Note that you can enable DynamoDB TTL on non-numeric attributes, but it only works on numeric attributes.

The DynamoDB TTL is not minute-precise. Expired items are typically deleted within 48 hours of expiration. However, you may experience shorter deviations of only 10–30 minutes from the actual TTL value. For more information, see Time to Live: How It Works.
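
Because deletion can lag behind the TTL value, code that reads the table may still see expired items. The sketch below shows one way to compute the ttl attribute and to treat such items as expired. The helper names and the sample path are illustrative assumptions, not part of the Siemens solution.

```python
import time
from typing import Optional


def ttl_timestamp(seconds_from_now: int, now: Optional[float] = None) -> int:
    """Return an epoch timestamp in whole seconds for the ttl attribute."""
    current = time.time() if now is None else now
    return int(current) + seconds_from_now


def is_expired(item: dict, now: Optional[float] = None) -> bool:
    """True if the item's ttl has passed, even if DynamoDB has not deleted it yet."""
    current = time.time() if now is None else now
    return item["ttl"] <= current


# A fixed clock value keeps the example deterministic.
item = {"path": "s3://data-bucket/dataset/file.json", "ttl": ttl_timestamp(300, now=1000)}
print(item["ttl"], is_expired(item, now=1200), is_expired(item, now=1400))
```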

Creating a Lambda function to insert TTL records

The first Lambda function you create is for scheduling tasks. It receives an S3 notification as input, recreates the S3 path (for example, s3://<bucket>/<key>), and creates a new item on DynamoDB with two attributes: the S3 path and the TTL (an expiration time in epoch seconds). For more information about a similar S3 notification event structure, see Test the Lambda Function.

To deploy the Lambda function, on the Lambda console, create a function named NotificationFunction with the Python 3.7 runtime and the following code:

import time
from urllib.parse import unquote_plus

import boto3

# TTL offset in seconds; the default of 300 schedules processing about 5 minutes after upload
default_ttl = 300

s3_client = boto3.client('s3')
table = boto3.resource('dynamodb').Table('objects-to-process')

def parse_bucket_and_key(s3_notif_event):
    s3_record = s3_notif_event['Records'][0]['s3']
    # Object keys arrive URL-encoded in S3 notification events
    return s3_record['bucket']['name'], unquote_plus(s3_record['object']['key'])

def lambda_handler(event, context):
    bucket_name, key = parse_bucket_and_key(event)
    head_obj = s3_client.head_object(Bucket=bucket_name, Key=key)
    tags = s3_client.get_object_tagging(Bucket=bucket_name, Key=key)
    # Schedule only non-empty objects that are not yet tagged as processed
    if head_obj['ContentLength'] > 0 and len(tags['TagSet']) == 0:
        record_path = f"s3://{bucket_name}/{key}"
        table.put_item(Item={'path': record_path, 'ttl': int(time.time()) + default_ttl})

Configuring S3 event notifications on the target bucket

You can take advantage of the scalability, security, and performance of S3 by using it as a data lake for storing your datasets. Additionally, you can use S3 event notifications to capture S3-related events, such as the creation or deletion of objects within a bucket. You can forward these events to other AWS services, such as Lambda.

To configure S3 event notifications, complete the following steps:

  1. On the S3 console, create an S3 bucket named data-bucket.
  2. Click on the bucket and go to the Properties tab.
  3. Under Advanced Settings, choose Events and add a notification.
  4. For Name, enter MyEventNotification.
  5. For Events, select All object create events.
  6. For Prefix, enter dataset/.
  7. For Send to, choose Lambda Function.
  8. For Lambda, choose NotificationFunction.

This configuration restricts the scheduling to events that happen within your previously defined dataset. For more information, see How Do I Enable and Configure Event Notifications for an S3 Bucket?

Creating a Lambda function that performs data processing tasks

You have now created a time-based trigger for the deletion of the record in the DynamoDB table. However, when the system delete occurs and the change is recorded in DynamoDB Streams, no further action is taken. Lambda can poll the stream to detect these change records and trigger a function to process them according to the activity (INSERT, MODIFY, REMOVE).

This post is only concerned with deleted items because it uses the TTL feature of DynamoDB, together with DynamoDB Streams, to trigger task executions. Lambda gives you the flexibility to either process the item by itself or to forward the processing effort to somewhere else (such as an AWS Glue job or an Amazon SQS queue).
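
If items can also be deleted manually, a function may want to tell TTL expirations apart from other REMOVE records. DynamoDB marks stream records produced by TTL with a service user identity, which the following sketch checks. The is_ttl_deletion() helper and the sample records are illustrative and trimmed to the fields used here.

```python
def is_ttl_deletion(record: dict) -> bool:
    """True for REMOVE stream records that the DynamoDB TTL process produced."""
    identity = record.get("userIdentity", {})
    return (
        record.get("eventName") == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )


# Illustrative stream records, not captured from a real table.
ttl_record = {
    "eventName": "REMOVE",
    "userIdentity": {"type": "Service", "principalId": "dynamodb.amazonaws.com"},
    "dynamodb": {"Keys": {"path": {"S": "s3://data-bucket/dataset/file.json"}}},
}
manual_record = {
    "eventName": "REMOVE",
    "dynamodb": {"Keys": {"path": {"S": "s3://data-bucket/dataset/other.json"}}},
}

print(is_ttl_deletion(ttl_record), is_ttl_deletion(manual_record))
```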

This post uses Lambda directly to process the S3 objects. The Lambda function performs the following tasks:

  1. Gets the S3 object from the DynamoDB item’s S3 path attribute.
  2. Modifies the object’s data.
  3. Overwrites the old S3 object with the updated content and tags the object as processed.

Complete the following steps:

  1. On the Lambda console, create a function named JSONProcessingFunction with Python 3.7 as the runtime and the following code:
    import json
    from functools import partial
    from urllib.parse import urlparse

    import boto3

    s3 = boto3.resource('s3')

    def parse_bucket_and_key(s3_url_as_string):
        s3_path = urlparse(s3_url_as_string)
        return s3_path.netloc, s3_path.path[1:]

    def extract_s3path_from_dynamo_event(event):
        # Only REMOVE records (for example, TTL expirations) trigger processing
        if event["Records"][0]["eventName"] == "REMOVE":
            return event["Records"][0]["dynamodb"]["Keys"]["path"]["S"]

    def modify_json(json_dict, column_name, value):
        json_dict[column_name] = value
        return json_dict

    def get_obj_contents(bucketname, key):
        obj = s3.Object(bucketname, key)
        return obj.get()['Body'].iter_lines()

    # Replaces the file_contents value with an empty string
    clean_column_2_func = partial(modify_json, column_name="file_contents", value="")

    def lambda_handler(event, context):
        s3_url_as_string = extract_s3path_from_dynamo_event(event)
        if s3_url_as_string:
            bucket_name, key = parse_bucket_and_key(s3_url_as_string)
            updated_json = "\n".join(map(json.dumps, map(clean_column_2_func, map(json.loads, get_obj_contents(bucket_name, key)))))
            # The tag marks the object so that the scheduling function skips it
            s3.Object(bucket_name, key).put(Body=updated_json, Tagging="PROCESSED=True")
        else:
            print(f"Invalid event: {str(event)}")

  2. On the Lambda function configuration webpage, click on Add trigger.
  3. For Trigger configuration, choose DynamoDB.
  4. For DynamoDB table, choose objects-to-process.
  5. For Batch size, enter 1.
  6. For Batch window, enter 0.
  7. For Starting position, choose Trim horizon.
  8. Select Enable trigger.

You use batch size = 1 because each S3 object represented on the DynamoDB table is typically large. If these files are small, you can use a larger batch size. The batch size is essentially the number of files that your Lambda function processes at a time.
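
With a larger batch size, the handler has to look at every record in the event rather than only the first one. A minimal sketch of that loop follows. The removed_paths() helper and the sample event are hypothetical and trimmed to the fields the function reads.

```python
from typing import Iterator


def removed_paths(event: dict) -> Iterator[str]:
    """Yield the S3 path of every REMOVE record in a DynamoDB Streams batch."""
    for record in event.get("Records", []):
        if record.get("eventName") == "REMOVE":
            yield record["dynamodb"]["Keys"]["path"]["S"]


# Illustrative batch of three stream records.
sample_event = {
    "Records": [
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"path": {"S": "s3://data-bucket/dataset/a.json"}}}},
        {"eventName": "INSERT", "dynamodb": {"Keys": {"path": {"S": "s3://data-bucket/dataset/b.json"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"path": {"S": "s3://data-bucket/dataset/c.json"}}}},
    ]
}

paths = list(removed_paths(sample_event))
print(paths)
```

Each yielded path could then be processed in turn, keeping in mind that all objects in a batch must finish within the function's timeout.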

Because any new object on S3 (in a versioning-enabled bucket) creates an object creation event, even if its key already exists, you must make sure that your task scheduling Lambda function ignores any object creation events that your task execution function creates. Otherwise, the two functions trigger each other in an infinite loop. This post uses tags on S3 objects: when the task execution function processes an object, it adds a processed tag. The task scheduling function ignores those objects in subsequent executions.

Using Athena to query the processed data

The final step is to create a table for Athena to query the data. You can do this manually or by using an AWS Glue crawler that infers the schema directly from the data and automatically creates the table for you. This post uses a crawler because it can handle schema changes and add new partitions automatically. To create this crawler, use the following code:

aws glue create-crawler --name data-crawler \
--role <AWSGlueServiceRole-crawler> \
--database-name data_db \
--description 'crawl data bucket!' \
--targets '{"S3Targets": [{"Path": "s3://<data-bucket>/dataset/"}]}'

Replace <AWSGlueServiceRole-crawler> and <data-bucket> with the name of your AWSGlueServiceRole and S3 bucket, respectively.

When the crawling process is complete, you can start querying the data. You can use the Athena console to interact with the table while its underlying data is being transparently updated. See the following code:

SELECT * FROM data_db.dataset LIMIT 1000

Automated setup

You can use the following AWS CloudFormation template to create the solution described in this post in your AWS account. To launch the template, choose the following link:

This CloudFormation stack requires the following parameters:

  • Stack name – A meaningful name for the stack, for example, data-updater-solution.
  • Bucket name – The name of the S3 bucket to use for the solution. The stack creation process creates this bucket.
  • Time to Live – The number of seconds to expire items on the DynamoDB table. Referenced S3 objects are processed on item expiration.

Stack creation takes up to a few minutes. Check and refresh the AWS CloudFormation Resources tab to monitor the process while it is running.

When the stack shows the state CREATE_COMPLETE, you can start using the solution.

Testing the solution

To test the solution, download the mock_uploaded_data.json dataset created with the Mockaroo data generator. The use case is a web service in which users can upload files. The goal is to delete those files some predefined time after the upload to reduce storage and query costs. To this end, the provided code looks for the attribute file_contents and replaces its value with an empty string.
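
The transformation can be tried locally before deploying anything. The sketch below applies the same idea to a small JSON Lines string. The clear_field() helper is illustrative, and the sample records are hypothetical: apart from file_contents, the real dataset's field names may differ.

```python
import json


def clear_field(json_lines: str, field_name: str = "file_contents") -> str:
    """Set field_name to an empty string in every line of a JSON Lines document."""
    cleaned = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        record[field_name] = ""
        cleaned.append(json.dumps(record))
    return "\n".join(cleaned)


# Hypothetical records standing in for the mock dataset.
sample = '{"id": 1, "file_contents": "aGVsbG8="}\n{"id": 2, "file_contents": "d29ybGQ="}'
result = clear_field(sample)
print(result)
```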

You can now upload new data into your data-bucket S3 bucket under the dataset/ prefix. Your NotificationFunction Lambda function processes the resulting bucket notification event for the upload, and a new item appears on your DynamoDB table. Shortly after the predefined TTL time, the JSONProcessingFunction Lambda function processes the data and you can check the resulting changes via an Athena query.

You can also confirm that an S3 object was processed successfully if the DynamoDB item corresponding to this S3 object is no longer present in the DynamoDB table and the S3 object has the processed tag.


This post showed how to automatically re-process objects on S3 after a predefined amount of time by using a simple and fully managed scheduling mechanism. Because you use S3 for storage, you automatically benefit from S3’s eventual consistency model, simply by using identical keys (names) both for the original and processed objects. This way, you avoid query results with duplicate or missing data. Also, incomplete or only partially uploaded objects do not result in data inconsistencies because S3 only creates new object versions for successfully completed file transfers.

You may have previously used Spark to process objects hourly. This requires you to monitor objects that must be processed, to move and process them in a staging area, and to move them back to their actual destination. The main drawback is the final step because, due to Spark’s parallel nature, files are generated with different names and contents. That prevents direct file replacement in the dataset and leads to downtime or potential data duplicates when data is queried during a move operation. Additionally, because each copy/delete operation could potentially fail, you have to deal with possible partially processed data manually.

From an operations perspective, AWS serverless services simplify your infrastructure. You can combine the scalability of these services with a pay-as-you-go plan to start with a low-cost POC and scale to production quickly—all with a minimal code base.

Compared to hourly Spark jobs, you could potentially reduce costs by up to 80%, which makes this solution both cheaper and simpler.

Special thanks to Karl Fuchs, Stefan Schmidt, Carlos Rodrigues, João Neves, Eduardo Dixo and Marco Henriques for their valuable feedback on this post’s content.


About the Authors

Pedro Completo Bento is a senior big data engineer working at Siemens CDC. He holds a Master in Computer Science from the Instituto Superior Técnico in Lisbon. He started his career as a full-stack developer, later specializing in big data challenges. Working with AWS, he builds highly reliable, performant, and scalable systems on the cloud, while keeping the costs at bay. In his free time, he enjoys playing board games with his friends.



Arturo Bayo is a big data consultant at Amazon Web Services. He promotes a data-driven culture in enterprise customers around EMEA, providing specialized guidance on business intelligence and data lake projects while working with AWS customers and partners to build innovative solutions around data and analytics.





AWS Lambda now supports Ruby 2.7

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/aws-lambda-now-supports-ruby-2-7/

You can now develop your AWS Lambda functions using Ruby 2.7. Start using this runtime today by specifying a runtime parameter value of ruby2.7 when creating or updating Lambda functions.

New Ruby runtime features

Ruby 2.7 is a stable release and brings several new features, including pattern matching, argument forwarding, and numbered arguments.

Pattern matching

Pattern matching, a widely used feature in functional programming languages, is introduced as a new experimental feature. This allows deep matching of structured values, checking the structure and binding the matched parts to local variables.

It can traverse a given object and assign its value if it matches a pattern:

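require "json"

# Sample JSON document held in a heredoc
json = <<END
{
  "name": "Alice",
  "age": 30,
  "children": [{ "name": "Bob", "age": 2 }]
}
END

# Match on the parsed structure and bind the child's age to a local variable
case JSON.parse(json, symbolize_names: true)
in {name: "Alice", children: [{name: "Bob", age: age}]}
  p age #=> 2
end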

For additional information on pattern matching, see Feature #14912.

Argument forwarding

Prior to Ruby 2.7, the * and ** operators were available for positional and keyword arguments. These are used to specify any number of arguments or to convert arrays or hashes to several arguments.

Ruby 2.7 added a new shorthand syntax ... for forwarding all arguments to a method irrespective of type. In the example below, all arguments to foo are forwarded to bar, including keyword and block arguments. It acts similarly to calling super without any arguments.

def foo(...)
  bar(...)
end

Numbered arguments

Numbered arguments allow you to reference block arguments solely by their index. They are only valid when referenced inside of a block:


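# Before Ruby 2.7: name the block argument explicitly
[1, 2, 3].each { |i| puts i }

# Ruby 2.7: reference the first block argument by its index
[1, 2, 3].each { puts _1 }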

This can make short code blocks easier to read and reduce code repetition.

Amazon Linux 2

Ruby 2.7, like Python 3.8, Node.js 10 and 12, and Java 11, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.

Next steps

Get started building with Ruby 2.7 today by specifying a runtime parameter value of ruby2.7 when creating your Lambda functions. You can read about the Ruby programming model in the AWS Lambda documentation to learn more about writing functions in Ruby 2.7.

For existing Ruby functions, migrate to the new runtime by making any necessary changes to the code for compatibility with Ruby 2.7, then changing the function’s runtime configuration to ruby2.7.

Enjoy, go build with Ruby!

Deploy and publish to an Amazon MQ broker using AWS serverless

Post Syndicated from Moheeb Zara original https://aws.amazon.com/blogs/compute/deploy-and-publish-to-an-amazon-mq-broker-using-aws-serverless/

If you’re managing a broker on premises or in the cloud with a dependent existing infrastructure, Amazon MQ can provide easily deployed, managed ActiveMQ brokers. These support a variety of messaging protocols that can offload operational overhead. That can be useful when deploying a serverless application that communicates with one or more external applications that also communicate with each other.

This post walks through deploying a serverless backend and an Amazon MQ broker in one step using the AWS Serverless Application Model (AWS SAM). It shows you how to publish to a topic using AWS Lambda and then how to create a client application to consume messages from the topic, using a supported protocol. As a result, the AWS services and features supported by AWS Lambda can now be delivered to an external application connected to an Amazon MQ broker using STOMP, AMQP, MQTT, OpenWire, or WSS.

Although many protocols are supported by Amazon MQ, this walkthrough focuses on one. MQTT is a lightweight publish–subscribe messaging protocol. It is built to work in a small code footprint and is one of the most well-supported messaging protocols across programming languages. The protocol also introduced quality of service (QoS) to ensure message delivery when a device goes offline. Using QoS features, you can limit failure states in an interdependent network of applications.

To simplify this configuration, I’ve provided an AWS Serverless Application Repository application that deploys AWS resources using AWS CloudFormation. Two resources are deployed: a single-instance Amazon MQ broker and a Lambda function. The Lambda function uses Node.js and an MQTT library to act as a producer and publish to a message topic on the Amazon MQ broker. A provided sample Node.js client app can act as an MQTT client and subscribe to the topic to receive messages.


The following resources are required to complete the walkthrough:

Required steps

To complete the walkthrough, follow these steps:

  • Clone the aws-sar-lambda-publish-amazonmq GitHub repository.
  • Deploy the AWS Serverless Application Repository application.
  • Run a Node.js MQTT client application.
  • Send a test message from an AWS Lambda function.
  • Use composite destinations.

Clone the GitHub repository

Before beginning, clone or download the project repository from GitHub. It contains the sample Node.js client application used later in this walkthrough.

Deploy the AWS Serverless Application Repository application

  1. Navigate to the page for the lambda-publish-amazonmq AWS Serverless Application Repository application.
  2. In Application settings, fill the following fields:

    – AdminUsername
    – AdminPassword
    – ClientUsername
    – ClientPassword

    These are the credentials for the Amazon MQ broker. The admin credentials are assigned to environment variables used by the Lambda function to publish messages to the Amazon MQ broker. The client credentials are used in the Node.js client application.

  3. Choose Deploy.

Creation can take up to 10 minutes. When completed, proceed to the next section.

Run a Node.js MQTT client application

The Amazon MQ broker supports OpenWire, AMQP, STOMP, MQTT, and WSS connections. This allows any supported programming language to publish and consume messages from an Amazon MQ queue or topic.

To demonstrate this, you can deploy the sample Node.js MQTT client application included in the GitHub project for the AWS Serverless Application Repository app. The client credentials created in the previous section are used here.

  1. Open a terminal application and change to the client-app directory in the GitHub project folder by running the following command:
    cd ~/some-project-path/aws-sar-lambda-publish-amazonmq/client-app
  2. Install the Node.js dependencies for the client application:
    npm install
  3. The app requires a WSS endpoint to create an Amazon MQ broker MQTT WebSocket connection. This can be found on the broker page in the Amazon MQ console, under Connections.
  4. The node app takes four arguments separated by spaces. Provide the user name and password of the client created on deployment, followed by the WSS endpoint and a topic, some/topic.
    node app.js "username" "password" "wss://endpoint:port" "some/topic"
  5. After “connected” prints in the terminal, leave this app running, and proceed to the next section.

There are three important components run by this code to subscribe and receive messages:

  • Connecting to the MQTT broker.
  • Subscribing to the topic on a successful connection.
  • Creating a handler for any message events.

The following code example shows connecting to the MQTT broker.

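// Dependencies: the mqtt and uuid npm packages
const mqtt = require('mqtt')
const { v1: uuidv1 } = require('uuid')

// Command-line arguments: username, password, WSS endpoint, topic
const args = process.argv.slice(2)

let options = {
  username: args[0],
  password: args[1],
  clientId: 'mqttLambda_' + uuidv1()
}

let mqEndpoint = args[2]
let topic = args[3]

let client = mqtt.connect(mqEndpoint, options)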

The following code example shows subscribing to the topic on a successful connection.

// When connected, subscribe to the topic

client.on('connect', function() {

  client.subscribe(topic, function (err) {
    if(err) console.log(err)
  })
})

The following code example shows creating a handler for any message events.

// Log messages

client.on('message', function (topic, message) {
  console.log(`message received on ${topic}: ${message.toString()}`)
})

Send a test message from an AWS Lambda function

Now that the Amazon MQ broker, PublishMessage Lambda function, and the Node.js client application are running, you can test consuming messages from a serverless application.

  1. In the Lambda console, select the newly created PublishMessage Lambda function. Its name begins with the name given to the AWS Serverless Application Repository application on deployment.
  2. Choose Test.
  3. Give the new test event a name, and optionally modify the message. Choose Create.
  4. Choose Test to invoke the Lambda function with the test event.
  5. If the execution is successful, the message appears in the terminal where the Node.js client-app is running.

Using composite destinations

The Amazon MQ broker uses an XML configuration to enable and configure ActiveMQ features. One of these features, composite destinations, makes one-to-many relationships on a single destination possible. This means that a queue or topic can be configured to forward to another queue, topic, or combination.

This is useful when fanning out to a number of clients, some of whom are consuming queues while others are consuming topics. The following steps demonstrate how you can easily modify the broker configuration and define multiple destinations for a topic.

  1. On the Amazon MQ Configurations page, select the matching configuration from the list. It has the same stack name prefix as your broker.
  2. Choose Edit configuration.
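  3. After the opening broker tag, add the destinationInterceptors element shown in the following code example. It creates a new virtual composite destination where messages published to “some/topic” are forwarded to a queue “A.Queue” and a topic “foo.” ActiveMQ maps the MQTT topic some/topic to the destination name some.topic.
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <broker schedulePeriodForDestinationPurge="10000" xmlns="http://activemq.apache.org/schema/core">
        <destinationInterceptors>
            <virtualDestinationInterceptor>
                <virtualDestinations>
                    <compositeTopic name="some.topic">
                        <forwardTo>
                            <queue physicalName="A.Queue"/>
                            <topic physicalName="foo"/>
                        </forwardTo>
                    </compositeTopic>
                </virtualDestinations>
            </virtualDestinationInterceptor>
        </destinationInterceptors>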
  4. Choose Save, add a description for this revision, and then choose Save.
  5. In the left navigation pane, choose Brokers, and select the broker with the stack name prefix.
  6. Under Details, choose Edit.
  7. Under Configuration, select the latest configuration revision that you just created.
  8. Choose Schedule modifications, Immediately, Apply.

After the reboot is complete, run another test of the Lambda function. Then, open and log in to the ActiveMQ broker web console, which can be found under Connections on the broker page. To log in, use the admin credentials created on deployment.

On the Queues page, a new queue “A.Queue” was generated because you published to some/topic, which has a composite destination configured.
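As a rough mental model of this fan-out, the following JavaScript sketch resolves where a message published to an MQTT topic is forwarded. It is not how ActiveMQ is implemented: the compositeDestinations table and the resolveDestinations function are hypothetical names used only for illustration, and broker options that affect delivery to the original destination are not modeled. The one behavior borrowed from ActiveMQ is the mapping of the MQTT separator / to the ActiveMQ separator ., which is why the queue appears after publishing to some/topic.

```javascript
// Hypothetical in-memory model of the composite destination in the broker
// configuration: ActiveMQ destination name -> forward targets.
const compositeDestinations = {
  'some.topic': [
    { type: 'queue', name: 'A.Queue' },
    { type: 'topic', name: 'foo' }
  ]
}

// ActiveMQ maps the MQTT level separator "/" to its own separator ".".
function mqttTopicToActiveMq (mqttTopic) {
  return mqttTopic.split('/').join('.')
}

// Return the forward targets configured for an MQTT topic. In this sketch,
// topics without a composite entry simply resolve to themselves.
function resolveDestinations (mqttTopic) {
  const name = mqttTopicToActiveMq(mqttTopic)
  return compositeDestinations[name] || [{ type: 'topic', name }]
}

console.log(resolveDestinations('some/topic'))
console.log(resolveDestinations('other/topic'))
```

Keeping the name mapping in its own function makes it easy to see why the configuration refers to some.topic while the clients publish and subscribe to some/topic.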


It can be difficult to tackle architecting a solution with multiple client destinations and networked applications. Although there are many ways to go about solving this problem, this post showed you how to deploy a robust solution using ActiveMQ with a serverless workflow. The workflow publishes messages to a client application using MQTT, a well-supported and lightweight messaging protocol.

To accomplish this, you deployed a serverless application and an Amazon MQ broker in one step using the AWS Serverless Application Repository. You also ran a Node.js MQTT client application authenticated as a registered user in the Amazon MQ broker. You then used Lambda to test publishing a message to a topic on the Amazon MQ broker. Finally, you extended functionality by modifying the broker configuration to support a virtual composite destination, allowing delivery to multiple topic and queue destinations.

With the completion of this project, you can take things further by integrating other AWS services and third-party or custom client applications. Amazon MQ provides multiple protocol endpoints that are widely used across the software and platform landscape. Using serverless as an in-between, you can deliver features from services like Amazon EventBridge to your external applications, wherever they might be. You can also explore how to invoke a Lambda function from Amazon MQ.


Creating a Seamless Handoff Between Amazon Pinpoint and Amazon Connect

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/creating-a-seamless-handoff-between-amazon-pinpoint-and-amazon-connect/

Note: This post was written by Ilya Pupko, Senior Consultant for the AWS Digital User Engagement team.

Time to read: 5 minutes
Learning level: Intermediate (200)
Services used: Amazon Pinpoint, Amazon SNS, AWS Lambda, Amazon Lex, Amazon Connect

Your customers deserve to have helpful communications with your brand, regardless of the channel that you use to interact with them. There are many situations in which you might have to move customers from one channel to another—for example, when a customer is interacting with a chatbot over SMS, but their needs suddenly change to require voice assistance. To create a great customer experience, your communications with your customers should be seamless across all communication channels.

Welcome aboard Customer Obsessed Airlines

In this post, we look at a scenario that involves our fictitious airline, Customer Obsessed Airlines. Severe storms in one area of the country have caused Customer Obsessed Airlines to cancel a large number of flights. Customer Obsessed Airlines has to notify all of the affected customers of the cancellations right away. But most importantly, to keep customers as happy as possible in this unfortunate and unavoidable situation, Customer Obsessed Airlines has to make it easy for customers to rebook their flights.

Fortunately, Customer Obsessed Airlines has implemented the solution that’s outlined later in this post. This solution uses Amazon Pinpoint to send messages to a targeted segment of customers—in this case, the specific customers who were booked on the affected flights. Some of these customers might have straightforward travel itineraries that can simply be rebooked through interactions with a chatbot. Other customers who have more complex itineraries, or those who simply prefer to interact with a human over the phone, can be handed off to an agent in your call center.

About the solution

The solution that we’ll build to handle this scenario can be deployed in under an hour. The following diagram illustrates the interactions in this solution.

At a high level, this solution uses the following workflow:

  1. An event occurs. Automated impact analysis systems trigger the creation of custom segments—in this case, all passengers whose flights were cancelled.
  2. Amazon Pinpoint sends a message to the affected passengers through their preferred channels. Amazon Pinpoint supports the email, SMS, push, and voice channels, but in this example, we focus exclusively on SMS.
  3. Passengers who receive the message can respond. When they do, they interact with a chatbot that helps them book a different flight.
  4. If a passenger requests a live agent, or if their situation can’t be handled by a chatbot, then Amazon Pinpoint passes information about the customer’s situation and communication history to Amazon Connect. The passenger is entered into a queue. When the passenger reaches the front of the queue, they receive a phone call from an agent.
  5. After being re-booked, the passenger receives a written confirmation of the changes to their itinerary through their preferred channel. Passengers are also given the option of providing feedback on their interaction when the process is complete.

To build this solution, we use Amazon Pinpoint to segment our customers based on their attributes (such as which flight they’ve booked), and to deliver messages to those segments.

We also use Amazon Connect to manage the voice calling part of the solution, and Amazon Lex to power the chatbot. Finally, we connect these services using logic that’s defined in AWS Lambda functions.

Setting up the solution

Step 1: Set up Amazon Pinpoint and link it with Amazon Lex

The first step in setting up this solution is to create a new Amazon Pinpoint project and configure the SMS channel. When that’s done, you can create an Amazon Lex chatbot and link it to the Amazon Pinpoint project.

We described this process in detail in an earlier blog post. Complete the procedures in Create an SMS Chatbot with Amazon Pinpoint and Amazon Lex, and then proceed to step 2.

Step 2: Set up Amazon Connect and link it with your Amazon Lex chatbot

By completing step 1, we’ve created a system that can send messages to our passengers and receive messages from them. The next step is to create a way for passengers to communicate with our call center.

The Amazon Connect Administrator Guide provides instructions for linking an Amazon Lex bot to an Amazon Connect instance. For complete procedures, see Add an Amazon Lex Bot.

When you complete these procedures, link your Amazon Connect instance to the same Amazon Lex bot that you created in step 1. This step is intended to provide customers with a consistent, cohesive experience across channels.

Step 3: Set up an Amazon Connect callback queue and use Amazon Pinpoint keyword logic to trigger it

Now that we’ve configured Amazon Pinpoint and Amazon Connect, we can connect them.

Linking the two services makes it possible for passengers to request additional assistance. Traditionally, passengers in this situation would have to call a call center themselves and then wait on hold for an agent to become available. However, in this solution, our call center calls the passenger directly as soon as an agent is available. When the agent calls the passenger, the agent has all of the information about the passenger’s issue, as well as a transcript of the passenger’s interactions with your chatbot.

To implement an automatic callback mechanism, use the Amazon Pinpoint Connect Callback Requestor, which is available on the AWS GitHub page.

Next steps

By completing the preceding three steps, you can send messages to a subset of your users based on the criteria you choose and the type of message you want to send. Your customers can interact with your message by replying with questions. When they do, a chatbot responds intelligently and appropriately.

You can add to this solution by expanding it to cover other communication channels, such as push notifications. You can also automate the initial communication by integrating the solution with your systems of record.

We’re excited to see what you build using the solution that we outlined in this post. Let us know of your ideas and your successes in the comments.

Building a serverless URL shortener app without AWS Lambda – part 1

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-a-serverless-url-shortener-app-without-lambda-part-1/

When building applications, developers often use a standard multi-tier architecture pattern that generally includes a presentation, processing, and data tier. When building such an application using serverless technologies on AWS, it might look like the following:

Serverless architecture

In this three-part series, I am going to challenge you to approach this a different way by building a functionless or “backend-less” URL shortener application, that looks like this:

Functionless architecture

In part one, I discuss configuring a service integration between Amazon API Gateway and Amazon DynamoDB, removing the need for AWS Lambda entirely. I also demonstrate using Apache’s Velocity Templating Language (VTL) to apply business logic and modify the API request and response as needed. In part two, I show how to use API Gateway to increase security. In part three, I demonstrate how to improve response time and configure observability to get insights into application performance and client usage.

At AWS re:Invent 2019, the new HTTP API for Amazon API Gateway was announced. At the time of this writing, this new service does not support VTL or some of the other features discussed, so instead I use a REST API. When HTTP API gains feature parity, we will publish an additional follow up to this post.

Throughout this blog series, there are deep links to AWS SAM and OpenAPI configurations to show how to build this application using infrastructure as code (IaC). To refer to the full application, visit https://github.com/aws-samples/amazon-api-gateway-url-shortener. The template.yaml file is the AWS SAM configuration for the application, and the api.yaml is the OpenAPI configuration for the API. I have included instructions on how to deploy the full application, including a simple web client, in the README.md file.

Why would I do this?

AWS Lambda is the standard compute resource for serverless applications. With a Lambda function, I can process complex business logic in any of the AWS supported runtimes or even in my own custom runtime. However, do I really need to use a Lambda function when the business logic is minimal, and the main purpose becomes the transportation of data? Instead, I can turn to API Gateway to transport the data and process minimal amounts of business logic, as needed, with VTL. This allows me to minimize my application resources and cost.

API Gateway service integration

While each request to an API Gateway REST endpoint follows the same path, to understand how service integrations work, I show the integration for /app – POST. This represents the lifecycle of a request made to http://myexampleapi.com/app using a POST method. The purpose of this endpoint is to post new short links to the database.

API Gateway request lifecycle

The Method Request and Method Response mainly handle authorization, modeling, and validation, and are covered in detail in part two of this blog. For now, I focus on the Integration Request and Integration Response. The Integration Request is responsible for service integrations, and looks like this:

POST integration request

The Integration type is AWS Service and the AWS Region is my closest Region, us-west-2. For AWS Service, I choose DynamoDB from the long list of available services. For the HTTP Method, when interacting with the DynamoDB API, the POST method is required to take action on the underlying table.

For the Action, I choose UpdateItem. The action is the same here as you would use in the CLI or SDK to interact with DynamoDB. Generally, when adding new items to the DynamoDB table, I use the PutItem command. However, in this instance I must use UpdateItem to get a specific set of return data from DynamoDB.

When creating a new record in DynamoDB, the PutItem action does not return the completed record in a single request. If I want to obtain the new record, I need to make a secondary call to DynamoDB to fetch the record. However, the API Gateway request lifecycle does not have the ability to call the database a second time. I need to make sure I get everything I need the first time around. The UpdateItem action updates an existing item or creates a new one if it doesn’t exist. Additionally, it can return the newly created item, which I can then return to the client.

Finally, I configure the execution role. On this method, API Gateway needs permission to read from and write to DynamoDB. Here is the policy section of the DDBCrudRole:

  - PolicyName: DDBCrudPolicy
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        Action:
          - dynamodb:DeleteItem
          - dynamodb:UpdateItem
        Effect: Allow
        Resource: !GetAtt LinkTable.Arn

This simple policy is used for all create, read, update, and delete (CRUD) operations, and UpdateItem is used for both create and update. This policy is part of the SAM template, and dynamically references the DynamoDB table ARN for the resource. This follows the principle of least privilege, only allowing access to the required table.

Modifying the request

Now that I have configured the integration from API Gateway to DynamoDB, I modify the incoming request to a format that DynamoDB understands. Further down the page on the Integration Request, you see the Mapping Template option:

Mapping templates

API Gateway checks the content type of the incoming request and looks for a matching mapping template to apply. I have created a template for application/json to match the incoming body. Here is a summarized version of the template:

{
  "TableName": "URLShortener-LinkTable-QTK7WFAJ11YS",
  "ConditionExpression": "attribute_not_exists(id)",
  "Key": {
    "id": { "S": $input.json('$.id') }
  },
  "ExpressionAttributeNames": {
    "#u": "url",
    "#o": "owner",
    "#ts": "timestamp"
  },
  "ExpressionAttributeValues": {
    ":u": {"S": $input.json('$.url')},
    ":o": {"S": "$context.authorizer.claims.email"},
    ":ts": {"S": "$context.requestTime"}
  },
  "UpdateExpression": "SET #u = :u, #o = :o, #ts = :ts",
  "ReturnValues": "ALL_NEW"
}

If you have worked with the DynamoDB SDK, this might look familiar. The TableName indicates which table to use in the call. The ConditionExpression value ensures that the id passed does not already exist. The value for id is extracted from the request body using $input.json('$.id').

To avoid colliding with reserved words, DynamoDB has the concept of ExpressionAttributeNames and ExpressionAttributeValues. In the ExpressionAttributeValues I have set ':o' to $context.authorizer.claims.email. This extracts the authenticated user’s email from the request context and maps it to owner. This allows me to uniquely group a single user’s links into a global secondary index (GSI). Querying the GSI is much more efficient than scanning the entire table.

I also retrieve the requestTime from the context object, allowing me to place a timestamp in the record. I set the ReturnValues to return all new values for the record. Finally, the UpdateExpression maps the values to the proper names and inserts the item into DynamoDB.
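For readers more familiar with the SDK than with VTL, the same request can be sketched as a plain JavaScript function. This is only an illustration of the payload the mapping template produces, not code from the application: buildUpdateItemParams, its tableName argument, and the shape of its body and context arguments are assumptions, and API Gateway builds the request without running any such function.

```javascript
// Hypothetical helper that mirrors the mapping template: it builds the
// parameters for a DynamoDB UpdateItem call from a request body and a
// simplified request context. The table name is passed in by the caller.
function buildUpdateItemParams (tableName, body, context) {
  return {
    TableName: tableName,
    // Reject the write if an item with this id already exists.
    ConditionExpression: 'attribute_not_exists(id)',
    Key: { id: { S: body.id } },
    // Name placeholders avoid clashes with DynamoDB reserved words.
    ExpressionAttributeNames: { '#u': 'url', '#o': 'owner', '#ts': 'timestamp' },
    ExpressionAttributeValues: {
      ':u': { S: body.url },
      ':o': { S: context.email },
      ':ts': { S: context.requestTime }
    },
    UpdateExpression: 'SET #u = :u, #o = :o, #ts = :ts',
    // Ask DynamoDB to return the complete item as it looks after the update.
    ReturnValues: 'ALL_NEW'
  }
}

// Example values are made up for illustration.
const params = buildUpdateItemParams(
  'LinkTable',
  { id: 'aws', url: 'http://aws.amazon.com' },
  { email: 'user@example.com', requestTime: '27/Dec/2019:21:21:17 +0000' }
)
console.log(JSON.stringify(params, null, 2))
```

Seen this way, the template is a declarative version of the parameter object you would otherwise assemble inside a Lambda function.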

Modifying the response

Before I discuss the Integration Response, let’s examine the Method Response:

Method response

The Method Response is responsible for modeling the response to the client. In most cases, DynamoDB returns a status code of either 200 or 400. Therefore, I configure a 200 response and a 400 response.

When DynamoDB returns a 200 response, the data looks like the following:

{
  "Attributes": {
    "id": {"S": "aws"},
    "owner": {"S": "[email protected]"},
    "timestamp": {"S": "27/Dec/2019:21:21:17 +0000"},
    "url": {"S": "http://aws.amazon.com"}
  }
}

In the Integration Response, I have a template that converts this to a structure that the client is expecting. The template looks like this:

#set($inputRoot = $input.path('$'))

This template has a variable called $inputRoot to contain the root data. I then build out the return object, formatted for the client:

{
  "id": "aws",
  "url": "http://aws.amazon.com",
  "timestamp": "27/Dec/2019:21:21:17 +0000",
  "owner": "[email protected]"
}
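The conversion performed by the Integration Response template can be pictured as the following JavaScript sketch. It assumes the UpdateItem result carries the item under an Attributes key, which is how DynamoDB returns values requested with ReturnValues. The toClientResponse name and the sample values are hypothetical, and the real conversion happens in VTL inside API Gateway rather than in code like this.

```javascript
// Hypothetical equivalent of the Integration Response template: unwrap the
// DynamoDB attribute-value wrappers ({ S: '...' }) into a plain object.
function toClientResponse (dynamoResponse) {
  const attributes = dynamoResponse.Attributes
  return {
    id: attributes.id.S,
    url: attributes.url.S,
    timestamp: attributes.timestamp.S,
    owner: attributes.owner.S
  }
}

// Sample UpdateItem result; the owner value is made up for illustration.
const sample = {
  Attributes: {
    id: { S: 'aws' },
    owner: { S: 'user@example.com' },
    timestamp: { S: '27/Dec/2019:21:21:17 +0000' },
    url: { S: 'http://aws.amazon.com' }
  }
}

console.log(toClientResponse(sample))
```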

For a 400 status, I must evaluate the issue and respond accordingly. The mapping template looks like this:

#set($inputRoot = $input.path('$'))
#if($inputRoot.toString().contains("ConditionalCheckFailedException"))
  #set($context.responseOverride.status = 200)
  {"error": true,"message": "URL link already exists"}
#end

This template checks for the string "ConditionalCheckFailedException". If it exists, then I know that the conditional check "attribute_not_exists(id)" from the UpdateItem template in the Integration Request failed. To return a 200 response, I use the "#set($context.responseOverride.status = 200)" override and set the response body to the error details.
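The decision made for a 400 status can be sketched in JavaScript as follows. The mapErrorResponse function is a hypothetical stand-in for the VTL logic, the sample error body is illustrative, and passing other errors through unchanged is an assumption made for this sketch rather than something the post specifies.

```javascript
// Hypothetical stand-in for the 400 mapping logic: turn a failed conditional
// check into a 200 response that carries a friendly error message.
function mapErrorResponse (rawBody) {
  if (rawBody.includes('ConditionalCheckFailedException')) {
    return {
      status: 200,
      body: { error: true, message: 'URL link already exists' }
    }
  }
  // Assumption for this sketch: any other error is passed through as a 400.
  return { status: 400, body: rawBody }
}

// Illustrative error body for a failed condition expression.
const conflict = '{"__type":"com.amazonaws.dynamodb.v20120810#ConditionalCheckFailedException","message":"The conditional request failed"}'

console.log(mapErrorResponse(conflict))
console.log(mapErrorResponse('{"message":"some other error"}'))
```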

With my integration and mapping templates in place for the /app – POST method, I now have the ability to create new short links for my URL shortener. Taking this same approach for reading, updating, and deleting short links, I now have a fully functioning backend for the URL shortener that only uses API Gateway and DynamoDB.

What we have built so far


In this post, I walked through using VTL to manage simple business logic at the processing tier with API Gateway. I covered configuring the service integration with DynamoDB and modifying the request and response payloads as needed. In part 2, I discuss different options for configuring Amazon API Gateway security.

To deploy the URL shortener, visit https://github.com/aws-samples/amazon-api-gateway-url-shortener. The README.md file contains instructions for launching the application.

Continue to part two.

Happy coding!

Integrating Amazon EventBridge into your serverless applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/integrating-amazon-eventbridge-into-your-serverless-applications/

Event-driven architecture enables developers to create decoupled services across applications. When combined with the range of managed services available in AWS, this approach can make applications highly scalable and flexible, with minimal maintenance.

Many services in the AWS Cloud produce events, including integrated software as a service (SaaS) applications. Your custom applications can also produce and consume events. With so many events from different sources, you need a way to coordinate this traffic. Amazon EventBridge is a serverless event bus that helps manage how all these events are routed throughout your applications.

The routing logic is managed by rules that evaluate the events against event expressions. EventBridge delivers matching events to targets such as AWS Lambda, so you can process events with your custom business logic.

EventBridge architecture

In this blog post, I show how you can build an event producer and consumer in AWS Lambda, and create a rule to route events. The code uses the AWS Serverless Application Model (SAM), so you can deploy the application in your own AWS Account. This walkthrough uses AWS resources that are covered by the AWS Free Tier.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file.

How the example application works

In this example, a banking application for automated teller machines (ATMs) produces events about transactions. It sends the events to EventBridge, which then uses rules defined by the application to route them accordingly. There are three downstream services, each consuming a subset of these events.


Sample ATM application architecture

In the repo, the atmProducer subdirectory contains handler.js, which represents the ATM service producing events. This code is a Lambda handler written in Node.js, and publishes events to EventBridge via the AWS SDK using this line of code:

const result = await eventbridge.putEvents(params).promise()

This directory also contains events.js, listing several test transactions in an Entries array. A single event is defined as follows:

    {
      // Event envelope fields
      Source: 'custom.myATMapp',
      EventBusName: 'default',
      DetailType: 'transaction',
      Time: new Date(),

      // Main event body
      Detail: JSON.stringify({
        action: 'withdrawal',
        location: 'MA-BOS-01',
        amount: 300,
        result: 'approved',
        transactionId: '123456',
        cardPresent: true,
        partnerBank: 'Example Bank',
        remainingFunds: 722.34
      })
    }

The Detail section of the event specifies transaction attributes. These include the location of the ATM, the amount, the partner bank, and the result of the transaction.
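To make the split between envelope and body concrete, here is a minimal sketch of a helper that wraps a transaction in the envelope fields shown above. The buildEntry function is hypothetical and not part of the sample repository. It reuses the envelope values from the sample event and serializes the transaction because Detail is sent as a JSON string.

```javascript
// Hypothetical helper: wrap a transaction object in the envelope fields used
// by the sample events. Detail is serialized to a JSON string.
function buildEntry (transaction, now = new Date()) {
  return {
    Source: 'custom.myATMapp',
    EventBusName: 'default',
    DetailType: 'transaction',
    Time: now,
    Detail: JSON.stringify(transaction)
  }
}

const entry = buildEntry({
  action: 'withdrawal',
  location: 'MA-BOS-01',
  amount: 300,
  result: 'approved'
})

// Parsing the string recovers the original transaction attributes.
const detail = JSON.parse(entry.Detail)
console.log(detail.location, detail.amount)
```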

The handler.js file in the atmConsumer subdirectory contains three functions:

exports.case1Handler = async (event) => {
  console.log('--- Approved transactions ---')
  console.log(JSON.stringify(event, null, 2))
}

exports.case2Handler = async (event) => {
  console.log('--- NY location transactions ---')
  console.log(JSON.stringify(event, null, 2))
}

exports.case3Handler = async (event) => {
  console.log('--- Unapproved transactions ---')
  console.log(JSON.stringify(event, null, 2))
}

Each function receives transaction events, which are logged via the console.log statements to Amazon CloudWatch Logs. The consumer functions operate independently of the producer and are unaware of the source of the events.

The routing logic is contained in the EventBridge rules that are deployed by the application’s SAM template. The rules evaluate the incoming stream of events, and route matching events to the target Lambda functions.

Running the ATM application

After deploying the sample application, you can generate test events by invoking the atmProducer Lambda function:

  1. Open the Lambda console in the same Region where you deployed the SAM application.

    AWS Lambda console

  2. There are four Lambda functions with the prefix atm-demo. Choose the atmProducerFn function, then choose Test.

    Testing the Lambda function

  3. For Event name, enter Test, then choose Create. Choose Test once more to invoke the function.

    Invoking the Lambda function

This puts the sample events onto the EventBridge default event bus. Next, inspect the logs from the three consumer functions to see which events route to each function:

  1. Navigate to the CloudWatch console in the same Region. Select Logs, then Log groups from the menu.

    CloudWatch console

  2. Select the log group containing atmConsumerCase1. You see two streams representing the two transactions approved by the ATM. Choose a log stream to view the output.

    Log stream output

  3. Navigate back to the list of log groups, then select the log group containing atmConsumerCase2. You see streams for the two transactions matching the “New York” location filter.

    Transaction matching "New York" filter

  4. Navigate back once more to the list of log groups and select the log group containing atmConsumerCase3. Open the stream to see the denied transaction.

    Denied transaction in logs

How EventBridge rules work

From the AWS Management Console, navigate to EventBridge from the Services dropdown. Choose Rules from the menu to see the rules created by the application deployment.

Amazon EventBridge rules

Choose one of the rules to see the configuration. Each rule is associated with a single event bus (the default bus for this application), which means it evaluates every event published to the bus. Scroll down to view the Event pattern used by the rule:

EventBridge event patterns

The event pattern is a JSON object with the same structure as the events it matches. Each matching value must be wrapped in an array, and you can provide multiple values if necessary. Multiple values are compared using ‘or’ logic – only one of the values needs to match the incoming event.

You can also use content-based filtering in event patterns to create more complex rules that match dynamically. The prefix operator above matches any event where the detail.location value begins with “NY-”.

Integrating EventBridge into SAM templates

You can build and test rules manually in the EventBridge console, which can help in the development process as you refine event patterns. However, once you are ready to deploy your application, it’s easier to use a framework like SAM to launch all your serverless resources consistently.

In the example application, open the template.yaml file to view the SAM template, which defines the four Lambda functions. This shows two different ways to integrate the Lambda functions with EventBridge. The first approach uses the Events property to configure the EventBridge rule:

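    Type: AWS::Serverless::Function
    Properties:
      CodeUri: atmConsumer/
      Handler: handler.case3Handler
      Runtime: nodejs12.x
      Events:
        Trigger:  # logical name for the event; any valid ID works
          Type: CloudWatchEvent
          Properties:
            Pattern:
              source:
                - custom.myATMapp
              detail-type:
                - transaction
              detail:
                result:
                  - "anything-but": "approved"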

The syntax defines an event that invokes the Lambda function. In the YAML, you only need to define the pattern, and SAM automatically creates an IAM role with the required permissions. The pattern is the YAML equivalent of the Event Pattern shown in the console earlier.

This example automatically creates the rule on the default event bus, which exists in every AWS account. To associate the rule with a custom event bus, you can add the EventBusName to the template. If this property is missing, SAM uses the default bus.

In the second approach to defining an EventBridge configuration in SAM, you can separate the resources more clearly in the template. First, you define the Lambda function:

  atmConsumerCase1Fn:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: atmConsumer/
      Handler: handler.case1Handler
      Runtime: nodejs12.x

Next, the rule is defined using an AWS::Events::Rule resource. The properties define the event pattern as before, but can also specify targets. Whereas the first method can only create a single, implied target (the parent Lambda function), you can explicitly define multiple targets using this syntax:

    Type: AWS::Events::Rule
    Properties:
      Description: "Approved transactions"
      EventPattern:
        source:
          - "custom.myATMapp"
        detail-type:
          - transaction
        detail:
          result:
            - "approved"
      State: "ENABLED"
      Targets:
        - Arn:
            Fn::GetAtt:
              - "atmConsumerCase1Fn"
              - "Arn"
          Id: "atmConsumerTarget1"

Finally, there is an AWS::Lambda::Permission resource that grants permission to EventBridge to invoke the target:

    Type: AWS::Lambda::Permission
    Properties:
      FunctionName:
        Ref: "atmConsumerCase1Fn"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn:
        Fn::GetAtt:
          - "EventRuleCase1"
          - "Arn"

For simple integrations where one Lambda function is invoked by one rule, the first approach is recommended. If you have complex routing logic, or you are connecting to resources outside of your SAM template, the second method is the better choice.
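To make the three parts of the second approach concrete (rule, target, and invoke permission), here is a rough boto3 equivalent. It is a sketch for illustration only: the rule name and statement ID are hypothetical, and the clients are passed in so the function can be tested with stubs.

```python
import json


def create_rule_with_target(events, lambda_client, function_arn):
    """Create the rule, attach the function, and let EventBridge invoke it."""
    rule_name = "approved-transactions"  # hypothetical rule name
    pattern = {
        "source": ["custom.myATMapp"],
        "detail-type": ["transaction"],
        "detail": {"result": ["approved"]},
    }
    # 1. The rule and its event pattern (AWS::Events::Rule)
    rule = events.put_rule(
        Name=rule_name,
        Description="Approved transactions",
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    # 2. The explicit target list (the Targets property of the rule)
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "atmConsumerTarget1", "Arn": function_arn}],
    )
    # 3. Permission for EventBridge to invoke the function
    #    (AWS::Lambda::Permission)
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="allow-eventbridge-approved",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    return rule["RuleArn"]
```

Seeing the calls side by side shows why the first approach is shorter: SAM generates the equivalent of steps 2 and 3 from the Events property.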


This walkthrough shows how to build a simple serverless application that produces and consumes events, using the EventBridge event bus to manage the routing. Using event patterns in rules, you can centralize the routing logic at EventBridge, helping reduce code in your downstream consuming services.

SAM templates make it simple to create EventBridge rules and define Lambda functions as targets. I show two ways to use SAM statements to define IAM permissions implicitly or explicitly. This allows you to decouple the services within your serverless applications, and take advantage of the routing offered by EventBridge with minimal configuration in your SAM templates.

To learn more, visit the Amazon EventBridge documentation.


Customizing triggers for AWS CodePipeline with AWS Lambda and Amazon CloudWatch Events

Post Syndicated from Bryant Bost original https://aws.amazon.com/blogs/devops/adding-custom-logic-to-aws-codepipeline-with-aws-lambda-and-amazon-cloudwatch-events/

AWS CodePipeline is a fully managed continuous delivery service that helps automate the build, test, and deploy processes of your application. Application owners use CodePipeline to manage releases by configuring “pipelines,” workflow constructs that describe the steps, from source code to deployed application, through which an application progresses as it is released. If you are new to CodePipeline, check out Getting Started with CodePipeline to get familiar with the core concepts and terminology.


In a default setup, a pipeline is kicked off whenever a change in the configured pipeline source is detected. CodePipeline currently supports sourcing from AWS CodeCommit, GitHub, Amazon ECR, and Amazon S3. When using CodeCommit, Amazon ECR, or Amazon S3 as the source for a pipeline, CodePipeline uses an Amazon CloudWatch Event to detect changes in the source and immediately kick off a pipeline. When using GitHub as the source for a pipeline, CodePipeline uses a webhook to detect changes in a remote branch and kick off the pipeline. Note that CodePipeline also supports beginning pipeline executions based on periodic checks, although this is not a recommended pattern.

CodePipeline supports adding a number of custom actions and manual approvals to ensure that pipeline functionality is flexible and code releases are deliberate; however, without further customization, pipelines are still kicked off for every change in the pipeline source. To customize the logic that controls pipeline executions in the event of a source change, you can introduce a custom CloudWatch Event, which can result in the following benefits:

  • Multiple pipelines with a single source: Trigger select pipelines when multiple pipelines are listening to a single source. This can be useful if your organization is using monorepos, or is using a single repository to host configuration files for multiple instances of identical stacks.
  • Avoid reacting to unimportant files: Avoid triggering a pipeline when changing files that do not affect the application functionality (e.g. documentation files, readme files, and .gitignore files).
  • Conditionally kick off pipelines based on environmental conditions: Use custom code to evaluate whether a pipeline should be triggered. This allows for further customization beyond polling a source repository or relying on a push event. For example, you could create custom logic to automatically reschedule deployments on holidays to the next available workday.
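The holiday example in the last bullet could be sketched as follows. The function name next_workday and the idea of passing holidays as a set of dates are assumptions for illustration; a Lambda function could compare the result with today's date to decide whether to start the pipeline now or defer it.

```python
from datetime import date, timedelta


def next_workday(day, holidays):
    """Return `day` if it is a working day, otherwise the next one that is."""
    # weekday() returns 5 for Saturday and 6 for Sunday
    while day.weekday() >= 5 or day in holidays:
        day += timedelta(days=1)
    return day


# Hypothetical holiday calendar; a real one would come from configuration
holidays = {date(2020, 1, 1)}
deploy_on = next_workday(date(2020, 1, 1), holidays)
```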

This post explores and demonstrates how to customize the actions that invoke a pipeline by modifying the default CloudWatch Events configuration that is used for CodeCommit, ECR, or S3 sources. To illustrate this customization, we will walk through two examples: prevent updates to documentation files from triggering a pipeline, and manage execution of multiple pipelines monitoring a single source repository.

The key concepts behind customizing pipeline invocations extend to GitHub sources and webhooks as well; however, creating a custom webhook is outside the scope of this post.

Sample Architecture

This post is only interested in controlling the execution of the pipeline (as opposed to the deploy, test, or approval stages), so it uses simple source and pipeline configurations. The sample architecture considers a simple CodePipeline with only two stages: source and build.

Example CodePipeline Architecture with Custom CloudWatch Event Configuration

The sample CodeCommit repository consists only of buildspec.yml, readme.md, and script.py files.

Normally, after you create a pipeline, it automatically triggers a pipeline execution to release the latest version of your source code. From then on, every time you make a change to your source location, a new pipeline execution is triggered. In addition, you can manually re-run the last revision through a pipeline using the “Release Change” button in the console. This architecture uses a custom CloudWatch Event and AWS Lambda function to avoid commits that change only the readme.md file from initiating an execution of the pipeline.

Creating a custom CloudWatch Event

When we create a CodePipeline that monitors a CodeCommit (or other) source, a default CloudWatch Events rule is created to trigger our pipeline for every change to the CodeCommit repository. This CloudWatch Events rule monitors the CodeCommit repository for changes, and triggers the pipeline for events matching the referenceCreated or referenceUpdated CodeCommit Event (refer to CodeCommit Event Types for more information).

Default CloudWatch Events Rule to Trigger CodePipeline

To introduce custom logic and control the events that kick off the pipeline, this example configures the default CloudWatch Events rule to detect changes in the source and trigger a Lambda function rather than invoke the pipeline directly. The example uses a CodeCommit source, but the same principle applies to Amazon S3 and Amazon ECR sources as well, as these both use CloudWatch Events rules to notify CodePipeline of changes.

Custom CloudWatch Events Rule to Trigger CodePipeline

When a change is introduced to the CodeCommit repository, the configured Lambda function receives an event from CloudWatch signaling that there has been a source change.

   "detail-type":"CodeCommit Repository State Change",
      "callerUserArn":"arn:aws:sts::accountNumber:assumed-role/admin/roleName ",

The Lambda function is responsible for determining whether a source change necessitates kicking off the pipeline, which in the example is necessary if the change contains modifications to files other than readme.md. To implement this, the Lambda function uses the commitId and oldCommitId fields provided in the body of the CloudWatch event message to determine which files have changed. If the function determines that a change has occurred to a “non-ignored” file, then the function programmatically executes the pipeline. Note that for S3 sources, it may be necessary to process an entire zip archive, or to retrieve past versions of an artifact.

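import boto3

files_to_ignore = [ "readme.md" ]

# Placeholder (assumption): set this to the name of the pipeline to start
pipeline_name = "my-pipeline"

codecommit_client = boto3.client('codecommit')
codepipeline_client = boto3.client('codepipeline')

def lambda_handler(event, context):
    # Extract commits
    old_commit_id = event["detail"]["oldCommitId"]
    new_commit_id = event["detail"]["commitId"]
    # Get commit differences (assumption: repository name is read from the event)
    codecommit_response = codecommit_client.get_differences(
        repositoryName=event["detail"]["repositoryName"],
        beforeCommitSpecifier=str(old_commit_id),
        afterCommitSpecifier=str(new_commit_id)
    )
    # Search commit differences for files to ignore
    for difference in codecommit_response["differences"]:
        # Deleted files have no afterBlob, so fall back to beforeBlob
        blob = difference.get("afterBlob") or difference["beforeBlob"]
        file_name = blob["path"].lower()
        # If non-ignored file is present, kick off pipeline
        if file_name not in files_to_ignore:
            codepipeline_response = codepipeline_client.start_pipeline_execution(
                name=pipeline_name
            )
            # Break to avoid executing the pipeline twice
            break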
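
One detail worth noting for larger commits: GetDifferences returns results in pages. The sketch below, which is an illustration rather than part of the original function, follows NextToken until all changed paths are collected and keeps the trigger decision in a separate function so it can be unit tested. The helper names changed_paths and should_start_pipeline are hypothetical.

```python
def changed_paths(client, repository, old_commit, new_commit):
    """Collect every changed path between two commits, following NextToken."""
    paths, token = [], None
    while True:
        kwargs = {
            "repositoryName": repository,
            "beforeCommitSpecifier": old_commit,
            "afterCommitSpecifier": new_commit,
        }
        if token:
            kwargs["NextToken"] = token
        page = client.get_differences(**kwargs)
        for difference in page["differences"]:
            # Deleted files have no afterBlob, so fall back to beforeBlob
            blob = difference.get("afterBlob") or difference["beforeBlob"]
            paths.append(blob["path"].lower())
        token = page.get("NextToken")
        if not token:
            return paths


def should_start_pipeline(paths, files_to_ignore=("readme.md",)):
    """True when at least one changed path is not on the ignore list."""
    return any(path not in files_to_ignore for path in paths)
```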

Multiple pipelines sourcing from a single repository

Architectures that use a single-source repository monitored by multiple pipelines can add custom logic to control the types of events that trigger a specific pipeline to execute. Without customization, any change to the source repository would trigger all pipelines.

Consider the following example:

  • A CodeCommit repository contains a number of config files (for example, config_1.json and config_2.json).
  • Multiple pipelines (for example, codepipeline-customization-sandbox-pipeline-1 and codepipeline-customization-sandbox-pipeline-2) source from this CodeCommit repository.
  • Whenever a config file is updated, a custom CloudWatch Event triggers a Lambda function that is used to determine which config files changed, and therefore which pipelines should be executed.

Example CodePipeline Architecture for Monorepos with Custom CloudWatch Event Configuration

This example follows the same pattern of creating a custom CloudWatch Event and Lambda function shown in the preceding example. However, in this scenario, the Lambda function is responsible for determining which files changed and which pipelines should be kicked off as a result. To execute this logic, the Lambda function uses the config_file_mapping variable to map files to corresponding pipelines. Pipelines are only executed if their designated config file has changed.

Note that the config_file_mapping can be exported to Amazon S3 or Amazon DynamoDB for more complex use cases.

import boto3

# Map config files to pipelines
config_file_mapping = {
        "config_1.json" : "codepipeline-customization-sandbox-pipeline-1",
        "config_2.json" : "codepipeline-customization-sandbox-pipeline-2"
        }

codecommit_client = boto3.client('codecommit')
codepipeline_client = boto3.client('codepipeline')

def lambda_handler(event, context):
    # Extract commits
    old_commit_id = event["detail"]["oldCommitId"]
    new_commit_id = event["detail"]["commitId"]
    # Get commit differences (assumption: repository name is read from the event)
    codecommit_response = codecommit_client.get_differences(
        repositoryName=event["detail"]["repositoryName"],
        beforeCommitSpecifier=str(old_commit_id),
        afterCommitSpecifier=str(new_commit_id)
    )
    # Search commit differences for files that trigger executions
    for difference in codecommit_response["differences"]:
        # Deleted files have no afterBlob, so fall back to beforeBlob
        blob = difference.get("afterBlob") or difference["beforeBlob"]
        file_name = blob["path"].lower()
        # If file corresponds to pipeline, execute pipeline
        if file_name in config_file_mapping:
            codepipeline_response = codepipeline_client.start_pipeline_execution(
                name=config_file_mapping[file_name]
            )
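
If several files can map to the same pipeline, it may be worth selecting the pipelines first and starting each one only once. A minimal sketch of that selection step, with the hypothetical helper name pipelines_to_start:

```python
def pipelines_to_start(changed_paths, config_file_mapping):
    """Return the pipelines whose config file changed, each listed once."""
    selected = []
    for path in changed_paths:
        pipeline = config_file_mapping.get(path.lower())
        # Skip unmapped files and pipelines that are already selected
        if pipeline and pipeline not in selected:
            selected.append(pipeline)
    return selected
```

The handler could then loop over the returned list and call start_pipeline_execution once per entry.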


For the first example, updates affecting only the readme.md file are completely ignored by the pipeline, while updates affecting other files begin a normal pipeline execution. For the second example, the two pipelines monitor the same source repository; however, codepipeline-customization-sandbox-pipeline-1 is executed only when config_1.json is updated and codepipeline-customization-sandbox-pipeline-2 is executed only when config_2.json is updated.

These CloudWatch Event and Lambda function combinations serve as good general examples of introducing custom logic to pipeline kickoffs, and can be extended to handle more complex processing logic.


To avoid additional infrastructure costs from the examples described in this post, be sure to delete all CodeCommit repositories, CodePipeline pipelines, Lambda functions, and CodeBuild projects. When you delete a CodePipeline, the CloudWatch Events rule that was created automatically is deleted, even if the rule has been customized.


For scenarios that require additional custom logic to control the execution of one or more pipelines, configuring a CloudWatch Event to trigger a Lambda function allows you to customize the conditions and types of events that can kick off your pipeline.

Reducing custom code by using advanced rules in Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/reducing-custom-code-by-using-advanced-rules-in-amazon-eventbridge/

Amazon EventBridge allows you to route events between AWS services, integrated software as a service (SaaS) applications, and your own applications. Event producers publish events onto an event bus, which uses rules to determine where to send those events. The rules can specify one or more targets, which can be other AWS services or Lambda functions. This model makes it easy to develop scalable, distributed serverless applications by handling event routing and filtering.

EventBridge Content Filtering

EventBridge recently introduced additional content filtering functionality, which creates new possibilities for building sophisticated rules. This blog post explores how to use event patterns to build rules that make this routing process more powerful without needing custom code. I show how this could work with a sample ATM banking application integrating into an AWS service.

Events, rules, and filtering

In EventBridge, an event is simply a JSON structure. It contains some top-level envelope fields, such as the source, event, and timestamp, followed by a detail field containing the body of the event. Events generated from AWS services always contain a number of descriptive fields and are identifiable by the source attribute prefix “aws”.

You can also generate events from your own applications. EventBridge requires specific envelope fields, but otherwise you are free to add additional attributes as needed. A typical event structure for a custom application looks like this:

{
  "Source": string,
  "EventBusName": string,
  "DetailType": string,
  "Detail": string
}

If your application uses nested attributes, you must convert the Detail attribute into a string. In programming languages such as Node.js, you can do this using JSON.stringify to send an event, and JSON.parse when receiving it. For example, for a banking application where an ATM application sends events to EventBridge, a cash withdrawal event may look like this:

{
  "Source": "custom.myATMapp",
  "EventBusName": "default",
  "DetailType": "transaction",
  "Detail": "{\"action\":\"withdrawal\",\"amount\":300}"
}

EventBridge rules use event patterns that are JSON structures. These match against the attributes in the events. In the rules, you only specify the fields where you want to apply filtering logic.

To see all events for a single application using an event bus, you can filter by source. Any incoming event with this source matches regardless of the content of other fields. In the ATM application example, a rule that accepts all events looks like this:

{
  "source": [ "custom.myATMapp" ]
}

EventBridge examines the incoming event and compares it against this rule. The rule specifies a source value of custom.myATMapp and, as this exists in the event, the pattern matches. It then routes the event to the rule’s targets:

EventBridge rules

The example above shows a static, exact match pattern – the attribute is either present, or it’s not. There are now additional operators available for dynamic matching based on specific comparison conditions. This provides functionality that’s similar to what you use in a SQL WHERE clause for filtering records in a database.

Here is a summary of all the comparison operators available in EventBridge:

Comparison | Example | Rule syntax
Null | UserID is null | "UserID": [ null ]
Empty | LastName is empty | "LastName": [ "" ]
Equals | Name is "Alice" | "Name": [ "Alice" ]
And | Location is "New York" and Day is "Monday" | "Location": [ "New York" ], "Day": [ "Monday" ]
Or | PaymentType is "Credit" or "Debit" | "PaymentType": [ "Credit", "Debit" ]
Not | Weather is anything but "Raining" | "Weather": [ { "anything-but": [ "Raining" ] } ]
Numeric (equals) | Price is 100 | "Price": [ { "numeric": [ "=", 100 ] } ]
Numeric (range) | Price is more than 10, and less than or equal to 20 | "Price": [ { "numeric": [ ">", 10, "<=", 20 ] } ]
Exists | ProductName exists | "ProductName": [ { "exists": true } ]
Does not exist | ProductName does not exist | "ProductName": [ { "exists": false } ]
Begins with | Region is in the US | "Region": [ { "prefix": "us-" } ]

Filtering events in a custom application

In this example, a bank runs software on a network of ATMs that forwards transactional information to EventBridge. This software sends all events to EventBridge, but downstream systems only want to receive a subset of ATM events:

ATM example application

The events from the ATMs have the following structure:

      {
        "Source": "custom.myATMapp",
        "EventBusName": "default",
        "DetailType": "transaction",
        "Time": "Wed Jan 29 2020 08:03:18 GMT-0500",
        "Detail": {
          "action": "withdrawal",
          "location": "NY-NYC-001",
          "amount": 300,
          "result": "approved",
          "transactionId": "123456",
          "cardPresent": true,
          "partnerBank": "Example Bank",
          "remainingFunds": 722.34
        }
      }

The downstream services can use the event patterns in EventBridge rules to ensure that they only receive specific events.

1. Transactions where the amount is over $300

The following event pattern filters for ATM transactions over $300.

{
  "source": [ "custom.myATMapp" ],
  "detail-type": [ "transaction" ],
  "detail": {
    "amount": [ { "numeric": [ ">", 300 ] } ]
  }
}

2. All ATMs in New York City

The ATM location attribute uses the format state-city-id, so NY-NYC-001 indicates that the machine is located in New York City in New York state. To filter events from only ATMs in the New York City area, I use a prefix in the filter:

{
  "source": [ "custom.myATMapp" ],
  "detail-type": [ "transaction" ],
  "detail": {
    "location": [ { "prefix": "NY-NYC-" } ]
  }
}

3. ATM customers using a third-party bank account

To filter for transactions that show a partnerBank attribute, the following event pattern checks for the existence of this attribute:

{
  "source": [ "custom.myATMapp" ],
  "detail-type": [ "transaction" ],
  "detail": {
    "partnerBank": [ { "exists": true } ]
  }
}

4. Combined filter

I can combine filters in a single event pattern to create use-cases that are more complex. For example, this filters on approved transactions where no partnerBank attribute exists, reporting from any ATM with a location different to NY-NYC-002:

{
  "source": [ "custom.myATMapp" ],
  "detail-type": [ "transaction" ],
  "detail": {
    "result": [ "approved" ],
    "partnerBank": [ { "exists": false } ],
    "location": [ { "anything-but": "NY-NYC-002" } ]
  }
}

In each of these cases, EventBridge matches incoming events against the event patterns in these rules. If there is no match, it does not route the event. This eliminates custom code that otherwise exists to filter incoming events and terminate if necessary.
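To build intuition for how these patterns are evaluated, here is a simplified local matcher. It is a sketch of the matching rules described above, not EventBridge's implementation, and it covers only exact values, exists, prefix, anything-but, and numeric. Patterns are matched against the delivered event, whose envelope keys are lower case (source, detail-type, detail).

```python
import operator

NUMERIC_OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
               ">": operator.gt, ">=": operator.ge}


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _matches_value(rule, present, value):
    """Test one entry of a pattern array against a possibly absent field."""
    if not isinstance(rule, dict):
        return present and value == rule  # exact match, including null
    if "exists" in rule:
        return present == rule["exists"]
    if not present:
        return False
    if "prefix" in rule:
        return isinstance(value, str) and value.startswith(rule["prefix"])
    if "anything-but" in rule:
        excluded = rule["anything-but"]
        return value not in (excluded if isinstance(excluded, list) else [excluded])
    if "numeric" in rule:
        ops = rule["numeric"]  # for example [">", 10, "<=", 20]
        return _is_number(value) and all(
            NUMERIC_OPS[ops[i]](value, ops[i + 1]) for i in range(0, len(ops), 2))
    return False


def matches(pattern, event):
    """Return True when every field named in `pattern` matches `event`."""
    for key, rule in pattern.items():
        present = isinstance(event, dict) and key in event
        value = event[key] if present else None
        if isinstance(rule, dict):  # nested object: recurse into it
            if not matches(rule, value):
                return False
        elif not any(_matches_value(r, present, value) for r in rule):
            return False  # values inside one array are OR'd together
    return True
```

Applied to the sample event shown earlier (amount 300, location NY-NYC-001, partnerBank present), patterns 2 and 3 match. Pattern 1 does not, because 300 is not greater than 300, and pattern 4 does not, because the partnerBank attribute exists.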

Filtering AWS events to create a custom S3-to-Lambda integration

EventBridge uses a variety of AWS services as native event sources. For other AWS services, such as Amazon S3, it consumes events via AWS CloudTrail. You must first enable CloudTrail logging for the service you want to use with EventBridge. Once enabled, you can filter on any of the attributes available in an AWS event. This allows you to create dynamic, flexible integrations in your event-driven applications.

The standard S3-to-Lambda trigger allows developers to subscribe a Lambda function to an event on a single bucket. Although these events can filter on prefixes and suffixes of object keys in S3, you cannot use multiple configurations that overlap. Beyond the prefix and suffix of the key name, you cannot filter further on any other attributes of the event before invoking the Lambda function. To examine the S3 event further, you must do this within the code in the function itself.

Using EventBridge, you can configure a rule between one or more S3 buckets, and one or more Lambda functions, based upon any of the attributes available. This enables you to create much more granular filters for routing events to downstream consumers. Using a declarative approach results in greater flexibility and less custom code. In this section, I show four use-cases where this could be useful.
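Because these S3 events arrive via CloudTrail, a target function receives the CloudTrail record inside the detail field rather than the native S3 notification format. A minimal sketch of a Python consumer, assuming an object-level call such as PutObject where requestParameters carries bucketName and key:

```python
def lambda_handler(event, context):
    """Log which bucket and key produced the CloudTrail event."""
    params = event["detail"]["requestParameters"]
    bucket, key = params["bucketName"], params["key"]
    print(f"{event['detail']['eventName']} on s3://{bucket}/{key}")
    return {"bucket": bucket, "key": key}
```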

S3 to EventBridge

(a) Invoking a single Lambda function from events in multiple buckets

This example uses multiple buckets with a common prefix in the bucket name (for example, buckets with the names “myApp-images”, “myApp-uploads”, and “myApp-archive”). You can use all these buckets as an event source to trigger the same Lambda function. This event pattern matches for all put events in those buckets:

{
  "source": [ "aws.s3" ],
  "detail-type": [ "AWS API Call via CloudTrail" ],
  "detail": {
    "eventSource": [ "s3.amazonaws.com" ],
    "eventName": [ "PutObject" ],
    "requestParameters": {
      "bucketName": [ { "prefix": "myApp-" } ]
    }
  }
}

(b) Invoking multiple consumers as targets

EventBridge allows up to five targets per rule, so you can specify up to five separate Lambda functions to receive the event. All five functions are invoked in parallel when the event pattern matches. To use this, add the targets in the rule – no changes to the event pattern are required.

If you need more than five targets, use Amazon Simple Notification Service (SNS). You can define an SNS topic as the EventBridge rule target, and then fan out from SNS to a much larger number of subscribers.

{
  "source": [ "aws.s3" ],
  "detail-type": [ "AWS API Call via CloudTrail" ],
  "detail": {
    "eventSource": [ "s3.amazonaws.com" ],
    "eventName": [ "GetObject" ],
    "userAgent": [ "userAgent" ],
    "requestParameters": {
      "bucketName": [ "mybucket" ]
    }
  }
}


The new content filtering syntax in EventBridge enables precise filtering of events using comparison operators and ranges of values. This allows you to filter declaratively at the event bus rather than filtering downstream using custom code. For custom applications, like the ATM example, it enables you to build precise rules for specific use-cases, reducing the number of calls to targets.

This approach enables you to route events more precisely based upon any of the attributes reported in an event. This makes it easier to handle complex routing at the EventBridge level and reduces the need for custom code across your application.

To learn more about content filtering, see the Amazon EventBridge documentation.

Binge-Watch Live This is My Architecture Videos from AWS re:Invent

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/binge-watch-live-videos-from-aws-reinvent-2019/

AWS re:Invent 2019 was a whirlwind of activity, especially in the Expo Hall, where the AWS team spent four days filming 12 live This is My Architecture videos for Twitch. Watch one a day for the next two weeks…or eat them all in one sitting. Whichever you do, you’re guaranteed to learn something new.


Discover security and operational excellence in healthcare with Accolade.

AWS Solution Builders

Get multi-region Availability with Amazon DynamoDB, Amazon S3, and Amazon Cognito.


EcoFit offers responsive, AWS Lambda-based microservices at scale.


Splunk explains data at scale by decoupling compute and storage.


Crownpeak uses AWS Lambda for its decoupled content deployment architecture.

Formula One Group

Learn how Formula One Group is using Amazon SageMaker to deliver real-time insights to fans.


Adobe is simplifying networking across thousands of AWS accounts with AWS Transit Gateway.

The Trade Desk

The Trade Desk offers real-time ad bidding in the cloud with AWS Global Accelerator.

Mueller Water Products

Learn all about scalable ingestion of sensor data for municipal water conservation with Mueller Water Products.


NextRoll is driving OpEx efficiency for ad bidding engines.

Pason Systems

Explore petabyte-scale drilling datamart on AWS with Pason Systems.


Application vending machine with runtime event control at UltraServe.


Be sure to visit the AWS channel on Twitch for more in-depth videos and interviews.

ICYMI: Serverless Q4 2019

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2019/

Welcome to the eighth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

The three months comprising the fourth quarter of 2019

AWS re:Invent

AWS re:Invent 2019

re:Invent 2019 dominated the fourth quarter at AWS. The serverless team presented a number of talks, workshops, and builder sessions to help customers increase their skills and deliver value more rapidly to their own customers.

Serverless talks from re:Invent 2019

Chris Munns presenting 'Building microservices with AWS Lambda' at re:Invent 2019

We presented dozens of sessions showing how customers can improve their architecture and agility with serverless. Here are some of the most popular.



You can also find decks for many of the serverless presentations and other re:Invent presentations on our AWS Events Content.

AWS Lambda

For developers needing greater control over performance of their serverless applications at any scale, AWS Lambda announced Provisioned Concurrency at re:Invent. This feature enables Lambda functions to execute with consistent start-up latency making them ideal for building latency sensitive applications.

As shown in the graph below, provisioned concurrency reduces tail latency, directly impacting response times and providing a more responsive end user experience.

Graph showing performance enhancements with AWS Lambda Provisioned Concurrency

Lambda rolled out enhanced VPC networking to 14 additional Regions around the world. This change brings dramatic improvements to startup performance for Lambda functions running in VPCs due to more efficient usage of elastic network interfaces.

Illustration of AWS Lambda VPC to VPC NAT

New VPC to VPC NAT for Lambda functions

Lambda now supports three additional runtimes: Node.js 12, Java 11, and Python 3.8. Each of these new runtimes has new version-specific features and benefits, which are covered in the linked release posts. Like the Node.js 10 runtime, these new runtimes are all based on an Amazon Linux 2 execution environment.

Lambda released a number of controls for both stream and async-based invocations:

  • You can now configure error handling for Lambda functions consuming events from Amazon Kinesis Data Streams or Amazon DynamoDB Streams. It’s now possible to limit the retry count, limit the age of records being retried, configure a failure destination, or split a batch to isolate a problem record. These capabilities help you deal with potential “poison pill” records that would previously cause streams to pause in processing.
  • For asynchronous Lambda invocations, you can now set the maximum event age and retry attempts on the event. If either configured condition is met, the event can be routed to a dead letter queue (DLQ), Lambda destination, or it can be discarded.

AWS Lambda Destinations is a new feature that allows developers to designate an asynchronous target for Lambda function invocation results. You can set separate destinations for success and failure. This unlocks new patterns for distributed event-based applications and can replace custom code previously used to manage routing results.

Illustration depicting AWS Lambda Destinations with success and failure configurations

Lambda Destinations
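As a rough sketch of how the asynchronous controls and destinations fit together, the helper below builds the request parameters for Lambda's PutFunctionEventInvokeConfig operation. The helper name, the function name, and the ARNs are placeholders invented for illustration, and the default values are assumptions. The sketch only builds a dictionary and does not call AWS.

```python
def build_async_invoke_config(
    function_name: str,
    on_success_arn: str,
    on_failure_arn: str,
    max_event_age_seconds: int = 3600,
    max_retry_attempts: int = 1,
) -> dict:
    """Build keyword arguments for the PutFunctionEventInvokeConfig operation."""
    # The service accepts 0 to 2 retries and an event age of 60 to 21600 seconds.
    if not 0 <= max_retry_attempts <= 2:
        raise ValueError("max_retry_attempts must be between 0 and 2")
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("max_event_age_seconds must be between 60 and 21600")
    return {
        "FunctionName": function_name,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
        "MaximumRetryAttempts": max_retry_attempts,
        "DestinationConfig": {
            "OnSuccess": {"Destination": on_success_arn},
            "OnFailure": {"Destination": on_failure_arn},
        },
    }


# Placeholder names and ARNs, for illustration only.
config = build_async_invoke_config(
    function_name="order-processor",
    on_success_arn="arn:aws:sqs:us-east-1:123456789012:orders-succeeded",
    on_failure_arn="arn:aws:sqs:us-east-1:123456789012:orders-failed",
)
print(config)
```

With an SDK such as boto3, a dictionary like this could be passed as keyword arguments to the Lambda client's put_function_event_invoke_config call.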

Lambda also now supports setting a Parallelization Factor, which allows you to set multiple Lambda invocations per shard for Kinesis Data Streams and DynamoDB Streams. This enables faster processing without the need to increase your shard count, while still guaranteeing the order of records processed.

Illustration of multiple AWS Lambda invocations per Kinesis Data Streams shard

Lambda Parallelization Factor diagram
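One way to see how several concurrent invocations per shard can still preserve order is to route every record with the same partition key to the same processing lane. The sketch below is an illustration only: the hashing scheme and the names assign_lane and split_shard_batch are assumptions, not a description of how Lambda assigns records internally.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def assign_lane(partition_key: str, parallelization_factor: int) -> int:
    """Map a partition key to a lane with a stable hash (illustrative scheme)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % parallelization_factor


def split_shard_batch(records: List[dict], parallelization_factor: int) -> Dict[int, List[dict]]:
    """Split one shard's records into lanes, keeping arrival order inside each lane."""
    lanes: Dict[int, List[dict]] = defaultdict(list)
    for record in records:
        lanes[assign_lane(record["key"], parallelization_factor)].append(record)
    return dict(lanes)


# Hypothetical records from a single shard, in arrival order.
records = [
    {"key": "user-1", "seq": 1},
    {"key": "user-2", "seq": 2},
    {"key": "user-1", "seq": 3},
    {"key": "user-3", "seq": 4},
    {"key": "user-2", "seq": 5},
]
lanes = split_shard_batch(records, parallelization_factor=3)
for lane, lane_records in sorted(lanes.items()):
    print(lane, lane_records)
```

Because a key always maps to the same lane, records for that key are handled in sequence even though different lanes run in parallel.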

Lambda introduced Amazon SQS FIFO queues as an event source. “First in, first out” (FIFO) queues guarantee the order of record processing, unlike standard queues. FIFO queues support message grouping via the MessageGroupId attribute, which allows parallel Lambda consumers of a single FIFO queue and enables high-throughput record processing by Lambda.

Lambda now supports Environment Variables in the AWS China (Beijing) Region and the AWS China (Ningxia) Region.

You can now view percentile statistics for the duration metric of your Lambda functions. Percentile statistics show the relative standing of a value in a dataset, and are useful when applied to metrics that exhibit large variances. They can help you understand the distribution of a metric, discover outliers, and find hard-to-spot situations that affect customer experience for a subset of your users.
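To illustrate why percentiles are more revealing than an average for a metric like duration, here is a small sketch using the nearest-rank method. The sample durations are made up, and CloudWatch may compute percentiles differently, so treat this as a conceptual example only.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Return the p-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical invocation durations in milliseconds, with one slow outlier.
durations_ms = [12, 14, 15, 13, 16, 14, 15, 13, 12, 950]

average = sum(durations_ms) / len(durations_ms)
p50 = percentile(durations_ms, 50)
p90 = percentile(durations_ms, 90)
p99 = percentile(durations_ms, 99)
print(average, p50, p90, p99)
```

Here the average is pulled up by a single slow invocation, while the median stays near the typical duration and the 99th percentile exposes the outlier.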

Amazon API Gateway

Screen capture of creating an Amazon API Gateway HTTP API in the AWS Management Console

Amazon API Gateway announced the preview of HTTP APIs. In addition to significant performance improvements, most customers see an average cost savings of 70% when compared with API Gateway REST APIs. With HTTP APIs, you can create an API in four simple steps. Once the API is created, additional configuration for CORS and JWT authorizers can be added.


AWS SAM CLI

Screen capture of the new 'sam deploy' process in a terminal window

The AWS SAM CLI team simplified the bucket management and deployment process in the SAM CLI. You no longer need to manage a bucket for deployment artifacts – SAM CLI handles this for you. The deployment process has also been streamlined from multiple flagged commands to a single command, sam deploy.

AWS Step Functions

One powerful feature of AWS Step Functions is its ability to integrate directly with AWS services without you needing to write complicated application code. In Q4, Step Functions expanded its integration with Amazon SageMaker to simplify machine learning workflows. Step Functions also added a new integration with Amazon EMR, making EMR big data processing workflows faster to build and easier to monitor.

Screen capture of an AWS Step Functions step with Amazon EMR

Step Functions step with EMR

Step Functions now provides the ability to track state transition usage by integrating with AWS Budgets, allowing you to monitor trends and react to usage on your AWS account.

You can now view CloudWatch Metrics for Step Functions at a one-minute frequency. This makes it easier to set up detailed monitoring for your workflows. You can use one-minute metrics to set up CloudWatch Alarms based on your Step Functions API usage, Lambda functions, service integrations, and execution details.

Step Functions now supports higher throughput workflows, making it easier to coordinate applications with high event rates. This increases the limits to 1,500 state transitions per second and a default start rate of 300 state machine executions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). See the announcement to learn more about the limit increases in other Regions.

Screen capture of choosing Express Workflows in the AWS Management Console

Step Functions released AWS Step Functions Express Workflows. With the ability to support event rates greater than 100,000 per second, this feature is designed for high-performance workloads at a reduced cost.

Amazon EventBridge

Illustration of the Amazon EventBridge schema registry and discovery service

Amazon EventBridge announced the preview of the Amazon EventBridge schema registry and discovery service. This service allows developers to automate the discovery and cataloging of event schemas for use in their applications. Additionally, once a schema is stored in the registry, you can generate and download a code binding that represents the schema as an object in your code.

Amazon SNS

Amazon SNS now supports the use of dead letter queues (DLQ) to help capture unhandled events. By enabling a DLQ, you can catch events that are not processed, then resubmit them or analyze them to locate processing issues.

Amazon CloudWatch

Amazon CloudWatch announced Amazon CloudWatch ServiceLens to provide a “single pane of glass” to observe health, performance, and availability of your application.

Screenshot of Amazon CloudWatch ServiceLens in the AWS Management Console

CloudWatch ServiceLens

CloudWatch also announced a preview of a capability called Synthetics. CloudWatch Synthetics allows you to test your application endpoints and URLs using configurable scripts that mimic what a real customer would do. This gives you an outside-in view of your customers’ experiences and of your service’s availability from their point of view.

CloudWatch introduced Embedded Metric Format, which helps you ingest complex high-cardinality application data as logs and easily generate actionable metrics. You can publish these metrics from your Lambda function by using the PutLogEvents API or using an open source library for Node.js or Python applications.
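As a rough sketch of what an Embedded Metric Format record looks like, the helper below builds a single JSON log line with the "_aws" metadata block that tells CloudWatch which fields to extract as metrics. The helper name emf_log_line, the namespace, and the sample values are hypothetical. In a Lambda function, writing such a line to standard output is one way to get it into CloudWatch Logs.

```python
import json
import time
from typing import Dict


def emf_log_line(
    namespace: str,
    dimensions: Dict[str, str],
    metric_name: str,
    value: float,
    unit: str = "Milliseconds",
    **properties: str,
) -> str:
    """Serialize one metric, its dimensions, and extra properties as an EMF log line."""
    payload = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions)],  # dimension names only
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        metric_name: value,
    }
    payload.update(dimensions)   # dimension values live at the top level
    payload.update(properties)   # high-cardinality context, searchable in logs
    return json.dumps(payload)


# Hypothetical namespace, dimension, and property values.
line = emf_log_line(
    namespace="ExampleApp",
    dimensions={"Service": "checkout"},
    metric_name="ProcessingLatency",
    value=42,
    requestId="example-request-id",
)
print(line)
```

The extra properties, such as the request ID here, are not turned into metrics but stay attached to the log event, which is what makes high-cardinality data practical to record.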

Finally, CloudWatch announced a preview of Contributor Insights, a capability to identify who or what is impacting your system or application performance by identifying outliers or patterns in log data.


AWS X-Ray

AWS X-Ray announced trace maps, which enable you to map the end-to-end path of a single request. Identifiers show issues and how they affect other services in the request’s path. These can help you to identify and isolate service points that are causing degradation or failures.

X-Ray also announced support for Amazon CloudWatch Synthetics, currently in preview. CloudWatch Synthetics on X-Ray supports tracing canary scripts throughout the application, providing metrics on performance or application issues.

Screen capture of AWS X-Ray Service map in the AWS Management Console

X-Ray Service map with CloudWatch Synthetics

Amazon DynamoDB

Amazon DynamoDB announced support for customer-managed customer master keys (CMKs) to encrypt data in DynamoDB. This lets you bring your own key (BYOK), giving you full control over how you encrypt and manage the security of your DynamoDB data.

It is now possible to add global replicas to existing DynamoDB tables to provide enhanced availability across the globe.

Another new DynamoDB capability to identify frequently accessed keys and database traffic trends is currently in preview. With this, you can now more easily identify “hot keys” and understand usage of your DynamoDB tables.

Screen capture of Amazon CloudWatch Contributor Insights for DynamoDB in the AWS Management Console

CloudWatch Contributor Insights for DynamoDB

DynamoDB also released adaptive capacity. Adaptive capacity helps you handle imbalanced workloads by automatically isolating frequently accessed items and shifting data across partitions to rebalance them. This helps reduce cost by enabling you to provision throughput for a more balanced workload instead of over-provisioning for uneven data access patterns.

Amazon RDS

Amazon Relational Database Service (RDS) announced a preview of Amazon RDS Proxy to help developers manage RDS connection strings for serverless applications.

Illustration of Amazon RDS Proxy

The RDS Proxy maintains a pool of established connections to your RDS database instances. This pool enables you to support a large number of application connections so your application can scale without compromising performance. It also increases security by enabling IAM authentication for database access and enabling you to centrally manage database credentials using AWS Secrets Manager.

AWS Serverless Application Repository

The AWS Serverless Application Repository (SAR) now offers Verified Author badges. These badges enable consumers to quickly and reliably know who you are. The badge appears next to your name in the SAR and links to your GitHub profile.

Screen capture of SAR Verified developer badge in the AWS Management Console

SAR Verified developer badges

AWS Developer Tools

AWS CodeCommit launched the ability for you to enforce rule workflows for pull requests, making it easier to ensure that code has passed through specific rule requirements. You can now create an approval rule specifically for a pull request, or create approval rule templates to be applied to all future pull requests in a repository.

AWS CodeBuild added beta support for test reporting. With test reporting, you can now view the detailed results, trends, and history for tests executed on CodeBuild for any framework that supports the JUnit XML or Cucumber JSON test format.

Screen capture of AWS CodeBuild

CodeBuild test trends in the AWS Management Console

Amazon CodeGuru

AWS announced a preview of Amazon CodeGuru at re:Invent 2019. CodeGuru is a machine learning based service that makes code reviews more effective and aids developers in writing code that is more secure, performant, and consistent.

AWS Amplify and AWS AppSync

AWS Amplify added iOS and Android as supported platforms. Now developers can build iOS and Android applications using the Amplify Framework with the same category-based programming model that they use for JavaScript apps.

Screen capture of 'amplify init' for an iOS application in a terminal window

The Amplify team has also improved offline data access and synchronization by announcing Amplify DataStore. Developers can now create applications that allow users to continue to access and modify data, without an internet connection. Upon connection, the data synchronizes transparently with the cloud.

For a summary of Amplify and AppSync announcements before re:Invent, read: “A round up of the recent pre-re:Invent 2019 AWS Amplify Launches”.

Illustration of AWS AppSync integrations with other AWS services

Q4 serverless content

Blog posts




Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q4:



There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

AWS Serverless Application Repository (SAR) Apps

In this edition of ICYMI, we are introducing a section devoted to SAR apps written by the AWS Serverless Developer Advocacy team. You can run these applications and review their source code to learn more about serverless and to see examples of suggested practices.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. We’re also kicking off a fresh series of Tech Talks in 2020 with new content providing greater detail on everything new coming out of AWS for serverless application developers.

Throughout 2020, the AWS Serverless Developer Advocates are crossing the globe to tell you more about serverless, and to hear more about what you need. Follow this blog to keep up on new launches and announcements, best practices, and examples of serverless applications in action.

You can also follow all of us on Twitter to see the latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Moheeb Zara: @virgilvox
Ben Smith: @benjamin_l_s
Rob Sutter: @rts_rob
Julian Wood: @julian_wood

Happy coding!