All posts by Eric Johnson

Serverless ICYMI Q4 2023

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/serverless-icymi-q4-2023/

Welcome to the 24th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

2023 Q4 Calendar

ServerlessVideo

ServerlessVideo at re:Invent 2023

ServerlessVideo is a demo application built by the AWS Serverless Developer Advocacy team to stream live videos and perform advanced post-video processing. It uses several AWS services, including AWS Step Functions, Amazon EventBridge, AWS Lambda, Amazon ECS, and Amazon Bedrock, in a serverless architecture that makes it fast, flexible, and cost-effective. At its core is an event-driven design with loosely coupled microservices that respond to events routed by EventBridge. Step Functions orchestrates video processing across both Lambda and ECS to balance speed, scale, and cost. A flexible plugin-based architecture, built with Step Functions and EventBridge, integrates and manages multiple video processing workflows, including generative AI plugins.

ServerlessVideo allows broadcasters to stream video to thousands of viewers using Amazon IVS. When a broadcast ends, a Step Functions workflow triggers a set of configured plugins to process the video, generating transcriptions, validating content, and more. The application incorporates various microservices to support live streaming, on-demand playback, transcoding, transcription, and events. Learn more about the project and watch videos from re:Invent 2023 at video.serverlessland.com.

AWS Lambda

AWS Lambda enabled outbound IPv6 connections from VPC-connected Lambda functions, providing virtually unlimited scale by removing IPv4 address constraints.

The AWS Lambda and AWS SAM teams also added support for sharing test events across teams using AWS SAM CLI to improve collaboration when testing locally.

AWS Lambda introduced integration with AWS Application Composer, allowing users to view and export Lambda function configuration details for infrastructure as code (IaC) workflows.

AWS added advanced logging controls enabling adjustable JSON-formatted logs, custom log levels, and configurable CloudWatch log destinations for easier debugging. AWS enabled monitoring of errors and timeouts occurring during initialization and restore phases in CloudWatch Logs as well, making troubleshooting easier.

For Kafka event sources, AWS enabled failed event destinations to prevent functions stalling on failing batches by rerouting events to SQS, SNS, or S3. AWS also enhanced Lambda auto scaling for Kafka event sources in November to reach maximum throughput faster, reducing latency for workloads prone to large bursts of messages.

AWS launched support for Python 3.12 and Java 21 Lambda runtimes, providing updated libraries, smaller deployment sizes, and better AWS service integration. AWS also introduced a simplified console workflow to automate complex network configuration when connecting functions to Amazon RDS and RDS Proxy.

Additionally in December, AWS enabled faster individual Lambda function scaling allowing each function to rapidly absorb traffic spikes by scaling up to 1000 concurrent executions every 10 seconds.

Amazon ECS and AWS Fargate

In Q4 of 2023, AWS introduced several new capabilities across its serverless container services including Amazon ECS, AWS Fargate, AWS App Runner, and more. These features help improve application resilience, security, developer experience, and migration to modern containerized architectures.

In October, Amazon ECS enhanced its task scheduling to start healthy replacement tasks before terminating unhealthy ones during traffic spikes. This prevents a service from dropping below its desired capacity due to premature shutdowns. Additionally, App Runner launched support for IPv6 traffic via dual-stack endpoints to remove the need for address translation.

In November, AWS Fargate enabled ECS tasks to selectively use SOCI lazy loading for only large container images in a task instead of requiring it for all images. Amazon ECS also added idempotency support for task launches to prevent duplicate instances on retries. Amazon GuardDuty expanded threat detection to Amazon ECS and Fargate workloads which users can easily enable.

Also in November, the open source Finch container tool for macOS became generally available. Finch allows developers to build, run, and publish Linux containers locally. A new website provides tutorials and resources to help developers get started.

Finally in December, AWS Migration Hub Orchestrator added new capabilities for replatforming applications to Amazon ECS using guided workflows. App Runner also improved integration with Route 53 domains to automatically configure required records when associating custom domains.

AWS Step Functions

In Q4 2023, AWS Step Functions announced the redrive capability for Standard Workflows. This feature allows failed workflow executions to be redriven from the point of failure, skipping unnecessary steps and reducing costs. The redrive functionality provides an efficient way to handle errors that require longer investigation or external actions before resuming the workflow.
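For example, a failed execution can be redriven with a single API call. The following is a minimal boto3 sketch (the helper name and the execution ARN in the comment are illustrative, not from the launch post):

```python
def redrive_execution(sfn_client, execution_arn):
    """Resume a failed Standard Workflow execution from its point of failure.

    Assumes a boto3 Step Functions client recent enough to include the
    RedriveExecution API (boto3 releases from late 2023 onward).
    """
    response = sfn_client.redrive_execution(executionArn=execution_arn)
    return response["redriveDate"]

# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   redrive_execution(boto3.client("stepfunctions"),
#                     "arn:aws:states:us-east-1:123456789012:execution:order-flow:run-42")
```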

Step Functions also launched support for HTTPS endpoints in AWS Step Functions, enabling easier integration with external APIs and SaaS applications without needing custom code. Developers can now connect to third-party HTTP services directly within workflows. Additionally, AWS released a new test state capability that allows testing individual workflow states before full deployment. This feature helps accelerate development by making it faster and simpler to validate data mappings and permissions configurations.

AWS announced optimized integrations between AWS Step Functions and Amazon Bedrock for orchestrating generative AI workloads. Two new API actions were added specifically for invoking Bedrock models and training jobs from workflows. These integrations simplify building prompt chaining and other techniques to create complex AI applications with foundation models.

Finally, the Step Functions Workflow Studio is now integrated in the AWS Application Composer. This unified builder allows developers to design workflows and define application resources across the full project lifecycle within a single interface.

Amazon EventBridge

Amazon EventBridge announced support for new partner integrations with Adobe and Stripe. These integrations enable routing events from the Adobe and Stripe platforms to over 20 AWS services. This makes it easier to build event-driven architectures to handle common use cases.

Amazon SNS

In Q4, Amazon SNS added native in-place message archiving for FIFO topics to improve event stream durability by allowing retention policies and selective replay of messages without provisioning separate resources. Additional message filtering operators were also introduced including suffix matching, case-insensitive equality checks, and OR logic for matching across properties to simplify routing logic implementation for publishers and subscribers. Finally, delivery status logging was enabled through AWS CloudFormation.
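To illustrate the semantics of two of the new operators, here is a rough local approximation in Python. The authoritative filter evaluation happens inside SNS, and OR logic across properties uses a separate $or syntax not shown here; the policy contents below are made up for the example:

```python
def matches_string_conditions(conditions, value):
    """Local approximation of a few SNS filter-policy string operators:
    exact match, "suffix", and "equals-ignore-case"."""
    for cond in conditions:
        if isinstance(cond, str) and value == cond:
            return True
        if isinstance(cond, dict):
            if "suffix" in cond and value.endswith(cond["suffix"]):
                return True
            if "equals-ignore-case" in cond and value.lower() == cond["equals-ignore-case"].lower():
                return True
    return False

# Hypothetical policy: accept any *.created event, or "Order-Cancelled" in any casing
policy = {"event_type": [{"suffix": ".created"}, {"equals-ignore-case": "Order-Cancelled"}]}
```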

Amazon SQS

Amazon SQS introduced several major new capabilities that improve visibility, throughput, and message handling. SQS enabled AWS CloudTrail logging of key SQS APIs, giving customers greater visibility into SQS activity. SQS also significantly increased the throughput quota for the high throughput mode of FIFO queues in certain Regions, including boosted throughput in the Asia Pacific Regions. Finally, Amazon SQS added dead-letter queue redrive support, which allows you to redrive messages that failed processing and were sent to a dead-letter queue (DLQ).
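Programmatically, a DLQ redrive can be started with the StartMessageMoveTask API. A minimal boto3 sketch follows (the helper name and the queue ARN in the comment are hypothetical):

```python
def start_dlq_redrive(sqs_client, dlq_arn, max_per_second=None):
    """Start moving messages from a DLQ back to their source queue.

    Assumes a boto3 SQS client recent enough to include the
    StartMessageMoveTask API (available since mid-2023)."""
    kwargs = {"SourceArn": dlq_arn}
    if max_per_second is not None:
        kwargs["MaxNumberOfMessagesPerSecond"] = max_per_second
    return sqs_client.start_message_move_task(**kwargs)["TaskHandle"]

# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   start_dlq_redrive(boto3.client("sqs"),
#                     "arn:aws:sqs:us-east-1:123456789012:my-dlq")
```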

Serverless at AWS re:Invent

Serverless videos from re:Invent

Visit the Serverless Land YouTube channel to find a list of serverless and serverless container sessions from re:Invent 2023. Hear from experts like Chris Munns and Julian Wood in their popular session, Best practices for serverless developers, or Nathan Peck and Jessica Deen in Deploying multi-tenant SaaS applications on Amazon ECS and AWS Fargate.

EDA Day Nashville

The AWS Serverless Developer Advocacy team hosted an event-driven architecture (EDA) day conference on October 26, 2023 in Nashville, Tennessee. This inaugural GOTO EDA day convened over 200 attendees ranging from prominent EDA community members to AWS speakers and product managers. Attendees engaged in 13 sessions, two workshops, and panels covering EDA adoption best practices. The event built upon 2022 content by incorporating additional topics like messaging, containers, and machine learning. It also created opportunities for students and underrepresented groups in tech to participate. The full-day conference facilitated education, inspiration, and thoughtful discussion around event-driven architectural patterns and services on AWS.

Videos from EDA Day are now available on the Serverless Land YouTube channel.

Serverless blog posts

October

November

December

Serverless container blog posts

October

November

December

Serverless Office Hours

Serverless office hours: Q4 videos

October

November

December

Containers from the Couch

October

November

December

FooBar

October

November

December

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

And finally, visit the Serverless Land and Containers on AWS websites for all your serverless and serverless container needs.

Introducing support for read-only management events in Amazon EventBridge

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/introducing-support-for-read-only-management-events-in-amazon-eventbridge/

This post is written by Pawan Puthran, Principal Specialist TAM, Serverless and Heeki Park, Principal Solutions Architect, Serverless

Today, AWS is announcing support for read-only management events in Amazon EventBridge. This feature enables customers to build rich event-driven responses from any action taken on AWS infrastructure to detect security vulnerabilities or identify suspicious activity in near real-time. You can now gain insight into all activity across all your AWS accounts and respond to those events as is appropriate.

Overview

EventBridge is a serverless event bus used to decouple event producers and consumers. Event producers publish events onto an event bus, which then uses rules to determine where to send those events. The rules determine the downstream targets that receive and process the events, and EventBridge routes the events accordingly.

EventBridge allows customers to monitor, audit, and react, in near real-time, to changes in their AWS environments through events generated by AWS CloudTrail for AWS API calls. CloudTrail records actions taken by a user, role, or an AWS service as events in a trail. Events include actions taken in the AWS Management Console, AWS Command Line Interface (CLI), and AWS SDKs and APIs.

Previously, only mutating API calls were published from CloudTrail to EventBridge for control plane changes. Events for mutating API calls include those that create, update, or delete resources. Control plane changes are also referred to as management events. EventBridge now supports non-mutating, or read-only, API calls for management events at no additional cost. These include calls that list, get, or describe resources.

CloudTrail events in EventBridge enable you to build rich event-driven responses from any action taken on AWS infrastructure in real time. Previously, customers and partners often used a polling model to iterate over a batch of CloudTrail logs from Amazon S3 buckets to detect issues, making it slower to respond. The launch of read-only management events enables you to detect and remediate issues in near real-time and thus improve the overall security posture.

Enabling read-only management events

You can start receiving read-only management events if a CloudTrail trail is configured in your account and the event selector for that trail is configured with a ReadWriteType of either All or ReadOnly. This ensures that the read-only management events are logged in CloudTrail and are then passed to EventBridge.
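If you manage trails programmatically, the event selector can be set with the PutEventSelectors API. A minimal boto3 sketch (the helper name and trail name are illustrative):

```python
def enable_management_event_logging(cloudtrail_client, trail_name):
    """Configure a trail's event selector so management events, including
    read-only ones, are recorded and can flow through to EventBridge."""
    return cloudtrail_client.put_event_selectors(
        TrailName=trail_name,
        EventSelectors=[
            {
                "ReadWriteType": "All",  # or "ReadOnly" for read-only events only
                "IncludeManagementEvents": True,
            }
        ],
    )

# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   enable_management_event_logging(boto3.client("cloudtrail"), "my-trail")
```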

For example, you can receive an alert if a production account lists resources from an IP address outside of your VPC. Another example could be if an entity, such as a principal (AWS account root user, IAM role, or IAM user), calls the ListInstanceProfilesForRole or DescribeInstances APIs without any prior record of doing so. A malicious actor could use stolen credentials for conducting this reconnaissance to find more valuable credentials or determine the capabilities of the credentials they have.

To enable read-only management events with an EventBridge rule, in addition to the existing mutating events, use the new ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS state when creating the rule.

Resources:
  SampleMgmtRule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: 'Example for enabling read-only management events'
      State: ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS
      EventPattern:
        source:
          - aws.s3
        detail-type:
          - AWS API Call via CloudTrail
      Targets:
        - Arn: 'arn:aws:sns:us-east-1:123456789012:notificationTopic'
          Id: 'NotificationTarget'

You can also create a rule on an event bus to specify a particular API action by using the eventName attribute under the detail key:

aws events put-rule --name "SampleTestRule" \
--event-pattern '{"detail": {"eventName": ["ListBuckets"]}}' \
--state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS --region us-east-1

Common scenarios and use-cases

The following section describes a couple of the scenarios where you can set up EventBridge rules and take actions on non-mutating or read-only management events.

Detecting anomalous Secrets Manager GetSecretValue API Calls

Consider the security team of your organization that wants a notification whenever GetSecretValue API calls for AWS Secrets Manager are made through the CLI. They can then investigate if these calls are made by entities outside their organization or by an unauthorized user and take corrective actions to deny such requests.

When the application calls the GetSecretValue API to retrieve the secrets via a CLI, it generates an event like this:

{
    "version": "0",
    "id": "d3368cc1-e6d6-e4bf-e58e-030f03b6eae3",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.secretsmanager",
    "account": "111111111111",
    "time": "2023-11-08T19:58:38Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.08",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "AAAAAAAAAAAAA",
            "arn": "arn:aws:iam:: 111111111111:user/USERNAME"
            // ... additional detail fields
        },
        "eventTime": "2023-11-08T19:58:38Z",
        "eventSource": "secretsmanager.amazonaws.com",
        "eventName": "GetSecretValue",
        "awsRegion": "us-east-1",
        "userAgent": "aws-cli/2.13.15 Python/3.11.4 Darwin/22.6.0 exe/x86_64 prompt/off command/secretsmanager.get-secret-value"
        // ... additional detail fields
    }
}

You set the following event pattern on the rule to filter incoming events to specific consumers. This example also uses the recently launched wildcard filter for event matching.

{
    "source": [
        "aws.secretsmanager"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "GetSecretValue"
        ],
        "userAgent": [
            {
                "wildcard": "aws-cli/*"
            }
        ]
    }
}

You can create a rule matching a combination of these event properties. In this case, you are matching for aws.secretsmanager as source, AWS API Call via CloudTrail as detail-type, GetSecretValue as detail.eventName and wildcard pattern on detail.userAgent for aws-cli/*. You can filter detail.userAgent with a wildcard to catch events that come from a particular application or user.
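To get an intuition for the wildcard semantics, here is a rough local stand-in using Python's fnmatch. EventBridge performs the authoritative matching server-side, and its wildcard filter supports only the * character, whereas fnmatch also treats ? and [] specially:

```python
from fnmatch import fnmatchcase

def rough_wildcard_match(pattern, value):
    """Rough local approximation of EventBridge's wildcard filter.
    Only patterns that use * behave like the real filter; fnmatch's
    ? and [] behavior has no EventBridge equivalent."""
    return fnmatchcase(value, pattern)
```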

You can then route these events to a target like an Amazon CloudWatch Logs stream to record the change. You can also route them to Amazon SNS to get notified via email subscription. You can alternatively route them to an AWS Lambda function in which you perform custom business logic.

Creating an EventBridge rule for read-only management events

  1. Create a rule on the default event bus using the new state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS.
    aws events put-rule --name "monitor-secretsmanager" \
    --event-pattern '{"source": ["aws.secretsmanager"], "detail-type": ["AWS API Call via CloudTrail"], "detail": {"eventName": ["GetSecretValue"], "userAgent": [{ "wildcard": "aws-cli/*"} ]}}' \
    --state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS --region us-east-1

    Rule details

  2. Configure a target. In this case, the target service is CloudWatch Logs but you can configure any of the supported targets.
    aws events put-targets --rule monitor-secretsmanager --targets Id=1,Arn=arn:aws:logs:us-east-1:ACCOUNT_ID:log-group:/aws/events/getsecretvaluelogs --region us-east-1

    Target details

You can then use CloudWatch Logs Insights to search and analyze the log data using the CloudWatch Logs Insights query syntax, which lets you retrieve the user who performed these calls.

Identifying suspicious data exfiltration behavior

Consider the security or data perimeter team who wants to secure data residing in Amazon S3 buckets. The team requires notifications whenever API calls to list S3 buckets or to list S3 objects are made.

When a user or application calls the ListBuckets API to discover the available buckets, it generates the following CloudTrail event:

{
    "version": "0",
    "id": "345ca690-6510-85b2-ff02-090493a33cf1",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.s3",
    "account": "111111111111",
    "time": "2023-11-14T17:25:30Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.09",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "principal-identity-uuid",
            "arn": "arn:aws:iam::111111111111:user/exploited-user",
            "accountId": "111111111111",
            "accessKeyId": "AAAABBBBCCCCDDDDEEEE",
            "userName": "exploited-user"
        },
        "eventTime": "2023-11-14T17:25:30Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "ListBuckets",
        "awsRegion": "us-east-1",
        "sourceIPAddress": "11.22.33.44",
        "userAgent": "[aws-cli/2.13.29 Python/3.11.6 Darwin/22.6.0 exe/x86_64 prompt/off command/s3api.list-buckets]",
        "requestParameters": {
            "Host": "s3.us-east-1.amazonaws.com"
        },
        "readOnly": true,
        "eventType": "AwsApiCall",
        "managementEvent": true
        // additional detail fields
    }
}

In this scenario, you can create an EventBridge rule matching for aws.s3 for the source field, and ListBuckets for the eventName.

{
    "source": [
        "aws.s3"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "ListBuckets"
        ]
    }
}

However, listing buckets alone might only be the beginning of a potential data exfiltration attempt. You may also want to watch for ListObjects or ListObjectsV2 as the next action, followed by a large number of GetObject API calls. You can create the following rule to match those actions.

{
    "source": [
        "aws.s3"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "ListObjects",
            "ListObjectsV2",
            "GetObject"
        ]
    }
}

You could forward this log information to your central security logging solution, or use anomaly detection machine learning models to evaluate these events and determine the appropriate response.
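As a toy illustration of the "large number of GetObject calls" heuristic, a simple sliding-window counter can flag bursts. A real deployment would rely on GuardDuty or a proper anomaly detection pipeline; the class name and thresholds here are arbitrary:

```python
from collections import deque

class BurstDetector:
    """Toy sliding-window counter: flags when more than `threshold`
    events arrive within `window_seconds` of each other."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.times = deque()

    def observe(self, event_time):
        """Record one event timestamp (seconds) and report whether the
        count inside the window now exceeds the threshold."""
        self.times.append(event_time)
        # Drop events that have aged out of the window.
        while self.times and event_time - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.threshold
```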

Configuring cross-account and cross-Region event routing

You can also create rules that route read-only events across accounts or Regions to centralize your AWS events into one Region or one AWS account for auditing and monitoring purposes. For example, capture all workload events from multiple Regions in eu-west-1 for compliance reporting.

Cross Account example for default event bus and custom event bus

To do this, create a rule using the new ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS state on the default bus of the source account or the Region, targeting either default event bus or a custom event bus of the target account or Region. You must also ensure you have a rule configured with ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS to be able to invoke the targets in the destination account or Region.

Cross-Region setup for CloudTrail read-only events

Conclusion

This blog shows how customers can build rich event-driven responses with the newly launched support for read-only events. You can now observe events as potential signals of reconnaissance and data exfiltration activities from any action taken on AWS infrastructure in near real time. You can also use the cross-Region and cross-account functionality to deliver the read-only events to a centralized AWS account or Region, enhancing the capability for auditing and monitoring across all your AWS environments.

For more serverless learning resources, visit Serverless Land.

AWS SAM support for HashiCorp Terraform now generally available

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/aws-sam-support-for-hashicorp-terraform-now-generally-available/

In November 2022, AWS announced the public preview of AWS Serverless Application Model (AWS SAM) support for HashiCorp Terraform. The public preview introduced a subset of features to help Terraform users test serverless applications locally. Today, AWS is announcing the general availability of Terraform support in AWS SAM. This GA release expands AWS SAM’s feature set to enhance the local development of serverless applications.

Terraform and AWS SAM are both open-source frameworks allowing developers to define infrastructure as code (IaC). Developers can version and share infrastructure definitions in the same way they share code. However, because AWS SAM is specifically designed for serverless, it includes a command line interface (CLI) designed for serverless development. The CLI enables developers to create, debug, and deploy serverless applications using local emulators along with build and deployment tools. In this release, AWS SAM is making a subset of those tools available to Terraform users as well.

Terraform support

The public preview blog demonstrated the initial support for Terraform. This blog demonstrates AWS SAM’s expanded feature set for local development. The blog also simplifies the implementation by using the Serverless.tf modules for AWS Lambda functions and layers rather than the native Terraform resources.

Modules can build the deployment artifacts for the Lambda functions and layers. Additionally, the module automatically generates the metadata required by AWS SAM to interface with the Terraform resources. To use the native Terraform resources, refer to the preview blog for metadata configuration.

Downloading the code

To explore AWS SAM’s support for Terraform, visit the aws-sam-terraform-examples repository. Clone the repository and change to the ga directory to get started:

git clone https://github.com/aws-samples/aws-sam-terraform-examples

cd ga

In this directory, there are two demo applications. The applications are identical, except that api_gateway_v1 uses an Amazon API Gateway REST API (v1) and api_gateway_v2 uses an Amazon API Gateway HTTP API (v2). Choose one and change to the tf-resources folder in that directory.

cd api_gateway_v1/tf-resources

Unless indicated otherwise, examples in this post reference the api_gateway_v1 application.

Code structure

Code structure diagram

Terraform supports spreading IaC across multiple files. Because of this, developers often collect all the Terraform files in a single directory and keep the application source files elsewhere. The example applications are configured this way.

Any Terraform or AWS SAM command must run from the location of the main.tf file, in this case, the tf-resources directory. Because AWS SAM commands are generally run from the project root, AWS SAM has a command to support nested structures. If running the sam build command from a nested folder, pass the --terraform-project-root-path flag with a relative or absolute path to the root of the project.

Local invoke

The preview release supported local invocation, but the team has simplified the experience with support for Serverless.tf. The demonstration applications have two functions: a Responder function is the backend integration for the API Gateway endpoints, and an Auth function is a custom authorizer. Find both module definitions in the functions.tf file.

Responder function

module "lambda_function_responder" {
  source        = "terraform-aws-modules/lambda/aws"
  version       = "~> 6.0"
  timeout       = 300
  source_path   = "${path.module}/src/responder/"
  function_name = "responder"
  handler       = "app.open_handler"
  runtime       = "python3.9"
  create_sam_metadata = true
  publish       = true
  allowed_triggers = {
    APIGatewayAny = {
      service    = "apigateway"
      source_arn = "${aws_api_gateway_rest_api.api.execution_arn}/*/*"
    }
  }
}

There are two important parameters:

  • source_path, which points to a local folder. Because this is not a zip file, Serverless.tf builds the artifacts as needed.
  • create_sam_metadata, which generates the metadata required for AWS SAM to locate the necessary files and modules.

To invoke the function locally, run the following commands:

  1. Run sam build to run any build scripts:
    sam build --hook-name terraform --terraform-project-root-path ../
  2. Run sam local invoke to invoke the desired Lambda function:
    sam local invoke --hook-name terraform --terraform-project-root-path ../ 'module.lambda_function_responder.aws_lambda_function.this[0]'

Because the project is Terraform, the --hook-name parameter with the value terraform is required to let AWS SAM know how to proceed. The function name is a combination of the module name and the resource type that it becomes. If you are unsure of the name, run the command without it:

sam local invoke --hook-name terraform

AWS SAM evaluates the template. If there is only one function, AWS SAM invokes it. If there is more than one, as is the case here, AWS SAM asks which one to use and provides a list of options.

Example error text

Auth function

The authorizer function requires some input data as a mock event. To generate a mock event for the api_gateway_v1 project:

sam local generate-event apigateway authorizer

For the api_gateway_v2 project use:

sam local generate-event apigateway request-authorizer

The resulting events are different because API Gateway REST and HTTP APIs handle custom authorizers differently. In these examples, REST uses a standard token authorizer and returns an AWS Identity and Access Management (IAM) policy. The HTTP API example uses a simple pass or fail option.
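For reference, the two authorizer response shapes look roughly like this. These are the standard Lambda authorizer result formats; the handlers in the sample repository may differ in detail:

```python
def rest_token_authorizer_response(principal_id, effect, method_arn):
    """REST API (v1) token authorizer result: an IAM policy document."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,  # "Allow" or "Deny"
                    "Resource": method_arn,
                }
            ],
        },
    }


def http_simple_authorizer_response(authorized):
    """HTTP API (v2) authorizer result in simple-response mode."""
    return {"isAuthorized": authorized}
```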

Each of the examples already has the properly formatted event for testing included at events/auth.json. To invoke the Auth function, run the following:

sam local invoke --hook-name terraform 'module.lambda_function_auth.aws_lambda_function.this[0]' -e events/auth.json

There is no need to run the sam build command again because the application has not changed.

Local start-api

You can now emulate a local version of API Gateway with the generally available release. Each of these examples has two endpoints. One endpoint is open and a custom authorizer secures the other. Both return the same response:

{
  "message": "Hello TF World",
  "location": "ip address"
}

To start the local emulator, run the following:

sam local start-api --hook-name terraform

AWS SAM starts the emulator and exposes the two endpoints for local testing.

Open endpoint

Using curl, test the open endpoint:

curl --location http://localhost:3000/open

The local emulator processes the request and provides a response in the terminal window. The emulator also includes logs from the Lambda function.

Open endpoint example output

Auth endpoint

Test the secure endpoint and pass the extra required header, myheader:

curl -v --location http://localhost:3000/secure --header 'myheader: 123456789'

The endpoint returns an authorized response with the “Hello TF World” messaging. Try the endpoint again with an invalid header value:

curl --location http://localhost:3000/secure --header 'myheader: IamInvalid'

The endpoint returns an unauthenticated response.

Unauthenticated response

Parameters

There are several options when using AWS SAM with Terraform:

  • --hook-name: required for every command when working with Terraform. This informs AWS SAM that the project is a Terraform application.
  • --skip-prepare-infra: AWS SAM uses the terraform plan command to identify and process all the required artifacts. However, the command only needs to run when resources are added or modified. This option keeps AWS SAM from running the terraform plan command. If this flag is passed and a plan does not exist, AWS SAM ignores the flag and runs the terraform plan command anyway.
  • --prepare-infra: forces AWS SAM to run the terraform plan command.
  • --terraform-project-root-path: overrides the current directory as the root of the project. You can use an absolute path (/path/to/project/root) or a relative path (../ or ../../).
  • --terraform-plan-file: allows a developer to specify a specific Terraform plan file, which also enables Terraform users working with a remotely generated plan to use it locally.

Combining these options can create long commands:

sam build --hook-name terraform --terraform-project-root-path ../

or

sam local invoke --hook-name terraform --skip-prepare-infra 'module.lambda_function_responder.aws_lambda_function.this[0]'

You can use the samconfig file to set defaults, shorten commands, and optimize the development process. Using the new samconfig YAML support, the file looks like this:

version: 0.1
default:
  global:
    parameters:
      hook_name: terraform
      skip_prepare_infra: true
  build:
    parameters:
      terraform_project_root_path: ../

By setting these defaults, the command is now shorter:

sam local invoke 'module.lambda_function_responder.aws_lambda_function.this[0]'

AWS SAM now knows it is a Terraform project and skips the preparation task unless the Terraform plan is missing. If a plan refresh is required, add the --prepare-infra flag to override the default setting.

Deployment and remote debugging

The applications in these projects are regular Terraform applications. Deploy them as any other Terraform project.

terraform plan
terraform apply

Currently, AWS SAM Accelerate does not support Terraform projects. However, because Terraform deploys using the API method, serverless applications deploy quickly. Use a third-party file watcher and the terraform apply --auto-approve command to approximate this experience.

For logging, take advantage of the sam logs command. Refer to the deploy output of the projects for an example of tailing the logs for one or all of the resources.

HashiCorp Cloud Platform

HashiCorp Cloud Platform allows developers to run deployments using a centralized location to maintain security and state. When developers run builds in the cloud, a local plan file is not available for AWS SAM to use in local testing and debugging. However, developers can generate a plan in the cloud and use the plan locally for development. For instructions, refer to the documentation.

Conclusion

HashiCorp Terraform is a popular IaC framework for building applications in the AWS Cloud. AWS SAM is an IaC framework and the CLI is specifically designed to help developers build serverless applications.

This blog covers the new AWS SAM support for Terraform and how developers can use them together to maximize the development experience. The blog covers locally invoking a single function, emulating API Gateway endpoints locally, and testing a Lambda authorizer locally before deploying. Finally, the blog deploys the application and uses AWS SAM to monitor the deployed resources.

For more serverless learning resources, visit Serverless Land.

Testing AWS Lambda functions with AWS SAM remote invoke

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/testing-aws-lambda-functions-with-aws-sam-remote-invoke/

Developers are taking advantage of event-driven architecture (EDA) to build large distributed applications. To build these applications, developers are using managed services like AWS Lambda, AWS Step Functions, and Amazon EventBridge to handle compute, orchestration, and choreography. Since these services run in the cloud, developers are also looking for ways to test in the cloud. With this in mind, AWS SAM is adding a new feature to the AWS SAM CLI called remote invoke.

AWS SAM remote invoke enables developers to invoke a Lambda function in the AWS Cloud from their development environment. The feature has several options for identifying the Lambda function to invoke, the payload event, and the output type.

Using remote invoke

To test the remote invoke feature, there is a small AWS SAM application that comprises two AWS Lambda functions. The TranslateFunction takes a text string and translates it to the target language using the AI/ML service Amazon Translate. The StreamFunction generates data in a streaming format. To run these demonstrations, be sure to install the latest AWS SAM CLI.

To deploy the application, follow these steps:

  1. Clone the repository:
    git clone https://github.com/aws-samples/aws-sam-remote-invoke-example
  2. Change to the root directory of the repository:
    cd aws-sam-remote-invoke-example
  3. Build the AWS Lambda artifacts (use the --use-container option to ensure Python 3.10 and Node 18 are present. If these are both set up on your machine, you can omit this flag):
    sam build --use-container
  4. Deploy the application to your AWS account:
    sam deploy --guided
  5. Name the application “remote-test” and choose all defaults.

AWS SAM can now remotely invoke the Lambda functions deployed with this application. Use the following command to test the TranslateFunction:

sam remote invoke --stack-name remote-test --event '{"message":"I am testing the power of remote invocation", "target-language":"es"}' TranslateFunction

This is a quick way to test a small event. However, developers often deal with large complex payloads. The AWS SAM remote invoke function also allows an event to be passed as a file. Use the following command to test:

sam remote invoke --stack-name remote-test --event-file './events/translate-event.json' TranslateFunction

With either of these methods, AWS SAM returns the response from the Lambda function as if it were called from a service like Amazon API Gateway. However, AWS SAM also offers the ability to get the raw response as returned from the Python software development kit (SDK), boto3. This format provides additional information such as the version that you invoked, whether any retries were attempted, and more. To retrieve this output, run the invocation with the additional --output parameter set to json.

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --output json TranslateFunction
Full output from SDK
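Because the json output surfaces the raw boto3 Invoke API response, a small helper can pull out the useful fields. The sketch below is illustrative only; the sample response values are fabricated, not taken from the demo application:

```python
import json

def summarize_invoke_response(response):
    """Extract the most useful fields from a boto3 Lambda invoke() response.

    boto3 returns the Payload as a StreamingBody; this handles either
    raw bytes or a file-like object.
    """
    payload = response["Payload"]
    if hasattr(payload, "read"):
        payload = payload.read()
    return {
        "status": response.get("StatusCode"),
        "version": response.get("ExecutedVersion"),
        "error": response.get("FunctionError"),  # only present on failure
        "body": json.loads(payload) if payload else None,
    }

# Fabricated sample response for illustration:
sample = {
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST",
    "Payload": b'{"translation": "Estoy probando"}',
}
summary = summarize_invoke_response(sample)
```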

It is also possible to invoke Lambda functions that are not created in AWS SAM. Using the name of a Lambda function, AWS SAM can remotely invoke any Lambda function that you have permission to invoke. When you deployed the sample application, AWS SAM prints the name of the Lambda function in the console. Use the following command to print the output again:

sam list stack-outputs --stack-name remote-test

Using the output for the TranslateFunctionName, run:

sam remote invoke --event '{"message": "Testing direct access of the function", "target-language": "fr"}' <TranslateFunctionName>

Lambda recently added support for streaming responses from Lambda functions. Streaming functions do not wait until the entire response is available before they respond to the client. To show this, the StreamFunction generates multiple chunks of text and sends them over a period of time.

To invoke the function, run:

sam remote invoke --stack-name remote-test StreamFunction

Extending remote invoke

The AWS SDKs offer different options when invoking Lambda functions via the Lambda service. Behind the scenes, AWS SAM is using boto3 to power the remote invoke functionality. To make full use of the SDK options for Lambda function invocation, AWS SAM offers a --parameter flag that can be used multiple times.

For example, you may want to run an invocation as a dry run only. This type of invocation tests Lambda’s ability to invoke the function based on factors like variable values and proper permissions. The command looks like the following:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter InvocationType=DryRun --output json TranslateFunction

In a second example, I want to invoke a specific version of the Lambda function:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter Qualifier='$LATEST' TranslateFunction

If you need both options:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter InvocationType=DryRun --parameter Qualifier='$LATEST' --output json TranslateFunction
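Because remote invoke is built on boto3, the --parameter values map directly onto keyword arguments of the SDK's invoke() call. A minimal sketch of that mapping (the function name and event payload are assumptions for illustration):

```python
import json

def build_invoke_kwargs(function_name, event,
                        invocation_type="RequestResponse", qualifier=None):
    """Assemble keyword arguments for boto3's lambda_client.invoke().

    invocation_type mirrors --parameter InvocationType=... and qualifier
    mirrors --parameter Qualifier=... from the AWS SAM CLI commands above.
    """
    kwargs = {
        "FunctionName": function_name,
        "InvocationType": invocation_type,  # "DryRun" validates without executing
        "Payload": json.dumps(event).encode("utf-8"),
    }
    if qualifier is not None:
        kwargs["Qualifier"] = qualifier     # e.g. "$LATEST" or a version number
    return kwargs

# With AWS credentials configured, the actual call would be:
# import boto3
# response = boto3.client("lambda").invoke(**build_invoke_kwargs(
#     "TranslateFunction",
#     {"message": "I am testing", "target-language": "es"},
#     invocation_type="DryRun",
#     qualifier="$LATEST",
# ))
kwargs = build_invoke_kwargs("TranslateFunction", {"message": "hi"},
                             invocation_type="DryRun", qualifier="$LATEST")
```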

Logging

When developing distributed applications, logging is a critical tool to trace the state of a request across decoupled microservices. AWS SAM offers the sam logs functionality to help view aggregated logs and traces from Amazon CloudWatch and AWS X-Ray, respectively. However, when testing individual functions, developers want contextual logs pinpointed to a specific invocation. The new remote invoke function provides these logs by default. Returning to the TranslateFunction, run the following command again:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' TranslateFunction
Logging response from remote invoke

The remote invocation returns the response from the Lambda function, any logging from within the Lambda function, followed by the final report from the Lambda service about the invocation itself.

Combining remote invoke with AWS SAM Accelerate

Developers are constantly striving to remove complexity and friction and improve speed and agility in the development pipeline. To help serverless developers towards this goal, the AWS SAM team released a feature called AWS SAM Accelerate. AWS SAM Accelerate is a series of features that move debugging and testing from the local machine to the cloud.

To show how AWS SAM Accelerate and remote invoke can work together, follow these steps:

  1. In a separate terminal, start the AWS SAM sync process with the watch option:
    sam sync --stack-name remote-test --use-container --watch
  2. In a second window or tab, run the remote invoke function:
    sam remote invoke --stack-name remote-test --event-file './events/translate-event.json' TranslateFunction

The combination of these two options provides a robust auto-deployment and testing environment. During iterations of code in the Lambda function, each time you save the file, AWS SAM syncs the code and any dependencies to the cloud. As needed, the remote invoke is then run to verify the code works as expected, with logging provided for each execution.

Conclusion

Serverless developers are looking for the most efficient way to test their applications in the AWS Cloud. They want to invoke an AWS Lambda function quickly without having to mock security, external services, or other environment variables. This blog shows how to use the new AWS SAM remote invoke feature to do just that.

This post shows how to invoke the Lambda function, change the payload type and location, and change the output format. It explains using this feature in conjunction with the AWS SAM Accelerate features to streamline the serverless development and testing process.

For more serverless learning resources, visit Serverless Land.

Developing a serverless Slack app using AWS Step Functions and AWS Lambda

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/developing-a-serverless-slack-app-using-aws-step-functions-and-aws-lambda/

This blog was written by Sam Wilson, Cloud Application Architect and John Lopez, Cloud Application Architect.

Slack, as an enterprise collaboration and communication service, presents opportunities for builders to improve efficiency through implementing custom-written Slack Applications (apps). One such opportunity is to expose existing AWS resources to your organization without your employees needing AWS Management Console or AWS CLI access.

For example, a member of your data analytics team needs to trigger an AWS Step Functions workflow to reprocess a batch data job. Instead of granting the user direct access to the Step Functions workflow in the AWS Management Console or AWS CLI, you can provide access to invoke the workflow from within a designated Slack channel.

This blog covers how serverless architecture lets Slack users invoke AWS resources such as AWS Lambda functions and Step Functions via the Slack Desktop UI and Mobile UI using Slack apps. Serverless architecture is ideal for a Slack app because of its ability to scale. It can process thousands of concurrent requests for Slack users without the burden of managing operational overhead.

This example supports integration with other AWS resources via Step Functions. Visit the documentation for more information on integrations with other AWS resources.

This post explains the serverless example architecture, and walks through how to deploy the example in your AWS account. It demonstrates the example and discusses constraints discovered during development.

Overview

The code included in this post creates a Slack app built with a variety of AWS serverless services:

  • Amazon API Gateway receives all incoming requests from Slack. Step Functions coordinates request activities such as user validation, configuration retrieval, request routing, and response formatting.
  • A Lambda function invokes Slack-specific authentication functionality and sends responses to the Slack UI.
  • Amazon EventBridge serves as a pub-sub integration between a request and the request processor.
  • Amazon DynamoDB stores permissions for each Slack user to ensure they only have access to resources you designate.
  • AWS Systems Manager stores the specific Slack channel where you use the Slack app.
  • AWS Secrets Manager stores the Slack app signing secret and bot token used for authentication.

AWS Cloud Development Kit (AWS CDK) deploys the AWS resources. This example can plug into any existing CI/CD platform of your choice.

Architectural overview

Architectural overview

  1. The desktop or mobile Slack user starts requests by using /my-slack-bot slash command or by interacting with a Slack Block Kit UI element.
  2. API Gateway proxies the request and transforms the payload into a format that the request validator Step Functions workflow can accept.
  3. The request validator triggers the Slack authentication Lambda function to verify that the request originates from the configured Slack organization. This Lambda function uses the Slack Bolt library for TypeScript to perform request authentication and extract request details into a consistent payload. Also, Secrets Manager stores a signing secret, which the Slack Bolt API uses during authentication.
  4. The request validator queries the Authorized Users DynamoDB table with the username extracted from the request payload. If the user does not exist, the request ends with an unauthorized response.
  5. The request validator retrieves the permitted channel ID and compares it to the channel ID found in the request. If the two channel IDs do not match, the request ends with an unauthorized response.
  6. The request validator sends the request to the Command event bus in EventBridge.
  7. The Command event bus uses the request’s action property to route the request to a request processor Step Functions workflow.
  8. Each processor Step Functions workflow may build Slack Block Kit UI elements, send updates to existing UI elements, or invoke existing Lambda functions and Step Functions workflows.
  9. The Slack desktop or mobile application displays new UI elements or presents updates to an existing request as it is processed. Users may interact with new UI elements to further a request or start over with an additional request.
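Steps 4 and 5 above reduce to a small authorization decision. A hypothetical sketch of that check (the item shape and attribute names are assumptions, not the repository's actual code):

```python
def is_authorized(user_item, request_channel_id, permitted_channel_id):
    """Decide whether a Slack request may proceed.

    user_item is the Item returned by a DynamoDB GetItem on the
    Authorized Users table, or None when the username was not found.
    """
    if user_item is None:
        return False  # step 4: unknown user -> unauthorized response
    if request_channel_id != permitted_channel_id:
        return False  # step 5: wrong channel -> unauthorized response
    return True       # step 6: forward the request to the event bus
```

With boto3, user_item would come from table.get_item(Key={"username": ...}).get("Item") and permitted_channel_id from Systems Manager Parameter Store.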

This application architecture scales for production loads, providing capacity for processing thousands of concurrent Slack users (through the Slack platform itself), bypassing the need for direct AWS Management Console access. This architecture provides for easier extensibility of the chat application to support new commands as consumer needs arise.

Finally, the application’s architecture and implementation follow the AWS Well-Architected guidance for Operational Excellence, Security, Reliability, and Performance Efficiency.

Step Functions is suited for this example because the service supports integrations with many other AWS services. Step Functions allows this example to orchestrate interactions with Lambda functions, DynamoDB tables, and EventBridge event buses with minimal code.

This example takes advantage of Step Functions Express Workflows to support the high-volume, event-driven actions generated by the Slack app. The result of using Express Workflows is a responsive, scalable request processor capable of handling tens of thousands of requests per second. To learn more, review the differences between Standard and Express Workflows.

The presented example uses AWS Secrets Manager for storing and retrieving application credentials securely. AWS Secrets Manager provides the following benefits:

  • Central, secure storage of secrets via encryption-at-rest.
  • Ease of access management through AWS Identity and Access Management (IAM) permissions policies.
  • Out-of-the-box integration support with all AWS services comprising the architecture.

In addition, this example uses the AWS Systems Manager Parameter Store service for persisting our application’s configuration data for the Slack Channel ID. Among the benefits offered by AWS Systems Manager, this example takes advantage of storing encrypted configuration data with versioning support.

This example features a variety of interactions for the Slack Mobile or Desktop application, including displaying a welcome screen, entering form information, and reporting process status. EventBridge enables this example to route requests from the Request Validator Step Function using the serverless event bus and decouple each action from one another. We configure rules to associate each event with a request processor Step Function.

Walkthrough

As prerequisites, you need the following:

  • AWS CDK version 2.19.0 or later.
  • Node version 16+.
  • Docker-cli.
  • Git.
  • A personal or company Slack account with permissions to create applications.
  • The Slack Channel ID of a channel in your Workspace for integration with the Slack App.
Slack channel details

To get the Channel ID, open the context menu on the Slack channel and select View channel details. The modal displays your Channel ID at the bottom:

Slack bot channel details

To learn more, visit Slack’s Getting Started with Bolt for JavaScript page.

The README document for the GitHub Repository titled, Amazon Interactive Slack App Starter Kit, contains a comprehensive walkthrough, including detailed steps for:

  • Slack API configuration
  • Application deployment via AWS Cloud Development Kit (CDK)
  • Required updates for AWS Systems Manager Parameters and secrets

Demoing the Slack app

  1. Start the Slack app by invoking the /my-slack-bot slash command.

    Slack invocation

    Start sample Lambda invocation

  2. From the My Slack Bot action menu, choose Sample Lambda.

    Choosing sample Lambda

  3. Enter command input, choose Submit, then observe the response (this input value applies to the sample Lambda function).
    Sample Lambda submit

    Sample Lambda results

  4. Start the Slack App by invoking the /my-slack-bot slash command, then select Sample State Machine:
    Start sample state machine invocation

    Select Sample State Machine

  5. Enter command input, choose Submit, then observe the response (this input value applies to the downstream state machine)
    Sample state machine submit

    Sample state machine results

Constraints in this example

Slack has constraints for sending and receiving requests, but API Gateway’s mapping templates provide mechanisms for integrating with a variety of request constraints.

Slack uses application/x-www-form-urlencoded to send requests to a custom URL. We designed a request template to format the input from Slack into a consistent format for the Request Validator Step Function. Headers such as X-Slack-Signature and X-Slack-Request-Timestamp needed to be passed along to ensure the request from Slack was authentic.

Here is the request mapping template needed for the integration:

{
  "stateMachineArn": "arn:aws:states:us-east-1:827819197510:stateMachine:RequestValidatorB6FDBF18-FACc7f2PzNAv",
  "input": "{\"body\": \"$input.path('$')\", \"headers\": {\"X-Slack-Signature\": \"$input.params().header.get('X-Slack-Signature')\", \"X-Slack-Request-Timestamp\": \"$input.params().header.get('X-Slack-Request-Timestamp')\", \"Content-Type\": \"application/x-www-form-urlencoded\"}}"
}

Slack sends the message payload in two different formats: URL-encoded and JSON. Fortunately, the Slack Bolt for JavaScript library can merge the two request formats into a single JSON payload and handle request verification.
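For illustration, the verification that Bolt performs with the X-Slack-Signature and X-Slack-Request-Timestamp headers follows Slack's documented v0 signing scheme. A sketch of that check (the secret and payload values below are made up):

```python
import hashlib
import hmac

def verify_slack_signature(signing_secret, timestamp, body, signature):
    """Recompute Slack's v0 request signature and compare in constant time.

    This mirrors the check the Slack Bolt library performs; the values
    used below are illustrative only, not real credentials.
    """
    basestring = f"v0:{timestamp}:{body}".encode("utf-8")
    expected = "v0=" + hmac.new(
        signing_secret.encode("utf-8"), basestring, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

# Illustrative values:
secret = "8f742231b10e8888abcd99yyyzzz85a5"
ts = "1531420618"
body = "token=xyz&team_id=T1DC2JH3J&command=%2Fmy-slack-bot"
good_sig = "v0=" + hmac.new(secret.encode(), f"v0:{ts}:{body}".encode(),
                            hashlib.sha256).hexdigest()
```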

Slack requires a 204 status response along with an empty body to recognize that a request was successful. An integration response template overrides the Step Function response into a format that Slack accepts.

Here is the response mapping template needed for the integration:

#set($context.responseOverride.status = 204)
{}

Conclusion

In this blog post, you learned how to provide your organization’s Slack users with the ability to invoke your existing AWS resources without AWS Management Console or AWS CLI access. This serverless example lets you build your own custom workflows to meet your organization’s needs.

To learn more, explore the documentation for the services discussed in this blog: AWS Step Functions, Amazon EventBridge, Amazon DynamoDB, and the Slack Bolt library.

For more serverless learning resources, visit Serverless Land.

Implementing cross-account CI/CD with AWS SAM for container-based Lambda functions

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/implementing-cross-account-cicd-with-aws-sam-for-container-based-lambda/

This post is written by Chetan Makvana, Sr. Solutions Architect.

Customers use modular architectural patterns and serverless operational models to build more sustainable, scalable, and resilient applications in modern application development. AWS Lambda is a popular choice for building these applications.

If customers have invested in container tooling for their development workflows, they deploy to Lambda using the container image packaging format for workloads like machine learning inference or data-intensive workloads. Using functions deployed as container images, customers benefit from the same operational simplicity, automatic scaling, high availability, and native integration with many services.

Containerized applications often have several distinct environments and accounts, such as dev, test, and prod. An application has to go through a process of deployment and testing in these environments. One common pattern for deploying containerized applications is to have a central AWS account create a single container image and carry out deployment across the other AWS accounts. To achieve automated deployment of the application across different environments, customers use CI/CD pipelines with familiar container tooling.

This blog post explores how to use AWS Serverless Application Model (AWS SAM) Pipelines to create a CI/CD deployment pipeline and deploy a container-based Lambda function across multiple accounts.

Solution overview

This example comprises three accounts: tooling, test, and prod. The tooling account is a central account where you provision the pipeline and build the container. The pipeline deploys the container into Lambda in the test and prod accounts using AWS CodeBuild. It also requires the necessary resources in the test and prod accounts. These consist of an Identity and Access Management (IAM) role that trusts the tooling account and provides the required deployment-specific permissions. AWS CodeBuild assumes this IAM role in the tooling account to carry out deployment.

The solution uses AWS SAM Pipelines to create CI/CD deployment pipeline resources. It provides commands to generate the required AWS infrastructure resources and a pipeline configuration file that the CI/CD system can use to deploy using AWS SAM. Find the example code for this solution in the GitHub repository.

Full solution architecture

AWS CodePipeline goes through these steps to deploy the container-based Lambda function in the test and prod accounts:

  1. The developer commits the Lambda function code to AWS CodeCommit or another source control repository, which triggers the CI/CD workflow.
  2. AWS CodeBuild builds the code, creates a container image, and pushes the image to the Amazon Elastic Container Registry (ECR) repository using AWS SAM.
  3. AWS CodeBuild assumes a cross-account role for the test account.
  4. AWS CodeBuild uses AWS SAM to deploy the Lambda function by pulling the image from Amazon ECR.
  5. If the deployment is successful, AWS CodeBuild deploys the same image to the prod account using AWS SAM.

Deploying the example

Prerequisites

  • An AWS account. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
  • AWS CLI installed and configured.
  • Git installed.
  • AWS SAM installed.
  • Set up .aws/credentials named profiles for the tooling, test, and prod accounts so that you can run CLI and AWS SAM commands against them.
  • Set the TOOLS_ACCOUNT_ID, TEST_ACCOUNT_ID, and PROD_ACCOUNT_ID env variables:
    export TOOLS_ACCOUNT_ID=<Tooling Account Id>
    export TEST_ACCOUNT_ID=<Test Account Id>
    export PROD_ACCOUNT_ID=<Prod Account Id>

Creating a Git Repository and pushing the code

Run the following command in the tooling account from your terminal to create a new CodeCommit repository:

aws codecommit create-repository --repository-name lambda-container-repo --profile tooling

Initialize the Git repository and push the code.

cd ~/environment/cicd-lambda-container
git init -b main
git add .
git commit -m "Initial commit"
git remote add origin codecommit://lambda-container-repo
git push -u origin main

Creating cross-account roles in test and prod accounts

For the pipeline to gain access to the test and production environments, it must assume an IAM role. In a cross-account scenario, the IAM role for the pipeline must be created in the test and production accounts.

Change to the templates directory and run the following commands to deploy roles to the test and prod accounts using the respective named profiles.

Test Profile

cd ~/environment/cicd-lambda-container/templates
aws cloudformation deploy --template-file crossaccount_pipeline_roles.yml --stack-name codepipeline-crossaccount-roles --capabilities CAPABILITY_NAMED_IAM --profile test --parameter-overrides ToolAccountID=${TOOLS_ACCOUNT_ID}
aws cloudformation describe-stacks --stack-name codepipeline-crossaccount-roles --query "Stacks[0].Outputs" --output json --profile test

Open the codepipeline_parameters.json file from the root directory. Replace the value of TestCodePipelineCrossAccountRoleArn and TestCloudFormationCrossAccountRoleArn with the CloudFormation output value of CodePipelineCrossAccountRole and CloudFormationCrossAccountRole respectively.

Prod Profile

aws cloudformation deploy --template-file crossaccount_pipeline_roles.yml --stack-name codepipeline-crossaccount-roles --capabilities CAPABILITY_NAMED_IAM --profile prod --parameter-overrides ToolAccountID=${TOOLS_ACCOUNT_ID}
aws cloudformation describe-stacks --stack-name codepipeline-crossaccount-roles --query "Stacks[0].Outputs" --output json --profile prod

Open the codepipeline_parameters.json file from the root directory. Replace the value of ProdCodePipelineCrossAccountRoleArn and ProdCloudFormationCrossAccountRoleArn with the CloudFormation output value of CodePipelineCrossAccountRole and CloudFormationCrossAccountRole respectively.

Creating the required IAM roles and infrastructure in the tooling account

Change to the templates directory and run the following command using tooling named profile:

aws cloudformation deploy --template-file tooling_resources.yml --stack-name tooling-resources --capabilities CAPABILITY_NAMED_IAM --parameter-overrides TestAccountID=${TEST_ACCOUNT_ID} ProdAccountID=${PROD_ACCOUNT_ID} --profile tooling
aws cloudformation describe-stacks --stack-name tooling-resources --query "Stacks[0].Outputs" --output json --profile tooling

Open the codepipeline_parameters.json file from the root directory. Replace value of ImageRepositoryURI, ArtifactsBucket, ToolingCodePipelineExecutionRoleArn, and ToolingCloudFormationExecutionRoleArn with the corresponding CloudFormation output value.

Updating cross-account IAM roles

The cross-account IAM roles on the test and production account require permission to access artifacts that contain application code (S3 bucket and ECR repository). Note that the cross-account roles are deployed twice. This is because there is a circular dependency on the roles in the test and prod accounts and the pipeline artifact resources provisioned in the tooling account.

The pipeline must reference and resolve the ARNs of the roles it needs to assume to deploy the application to the test and prod accounts, so the roles must be deployed before the pipeline is provisioned. However, the policies attached to the roles need to include the S3 bucket and ECR repository, and the S3 bucket and ECR repository don’t exist until the resources deploy in the preceding step. Deploying the roles twice resolves this: once without policies so that their ARNs resolve, and a second time to attach policies to the existing roles that reference the resources in the tooling account.

In the following commands, replace ImageRepositoryArn and ArtifactsBucketArn with the output values from the preceding step, and run them from the templates directory using the test and prod named profiles.

Test Profile

aws cloudformation deploy --template-file crossaccount_pipeline_roles.yml --stack-name codepipeline-crossaccount-roles --capabilities CAPABILITY_NAMED_IAM --profile test --parameter-overrides ToolAccountID=${TOOLS_ACCOUNT_ID} ImageRepositoryArn=<ImageRepositoryArn value> ArtifactsBucketArn=<ArtifactsBucketArn value>

Prod Profile

aws cloudformation deploy --template-file crossaccount_pipeline_roles.yml --stack-name codepipeline-crossaccount-roles --capabilities CAPABILITY_NAMED_IAM --profile prod --parameter-overrides ToolAccountID=${TOOLS_ACCOUNT_ID} ImageRepositoryArn=<ImageRepositoryArn value> ArtifactsBucketArn=<ArtifactsBucketArn value>

Deploying the pipeline

Replace DeploymentRegion value with the current Region and CodeCommitRepositoryName value with the CodeCommit repository name in codepipeline_parameters.json file.

Push the changes to CodeCommit repository using Git commands.

Replace CodeCommitRepositoryName value with the CodeCommit repository name created in the first step and run the following command from the root directory of the project using tooling named profile.

sam deploy -t codepipeline.yaml --stack-name cicd-lambda-container-pipeline --capabilities=CAPABILITY_IAM --parameter-overrides CodeCommitRepositoryName=<CodeCommit Repository Name> --profile tooling

Cleaning Up

  1. Run the following command in root directory of the project to delete the pipeline:
    sam delete --stack-name cicd-lambda-container-pipeline --profile tooling
  2. Empty the artifacts bucket. Replace the artifacts bucket name with the output value from the preceding step:
    aws s3 rm s3://<Artifacts bucket name> --recursive --profile tooling
  3. Delete the Lambda functions from the test and prod accounts:
    aws cloudformation delete-stack --stack-name lambda-container-app-test --profile test
    aws cloudformation delete-stack --stack-name lambda-container-app-prod --profile prod
  4. Delete cross-account roles from the test and prod accounts:
    aws cloudformation delete-stack --stack-name codepipeline-crossaccount-roles --profile test
    aws cloudformation delete-stack --stack-name codepipeline-crossaccount-roles --profile prod
  5. Delete the ECR repository:
    aws ecr delete-repository --repository-name image-repository --profile tooling --force
  6. Delete resources from the tooling account:
    aws cloudformation delete-stack --stack-name tooling-resources --profile tooling

Conclusion

This blog post discusses how to automate deployment of container-based Lambda across multiple accounts using AWS SAM Pipelines.

Navigate to the GitHub repository and review the implementation to see how CodePipeline pushes the container image to Amazon ECR and deploys the image to Lambda using a cross-account role. Examine the codepipeline.yml file to see how AWS SAM Pipelines creates CI/CD resources using this template.

For more serverless learning resources, visit Serverless Land.

Implementing error handling for AWS Lambda asynchronous invocations

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/implementing-error-handling-for-aws-lambda-asynchronous-invocations/

This blog is written by Poornima Chand, Senior Solutions Architect, Strategic Accounts, and Giedrius Praspaliauskas, Senior Solutions Architect, Serverless.

AWS Lambda functions allow both synchronous and asynchronous invocations, which have different function behaviors and error handling:

When you invoke a function synchronously, Lambda returns any unhandled errors in the function code back to the caller. The caller can then decide how to handle the errors. With asynchronous invocations, the caller does not wait for a response from the function code. It hands off the event to the Lambda service, which handles the rest of the processing.

As the caller does not have visibility of any downstream errors, error handling for asynchronous invocations can be more challenging and must be implemented at the Lambda service layer.

This post explains the error behaviors and approaches for handling errors in Lambda asynchronous invocations to build reliable serverless applications.

Overview

AWS services such as Amazon S3, Amazon SNS, and Amazon EventBridge invoke Lambda functions asynchronously. When you invoke a function asynchronously, the Lambda service places the event in an internal queue and returns a success response without additional information. A separate process reads the events from the queue and sends those to the function.

You can configure how a Lambda function handles errors either by implementing error handling within the function code or by using the error handling features provided by the Lambda service. The following diagram depicts the options for observing and handling errors in asynchronous invocations.

Architectural overview


Understanding the error behavior

When you invoke a function, two types of errors can occur. Invocation errors occur if the Lambda service rejects the request before the function receives it, for example because of throttling or system errors (400-series and 500-series). Function errors occur when the function’s code or runtime returns an error (exceptions and timeouts). The Lambda service retries the function invocation if it encounters unhandled errors in an asynchronous invocation.

The retry behavior is different for invocation errors and function errors. For function errors, the Lambda service retries twice by default, and these additional invocations incur cost. For throttling and system errors, the service returns the event to the event queue and attempts to run the function again for up to 6 hours, using exponential backoff. You can control the default retry behavior by setting the maximum age of an event (up to 6 hours) and the number of retry attempts (0, 1, or 2). This allows you to limit the number of retries and avoid retrying obsolete events.
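As a rough mental model, the two retry regimes described above can be sketched in a few lines of Python. This is an illustration of the documented limits, not Lambda's internal implementation:

```python
MAX_EVENT_AGE_S = 6 * 60 * 60  # maximum event age, configurable up to 6 hours

def will_retry(error_type: str, retries_so_far: int, event_age_s: int,
               max_retry_attempts: int = 2) -> bool:
    """Conceptual model of the async retry rules described above."""
    if event_age_s >= MAX_EVENT_AGE_S:
        # The event has aged out and is discarded (or sent to a DLQ/destination)
        return False
    if error_type == "function":
        # Function errors: up to two retries by default (configurable to 0, 1, or 2)
        return retries_so_far < max_retry_attempts
    # Throttling and system errors: retried with backoff until the event ages out
    return True
```

Lowering `max_retry_attempts` or the maximum event age is how you trade retry coverage against cost and staleness.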

Handling the errors

Depending on the error type and behaviors, you can use the following options to implement error handling in Lambda asynchronous invocations.

Lambda function code

The most typical approach to handling errors is to address failures directly in the function code. While implementing this approach varies across programming languages, it commonly involves the use of a try/catch block in your code.

Error handling within the code may not cover all potential errors that could occur during the invocation. It may also affect Lambda error metrics in CloudWatch if you suppress the error. You can address these scenarios by using the error handling features provided by Lambda.
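For example, a Python handler might catch an expected failure and return a structured error rather than raising, which prevents the async retry path from firing. This is a minimal sketch; `save_order` is a hypothetical stand-in for your business logic:

```python
import json

def save_order(order):
    # Hypothetical placeholder for your business logic (e.g. a database write)
    if "order_id" not in order:
        raise KeyError("order_id")
    return {"saved": order["order_id"]}

def lambda_handler(event, context):
    try:
        result = save_order(event)
    except KeyError as err:
        # Handled here: log and return a structured error instead of letting
        # the exception propagate and trigger the async retry/DLQ path
        print(json.dumps({"level": "ERROR", "missing_field": str(err)}))
        return {"statusCode": 400, "body": "invalid order"}
    return {"statusCode": 200, "body": json.dumps(result)}
```

Note the trade-off mentioned above: because the error is suppressed, it no longer surfaces in the Errors metric.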

Failure destinations

You can configure Lambda to send an invocation record to another service, such as Amazon SQS, Amazon SNS, Lambda, or Amazon EventBridge, using Lambda destinations. The invocation record contains details about the request and response in JSON format. You can configure separate destinations for events that are processed successfully and for events that fail all processing attempts.

With failure destinations, after exhausting all retries, Lambda sends a JSON document with details about the invocation and error to the destination. You can use this information to determine re-processing strategy (for example, extended logging, separate error flow, manual processing).
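An on-failure destination consumer can parse the invocation record to decide on a re-processing strategy. The following sketch uses field names from the documented invocation record format, with made-up sample values:

```python
import json

# Sample invocation record, shaped like what Lambda delivers to an
# on-failure destination (values here are invented for illustration)
sample_record = json.dumps({
    "version": "1.0",
    "timestamp": "2023-11-01T12:00:00.000Z",
    "requestContext": {
        "requestId": "11111111-2222-3333-4444-555555555555",
        "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:demo:$LATEST",
        "condition": "RetriesExhausted",
        "approximateInvokeCount": 3,
    },
    "requestPayload": {"order_id": "A1"},
    "responseContext": {"statusCode": 200, "functionError": "Unhandled"},
    "responsePayload": {"errorMessage": "KeyError: 'customer'"},
})

def summarize_failure(record_json: str) -> dict:
    """Pull out the fields that matter for triage and re-driving."""
    record = json.loads(record_json)
    ctx = record["requestContext"]
    return {
        "request_id": ctx["requestId"],
        "condition": ctx["condition"],              # e.g. RetriesExhausted
        "attempts": ctx["approximateInvokeCount"],
        "error": record.get("responsePayload", {}).get("errorMessage"),
        "original_event": record["requestPayload"],  # candidate for re-processing
    }
```

Because the record carries both the request payload and the error, the consumer can log, alert, or re-drive without calling back into the failed function.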

For example, to use Lambda destinations in an AWS Serverless Application Model (AWS SAM) template:

ProcessOrderForShipping:
    Type: AWS::Serverless::Function
    Properties:
      Description: Function that processes order before shipping
      Handler: src/process_order_for_shipping.lambda_handler
      EventInvokeConfig:
        DestinationConfig:
          OnSuccess:
            Type: SQS
            Destination: !GetAtt ShipmentsJobsQueue.Arn 
          OnFailure:
            Type: Lambda
            Destination: !GetAtt ErrorHandlingFunction.Arn

Dead-letter queues

You can use dead-letter queues (DLQ) to capture failed events for re-processing. With DLQs, message attributes capture error details. You can configure a standard SQS queue or standard SNS topic as a dead-letter queue for discarded events. For dead-letter queues, Lambda only sends the content of the event, without details about the response.

This is an example of using dead-letter queues in an AWS SAM template:

SendOrderToShipping:
    Type: AWS::Serverless::Function
    Properties:
      Description: Function that sends order to shipping
      Handler: src/send_order_to_shipping.lambda_handler
      DeadLetterQueue:
        Type: SQS
        TargetArn: !GetAtt OrderShippingFunctionDLQ.Arn 
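A consumer reading from the SQS dead-letter queue finds the original event in the message body and the error details in the RequestID, ErrorCode, and ErrorMessage message attributes. In this sketch, the message dict mimics the shape returned by an SQS receive-message call:

```python
import json

# Shape mimicking one message returned by an SQS receive-message call
# with all message attributes requested (sample values are invented)
message = {
    "Body": json.dumps({"order_id": "A1"}),  # the original event, unmodified
    "MessageAttributes": {
        "RequestID": {"StringValue": "11111111-2222-3333-4444-555555555555"},
        "ErrorCode": {"StringValue": "200"},
        "ErrorMessage": {"StringValue": "KeyError: 'customer'"},
    },
}

def extract_failed_event(msg: dict) -> tuple[dict, str]:
    """Return the original event plus the failure reason for re-processing."""
    attrs = msg.get("MessageAttributes", {})
    reason = attrs.get("ErrorMessage", {}).get("StringValue", "unknown")
    return json.loads(msg["Body"]), reason
```

This is also where the DLQ's limitation shows: unlike a destination's invocation record, there is no response payload or invoke count, only the event and these attributes.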

Design considerations

There are a number of design considerations when using DLQs:

  • Error handling within the function code works well for issues that you can easily address in the code. For example, retrying database transactions in the case of failures because of disruptions in network connectivity.
  • Scenarios that require complex error handling logic (for example, sending failed messages for manual re-processing) are better handled using Lambda service features. This approach would keep the function code simpler and easy to maintain.
  • Even though the dead-letter queue’s behavior is the same as an on-failure destination, a dead-letter queue is part of a function’s version-specific configuration.
  • Invocation records sent to on-failure destinations contain more information about the failure than DLQ message attributes. This includes the failure condition, error message, stack trace, request, and response payloads.
  • Lambda destinations also support additional targets, such as other Lambda functions and EventBridge. This allows destinations to give you more visibility and control of function execution results, and reduce code.

Gaining visibility into errors

Understanding the behavior and errors cannot rely on error handling alone.

You also want to know why errors happen so that you can address the underlying issues. You must also know when there is an elevated error rate, what the expected baseline for errors is, and what else is happening in the system when errors occur. Monitoring and observability, including metrics, logs, and tracing, bring visibility to the errors and the underlying issues.

Metrics

When a function finishes processing an event, Lambda sends metrics about the invocation to Amazon CloudWatch. This includes metrics for the errors that happen during the invocation that you should monitor and react to:

  • Errors – the number of invocations that result in a function error (including exceptions that both your code and the Lambda runtime throw).
  • Throttles – the number of invocation requests that are throttled (note that throttled requests and other invocation errors don’t count as errors in the previous metric).

There are also metrics specific to the errors in asynchronous invocations:

  • AsyncEventsDropped – the number of events that are dropped without successfully running the function.
  • DeadLetterErrors – the number of times that Lambda attempts to send an event to a dead-letter queue (DLQ) but fails (typically because of misconfigured resources or size limits).
  • DestinationDeliveryFailures – the number of times that Lambda attempts to send an event to a destination but fails (typically because of permissions, misconfigured resources, or size limits).

CloudWatch Logs

Lambda automatically sends logs to Amazon CloudWatch Logs. You can write to these logs using the standard logging functionality for your programming language. The resulting logs are in the CloudWatch Logs group that is specific to your function, named /aws/lambda/<function name>. You can use CloudWatch Logs Insights to query logs across multiple functions.
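Emitting log lines as JSON makes them much easier to query with CloudWatch Logs Insights, since Insights auto-discovers JSON fields. A minimal structured-logging helper using only the standard library (no particular logging library assumed):

```python
import json
import logging
import sys

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.propagate = False
_handler = logging.StreamHandler(sys.stdout)  # Lambda forwards stdout to CloudWatch Logs
_handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(_handler)

def log_event(level: str, message: str, **fields) -> str:
    """Emit one JSON log line; returns the line for convenience."""
    line = json.dumps({"level": level, "message": message, **fields})
    logger.info(line)
    return line
```

With lines like this, a Logs Insights query such as `fields @timestamp, message | filter level = "ERROR"` isolates failures, including across multiple functions.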

AWS X-Ray

AWS X-Ray can visualize the components of your application, identify performance bottlenecks, and troubleshoot requests that resulted in an error. Keep in mind that AWS X-Ray does not trace all requests. The sampling rate is one request per second and 5 percent of additional requests (this is non-configurable). Do not rely on AWS X-Ray as the only tool when troubleshooting a particular failed invocation, as it may be missing from the sampled traces.

Conclusion

This blog post walks through error handling in asynchronous Lambda function invocations using various approaches and discusses how to gain observability into those errors.

For more serverless learning resources, visit Serverless Land.

Architecture patterns for consuming private APIs cross-account

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/architecture-patterns-for-consuming-private-apis-cross-account/

This blog is written by Thomas Moore, Senior Solutions Architect, and Josh Hart, Senior Solutions Architect.

Amazon API Gateway allows developers to create private REST APIs that are only accessible from a virtual private cloud (VPC). Traffic to the private API uses secure connections and does not leave the AWS network, meaning AWS isolates it from the public internet. This makes private API Gateway endpoints a good fit for publishing internal APIs, such as those used by backend microservice communication.

In microservice architectures, where multiple teams build and manage components in separate AWS accounts, services often need to consume private API endpoints across account boundaries.

This blog post shows how a service can consume a private API Gateway endpoint that is published in another AWS account securely over AWS PrivateLink.

Consuming API Gateway private endpoint cross-account via AWS PrivateLink.


This blog covers consuming API Gateway endpoints cross-account. For exposing cross-account resources behind an API Gateway, read this existing blog post.

Overview

To access API Gateway private endpoints, you must create an interface VPC endpoint (named execute-api) inside your VPC. This creates an AWS PrivateLink connection between your AWS account VPC and the API Gateway service VPC. The PrivateLink connection allows traffic to flow over private IP address space without traversing the internet.

PrivateLink allows access to private API Gateway endpoints in different AWS accounts, without VPC peering, VPN connections, or AWS Transit Gateway. A single execute-api endpoint is used to connect to any API Gateway, regardless of which AWS account the destination API Gateway is in. Resource policies control which VPC endpoints have access to the API Gateway private endpoint. This makes the cross-account architecture simpler, with no complex routing or inter-VPC connectivity.

The following diagram shows how interface VPC endpoints in a consumer account create a PrivateLink connection back to the API Gateway service account VPC. The resource policy applied to the private API determines which VPC endpoint can access the API. For this reason, it is critical to ensure that the resource policy is correct to prevent unintentional access from other AWS account VPC endpoints.

Access to private API Gateway endpoints requires an AWS PrivateLink connection to an AWS service account VPC.


In this example, the resource policy denies all connections to the private API endpoint unless the aws:SourceVpce condition matches vpce-1a2b3c4d in account A. This means that connections from other execute-api VPC endpoints are denied. To allow access from account B, add vpce-9z8y7x6w to the resource policy. Refer to the documentation to learn about other condition keys you can use in API Gateway resource policies.
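A resource policy implementing this pattern might look like the following sketch: a broad Allow combined with an explicit Deny whose StringNotEquals condition lists the permitted endpoint IDs from the example above. Adjust the resource ARN and endpoint IDs for your own APIs and accounts:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*",
      "Condition": {
        "StringNotEquals": {
          "aws:SourceVpce": ["vpce-1a2b3c4d", "vpce-9z8y7x6w"]
        }
      }
    }
  ]
}
```

Because the explicit Deny wins over the Allow, any execute-api VPC endpoint not listed under aws:SourceVpce is rejected.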

For more detail on how VPC links work, read Understanding VPC links in Amazon API Gateway private integrations.

The following sections cover three architecture patterns to consume API Gateway private endpoints cross-account:

  1. Regional API Gateway to private API Gateway
  2. Lambda function calling API Gateway in another account
  3. Container microservice calling API Gateway in another account using mTLS

Regional API Gateway to private API Gateway cross-account

When building microservices in different AWS accounts, private API Gateway endpoints are often used to allow service-to-service communication. Sometimes a portion of these endpoints must be exposed publicly for end user consumption. One pattern for this is to have a central public API Gateway, which acts as the front-door to multiple private API Gateway endpoints. This allows for central governance of authentication, logging and monitoring.

The following diagram shows how to achieve this using a VPC link. VPC links enable you to connect API Gateway integrations to private resources inside a VPC. The API Gateway VPC interface endpoint is the VPC resource that you want to connect to, as this is routing traffic to the private API Gateway endpoints in different AWS accounts.

API Gateway Regional endpoint consuming API Gateway private endpoints cross-account


VPC link requires the use of a Network Load Balancer (NLB). The target group of the NLB points to the private IP addresses of the VPC endpoint, normally one for each Availability Zone. The target group health check must validate that the API Gateway service is online. You can use the API Gateway reserved /ping path for this, which returns an HTTP status code of 200 when the service is healthy.

You can deploy this pattern in your own account using the example CDK code found on GitHub.

Lambda function calling private API Gateway cross-account

Another popular requirement is for AWS Lambda functions to invoke private API Gateway endpoints cross-account. This enables service-to-service communication in microservice architectures.

The following diagram shows how to achieve this using interface endpoints for Lambda, which allows access to private resources inside your VPC. This allows Lambda to access the API Gateway VPC endpoint and, therefore, the private API Gateway endpoints in another account.

Consuming API Gateway private endpoints from Lambda cross-account


Unlike the previous example, there is no NLB or VPC link required. The resource policy on the private API Gateway must allow access from the VPC endpoint in the account where the consuming Lambda function is.

As the Lambda function has a VPC attachment, it uses DNS resolution from inside the VPC. This means that if you selected the Enable Private DNS Name option when creating the interface VPC endpoint for API Gateway, the https://{restapi-id}.execute-api.{region}.amazonaws.com endpoint automatically resolves to private IP addresses. Note that this DNS configuration can block access to Regional and edge-optimized API endpoints from inside the VPC. For more information, refer to the knowledge center article.

You can deploy this pattern in your own account using the sample CDK code found on GitHub.

Calling private API Gateway cross-account with mutual TLS (mTLS)

Customers that operate in regulated industries, such as open banking, must often implement mutual TLS (mTLS) to secure access to their APIs. mTLS is also useful for Internet of Things (IoT) applications that authenticate devices using digital certificates.

Mutual TLS (mTLS) verifies both the client and server via certificates with TLS


Regional API Gateway endpoints have native support for mTLS, but private API Gateway endpoints currently do not, so you must terminate mTLS before the request reaches API Gateway. One pattern is to implement a proxy service in the producer account that handles the mTLS handshake, terminates mTLS, and proxies the request to the private API Gateway over regular HTTPS.

The following diagram shows how to use a combination of PrivateLink, an NGINX-based proxy, and private API Gateway to implement mTLS and consume the private API across accounts.

Consuming API Gateway private endpoints cross-account with mTLS


In this architecture diagram, Amazon ECS with AWS Fargate hosts the container task running the NGINX proxy server. This proxy validates the certificate passed by the connecting client before passing the connection to API Gateway via the execute-api VPC endpoint. The following sample NGINX configuration shows how the mTLS proxy service works by using the ssl_verify_client and ssl_client_certificate settings to verify the connecting client’s certificate, and proxy_pass to forward the request on to API Gateway.

server {
    listen 443 ssl;

    ssl_certificate     /etc/ssl/server.crt;
    ssl_certificate_key /etc/ssl/server.key;
    ssl_protocols       TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;

    ssl_client_certificate /etc/ssl/client.crt;
    ssl_verify_client      on;

    location / {
        proxy_pass https://{api-gateway-endpoint-api};
    }
}

The connecting client must supply the client certificate when connecting to the API via the VPC endpoint service:

curl --key client.key --cert client.crt --cacert server.crt https://{vpc-endpoint-service-url}

Use VPC security group rules on both the VPC endpoint and the NGINX proxy to prevent clients bypassing the mTLS endpoint and connecting directly to the API Gateway endpoint.

There is an example NGINX config and Dockerfile to configure this solution in the GitHub repository.

Conclusion

This post explores three solutions to consume private API Gateway endpoints across AWS accounts. A key component of all the solutions is the VPC interface endpoint. Using VPC endpoints and PrivateLink, you can securely consume resources, and even your own microservices, across AWS accounts. For more details, read Enabling New SaaS Strategies with AWS PrivateLink. Visit the GitHub repository to get started implementing one of these solutions today.

For more serverless learning resources, visit Serverless Land.

Reducing Java cold starts on AWS Lambda functions with SnapStart

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/reducing-java-cold-starts-on-aws-lambda-functions-with-snapstart/

Written by Mark Sailes, Senior Serverless Solutions Architect, AWS.

At AWS re:Invent 2022, AWS announced SnapStart for AWS Lambda functions running on Java Corretto 11. This feature enables customers to achieve up to 10x faster function startup performance for Java functions, at no additional cost, and typically with minimal or no code changes.

Overview

For Lambda function invocations, the largest contributor to startup latency is the time spent initializing a function. This includes loading the function’s code and initializing dependencies. For interactive workloads that are sensitive to startup latencies, this can cause a suboptimal end user experience.

To address this challenge, customers either provision resources ahead of time, or spend effort building relatively complex performance optimizations, such as compiling with GraalVM native-image. Although these workarounds help reduce the startup latency, users must spend time on some heavy lifting instead of focusing on delivering business value. SnapStart addresses this concern directly for Java-based Lambda functions.

How SnapStart works

With SnapStart, when a customer publishes a function version, the Lambda service initializes the function’s code. It takes an encrypted snapshot of the initialized execution environment, and persists the snapshot in a tiered cache for low latency access.

When the function is first invoked, and as it scales, Lambda resumes the execution environment from the persisted snapshot instead of initializing it from scratch, resulting in lower startup latency.

Lambda function lifecycle


A function version activated with SnapStart transitions to an inactive state if it remains idle for 14 days, after which Lambda deletes the snapshot. If you try to invoke a function version that is inactive, the invocation fails: Lambda returns a SnapStartNotReadyException and begins initializing a new snapshot in the background, during which the function version remains in the Pending state. Wait until the function reaches the Active state, and then invoke it again. To learn more about this process and the function states, read the documentation.

Using SnapStart

Application frameworks such as Spring give developers an enormous productivity gain by reducing the amount of boilerplate code they write to accomplish common tasks. When first created, frameworks didn’t have to consider startup time because they run on application servers, which run for long periods of time. The startup time is minimal compared to the running duration. You often only restart them when there is an application version change.

If the functionality that these frameworks bring is implemented at runtime, then they often contribute to latency in startup time. SnapStart allows you to use frameworks like Spring and not compromise tail latency.

To demonstrate SnapStart, I use a sample application that saves records into Amazon DynamoDB. This Spring Boot application uses a REST controller to handle CRUD requests. The sample includes infrastructure as code to deploy the application using the AWS Serverless Application Model (AWS SAM). You must install the AWS SAM CLI to deploy this example.

To deploy:

  1. Clone the git repository and change to project directory:
    git clone https://github.com/aws-samples/serverless-patterns.git
    cd serverless-patterns/apigw-lambda-snapstart
  2. Use the AWS SAM CLI to build the application:
    sam build
  3. Use the AWS SAM CLI to deploy the resources to your AWS account:
    sam deploy -g

This project deploys with SnapStart already enabled. To enable or disable this functionality in the AWS Management Console:

  1. Navigate to your Lambda function.
  2. Select the Configuration tab.
  3. Choose Edit and change the SnapStart attribute to PublishedVersions.
  4. Choose Save.

    Lambda console configuration


  5. Select the Versions tab and choose Publish new.
  6. Choose Publish.

Once you’ve enabled SnapStart, Lambda publishes all subsequent versions with snapshots. The time it takes to publish a version depends on your initialization code, which can run for up to 15 minutes with this feature.

Considerations

Stale credentials

Using SnapStart and restoring from a snapshot often changes how you create functions. With on-demand functions, you might access one-time data in the init phase and then reuse it during future invokes. If this data is ephemeral, a database password for example, it might change between the time you fetch it and the time you use it, leading to an error. You must write code to handle this error case.

With SnapStart, if you follow the same approach, your database password is persisted in an encrypted snapshot. All future execution environments have the same state, which can be days, weeks, or longer after the snapshot is taken. This makes it more likely that your function has an incorrect password stored. To improve this, you could move the code that fetches the password to the post-snapshot hook. With each approach, it is important to understand your application’s needs and handle errors when they occur.

Demo application architecture


A second challenge in sharing the initial state is with randomness and uniqueness. If random seeds are stored in the snapshot during the initialization phase, then it may cause random numbers to be predictable.

Cryptography

AWS has changed the managed runtime to help customers handle the effects of uniqueness and randomness when restoring functions.

Lambda has already incorporated updates to Amazon Linux 2 and one of the commonly used cryptographic libraries, OpenSSL (1.0.2), to make them resilient to snapshot operations. AWS has also validated that Java runtime’s built-in RNG java.security.SecureRandom maintains uniqueness when resuming from a snapshot.

Software that always gets random numbers from the operating system (for example, from /dev/random or /dev/urandom) is already resilient to snapshot operations. It does not need updates to restore uniqueness. However, customers who prefer to implement uniqueness using custom code for their Lambda functions must verify that their code restores uniqueness when using SnapStart.
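The risk described above is easy to demonstrate outside of Lambda. The following Python sketch (an analogy only, not Lambda itself) copies a seeded PRNG state into two "restored" environments, standing in for execution environments resumed from the same snapshot:

```python
import os
import random

# "Init phase": seed a PRNG and snapshot its internal state
prng = random.Random(1234)
snapshot_state = prng.getstate()

# "Restore" two execution environments from the same snapshot
env_a = random.Random()
env_b = random.Random()
env_a.setstate(snapshot_state)
env_b.setstate(snapshot_state)

# Both environments now emit the identical "random" sequence
same = [env_a.random() for _ in range(3)] == [env_b.random() for _ in range(3)]

# Drawing entropy from the operating system at invoke time stays unique
unique = os.urandom(16) != os.urandom(16)
```

The fix mirrors the guidance above: defer entropy draws to invoke time, or re-seed in an after-restore hook, rather than capturing seed material during initialization.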

For more details, read Starting up faster with AWS Lambda SnapStart and refer to Lambda documentation on SnapStart uniqueness.

Runtime hooks

Runtime hooks are pre- and post-snapshot hooks that give developers a way to react to the snapshotting process.

For example, a function that must always preload large amounts of data from Amazon S3 should do this before Lambda takes the snapshot. This embeds the data in the snapshot so that it does not need fetching repeatedly. However, in some cases, you may not want to keep ephemeral data. A password to a database may be rotated frequently and cause unnecessary errors, as discussed in the Stale credentials section.

The Java managed runtime uses the open-source Coordinated Restore at Checkpoint (CRaC) project to provide hook support. The managed Java runtime contains a customized CRaC context implementation that calls your Lambda function’s runtime hooks before completing snapshot creation and after restoring the execution environment from a snapshot.

The following example shows how you can create a function handler with runtime hooks. The handler implements both the CRaC Resource interface and the Lambda RequestHandler interface.

...
import org.crac.Resource;
import org.crac.Core;
...

public class HelloHandler implements RequestHandler<String, String>, Resource {

    public HelloHandler() {
        Core.getGlobalContext().register(this);
    }

    public String handleRequest(String name, Context context) {
        System.out.println("Handler execution");
        return "Hello " + name;
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        System.out.println("Before Checkpoint");
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        System.out.println("After Restore");
    }
}

For the classes required to write runtime hooks, add the following dependency to your project:

Maven

<dependency>
  <groupId>io.github.crac</groupId>
  <artifactId>org-crac</artifactId>
  <version>0.1.3</version>
</dependency>

Gradle

implementation 'io.github.crac:org-crac:0.1.3'

Priming

SnapStart and runtime hooks give you new ways to build your Lambda functions for low startup latency. You can use the pre-snapshot hook to make your Java application as ready as possible for the first invoke. Do as much as possible within your function before the snapshot is taken. This is called priming.

When you upload your zip file of Java code to Lambda, the zip contains .class files of bytecode, which can run on any machine with a JVM. When the JVM executes your bytecode, it is initially interpreted, then compiled into native machine code. This compilation stage is relatively CPU intensive and happens just in time (by the JIT compiler).

You can use the before snapshot hook to run code paths before the snapshot is taken. The JVM compiles these code paths and the optimization is kept for future restores. For example, if you have a function that integrates with DynamoDB, you can make a read operation in your before snapshot hook.

This means that your function code, the AWS SDK for Java, and any other libraries used in that action are compiled and kept within the snapshot. The JVM then does not need to compile this code when your function is invoked, reducing latency the first time an execution environment handles a request.

Priming requires that you understand your application code and the consequences of executing it. The sample application includes a before snapshot hook, which primes the application by making a read operation from DynamoDB.

Metrics

The following chart reflects invoking the sample application Lambda function 100 times per second for 10 minutes. This test is based on this function, both with and without SnapStart.

 

            p50      p99.9
On-demand   7.87ms   5,114ms
SnapStart   7.87ms   488ms

Conclusion

This blog post shows how SnapStart reduces startup (cold start) latency for Java-based Lambda functions. You can configure SnapStart using the AWS SDK, AWS CloudFormation, AWS SAM, and the AWS CDK.

To learn more, see Configuring function options in the AWS documentation. This functionality may require some minimal code changes. In most cases, the existing code is already compatible with SnapStart. You can now bring your latency-sensitive Java-based workloads to Lambda and run with improved tail latencies.

This feature allows developers to use the on-demand model in Lambda with low-latency response times, without incurring extra cost. To read more about how to use SnapStart with partner frameworks, find out more from Quarkus and Micronaut. To read more about this and other features, visit Serverless Land.

Starting up faster with AWS Lambda SnapStart

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/starting-up-faster-with-aws-lambda-snapstart/

This blog written by Tarun Rai Madan, Sr. Product Manager, AWS Lambda, and Mike Danilov, Sr. Principal Engineer, AWS Lambda.

AWS Lambda SnapStart is a new performance optimization developed by AWS that can significantly improve the startup time for applications. Announced at AWS re:Invent 2022, the first capability to feature SnapStart is Lambda SnapStart for Java. This feature delivers up to 10x faster function startup times for latency-sensitive Java applications at no extra cost, and with minimal or no code changes.

Overview

When applications start up, whether it’s an app on your phone, or a serverless Lambda function, they go through initialization. The initialization process can vary based on the application and the programming language, but even the smallest applications written in the most efficient programming languages require some kind of initialization before they can do anything useful. For a Lambda function, the initialization phase involves downloading the function’s code, starting the runtime and any external dependencies, and running the function’s initialization code. Ordinarily, for a Lambda function, this initialization happens every time your application scales up to create a new execution environment.

With SnapStart, the function’s initialization is done ahead of time when you publish a function version. Lambda takes a Firecracker microVM snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and caches it for low-latency access. When your application starts up and scales to handle traffic, Lambda resumes new execution environments from the cached snapshot instead of initializing them from scratch, improving startup performance.

The following diagram compares a cold start request lifecycle for a non-SnapStart function and a SnapStart function. The time it takes to initialize the function, which is the predominant contributor to high startup latency, is replaced by a faster resume phase with SnapStart.

Diagram of a non-SnapStart function versus a SnapStart function


Request lifecycle for a non-SnapStart function versus a SnapStart function

Front loading the initialization phase can significantly improve the startup performance for latency-sensitive Lambda functions, such as synchronous microservices that are sensitive to initialization time. Because Java is a dynamic language with its own runtime and garbage collector, Lambda functions written in Java can be amongst the slowest to initialize. For applications that require frequent scaling, the delay introduced by initialization, commonly referred to as a cold start, can lead to a suboptimal experience for end users. Such applications can now start up faster with SnapStart.

AWS’ work in Firecracker makes it simple to use SnapStart. Because SnapStart uses micro Virtual Machine (microVM) snapshots to checkpoint and restore full applications, the approach is adaptable and general purpose. It can be used to speed up many kinds of application starts. While microVMs have long been used for strong secure isolation between applications and environments, the ability to front-load initialization with SnapStart means that microVMs can also augment performance savings at scale.

SnapStart and uniqueness

Lambda SnapStart speeds up applications by re-using a single initialized snapshot to resume multiple execution environments. As a result, unique content included in the snapshot during initialization is reused across execution environments, and so may no longer remain unique. A class of applications where uniqueness of state is a key consideration is cryptographic software, which assumes that the random numbers are truly random (both random and unpredictable). If content such as a random seed is saved in the snapshot during initialization, it is re-used when multiple execution environments resume and may produce predictable random sequences.

To maintain uniqueness, you must verify before using SnapStart that any unique content previously generated during the initialization now gets generated after that initialization. This includes unique IDs, unique secrets, and entropy used to generate pseudo-randomness.

Multiple execution environments resumed from a shared snapshot

SnapStart life cycle


However, we have implemented a few things to make it easier for customers to maintain uniqueness.

First, it is not common or a best practice for applications to generate these unique items directly. Still, it’s worth confirming that your application handles uniqueness correctly. That’s usually a matter of checking for any unique IDs, keys, timestamps, or “homemade” entropy in the initializer methods for your function.

Lambda offers a SnapStart scanning tool that checks for certain categories of code that assume uniqueness, so customers can make changes as required. The SnapStart scanning tool is an open-source SpotBugs plugin that runs static analysis against a set of rules and reports "potential SnapStart bugs". We are committed to engaging with the community to expand the set of rules that the scanning tool checks code against.

As an example, the following Lambda function creates a unique log stream for each execution environment during initialization. This unique value is re-used across execution environments when they re-use a snapshot.

import java.util.Map;
import java.util.UUID;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.logs.AWSLogsClient;
import com.amazonaws.services.logs.model.CreateLogStreamRequest;

public class LambdaUsingUUID implements RequestHandler<Map<String, String>, String> {

    private AWSLogsClient logs;
    private final UUID sandboxId;

    public LambdaUsingUUID() {
       sandboxId = UUID.randomUUID(); // <-- unique content created
       logs = new AWSLogsClient();
    }

    @Override
    public String handleRequest(Map<String, String> event, Context context) {
       CreateLogStreamRequest request = new CreateLogStreamRequest(
         "myLogGroup", sandboxId + ".log9.txt");
       logs.createLogStream(request);
       return "Hello world!";
    }
}

When you run the scanning tool on the previous code, the following message helps identify a potential implementation that assumes uniqueness. One way to address such cases is to move the generation of the unique ID inside your function’s handler method.

H C SNAP_START: Detected a potential SnapStart bug in Lambda function initialization code. At LambdaUsingUUID.java: [line 7]

A best practice used by many applications is to rely on the system libraries and kernel for uniqueness. These have long handled other cases where keys and IDs may be inadvertently duplicated, such as when forking or cloning processes. AWS has worked with upstream kernel maintainers and open source developers so that the existing protection mechanisms use the open standard VM Generation ID (vmgenid) that SnapStart supports. vmgenid is an emulated device, which exposes a 128-bit, cryptographically random integer value identifier to the kernel, and is statistically unique across all resumed microVMs.

Lambda’s included versions of Amazon Linux 2, OpenSSL (1.0.2), and java.security.SecureRandom all automatically re-initialize their randomness and secrets after a SnapStart. Software that always gets random numbers from the operating system (for example, from /dev/random or /dev/urandom) does not need any updates to maintain randomness. Because Lambda always reseeds /dev/random and /dev/urandom when restoring a snapshot, random numbers are not repeated even when multiple execution environments resume from the same snapshot.

Lambda’s request IDs are already unique for each invocation and are available using the getAwsRequestId() method of the Lambda request object. Most Lambda functions should require no modification to run with SnapStart enabled. It’s generally recommended that for SnapStart, you do not include unique state in the function’s initialization code, and use cryptographically secure random number generators (CSPRNGs) when needed.
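This guidance can be sketched with a short example. The sketch below is in Python rather than Java (SnapStart currently supports Java, and the handler here is purely illustrative), but the pitfall applies to any snapshot-and-resume model:

```python
import uuid

# Anti-pattern: this runs during initialization, so the value is captured
# in the snapshot and duplicated across every resumed execution environment.
INIT_TIME_ID = uuid.uuid4()

def handler(event, context):
    # Safe: generated per invocation, after the snapshot has been restored,
    # so every request observes a genuinely unique value.
    request_scoped_id = uuid.uuid4()
    return {"initTimeId": str(INIT_TIME_ID), "requestId": str(request_scoped_id)}
```

Every environment resumed from the same snapshot would report the same initTimeId, while requestId differs on each invocation.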

Second, if you do want to create unique data directly in a Lambda function initialization phase, Lambda supports two new runtime hooks. Runtime hooks are available as part of the open-source Coordinated Restore at Checkpoint (CRaC) project. You can use the beforeCheckpoint hook to run code immediately before a snapshot is taken, and the afterRestore hook to run code immediately after restoring a snapshot. This helps you delete any unique content before the snapshot is created and regenerate it after the snapshot is restored. For an example of how to use CRaC with a reference application, see the CRaC GitHub repository.
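The hook lifecycle can be illustrated conceptually as follows. This is a Python sketch only: the real runtime hooks are Java interfaces from the CRaC project, and the class name, method names, and snapshot simulation here are invented for illustration:

```python
import copy
import uuid

class SessionResource:
    """Holds content that must stay unique per execution environment."""

    def __init__(self):
        self.session_id = uuid.uuid4()

    def before_checkpoint(self):
        # Delete unique content so it is not captured in the snapshot.
        self.session_id = None

    def after_restore(self):
        # Regenerate unique content in each resumed environment.
        self.session_id = uuid.uuid4()

# Simulate the lifecycle: checkpoint once, then resume two environments.
resource = SessionResource()
resource.before_checkpoint()
snapshot = copy.deepcopy(resource)  # stand-in for the microVM snapshot

env_a = copy.deepcopy(snapshot)
env_b = copy.deepcopy(snapshot)
env_a.after_restore()
env_b.after_restore()
```

Because the unique value is dropped before the checkpoint and regenerated after restore, the two resumed environments end up with distinct session IDs.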

Conclusion

This blog describes how SnapStart optimizes startup performance under the hood and outlines considerations around uniqueness. We also introduced the interfaces (the scanning tool and runtime hooks) that AWS Lambda provides to help customers maintain uniqueness in their SnapStart functions.

SnapStart is made possible by several pieces of open-source work, including Firecracker, Linux, CRaC, OpenSSL and more. AWS is grateful to the maintainers and developers who have made this possible. With this work, we’re excited to launch Lambda SnapStart for Java as what we hope is the first amongst many other capabilities to benefit from the performance savings and enhanced security that SnapStart microVMs provide.

For more serverless learning resources, visit Serverless Land.

Better together: AWS SAM CLI and HashiCorp Terraform

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/better-together-aws-sam-cli-and-hashicorp-terraform/

This post is written by Suresh Poopandi, Senior Solutions Architect and Seb Kasprzak, Senior Solutions Architect.

Today, AWS is announcing the public preview of AWS Serverless Application Model CLI (AWS SAM CLI) support for local development, testing, and debugging of serverless applications defined using HashiCorp Terraform configuration.

AWS SAM and Terraform are open-source frameworks for building applications using infrastructure as code (IaC). Both frameworks allow building, changing, and managing cloud infrastructure in a repeatable way by defining resource configurations.

Previously, you could use the AWS SAM CLI to build, test, and debug applications defined by AWS SAM templates or through the AWS Cloud Development Kit (CDK). With this preview release, you can also use AWS SAM CLI to test and debug serverless applications defined using Terraform configurations.

Walkthrough of Terraform support

This blog post contains a sample Terraform template, which shows how developers can use AWS SAM CLI to build locally, test, and debug AWS Lambda functions defined in Terraform. This sample application has a Lambda function that stores a book review score and review text in an Amazon DynamoDB table. An Amazon API Gateway book review API uses Lambda proxy integration to invoke the book review Lambda function.

Demo application architecture


Prerequisites

Before running this example:

  • Install the AWS CLI.
    • Configure with valid AWS credentials.
    • Note that AWS CLI now requires Python runtime.
  • Install HashiCorp Terraform.
  • Install the AWS SAM CLI.
  • Install Docker (required to run AWS Lambda function locally).

Since Terraform support is currently in public preview, you must provide the --beta-features flag when executing AWS SAM commands. Alternatively, set this flag in the samconfig.toml file by adding beta_features="true".

Deploying the example application

This Lambda function interacts with DynamoDB. For the example to work, it requires an existing DynamoDB table in an AWS account. Deploying this creates all the required resources for local testing and debugging of the Lambda function.

To deploy:

  1. Clone the aws-sam-terraform-examples repository locally:
    git clone https://github.com/aws-samples/aws-sam-terraform-examples
  2. Change to the project directory:
    cd aws-sam-terraform-examples/zip_based_lambda_functions/api-lambda-dynamodb-example/

    Terraform must store the state of the infrastructure and configuration it creates. Terraform uses this state to map cloud resources to configuration and track changes. This example uses a local backend to store the state file on the local filesystem.

  3. Open the main.tf file and review its contents. Locate the provider section of the code, updating the region field with the target deployment Region of this sample solution:
    provider "aws" {
        region = "<AWS region>" # e.g. us-east-1
    }
  4. Initialize a working directory containing Terraform configuration files:
    terraform init
  5. Deploy the application using the Terraform CLI. When prompted with "Do you want to perform these actions?", enter yes.
    terraform apply

Terraform deploys the application, as shown in the terminal output.

Terminal output


After completing the deployment process, the AWS account is ready for use by the Lambda function with all the required resources.

Terraform Configuration for local testing

Lambda functions require application dependencies bundled together with the function code as a deployment package (typically a .zip file) to run correctly. Terraform does not natively create this deployment package; a separate build process handles package creation.

This sample application uses Terraform’s null_resource and local-exec provisioner to trigger a build process script. This installs Python dependencies in a temporary folder and creates a .zip file with the dependencies and function code. This logic is contained within the main.tf file of the example application.
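As a rough sketch of what such a build script does (the function name and paths below are illustrative, and a real script would also install dependencies into the staging folder, for example with pip install --target, before zipping):

```python
import shutil
import tempfile

def build_deployment_package(src_dir: str, out_path: str) -> str:
    """Copy function code into a staging folder and zip it into a
    Lambda deployment package. Installing dependencies into the
    staging folder would happen before the archive step."""
    with tempfile.TemporaryDirectory() as staging:
        shutil.copytree(src_dir, staging, dirs_exist_ok=True)
        base = out_path[:-4] if out_path.endswith(".zip") else out_path
        return shutil.make_archive(base, "zip", staging)
```

The returned path points at the finished .zip package, which is what the sam_metadata_ resource's built_output_path attribute would reference.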

To explain each code segment in more detail:

Terraform example


  1. aws_lambda_function: This sample defines a Lambda function resource. It contains properties such as environment variables (in this example, the DynamoDB table_id) and the depends_on argument, which creates the .zip package before deploying the Lambda function.

    Terraform example


  2. null_resource: When the AWS SAM CLI build command runs, AWS SAM reviews Terraform code for any null_resource starting with sam_metadata_ and uses the information contained within this resource block to gather the location of the Lambda function source code and .zip package. This information allows the AWS SAM CLI to start the local execution of the Lambda function. This special resource should contain the following attributes:
    • resource_name: The Lambda function address as defined in the current module (aws_lambda_function.publish_book_review)
    • resource_type: Packaging type of the Lambda function (ZIP_LAMBDA_FUNCTION)
    • original_source_code: Location of Lambda function code
    • built_output_path: Location of .zip deployment package

Local testing

With the backend services now deployed, run local tests to see if everything is working. The locally running sample Lambda function interacts with the services deployed in the AWS account. Run sam build after each code update so that the local testing environment reflects the changes.

  1. Local Build: To create a local build of the Lambda function for testing, use the sam build command:
    sam build --hook-name terraform --beta-features
  2. Local invoke: The first test is to invoke the Lambda function with a mocked event payload from the API Gateway. These events are in the events directory. Run this command, passing in a mocked event:
    AWS_DEFAULT_REGION=<Your Region Name>
    sam local invoke aws_lambda_function.publish_book_review -e events/new-review.json --beta-features

    AWS SAM mounts the Lambda function runtime and code and runs it locally. The function makes a request to the DynamoDB table in the cloud to store the information provided via the API. It returns a 200 response code, signaling the successful completion of the function.

  3. Local invoke from AWS CLI
    Another test is to run a local emulation of the Lambda service using "sam local start-lambda" and invoke the function directly using an AWS SDK or the AWS CLI. Start the local emulator with the following command:

    sam local start-lambda
    Terminal output


    AWS SAM starts the emulator and exposes a local endpoint for the AWS CLI or a software development kit (SDK) to call. With the start-lambda command still running, run the following command to invoke this function locally with the AWS CLI:

    aws lambda invoke --function-name aws_lambda_function.publish_book_review --endpoint-url http://127.0.0.1:3001/ response.json --cli-binary-format raw-in-base64-out --payload file://events/new-review.json

    The AWS CLI invokes the local function and returns a status report of the service to the screen. The response from the function itself is in the response.json file. The window shows the following messages:

    Invocation results


  4. Debugging the Lambda function

Developers can use AWS SAM with a variety of AWS toolkits and debuggers to test and debug serverless applications locally. For example, developers can perform local step-through debugging of Lambda functions by setting breakpoints, inspecting variables, and running function code one line at a time.

The AWS Toolkit Integrated Development Environment (IDE) plugins make it easier to develop, debug, and deploy serverless applications defined using AWS SAM, integrating building, testing, debugging, deploying, and invoking Lambda functions into the IDE. Refer to this link for a list of common IDE/runtime combinations that support step-through debugging of AWS SAM applications.

Visual Studio Code keeps debugging configuration information in a launch.json file in a workspace .vscode folder. Here is a sample launch configuration file to debug Lambda code locally using AWS SAM and Visual Studio Code.

{
    "version": "0.2.0",
    "configurations": [
          {
            "name": "Attach to SAM CLI",
            "type": "python",
            "request": "attach",
            "address": "localhost",
            "port": 9999,
            "localRoot": "${workspaceRoot}/sam-terraform/book-reviews",
            "remoteRoot": "/var/task",
            "protocol": "inspector",
            "stopOnEntry": false
          }
    ]
}

After adding the launch configuration, start a debug session in Visual Studio Code.

Step 1: Uncomment the following two lines in zip_based_lambda_functions/api-lambda-dynamodb-example/src/index.py

Enable debugging in the Lambda function


Step 2: Run the Lambda function in debug mode and wait for Visual Studio Code to attach to this debugging session:

sam local invoke aws_lambda_function.publish_book_review -e events/new-review.json -d 9999

Step 3: Select the Run and Debug icon in the Activity Bar on the side of VS Code. In the Run and Debug view, select “Attach to SAM CLI” and choose Run.

For this example, set a breakpoint at the first line of lambda_handler. This breakpoint allows viewing the input data coming into the Lambda function. Also, it helps in debugging code issues before deploying to the AWS Cloud.

Debugging in the IDE

Lambda Terraform module

A community-supported Terraform module for Lambda (terraform-aws-lambda) has added support for the SAM metadata null_resource. When using the latest version of this module, the AWS SAM CLI automatically supports local invocation of the Lambda function, without requiring additional resource blocks.

Conclusion

This blog post shows how to use the AWS SAM CLI together with HashiCorp Terraform to develop and test serverless applications in a local environment. With AWS SAM CLI’s support for HashiCorp Terraform, developers can now use the AWS SAM CLI to test their serverless functions locally while choosing their preferred infrastructure as code tooling.

For more information about the features supported by AWS SAM, visit AWS SAM. For more information about the Metadata resource, visit HashiCorp Terraform.

Support for the Terraform configuration is currently in preview, and the team is asking for feedback and feature request submissions. The goal is for both communities to help improve the local development process using AWS SAM CLI. Submit your feedback by creating a GitHub issue here.

For more serverless learning resources, visit Serverless Land.

Propagating valid mTLS client certificate identity to downstream services using Amazon API Gateway

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/propagating-valid-mtls-client-certificate-identity-to-downstream-services-using-amazon-api-gateway/

This blog written by Omkar Deshmane, Senior SA and Anton Aleksandrov, Principal SA, Serverless.

This blog shows how to use Amazon API Gateway with a custom authorizer to process incoming requests, validate the mTLS client certificate, extract the client certificate subject, and propagate it to the downstream application in a base64 encoded HTTP header.

This pattern allows you to terminate mTLS at the edge so downstream applications do not need to perform client certificate validation. With this approach, developers can focus on application logic and offload mTLS certificate management and validation to a dedicated service, such as API Gateway.

Overview

Authentication is one of the core security aspects that you must address when building a cloud application. Successful authentication proves you are who you claim to be. There are various common authentication patterns, such as cookie-based authentication, token-based authentication, or the topic of this blog: certificate-based authentication.

Transport Layer Security (TLS) certificates are at the core of a secure and safe internet. TLS certificates secure the connection between the client and server by encrypting data, ensuring private communication. When using the TLS protocol, the server must prove its identity to the client using a certificate signed by a certificate authority trusted by the client.

Mutual TLS (mTLS) introduces an additional layer of security, in which both the client and server must prove their identities to each other. Developers commonly use mTLS for application-to-application authentication, using digital certificates to represent both client and server apps. We highly recommend decoupling the mTLS implementation from the application business logic so that you do not have to update the application when changing the mTLS configuration. It is a common pattern to implement mTLS authentication and termination in a network appliance at the edge, such as Amazon API Gateway.

In this solution, we show a pattern of using API Gateway with an authorizer implemented with AWS Lambda to validate the mTLS client certificate, extract the client certificate subject, and propagate it to the downstream application in a base64 encoded HTTP header.

While this blog describes how to implement this pattern for identities extracted from mTLS client certificates, you can generalize it and apply it to propagate information obtained via any other means of authentication.

mTLS Sample Application

This blog includes a sample application implemented using the AWS Serverless Application Model (AWS SAM). It creates a demo environment containing resources like API Gateway, a Lambda authorizer, and an Amazon EC2 instance, which simulates the backend application.

The EC2 instance is used for the backend application to mimic common customer scenarios. You can use any other type of compute, such as Lambda functions or containerized applications with Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), as a backend application layer as well.

The following diagram shows the solution architecture:

Example architecture diagram


  1. Store the client certificate in a trust store in an Amazon S3 bucket.
  2. The client makes a request to the API Gateway endpoint, supplying the client certificate to establish the mTLS session.
  3. API Gateway retrieves the trust store from the S3 bucket. It validates the client certificate, matches the trusted authorities, and terminates the mTLS connection.
  4. API Gateway invokes the Lambda authorizer, providing the request context and the client certificate information.
  5. The Lambda authorizer extracts the client certificate subject. It performs any necessary custom validation, and returns the extracted subject to API Gateway as a part of the authorization context.
  6. API Gateway injects the subject extracted in the previous step into the integration request HTTP header and sends the request to a downstream endpoint.
  7. The backend application receives the request, extracts the injected subject, and uses it with custom business logic.
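From the backend application's perspective, steps 6 and 7 amount to reading the injected header. The helper below is a hypothetical sketch: the header name matches the mapping used in the sample template, and depending on how the authorizer encodes the value, the backend may additionally need to base64-decode it:

```python
def extract_client_subject(headers: dict) -> str:
    """Return the certificate subject injected by API Gateway into the
    integration request. HTTP header names are case-insensitive, so
    normalize keys before the lookup."""
    normalized = {key.lower(): value for key, value in headers.items()}
    subject = normalized.get("x-client-cert-sub")
    if subject is None:
        raise PermissionError("missing client certificate subject header")
    return subject
```

Rejecting requests without the header keeps the backend from trusting traffic that did not pass through the mTLS-terminating API Gateway endpoint.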

Prerequisites and deployment

Some resources created as part of this sample architecture deployment have associated costs, both when running and idle. This includes resources like Amazon Virtual Private Cloud (Amazon VPC), VPC NAT Gateway, and EC2 instances. We recommend deleting the deployed stack after exploring the solution to avoid unexpected costs. See the Cleaning Up section for details.

Refer to the project code repository for instructions to deploy the solution using AWS SAM. The deployment provisions multiple resources, taking several minutes to complete.

Following the successful deployment, refer to the RestApiEndpoint variable in the Output section to locate the API Gateway endpoint. Note this value for testing later.

AWS CloudFormation output


Key areas in the sample code

There are two key areas in the sample project code.

In src/authorizer/index.js, the Lambda authorizer code extracts the subject from the client certificate. It returns the value as part of the context object to API Gateway. This allows API Gateway to use this value in the subsequent integration request.

const crypto = require('crypto');

exports.handler = async (event) => {
    console.log('> handler', JSON.stringify(event, null, 4));

    const clientCertPem = event.requestContext.identity.clientCert.clientCertPem;
    const clientCert = new crypto.X509Certificate(clientCertPem);
    const clientCertSub = clientCert.subject.replaceAll('\n', ',');

    const response = {
        principalId: clientCertSub,
        context: { clientCertSub },
        policyDocument: {
            Version: '2012-10-17',
            Statement: [{
                Action: 'execute-api:Invoke',
                Effect: 'Allow',
                Resource: event.methodArn
            }]
        }
    };

    console.log('Authorizer Response', JSON.stringify(response, null, 4));
    return response;
};

In template.yaml, API Gateway injects the client certificate subject extracted by the Lambda authorizer previously into the integration request as X-Client-Cert-Sub HTTP header. X-Client-Cert-Sub is a custom header name and you can choose any other custom header name instead.

SayHelloGetMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      AuthorizationType: CUSTOM
      AuthorizerId: !Ref CustomAuthorizer
      HttpMethod: GET
      ResourceId: !Ref SayHelloResource
      RestApiId: !Ref RestApi
      Integration:
        Type: HTTP_PROXY
        ConnectionType: VPC_LINK
        ConnectionId: !Ref VpcLink
        IntegrationHttpMethod: GET
        Uri: !Sub 'http://${NetworkLoadBalancer.DNSName}:3000/'
        RequestParameters:
          'integration.request.header.X-Client-Cert-Sub': 'context.authorizer.clientCertSub' 

Testing the example

You create a client key and certificate during the deployment, which are stored in the /certificates directory. Use the curl command to make a request to the REST API endpoint using these files.

curl --cert certificates/client.pem --key certificates/client.key \
<use the RestApiEndpoint found in CloudFormation output>
Example flow diagram


The client request to API Gateway uses mTLS with a client certificate supplied for mutual TLS authentication. API Gateway uses the Lambda authorizer to extract the certificate subject, and inject it into the Integration request.

The HTTP server runs on the EC2 instance, simulating the backend application. It accepts the incoming request and echoes it back, supplying request headers as part of the response body. The HTTP response received from the backend application contains a simple message and a copy of the request headers sent from API Gateway to the backend.

One header is x-client-cert-sub, containing the Common Name value of the client certificate. Verify that this value matches the Common Name you provided when generating the client certificate.

Response example


API Gateway validated the mTLS client certificate, used the Lambda authorizer to extract the subject common name from the certificate, and forwarded it to the downstream application.

Cleaning up

Use the sam delete command in the api-gateway-certificate-propagation directory to delete the sample application infrastructure:

sam delete

You can also refer to the project code repository for the clean-up instructions.

Conclusion

This blog shows how to use the API Gateway with a Lambda authorizer for mTLS client certificate validation, custom field extraction, and downstream propagation to backend systems. This pattern allows you to terminate mTLS at the edge so that downstream applications are not responsible for client certificate validation.

For additional documentation, refer to Using API Gateway with Lambda Authorizer. Download the sample code from the project code repository. For more serverless learning resources, visit Serverless Land.

Simplifying serverless permissions with AWS SAM Connectors

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/simplifying-serverless-permissions-with-aws-sam-connectors/

This post written by Kurt Tometich, Senior Solutions Architect, AWS.

Developers have been using the AWS Serverless Application Model (AWS SAM) to streamline the development of serverless applications with AWS since late 2018. Besides making it easier to create, build, test, and deploy serverless applications, AWS SAM now further simplifies permission management between serverless components with AWS SAM Connectors.

Connectors allow the builder to focus on the relationships between components without expert knowledge of AWS Identity and Access Management (IAM) or direct creation of custom policies. AWS SAM connectors support AWS Step Functions, Amazon DynamoDB, AWS Lambda, Amazon SQS, Amazon SNS, Amazon API Gateway, Amazon EventBridge, and Amazon S3, with more resources planned in the future.

AWS SAM policy templates are an existing feature that helps builders deploy serverless applications with minimally scoped IAM policies. Because there are a finite number of templates, they’re a good fit when a template exists for the services you’re using. Connectors are best for those getting started and who want to focus on modeling the flow of data and events within their applications. Connectors will take the desired relationship model and create the permissions for the relationship to exist and function as intended.

In this blog post, I show you how to speed up serverless development while maintaining secure best practices using AWS SAM connector. Defining a connector in an AWS SAM template requires a source, destination, and a permission (for example, read or write). From this definition, IAM policies with minimal privileges are automatically created by the connector.

Usage

Within an AWS SAM template:

  1. Create serverless resource definitions.
  2. Define a connector.
  3. Add a source and destination ID of the resources to connect.
  4. Define the permissions (read, write) of the connection.

This example creates a Lambda function that requires write access to an Amazon DynamoDB table to keep track of orders created from a website.

AWS Lambda function needing write access to an Amazon DynamoDB table


The AWS SAM connector for the resources looks like the following:

LambdaDynamoDbWriteConnector:
  Type: AWS::Serverless::Connector
  Properties:
    Source:
      Id: CreateOrder
    Destination:
      Id: Orders
    Permissions:
      - Write

"LambdaDynoDbWriteConnector" is the name of the connector, while the "Type" designates it as an AWS SAM connector. "Properties" contains the source and destination logical IDs for our serverless resources found within our template. Finally, the "Permissions" property defines a read or write relationship between the components.

This basic example shows how easy it is to define permissions between components. No specific role or policy names are required, and this syntax is consistent across many other serverless components, enforcing standardization.

Example

AWS SAM connectors save you time as your applications grow and connections between serverless components become more complex. Manual creation and management of permissions become error prone and difficult at scale. To highlight the breadth of support, we’ll use an AWS Step Functions state machine to operate with several other serverless components. AWS Step Functions is a serverless orchestration workflow service that integrates natively with other AWS services.

Solution overview

Architectural overview

This solution implements an image catalog moderation pipeline. Amazon Rekognition checks images for inappropriate content and detects objects and text. Valid images are processed and their metadata stored in an Amazon DynamoDB table; invalid images trigger an email notification instead.
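
The branching decision at the heart of this pipeline can be sketched as a small routine. This is a hedged illustration, not the workflow's actual code: the label shape mirrors Rekognition's DetectModerationLabels response, and the confidence threshold is an assumption.

```python
def route_image(moderation_labels, min_confidence=80.0):
    """Decide whether an image goes to the catalog or to the moderator.

    moderation_labels: list of dicts shaped like Rekognition's
    DetectModerationLabels entries, e.g. {"Name": "Explicit Nudity",
    "Confidence": 99.1}. Returns "catalog" for clean images, "notify"
    when any label meets the (illustrative) confidence threshold.
    """
    flagged = [l for l in moderation_labels if l["Confidence"] >= min_confidence]
    return "notify" if flagged else "catalog"

# A clean image is cataloged; a flagged one triggers the email path.
print(route_image([]))                                                 # catalog
print(route_image([{"Name": "Explicit Nudity", "Confidence": 99.1}]))  # notify
```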

Prerequisites

  1. Git installed
  2. AWS SAM CLI version 1.58.0 or greater installed

Deploying the solution

  1. Clone the repository and navigate to the solution directory:
    git clone https://github.com/aws-samples/step-functions-workflows-collection
    cd step-functions-workflows-collection/moderated-image-catalog
  2. Open the template.yaml file located at step-functions-workflows-collection/moderated-image-catalog and replace the “ImageCatalogStateMachine:” section with the following snippet, taking care to preserve the YAML indentation.
    ImageCatalogStateMachine:
        Type: AWS::Serverless::StateMachine
        Properties:
          Name: moderated-image-catalog-workflow
          DefinitionUri: statemachine/statemachine.asl.json
          DefinitionSubstitutions:
            CatalogTable: !Ref CatalogTable
            ModeratorSNSTopic: !Ref ModeratorSNSTopic
          Policies:
            - RekognitionDetectOnlyPolicy: {}
  3. Within the same template.yaml file, add the following after the ModeratorSNSTopic section and before the Outputs section:
    # Serverless connector permissions
    StepFunctionS3ReadConnector:
      Type: AWS::Serverless::Connector
      Properties:
        Source:
          Id: ImageCatalogStateMachine
        Destination:
          Id: IngestionBucket
        Permissions:
          - Read
    
    StepFunctionDynamoWriteConnector:
      Type: AWS::Serverless::Connector
      Properties:
        Source:
          Id: ImageCatalogStateMachine
        Destination:
          Id: CatalogTable
        Permissions:
          - Write
    
    StepFunctionSNSWriteConnector:
      Type: AWS::Serverless::Connector
      Properties:
        Source:
          Id: ImageCatalogStateMachine
        Destination:
          Id: ModeratorSNSTopic
        Permissions:
          - Write

    You have removed the existing inline policies for the state machine and replaced them with AWS SAM connector definitions, except for the Amazon Rekognition policy, which connectors did not support at the time of publishing this blog. Take some time to review the syntax of each connector.

  4. Deploy the application using the following command:
    sam deploy --guided

    Provide a stack name, Region, and moderators’ email address. You can accept defaults for the remaining prompts.

Verifying permissions

Once the deployment has completed, you can verify the correct role and policies.

  1. Navigate to the Step Functions service page within the AWS Management Console and ensure you have the correct Region selected.
  2. Select State machines from the left menu and then the moderated-image-catalog-workflow state machine.
  3. Select the “IAM role ARN” link, which will take you to the IAM role and policies created.

You should see a list of policies that correspond to the AWS SAM connectors in the template.yaml file with the actions and resources.

Permissions list in console

You didn’t need to supply the specific policy actions: use Read or Write as the permission and the service handles the rest. This results in improved readability, standardization, and productivity, while retaining security best practices.
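
For illustration, the Write connector between the state machine and the DynamoDB table generates a scoped policy along these lines. The action list, account ID, and resource ARNs here are representative assumptions, not an exact reproduction of what AWS SAM emits:

```json
{
  "Effect": "Allow",
  "Action": [
    "dynamodb:PutItem",
    "dynamodb:UpdateItem",
    "dynamodb:DeleteItem",
    "dynamodb:BatchWriteItem"
  ],
  "Resource": [
    "arn:aws:dynamodb:us-east-1:123456789012:table/CatalogTable",
    "arn:aws:dynamodb:us-east-1:123456789012:table/CatalogTable/index/*"
  ]
}
```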

Testing

  1. Upload a test image to the Amazon S3 bucket created during the deployment step. To find the name of the bucket, navigate to the AWS CloudFormation console. Select the CloudFormation stack via the name entered as part of “sam deploy --guided.” Select the Outputs tab and note the IngestionBucket name.
  2. After uploading the image, navigate to the AWS Step Functions console and select the “moderated-image-catalog-workflow” workflow.
  3. Select Start Execution and input an event:
    {
        "bucket": "<S3-bucket-name>",
        "key": "<image-name>.jpeg"
    }
  4. Select Start Execution and observe the execution of the workflow.
  5. Depending on the image selected, the workflow either adds it to the image catalog or sends a content moderation email to the address provided. Find out more about content considered inappropriate by Amazon Rekognition.

Cleanup

To delete any images added to the Amazon S3 bucket, and the resources created by this template, use the following commands from the same project directory.

aws s3 rm s3://<bucket_name_here> --recursive
sam delete

Conclusion

This blog post shows how AWS SAM connectors simplify connecting serverless components. View the Developer Guide to find out more about AWS SAM connectors. For further sample serverless workflows like the one used in this blog, see Serverless Land.

Speeding up incremental changes with AWS SAM Accelerate and nested stacks

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/speeding-up-incremental-changes-with-aws-sam-accelerate-and-nested-stacks/

This blog was written by Jeff Marcinko, Sr. Technical Account Manager, Health Care & Life Sciences, and Brian Zambrano, Sr. Specialist Solutions Architect, Serverless.

Developers and operators have been using the AWS Serverless Application Model (AWS SAM) to author, build, test, and deploy serverless applications in AWS for over three years. Since its inception, the AWS SAM team has focused on developer productivity, simplicity, and best practices.

As good as AWS SAM is at making your serverless development experience easier and faster, building non-trivial cloud applications remains a challenge. Developers and operators want a development experience that provides high-fidelity and fast feedback on incremental changes. With serverless development, local emulation of an application composed of many AWS resources and managed services can be incomplete and inaccurate. We recommend developing serverless applications in the AWS Cloud against live AWS services to increase developer confidence. However, the latency of deploying an entire AWS CloudFormation stack for every code change is a challenge that developers face with this approach.

In this blog post, I show how to increase development velocity by using AWS SAM Accelerate with AWS CloudFormation nested stacks. Nested stacks are an application lifecycle management best practice at AWS. We recommend nested stacks for deploying complex serverless applications, which aligns to the Serverless Application Lens of the AWS Well-Architected Framework. AWS SAM Accelerate speeds up deployment from your local system by bypassing AWS CloudFormation to deploy code and resource updates when possible.

AWS CloudFormation nested stacks and AWS SAM

A nested stack is a CloudFormation resource that is part of another stack, referred to as the parent, or root stack.

Nested stack architecture

The best practice for modeling complex applications is to author a root stack template and declare related resources in their own nested stack templates. This partitioning improves maintainability and encourages reuse of common template patterns. It is easier to reason about the configuration of the AWS resources in the example application because they are described in nested templates for each application component.

With AWS SAM, developers create nested stacks using the AWS::Serverless::Application resource type. The following example shows a snippet from a template.yaml file, which is the root stack for an AWS SAM application.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  DynamoDB:
    Type: AWS::Serverless::Application
    Properties:
      Location: db/template.yaml

  OrderWorkflow:
    Type: AWS::Serverless::Application
    Properties:
      Location: workflow/template.yaml

  ApiIntegrations:
    Type: AWS::Serverless::Application
    Properties:
      Location: api-integrations/template.yaml

  Api:
    Type: AWS::Serverless::Application
    Properties:
      Location: api/template.yaml

Each AWS::Serverless::Application resource type references a child stack, which is an independent AWS SAM template. The Location property tells AWS SAM where to find the stack definition.

Solution overview

The sample application exposes an API via Amazon API Gateway. One API endpoint (#2) forwards POST requests to Amazon SQS; an AWS Lambda function (#3) polls the SQS queue and starts an AWS Step Functions workflow execution (#4) for each message.

Sample application architecture

Prerequisites

  1. AWS SAM CLI, version 1.53.0 or higher
  2. Python 3.9

Deploy the application

To deploy the application:

  1. Clone the repository:
    git clone https://github.com/aws-samples/sam-accelerate-nested-stacks-demo.git
  2. Change to the root directory of the project and run the following AWS SAM CLI commands:
    cd sam-accelerate-nested-stacks-demo
    sam build
    sam deploy --guided --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND

    You must include the CAPABILITY_IAM and CAPABILITY_AUTO_EXPAND capabilities to support nested stacks and the creation of permissions.

  3. Use orders-app as the stack name during guided deployment. During the deploy process, enter your email for the SubscriptionEmail value. This requires confirmation later. Accept the defaults for the rest of the values.

    SAM deploy example

  4. After the CloudFormation deployment completes, save the API endpoint URL from the outputs.

Confirming the notifications subscription

After the deployment finishes, you receive an Amazon SNS subscription confirmation email at the email address provided during the deployment. Choose the Confirm Subscription link to receive notifications.

You have chosen to subscribe to the topic: 
arn:aws:sns:us-east-1:123456789012:order-topic-xxxxxxxxxxxxxxxxxx

To confirm this subscription, click or visit the link below (If this was in error no action is necessary): 
Confirm subscription

Testing the orders application

To test the application, use the curl command to create a new Order request with the following JSON payload:

{
    "quantity": 1,
    "name": "Pizza",
    "restaurantId": "House of Pizza"
}
curl -s --header "Content-Type: application/json" \
  --request POST \
  --data '{"quantity":1,"name":"Pizza","restaurantId":"House of Pizza"}' \
  https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/Dev/orders | python -m json.tool
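
The request body can also be built programmatically to avoid shell-quoting mistakes. A minimal sketch, using the field names from the JSON payload above (the endpoint URL itself is the placeholder from the deployment outputs):

```python
import json

def build_order(name, quantity, restaurant_id):
    """Serialize an order request body for the POST /orders endpoint."""
    return json.dumps({
        "quantity": quantity,
        "name": name,
        "restaurantId": restaurant_id,
    })

body = build_order("Pizza", 1, "House of Pizza")
print(body)  # {"quantity": 1, "name": "Pizza", "restaurantId": "House of Pizza"}
```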

API Gateway responds with the following message, showing it successfully sent the request to the SQS queue:

API Gateway response

The application sends an order notification once the Step Functions workflow completes processing. The workflow intentionally randomizes the SUCCESS or FAILURE status message.

Accelerating development with AWS SAM sync

AWS SAM Accelerate enhances the development experience. It automatically observes local code changes and synchronizes them to AWS without building and deploying every function in the project.

However, when you synchronize code changes directly into the AWS Cloud, it can introduce drift between your CloudFormation stacks and its deployed resources. For this reason, you should only use AWS SAM Accelerate to publish changes in a development stack.

In your terminal, change to the root directory of the project folder and run the sam sync command. This runs in the foreground while you make code changes:

cd sam-accelerate-nested-stacks-demo
sam sync --watch --stack-name orders-app

The --watch option causes AWS SAM to perform an initial CloudFormation deployment. After the deployment is complete, AWS SAM watches for local changes and synchronizes them to AWS. This feature allows you to make rapid iterative code changes and sync them to the AWS Cloud automatically in seconds.

Making a code change

In the editor, update the Subject argument in the send_order_notification function in workflow/src/complete_order/app.py.

def send_order_notification(message):
    topic_arn = TOPIC_ARN
    response = sns.publish(
        TopicArn=topic_arn,
        Message=json.dumps(message),
        Subject=f'Orders-App: Update for order {message["order_id"]}'
        #Subject='Orders-App: SAM Accelerate for the win!'
    )

On save, AWS SAM notices the local code change, and updates the CompleteOrder Lambda function. AWS SAM does not trigger updates to other AWS resources across the different stacks, since they are unchanged. This can result in increased development velocity.

SAM sync output

Validate the change by sending a new order request and review the notification email subject.

curl -s --header "Content-Type: application/json" \
  --request POST \
  --data '{"quantity":1,"name":"Pizza","restaurantId":"House of Pizza"}' \
  https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/Dev/orders | python -m json.tool

In this example, AWS SAM Accelerate is 10–15 times faster than the CloudFormation deployment workflow (sam deploy) for single function code changes.

Deployment speed comparison between SAM accelerate and CloudFormation

Deployment times vary based on the size and complexity of your Lambda functions and the number of resources in your project.

Making a configuration change

Next, make an infrastructure change to show how sync --watch handles configuration updates.

Update ReadCapacityUnits and WriteCapacityUnits in the DynamoDB table definition by changing the values from five to six in db/template.yaml.

Resources:
  OrderTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: order-table-test
      AttributeDefinitions:
        - AttributeName: user_id
          AttributeType: S
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: user_id
          KeyType: HASH
        - AttributeName: id
          KeyType: RANGE
      ProvisionedThroughput:
        ReadCapacityUnits: 5
        WriteCapacityUnits: 5

The sam sync --watch command recognizes that the configuration change requires a CloudFormation deployment to update the db nested stack. Nested stacks reflect an UPDATE_COMPLETE status because CloudFormation starts an update on every nested stack to determine if changes must be applied.

SAM sync infrastructure update

Cleaning up

Delete the nested stack resources to make sure that you don’t continue to incur charges. After stopping the sam sync --watch command, run the following command to delete your resources:

sam delete --stack-name orders-app

You can also delete the CloudFormation root stack from the console by following these steps.

Conclusion

Local emulation of complex serverless applications, built with nested stacks, can be challenging. AWS SAM Accelerate helps builders achieve a high-fidelity development experience by rapidly synchronizing code changes into the AWS Cloud.

This post shows AWS SAM Accelerate features that push code changes in near real time to a development environment in the Cloud. I use a non-trivial sample application to show how developers can push code changes to a live environment in seconds while using CloudFormation nested stacks to achieve the isolation and maintenance benefits.

For more serverless learning resources, visit Serverless Land.

Building resilient private APIs using Amazon API Gateway

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-resilient-private-apis-using-amazon-api-gateway/

This post written by Giedrius Praspaliauskas, Senior Solutions Architect, Serverless.

Modern architectures meet recovery objectives (recovery time objective, RTO, and recovery point objective, RPO) by being resilient to routine and unexpected infrastructure disruptions. Depending on the recovery objectives and regulatory requirements, developers must choose a disaster recovery strategy. For more on disaster recovery strategies, see this whitepaper.

AWS’ global infrastructure enables customers to support applications with near zero RTO requirements. Customers can run workloads in multiple Regions, in a multi-site active/active manner, and serve traffic from all Regions. To do so, developers often must implement private multi-regional APIs for use by the applications.

This blog describes how to implement this solution using Amazon API Gateway and Amazon Route 53.

Overview

The first step is to build a REST API that uses private API Gateway endpoints with custom domain names as described in this sample. The next step is to deploy APIs in two AWS Regions and configure Route 53, following a disaster recovery strategy.

This architecture uses the following resources in each Region:

Two region architecture example

  • API Gateway with an AWS Lambda function as integration target.
  • Amazon Virtual Private Cloud (Amazon VPC) with two private subnets, used to deploy VPC endpoint and Network Load Balancers (NLB).
  • AWS Transit Gateway to establish connectivity between the two VPCs in different Regions.
  • VPC endpoint to access API Gateway from a private VPC.
  • Elastic network interfaces (ENIs) created by the VPC endpoint.
  • A Network Load Balancer with ENIs in its target group and a TLS listener with an AWS Certificate Manager (ACM) certificate, used as a facade for the API.
  • ACM issues and manages certificates used by the NLB TLS listener and API Gateway custom domain.
  • Route 53 with private hosted zones used for DNS resolution.
  • Amazon CloudWatch with alarms used for Route 53 health checks.

This sample implementation uses a Lambda function as an integration target, though it can target on-premises resources accessible via AWS Direct Connect, and applications running on Amazon EKS. For more information on best practices designing API Gateway private APIs, see this whitepaper.

This post uses AWS Transit Gateway with inter-Region peering to establish connectivity between the two VPCs. Depending on the networking needs and infrastructure already in place, you may tailor the architecture and use a different approach. Read this whitepaper for more information on available VPC-to-VPC connectivity options.

Implementation

Prerequisites

You can use existing infrastructure to deploy private APIs. Otherwise check the sample repository for templates and detailed instructions on how to provision the necessary infrastructure.

This post uses the AWS Serverless Application Model (AWS SAM) to deploy private APIs with custom domain names. Visit the documentation to install it in your environment.

Deploying private APIs into multiple Regions

To deploy private APIs with custom domain names and CloudWatch alarms for health checks into the two AWS Regions:

  1. Download the AWS SAM template file api.yaml from the sample repository. Replace default parameter values in the template with ones that match your environment or provide them during the deployment step.
  2. Navigate to the directory containing the template. Run following commands to deploy the API stack in the us-east-1 Region:
    sam build --template-file api.yaml
    sam deploy --template-file api.yaml --guided --stack-name private-api-gateway --region us-east-1
  3. Repeat the deployment in the us-west-2 Region. Update the parameters values in the template to match the second Region (or provide them as an input during the deployment step). Run the following commands:
    sam build --template-file api.yaml
    sam deploy --template-file api.yaml --guided --stack-name private-api-gateway --region us-west-2

Setting up Route 53

With the API stacks deployed in both Regions, create health checks to use them in Route 53:

  1. Navigate to Route 53 in the AWS Management Console. Use the CloudWatch alarms created in the previous step to create health checks (one per Region):
    Configuring the health check for region 1

    Configuring the health check for region 2

    This Route 53 health check uses the CloudWatch alarms that are based on a static threshold of the NLB healthy host count metric in each Region. In your implementation, you may need more sophisticated API health status tracking. It can use anomaly detection, include metric math, or use a composite state of the multiple alarms. Check this documentation article for more information on CloudWatch alarms. You can also use the approach documented in this blog post as an alternative. It will help you to create a health check solution that observes the state of the private resources and creates custom metrics that are more specific to your use case. For example, such metrics could include increased database transactions’ failure rate, number of timed out requests to a downstream legacy system, status of an external system that your workload depends on, etc.

  2. Create a private Route 53 hosted zone. Associate this with the VPCs where the client applications that access private APIs reside:

    Create a private hosted zone

  3. Create Route 53 private zone alias records, pointing to the NLBs in both Regions and VPCs, using the same name for both records (for example, private.internal.example.com):

    Create alias records

This post uses Route 53 latency-based routing for private DNS to implement resilient active-active private API architecture. Depending on your use case and disaster recovery strategy, you can change this approach and use geolocation-based routing, failover, or weighted routing. See the documentation for more details on supported routing policies for the records in a private hosted zone.

In this implementation, client applications that connect to the private APIs reside in the VPCs and can access Route 53 private hosted zones. You may also operate an application that runs on-premises and must access the private APIs. Read this blog post for more information on how to create DNS naming that spans the entire network.

Validating the configuration

To validate this implementation, I use a bastion instance in each of the VPCs. I connect to them using SSH or AWS Systems Manager Session Manager (see this documentation article for details).

  1. Run the following command from both bastion instances:
    dig +short private.internal.example.com

    The response should contain IP addresses of the NLB in one VPC:

    10.2.2.188
    10.2.1.211
  2. After DNS resolution verification, run the following command in each of the VPCs:
    curl -s -XGET https://private.internal.example.com/demo

    The response should include event data as the Lambda function received it.

  3. To simulate an outage, navigate to Route 53 in the AWS Management Console. Select the health check that corresponds to the Region where you received the response, and invert the health status.
  4. After a few minutes, retry the same DNS resolution and API response validation steps. This time, it routes all your requests to the remaining healthy API stack.
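
Conceptually, the failover behavior you observe in steps 3 and 4 works as follows: Route 53 only answers with records whose associated health check passes, and latency-based routing picks the lowest-latency record among the survivors. The sketch below simulates that logic; it is not Route 53 itself, and the endpoint names and latency figures are made up:

```python
def resolve(records, healthy):
    """Pick the lowest-latency record among those with a passing health check.

    records: list of (name, latency_ms) tuples.
    healthy: set of record names whose health check currently passes.
    Returns the chosen record name, or None if every endpoint is unhealthy.
    """
    candidates = [r for r in records if r[0] in healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r[1])[0]

records = [("nlb-us-east-1", 12), ("nlb-us-west-2", 68)]

# Both Regions healthy: latency-based routing prefers the closer NLB.
print(resolve(records, {"nlb-us-east-1", "nlb-us-west-2"}))  # nlb-us-east-1
# Invert the us-east-1 health check: traffic shifts to the remaining Region.
print(resolve(records, {"nlb-us-west-2"}))                   # nlb-us-west-2
```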

Cleaning Up

To avoid incurring further charges, delete all the resources that you have created in Route 53: records, health checks, and private hosted zones.

Run the following commands to delete API stacks in both Regions:

sam delete --stack-name private-api-gateway --region us-west-2
sam delete --stack-name private-api-gateway --region us-east-1

If you used the sample templates to provision the infrastructure, follow the steps listed in the Cleanup section of the sample repository instructions.

Conclusion

This blog post walks through implementing a multi-Regional private API using API Gateway with custom domain names. This approach allows developers to make their internal applications and workloads more resilient, react to disruptions, and meet their disaster recovery scenario objectives.

For additional guidance on such architectures, including multi-Region application architecture, see this solution, blog post, and re:Invent presentation.

For more serverless learning resources, visit Serverless Land.

Orchestrating Amazon S3 Glacier Deep Archive object retrieval using AWS Step Functions

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/orchestrating-amazon-s3-glacier-deep-archive-object-retrieval-using-aws-step-functions/

This blog was written by Monica Cortes Sack, Solutions Architect, Oskar Neumann, Partner Solutions Architect, and Dhiraj Mahapatro, Principal Specialist SA, Serverless.

AWS Step Functions now supports over 220 services and over 10,000 AWS API actions. This enables you to use AWS SDK integrations directly instead of writing an AWS Lambda function as a proxy.

One such service integration is with Amazon S3. Currently, you write scripts using AWS CLI S3 commands to automate S3 tasks. For example, a script might integrate S3 with AWS Transfer Family, run a custom security check, take action on S3 buckets when objects are created, or orchestrate a workflow around S3 Glacier Deep Archive object retrieval. These script executions do not provide an execution history or an easy way to validate the behavior.

Step Functions’ AWS SDK integration with S3 declaratively creates serverless workflows around S3 tasks without relying on those scripts. You can validate the execution history and behavior of a Step Functions workflow.

This blog highlights one of the S3 use cases. It shows how to orchestrate workflows around S3 Glacier Deep Archive object retrieval, cost estimation, and interaction with the requester using Step Functions. The demo application provides additional details on the entire architecture.

S3 Glacier Deep Archive is a storage class in S3 used for data that is rarely accessed. The service provides durable and secure long-term storage, trading immediate access for cost-effectiveness. You must restore archived objects before they are downloadable. It supports two options for object retrieval:

  1. Standard – Access objects within 12 hours of the start of the restoration process.
  2. Bulk – Access objects within 48 hours of the start of the restoration process.
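
The two options map directly to the SLA a requester specifies. A small helper illustrating the choice (the 12-hour and 48-hour figures come from the list above; the function itself is a sketch, not part of the demo application):

```python
def retrieval_tier(sla_hours):
    """Return the S3 Glacier Deep Archive retrieval tier for a requested SLA.

    Standard restores within 12 hours; Bulk restores within 48 hours.
    A deadline tighter than 12 hours cannot be met by this storage class.
    """
    if sla_hours >= 48:
        return "Bulk"
    if sla_hours >= 12:
        return "Standard"
    raise ValueError("S3 Glacier Deep Archive cannot restore in under 12 hours")

print(retrieval_tier(12))  # Standard
print(retrieval_tier(48))  # Bulk
```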

Business use case

Consider a research institute that stores backups on S3 Glacier Deep Archive. The backups are maintained in S3 Glacier Deep Archive for redundancy. The institute has multiple researchers with one central IT team. When a researcher requests an object from S3 Glacier Deep Archive, the central IT team retrieves it and charges the corresponding research group for retrieval and data transfer costs.

Researchers are the end users and do not operate in the AWS Cloud. They run computing clusters on-premises and depend on the central IT team to provide them with the restored archive. A member of the research team requesting an object retrieval provides the following information to the central IT team:

  1. Object key to be restored.
  2. The number of days the researcher needs the object accessible for download.
  3. Researcher’s email address.
  4. Whether to retrieve within a 12-hour or 48-hour SLA. This determines “Standard” or “Bulk” retrieval, respectively.

The following overall architecture explains the setup on AWS and the interaction between a researcher and the central IT team’s architecture.

Architecture overview

Architecture diagram

  1. The researcher uses a front-end application to request object retrieval from S3 Glacier Deep Archive.
  2. Amazon API Gateway synchronously invokes AWS Step Functions Express Workflow.
  3. Step Functions initiates RestoreObject from S3 Glacier Deep Archive.
  4. Step Functions stores the metadata of this retrieval in an Amazon DynamoDB table.
  5. Step Functions uses Amazon SES to email the researcher about archive retrieval initiation.
  6. Upon completion, S3 sends the RestoreComplete event to Amazon EventBridge.
  7. EventBridge rule triggers another Step Functions workflow for post-processing after the restore is complete.
  8. A Lambda function inside the Step Functions calculates the estimated cost (retrieval and data transfer out) and updates existing metadata in the DynamoDB table.
  9. Data from the DynamoDB table is synced using Amazon Athena Federated Query to generate a reports dashboard in Amazon QuickSight.
  10. Step Functions uses SES to email the researcher with cost details.
  11. Once the researcher receives an email, the researcher uses the front-end application to call the /download API endpoint.
  12. API Gateway invokes a Lambda function that generates a pre-signed S3 URL of the retrieved object and returns it in the response.

Setup

  1. To run the sample application, you must install AWS CDK v2, Node.js, and npm.
  2. To clone the repository, run:
    git clone https://github.com/aws-samples/aws-stepfunctions-examples.git
    cd aws-stepfunctions-examples/cdk/app-glacier-deep-archive-retrieval
  3. To deploy the application, run:
    cdk deploy --all

Identifying workflow components

Starting the restore object workflow

The first component accepts the researcher’s request to start the archive retrieval process. The sample application provides a basic front-end app that lists the files in an S3 bucket with objects stored in S3 Glacier Deep Archive. The researcher submits file retrieval requests from this front-end app, which is reached through the sample application’s Amazon CloudFront URL.

Glacier Deep Archive Retrieval menu

The front-end app asks the researcher for an email address, the number of days the researcher wants the object to be available for download, and the acceptable retrieval time. Based on the retrieval speed, the researcher chooses either Standard or Bulk object retrieval. To test this, put objects in the data bucket under the S3 Glacier Deep Archive storage class and use the front-end application to retrieve them.

Item retrieval prompt

The researcher then chooses Retrieve file. This action invokes an API endpoint provided by API Gateway. The API Gateway endpoint synchronously invokes a Step Functions Express Workflow. This validates the restore object request, gets the object metadata, and starts restoring the object from S3 Glacier Deep Archive.

The state machine stores the metadata of the restore object AWS SDK call in a DynamoDB table for later use. You can use this metadata to build a dashboard in Amazon QuickSight for reporting and administration purposes. Finally, the state machine uses Amazon SES to email the researcher, notifying them about the restore object initiation process:

Restore object initiated

The following state machine shows the workflow:

Workflow diagram

The ability to use S3 APIs declaratively using AWS SDK from Step Functions makes it convenient to integrate with S3. This approach avoids writing a Lambda function to wrap the SDK calls. The following portion of the state machine definition shows the usage of S3 HeadObject and RestoreObject APIs:

"Get Object Metadata": {
    "Next": "Initiate Restore Object from Deep Archive",
    "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "Bad Request"
    }],
    "Type": "Task",
    "ResultPath": "$.result.metadata",
    "Resource": "arn:aws:states:::aws-sdk:s3:headObject",
    "Parameters": {
        "Bucket": "glacierretrievalapp-databucket-abc123",
        "Key.$": "$.fileKey"
    }
}, 
"Initiate Restore Object from Deep Archive": {
    "Next": "Update restore operation metadata",
    "Type": "Task",
    "ResultPath": null,
    "Resource": "arn:aws:states:::aws-sdk:s3:restoreObject",
    "Parameters": {
        "Bucket": "glacierretrievalapp-databucket-abc123",
        "Key.$": "$.fileKey",
        "RestoreRequest": {
            "Days.$": "$.requestedForDays"
        }
    }
}

You can extend the previous workflow and build your own Step Functions workflows to orchestrate other S3 related workflows.

Processing after object restoration completion

S3 RestoreObject is a long-running process for S3 Glacier Deep Archive objects. S3 emits a RestoreCompleted event notification to EventBridge when the object restore completes. You set up an EventBridge rule to trigger another Step Functions workflow as a target for this event. This workflow handles the post-processing after object restoration. The sample application enables EventBridge notifications on the data bucket with a CDK property override:

cfnDataBucket.addPropertyOverride('NotificationConfiguration.EventBridgeConfiguration.EventBridgeEnabled', true);

An EventBridge rule triggers the following Step Functions workflow and passes the event payload as an input to the Step Functions execution:

new aws_events.Rule(this, 'invoke-post-processing-rule', {
  eventPattern: {
    source: ["aws.s3"],
    detailType: [
      "Object Restore Completed"
    ],
    detail: {
      bucket: {
        name: [props.dataBucket.bucketName]
      }
    }
  },
  targets: [new aws_events_targets.SfnStateMachine(this.stateMachine, {
    input: aws_events.RuleTargetInput.fromObject({
      's3Event': aws_events.EventField.fromPath('$')
    })
  })]
});

The Step Functions workflow gets object metadata from the DynamoDB table and then invokes a Lambda function to calculate the estimated cost. The Lambda function calculates the estimated retrieval and the data transfer costs using the contentLength of the retrieved object and the Price List API for the unit cost. The workflow then updates the calculated cost in the DynamoDB table.

The retrieval cost and the data transfer out cost are proportional to the size of the retrieved object. The Step Functions workflow also invokes a Lambda function to create the download API URL for object retrieval. Finally, it emails the researcher with the estimated cost and the download URL as a restoration completion notification.
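As an illustrative sketch of this calculation (the function and parameter names here are assumptions, and real unit prices would come from the Price List API rather than being passed in), the estimated cost is proportional to the object size:

```python
def estimate_restore_cost(content_length_bytes, retrieval_price_per_gb, transfer_price_per_gb):
    """Estimate the retrieval and data transfer out costs for a restored
    object. Unit prices are passed in as parameters; the sample
    application looks them up with the Price List API instead."""
    size_gb = content_length_bytes / (1024 ** 3)
    retrieval_cost = size_gb * retrieval_price_per_gb
    transfer_cost = size_gb * transfer_price_per_gb
    return {
        "retrievalCost": round(retrieval_cost, 4),
        "dataTransferCost": round(transfer_cost, 4),
        "totalCost": round(retrieval_cost + transfer_cost, 4),
    }
```

Both cost components scale linearly with contentLength, which is why larger objects produce proportionally larger estimates.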

Workflow studio diagram

Workflow studio diagram

The email notification to the researcher looks like:

Email example

Email example

Downloading the restored object

Once the object restoration is complete, the researcher can download the object from the front-end application.

Front end retrieval menu

Front end retrieval menu

The researcher chooses the Download action, which invokes another API Gateway endpoint. The endpoint integrates with a Lambda function as a backend that creates a pre-signed S3 URL sent as a response to the browser.

Administering object restoration usage

This architecture also provides a view for the central IT team to understand object restoration usage. You achieve this by creating reports and dashboards from the metadata stored in DynamoDB.

The sample application uses Amazon Athena Federated Queries and Amazon Athena DynamoDB Connector to generate a reports dashboard in Amazon QuickSight. You can also use Step Functions AWS SDK integration with Amazon Athena and visualize the workflows in the Athena console.

The following QuickSight visualization shows the count of restored S3 Glacier Deep Archive objects by their contentType:

QuickSight visualization

QuickSight visualization

Considerations

With the preceding approach, you should consider that:

  1. You must start the object retrieval in the same Region as the Region of the archived object.
  2. S3 Glacier Deep Archive only supports standard and bulk retrievals.
  3. You must enable the “Object Restore Completed” event notification on the S3 bucket with the S3 Glacier Deep Archive object.
  4. The researcher’s email must be verified in SES.
  5. Use a Lambda function for the Price List GetProducts API as the service endpoints are available in specific Regions.

Cleanup

To clean up the infrastructure used in this sample application, run:

cdk destroy --all

Conclusion

Step Functions’ AWS SDK integration opens up different opportunities to orchestrate a workflow. Step Functions provides native support for retries and error handling, which offloads the heavy lifting of handling them manually in scripts.

This blog shows one example use case with S3 Glacier Deep Archive. With AWS SDK integration in Step Functions, you can build any workflow orchestration using S3 or S3 control APIs. For example, a workflow to enforce AWS Key Management Service encryption based on an S3 event, or create static website hosting on-demand in a few steps.

With different S3 API calls available in Step Functions’ Workflow Studio, you can declaratively build a Step Functions workflow instead of imperatively calling each S3 API from a shell script or command line. Refer to the demo application for more details.

For more serverless learning resources, visit Serverless Land.

Using organization IDs as principals in Lambda resource policies

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/using-organization-ids-as-principals-in-lambda-resource-policies/

This post is written by Rahul Popat, Specialist SA, Serverless and Dhiraj Mahapatro, Sr. Specialist SA, Serverless

AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. These events may include changes in state or an update, such as a user placing an item in a shopping cart on an ecommerce website. You can use AWS Lambda to extend other AWS services with custom logic, or create your own backend services that operate at AWS scale, performance, and security.

You may have multiple AWS accounts for your application development but want to keep a few common functionalities in one centralized account. For example, you may host a user authentication service in a centralized account and grant other accounts permission to access it using AWS Lambda.

Today, AWS Lambda launches improvements to resource-based policies, which make it easier for you to control access to a Lambda function by using the identifier of your AWS Organization as a condition in your resource policy. This expands the use of resource policies to grant cross-account access at the organization level instead of granting explicit permissions for each individual account within an organization.

Before this release, the centralized account had to grant explicit permissions to all other AWS accounts to use the Lambda function. You had to specify each account as a principal in the resource-based policy explicitly. While that remains a viable option, managing access for individual accounts with such a resource policy becomes operational overhead as the number of accounts in your organization grows.

In this post, I walk through the details of the new condition and show you how to restrict access to only principals in your organization for accessing a Lambda function. You can also restrict access to a particular alias and version of the Lambda function with a similar approach.

Overview

For an AWS Lambda function, you grant permissions using resource-based policies to specify the accounts and principals that can access it and what actions they can perform on it. Now, you can use a new condition key, aws:PrincipalOrgID, in these policies to require any principals accessing your Lambda function to be from an account (including the management account) within an organization. For example, suppose you have a resource-based policy for a Lambda function and you want to restrict access to only principals from AWS accounts under a particular AWS Organization. To accomplish this, you can define the aws:PrincipalOrgID condition and set the value to your organization ID in the resource-based policy. Your organization ID is what sets the access control on your Lambda function. When you use this condition, policy permissions apply when you add new accounts to the organization without requiring an update to the policy, thus reducing the operational overhead of updating the policy every time you add a new account.

Condition concepts

Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element that you can use to specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.

AWS:PrincipalOrgID condition key

You can use this condition key to apply a filter to the principal element of a resource-based policy. You can use any string operator, such as StringLike, with this condition and specify the AWS organization ID as its value.

Condition key: aws:PrincipalOrgID
Description: Validates whether the principal accessing the resource belongs to an account in your organization.
Operators: All string operators
Value: Any AWS Organization ID

Restricting Lambda function access to only principals from a particular organization

Consider an example where you want to give specific IAM principals in your organization direct access to a Lambda function that logs to Amazon CloudWatch.

Step 1 – Prerequisites

Once you have set up an organization and its accounts, the AWS Organizations console looks like this:

Organization accounts example

Organization accounts example

This example has two accounts in the AWS Organization, the Management Account, and the MainApp Account. Make a note of the Organization ID from the left menu. You use this to set up a resource-based policy for the Lambda function.

Step 2 – Create resource-based policy for a Lambda function that you want to restrict access to

Now you want to restrict the Lambda function’s invocation to principals from accounts that are member of your organization. To do so, write and attach a resource-based policy for the Lambda function:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "org-level-permission",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:<REGION>:<ACCOUNT_ID>:function:<FUNCTION_NAME>",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-sabhong3hu"
        }
      }
    }
  ]
}

In this policy, I specify Principal as *. This means that all users in the organization ‘o-sabhong3hu’ get function invocation permissions. If you specify an AWS account or role as the principal, then only that principal gets function invocation permissions, but only if they are also part of the ‘o-sabhong3hu’ organization.

Next, I add lambda:InvokeFunction as the Action and the ARN of the Lambda function as the resource to grant invoke permissions to the Lambda function. Finally, I add the new condition key aws:PrincipalOrgID and specify an Organization ID in the Condition element of the statement to make sure only the principals from the accounts in the organization can invoke the Lambda function.
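To illustrate the shape of this statement programmatically (a hypothetical helper, not part of the post's sample), the following builds the same policy document for a given function ARN and organization ID:

```python
import json

def build_org_invoke_policy(function_arn, org_id):
    """Build a Lambda resource-based policy that allows invocation
    from any principal in the given AWS Organization."""
    policy = {
        "Version": "2012-10-17",
        "Id": "default",
        "Statement": [
            {
                "Sid": "org-level-permission",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
                "Condition": {
                    "StringEquals": {"aws:PrincipalOrgID": org_id}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Generating the document this way keeps the broad Principal and the narrowing organization condition together, so neither is added without the other.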

You can also use the AWS Management Console to create a resource-based policy. Go to the Lambda function page and choose the Configuration tab. Select Permissions from the left menu. Choose Add Permissions and fill in the required details. Scroll to the bottom, expand the Principal organization ID – optional submenu, enter your organization ID in the text box labeled PrincipalOrgID, and choose Save.

Add permissions

Add permissions

Step 3 – Testing

The Lambda function ‘LogOrganizationEvents’ is in your Management Account. You configured a resource-based policy to allow all the principals in your organization to invoke your Lambda function. Now, invoke the Lambda function from another account within your organization.

Sign in to the MainApp Account, which is another member account in the same organization. Open AWS CloudShell from the AWS Management Console. Invoke the Lambda function ‘LogOrganizationEvents’ from the terminal, as shown below. You receive a response status code of 200, which means success. Learn more about how to invoke a Lambda function from the AWS CLI.

Console example of access

Console example of access

Conclusion

You can now use the aws:PrincipalOrgID condition key in your resource-based policies to restrict access more easily to IAM principals only from accounts within an AWS Organization. For more information about this global condition key and policy examples using aws:PrincipalOrgID, read the IAM documentation.

If you have questions about or suggestions for this solution, start a new thread in the AWS Lambda forum or contact AWS Support.

For more information, visit Serverless Land.

Building TypeScript projects with AWS SAM CLI

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-typescript-projects-with-aws-sam-cli/

This post written by Dan Fox, Principal Specialist Solutions Architect and Roman Boiko, Senior Specialist Solutions Architect

The AWS Serverless Application Model (AWS SAM) CLI provides developers with a local tool for managing serverless applications on AWS. This command line tool allows developers to initialize and configure applications, build and test locally, and deploy to the AWS Cloud. Developers can also use AWS SAM from IDEs like Visual Studio Code, JetBrains, or WebStorm. TypeScript is a superset of JavaScript and adds static typing, which reduces errors during development and runtime.

On February 22, 2022 we announced the beta of AWS SAM CLI support for TypeScript. These improvements simplify TypeScript application development by allowing you to build and deploy serverless TypeScript projects using AWS SAM CLI commands. To install the latest version of the AWS SAM CLI, refer to the installation section of the AWS SAM page.

In this post, I initialize a TypeScript project using an AWS SAM template. Then I build a TypeScript project using the AWS SAM CLI. Next, I use AWS SAM Accelerate to speed up the development and test iteration cycles for your TypeScript project. Last, I measure the impact of bundling, tree shaking, and minification on deployment package size.

Initializing a TypeScript template

This walkthrough requires:

AWS SAM now provides the capability to create a sample TypeScript project using a template. Since this feature is still in preview, you can enable it by one of the following methods:

  1. Set the environment variable `SAM_CLI_BETA_ESBUILD=1`
  2. Add the following parameters to your samconfig.toml
    [default.build.parameters]
    beta_features = true
    [default.sync.parameters]
    beta_features = true
  3. Use the --beta-features option with sam build and sam sync. I use this approach in the following examples.
  4. Choose option ‘y’ when CLI prompts you about using beta features.

To create a new project:

  1. Run – sam init
  2. In the wizard, select the following options:
    1. AWS Quick Start Templates
    2. Hello World Example
    3. nodejs14.x – TypeScript
    4. Zip
    5. Keep the name of the application as sam-app
sam init wizard steps

sam init wizard steps

Open the created project in a text editor. In the root, you see a README.MD file with the project description and a template.yaml. This is the specification that defines the serverless application.

In the hello-world folder is an app.ts file written in TypeScript. This project also includes a unit test in Jest and sample configurations for ESLint, Prettier, and TypeScript compilers.

Project structure

Project structure

Building and deploying a TypeScript project

Previously, to use TypeScript with the AWS SAM CLI, you needed custom steps. These transformed the TypeScript project into a JavaScript project before running the build.

Today, you can use the sam build command to transpile code from TypeScript to JavaScript. This bundles local dependencies and symlinks, and minifies files to reduce asset size.

AWS SAM uses the popular open source bundler esbuild to perform these tasks. esbuild does not perform type checking, but you can use the tsc CLI for that. Once you have built the TypeScript project, use the sam deploy command to deploy to the AWS Cloud.
The following shows how this works.

  1. Navigate to the root of sam-app.
  2. Run sam build. This command uses esbuild to transpile and package app.ts.

    sam build wizard

    sam build wizard

  3. Customize the esbuild properties by editing the Metadata section in the template.yaml file.

    Esbuild configuration

    Esbuild configuration

  4. After a successful build, run sam deploy --guided to deploy the application to your AWS account.
  5. Accept all the default values in the wizard, except this question:
    HelloWorldFunction may not have authorization defined, Is this okay? [y/N]: y

    sam deploy wizard

    sam deploy wizard

  6. After successful deployment, test that the function is working by querying the API Gateway endpoint displayed in the Outputs section.

    sam deploy output

    sam deploy output

Using AWS SAM Accelerate with TypeScript

AWS SAM Accelerate is a set of features that reduces development and test cycle latency by enabling you to test code quickly against AWS services in the cloud. AWS SAM Accelerate released beta support for TypeScript. Use the template from the last example to use SAM Accelerate with TypeScript.

Use AWS SAM Accelerate to build and deploy your code upon changes.

  1. Run sam sync --stack-name sam-app --watch.
  2. Open your browser with the API Gateway endpoint from the Outputs section.
  3. Update the handler function in app.ts file to:
    export const lambdaHandler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
        let response: APIGatewayProxyResult;
        try {
            response = {
                statusCode: 200,
                body: JSON.stringify({
                    message: 'hello SAM',
                }),
            };
        } catch (err) {
            console.log(err);
            response = {
                statusCode: 500,
                body: JSON.stringify({
                    message: 'some error happened',
                }),
            };
        }
    
        return response;
    };
  4. Save changes. AWS SAM automatically rebuilds and syncs the application code to the cloud.

    AWS SAM Accelerate output

    AWS SAM Accelerate output

  5. Refresh the browser to see the updated message.

Deployment package size optimizations

One additional benefit of the TypeScript build process is that it reduces your deployment package size through bundling, tree shaking, and minification. The bundling process removes dependency files not referenced in the control flow. Tree shaking is the term used for unused code elimination. It is a compiler optimization that removes unreachable code within files.

Minification reduces file size by removing white space, rewriting syntax to be more compact, and renaming local variables to be shorter. The sam build process performs bundling and tree shaking by default. Configure minification, a feature typically used in production environments, within the Metadata section of the template.yaml file.
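As a sketch, a function's Metadata section with minification enabled might look like the following (the property values shown are illustrative assumptions, not the sample project's exact configuration):

```yaml
HelloWorldFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: hello-world/
    Handler: app.lambdaHandler
    Runtime: nodejs14.x
  Metadata:
    BuildMethod: esbuild
    BuildProperties:
      Minify: true
      Target: "es2020"
      EntryPoints:
        - app.ts
```

sam build reads BuildProperties when BuildMethod is esbuild; leaving Minify as false is the typical choice for debug builds, since minified stack traces are harder to read.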

Measure the impact of these optimizations by the reduced deployment package size. For example, measure the before and after size of an application, which includes the AWS SDK for JavaScript v3 S3 Client as a dependency.

To begin, change the package.json file to include the @aws-sdk/client-s3 as a dependency:

  1. From the application root, cd into the hello-world directory.
  2. Run the command:
    npm install @aws-sdk/client-s3
  3. Delete all the devDependencies except for esbuild to get a more accurate comparison.

    package.json contents

    package.json contents

  4. Run the following command to build your dependency library:
    npm install
  5. From the application root, run the following command to measure the size of the application directory contents:
    du -sh hello-world
    The current application is approximately 50 MB.
  6. Turn on minification by setting the Minify value to true in the template.yaml file.

    Metadata section of template.yaml

    Metadata section of template.yaml

  7. Now run the following command to build your project using bundling, tree shaking, and minification.
    sam build
  8. Your deployment package is now built in the .aws-sam directory. You can measure the size of the package with the following command:
    du -sh .aws-sam

The new package size is approximately 2.8 MB. That represents a 94% reduction in uncompressed application size.

Conclusion

This post reviews several new features that can improve the development experience for TypeScript developers. I show how to create a sample TypeScript project using sam init. I build and deploy a TypeScript project using the AWS SAM CLI. I show how to use AWS SAM Accelerate with your TypeScript project. Last, I measure the impact of bundling, tree shaking, and minification on a sample project. We invite the serverless community to help improve AWS SAM. AWS SAM is an open source project and you can contribute to the repository here.

For more serverless content, visit Serverless Land.

Using the circuit breaker pattern with AWS Step Functions and Amazon DynamoDB

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/using-the-circuit-breaker-pattern-with-aws-step-functions-and-amazon-dynamodb/

This post is written by Anitha Deenadayalan, Developer Specialist SA, DevAx

Modern applications use microservices as an architectural and organizational approach to software development, where the application comprises small independent services that communicate over well-defined APIs.

When multiple microservices collaborate to handle requests, one or more services may become unavailable or exhibit a high latency. Microservices communicate through remote procedure calls, and it is always possible that transient errors could occur in the network connectivity, causing failures.

This can cause performance degradation in the entire application during synchronous execution because of the cascading of timeouts or failures causing poor user experience. When complex applications use microservices, an outage in one microservice can lead to application failure. This post shows how to use the circuit breaker design pattern to help with a graceful service degradation.

Introducing circuit breakers

Michael Nygard popularized the circuit breaker pattern in his book, Release It. This design pattern can prevent a caller service from retrying another callee service call that has previously caused repeated timeouts or failures. It can also detect when the callee service is functional again.

The fallacies of distributed computing are a set of assertions made by Peter Deutsch and others at Sun Microsystems. They state that programmers new to distributed applications invariably make false assumptions. Assumptions of network reliability, zero latency, and unlimited bandwidth result in software applications written with minimal error handling for network errors.

During a network outage, applications may indefinitely wait for a reply and continually consume application resources. Failure to retry the operations when the network becomes available can also lead to application degradation. If API calls to a database or an external service time-out due to network issues, repeated calls with no circuit breaker can affect cost and performance.

The circuit breaker pattern

In the circuit breaker pattern, a circuit breaker object routes the calls from the caller to the callee. For example, in an ecommerce application, the order service can call the payment service to collect payments. When there are no failures, the circuit breaker routes all calls from the order service to the payment service:

Circuit breaker with no failures

Circuit breaker with no failures

If the payment service times out, the circuit breaker can detect the timeout and track the failure. If the timeouts exceed a specified threshold, the application opens the circuit:

Circuit breaker with payment service failure

Circuit breaker with payment service failure

Once the circuit is open, the circuit breaker object does not route the calls to the payment service. It returns an immediate failure when the order service calls the payment service:

Circuit breaker stops routing to payment service

Circuit breaker stops routing to payment service

The circuit breaker object periodically tries to see if the calls to the payment service are successful:

Circuit breaker retries payment service

Circuit breaker retries payment service

When the call to payment service succeeds, the circuit is closed, and all further calls are routed to the payment service again:

Circuit breaker with working payment service again

Circuit breaker with working payment service again

Architecture overview

This example uses the AWS Step Functions, AWS Lambda, and Amazon DynamoDB to implement the circuit breaker pattern:

Circuit breaker architecture

Circuit breaker architecture

The Step Functions workflow provides circuit breaker capabilities. When a service wants to call another service, it starts the workflow with the name of the callee service.

The workflow gets the circuit status from the CircuitStatus DynamoDB table, which stores the currently degraded services. If the CircuitStatus table contains a record for the called service, then the circuit is open. The Step Functions workflow returns an immediate failure and exits with a FAIL state.

If the CircuitStatus table does not contain an item for the called service, then the service is operational. The ExecuteLambda step in the state machine definition invokes the Lambda function sent through a parameter value. The Step Functions workflow exits with a SUCCESS state if the call succeeds.

The items in the DynamoDB table have the following attributes:

DynamoDB items list

DynamoDB items list

If the service call fails or a timeout occurs, the application retries with exponential backoff for a defined number of times. If the service call fails after the retries, the workflow inserts a record in the CircuitStatus table for the service with the CircuitStatus as OPEN, and the workflow exits with a FAIL state. Subsequent calls to the same service return an immediate failure as long as the circuit is open.

I write the item with an associated time-to-live (TTL) value to ensure eventual connection retries; the item expires at the defined TTL time. DynamoDB’s time to live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming write throughput.

For example, if you set the TTL value to 60 seconds to check a service status after a minute, DynamoDB deletes the item from the table after 60 seconds. The workflow invokes the service to check for availability when a new call comes in after the item has expired.
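A sketch of building such an item (the CircuitStatus and ExpireTimeStamp attribute names come from the post; the ServiceName attribute and the helper itself are illustrative assumptions):

```python
import time

def open_circuit_item(service_name, ttl_seconds, now=None):
    """Build the DynamoDB item written when the circuit opens.

    ExpireTimeStamp is an epoch timestamp; DynamoDB deletes the item
    shortly after this time passes, which closes the circuit again."""
    base = time.time() if now is None else now
    return {
        "ServiceName": service_name,
        "CircuitStatus": "OPEN",
        "ExpireTimeStamp": int(base) + ttl_seconds,
    }
```

Because expiry is delegated to DynamoDB TTL, the workflow never has to run a scheduled job to reset circuits.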

Circuit breaker Step Function

Circuit breaker Step Function

Prerequisites

For this walkthrough, you need:

Setting up the environment

Use the .NET Core 3.1 code in the GitHub repository and the AWS SAM template to create the AWS resources for this walkthrough. These include IAM roles, DynamoDB table, the Step Functions workflow, and Lambda functions.

  1. You need an AWS access key ID and secret access key to configure the AWS Command Line Interface (AWS CLI). To learn more about configuring the AWS CLI, follow these instructions.
  2. Clone the repo:
    git clone https://github.com/aws-samples/circuit-breaker-netcore-blog
  3. After cloning, this is the folder structure:

    Project file structure

    Project file structure

Deploy using Serverless Application Model (AWS SAM)

The AWS Serverless Application Model (AWS SAM) CLI provides developers with a local tool for managing serverless applications on AWS.

  1. The sam build command processes your AWS SAM template file, application code, and applicable language-specific files and dependencies. The command copies build artifacts in the format and location expected for subsequent steps in your workflow. Run these commands to process the template file:
    cd circuit-breaker
    sam build
  2. After you build the application, test using the sam deploy command. AWS SAM deploys the application to AWS and displays the output in the terminal.
    sam deploy --guided

    Output from sam deploy

    Output from sam deploy

  3. You can also view the output in AWS CloudFormation page.

    Output in CloudFormation console

    Output in CloudFormation console

  4. The Step Functions workflow provides the circuit-breaker function. Refer to the circuitbreaker.asl.json file in the statemachine folder for the state machine definition in the Amazon States Language (ASL).

To deploy with the CDK, refer to the GitHub page.

Running the service through the circuit breaker

To provide circuit breaker capabilities to the Lambda microservice, you must send the name or function ARN of the Lambda function to the Step Functions workflow:

{
  "TargetLambda": "<Name or ARN of the Lambda function>"
}

Successful run

To simulate a successful run, use the HelloWorld Lambda function provided by passing the name or ARN of the Lambda function the stack has created. Your input appears as follows:

{
  "TargetLambda": "circuit-breaker-stack-HelloWorldFunction-pP1HNkJGugQz"
}

During the successful run, the Get Circuit Status step checks the circuit status against the DynamoDB table. Suppose that the circuit is CLOSED, which is indicated by zero records for that service in the DynamoDB table. In that case, the Execute Lambda step runs the Lambda function and exits the workflow successfully.

Step Function with closed circuit

Step Function with closed circuit

Service timeout

To simulate a timeout, use the TestCircuitBreaker Lambda function by passing the name or ARN of the Lambda function the stack has created. Your input appears as:

{
  "TargetLambda": "circuit-breaker-stack-TestCircuitBreakerFunction-mKeyyJq4BjQ7"
}

Again, the Get Circuit Status step in the workflow checks the circuit status against the DynamoDB table. The circuit is CLOSED during the first pass, so the Execute Lambda step runs the Lambda function, which times out.

The workflow retries based on the retry count and the exponential backoff values, and finally returns a timeout error. It then runs the Update Circuit Status step, where a record is inserted in the DynamoDB table for that service with CircuitStatus set to OPEN and a predefined time-to-live value specified by the TTL attribute ExpireTimeStamp.

Step Function with open circuit

Step Function with open circuit

Repeat timeout

As long as there is an item for the service in the DynamoDB table, the circuit breaker workflow returns an immediate failure to the calling service. When you re-execute the call to the Step Functions workflow for the TestCircuitBreaker Lambda function within 20 seconds, the circuit is still open. The workflow immediately fails, ensuring the stability of the overall application performance.

Step Function workflow immediately fails until retry

Step Function workflow immediately fails until retry

The item in the DynamoDB table expires after 20 seconds, and the workflow retries the service again. This time, the workflow retries with exponential backoffs, and if it succeeds, the workflow exits successfully.

Cleaning up

To avoid incurring additional charges, clean up all the created resources. Run the following command from a terminal window. This command deletes the created resources that are part of this example.

sam delete --stack-name circuit-breaker-stack --region <region name>

Conclusion

This post showed how to implement the circuit breaker pattern using Step Functions, Lambda, DynamoDB, and .NET Core 3.1. This pattern can help prevent system degradation caused by service failures or timeouts. Step Functions and the TTL feature of DynamoDB make it easier to implement the circuit breaker capabilities.

To learn more about developing microservices on AWS, refer to the whitepaper on microservices. To learn more about serverless and AWS SAM, visit the Sessions with SAM series and find more resources at Serverless Land.

Offset lag metric for Amazon MSK as an event source for Lambda

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/offset-lag-metric-for-amazon-msk-as-an-event-source-for-lambda/

This post written by Adam Wagner, Principal Serverless Solutions Architect.

Last year, AWS announced support for Amazon Managed Streaming for Apache Kafka (MSK) and self-managed Apache Kafka clusters as event sources for AWS Lambda. Today, AWS adds a new OffsetLag metric to Lambda functions with MSK or self-managed Apache Kafka event sources.

Offset in Apache Kafka is an integer that marks the current position of a consumer. OffsetLag is the difference in offset between the last record written to the Kafka topic and the last record processed by Lambda. Kafka expresses this in the number of records, not a measure of time. This metric provides visibility into whether your Lambda function is keeping up with the records added to the topic it is processing.
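The definition above can be expressed directly: per partition, the lag is the latest offset written minus the latest offset processed. The sketch below shows one plausible way to aggregate across partitions; function and variable names are illustrative, not part of the Lambda API.

```python
def offset_lag(latest_offsets, processed_offsets):
    """Sum, across partitions, of (last offset written - last offset processed).

    Both arguments map partition number -> offset. The result is a count of
    records, not a measure of time.
    """
    return sum(
        latest_offsets[p] - processed_offsets.get(p, 0)
        for p in latest_offsets
    )


# Two partitions, each a few records behind the head of the topic:
print(offset_lag({0: 105, 1: 42}, {0: 100, 1: 39}))  # 8
```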

This blog walks through using the OffsetLag metric along with other Lambda and MSK metrics to understand your streaming application and optimize your Lambda function.

Overview

In this example application, a producer writes messages to a topic on the MSK cluster that is an event source for a Lambda function. Each message contains a number and the Lambda function finds the factors of that number. It outputs the input number and results to an Amazon DynamoDB table.

Finding all the factors of a number is fast if the number is small but takes longer for larger numbers. This difference means the size of the number written to the MSK topic influences the Lambda function duration.

Example application architecture


  1. A Kafka client writes messages to a topic in the MSK cluster.
  2. The Lambda event source polls the MSK topic on your behalf for new messages and triggers your Lambda function with batches of messages.
  3. The Lambda function factors the number in each message and then writes the results to DynamoDB.

In this application, several factors can contribute to offset lag. The first is the volume and size of messages: if more messages arrive, the Lambda function may take longer to process them. Other factors are the number of partitions in the topic and the number of concurrent Lambda function invocations processing messages. A full explanation of how Lambda concurrency scales with the MSK event source is in the documentation.

If the average duration of your Lambda function increases, the offset lag also tends to increase. The increased duration could come from latency in a downstream service or from the complexity of the incoming messages. Lastly, if your Lambda function errors, the MSK event source retries the identical set of records until they succeed. This retry behavior also increases offset lag.

Measuring OffsetLag

To understand how the new OffsetLag metric works, you first need a working MSK topic as an event source for a Lambda function. Follow this blog post to set up an MSK instance.

To find the OffsetLag metric, go to the CloudWatch console, select All Metrics from the left-hand menu. Then select Lambda, followed by By Function Name to see a list of metrics by Lambda function. Scroll or use the search bar to find the metrics for this function and select OffsetLag.

OffsetLag metric example

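The same metric can also be retrieved programmatically. The sketch below builds a CloudWatch GetMetricStatistics request for the function's OffsetLag; the helper function name is a placeholder, and the boto3 call is shown commented out so the snippet stays self-contained.

```python
from datetime import datetime, timedelta


def offset_lag_request(function_name, minutes=30):
    """Build a CloudWatch GetMetricStatistics request for Lambda's OffsetLag."""
    now = datetime.utcnow()
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "OffsetLag",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,                        # one data point per minute
        "Statistics": ["Average", "Maximum"],
    }


# resp = boto3.client("cloudwatch").get_metric_statistics(
#     **offset_lag_request("my-kafka-consumer-function"))
```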

To make it easier to look at multiple metrics at once, create a CloudWatch dashboard starting with the OffsetLag metric. Select Actions -> Add to dashboard. Choose the Create new button, give the dashboard a name, then choose Create, keeping the rest of the options at their defaults.

Adding OffsetLag to dashboard


After choosing Add to dashboard, the new dashboard appears. Choose the Add widget button to add the Lambda duration metric from the same function. Add another widget that combines both Lambda errors and invocations for the function. Then add a widget for the BytesInPerSec metric for the MSK topic; find this metric under AWS/Kafka -> Broker ID, Cluster Name, Topic. Finally, choose Save dashboard.

After a few minutes, you see a steady stream of invocations, as you would expect when consuming from a busy topic.

Data incoming to dashboard


This example is a CloudWatch dashboard showing the Lambda OffsetLag, Duration, Errors, and Invocations, along with the BytesInPerSec for the MSK topic.

In this example, the OffsetLag metric is averaging about eight, indicating that the Lambda function is eight records behind the latest record in the topic. While this is acceptable, there is room for improvement.

The first thing to look for is Lambda function errors, which can drive up offset lag. The metrics show that there are no errors so the next step is to evaluate and optimize the code.

The Lambda handler function loops through the records and calls the process_msg function on each record:

def lambda_handler(event, context):
    for batch in event['records'].keys():
        for record in event['records'][batch]:
            try:
                process_msg(record)
            except Exception as err:
                print("error processing record:", record, err)
    return

The process_msg function handles base64 decoding, calls a factor function to factor the number, and writes the record to a DynamoDB table:

def process_msg(record):
    # requires base64 and json imports, plus the ddb_table resource and
    # MAX_NUMBER limit defined at module level in the function code
    # messages are base64 encoded, so decode here
    msg_value = base64.b64decode(record['value']).decode()
    msg_dict = json.loads(msg_value)
    # using the number as the hash key in the DynamoDB table
    msg_id = f"{msg_dict['number']}"
    if msg_dict['number'] <= MAX_NUMBER:
        factors = factor_number(msg_dict['number'])
        print(f"number: {msg_dict['number']} has factors: {factors}")
        item = {'msg_id': msg_id, 'msg': msg_value, 'factors': factors}
        resp = ddb_table.put_item(Item=item)
    else:
        print(f"ERROR: {msg_dict['number']} is > limit of {MAX_NUMBER}")

The heavy computation takes place in the factor_number function:

def factor_number(number):
    factors = [1, number]
    for x in range(2, int(1 + number / 2)):
        if (number % x) == 0:
            factors.append(x)
    return factors

This version loops through all numbers up to half of the number being factored. Optimize it by looping only up to the square root of the number and appending both factors of each pair it finds:

def factor_number(number):
    factors = [1, number]
    for x in range(2, 1 + int(number**0.5)):
        if (number % x) == 0:
            factors.append(x)
            if x != number // x:  # avoid adding the square root twice
                factors.append(number // x)
    return factors

There are further optimizations and libraries for factoring numbers but this provides a noticeable performance improvement in this example.
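One way to confirm the optimization preserves correctness is to compare both implementations on sample inputs. This sketch reproduces the two versions (with illustrative `_naive`/`_fast` suffixes) and asserts they return the same set of factors:

```python
def factor_number_naive(number):
    """Trial division up to number / 2."""
    factors = [1, number]
    for x in range(2, int(1 + number / 2)):
        if number % x == 0:
            factors.append(x)
    return factors


def factor_number_fast(number):
    """Trial division up to sqrt(number), collecting both factors of a pair."""
    factors = [1, number]
    for x in range(2, 1 + int(number**0.5)):
        if number % x == 0:
            factors.append(x)
            if x != number // x:  # avoid adding the square root twice
                factors.append(number // x)
    return factors


# Both versions find the same factors, including for a perfect square
# and a prime:
for n in (12, 36, 97, 5040):
    assert set(factor_number_naive(n)) == set(factor_number_fast(n))
print(sorted(factor_number_fast(36)))  # [1, 2, 3, 4, 6, 9, 12, 18, 36]
```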

Data after optimization


After deploying the optimized code, refresh the metrics after a few minutes to see the improvements.

The average Lambda duration has dropped to single-digit milliseconds and the OffsetLag is now averaging two.

If you see a noticeable change in the OffsetLag metric, there are several things to investigate. Start with the input side of the system: an increase in messages per second or a significant increase in message size can both drive up the lag.

Conclusion

This post walks through using the OffsetLag metric to understand the lag between the latest messages in the MSK topic and the records a Lambda function is processing. It also reviews other metrics that help identify the underlying cause of increases in offset lag. For more information on this topic, refer to the documentation and other MSK Lambda metrics.

For more serverless learning resources, visit Serverless Land.