Tag Archives: contributed

Capturing client events using Amazon API Gateway and Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/capturing-client-events-using-amazon-api-gateway-and-amazon-eventbridge/

This post is written by Tim Bruce, Senior Solutions Architect, DevAx.

Event producers are one of the three main components in an event-driven architecture. Event producers create and publish events to event routers, which send them to event consumers. Any portion of a system, including a mobile or web client, can be an event producer.

To extend the event model to your mobile and web clients, you must implement standards for security, messaging formats, and event storage.

This post shows how to build a client-enabled event-handling solution. It uses Amazon EventBridge, Amazon API Gateway, AWS Lambda, and Amazon Cognito. This architecture supports routing client events to internal and external destinations. It provides a blueprint that you can use to simplify the integration.

Overview

This example creates a RESTful API using API Gateway. It sends events directly to EventBridge without the need for compute services. In production, you have more requirements than only receiving and forwarding events. Additional requirements include security, user identification, validation, enrichment, transformation, event forwarding, and storing.

In this example, API Gateway provides security and user identification by invoking a Lambda authorizer. The authorizer generates a policy and returns client identification to API Gateway. API Gateway then performs request validation and message enrichment before forwarding the events to EventBridge.

EventBridge evaluates the events against rules and forwards the events to targets. The rules apply transformation to the events and forward an event to up to five targets. Targets include AWS services, such as Amazon Kinesis Data Firehose, and many third-party solutions, such as Zendesk, with HTTPS endpoints.

Lastly, Kinesis Data Firehose provides a cost-effective solution to store events into an Amazon S3 bucket. Before storing the events, Kinesis Data Firehose transforms records via Lambda transformers. It also partitions records using data in the record or calculated data via a Lambda function. Kinesis Data Firehose uses this partitioning data to create keys in the bucket and store matching records within the keys.

Example architecture

Example architecture

The example consists of the following resources defined in the AWS SAM template:

Data flow

Data flow

  1. Application clients collect or generate the events.
  2. The client sends the events to API Gateway as URL-encoded JSON. The client includes the user’s JWT in an authorization header with the request for validation.
  3. The Lambda authorizer validates the JWT with Amazon Cognito and returns the user’s unique clientID value to API Gateway.
  4. API Gateway transforms the request into events, appending clientId, the bus name, and environment.
  5. API Gateway sends the events to EventBridge.
  6. EventBridge rules match the events and:
    1. Forwards all client events to Kinesis Data Firehose.
    2. Forwards client events with detail.eventType of “loyaltypurchase” to Zendesk.
  7. Kinesis Data Firehose receives the records.
  8. The Kinesis Data Firehose data transformation processes each record, moving the client ID to the detail object.
  9. Kinesis Data Firehose partitions the records and stores them in an S3 bucket.

Overall design

The following sections discuss details of the solution, starting from the event in a web or mobile client. This solution requires the client to create an HTTPS request, including the user’s JWT as an authorization header.

{"entries": [{"entry": "{\"eventType\": \"searching\", \"schemaVersion\":1, \"data\": {\"searchTerm\":\"games\"}}"}]}

The preceding JSON shows a sample request body for this solution. The top-level item “entries” is an array of “entry” items. API Gateway will translate each “entry” to the event-detail field in EventBridge events. The client must escape the data for “entry” to prevent translation errors.

API Gateway and Lambda authorizer

API Gateway receives the request and validates the JWT by invoking the Lambda authorizer. The authorizer generates a policy allowing the request for valid tokens. It adds the Amazon Cognito “custom:clientId” custom attribute to the response context before returning the response to API Gateway. The “custom:clientId” attribute is a unique client identifier in the form of a UUID that downstream systems can use to retrieve data about the customer.

API Gateway validates the request by matching the request body against a model. Models represent what a request should look like. A mapping template then transforms valid requests to the format required by EventBridge. Mapping templates use velocity templating language (VTL) to do this.

VTL template
This mapping template uses a #foreach loop to process the array “entries” from the request body. The process enriches each event with the user’s “custom:clientId” and stage variables for bus name and environment from API Gateway.

Integration request

The preceding API Gateway AWS integration enables API Gateway to send the events to EventBridge without using compute services, such as Lambda or Amazon EC2. The integration and IAM execution role enable API Gateway to call the EventBridge PutEvents API to do this.

EventBridge rules and transformations

EventBridge rules match events against criteria, transform the events, and forward the events to targets. There are two rules in this example. One processes events for Zendesk tickets and the other forwards data to Kinesis Data Firehose to store events for triage and analytics.

This example creates service tickets in the Zendesk ticketing system. The tickets trigger agents to contact customers who are expecting a call to complete their purchases. The software client, by sending the event directly, reducing time-to-action for back-office processes and helping improve customer satisfaction.

Matching EventBridge rule

This rule matches client event messages for loyalty purchases and forwards details to the Zendesk API. The rule includes a transformation, which selects a portion of the event before sending the information to the target.

EventBridge uses an API destination to store details about the HTTP endpoint and usage policies. Additionally, an EventBridge connection and an AWS Secrets Manager secret store details. These include the authentication policy and authentication credentials to connect to the API destination.

Zendesk dashboard

Successfully processed events open tickets in Zendesk using the API destination. Agents now have a list of customers to contact.

Enterprises often require storing the events for troubleshooting or analytics. EventBridge does not include a newline between records when forwarding events to Kinesis Data Firehose. Because of this, it may be more challenging to discern each record when analyzing the data.

Rule to transform events
A rule for all client events changes this behavior. This AWS CloudFormation snippet defines the rule that will transform each event, adding a new line after each. The “\n” character in the InputTemplate field adds the separator between records before forwarding the data to Kinesis Data Firehose.

After, Kinesis Data Firehose receives each record separated by a new line, enabling both triage and analytics without extra overhead.

Kinesis Data Firehose to S3

Kinesis Data Firehose is a cost-effective way to batch and write records to S3. It offers optional transformation capabilities by invoking a Lambda function. This example uses a Lambda function that moves the “clientID” field to the detail section of the event record.

Kinesis Data Firehose to S3

Kinesis Data Firehose also supports dynamic partitioning of records when writing to S3. It selects data from the records or data calculated by a Lambda function. In this example, it selects data from the records to store data in separate folders in S3.

Event durability considerations

You can extend this example using an EventBridge archive and Amazon Kinesis Data Streams. Archiving allows you to create an encrypted archive of matching events. You can define the data retention in days, from one through indefinite. You can replay events from your archive when you must re-process data.

Kinesis Data Streams is a serverless data streaming solution. The EventBridge rule for all records can forward data to Kinesis Data Streams instead of Kinesis Data Firehose. Multiple applications can consume the Kinesis Data Streams. Kinesis Data Firehose would consume this stream of data and store it in S3.

Prerequisites

You need the following prerequisites to deploy the example solution:

Implementation

The full source of the solution is in the GitHub repository and is deployed with AWS SAM.

  1. Create a Secrets Manager secret using the command the AWS CLI:
    aws secretsmanager create-secret --name proto/Zendesk --secret-string '{"username":"<YOUR EMAIL>","apiKey":"<YOUR APIKEY>"}
  2. Clone the solution repository using git:
    git clone https://github.com/aws-samples/client-event-sample
  3. Build the AWS SAM project:
    sam build --use-container
  4. Deploy the project using AWS SAM:
    sam deploy --guided --capabilities CAPABILITY_NAMED_IAMAWS SAM deployment output
  5. From the outputs from the deployment, set the following shell variables:
    APPCLIENTID=<output APPCLIENTID>
    APIID=<output APIID>
    REGION=<region you deployed to>
  6. Create a user in Amazon Cognito using the AWS CLI:
    aws cognito-idp sign-up --client-id $APPCLIENTID --username <YOUR USER ID> --password <YOUR PASSWORD> --user-attributes Name=email,Value=<YOUR EMAIL>
  7. After you receive the confirmation code, confirm the user using the AWS CLI:
    aws cognito-idp confirm-sign-up --client-id $APPCLIENTID --username <userid> --confirmation-code <confirmation code>
  8. Test the user login with the AWS CLI:
    aws cognito-idp initiate-auth --auth-flow USER_PASSWORD_AUTH --client-id $APPCLIENTID --auth-parameters USERNAME=<YOUR USER ID>,PASSWORD=<YOUR PASSWORD>

If successful, this returns a JSON web token (JWT).

Testing the client event solution

  1. The sample repository includes an event generator in the util directory. The generator uses your credentials and simulates events from a user’s software client. From the utils directory, run the generator:
    python3 generator.py
    --minutes <minutes to run generator> --batch <batch size from 1-10>
    --errors <True|False> --userid <YOUR USER ID> --password <YOUR
    PASSWORD> --region $REGION --appclientid $APPCLIENTID --apiid $APIID
  2. Log in to your Zendesk console and view the created tickets.
  3. After five minutes, review the “clientevents” bucket to view the event records.

Cleaning up

To remove the example:

  1. Delete the data stored in the clientevents buckets created from the template.
  2. Delete the stack using the command:
    sam delete --stack-name clientevents
  3. Delete the secret using the command:
    aws secretsmanager delete-secret --secret-id <arn of secret>

Conclusion

This post shows how to send client events to an API and EventBridge to enable new customer experiences. The example covers enabling new experiences by creating a way for software clients to send events with minimal custom code. This blueprint shows how you can include client events in your solution, featuring validation, enrichment, transformation, and storage.

You can modify the example code provided here for your use in your organization. This enables your client software to register events without modifying backend code.

For more serverless learning resources, visit Serverless Land.

Mocking service integrations with AWS Step Functions Local

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/mocking-service-integrations-with-aws-step-functions-local/

This post is written by Sam Dengler, Principal Specialist Solutions Architect, and Dhiraj Mahapatro, Senior Specialist Solutions Architect.

AWS Step Functions now supports over 200 AWS Service integrations via AWS SDK Integration. Developers want to build and test control flow logic for workflows using branching logicerror handling, and retries. This allows for precise workflow execution with deterministic results. Additionally, developers use Step Functions’ input and output processing features to transform data as it enters and exits tasks.

Developers can test their state machines locally using Step Functions Local before deploying them to an AWS account. However state machines that use service integrations like AWS Lambda, Amazon SQS, or Amazon SNS require Step Functions Local to perform calls to AWS service endpoints. Often, developers want to test the control and data flows of their state machine executions in isolation, without any dependency on service integration availability.

Today, AWS is releasing Mocked Service Integrations for Step Functions Local. This allows developers to define sample outputs from AWS service integrations. You can combine them into test case scenarios to validate workflow control and data flow definitions. You can find the code used in this post in the Step Functions examples GitHub repository.

Sales lead generation sample workflow

In this example, new sales leads are created in a customer relationship management system. This triggers the sample workflow execution using input data, which provides information about the contact.

Using the sales lead data, the workflow first validates the contact’s identity and address. If valid, it uses Step Functions’ AWS SDK integration for Amazon Comprehend to call the DetectSentiment API. It uses the sales lead’s comments as input for sentiment analysis.

If the comments have a positive sentiment, it adds the sales leads information to a DynamoDB table for follow-up. The event is published to Amazon EventBridge to notify subscribers.

If the sales lead data is invalid or a negative sentiment is detected, it publishes events to EventBridge for notification. No record is added to the Amazon DynamoDB table. The following Step Functions Workflow Studio diagram shows the control logic:

The full workflow definition is available in the code repository. Note the workflow task names in the diagram, such as DetectSentiment, which are important when defining the mocked responses.

Sentiment analysis test case

In this example, you test a scenario in which:

  1. The identity and address are successfully validated using a Lambda function.
  2. A positive sentiment is detected using the Comprehend.DetectSentiment API after three retries.
  3. A contact item is written to a DynamoDB table successfully
  4. An event is published to an EventBridge event bus successfully

The execution path for this test scenario is shown in the following diagram (the red and green numbers have been added). 0 represents the first execution; 1, 2, and 3 represent the max retry attempts (MaxAttempts), in case of an InternalServerException.

Mocked response configuration

To use service integration mocking, create a mock configuration file with sections specifying mock AWS service responses. These are grouped into test cases that can be activated when executing state machines locally. The following example provides code snippets and the full mock configuration is available in the code repository.

To mock a successful Lambda function invocation, define a mock response that conforms to the Lambda.Invoke API response elements. Associate it to the first request attempt:

"CheckIdentityLambdaMockedSuccess": {
  "0": {
    "Return": {
      "StatusCode": 200,
      "Payload": {
        "statusCode": 200,
        "body": "{\"approved\":true,\"message\":\"identity validation passed\"
}"
      }
    }
  }
}

To mock the DetectSentiment retry behavior, define failure and successful mock responses that conform to the Comprehend.DetectSentiment API call. Associate the failure mocks to three request attempts, and associate the successful mock to the fourth attempt:

"DetectSentimentRetryOnErrorWithSuccess": {
  "0-2": {
    "Throw": {
      "Error": "InternalServerException",
      "Cause": "Server Exception while calling DetectSentiment API in Comprehend Service"
    }
  },
  "3": {
    "Return": {
      "Sentiment": "POSITIVE",
      "SentimentScore": {
        "Mixed": 0.00012647535,
        "Negative": 0.00008031699,
        "Neutral": 0.0051454515,
        "Positive": 0.9946478
      }
    }
  }
}

Note that Step Functions Local does not validate the structure of the mocked responses. Ensure that your mocked responses conform to actual responses before testing. To review the structure of service responses, either perform the actual service calls using Step Functions or view the documentation for those services.

Next, associate the mocked responses to a test case identifier:

"RetryOnServiceExceptionTest": {
  "Check Identity": "CheckIdentityLambdaMockedSuccess",
  "Check Address": "CheckAddressLambdaMockedSuccess",
  "DetectSentiment": "DetectSentimentRetryOnErrorWithSuccess",
  "Add to FollowUp": "AddToFollowUpSuccess",
  "CustomerAddedToFollowup": "CustomerAddedToFollowupSuccess"
}

With the test case and mock responses configured, you can use them for testing with Step Functions Local.

Test case execution using Step Functions Local

The Step Functions Developer Guide describes the steps used to set up Step Functions Local on your workstation and create a state machine.

After these steps are complete, you can run a workflow locally using the start-execution AWS CLI command. Activate the mocked responses by appending a pound sign and the test case identifier to the state machine ARN:

aws stepfunctions start-execution \
  --endpoint http://localhost:8083 \
  --state-machine arn:aws:states:us-east-1:123456789012:stateMachine: LeadGenerationStateMachine#RetryOnServiceExceptionTest \
  --input file://events/sfn_valid_input.json

Test case validation

To validate the workflow executed correctly in the test case, examine the state machine execution events using the StepFunctions.GetExecutionHistory API. This ensures that the correct states are used. There are a variety of validation tools available. This post shows how to achieve this using the AWS CLI filtering feature using JMESPath syntax.

In this test case, you validate the TaskFailed and TaskSucceeded events match the retry definition for the DetectSentiment task, which specifies three retries. Use the following AWS CLI command to get the execution history and filter on the execution events:

aws stepfunctions get-execution-history \
  --endpoint http://localhost:8083 \
  --execution-arn <ExecutionArn>
  --query 'events[?(type==`TaskFailed` && contains(taskFailedEventDetails.cause, `Server Exception while calling DetectSentiment API in Comprehend Service`)) || (type==`TaskSucceeded` && taskSucceededEventDetails.resource==`comprehend:detectSentiment`)]'

The results include matching events:

{
  "timestamp": "2022-01-13T17:24:32.276000-05:00",
  "type": "TaskFailed",
  "id": 19,
  "previousEventId": 18,
  "taskFailedEventDetails": {
    "error": "InternalServerException",
    "cause": "Server Exception while calling DetectSentiment API in Comprehend Service"
  }
}

These results should be compared to the test acceptance criteria to verify the execution behavior. Test cases, acceptance criteria, and validation expressions vary by customer and use case. These techniques are flexible to accommodate various happy path and error scenarios. To explore additional sample test cases and examples, visit the example code repository.

Conclusion

This post introduces a new robust way to test AWS Step Functions state machines in isolation. With mocking, developers get more control over the type of scenarios that a state machine can handle, leading to assertion of multiple behaviors. Testing a state machine with mocks can also be part of the software release. Asserting on behaviors like error handling, branching, parallel, dynamic parallel (map state) helps test the entire state machine’s behavior. For any new behavior in the state machine, such as a new type of exception from a state, you can mock and add as a test.

See the Step Functions Developer Guide for more information on service mocking with Step Functions Local. The sample application covers basic scenarios of testing a state machine. You can use a similar approach for complex scenarios including other Step Functions flows, like map and wait.

For more serverless learning resources, visit Serverless Land.

Using the circuit breaker pattern with AWS Step Functions and Amazon DynamoDB

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/using-the-circuit-breaker-pattern-with-aws-step-functions-and-amazon-dynamodb/

This post is written by Anitha Deenadayalan, Developer Specialist SA, DevAx

Modern applications use microservices as an architectural and organizational approach to software development, where the application comprises small independent services that communicate over well-defined APIs.

When multiple microservices collaborate to handle requests, one or more services may become unavailable or exhibit a high latency. Microservices communicate through remote procedure calls, and it is always possible that transient errors could occur in the network connectivity, causing failures.

This can cause performance degradation in the entire application during synchronous execution because of the cascading of timeouts or failures causing poor user experience. When complex applications use microservices, an outage in one microservice can lead to application failure. This post shows how to use the circuit breaker design pattern to help with a graceful service degradation.

Introducing circuit breakers

Michael Nygard popularized the circuit breaker pattern in his book, Release It. This design pattern can prevent a caller service from retrying another callee service call that has previously caused repeated timeouts or failures. It can also detect when the callee service is functional again.

Fallacies of distributed computing are a set of assertions made by Peter Deutsch and others at Sun Microsystems. They say the programmers new to distributed applications invariably make false assumptions. The network reliability, zero-latency expectations, and bandwidth limitations result in software applications written with minimal error handling for network errors.

During a network outage, applications may indefinitely wait for a reply and continually consume application resources. Failure to retry the operations when the network becomes available can also lead to application degradation. If API calls to a database or an external service time-out due to network issues, repeated calls with no circuit breaker can affect cost and performance.

The circuit breaker pattern

There is a circuit breaker object that routes the calls from the caller to the callee in the circuit breaker pattern. For example, in an ecommerce application, the order service can call the payment service to collect the payments. When there are no failures, the order service routes all calls to the payment service by the circuit breaker:

Circuit breaker with no failures

Circuit breaker with no failures

If the payment service times out, the circuit breaker can detect the timeout and track the failure. If the timeouts exceed a specified threshold, the application opens the circuit:

Circuit breaker with payment service failure

Circuit breaker with payment service failure

Once the circuit is open, the circuit breaker object does not route the calls to the payment service. It returns an immediate failure when the order service calls the payment service:

Circuit breaker stops routing to payment service

Circuit breaker stops routing to payment service

The circuit breaker object periodically tries to see if the calls to the payment service are successful:

Circuit breaker retries payment service

Circuit breaker retries payment service

When the call to payment service succeeds, the circuit is closed, and all further calls are routed to the payment service again:

Circuit breaker with working payment service again

Circuit breaker with working payment service again

Architecture overview

This example uses the AWS Step Functions, AWS Lambda, and Amazon DynamoDB to implement the circuit breaker pattern:

Circuit breaker architecture

Circuit breaker architecture

The Step Functions workflow provides circuit breaker capabilities. When a service wants to call another service, it starts the workflow with the name of the callee service.

The workflow gets the circuit status from the CircuitStatus DynamoDB table, which stores the currently degraded services. If the CircuitStatus contains a record for the service called, then the circuit is open. The Step Functions workflow returns an immediate failure and exit with a FAIL state.

If the CircuitStatus table does not contain an item for the called service, then the service is operational. The ExecuteLambda step in the state machine definition invokes the Lambda function sent through a parameter value. The Step Functions workflow exits with a SUCCESS state, if the call succeeds.

The items in the DynamoDB table have the following attributes:

DynamoDB items list

DynamoDB items list

If the service call fails or a timeout occurs, the application retries with exponential backoff for a defined number of times. If the service call fails after the retries, the workflow inserts a record in the CircuitStatus table for the service with the CircuitStatus as OPEN, and the workflow exits with a FAIL state. Subsequent calls to the same service return an immediate failure as long as the circuit is open.

I enter the item with an associated time-to-live (TTL) value to ensure eventual connection retries and the item expires at the defined TTL time. DynamoDB’s time to live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming write throughput.

For example, if you set the TTL value to 60 seconds to check a service status after a minute, DynamoDB deletes the item from the table after 60 seconds. The workflow invokes the service to check for availability when a new call comes in after the item has expired.

Circuit breaker Step Function

Circuit breaker Step Function

Prerequisites

For this walkthrough, you need:

Setting up the environment

Use the .NET Core 3.1 code in the GitHub repository and the AWS SAM template to create the AWS resources for this walkthrough. These include IAM roles, DynamoDB table, the Step Functions workflow, and Lambda functions.

  1. You need an AWS access key ID and secret access key to configure the AWS Command Line Interface (AWS CLI). To learn more about configuring the AWS CLI, follow these instructions.
  2. Clone the repo:
    git clone https://github.com/aws-samples/circuit-breaker-netcore-blog
  3. After cloning, this is the folder structure:

    Project file structure

    Project file structure

Deploy using Serverless Application Model (AWS SAM)

The AWS Serverless Application Model (AWS SAM) CLI provides developers with a local tool for managing serverless applications on AWS.

  1. The sam build command processes your AWS SAM template file, application code, and applicable language-specific files and dependencies. The command copies build artifacts in the format and location expected for subsequent steps in your workflow. Run these commands to process the template file:
    cd circuit-breaker
    sam build
  2. After you build the application, test using the sam deploy command. AWS SAM deploys the application to AWS and displays the output in the terminal.
    sam deploy --guided

    Output from sam deploy

    Output from sam deploy

  3. You can also view the output in AWS CloudFormation page.

    Output in CloudFormation console

    Output in CloudFormation console

  4. The Step Functions workflow provides the circuit-breaker function. Refer to the circuitbreaker.asl.json file in the statemachine folder for the state machine definition in the Amazon States Language (ASL).

To deploy with the CDK, refer to the GitHub page.

Running the service through the circuit breaker

To provide circuit breaker capabilities to the Lambda microservice, you must send the name or function ARN of the Lambda function to the Step Functions workflow:

{
  "TargetLambda": "<Name or ARN of the Lambda function>"
}

Successful run

To simulate a successful run, use the HelloWorld Lambda function provided by passing the name or ARN of the Lambda function the stack has created. Your input appears as follows:

{
  "TargetLambda": "circuit-breaker-stack-HelloWorldFunction-pP1HNkJGugQz"
}

During the successful run, the Get Circuit Status step checks the circuit status against the DynamoDB table. Suppose that the circuit is CLOSED, which is indicated by zero records for that service in the DynamoDB table. In that case, the Execute Lambda step runs the Lambda function and exits the workflow successfully.

Step Function with closed circuit

Step Function with closed circuit

Service timeout

To simulate a timeout, use the TestCircuitBreaker Lambda function by passing the name or ARN of the Lambda function the stack has created. Your input appears as:

{
  "TargetLambda": "circuit-breaker-stack-TestCircuitBreakerFunction-mKeyyJq4BjQ7"
}

Again, the circuit status is checked against the DynamoDB table by the Get Circuit Status step in the workflow. The circuit is CLOSED during the first pass, and the Execute Lambda step runs the Lambda function and timeout.

The workflow retries based on the retry count and the exponential backoff values, and finally returns a timeout error. It runs the Update Circuit Status step where a record is inserted in the DynamoDB table for that service, with a predefined time-to-live value specified by TTL attribute ExpireTimeStamp.

Step Function with open circuit

Step Function with open circuit

Repeat timeout

As long as there is an item for the service in the DynamoDB table, the circuit breaker workflow returns an immediate failure to the calling service. When you re-execute the call to the Step Functions workflow for the TestCircuitBreaker Lambda function within 20 seconds, the circuit is still open. The workflow immediately fails, ensuring the stability of the overall application performance.

Step Function workflow immediately fails until retry

Step Function workflow immediately fails until retry

The item in the DynamoDB table expires after 20 seconds, and the workflow retries the service again. This time, the workflow retries with exponential backoffs, and if it succeeds, the workflow exits successfully.

Cleaning up

To avoid incurring additional charges, clean up all the created resources. Run the following command from a terminal window. This command deletes the created resources that are part of this example.

sam delete --stack-name circuit-breaker-stack --region <region name>

Conclusion

This post showed how to implement the circuit breaker pattern using Step Functions, Lambda, DynamoDB, and .NET Core 3.1. This pattern can help prevent system degradation in service failures or timeouts. Step Functions and the TTL feature of DynamoDB can make it easier to implement the circuit breaker capabilities.

To learn more about developing microservices on AWS, refer to the whitepaper on microservices. To learn more about serverless and AWS SAM, visit the Sessions with SAM series and find more resources at Serverless Land.

Introducing AWS Lambda batching controls for message broker services

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-aws-lambda-batching-controls-for-message-broker-services/

This post is written by Mithun Mallick, Senior Specialist Solutions Architect.

AWS Lambda now supports configuring a maximum batch window for instance-based message broker services to fine tune when Lambda invocations occur. This feature gives you an additional control on batching behavior when processing data. It applies to Amazon Managed Streaming for Apache Kafka (Amazon MSK), self-hosted Apache Kafka, and Amazon MQ for Apache ActiveMQ and RabbitMQ.

Apache Kafka is an open source event streaming platform used to support workloads such as data pipelines and streaming analytics. It is conceptually similar to Amazon Kinesis. Amazon MSK is a fully managed, highly available service that simplifies the setup, scaling, and management of clusters running Kafka.

Amazon MQ is a managed, highly available message broker service for Apache ActiveMQ and RabbitMQ that makes it easier to set up and operate message brokers on AWS. Amazon MQ reduces your operational responsibilities by managing the provisioning, setup, and maintenance of message brokers for you.

Amazon MSK, self-hosted Apache Kafka and Amazon MQ for ActiveMQ and RabbitMQ are all available as event sources for AWS Lambda. You configure an event source mapping to use Lambda to process items from a stream or queue. This allows you to use these message broker services to store messages and asynchronously integrate them with downstream serverless workflows.

In this blog, I explain how message batching works. I show how to use the new maximum batching window control for the managed message broker services and self-managed Apache Kafka.

Understanding batching

For event source mappings, the Lambda service internally polls for new records or messages from the event source, and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides these to your function as an event payload. Batching allows higher throughput message processing, up to 10,000 messages in a batch. The payload limit of a single invocation is 6 MB.

Previously, you could only use batch size to configure the maximum number of messages Lambda would poll for. Once a defined batch size is reached, the poller invokes the function with the entire set of messages. This feature is ideal when handling a low volume of messages or batches of data that take time to build up.

Batching window

The new Batch Window control allows you to set the maximum amount of time, in seconds, that Lambda spends gathering records before invoking the function. This brings similar batching functionality that AWS supports with Amazon SQS to Amazon MQ, Amazon MSK and self-managed Apache Kafka. The Lambda event source mapping batching functionality can be described as follows.

Batching controls with Lambda event source mapping

Batching controls with Lambda event source mapping

Using MaximumBatchingWindowInSeconds, you can set your function to wait up to 300 seconds for a batch to build before processing it. This allows you to create bigger batches if there are enough messages. You can manage the average number of records processed by the function with each invocation. This increases the efficiency of each invocation, and reduces the frequency.

Setting MaximumBatchingWindowInSeconds to 0 invokes the target Lambda function as soon as the Lambda event source receives a message from the broker.

Message broker batching behavior

For ActiveMQ, the Lambda event source mapping uses the Java Message Service (JMS) API to receive messages. For RabbitMQ, Lambda uses a RabbitMQ client library to get messages from the queue.

The Lambda event source mappings act as a consumer when polling the queue. The batching pattern for all instance-based message broker services is the same. As soon as a message is received, the batching window timer starts. If there are more messages, the consumer makes additional calls to the broker and adds them to a buffer. It keeps a count of the number of messages and the total size of the payload.

The batch is considered complete if the addition of a new message makes the batch size equal to or greater than 6 MB, or the batch window timeout is reached. If the batch size is greater than 6 MB, the last message is returned back to the broker.

Lambda then invokes the target Lambda function synchronously and passes on the batch of messages to the function. The Lambda event source continues to poll for more messages and as soon as it retrieves the next message, the batching window starts again. Polling and invocation of the target Lambda function occur in separate processes.

Kafka uses a distributed append log architecture to store messages. This works differently from ActiveMQ and RabbitMQ as messages are not removed from the broker once they have been consumed. Instead, consumers must maintain an offset to the last record or message that was consumed from the broker. Kafka provides several options in the consumer API to simplify the tracking of offsets.

Amazon MSK and Apache Kafka store data in multiple partitions to provide higher scalability. Lambda reads the messages sequentially for each partition and a batch may contain messages from different partitions.  Lambda then commits the offsets once the target Lambda function is invoked successfully.

Configuring the maximum batching window

To reduce Lambda function invocations for existing or new functions, set the MaximumBatchingWindowInSeconds value close to 300 seconds. A longer batching window can introduce additional latency. For latency-sensitive workloads set the MaximumBatchingWindowInSeconds value to an appropriate setting.

To configure Maximum Batching on a function in the AWS Management Console, navigate to the function in the Lambda console. Create a new Trigger, or edit an existing once. Along with the Batch size you can configure a Batch window. The Trigger Configuration page is similar across the broker services.

Max batching trigger window

Max batching trigger window

You can also use the AWS CLI to configure the --maximum-batching-window-in-seconds parameter.

For example, with Amazon MQ:

aws lambda create-event-source-mapping --function-name my-function \
--maximum-batching-window-in-seconds 300 --batch-size 100 --starting-position AT_TIMESTAMP \
--event-source-arn arn:aws:mq:us-east-1:123456789012:broker:ExampleMQBroker:b-24cacbb4-b295-49b7-8543-7ce7ce9dfb98

You can use AWS CloudFormation to configure the parameter. The following example configures the MaximumBatchingWindowInSeconds as part of the AWS::Lambda::EventSourceMapping resource for Amazon MQ:

  LambdaFunctionEventSourceMapping:
    Type: AWS::Lambda::EventSourceMapping
    Properties:
      BatchSize: 10
      MaximumBatchingWindowInSeconds: 300
      Enabled: true
      Queues:
        - "MyQueue"
      EventSourceArn: !GetAtt MyBroker.Arn
      FunctionName: !GetAtt LambdaFunction.Arn
      SourceAccessConfigurations:
        - Type: BASIC_AUTH
          URI: !Ref secretARNParameter

You can also use AWS Serverless Application Model (AWS SAM) to configure the parameter as part of the Lambda function event source.

MQReceiverFunction:
      Type: AWS::Serverless::Function 
      Properties:
        FunctionName: MQReceiverFunction
        CodeUri: src/
        Handler: app.lambda_handler
        Runtime: python3.9
        Events:
          MQEvent:
            Type: MQ
            Properties:
              Broker: !Ref brokerARNParameter
              BatchSize: 10
              MaximumBatchingWindowInSeconds: 300
              Queues:
                - "workshop.queueC"
              SourceAccessConfigurations:
                - Type: BASIC_AUTH
                  URI: !Ref secretARNParameter

Error handling

If your function times out or returns an error for any of the messages in a batch, Lambda retries the whole batch until processing succeeds or the messages expire.

When a function encounters an unrecoverable error, the event source mapping is paused and the consumer stops processing records. Any other consumers can continue processing, provided that they do not encounter the same error.  If your Lambda event records exceed the allowed size limit of 6 MB, they can go unprocessed.

For Amazon MQ, you can redeliver messages when there’s a function error. You can configure dead-letter queues (DLQs) for both Apache ActiveMQ, and RabbitMQ. For RabbitMQ, you can set a per-message TTL to move failed messages to a DLQ.

Since the same event may be received more than once, functions should be designed to be idempotent. This means that receiving the same event multiple times does not change the result beyond the first time the event was received.

Conclusion

Lambda supports a number of event sources including message broker services like Amazon MQ and Amazon MSK. This post explains how batching works with the event sources and how messages are sent to the Lambda function.

Previously, you could only control the batch size. The new Batch Window control allows you to set the maximum amount of time, in seconds, that Lambda spends gathering records before invoking the function. This can increase the overall throughput of message processing and reduces Lambda invocations, which may improve cost.

For more serverless learning resources, visit Serverless Land.

Using Node.js ES modules and top-level await in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-node-js-es-modules-and-top-level-await-in-aws-lambda/

This post is written by Dan Fox, Principal Specialist Solutions Architect, Serverless.

AWS Lambda now enables the use of ECMAScript (ES) modules in Node.js 14 runtimes. This feature allows Lambda customers to use dependency libraries that are configured as ES modules, or to designate their own function code as an ES module. It provides customers the benefits of ES module features like import/export operators, language-level support for modules, strict mode by default, and improved static analysis and tree shaking. ES modules also enable top-level await, a feature that can lower cold start latency when used with Provisioned Concurrency.

This blog post shows how to use ES modules in a Lambda function. It also provides guidance on how to use top-level await with Provisioned Concurrency to improve cold start performance for latency sensitive workloads.

Designating a function handler as an ES module

You may designate function code as an ES module in one of two ways. The first way is to specify the “type” in the function’s package.json file. By setting the type to “module”, you designate all “.js” files in the package to be treated as ES modules. Set the “type” as “commonjs” to specify the package contents explicitly as CommonJS modules:

// package.json
{
  "name": "ec-module-example",
  "type": "module",
  "description": "This package will be treated as an ES module.",
  "version": "1.0",
  "main": "index.js",
  "author": "Dan Fox",
  "license": "ISC"
}

// index.js – this file will inherit the type from 
// package.json and be treated as an ES module.

import { double } from './lib.mjs';

export const handler = async () => {
    let result = double(6); // 12
    return result;
};

// lib.mjs

export function double(x) {
    return x + x;
}

The second way to designate a function as either an ES module or a CommonJS module is by using the file name extension. File name extensions override the package type directive.

File names ending in .cjs are always treated as CommonJS modules. File names ending in .mjs are always treated as ES modules. File names ending in .js inherit their type from the package. You may mix ES modules and CommonJS modules within the same package. Packages are designated as CommonJS by default:

// this file is named index.mjs – it will always be treated as an ES module
import { square } from './lib.mjs';

export async function handler() {
    let result = square(6); // 36
    return result;
};

// lib.mjs
export function square(x) {
    return x * x;
}

Understanding Provisioned Concurrency

When a Lambda function scales out, the process of allocating and initializing new runtime environments may increase latency for end users. Provisioned Concurrency gives customers more control over cold start performance by enabling them to create runtime environments in advance.

In addition to creating execution environments, Provisioned Concurrency also performs initialization tasks defined by customers. Customer initialization code performs a variety of tasks including importing libraries and dependencies, retrieving secrets and configurations, and initializing connections to other services. According to an AWS analysis of Lambda service usage, customer initialization code is the largest contributor to cold start latency.

Provisioned Concurrency runs both environment setup and customer initialization code. This enables runtime environments to be ready to respond to invocations with low latency and reduces the impact of cold starts for end users.

Reviewing the Node.js event loop

Node.js has an event loop that causes it to behave differently than other runtimes. Specifically, it uses a non-blocking input/output model that supports asynchronous operations. This model enables it to perform efficiently in most cases.

For example, if a Node.js function makes a network call, that request may be designated as an asynchronous operation and placed into a callback queue. The function may continue to process other operations within the main call stack without getting blocked by waiting for the network call to return. Once the network call is returned, the callback is run and then removed from the callback queue.

This non-blocking model affects the Lambda execution environment lifecycle. Asynchronous functions written in the initialization block of a Node.js Lambda function may not complete before handler invocation. In fact, it is possible for function handlers to be invoked with open items remaining in the callback queue.

Typically, JavaScript developers use the await keyword to instruct a function to block and force it to complete before moving on to the next step. However, await is not permitted in the initialization block of a CommonJS JavaScript function. This behavior limits the amount of asynchronous initialization code that can be run by Provisioned Concurrency before the invocation cycle.

Improving cold start performance with top-level await

With ES modules, developers may use top-level await within their functions. This allows developers to use the await keyword in the top level of the file. With this feature, Node.js functions may now complete asynchronous initialization code before handler invocations. This maximizes the effectiveness of Provisioned Concurrency as a mechanism for limiting cold start latency.

Consider a Lambda function that retrieves a parameter from the AWS Systems Manager Parameter Store. Previously, using CommonJS syntax, you place the await operator in the body of the handler function:

// method1 – CommonJS

// CommonJS require syntax
const { SSMClient, GetParameterCommand } = require("@aws-sdk/client-ssm"); 

const ssmClient = new SSMClient();
const input = { "Name": "/configItem" };
const command = new GetParameterCommand(input);
const init_promise = ssmClient.send(command);

exports.handler = async () => {
    const parameter = await init_promise; // await inside handler
    console.log(parameter);

    const response = {
        "statusCode": 200,
        "body": parameter.Parameter.Value
    };
    return response;
};

When you designate code as an ES module, you can use the await keyword at the top level of the code. As a result, the code that makes a request to the AWS Systems Manager Parameter Store now completes before the first invocation:

// method2 – ES module

// ES module import syntax
import { SSMClient, GetParameterCommand } from "@aws-sdk/client-ssm"; 

const ssmClient = new SSMClient();
const input = { "Name": "/configItem" }
const command = new GetParameterCommand(input);
const parameter = await ssmClient.send(command); // top-level await

export async function handler() {
    const response = {
        statusCode: 200,
        "body": parameter.Parameter.Value
    };
    return response;
};

With on-demand concurrency, an end user is unlikely to see much difference between these two methods. But when you run these functions using Provisioned Concurrency, you may see performance improvements. Using top-level await, Provisioned Concurrency fetches the parameter during its startup period instead of during the handler invocation. This reduces the duration of the handler execution and improves end user response latency for cold invokes.

Performing benchmark testing

You can perform benchmark tests to measure the impact of top level await. I have created a project that contains two Lambda functions, one that contains an ES module and one that contains a CommonJS module.

Both functions are configured to respond to a single API Gateway endpoint. Both functions retrieve a parameter from AWS Systems Manager Parameter Store and are configured to use Provisioned Concurrency. The ES module uses top-level await to retrieve the parameter. The CommonJS function awaits the parameter retrieval in the handler.

Example architecture

Before deploying the solution, you need:

To deploy:

  1. From a terminal window, clone the git repo:
    git clone https://github.com/aws-samples/aws-lambda-es-module-performance-benchmark
  2. Change directory:
    cd ./aws-lambda-es-module-performance-benchmark
  3. Build the application:
    sam build
  4. Deploy the application to your AWS account:
    sam deploy --guided
  5. Take note of the API Gateway URL in the Outputs section.
    Deployment outputs

This post uses a popular open source tool Artillery to provide load testing. To perform load tests:

  1. Open config.yaml document in the /load_test directory and replace the target string with the URL of the API Gateway:
    target: “Put API Gateway url string here”
  2. From a terminal window, navigate to the /load_test directory:
    cd load_test
  3. Download and install dependencies:
    npm install
  4. Begin load test for the CommonJS function.
    ./test_commonjs.sh
  5. Begin load test for ES module function.
    ./test_esmodule.sh

Reviewing the results

Test results

Here is a side-by-side comparison of the results of two load tests of 600 requests each. The left shows the results for the CommonJS module and the right shows the results for the ES module. The p99 response time reflects the cold start durations when the Lambda service scales up the function due to load. The p99 for the CommonJS module is 603 ms while the p99 for the ES module is 340.5 ms, a performance improvement of 43.5% (262.5 ms) for the p99 of this comparison load test.

Cleaning up

To delete the sample application, use the latest version of the AWS SAM CLI and run:

sam delete

Conclusion

Lambda functions now support ES modules in Node.js 14.x runtimes. ES modules support await at the top-level of function code. Using top-level await maximizes the effectiveness of Provisioned Concurrency and can reduce the latency experienced by end users during cold starts.

This post demonstrates a sample application that can be used to perform benchmark tests that measure the impact of top-level await.

For more serverless content, visit Serverless Land.

Validating addresses with AWS Lambda and the Amazon Location Service

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/validating-addresses-with-aws-lambda-and-the-amazon-location-service/

This post is written by Matthew Nightingale, Associate Solutions Architect.

Traditional methods of performing address validation on geospatial datasets can be expensive and time consuming. Using Amazon Location Service with AWS Lambda in a serverless data processing pipeline, you may achieve significant performance improvements and cost savings on address validation jobs that use geospatial data.

This blog contains a deployable AWS Serverless Application Model (AWS SAM) template. It also uses sample data sourced from publicly available datasets that you can deploy and use to test the application. This blog offers a starting point to build out a serverless address validation pipeline in your own AWS account.

Overview

This application implements a serverless scatter/gather architecture using Lambda and Amazon S3, performing address validation with the Amazon Location Service. An S3 PUT event triggers each Lambda function to run data processing jobs along each step of the pipeline.

To test the application, a user uploads a .CSV file to S3. This dataset is labeled with fields that are recognized by the 2waygeocoder Lambda function. The application returns a processed dataset to S3 appended with location information from the Amazon Location Places API.

Solution overview

  1. The Scatter Lambda function takes a dataset from the S3 bucket labeled input and splits it into equally sized shards.
  2. The Process Lambda function takes each shard from the pre-processed bucket. It performs address validation in parallel with a 2waygeocoder function calling the Amazon Location Service Places API.
  3. The Gather Lambda function takes each shard from the post-processed bucket. It appends the data into a complete dataset with additional address information.

Amazon Location Service

Amazon Location Service sources high-quality geospatial data from HERE and ESRI to support searches by using a place index resource.

With the Amazon Locations Places API, you can convert addresses and other textual queries into geographic coordinates (also known as geocoding). You can also convert geographic positions into addresses and place descriptions (known as reverse geocoding).

The example application includes a 2waygeocoder capable of both geocoding and reverse geocoding. The next section shows examples of the call and response from the Amazon Location Places API for both geocoding and reverse geocoding.

Geocoding with Amazon Location Service

Here is an example of calling the Amazon Location Service Places API using the AWS SDK for Python (Boto3). This uses the search_place_index_for_text method:

Response = location.search_place_index_for_text(
	IndexName = ‘explore.place’ 
###index is created using Amazon Location service
	Text = “Boston, MA”)
location_response = Reponse[“Results”]
print(location_response)

Example response:

Response

Example reverse-geocoding with Amazon Location Service

Here is another example of calling the Amazon Location Service Places API using the AWS SDK for Python (boto3). This uses the search_place_index_for_position method:

Response = location.search_place_index_for_position(
	IndexName = ‘explore.place’ 
###index is created using Amazon Location service
	Position = “-71.056739, 42.358660”))
location_response = Reponse[“Results”]
print(location_response)

Example response:

Response

Design considerations

Processing data with Lambda in parallel using a serverless scatter/gather pipeline helps provide performance efficiency at lower cost. To provide even greater performance, you can optimize your Lambda configuration for higher throughput. There are several strategies you can implement to do this and a key few topics to keep in mind.

Increase the allocated memory for your Lambda function

The simplest way to increase throughput is to increase the allocated memory of the Lambda function.

Faster Lambda functions can process more data and increase throughput. This works even if a Lambda function’s memory utilization is low. This is because increasing memory also increases vCPUs in proportion to the amount configured. Each function supports up to 10 GB of memory and you can access up to six vCPUs per function.

To see the average cost and execution speed for each memory configuration, the Lambda Power Tuning tool helps to visualize the tradeoffs.

Optimize shard size

Another method for increasing performance in a serverless scatter/gather architecture is to optimize the total number of shards created by the scatter function. Increasing the total number of shards consequently reduces the size of any single shard, allowing Lambda to process each shard faster.

When scaling with Lambda, one instance of a function handles one request at a time. When the number of requests increases, Lambda creates more instances of the function to process traffic. Because S3 invokes Lambda asynchronously, there is an internal queue buffering requests between the event source and the Lambda service.

In a serverless scatter/gather architecture, having more shards results in more concurrent invocations of the process Lambda function. For more information about scaling and concurrency with Lambda, see this blog post. Increasing concurrency with Lambda can lead to API request throttling.

Consider API request throttling with your concurrent Lambda functions

In a serverless scatter/gather architecture, the rate at which your code calls APIs increases by a factor equal to the number of concurrent Lambda functions. This means API request limits can quickly be exceeded. You must consider Service Quotas and API request limits when trying to increase the performance of your serverless scatter/gather architecture.

For example, the Amazon Location Places APIs called in the processing function of this application has a default limit of 50 API requests per second. The 2waygeocoder calls on average about 12 APIs per second. Splitting the application into more than four shards may cause API throttling exception errors in this case. Requests to increase Service Quotas can be made through your AWS account.

Deploying the solution

You need the following perquisites to deploy the example application:

Deploy the example application:

  1. Clone the repository and download the sample source code to your environment where AWS SAM is installed:
    git clone https://github.com/aws-samples/amazon-location-service-serverless-address-validation
  2. Change into the project directory containing the template.yaml file:
    cd ~/environment/amazon-location-service-serverless-address-validation
  3. Build the application using AWS SAM:
    sam build
    Terminal output
  4. Deploy the application to your account using AWS SAM. Be sure to follow proper S3 naming conventions providing globally unique names for S3 buckets:
    sam deploy --guided
    Deployment output

Testing the application

Testing geocoding

To test the application, download the dataset that is linked in Testing the Application section of the GitHub repository. These tests demonstrate both the geocoding and reverse-geocoding capabilities of the application.

First, test the geocoding capabilities. You perform address validation on the City of Hartford Business Listing dataset linked in the GitHub repository. The dataset contains a listing of all the active businesses registered in the city Hartford, CT, and each business address. The GitHub repo links to an external website where you can download the dataset.

  1. Download the .csv version of the City of Hartford Business Listing dataset. The link is found in the Testing the Application section of the README file on GitHub.
  2. Open the file locally to explore its contents.
  3. Ensure that the .csv file contains columns labeled as “Address”, “City”, and “State”. The 2waygeocoder deployed as part of the AWS SAM template recognizes these columns to perform geocoding.
  4. Before testing the application’s geocoding capabilities, explore the pricing of Amazon Location Service. In order to save money, you can trim the length of the dataset for testing by removing rows. Once the dataset is trimmed to a desired length, navigate to S3 in the AWS Management Console.
  5. Upload the dataset to the S3 bucket labeled “input”. This triggers the scatter function.
  6. Navigate to the S3 bucket labeled “raw” to view the shards of your dataset created by the scatter function.
  7. Navigate to Lambda and select the 2waygeocoder function to view the CloudWatch Logs to see any information that is returned by the function code in near-real-time.
  8. Once the data is processed, navigate to the S3 bucket labeled “destination” to view the complete processed dataset that is created by the gather function. It may take several minutes for your dataset to finish processing.

Congratulations! You have successfully geocoded a dataset using Amazon Location Service with a serverless address validation pipeline.

Testing reverse-geocoding

Next, test the reverse-geocoding capabilities of the application. You perform address validation on the Miami Housing Dataset linked in the GitHub repository. This dataset contains information on 13,932 single-family homes sold in Miami. The repo links to an external website where you can download the dataset.

Before testing, explore the pricing of Amazon Location Service. To start the test:

  1. Download the zip file containing the .csv version of the dataset from . The link is found in the Testing the Application section of the README file on GitHub.
  2. Open the file locally to explore its contents.
  3. Ensure the .csv file contains columns A and B labeled “Latitude” and “Longitude”. You must edit these column headers to match the correct format that is recognized by the 2waygeocoder to perform reverse-geocoding. Only the “L” should be capitalized.
  4. To minimize cost, trim the length of the dataset for testing by removing rows. At the full size of ~13,933 rows, the dataset takes approx. 5 minutes to process.
  5. Once the dataset is trimmed to a desired length and both column A and B are labeled as “Latitude” and “Longitude” respectively, navigate to S3 in the AWS Management Console, and upload the dataset to your S3 bucket labeled “Input”.
  6. Navigate to the S3 bucket labeled “raw” to view the shards of your dataset.
  7. Navigate to Lambda and select the 2waygeocoder function to view the CloudWatch Logs to see any information that is returned by the function code in near-real-time.
  8. Navigate to the S3 bucket labeled “destination” to view the complete processed dataset that is created by the gather function. It may take several minutes for your dataset to finish processing.

Congratulations! You have successfully reverse-geocoded a dataset with Amazon Location Service using a serverless scatter/gather pipeline. You can move on to the conclusion, or continue to test the geocoding capabilities of the application with additional datasets.

Next steps

To get started testing your own datasets, use the AWS SAM template from GitHub deployed as part of this blog. Ensure that the labels in your dataset are labeled to match the constructs used in this blog post. The 2waygeocoder recognizes columns labeled “Latitude” and “Longitude” to perform reverse-geocoding, and “Address”, “City”, and “State” to perform geocoding.

Now that the data has been geocoded by Amazon Location Service and is in S3, you can use Amazon QuickSight geospatial charts to quickly and easily create interactive charts. For information on how to create a Dataset in QuickSight using Amazon S3 Files, check out the QuickSight User Guide.

Below is an example using QuickSight Geospatial charts to map the Miami housing dataset. The map shows average sale price by zipcode:

QuickSight map

This example uses QuickSight geospatial charts to map the City of Hartford Business dataset. The map shows DBA (doing business as) by latitude and longitude:

Dataset visualization

Conclusion

This blog post performs address validation with the Amazon Location Service, demonstrating both geocoding and reverse geocoding capabilities.

Using a serverless architecture with S3 and Lambda, you can achieve both cost optimization and performance improvement compared with traditional methods of address validation. Using this application, your organization can better understand and harness geospatial data.

For more serverless learning resources, visit Serverless Land.

Building a serverless multi-player game that scales: Part 3

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-multi-player-game-that-scales-part-3/

This post is written by Tim Bruce, Sr. Solutions Architect, DevAx, Chelsie Delecki, Solutions Architect, DNB, and Brian Krygsman, Solutions Architect, Enterprise.

This blog series discusses building a serverless game that scales, using Simple Trivia Service:

  • Part 1 describes the overall architecture, how to deploy to your AWS account, and the different communication methods.
  • Part 2 describes adding automation to the game to help your teams scale.

This post discusses how the game scales to support concurrent users (CCU) under a load test. While this post focuses on Simple Trivia Service, you can apply the concepts to any serverless workload.

To set up the example, see the instructions in the Simple Trivia Service GitHub repo and the README.md file. This example uses services beyond AWS Free Tier and incurs charges. To remove the example from your account, see the README.md file.

Overview

Simple Trivia Service is launching at a new trivia conference. There are 200,000 registered attendees who are invited to play the game during the conference. The developers are following AWS Well-Architected best practice and load test before the launch.

Load testing is the practice of simulating user load to validate the system’s ability to scale. The focus of the load test is the game’s microservices, built using AWS Serverless services, including:

  • Amazon API Gateway and AWS IoT, which provide serverless endpoints, allowing users to interact with the Simple Trivia Service microservices.
  • AWS Lambda, which provides serverless compute services for the microservices.
  • Amazon DynamoDB, which provides a serverless NoSQL database for storing game data.

Preparing for load testing

Defining success criteria is one of the first steps in preparing for a load test. You use success criteria to determine how well the game meets the requirements and includes concurrent users, error rates, and response time. These three criteria help to ensure that your users have a good experience when playing your game.

Excluding one can lead to invalid assumptions about the scale of users that the game can support. If you exclude error rate goals, for example, users may encounter more errors, impacting their experience.

The success criteria used for Simple Trivia Service are:

  • 200,000 concurrent users split across game types.
  • Error rates below 0.05%.
  • 95th percentile synchronous responses under 1 second.

With these identified, you can develop dashboards to report on the targets. Dashboards allow you to monitor system metrics over the course of load tests. You can develop dashboards using Amazon CloudWatch dashboards, using custom widgets that organize and display metrics.

Common metrics to monitor include:

  • Error rates – total errors / total invocations.
  • Throttles – invocations resulting in 429 errors.
  • Percentage of quota usage – usage against your game’s Service Quotas.
  • Concurrent execution counts – maximum concurrent Lambda invocations.
  • Provisioned concurrency invocation rate – provisioned concurrency spillover invocation count / provisioned concurrency invocation count.
  • Latency – percentile-based response time, such as 90th and 95th percentiles.

Documentation and other services are also helpful during load testing. Centralized logging via Amazon CloudWatch Logs and AWS CloudTrail provide your team with operational data for the game. This data can help triage issues during testing.

System architecture documents provide key details to help your team focus their work during triage. Amazon DevOps Guru can also provide your team with potential solutions for issues. This uses machine learning to identify operational deviations and deployments and provides recommendations for resolving issues.

A load testing tool simplifies your testing, allowing you to model users playing the game. Popular load testing tools include Apache JMeter, Artillery.io Artillery, and Locust.io Locust. The load testing tool you select can act as your application client and access your endpoints directly.

This example uses Locust to load test Simple Trivia Service based on language and technical requirements. It allows you to accurately model usage and not only generate transactions. In production applications, select a tool that aligns to your team’s skills and meets your technical requirements.

You can place automation around load testing tool to reduce manual effort of running tests. Automation can include allocating environments, deploying and running test scripts, and collecting results. You can include this as part of your continuous integration/continuous delivery (CI/CD) pipeline. You can use the Distributed Load Testing on AWS solution to support Taurus-compatible load testing.

Also, document a plan, working backwards from your goals to help measure your progress. Plans typically use incremental growth of CCU, which can help you to identify constraints in your game. Use your plan while you are in development once portions of your game feature complete.

Testing plan for STS

This shows an example plan for load testing Simple Trivia Service:

  1. Start with individual game testing to validate tests and game modes separately.
  2. Add in testing of the three game modes together, mirroring expected real world activity.

Finally, evaluate your load test and architecture against your AWS Customer Agreement, AWS Acceptable Use Policy, Amazon EC2 Testing Policy, and the AWS Customer Support Policy for Penetration Testing. These policies are put in place to help you to be successful in your load testing efforts. AWS Support requires you to notify them at least two weeks prior to your load test using the Simulated Events Submission Form with the AWS Management Console. This form can also be used if you have questions before your load test.

Additional help for your load test may be available on the AWS Forums, AWS re:Post, or via your account team.

Testing

After triggering a test, automation scales up your infrastructure and initializes the test users. Depending on the number of users you need and their startup behavior, this ramp-up phase can take several minutes. Similarly, when the test run is complete, your test users should ramp down. Unless you have modeled the ramp-up and ramp-down phases to match real-world behavior, exclude these phases from your measurements. If you include them, you may optimize for unrealistic user behavior.

While tests are running, let metrics normalize before drawing conclusions. Services may report data at different rates. Investigate when you find metrics that cross your acceptable thresholds. You may need to make adjustments like adding Lambda Provisioned Concurrency or changing application code to resolve constraints. You may even need to re-evaluate your requirements based on how the system performs. When you make changes, re-test to verify any changes had the impact you expected before continuing with your plan.

Finally, keep an organized record of the inputs and outputs of tests, including dashboard exports and your own observations. This record is valuable when sharing test outcomes and comparing test runs. Mark your progress against the plan to stay on track.

Analyzing and improving Simple Trivia Service performance

Running the test plan, using observability tools to measure performance, finds opportunities to tune performance bottlenecks.

In this example, during single player individual tests, the dashboards show acceptable latency values. As the test size grows, increasing read capacity for retrieving leaderboards indicates a tuning opportunity:

Dashboard reads 1

Dashboard reads 2

  1. The CloudWatch dashboard reveals that the LeaderboardGet function is leading to high consumed read capacity for the Players DynamoDB table. A process within the function is querying scores and player records with every call to load avatar URLs
  2. Standardizing the player avatar URL process within the code reduces reads from the table. The update improves DynamoDB reads.

Moving into the full test phase of the plan with combined game types identified additional areas for performance optimization. In one case, dashboards highlight unexpected error rates for a Lambda function. Consulting function logs and DevOps Guru to triage the behavior, these show a downstream issue with an Amazon Kinesis Data Stream:

Identifying error rates in dashboards

  1. DevOps Guru, within an insight, highlights the problem of the Kinesis:WriteProvisionedThroughputExceeded metric during our test window
  2. DevOps Guru also correlates that metric with the Kinesis:GetRecords.Latency metric.

DevOps Guru also links to a recommendation for Kinesis Data Streams to troubleshoot and resolve the incident with the data stream. Following this advice helps to resolve the Lambda error rates during the next test.

Load testing results

By following the plan, making incremental changes as optimizations became apparent, you can reach the goals.

Table of results

The preceding table is a summary of data from Amazon CloudWatch Lambda Insights and statistics captured from Locust:

  1. The test exceeded the goal of 200k CCU with a combined total of 236,820 CCU.
  2. Less than 0.05% error rate with a combined average of 0.010%.
  3. Performance goals are achieved without needing Provisioned Concurrency in Lambda.

Function latency

Function concurrency and throttles

  1. The function latency goal of < 1 second is met, based on data from CloudWatch Lambda Insights.
  2. Function concurrency is below Service Quotas for Lambda during the test, based on data from our custom CloudWatch dashboard.

Conclusion

This post discusses how to perform a load test on a serverless workload. The process was used to validate a scale of Simple Trivia Service, a single- and multi-player game built using a serverless-first architecture on AWS. The results show a scale of over 220,000 CCUs while maintaining less than 1-second response time and an error rate under 0.05%.

For more serverless learning resources, visit Serverless Land.

Using an Amazon MQ network of broker topologies for distributed microservices

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-an-amazon-mq-network-of-broker-topologies-for-distributed-microservices/

This post is written by Suranjan Choudhury Senior Manager SA and Anil Sharma, Apps Modernization SA.

This blog looks at ActiveMQ topologies that customers can evaluate when planning hybrid deployment architectures spanning AWS Regions and customer data centers, using a network of brokers. A network of brokers can have brokers on-premises and Amazon MQ brokers on AWS.

Distributing broker nodes across AWS and on-premises allows for messaging infrastructure to scale, provide higher performance, and improve reliability. This post also explains a topology spanning two Regions and demonstrates how to deploy on AWS.

A network of brokers is composed of multiple simultaneously active single-instance brokers or active/standby brokers. A network of brokers provides a large-scale messaging fabric in which multiple brokers are networked together. It allows a system to survive the failure of a broker. It also allows distributed messaging. Applications on remote, disparate networks can exchange messages with each other. A network of brokers helps to scale the overall broker throughput in your network, providing increased availability and performance.

Types of ActiveMQ topologies

Network of brokers can be configured in a variety of topologies – for example, mesh, concentrator, and hub and spoke. The topology depends on requirements such as security and network policies, reliability, scaling and throughput, and management and operational overheads. You can configure individual brokers to operate as a single broker or in an active/standby configuration.

Mesh topology

Mesh topology

A mesh topology provides multiple brokers that are all connected to each other. This example connects three single-instance brokers, but you can configure more brokers as a mesh. The mesh topology needs subnet security group rules to be opened for allowing brokers in internal subnets to communicate with brokers in external subnets.

For scaling, it’s simpler to add new brokers for incrementing overall broker capacity. The mesh topology by design offers higher reliability with no single point of failure. Operationally, adding or deleting of nodes requires broker re-configuration and restarting the broker service.

Concentrator topology

Concentrator topology

In a concentrator topology, you deploy brokers in two (or more) layers to funnel incoming connections into a smaller collection of services. This topology allows segmenting brokers into internal and external subnets without any additional security group changes. If additional capacity is needed, you can add new brokers without needing to update other brokers’ configurations. The concentrator topology provides higher reliability with alternate paths for each broker. This enables hybrid deployments with lower operational overheads.

Hub and spoke topology

Hub and spoke topology

A hub and spoke topology preserves messages if there is disruption to any broker on a spoke. Messages are forwarded throughout and only the central Broker1 is critical to the network’s operation. Subnet security group rules must be opened to allow brokers in internal subnets to communicate with brokers in external subnets.

Adding brokers for scalability is constrained by the hub’s capacity. Hubs are a single point of failure and should be configured as active-standby to increase reliability. In this topology, depending on the location of the hub, there may be increased bandwidth needs and latency challenges.

Using a concentrator topology for large-scale hybrid deployments

When planning deployments spanning AWS and customer data centers, the starting point is the concentrator topology. The brokers are deployed in tiers such that brokers in each tier connect to fewer brokers at the next tier. This allows you to funnel connections and messages from a large number of producers to a smaller number of brokers. This concentrates messages at fewer subscribers:

Hybrid deployment topology

Deploying ActiveMQ brokers across Regions and on-premises

When placing brokers on-premises and in the AWS Cloud in a hybrid network of broker topologies, security and network routing are key. The following diagram shows a typical hybrid topology:

Typical hybrid topology

Amazon MQ brokers on premises are placed behind a firewall. They can communicate to Amazon MQ brokers through an IPsec tunnel terminating on the on-premises firewall. On the AWS side, this tunnel terminates on an AWS Transit Gateway (TGW). The TGW routes all network traffic to a firewall in AWS in a service VPC.

The firewall inspects the network traffic and routes all inspected traffic sent back to the transit gateway. The TGW, based on routing configured, sends the traffic to the Amazon MQ broker in the application VPC. This broker concentrates messages from Amazon MQ brokers hosted on AWS. The on premises brokers and the AWS brokers form a hybrid network of brokers that spans AWS and customer data center. This allows applications and services to communicate securely. This architecture exposes only the concentrating broker to receive and send messages to the broker on premises. The applications are protected from outside, non-validated network traffic.

This blog shows how to create a cross-Region network of brokers. This topology removes multiple brokers in the internal subnet. However, in a production environment, you have multiple brokers’ internal subnets catering to multiple producers and consumers. This topology spans an AWS Region and an on-premises customer data center represented in a second AWS Region:

Cross-Region topology

Best practices for configuring network of brokers

Client-side failover

In a network of brokers, failover transport configures a reconnect mechanism on top of the transport protocols. The configuration allows you to specify multiple URIs to connect to. An additional configuration using the randomize transport option allows for random selection of the URI when re-establishing a connection.

The example Lambda functions provided in this blog use the following configuration:

//Failover URI
failoverURI = "failover:(" + uri1 + "," + uri2 + ")?randomize=True";

Broker side failover

Dynamic failover allows a broker to receive a list of all other brokers in the network. It can use the configuration to update producer and consumer clients with this list. The clients can update to rebalance connections to these brokers.

In the broker configuration in this blog, the following configuration is set up:

<transportConnectors> <transportConnector name="openwire" updateClusterClients="true" updateClusterClientsOnRemove = "false" rebalanceClusterClients="true"/> </transportConnectors>

Network connector properties – TTL and duplex

TTL values allow messages to traverse through the network. There are two TTL values – messageTTL and consumerTTL. Another way is to set up the network TTL, which sets both the message and consumer TTL.

The duplex option allows for creating a bidirectional path between two brokers for sending and receiving messages. This blog uses the following configuration:

<networkConnector name="connector_1_to_3" networkTTL="5" uri="static:(ssl://xxxxxxxxx.mq.us-east-2.amazonaws.com:61617)" userName="MQUserName"/>

Connection pooling for producers

In the example Lambda function, a pooled connection factory object is created to optimize connections to broker:

// Create a conn factory

final ActiveMQSslConnectionFactory connFacty = new ActiveMQSslConnectionFactory(failoverURI);
connFacty.setConnectResponseTimeout(10000);
return connFacty;

// Create a pooled conn factory

final PooledConnectionFactory pooledConnFacty = new PooledConnectionFactory();
pooledConnFacty.setMaxConnections(10);
pooledConnFacty.setConnectionFactory(connFacty);
return pooledConnFacty;

Deploying the example solution

  1. Create an IAM role for Lambda by following the steps at https://github.com/aws-samples/aws-mq-network-of-brokers#setup-steps.
  2. Create the network of brokers in the first Region. Navigate to the CloudFormation console and choose Create stack:
    Stack name
  3. Provide the parameters for the network configuration section:
    Network configuration
  4. In the Amazon MQ configuration section, configure the following parameters. Ensure that these two parameter values are the same in both Regions.
    MQ configuration
  5. Configure the following in the Lambda configuration section. Deploy mqproducer and mqconsumer in two separate Regions:
    Lambda Configuration
  6. Create the network of brokers in the second Region. Repeat step 2 to create the network of brokers in the second Region. Ensure that the VPC CIDR in region2 is different than the one in region1. Ensure that the user name and password are the same as in the first Region.
  7. Complete VPC peering and updating route tables:
    1. Follow the steps here to complete VPC peering between the two VPCs.
    2. Update the route tables in both the VPC.
    3. Enable DNS resolution for the peering connection.
      Route table
  8. Configure the network of brokers and create network connectors:
    1. In region1, choose Broker3. In the Connections section, copy the endpoint for the openwire protocol.
    2. In region2 on broker3, set up the network of brokers using the networkConnector configuration element.
      Brokers
    3. Edit the configuration revision and add a new NetworkConnector within the NetworkConnectors section. Replace the uri with the URI for the broker3 in region1.
      <networkConnector name="broker3inRegion2_to_ broker3inRegion1" duplex="true" networkTTL="5" userName="MQUserName" uri="static:(ssl://b-123ab4c5-6d7e-8f9g-ab85-fc222b8ac102-1.mq.ap-south-1.amazonaws.com:61617)" />
  9. Send a test message using the mqProducer Lambda function in region1. Invoke the producer Lambda function:
    aws lambda invoke --function-name mqProducer out --log-type Tail --query 'LogResult' --output text | base64 -d

    Terminal output

  10. Receive the test message. In region2, invoke the consumer Lambda function:
    aws lambda invoke --function-name mqConsumer out --log-type Tail --query 'LogResult' --output text | base64 -d

    Terminal response

The message receipt confirms that the message has crossed the network of brokers from region1 to region2.

Cleaning up

To avoid incurring ongoing charges, delete all the resources by following the steps at https://github.com/aws-samples/aws-mq-network-of-brokers#clean-up.

Conclusion

This blog explains the choices when designing a cross-Region or a hybrid network of brokers architecture that spans AWS and your data centers. The example starts with a concentrator topology and enhances that with a cross-Region design to help address network routing and network security requirements.

The blog provides a template that you can modify to suit specific network and distributed application scenarios. It also covers best practices when architecting and designing failover strategies for a network of brokers or when developing producers and consumers client applications.

The Lambda functions used as producer and consumer applications demonstrate best practices in designing and developing ActiveMQ clients. This includes storing and retrieving parameters, such as passwords from the AWS Systems Manager.

For more serverless learning resources, visit Serverless Land.

Introducing Amazon Simple Queue Service dead-letter queue redrive to source queues

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-amazon-simple-queue-service-dead-letter-queue-redrive-to-source-queues/

This blog post is written by Mark Richman, a Senior Solutions Architect for SMB.

Today AWS is launching a new capability to enhance the dead-letter queue (DLQ) management experience for Amazon Simple Queue Service (SQS). DLQ redrive to source queues allows SQS to manage the lifecycle of unconsumed messages stored in DLQs.

SQS is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. Using Amazon SQS, you can send, store, and receive messages between software components at any volume without losing messages or requiring other services to be available.

To use SQS, a producer sends messages to an SQS queue, and a consumer pulls the messages from the queue. Sometimes, messages can’t be processed due to a number of possible issues. These can include logic errors in consumers that cause message processing to fail, network connectivity issues, or downstream service failures. This can result in unconsumed messages remaining in the queue.

Understanding SQS dead-letter queues (DLQs)

SQS allows you to manage the life cycle of the unconsumed messages using dead-letter queues (DLQs).

A DLQ is a separate SQS queue that one or many source queues can send messages that can’t be processed or consumed. DLQs allow you to debug your application by letting you isolate messages that can’t be processed correctly to determine why their processing didn’t succeed. Use a DLQ to handle message consumption failures gracefully.

When you create a source queue, you can specify a DLQ and the condition under which SQS moves messages from the source queue to the DLQ. This is called the redrive policy. The redrive policy condition specifies the maxReceiveCount. When a producer places messages on an SQS queue, the ReceiveCount tracks the number of times a consumer tries to process the message. When the ReceiveCount for a message exceeds the maxReceiveCount for a queue, SQS moves the message to the DLQ. The original message ID is retained.

For example, a source queue has a redrive policy with maxReceiveCount set to 5. If the consumer of the source queue receives a message 6, without successfully consuming it, SQS moves the message to the dead-letter queue.

You can configure an alarm to alert you when any messages are delivered to a DLQ. You can then examine logs for exceptions that might have caused them to be delivered to the DLQ. You can analyze the message contents to diagnose consumer application issues. Once the issue has been resolved and the consumer application recovers, these messages can be redriven from the DLQ back to the source queue to process them successfully.

Previously, this required dedicated operational cycles to review and redrive these messages back to their source queue.

DLQ redrive to source queues

DLQ redrive to source queues enables SQS to manage the second part of the lifecycle of unconsumed messages that are stored in DLQs. Once the consumer application is available to consume the failed messages, you can now redrive the messages from the DLQ back to the source queue. You can optionally review a sample of the available messages in the DLQ. You redrive the messages using the Amazon SQS console. This allows you to more easily recover from application failures.

Using redrive to source queues

To show how to use the new functionality there is an existing standard source SQS queue called MySourceQueue.

SQS does not create DLQs automatically. You must first create an SQS queue and then use it as a DLQ. The DLQ must be in the same region as the source queue.

Create DLQ

  1. Navigate to the SQS Management Console and create a standard SQS queue for the DLQ called MyDLQ. Use the default configuration. Refer to the SQS documentation for instructions on creating a queue.
  2. Navigate to MySourceQueue and choose Edit.
  3. Navigate to the Dead-letter queue section and choose Enabled.
  4. Select the Amazon Resource Name (ARN) of the MyDLQ queue you created previously.
  5. You can configure the number of times that a message can be received before being sent to a DLQ by setting Set Maximum receives to a value between 1 and 1,000. For this demo enter a value of 1 to immediately drive messages to the DLQ.
  6. Choose Save.
Configure source queue with DLQ

Configure source queue with DLQ

The console displays the Details page for the queue. Within the Dead-letter queue tab, you can see the Maximum receives value and DLQ ARN.

DLQ configuration

DLQ configuration

Send and receive test messages

You can send messages to test the functionality in the SQS console.

  1. Navigate to MySourceQueue and choose Send and receive messages
  2. Send a number of test messages by entering the message content in Message body and choosing Send message.
  3. Send and receive messages

    Send and receive messages

  4. Navigate to the Receive messages section where you can see the number of messages available.
  5. Choose Poll for messages. The Maximum message count is set to 10 by default If you sent more than 10 test messages, poll multiple times to receive all the messages.
Poll for messages

Poll for messages

All the received messages are sent to the DLQ because the maxReceiveCount is set to 1. At this stage you would normally review the messages. You would determine why their processing didn’t succeed and resolve the issue.

Redrive messages to source queue

Navigate to the list of all queues and filter if required to view the DLQ. The queue displays the approximate number of messages available in the DLQ. For standard queues, the result is approximate because of the distributed architecture of SQS. In most cases, the count should be close to the actual number of messages in the queue.

Messages available in DLQ

Messages available in DLQ

  1. Select the DLQ and choose Start DLQ redrive.
  2. DLQ redrive

    DLQ redrive

    SQS allows you to redrive messages either to their source queue(s) or to a custom destination queue.

  3. Choose to Redrive to source queue(s), which is the default.
  4. Redrive has two velocity control settings.

  • System optimized sends messages back to the source queue as fast as possible
  • Custom max velocity allows SQS to redrive messages with a custom maximum rate of messages per second. This feature is useful for minimizing the impact to normal processing of messages in the source queue.

You can optionally inspect messages prior to redrive.

  • To redrive the messages back to the source queue, choose DLQ redrive.
  • DLQ redrive

    DLQ redrive

    The Dead-letter queue redrive status panel shows the status of the redrive and percentage processed. You can refresh the display or cancel the redrive.

    Dead-letter queue redrive status

    Dead-letter queue redrive status

    Once the redrive is complete, which takes a few seconds in this example, the status reads Successfully completed.

    Redrive status completed

    Redrive status completed

  • Navigate back to the source queue and you can see all the messages are redriven back from the DLQ to the source queue.
  • Messages redriven from DLQ to source queue

    Messages redriven from DLQ to source queue

    Conclusion

    Dead-letter queue redrive to source queues allows you to effectively manage the life cycle of unconsumed messages stored in dead-letter queues. You can build applications with the confidence that you can easily examine unconsumed messages, recover from errors, and reprocess failed messages.

    You can redrive messages from their DLQs to their source queues using the Amazon SQS console.

    Dead-letter queue redrive to source queues is available in all commercial regions, and coming soon to GovCloud.

    To get started, visit https://aws.amazon.com/sqs/

    For more serverless learning resources, visit Serverless Land.

    Visualizing AWS Step Functions workflows from the Amazon Athena console

    Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/visualizing-aws-step-functions-workflows-from-the-amazon-athena-console/

    This post is written by Dhiraj Mahapatro, Senior Specialist SA, Serverless.

    In October 2021, AWS announced visualizing AWS Step Functions from the AWS Batch console. Now you can also visualize Step Functions from the Amazon Athena console.

    Amazon Athena is an interactive query service that makes it easier to analyze Amazon S3 data using standard SQL. Athena is a serverless service and can interact directly with data stored in S3. Athena can process unstructured, semistructured, and structured datasets.

    AWS Step Functions is a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Step Functions workflows manage failures, retries, parallelization, service integrations, and observability so builders can focus on business logic. Athena is one of the service integrations that are available for Step Functions.

    This blog walks through Step Functions integration in Amazon Athena console. It shows how you can visualize and operate Athena queries at scale using Step Functions.

    Introducing workflow orchestration in Amazon Athena console

    AWS customers store large amounts of historical data on S3 and query the data using Athena to get results quickly. They also use Athena to process unstructured data or analyze structured data as part of a data processing pipeline.

    Data processing involves discrete steps for ingesting, processing, storing the transformed data, and post-processing, such as visualizing or analyzing the transformed data. Each step involves multiple AWS services. With Step Functions workflow integration, you can orchestrate these steps. This helps to create repeatable and scalable data processing pipelines as part of a larger business application and visualize the workflows in the Athena console.

    With Step Functions, you can run queries on a schedule or based on an event by using Amazon EventBridge. You can poll long-running Athena queries before moving to the next step in the process, and handle errors without writing custom code. Combining these two services provides developers with a single method that is scalable and repeatable.

    Step Functions workflows in the Amazon Athena console allow orchestration of Athena queries with Step Functions state machines:

    Athena console

    Using Athena query patterns from Step Functions

    Execute multiple queries

    In Athena, you run SQL queries in the Athena console against Athena workgroups. With Step Functions, you can run Athena queries in a sequence or run independent queries simultaneously in parallel using a parallel state. Step Functions also natively handles errors and retries related to Athena query tasks.

    Workflow orchestration in the Athena console provides these capabilities to run and visualize multiple queries in Step Functions. For example:

    UI flow

    1. Choose Get Started from Execute multiple queries.
    2. From the pop-up, choose Create your own workflow and select Continue.

    A new browser tab opens with the Step Functions Workflow Studio. The designer shows a workflow pattern template pre-created. The workflow loads data from a data source running two Athena queries in parallel. The results are then published to an Amazon SNS topic.

    Alternatively, choosing Deploy a sample project from the Get Started pop-up deploys a sample Step Functions workflow.

    Get started flow

    This option creates a state machine. You then review the workflow definition, deploy an AWS CloudFormation stack, and run the workflow in the Step Functions console.

    Deploy and run

    Once deployed, the state machine is visible in the Step Functions console as:

    State machines

    Select the AthenaMultipleQueriesStateMachine to land on the details page:

    Details page

    The CloudFormation template provisions the required AWS Glue database, S3 bucket, an Athena workgroup, the required AWS Lambda functions, and the SNS topic for query results notification.

    To see the Step Functions workflow in action, choose Start execution. Keep the optional name and input and choose Start execution:

    Start execution

    The state machine completes the tasks successfully by Executing multiple queries in parallel using Amazon Athena and Sending query results using the SNS topic:

    Successful execution

    The state machine used the Amazon Athena StartQueryExecution and GetQueryResults tasks. The Workflow orchestration in Athena console now highlights this newly created Step Functions state machine:

    Athena console

    Any state machine that uses this task in Step Functions in this account is listed here as a state machine that orchestrates Athena queries.

    Query large datasets

    You can also ingest an extensive dataset in Amazon S3, partition it using AWS Glue crawlers, then run Amazon Athena queries against that partition.

    Select Get Started from the Query large datasets pop-up, then choose Create your own workflow and Continue. This action opens the Step Functions Workflow studio with the following pattern. The Glue crawler starts and partitions large datasets for Athena to query in the subsequent query execution task:

    Query large datasets

    Step Functions allows you to combine Glue crawler tasks and Athena queries to partition where necessary before querying and publishing the results.

    Keeping data up to date

    You can also use Athena to query a target table to fetch data, then update it with new data from other sources using Step Functions’ choice state. The choice state in Step Functions provides branching logic for a state machine.

    Keep data up to date

    You are not limited to the previous three patterns shown in workflow orchestration in the Athena console. You can start from scratch and build Step Functions state machine by navigating to the bottom right and using Create state machine:

    Create state machine

    Create State Machine in the Athena console opens a new tab showing the Step Functions console’s Create state machine page.

    Choosing authoring method

    Refer to building a state machine AWS Step Functions Workflow Studio for additional details.

    Step Functions integrates with all Amazon Athena’s API actions

    In September 2021, Step Functions announced integration support for 200 AWS services to enable easier workflow automation. With this announcement, Step Functions can integrate with all Amazon Athena API actions today.

    Step Functions can automate the lifecycle of an Athena query: Create/read/update/delete/list workGroups; Create/read/update/delete/list data catalogs, and more.

    Other AWS service integrations

    Step Functions’ integration with the AWS SDK provides support for 200 AWS Services and over 9,000 API actions. Athena tasks in Step Functions can evolve by integrating available AWS services in the workflow for their pre and post-processing needs.

    For example, you can read Athena query results that are put to an S3 bucket by using a GetObject S3 task AWS SDK integration in Step Functions. You can combine different AWS services into a single business process so that they can ingest data through Amazon Kinesis, do processing via AWS Lambda or Amazon EMR jobs, and send notifications to interested parties via Amazon SNS or Amazon SQS or Amazon EventBridge to trigger other parts of their business application.

    There are multiple ways to decorate around an Amazon Athena job task. Refer to AWS SDK service integrations and optimized integrations for Step Functions for additional details.

    Important considerations

    Workflow orchestrations in the Athena console only show Step Functions state machines that use Athena’s optimized API integrations. This includes StartQueryExecution, StopQueryExecution, GetQueryExecution, and GetQueryResults.

    Step Functions state machines do not show in the Athena console when:

    1. A state machine uses any other AWS SDK Athena API integration task.
    2. The APIs are invoked inside a Lambda function task using an AWS SDK client (like Boto3 or Node.js or Java).

    Cleanup

    First, empty DataBucket and AthenaWorkGroup to delete the stack successfully. To delete the sample application stack, use the latest version of AWS CLI and run:

    aws cloudformation delete-stack --stack-name <stack-name>

    Alternatively, delete the sample application stack in the CloudFormation console by selecting the stack and choosing Delete:

    Stack deletion

    Conclusion

    Amazon Athena console now provides an integration with AWS Step Functions’ workflows. You can use the provided patterns to create and visualize Step Functions’ workflows directly from the Amazon Athena console. Step Functions’ workflows that use Athena’s optimized API integration appear in the Athena console. To learn more about Amazon Athena, read the user guide.

    To get started, open the Workflows page in the Athena console. Select Create Athena jobs with Step Functions Workflows to deploy a sample project, if you are new to Step Functions.

    For more serverless learning resources, visit Serverless Land.

    Offset lag metric for Amazon MSK as an event source for Lambda

    Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/offset-lag-metric-for-amazon-msk-as-an-event-source-for-lambda/

    This post written by Adam Wagner, Principal Serverless Solutions Architect.

    Last year, AWS announced support for Amazon Managed Streaming for Apache Kafka (MSK) and self-managed Apache Kafka clusters as event sources for AWS Lambda. Today, AWS adds a new OffsetLag metric to Lambda functions with MSK or self-managed Apache Kafka event sources.

    Offset in Apache Kafka is an integer that marks the current position of a consumer. OffsetLag is the difference in offset between the last record written to the Kafka topic and the last record processed by Lambda. Kafka expresses this in the number of records, not a measure of time. This metric provides visibility into whether your Lambda function is keeping up with the records added to the topic it is processing.

    This blog walks through using the OffsetLag metric along with other Lambda and MSK metrics to understand your streaming application and optimize your Lambda function.

    Overview

    In this example application, a producer writes messages to a topic on the MSK cluster that is an event source for a Lambda function. Each message contains a number and the Lambda function finds the factors of that number. It outputs the input number and results to an Amazon DynamoDB table.

    Finding all the factors of a number is fast if the number is small but takes longer for larger numbers. This difference means the size of the number written to the MSK topic influences the Lambda function duration.

    Example application architecture

    Example application architecture

    1. A Kafka client writes messages to a topic in the MSK cluster.
    2. The Lambda event source polls the MSK topic on your behalf for new messages and triggers your Lambda function with batches of messages.
    3. The Lambda function factors the number in each message and then writes the results to DynamoDB.

    In this application, several factors can contribute to offset lag. The first is the volume and size of messages. If more messages are coming in, the Lambda may take longer to process them. Other factors are the number of partitions in the topic, and the number of concurrent Lambda functions processing messages. A full explanation of how Lambda concurrency scales with the MSK event source is in the documentation.

    If the average duration of your Lambda function increases, this also tends to increase the offset lag. This lag could be latency in a downstream service or due to the complexity of the incoming messages. Lastly, if your Lambda function errors, the MSK event source retries the identical records set until they succeed. This retry functionality also increases offset lag.

    Measuring OffsetLag

    To understand how the new OffsetLag metric works, you first need a working MSK topic as an event source for a Lambda function. Follow this blog post to set up an MSK instance.

    To find the OffsetLag metric, go to the CloudWatch console, select All Metrics from the left-hand menu. Then select Lambda, followed by By Function Name to see a list of metrics by Lambda function. Scroll or use the search bar to find the metrics for this function and select OffsetLag.

    OffsetLag metric example

    OffsetLag metric example

    To make it easier to look at multiple metrics at once, create a CloudWatch dashboard starting with the OffsetLag metric. Select Actions -> Add to Dashboard. Select the Create new button, provide the dashboard a name. Choose Create, keeping the rest of the options at the defaults.

    Adding OffsetLag to dashboard

    Adding OffsetLag to dashboard

    After choosing Add to dashboard, the new dashboard appears. Choose the Add widget button to add the Lambda duration metric from the same function. Add another widget that combines both Lambda errors and invocations for the function. Finally, add a widget for the BytesInPerSec metric for the MSK topic. Find this metric under AWS/Kafka -> Broker ID, Cluster Name, Topic. Finally, click Save dashboard.

    After a few minutes, you see a steady stream of invocations, as you would expect when consuming from a busy topic.

    Data incoming to dashboard

    Data incoming to dashboard

    This example is a CloudWatch dashboard showing the Lambda OffsetLag, Duration, Errors, and Invocations, along with the BytesInPerSec for the MSK topic.

    In this example, the OffSetLag metric is averaging about eight, indicating that the Lambda function is eight records behind the latest record in the topic. While this is acceptable, there is room for improvement.

    The first thing to look for is Lambda function errors, which can drive up offset lag. The metrics show that there are no errors so the next step is to evaluate and optimize the code.

    The Lambda handler function loops through the records and calls the process_msg function on each record:

    def lambda_handler(event, context):
        for batch in event['records'].keys():
            for record in event['records'][batch]:
                try:
                    process_msg(record)
                except:
                    print("error processing record:", record)
        return()

    The process_msg function handles base64 decoding, calls a factor function to factor the number, and writes the record to a DynamoDB table:

    def process_msg(record):
        #messages are base64 encoded, so we decode it here
        msg_value = base64.b64decode(record['value']).decode()
        msg_dict = json.loads(msg_value)
        #using the number as the hash key in the dynamodb table
        msg_id = f"{msg_dict['number']}"
        if msg_dict['number'] <= MAX_NUMBER:
            factors = factor_number(msg_dict['number'])
            print(f"number: {msg_dict['number']} has factors: {factors}")
            item = {'msg_id': msg_id, 'msg':msg_value, 'factors':factors}
            resp = ddb_table.put_item(Item=item)
        else:
            print(f"ERROR: {msg_dict['number']} is >= limit of {MAX_NUMBER}")
    

    The heavy computation takes place in the factor function:

    def factor(number):
        factors = [1,number]
        for x in range(2, (int(1 + number / 2))):
            if (number % x) == 0:
                factors.append(x)
        return factors

    The code loops through all numbers up to the factored number divided by two. The code is optimized by only looping up to the square root of the number.

    def factor(number):
        factors = [1,number]
        for x in range(2, 1 + int(number**0.5)):
            if (number % x) == 0:
                factors.append(x)
                factors.append(number // x)
        return factors

    There are further optimizations and libraries for factoring numbers but this provides a noticeable performance improvement in this example.

    Data after optimization

    Data after optimization

    After deploying the code, refresh the metrics after a while to see the improvements:

    The average Lambda duration has dropped to single-digit milliseconds and the OffsetLag is now averaging two.

    If you see a noticeable change in the OffsetLag metric, there are several things to investigate. The input side of the system, increased messages per second, or a significant increase in the size of the message are a few options.

    Conclusion

    This post walks through implementing the OffsetLag metric to understand latency between the latest messages in the MSK topic and the records a Lambda function is processing. It also reviews other metrics that help understand the underlying cause of increases to the offset lag. For more information on this topic, refer to the documentation and other MSK Lambda metrics.

    For more serverless learning resources, visit Serverless Land.

    Expanding cross-Region event routing with Amazon EventBridge

    Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/expanding-cross-region-event-routing-with-amazon-eventbridge/

    This post is written by Stephen Liedig, Sr Serverless Specialist SA.

    In April 2021, AWS announced a new feature for Amazon EventBridge that allows you to route events from any commercial AWS Region to US East (N. Virginia), US West (Oregon), and Europe (Ireland). From today, you can now route events between any AWS Regions, except AWS GovCloud (US) and China.

    EventBridge enables developers to create event-driven applications by routing events between AWS services, integrated software as a service (SaaS) applications, and your own applications. This helps you produce loosely coupled, distributed, and maintainable architectures. With these new capabilities, you can now route events across Regions and accounts using the same model used to route events to existing targets.

    Cross-Region event routing with Amazon EventBridge makes it easier for customers to develop multi-Region workloads to:

    • Centralize your AWS events into one Region for auditing and monitoring purposes, such as aggregating security events for compliance reasons in a single account.
    • Replicate events from source to destinations Regions to help synchronize data in cross-Region data stores.
    • Invoke asynchronous workflows in a different Region from a source event. For example, you can load balance from a target Region by routing events to another Region.

    A previous post shows how cross-Region routing works. This blog post expands on these concepts and discusses a common use case for cross-Region event delivery – event auditing. This example explores how you can manage resources using AWS CloudFormation and EventBridge resource policies.

    Multi-Region event auditing example walkthrough

    Compliance is an important part of building event-driven applications and reacting to any potential policy or security violations. Customers use EventBridge to route security events from applications and globally distributed infrastructure into a single account for analysis. In many cases, they share specific AWS CloudTrail events with security teams. Customers also audit events from their custom-built applications to monitor sensitive data usage.

    In this scenario, a company has their base of operations located in Asia Pacific (Singapore) with applications distributed across US East (N. Virginia) and Europe (Frankfurt). The applications in US East (N. Virginia) and Europe (Frankfurt) are using EventBridge for their respective applications and services. The security team in Asia Pacific (Singapore) wants to analyze events from the applications and CloudTrail events for specific API calls to monitor infrastructure security.

    Reference architecture

    To create the rules to receive these events:

    1. Create a new set of rules directly on all the event buses across the global infrastructure. Alternatively, delegate the responsibility of managing security rules to distributed teams that manage the event bus resources.
    2. Provide the security team with the ability to manage rules centrally, and control the lifecycle of rules on the global infrastructure.

    Allowing the security team to manage the resources centrally provides more scalability. It is more consistent with the design principle that event consumers own and manage the rules that define how they process events.

    Deploying the example application

    The following code snippets are shortened for brevity. The full source code of the solution is in the GitHub repository. The solution uses AWS Serverless Application Model (AWS SAM) for deployment. Clone the repo and navigate to the solution directory:

    git clone https://github.com/aws-samples/amazon-eventbridge-resource-policy-samples
    cd ./patterns/ cross-region-cross-account-pattern/

    To allow the security team to start receiving accounts from any of the cross-Region accounts:

    1. Create a security event bus in the Asia Pacific (Singapore) Region with a rule that processes events from the respective event sources.

    ap-southest-1 architecture

    For simplicity, this example uses an Amazon CloudWatch Logs target to visualize the events arriving from cross-Region accounts:

     SecurityEventBus:
      Type: AWS::Events::EventBus
      Properties:
       Name: !Ref SecurityEventBusName
    
     # This rule processes events coming in from cross-Region accounts
     SecurityAnalysisRule:
      Type: AWS::Events::Rule
      Properties:
       Name: SecurityAnalysisRule
       Description: Analyze events from cross-Region event buses
       EventBusName: !GetAtt SecurityEventBus.Arn
       EventPattern:
        source:
         - anything-but: com.company.security
       State: ENABLED
       RoleArn: !GetAtt WriteToCwlRole.Arn
       Targets:
        - Id: SendEventToSecurityAnalysisRule
         Arn: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:${SecurityAnalysisRuleTarget}"
    

    In this example, you set the event pattern to process any event from a source that is not from the security team’s own domain. This allows you to process events from any account in any Region. You can filter this further as needed.

    2. Set an event bus policy on each default and custom event bus that the security team must receive events from.

    Event bus policy

    This policy allows the security team to create rules to route events to its own security event bus in the Asia Pacific (Singapore) Region. The following policy defines a custom event bus in Account 2 in US East (N. Virginia) and an AWS::Events::EventBusPolicy that sets the Principal as the security team account.

    This allows the security team to manage rules on the CustomEventBus:

     CustomEventBus: 
      Type: AWS::Events::EventBus
      Properties: 
        Name: !Ref EventBusName
    
     SecurityServiceRuleCreationStatement:
      Type: AWS::Events::EventBusPolicy
      Properties:
       EventBusName: !Ref CustomEventBus # If you omit this, the default event bus is used.
       StatementId: "AllowCrossRegionRulesForSecurityTeam"
       Statement:
        Effect: "Allow"
        Principal:
         AWS: !Sub "arn:aws:iam::${SecurityAccountNo}:root"
        Action:
         - "events:PutRule"
         - "events:DeleteRule"
         - "events:DescribeRule"
         - "events:DisableRule"
         - "events:EnableRule"
         - "events:PutTargets"
         - "events:RemoveTargets"
        Resource:
         - !Sub 'arn:aws:events:${AWS::Region}:${AWS::AccountId}:rule/${CustomEventBus.Name}/*'
        Condition:
         StringEqualsIfExists:
          "events:creatorAccount": "${aws:PrincipalAccount}"
    

    3. With the policies set on the cross-Region accounts, now create the rules. Because you cannot create CloudFormation resources across Regions, you must define the rules in separate templates. This also gives the ability to expand to other Regions.

    Once the template is deployed to the cross-Region accounts, use EventBridge resource policies to propagate rule definitions across accounts in the same Region. The security account must have permission to create CloudFormation resources in the cross-Region accounts to deploy the rule templates.

    Resource policies

    There are two parts to the rule templates. The first specifies a role that allows EventBridge to assume a role to send events to the target event bus in the security account:

     # This IAM role allows EventBridge to assume the permissions necessary to send events
     # from the source event buses to the destination event bus.
     SourceToDestinationEventBusRole:
      Type: "AWS::IAM::Role"
      Properties:
       AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
         - Effect: Allow
          Principal:
           Service:
            - events.amazonaws.com
          Action:
           - "sts:AssumeRole"
       Path: /
       Policies:
        - PolicyName: PutEventsOnDestinationEventBus
         PolicyDocument:
          Version: 2012-10-17
          Statement:
           - Effect: Allow
            Action: "events:PutEvents"
            Resource:
             - !Ref SecurityEventBusArn
    

    The second is the definition of the rule resource. This requires the Amazon Resource Name (ARN) of the event bus where you want to put the rule, the ARN of the target event bus in the security account, and a reference to the SourceToDestinationEventBusRole role:

     SecurityAuditRule2:
      Type: AWS::Events::Rule
      Properties:
       Name: SecurityAuditRuleAccount2
       Description: Audit rule for the Security team in Singapore
       EventBusName: !Ref EventBusArnAccount2 # ARN of the custom event bus in Account 2
       EventPattern:
        source:
         - com.company.marketing
       State: ENABLED
       Targets:
        - Id: SendEventToSecurityEventBusArn
         Arn: !Ref SecurityEventBusArn
         RoleArn: !GetAtt SourceToDestinationEventBusRole.Arn
    

    You can use the AWS SAM CLI to deploy this:

    sam deploy -t us-east-1-rules.yaml \
      --stack-name us-east-1-rules \
      --region us-east-1 \
      --profile default \
      --capabilities=CAPABILITY_IAM \
      --parameter-overrides SecurityEventBusArn="arn:aws:events:ap-southeast-1:111111111111:event-bus/SecurityEventBus" EventBusArnAccount1="arn:aws:events:us-east-1:111111111111:event-bus/default" EventBusArnAccount2="arn:aws:events:us-east-1:222222222222:event-bus/custom-eventbus-account-2"
    

    Testing the example application

    With the rules deployed across the Regions, you can test by sending events to the event bus in Account 2:

    1. Navigate to the applications/account_2 directory. Here you find an events.json file, which you use as input for the put-events API call.
    2. Run the following command using the AWS CLI. This sends messages to the event bus in us-east-1 which are routed to the security event bus in ap-southeast-1:
      aws events put-events \
       --region us-east-1 \
       --profile [NAMED PROFILE FOR ACCOUNT 2] \
       --entries file://events.json
      

      If you have run this successfully, you see:
      Entries:
      - EventId: a423b35e-3df0-e5dc-b854-db9c42144fa2
      - EventId: 5f22aea8-51ea-371f-7a5f-8300f1c93973
      - EventId: 7279fa46-11a6-7495-d7bb-436e391cfcab
      - EventId: b1e1ecc1-03f7-e3ef-9aa4-5ac3c8625cc7
      - EventId: b68cea94-28e2-bfb9-7b1f-9b2c5089f430
      - EventId: fc48a303-a1b2-bda8-8488-32daa5f809d8
      FailedEntryCount: 0

    3. Navigate to the Amazon CloudWatch console to see a collection of log entries with the events you published. The log group is /aws/events/SecurityAnalysisRule.CloudWatch Logs

    Congratulations, you have successfully sent your first events across accounts and Regions!

    Conclusion

    With cross-Region event routing in EventBridge, you can now route events to and from any AWS Region. This post explains how to manage and configure cross-Region event routing using CloudFormation and EventBridge resource policies to simplify rule propagation across your global event bus infrastructure. Finally, I walk through an example you can deploy to your AWS account.

    For more serverless learning resources, visit Serverless Land.

    Introducing mutual TLS authentication for Amazon MSK as an event source

    Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-mutual-tls-authentication-for-amazon-msk-as-an-event-source/

    This post is written by Uma Ramadoss, Senior Specialist Solutions Architect, Integration.

    Today, AWS Lambda is introducing mutual TLS (mTLS) authentication for Amazon Managed Streaming for Apache Kafka (Amazon MSK) and self-managed Kafka as an event source.

    Many customers use Amazon MSK for streaming data from multiple producers. Multiple subscribers can then consume the streaming data and build data pipelines, analytics, and data integration. To learn more, read Using Amazon MSK as an event source for AWS Lambda.

    You can activate any combination of authentication modes (mutual TLS, SASL SCRAM, or IAM access control) on new or existing clusters. This is useful if you are migrating to a new authentication mode or must run multiple authentication modes simultaneously. Lambda natively supports consuming messages from both self-managed Kafka and Amazon MSK through event source mapping.

    By default, the TLS protocol only requires a server to authenticate itself to the client. The authentication of the client to the server is managed by the application layer. The TLS protocol also offers the ability for the server to request that the client send an X.509 certificate to prove its identity. This is called mutual TLS as both parties are authenticated via certificates with TLS.

    Mutual TLS is a commonly used authentication mechanism for business-to-business (B2B) applications. It’s used in standards such as Open Banking, which enables secure open API integrations for financial institutions. It is one of the popular authentication mechanisms for customers using Kafka.

    To use mutual TLS authentication for your Kafka-triggered Lambda functions, you provide a signed client certificate, the private key for the certificate, and an optional password if the private key is encrypted. This establishes a trust relationship between Lambda and Amazon MSK or self-managed Kafka. Lambda supports self-signed server certificates or server certificates signed by a private certificate authority (CA) for self-managed Kafka. Lambda trusts the Amazon MSK certificate by default as the certificates are signed by Amazon Trust Services CAs.

    This blog post explains how to set up a Lambda function to process messages from an Amazon MSK cluster using mutual TLS authentication.

    Overview

    Using Amazon MSK as an event source operates in a similar way to using Amazon SQS or Amazon Kinesis. You create an event source mapping by attaching Amazon MSK as event source to your Lambda function.

    The Lambda service internally polls for new records from the event source, reading the messages from one or more partitions in batches. It then synchronously invokes your Lambda function, sending each batch as an event payload. Lambda continues to process batches until there are no more messages in the topic.

    The Lambda function’s event payload contains an array of records. Each array item contains details of the topic and Kafka partition identifier, together with a timestamp and base64 encoded message.

    Kafka event payload

    Kafka event payload

    You store the signed client certificate, the private key for the certificate, and an optional password if the private key is encrypted in the AWS Secrets Manager as a secret. You provide the secret in the Lambda event source mapping.

    The steps for using mutual TLS authentication for Amazon MSK as event source for Lambda are:

    1. Create a private certificate authority (CA) using AWS Certificate Manager (ACM) Private Certificate Authority (PCA).
    2. Create a client certificate and private key. Store them as secret in AWS Secrets Manager.
    3. Create an Amazon MSK cluster and a consuming Lambda function using the AWS Serverless Application Model (AWS SAM).
    4. Attach the event source mapping.

    This blog walks through these steps in detail.

    Prerequisites

    1. Creating a private CA.

    To use mutual TLS client authentication with Amazon MSK, create a root CA using AWS ACM Private Certificate Authority (PCA). We recommend using independent ACM PCAs for each MSK cluster when you use mutual TLS to control access. This ensures that TLS certificates signed by PCAs only authenticate with a single MSK cluster.

    1. From the AWS Certificate Manager console, choose Create a Private CA.
    2. In the Select CA type panel, select Root CA and choose Next.
    3. Select Root CA

      Select Root CA

    4. In the Configure CA subject name panel, provide your certificate details, and choose Next.
    5. Provide your certificate details

      Provide your certificate details

    6. From the Configure CA key algorithm panel, choose the key algorithm for your CA and choose Next.
    7. Configure CA key algorithm

      Configure CA key algorithm

    8. From the Configure revocation panel, choose any optional certificate revocation options you require and choose Next.
    9. Configure revocation

      Configure revocation

    10. Continue through the screens to add any tags required, allow ACM to renew certificates, review your options, and confirm pricing. Choose Confirm and create.
    11. Once the CA is created, choose Install CA certificate to activate your CA. Configure the validity of the certificate and the signature algorithm and choose Next.
    12. Configure certificate

      Configure certificate

    13. Review the certificate details and choose Confirm and install. Note down the Amazon Resource Name (ARN) of the private CA for the next section.
    14. Review certificate details

      Review certificate details

    2. Creating a client certificate.

    You generate a client certificate using the root certificate you previously created, which is used to authenticate the client with the Amazon MSK cluster using mutual TLS. You provide this client certificate and the private key as AWS Secrets Manager secrets to the AWS Lambda event source mapping.

    1. On your local machine, run the following command to create a private key and certificate signing request using OpenSSL. Enter your certificate details. This creates a private key file and a certificate signing request file in the current directory.
    2. openssl req -new -newkey rsa:2048 -days 365 -keyout key.pem -out client_cert.csr -nodes
      OpenSSL create a private key and certificate signing request

      OpenSSL create a private key and certificate signing request

    3. Use the AWS CLI to sign your certificate request with the private CA previously created. Replace Private-CA-ARN with the ARN of your private CA. The certificate validity value is set to 300, change this if necessary. Save the certificate ARN provided in the response.
    4. aws acm-pca issue-certificate --certificate-authority-arn Private-CA-ARN --csr fileb://client_cert.csr --signing-algorithm "SHA256WITHRSA" --validity Value=300,Type="DAYS"
    5. Retrieve the certificate that ACM signed for you. Replace the Private-CA-ARN and Certificate-ARN with the ARN you obtained from the previous commands. This creates a signed certificate file called client_cert.pem.
    6. aws acm-pca get-certificate --certificate-authority-arn Private-CA-ARN --certificate-arn Certificate-ARN | jq -r '.Certificate + "\n" + .CertificateChain' >> client_cert.pem
    7. Create a new file called secret.json with the following structure
    8. {
      "certificate":"",
      "privateKey":""
      }
      
    9. Copy the contents of the client_cert.pem in certificate and the content of key.pem in privatekey. Ensure that there are no extra spaces added. The file structure looks like this:
    10. Certificate file structure

      Certificate file structure

    11. Create the secret and save the ARN for the next section.
    aws secretsmanager create-secret --name msk/mtls/lambda/clientcert --secret-string file://secret.json

    3. Setting up an Amazon MSK cluster with AWS Lambda as a consumer.

    Amazon MSK is a highly available service, so it must be configured to run in a minimum of two Availability Zones in your preferred Region. To comply with security best practice, the brokers are usually configured in private subnets in each Region.

    You can use AWS CLI, AWS Management Console, AWS SDK and AWS CloudFormation to create the cluster and the Lambda functions. This blog uses AWS SAM to create the infrastructure and the associated code is available in the GitHub repository.

    The AWS SAM template creates the following resources:

    1. Amazon Virtual Private Cloud (VPC).
    2. Amazon MSK cluster with mutual TLS authentication.
    3. Lambda function for consuming the records from the Amazon MSK cluster.
    4. IAM roles.
    5. Lambda function for testing the Amazon MSK integration by publishing messages to the topic.

    The VPC has public and private subnets in two Availability Zones with the private subnets configured to use a NAT Gateway. You can also set up VPC endpoints with PrivateLink to allow the Amazon MSK cluster to communicate with Lambda. To learn more about different configurations, see this blog post.

    The Lambda function requires permission to describe VPCs and security groups, and manage elastic network interfaces to access the Amazon MSK data stream. The Lambda function also needs two Kafka permissions: kafka:DescribeCluster and kafka:GetBootstrapBrokers. The policy template AWSLambdaMSKExecutionRole includes these permissions. The Lambda function also requires permission to get the secret value from AWS Secrets Manager for the secret you configure in the event source mapping.

      ConsumerLambdaFunctionRole:
        Type: AWS::IAM::Role
        Properties:
          AssumeRolePolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Principal:
                  Service: lambda.amazonaws.com
                Action: sts:AssumeRole
          ManagedPolicyArns:
            - arn:aws:iam::aws:policy/service-role/AWSLambdaMSKExecutionRole
          Policies:
            - PolicyName: SecretAccess
              PolicyDocument:
                Version: "2012-10-17"
                Statement:
                  - Effect: Allow
                    Action: "SecretsManager:GetSecretValue"
                    Resource: "*"

    This release adds two new SourceAccessConfiguration types to the Lambda event source mapping:

    1. CLIENT_CERTIFICATE_TLS_AUTH – (Amazon MSK, Self-managed Apache Kafka) The Secrets Manager ARN of your secret key containing the certificate chain (PEM), private key (PKCS#8 PEM), and private key password (optional) used for mutual TLS authentication of your Amazon MSK/Apache Kafka brokers. A private key password is required if the private key is encrypted.

    2. SERVER_ROOT_CA_CERTIFICATE – This is only for self-managed Apache Kafka. This contains the Secrets Manager ARN of your secret containing the root CA certificate used by your Apache Kafka brokers in PEM format. This is not applicable for Amazon MSK as Amazon MSK brokers use public AWS Certificate Manager certificates which are trusted by AWS Lambda by default.

    Deploying the resources:

    To deploy the example application:

    1. Clone the GitHub repository
    2. git clone https://github.com/aws-samples/aws-lambda-msk-mtls-integration.git
    3. Navigate to the aws-lambda-msk-mtls-integration directory. Copy the client certificate file and the private key file to the producer lambda function code.
    4. cd aws-lambda-msk-mtls-integration
      cp ../client_cert.pem code/producer/client_cert.pem
      cp ../key.pem code/producer/client_key.pem
    5. Navigate to the code directory and build the application artifacts using the AWS SAM build command.
    6. cd code
      sam build
    7. Run sam deploy to deploy the infrastructure. Provide the Stack Name, AWS Region, ARN of the private CA created in section 1. Provide additional information as required in the sam deploy and deploy the stack.
    8. sam deploy -g
      Running sam deploy -g

      Running sam deploy -g

      The stack deployment takes about 30 minutes to complete. Once complete, note the output values.

    9. Create the event source mapping for the Lambda function. Replace the CONSUMER_FUNCTION_NAME and MSK_CLUSTER_ARN from the output of the stack created by the AWS SAM template. Replace SECRET_ARN with the ARN of the AWS Secrets Manager secret created previously.
    10. aws lambda create-event-source-mapping --function-name CONSUMER_FUNCTION_NAME --batch-size 10 --starting-position TRIM_HORIZON --topics exampleTopic --event-source-arn MSK_CLUSTER_ARN --source-access-configurations '[{"Type": "CLIENT_CERTIFICATE_TLS_AUTH","URI": "SECRET_ARN"}]'
    11. Navigate one directory level up and configure the producer function with the Amazon MSK broker details. Replace the PRODUCER_FUNCTION_NAME and MSK_CLUSTER_ARN from the output of the stack created by the AWS SAM template.
    12. cd ../
      ./setup_producer.sh MSK_CLUSTER_ARN PRODUCER_FUNCTION_NAME
    13. Verify that the event source mapping state is enabled before moving on to the next step. Replace UUID from the output of step 5.
    14. aws lambda get-event-source-mapping --uuid UUID
    15. Publish messages using the producer. Replace PRODUCER_FUNCTION_NAME from the output of the stack created by the AWS SAM template. The following command creates a Kafka topic called exampleTopic and publish 100 messages to the topic.
    16. ./produce.sh PRODUCER_FUNCTION_NAME exampleTopic 100
    17. Verify that the consumer Lambda function receives and processes the messages by checking in Amazon CloudWatch log groups. Navigate to the log group by searching for aws/lambda/{stackname}-MSKConsumerLambda in the search bar.
    Consumer function log stream

    Consumer function log stream

    Conclusion

    Lambda now supports mutual TLS authentication for Amazon MSK and self-managed Kafka as an event source. You now have the option to provide a client certificate to establish a trust relationship between Lambda and MSK or self-managed Kafka brokers. It supports configuration via the AWS Management Console, AWS CLI, AWS SDK, and AWS CloudFormation.

    To learn more about how to use mutual TLS Authentication for your Kafka triggered AWS Lambda function, visit AWS Lambda with self-managed Apache Kafka and Using AWS Lambda with Amazon MSK.

    Publishing messages in batch to Amazon SNS topics

    Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/publishing-messages-in-batch-to-amazon-sns-topics/

    This post is written by Heeki Park (Principal Solutions Architect, Serverless Specialist), Marc Pinaud (Senior Product Manager, Amazon SNS), Amir Eldesoky (Software Development Engineer, Amazon SNS), Jack Li (Software Development Engineer, Amazon SNS), and William Nguyen (Software Development Engineer, Amazon SNS).

    Today, we are announcing the ability for AWS customers to publish messages in batch to Amazon SNS topics. Until now, you were only able to publish one message to an SNS topic per Publish API request. With the new PublishBatch API, you can send up to 10 messages at a time in a single API request. This reduces cost for API requests by up to 90%, as you need fewer API requests to publish the same number of messages.

    Introducing the PublishBatch API

    Consider a log processing application where you process system logs and have different requirements for downstream processing. For example, you may want to do inference on
    incoming log data, populate an operational Amazon OpenSearch Service environment, and store log data in an enterprise data lake.

    Systems send log data to a standard SNS topic, and Amazon SQS queues and Amazon Kinesis Data Firehose are configured as subscribers. An AWS Lambda function subscribes to the first SQS queue and uses machine learning models to perform inference to detect security incidents or system access anomalies. A Lambda function subscribes to the second SQS queue and emits those log entries to an Amazon OpenSearch Service cluster. The workload uses Kibana dashboards to visualize log data. An Amazon Kinesis Data Firehose delivery stream subscribes to the SNS topic and archives all log data into Amazon S3. This allows data scientists to conduct further investigation and research on those logs.

    To do this, the following Java code publishes a set of log messages. In this code, you construct a publish request for a single message to an SNS topic and submit that request via the publish() method:

    // tab 1: standard publish example
    private static AmazonSNS snsClient;
    private static final String MESSAGE_PAYLOAD = " 192.168.1.100 - - [28/Oct/2021:10:27:10 -0500] "GET /index.html HTTP/1.1" 200 3395";
    
    PublishRequest request = new PublishRequest()
        .withTopicArn(topicArn)
        .withMessage(MESSAGE_PAYLOAD);
    PublishResult response = snsClient.publish(request);
    
    // tab 2: fifo publish example
    private static AmazonSNS snsClient;
    private static final String MESSAGE_PAYLOAD = " 192.168.1.100 - - [28/Oct/2021:10:27:10 -0500] "GET /index.html HTTP/1.1" 200 3395";
    private static final String MESSAGE_FIFO_GROUP = "server1234";
    
    PublishRequest request = new PublishRequest()
        .withTopicArn(topicArn)
        .withMessage(MESSAGE_PAYLOAD)
        .withMessageGroupId(MESSAGE_FIFO_GROUP)
        .withMessageDeduplicationId(UUID.randomUUID().toString());
    PublishResult response = snsClient.publish(request);
    

    If you extended the example above and had 10 log lines that each needed to be published as a message, you would have to write code to construct 10 publish requests, and subsequently submit each of those requests via the publish() method.

    With the new ability to publish batch messages, you write the following new code. In the code below, you construct a list of publish entries first, then create a single publish batch request, and subsequently submit that batch request via the new publishBatch() method. In the code below, you use a sample helper method getLoggingPayload(i) to get the appropriate payload for the message, which you can replace with your own business logic.

    // tab 1: standard publish example
    private static final String MESSAGE_BATCH_ID_PREFIX = "server1234-batch-id-";
    
    List<PublishBatchRequestEntry> entries = IntStream.range(0, 10)
    .mapToObj(i -> {
    new PublishBatchRequestEntry()
    .withId(MESSAGE_BATCH_ID_PREFIX + i)
    .withMessage(getLoggingPayload(i));
    })
    .collect(Collectors.toList());
    PublishBatchRequest request = new PublishBatchRequest()
    .withTopicArn(topicArn)
    .withPublishBatchRequestEntries(entries);
    PublishBatchResult response = snsClient.publishBatch(request);
    
    // tab 2: fifo publish example
    private static final String MESSAGE_BATCH_ID_PREFIX = "server1234-batch-id-";
    private static final String MESSAGE_FIFO_GROUP = "server1234";
    
    
    List<PublishBatchRequestEntry> entries = IntStream.range(0, 10)
    .mapToObj(i -> {
    new PublishBatchRequestEntry()
    .withId(MESSAGE_BATCH_ID_PREFIX + i)
    .withMessage(getLoggingPayload(i))
    .withMessageGroupId(MESSAGE_FIFO_GROUP)
    .withMessageDeduplicationId(UUID.randomUUID().toString());
    })
    .collect(Collectors.toList());
    PublishBatchRequest request = new PublishBatchRequest()
    .withTopicArn(topicArn)
    .withPublishBatchRequestEntries(entries);
    PublishBatchResult response = snsClient.publishBatch(request);

    In the list of publish requests, the application must assign a unique batch ID (up to 80 characters) to each publish request within that batch. When the SNS service successfully receives a message, the SNS service assigns a unique message ID and returns that message ID in the response object.

    If publishing to a FIFO topic, the SNS service additionally returns a sequence number in the response. When publishing a batch of messages, the PublishBatchResult object returns a list of response objects for successful and failed messages. If you iterate through the list of response objects for successful messages, you might see the following:

    // tab 1: standard publish output
    {
    "Id": "server1234-batch-id-0",
    "MessageId": "fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb",
    ...
    }
    
    // tab 2: fifo publish output
    {
    "Id": "server1234-batch-id-0",
    "MessageId": "fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb",
    "SequenceNumber": "10000000000000003000",
    ...
    }

    When receiving the message from SNS in the SQS queue, the application reads the following message:

    // tab 1: standard publish output
    {
    "Type" : "Notification",
    "MessageId" : "fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb",
    "TopicArn" : "arn:aws:sns:us-east-1:112233445566:publishBatchTopic",
    "Message" : "payload-0",
    "Timestamp" : "2021-10-28T22:58:12.862Z",
    "UnsubscribeURL" : "http://sns.us-east-1.amazon.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-1:112233445566:publishBatchTopic:ff78260a-0953-4b60-9c2c-122ebcb5fc96"
    }
    
    // tab 2: fifo publish output
    {
    "Type" : "Notification",
    "MessageId" : "fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb",
    "SequenceNumber" : "10000000000000003000",
    "TopicArn" : "arn:aws:sns:us-east-1:112233445566:publishBatchTopic",
    "Message" : "payload-0",
    "Timestamp" : "2021-10-28T22:58:12.862Z",
    "UnsubscribeURL" : "http://sns.us-east-1.amazon.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-1:112233445566:publishBatchTopic.fifo:ff78260a-0953-4b60-9c2c-122ebcb5fc96"
    }

    In the standard publish example, the MessageId of fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb is propagated down to the message in SQS. In the FIFO publish example, the SequenceNumber of 10000000000000003000 is also propagated down to the message in SQS.

    Handling errors and quotas

    When publishing messages in batch, the application must handle errors that may have occurred during the publish batch request. Errors can occur at two different levels. The first is when publishing the batch request to the SNS topic. For example, if the application does not specify a unique message batch ID, it fails with the following error:

    com.amazonaws.services.sns.model.BatchEntryIdsNotDistinctException: Two or more batch entries in the request have the same Id. (Service: AmazonSNS; Status Code: 400; Error Code: BatchEntryIdsNotDistinct; Request ID: 44cdac03-eeac-5760-9264-f5f99f4914ad; Proxy: null)

    The second is within the batch request at the message level. The application must inspect the returned PublishBatchResult object by iterating through successful and failed responses:

    PublishBatchResult publishBatchResult = snsClient.publishBatch(request);
    publishBatchResult.getSuccessful().forEach(entry -> {
    System.out.println(entry.toString());
    });
    
    publishBatchResult.getFailed().forEach(entry -> {
    System.out.println(entry.toString());
    });

    With respect to quotas, the overall message throughput for an SNS topic remains the same. For example, in US East (N. Virginia), standard topics support up to 30,000 messages per second. Before this feature, 30,000 messages also meant 30,000 API requests per second. Because SNS now supports up to 10 messages per request, you can publish the same number of messages using only 3,000 API requests. With FIFO topics, the message throughput remains the same at 300 messages per second, but you can now send that volume of messages using only 30 API requests, thus reducing your messaging costs with SNS.

    Conclusion

    SNS now supports the ability to publish up to 10 messages in a single API request, reducing costs for publishing messages into SNS. Your applications can validate the publish status of each of the messages sent in the batch and handle failed publish requests accordingly. Message throughput to SNS topics remains the same for both standard and FIFO topics.

    Learn more about this ability in the SNS Developer Guide.
    Learn more about the details of the API request in the SNS API reference.
    Learn more about SNS quotas.

    For more serverless learning resources, visit Serverless Land.

    Deploying AWS Lambda layers automatically across multiple Regions

    Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/deploying-aws-lambda-layers-automatically-across-multiple-regions/

    This post is written by Ben Freiberg, Solutions Architect, and Markus Ziller, Solutions Architect.

    Many developers import libraries and dependencies into their AWS Lambda functions. These dependencies can be zipped and uploaded as part of the build and deployment process but it’s often easier to use Lambda layers instead.

    A Lambda layer is an archive containing additional code, such as libraries or dependencies. Layers are deployed as immutable versions, and the version number increments each time you publish a new layer. When you include a layer in a function, you specify the layer version you want to use.

    Lambda layers simplify and speed up the development process by providing common dependencies and reducing the deployment size of your Lambda functions. To learn more, refer to Using Lambda layers to simplify your development process.

    Many customers build Lambda layers for use across multiple Regions. But maintaining up-to-date and consistent layer versions across multiple Regions is a manual process. Layers are set as private automatically but they can be shared with other AWS accounts or shared publicly. Permissions only apply to a single version of a layer. This solution automates the creation and deployment of Lambda layers across multiple Regions from a centralized pipeline.

    Overview of the solution

    This solution uses AWS Lambda, AWS CodeCommit, AWS CodeBuild and AWS CodePipeline.

    Reference architecture

    This diagram outlines the workflow implemented in this blog:

    1. A CodeCommit repository contains the language-specific definition of dependencies and libraries that the layer contains, such as package.json for Node.js or requirements.txt for Python. Each commit to the main branch triggers an execution of the surrounding CodePipeline.
    2. A CodeBuild job uses the provided buildspec.yaml to create a zipped archive containing the layer contents. CodePipeline automatically stores the output of the CodeBuild job as artifacts in a dedicated Amazon S3 bucket.
    3. A Lambda function is invoked for each configured Region.
    4. The function first downloads the zip archive from S3.
    5. Next, the function creates the layer version in the specified Region with the configured permissions.

    Walkthrough

    The following walkthrough explains the components and how the provisioning can be automated via CDK. For this walkthrough, you need:

    To deploy the sample stack:

    1. Clone the associated GitHub repository by running the following command in a local directory:
      git clone https://github.com/aws-samples/multi-region-lambda-layers
    2. Open the repository in your preferred editor and review the contents of the src and cdk folder.
    3. Follow the instructions in the README.md to deploy the stack.
    4. Check the execution history of your pipeline in the AWS Management Console. The pipeline has been started once already and published a first version of the Lambda layer.
      Execution history

    Code repository

    The source code of the Lambda layers is stored in AWS CodeCommit. This is a secure, highly scalable, managed source control service that hosts private Git repositories. This example initializes a new repository as part of the CDK stack:

        const asset = new Asset(this, 'SampleAsset', {
          path: path.join(__dirname, '../../res')
        });
    
        const cfnRepository = new codecommit.CfnRepository(this, 'lambda-layer-source', {
          repositoryName: 'lambda-layer-source',
          repositoryDescription: 'Contains the source code for a nodejs12+14 Lambda layer.',
          code: {
            branchName: 'main',
            s3: {
              bucket: asset.s3BucketName,
              key: asset.s3ObjectKey
            }
          },
        });
    

    This code uploads the contents of the ./cdk/res/ folder to an S3 bucket that is managed by the CDK. The CDK then initializes a new CodeCommit repository with the contents of the bucket. In this case, the repository gets initialized with the following files:

    • LICENSE: A text file describing the license for this Lambda layer
    • package.json: In Node.js, the package.json file is a manifest for projects. It defines dependencies, scripts, and metainformation about the project. The npm install command installs all project dependencies in a node_modules folder. This is where you define the contents of the Lambda layer.

    The default package.json in the sample code defines a Lambda layer with the latest version of the AWS SDK for JavaScript:

    {
        "name": "lambda-layer",
        "version": "1.0.0",
        "description": "Sample AWS Lambda layer",
        "dependencies": {
          "aws-sdk": "latest"
        }
    }
    

    To see what is included in the layer, run npm install in the ./cdk/res/ directory. This shows the files that are bundled into the Lambda layer. The contents of this folder initialize the CodeCommit repository, so delete node_modules and package-lock.json inspecting these files.

    Node modules directory

    This blog post uses a new CodeCommit repository as the source but you can adapt this to other providers. CodePipeline also supports repositories on GitHub and Bitbucket. To connect to those providers, see the documentation.

    CI/CD Pipeline

    CodePipeline automates the process of building and distributing Lambda layers across Region for every change to the main branch of the source repository. It is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. CodePipeline automates the build, test, and deploy phases of your release process every time there is a code change, based on the release model you define.

    The CDK creates a pipeline in CodePipeline and configures it so that every change to the code base of the Lambda layer runs through the following three stages:

    new codepipeline.Pipeline(this, 'Pipeline', {
          pipelineName: 'LambdaLayerBuilderPipeline',
          stages: [
            {
              stageName: 'Source',
              actions: [sourceAction]
            },
            {
              stageName: 'Build',
              actions: [buildAction]
            },
            {
              stageName: 'Distribute',
              actions: parallel,
            }
          ]
        });
    

    Source

    The source phase is the first phase of every run of the pipeline. It is typically triggered by a new commit to the main branch of the source repository. You can also start the source phase manually with the following AWS CLI command:

    aws codepipeline start-pipeline-execution --name LambdaLayerBuilderPipeline

    When started manually, the current head of the main branch is used. Otherwise CodePipeline checks out the code in the revision of the commit that triggered the pipeline execution.

    CodePipeline stores the code locally and uses it as an output artifact of the Source stage. Stages use input and output artifacts that are stored in the Amazon S3 artifact bucket you chose when you created the pipeline. CodePipeline zips and transfers the files for input or output artifacts as appropriate for the action type in the stage.

    Build

    In the second phase of the pipeline, CodePipeline installs all dependencies and packages according to the specs of the targeted Lambda runtime. CodeBuild is a fully managed build service in the cloud. It reduces the need to provision, manage, and scale your own build servers. It provides prepackaged build environments for popular programming languages and build tools like npm for Node.js.

    In CodeBuild, you use build specifications (buildspecs) to define what commands need to run to build your application. Here, it runs commands in a provisioned Docker image with Amazon Linux 2 to do the following:

    • Create the folder structure expected by Lambda Layer.
    • Run npm install to install all Node.js dependencies.
    • Package the code into a layer.zip file and define layer.zip as output of the Build stage.

    The following CDK code highlights the specifications of the CodeBuild project.

    const buildAction = new codebuild.PipelineProject(this, 'lambda-layer-builder', {
          buildSpec: codebuild.BuildSpec.fromObject({
            version: '0.2',
            phases: {
              install: {
                commands: [
                  'mkdir -p node_layer/nodejs',
                  'cp package.json ./node_layer/nodejs/package.json',
                  'cd ./node_layer/nodejs',
                  'npm install',
                ]
              },
              build: {
                commands: [
                  'rm package-lock.json',
                  'cd ..',
                  'zip ../layer.zip * -r',
                ]
              }
            },
            artifacts: {
              files: [
                'layer.zip',
              ]
            }
          }),
          environment: {
            buildImage: codebuild.LinuxBuildImage.STANDARD_5_0
          }
        })
    

    Distribute

    In the final stage, Lambda uses layer.zip to create and publish a Lambda layer across multiple Regions. The sample code defines four Regions as targets for the distribution process:

    regionCodesToDistribute: ['eu-central-1', 'eu-west-1', 'us-west-1', 'us-east-1']

    The Distribution phase consists of n (one per Region) parallel invocations of the same Lambda function, each with userParameter.region set to the respective Region. This is defined in the CDK stack:

    const parallel = props.regionCodesToDistribute.map((region) => new codepipelineActions.LambdaInvokeAction({
          actionName: `distribute-${region}`,
          lambda: distributor,
          inputs: [buildOutput],
          userParameters: { region, layerPrincipal: props.layerPrincipal }
    }));
    

    Each Lambda function runs the following code to publish a new Lambda layer in each Region:

    const parallel = props.regionCodesToDistribute.map((region) => new codepipelineActions.LambdaInvokeAction({
          actionName: `distribute-${region}`,
          lambda: distributor,
          inputs: [buildOutput],
          userParameters: { region, layerPrincipal: props.layerPrincipal }
    }));
    
    Each Lambda function runs the following code to publish a new Lambda layer in each Region:
    
    // Simplified code for brevity
    // Omitted error handling, permission management and logging 
    // See code samples for full code.
    export async function handler(event: any) {
        // #1 Get job specific parameters (e.g. target region)
        const { location } = event['CodePipeline.job'].data.inputArtifacts[0];
        const { region, layerPrincipal } = JSON.parse(event["CodePipeline.job"].data.actionConfiguration.configuration.UserParameters);
        
        // #2 Get location of layer.zip and download it locally
        const layerZip = s3.getObject(/* Input artifact location*/);
        const lambda = new Lambda({ region });
        // #3 Publish a new Lambda layer version based on layer.zip
        const layer = lambda.publishLayerVersion({
            Content: {
                ZipFile: layerZip.Body
            },
            LayerName: 'sample-layer',
            CompatibleRuntimes: ['nodejs12.x', 'nodejs14.x']
        })
        
        // #4 Report the status of the operation back to CodePipeline
        return codepipeline.putJobSuccessResult(..);
    }
    
    

    After each Lambda function completes successfully, the pipeline ends. In a production application, you likely would have additional steps after publishing. For example, it may send notifications via Amazon SNS. To learn more about other possible integrations, read Working with pipeline in CodePipeline.

    Pipeline output

    Testing the workflow

    With this automation, you can release a new version of the Lambda layer by changing package.json in the source repository.

    Add the AWS X-Ray SDK for Node.js as a dependency to your project, by making the following changes to package.json and committing the new version to your main branch:

    {
        "name": "lambda-layer",
        "version": "1.0.0",
        "description": "Sample AWS Lambda layer",
        "dependencies": {
            "aws-sdk": "latest",
            "aws-xray-sdk": "latest"
        }
    }
    

    After committing the new version to the repository, the pipeline is triggered again. After a while, you see that an updated version of the Lambda layer is published to all Regions:

    Execution history results

    Cleaning up

    Many services in this blog post are available in the AWS Free Tier. However, using this solution may incur cost and you should tear down the stack if you don’t need it anymore. Cleaning up steps are included in the readme in the repository.

    Conclusion

    This blog post shows how to create a centralized pipeline to build and distribute Lambda layers consistently across multiple Regions. The pipeline is configurable and allows you to adapt the Regions and permissions according to your use-case.

    For more serverless learning resources, visit Serverless Land.

    Modernizing deployments with container images in AWS Lambda

    Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/modernizing-deployments-with-container-images-in-aws-lambda/

    This post is written by Joseph Keating, AWS Modernization Architect, and Virginia Chu, Sr. DevSecOps Architect.

    Container image support for AWS Lambda enables developers to package function code and dependencies using familiar patterns and tools. With this pattern, developers use standard tools like Docker to package their functions as container images and deploy them to Lambda.

    In a typical deployment process for image-based Lambda functions, the container and Lambda function are created or updated in the same process. However, some use cases require developers to create the image first, and then update one or more Lambda functions from that image. In these situations, organizations may mandate that infrastructure components such as Amazon S3 and Amazon Elastic Container Registry (ECR) are centralized and deployed separately from their application deployment pipelines.

    This post demonstrates how to use AWS continuous integration and deployment (CI/CD) services and Docker to separate the container build process from the application deployment process.

    Overview

    There is a sample application that creates two pipelines to deploy a Java application. The first pipeline uses Docker to build and deploy the container image to the Amazon ECR. The second pipeline uses AWS Serverless Application Model (AWS SAM) to deploy a Lambda function based on the container from the first process.

    This shows how to build, manage, and deploy Lambda container images automatically with infrastructure as code (IaC). It also covers automatically updating or creating Lambda functions based on a container image version.

    Example architecture

    Example architecture

    The example application uses AWS CloudFormation to configure the AWS Lambda container pipelines. Both pipelines use AWS CodePipeline, AWS CodeBuild, and AWS CodeCommit. The lambda-container-image-deployment-pipeline builds and deploys a container image to ECR. The sam-deployment-pipeline updates or deploys a Lambda function based on the new container image.

    The pipeline deploys the sample application:

    1. The developer pushes code to the main branch.
    2. An update to the main branch invokes the pipeline.
    3. The pipeline clones the CodeCommit repository.
    4. Docker builds the container image and assigns tags.
    5. Docker pushes the image to ECR.
    6. The lambda-container-image-pipeline completion triggers an Amazon EventBridge event.
    7. The pipeline clones the CodeCommit repository.
    8. AWS SAM builds the Lambda-based container image application.
    9. AWS SAM deploys the application to AWS Lambda.

    Prerequisites

    To provision the pipeline deployment, you must have the following prerequisites:

    Infrastructure configuration

    The pipeline relies on infrastructure elements like AWS Identity and Access Management roles, S3 buckets, and an ECR repository. Due to security and governance considerations, many organizations prefer to keep these infrastructure components separate from their application deployments.

    To start, deploy the core infrastructure components using CloudFormation and the AWS CLI:

    1. Create a local directory called BlogDemoRepo and clone the source code repository found in the following location:
      mkdir -p $HOME/BlogDemoRepo
      cd $HOME/BlogDemoRepo
      git clone https://github.com/aws-samples/modernize-deployments-with-container-images-in-lambda
    2. Change directory into the cloned repository:
      cd modernize-deployments-with-container-images-in-lambda/
    3. Deploy the s3-iam-config CloudFormation template, keeping the following CloudFormation template names:
      aws cloudformation create-stack \
        --stack-name s3-iam-config \
        --template-body file://templates/s3-iam-config.yml \
        --parameters file://parameters/s3-iam-config-params.json \
        --capabilities CAPABILITY_NAMED_IAM

      The output should look like the following:

      Output example for stack creation

      Output example for stack creation

    Application overview

    The application uses Docker to build the container image and an ECR repository to store the container image. AWS SAM deploys the Lambda function based on the new container.

    The example application in this post uses a Java-based container image using Amazon Corretto. Amazon Corretto is a no-cost, multi-platform, production-ready Open Java Development Kit (OpenJDK).

    The Lambda container-image base includes the Amazon Linux operating system, and a set of base dependencies. The image also consists of the Lambda Runtime Interface Client (RIC) that allows your runtime to send and receive to the Lambda service. Take some time to review the Dockerfile and how it configures the Java application.

    Configure the repository

    The CodeCommit repository contains all of the configurations the pipelines use to deploy the application. To configure the CodeCommit repository:

    1. Get metadata about the CodeCommit repository created in a previous step. Run the following command from the BlogDemoRepo directory created in a previous step:
      aws codecommit get-repository \
        --repository-name DemoRepo \
        --query repositoryMetadata.cloneUrlHttp \
        --output text

      The output should look like the following:

      Output example for get repository

      Output example for get repository

    2. In your terminal, paste the Git URL from the previous step and clone the repository:
      git clone <insert_url_from_step_1_output>

      You receive a warning because the repository is empty.

      Empty repository warning

      Empty repository warning

    3. Create the main branch:
      cd DemoRepo
      git checkout -b main
    4. Copy all of the code from the cloned GitHub repository to the CodeCommit repository:
      cp -r ../modernize-deployments-with-container-images-in-lambda/* .
    5. Commit and push the changes:
      git add .
      git commit -m "Initial commit"
      git push -u origin main

    Pipeline configuration

    This example deploys two separate pipelines. The first is called the modernize-deployments-with-container-images-in-lambda, which consists of building and deploying a container-image to ECR using Docker and the AWS CLI. An EventBridge event starts the pipeline when the CodeCommit branch is updated.

    The second pipeline, sam-deployment-pipeline, is where the container image built from lambda-container-image-deployment-pipeline is deployed to a Lambda function using AWS SAM. This pipeline is also triggered using an Amazon EventBridge event. Successful completion of the lambda-container-image-deployment-pipeline invokes this second pipeline through Amazon EventBridge.

    Both pipelines consist of AWS CodeBuild jobs configured with a buildspec file. The buildspec file enables developers to run bash commands and scripts to build and deploy applications.

    Deploy the pipeline

    You now configure and deploy the pipelines and test the configured application in the AWS Management Console.

    1. Change directory back to modernize-serverless-deployments-leveraging-lambda-container-images directory and deploy the lambda-container-pipeline CloudFormation Template:
      cd $HOME/BlogDemoRepo/modernize-deployments-with-container-images-in-lambda/
      aws cloudformation create-stack \
        --stack-name lambda-container-pipeline \
        --template-body file://templates/lambda-container-pipeline.yml \
        --parameters file://parameters/lambda-container-params.json  \
        --capabilities CAPABILITY_IAM \
        --region us-east-1

      The output appears:

      Output example for stack creation

      Output example for stack creation

    2. Wait for the lambda-container-pipeline stack from the previous step to complete and deploy the sam-deployment-pipeline CloudFormation template:
      aws cloudformation create-stack \
        --stack-name sam-deployment-pipeline \
        --template-body file://templates/sam-deployment-pipeline.yml \
        --parameters file://parameters/sam-deployment-params.json  \
        --capabilities CAPABILITY_IAM \
        --region us-east-1

      The output appears:

      Output example of stack creation

      Output example of stack creation

    3. In the console, select CodePipelinepipelines:

    4. Wait for the status of both pipelines to show Succeeded:
    5. Navigate to the ECR console and choose demo-java. This shows that the pipeline is built and the image is deployed to ECR.
    6. Navigate to the Lambda console and choose the MyCustomLambdaContainer function.
    7. The Image configuration panel shows that the function is configured to use the image created earlier.
    8. To test the function, choose Test.
    9. Keep the default settings and choose Test.

    This completes the walkthrough. To further test the workflow, modify the Java application and commit and push your changes to the main branch. You can then review the updated resources you have deployed.

    Conclusion

    This post shows how to use AWS services to automate the creation of Lambda container images. Using CodePipeline, you create a CI/CD pipeline for updates and deployments of Lambda container-images. You then test the Lambda container-image in the AWS Management Console.

    For more serverless content visit Serverless Land.

    Understanding how AWS Lambda scales with Amazon SQS standard queues

    Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/understanding-how-aws-lambda-scales-when-subscribed-to-amazon-sqs-queues/

    This post is written by John Lee, Solutions Architect, and Isael Pelletier, Senior Solutions Architect.

    Many architectures use Amazon SQS, a fully managed message queueing service, to decouple producers and consumers. SQS is a fundamental building block for building decoupled architectures. AWS Lambda is also fully managed by AWS and is a common choice as a consumer as it supports native integration with SQS. This combination of services allows you to write and maintain less code and unburden you from the heavy lifting of server management.

    This blog post looks into optimizing the scaling behavior of Lambda functions when subscribed to an SQS standard queue. It discusses how Lambda’s scaling works in this configuration and reviews best practices for maximizing message throughput. The post provides insight into building your own scalable workload and guides you in building well-architected workloads.

    Scaling Lambda functions

    When a Lambda function subscribes to an SQS queue, Lambda polls the queue as it waits for messages to arrive. Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time.

    If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages. This means that Lambda can scale up to 1,000 concurrent Lambda functions processing messages from the SQS queue.

    Lambda poller

    Figure 1. The Lambda service polls the SQS queue and batches messages that are processed by automatic scaling Lambda functions. You start with five concurrent Lambda functions.

    This scaling behavior is managed by AWS and cannot be modified. To process more messages, you can optimize your Lambda configuration for higher throughput. There are several strategies you can implement to do this.

    Increase the allocated memory for your Lambda function

    The simplest way to increase throughput is to increase the allocated memory of the Lambda function. While you do not have control over the scaling behavior of the Lambda functions subscribed to an SQS queue, you control the memory configuration.

    Faster Lambda functions can process more messages and increase throughput. This works even if a Lambda function’s memory utilization is low. This is because increasing memory also increases vCPUs in proportion to the amount configured. Each function now supports up to 10 GB of memory and you can access up to six vCPUs per function.

    You may need to modify code to take advantage of the extra vCPUs. This consists of implementing multithreading or parallel processing to use all the vCPUs. You can find a Python example in this blog post.

    To see the average cost and execution speed for each memory configuration before making a decision, Lambda Power Tuning tool helps to visualize the tradeoffs.

    Optimize batching behavior

    Batching can increase message throughput. By default, Lambda batches up to 10 messages in a queue to process them during a single Lambda execution. You can increase this number up to 10,000 messages, or up to 6 MB of messages in a single batch for standard SQS queues.

    If each payload size is 256 KB (the maximum message size for SQS), Lambda can only take 23 messages per batch, regardless of the batch size setting. Similar to increasing memory for a Lambda function, processing more messages per batch can increase throughput.

    However, increasing the batch size does not always achieve this. It is important to understand how message batches are processed. All messages in a failed batch return to the queue. This means that if a Lambda function with five messages fails while processing the third message, all five messages are returned to the queue, including the successfully processed messages. The Lambda function code must be able to process the same message multiple times without side effects.

    Failed message processing

    Figure 2. A Lambda function returning all five messages to the queue after failing to process the third message.

    To prevent successfully processed messages from being returned to SQS, you can add code to delete the processed messages from the queue manually. You can also use existing open source libraries, such as Lambda Powertools for Python or Lambda Powertools for Java that provide this functionality.

    Catch errors from the Lambda function

    The Lambda service scales to process messages from the SQS queue when there are sufficient messages in the queue.

    However, there is a case where Lambda scales down the number of functions, even when there are messages remaining in the queue. This is when a Lambda function throws errors. The Lambda function scales down to minimize erroneous invocations. To sustain or increase the number of concurrent Lambda functions, you must catch the errors so the function exits successfully.

    To retain the failed messages, use an SQS dead-letter queue (DLQ). There are caveats to this approach. Catching errors without proper error handling and tracking mechanisms can result in errors being ignored instead of raising alerts. This may lead to silent failures while Lambda continues to scale and process messages in the queue.

    Relevant Lambda configurations

    There are several Lambda configuration settings to consider for optimizing Lambda’s scaling behavior. Paying attention to the following configurations can help prevent throttling and increase throughput of your Lambda function.

    Reserved concurrency

    If you use reserved concurrency for a Lambda function, set this value greater than five. This value sets the maximum number of concurrent Lambda functions that can run. Lambda allocates five functions to consume five batches at a time. If the reserved concurrency value is lower than five, the function is throttled when it tries to process more than this value concurrently.

    Batching Window

    For larger batch sizes, set the MaximumBatchingWindowInSeconds parameter to at least 1 second. This is the maximum amount of time that Lambda spends gathering records before invoking the function. If this value is too small, Lambda may invoke the function with a batch smaller than the batch size. If this value is too large, Lambda polls for a longer time before processing the messages in the batch. You can adjust this value to see how it affects your throughput.

    Queue visibility timeout

    All SQS messages have a visibility timeout that determines how long the message is hidden from the queue after being selected up by a consumer. If the message is not successfully processed or deleted, the message reappears in the queue when the visibility timeout ends.

    Give your Lambda function enough time to process the message by setting the visibility timeout based on your function-specific metrics. You should set this value to six times the Lambda function timeout plus the value of MaximumBatchingWindowInSeconds. This prevents other functions from unnecessarily processing the messages while the message is already being processed.

    Dead-letter queues (DLQs)

    Failed messages are placed back in the queue to be retried by Lambda. To prevent failed messages from getting added to the queue multiple times, designate a DLQ and send failed messages there.

    The number of times the messages should be retried is set by the Maximum receives value for the DLQ. Once the message is re-added to the queue more than the Maximum receives value, it is placed in the DLQ. You can then process this message at a later time.

    This allows you to avoid situations where many failed messages are continuously placed back into the queue, consuming Lambda resources. Failed messages scale down the Lambda function and add the entire batch to the queue, which can worsen the situation. To ensure smooth scaling of the Lambda function, move repeatedly failing messages to the DLQ.

    Conclusion

    This post explores Lambda’s scaling behavior when subscribed to SQS standard queues. It walks through several ways to scale faster and maximize Lambda throughput when needed. This includes increasing the memory allocation for the Lambda function, increasing batch size, catching errors, and making configuration changes. Better understanding the levers available for SQS and Lambda interaction can help in meeting your scaling needs.

    To learn more about building decoupled architectures, see these videos on Amazon SQS. For more serverless learning resources, visit https://serverlessland.com.

    Token-based authentication for iOS applications with Amazon SNS

    Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/token-based-authentication-for-ios-applications-with-amazon-sns/

    This post is co-written by Karen Hong, Software Development Engineer, AWS Messaging.

    To use Amazon SNS to send mobile push notifications, you must provide a set of credentials for connecting to the supported push notification service (see prerequisites for push). For the Apple Push Notification service (APNs), SNS now supports using token-based authentication (.p8), in addition to the existing certificate-based method.

    You can now use a .p8 file to create or update a platform application resource through the SNS console or programmatically. You can publish messages (directly or from a topic) to platform application endpoints configured for token-based authentication.

    In this tutorial, you set up an example iOS application. You retrieve information from your Apple developer account and learn how to register a new signing key. Next, you use the SNS console to set up a platform application and a platform endpoint. Finally, you test the setup and watch a push notification arrive on your device.

    Advantages of token-based authentication

    Token-based authentication has several benefits compared to using certificates. The first is that you can use the same signing key from multiple provider servers (iOS,VoIP, and MacOS), and you can use one signing key to distribute notifications for all of your company’s application environments (sandbox, production). In contrast, a certificate is only associated with a particular subset of these channels.

    A pain point for customers using certificate-based authentication is the need to renew certificates annually, an inconvenient procedure which can lead to production issues when forgotten. Your signing key for token-based authentication, on the other hand, does not expire.

    Token-based authentication improves the security of your certificates. Unlike certificate-based authentication, the credential does not transfer. Hence, it is less likely to be compromised. You establish trust through encrypted tokens that are frequently regenerated. SNS manages the creation and management of these tokens.

    You configure APNs platform applications for use with both .p8 and .p12 certificates, but only 1 authentication method is active at any given time.

    Setting up your iOS application

    To use token-based authentication, you must set up your application.

    Prerequisites: An Apple developer account

    1. Create a new XCode project. Select iOS as the platform and use the App template.
      xcode project
    2. Select your Apple Developer Account team and your organization identifier.
      vscode details
    3. Go to Signing & Capabilities and select + Capability. This step creates resources on your Apple Developer Account.
      step 3
    4. Add the Push Notification Capability.
    5. In SNSPushDemoApp.swift , add the following code to print the device token and receive push notifications.
      import SwiftUI
      
      @main
      struct SNSPushDemoApp: App {
          
          @UIApplicationDelegateAdaptor private var appDelegate: AppDelegate
          
          var body: some Scene {
              WindowGroup {
                  ContentView()
              }
          }
      }
      
      class AppDelegate: NSObject, UIApplicationDelegate, UNUserNotificationCenterDelegate {
          
          func application(_ application: UIApplication,
                           didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey : Any]? = nil) -> Bool {
              UNUserNotificationCenter.current().delegate = self
              return true
          }
          
          func application(_ application: UIApplication,
                           didRegisterForRemoteNotificationsWithDeviceToken deviceToken: Data) {
              let tokenParts = deviceToken.map { data in String(format: "%02.2hhx", data) }
              let token = tokenParts.joined()
              print("Device Token: \(token)")
          };
          
          func application(_ application: UIApplication, didFailToRegisterForRemoteNotificationsWithError error: Error) {
             print(error.localizedDescription)
          }
          
          func userNotificationCenter(_ center: UNUserNotificationCenter, willPresent notification: UNNotification, withCompletionHandler completionHandler: @escaping (UNNotificationPresentationOptions) -> Void) {
              completionHandler([.banner, .badge, .sound])
          }
      }
      
    6. In ContentView.swift, add the code to request authorization for push notifications and register for notifications.
      import SwiftUI
      
      struct ContentView: View {
          init() {
              requestPushAuthorization();
          }
          
          var body: some View {
              Button("Register") {
                  registerForNotifications();
              }
          }
      }
      
      struct ContentView_Previews: PreviewProvider {
          static var previews: some View {
              ContentView()
          }
      }
      
      func requestPushAuthorization() {
          UNUserNotificationCenter.current().requestAuthorization(options: [.alert, .badge, .sound]) { success, error in
              if success {
                  print("Push notifications allowed")
              } else if let error = error {
                  print(error.localizedDescription)
              }
          }
      }
      
      func registerForNotifications() {
          UIApplication.shared.registerForRemoteNotifications()
      }
      
    7. Build and run the app on an iPhone. The push notification feature does not work with a simulator.
    8. On your phone, select allow notifications when the prompt appears. The debugger prints out “Push notifications allowed” if it is successful.
      allow notifications
    9. On your phone, choose the Register button. The debugger prints out the device token.
    10. You have set up an iOS application that can receive push notifications and prints the device token. We can now use this app to test sending push notifications with SNS configured for token-based authentication.

    Retrieving your Apple resources

    After setting up your application, you retrieve your Apple resources from your Apple developer account. There are four pieces of information you need from your Apple Developer Account: Bundle ID, Team ID, Signing Key, and Signing Key ID.

    The signing key and signing key ID are credentials that you manage through your Apple Developer Account. You can register a new key by selecting the Keys tab under the Certificates, Identifiers & Profiles menu. Your Apple developer account provides the signing key in the form of a text file with a .p8 extension.

    certs

    Find the team ID under Membership Details. The bundle ID is the unique identifier that you set up when creating your application. Find this value in the Identifiers section under the Certificates, Identifiers & Profiles menu.

    Amazon SNS uses a token constructed from the team ID, signing key, and signing key ID to authenticate with APNs for every push notification that you send. Amazon SNS manages tokens on your behalf and renews them when necessary (within an hour). The request header includes the bundle ID and helps identify where the notification goes.

    Creating a new platform application using APNs token-based authentication

    Prerequisites

    In order to implement APNs token-based authentication, you must have:

    • An Apple Developer Account
    • A mobile application

    To create a new platform application:

    1. Navigate to the Amazon SNS console and choose Push notifications. Then choose Create platform application.
      sns console
    2. Enter a name for your application. In the Push notification platform dropdown, choose Apple iOS/VoIP/Mac.
      sns name
    3. For the Push service, choose iOS, and for the Authentication method, choose Token. Select the check box labeled Used for development in sandbox. Then, input the fields from your Apple Developer Account.
      step 3
    4. You have successfully created a platform application using APNs token-based authentication.

    Creating a new platform endpoint using APNs token-based authentication

    A platform application stores credentials, sending configuration, and other settings but does not contain an exact sending destination. Create a platform endpoint resource to store the information to allow SNS to target push notifications to the proper application on the correct mobile device.

    Any iOS application that is capable of receiving push notifications must register with APNs. Upon successful registration, APNs returns a device token that uniquely identifies an instance of an app. SNS needs this device token in order to send to that app. Each platform endpoint belongs to a specific platform application and uses the credentials and settings set in the platform application to complete the sending.

    In this tutorial, you create the platform endpoint manually through the SNS console. In a real system, upon receiving the device token, you programmatically call SNS from your application server to create or update your platform endpoints.

    These are the steps to create a new platform endpoint:

    1. From the details page of the platform application in the SNS console, choose Create application endpoint.
      appliation endpont
    2. From the iOS app that you set up previously, find the device token in the application logs. Enter the device token and choose Create application endpoint.
      application endpont details
    3. You have successfully created a platform application endpoint.
      application endpoint

    Testing a push notification from your device

    In this section, you test a push notification from your device.

    1. From the details page of the application endpoint you just created, (this is the page you end up at immediately after creating the endpoint), choose Publish message.
    2. Enter a message to send and choose Publish message.
      testing app
    3. The notification arrives on your iOS app.
      testing the app

    Conclusion

    Developers sending mobile push notifications can now use a .p8 key to authenticate an Apple device endpoint. Token-based authentication is more secure, and reduces operational burden of renewing the certificates every year. In this post, you learn how to set up your iOS application for mobile push using token-based authentication, by creating and configuring a new platform endpoint in the Amazon SNS console.

    To learn more about APNs token-based authentication with Amazon SNS, visit the Amazon SNS Developer Guide. For more serverless content, visit Serverless Land.

    Creating static custom domain endpoints with Amazon MQ for RabbitMQ

    Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-static-custom-domain-endpoints-with-amazon-mq-for-rabbitmq/

    This post is written by Nate Bachmeier, Senior Solutions Architect, Wallace Printz, Senior Solutions Architect, Christian Mueller, Principal Solutions Architect.

    Many cloud-native application architectures take advantage of the point-to-point and publish-subscribe, or “pub-sub”, model of message-based communication between application components. Not only is this architecture generally more resilient to failure because of the loose coupling and because message processing failures can be retried, it is also more efficient because individual application components can independently scale up or down to maintain message processing SLAs, compared to monolithic application architectures.

    Synchronous (REST-based) systems are tightly coupled. A problem in a synchronous downstream dependency has immediate impact on the upstream callers. Retries from upstream callers can fan out and amplify problems.

    For applications requiring messaging protocols including JMS, NMS, AMQP, STOMP, MQTT, and WebSocket, Amazon provides Amazon MQ. This is a managed message broker service for Apache ActiveMQ and RabbitMQ that makes it easier to set up and operate message brokers in the cloud.

    Amazon MQ provides two managed broker deployment connection options: public brokers and private brokers. Public brokers receive internet-accessible IP addresses while private brokers receive only private IP addresses from the corresponding CIDR range in their VPC subnet. In some cases, for security purposes, customers may prefer to place brokers in a private subnet, but also allow access to the brokers through a persistent public endpoint, such as a subdomain of their corporate domain like ‘mq.example.com’.

    This blog explains how to provision private Amazon MQ brokers behind a secure public load balancer endpoint using an example subdomain.

    AmazonMQ also supports ActiveMQ – to learn more, read Creating static custom domain endpoints with Amazon MQ to simplify broker modification and scaling.

    Overview

    There are several reasons one might want to deploy this architecture beyond the security aspects. First, human-readable URLs are easier for people to parse when reviewing operations and troubleshooting, such as deploying updates to ‘mq-dev.example.com’ before ‘mq-prod.example.com’. Additionally, maintaining static URLs for your brokers helps reduce the necessity of modifying client code when performing maintenance on the brokers.

    The following diagram shows the solutions architecture. This blog post assumes some familiarity with AWS networking fundamentals, such as VPCs, subnets, load balancers, and Amazon Route 53. For additional information on these topics, see the Elastic Load Balancing documentation.

    Reference architecture

    1. The client service tries to connect with a RabbitMQ connection string to the domain endpoint setup in Route 53.
    2. The client looks up the domain name from Route 53, which returns the IP address of the Network Load Balancer (NLB).
    3. The client creates a Transport Layer Security (TLS) connection to the NLB with a secure socket layer (SSL) certificate provided from AWS Certificate Manager (ACM).
    4. The NLB chooses a healthy endpoint from the target group and creates a separate SSL connection. This provides secure, end-to-end SSL encrypted messaging between client and brokers.

    To build this architecture, you build the network segmentation first, then add the Amazon MQ brokers, and finally the network routing. You need a VPC, one private subnet per Availability Zone, and one public subnet for your bastion host (if desired).

    This demonstration VPC uses the 10.0.0.0/16 CIDR range. Additionally, you must create a custom security group for your brokers. You must set up this security group to allow traffic from your Network Load Balancer to the RabbitMQ brokers.

    This example does not use this VPC for other workloads so it allows all incoming traffic that originates within the VPC (which includes the NLB) through to the brokers on the AMQP port of 5671 and the web console port of 443.

    Inbound rules

    Adding the Amazon MQ brokers

    With the network segmentation set up, add the Amazon MQ brokers:

    1. Choose Create brokers on the Active Amazon MQ home page.
    2. Toggle the radio button next to RabbitMQ and choose Next.
    3. Choose a deployment mode of either single-instance broker (for development environments), or cluster deployment (for production environments).
    4. In Configure settings, specify the broker name and instance type.
    5. Confirm that the broker engine version is 3.8.22 or higher and set Access type to Private access.
      Configure settings page
    6. Specify the VPC, private subnets, and security groups before choosing Next.
      Additional settings

    Finding the broker’s IP address

    Before configuring the NLB’s target groups, you must look up the broker’s IP address. Unlike Amazon MQ for Apache ActiveMQ, RabbitMQ does not show its private IP addresses, though you can reliably find its VPC endpoints using DNS. Amazon MQ creates one VPC endpoint in each subnet with a static address that won’t change until you delete the broker.

    1. Navigate to the broker’s details page and scroll to the Connections panel.
    2. Find the endpoint’s fully qualified domain name. It is formatted like broker-id.mq.region.amazonaws.com.
    3. Open a command terminal on your local workstation.
    4. Retrieve the ‘A’ record values using the host (Linux) or nslookup (Windows) command.
    5. Record these values for the NLB configuration steps later.
      Terminal results

    Configure the load balancer’s target group

    The next step in the build process is to configure the load balancer’s target group. You use the private IP addresses of the brokers as targets for the NLB. Create a Target Group, select the target type as IP, and make sure to choose the TLS protocol and for each required port, as well as the VPC your brokers reside in.

    Choose a target type

    It is important to configure the health check settings so traffic is only routed to active brokers. Select the TCP protocol and override the health check port to 443 Rabbit MQ’s console port. Also, configure the healthy threshold to 2 with a 10-second check interval so the NLB detects faulty hosts within 20 seconds.

    Be sure not to use RabbitMQ’s AMQP port as the target group health check port. The NLB may not be able to recognize the host as healthy on that port. It is better to use the broker’s web console port.

    Health checks

    Add the VPC endpoint addresses as NLB targets. The NLB routes traffic across the endpoints and provides networking reliability if an AZ is offline. Finally, configure the health checks to use the web console (TCP port 443).

    Specify IPs and ports

    Creating a Network Load Balancer

    Next, you create a Network Load Balancer. This is an internet-facing load balancer with TLS listeners on port 5671 (AMQP), routing traffic to the brokers’ VPC and private subnets. You select the target group you created, selecting TLS for the connection between the NLB and the brokers. To allow clients to connect to the NLB securely, select an ACM certificate for the subdomain registered in Route 53 (for example ‘mq.example.com’).

    To learn about ACM certificate provisioning, read more about the process here. Make sure that the ACM certificate is provisioned in the same Region as the NLB or the certificate is not shown in the dropdown menu.

    Basic configuration page

    Optionally configure IP filtering

    The NLB is globally accessible and this may be overly permissive for some workloads. You can restrict incoming traffic to specific IP ranges on the NLB’s public subnet by using network access control list (NACL) configuration:

    1. Navigate to the AWS Management Console to the VPC service and choose Subnets.
    2. Select your public subnet and then switch to the Network ACL tab.
    3. Select the link to the associated network ACL (e.g., acl-0d6fxxxxxxxx) details.
    4. Activate this item and choose Edit inbound rules in the Action menu.
    5. Specify the desired IP range and then choose Save changes.

    Edit inbound rules

    Configuring Route 53

    Finally, configure Route 53 to serve traffic at the subdomain of your choice to the NLB:

    1. Go to the Route 53 hosted zone and create a new subdomain record set, such as mq.example.com, that matches the ACM certificate that you previously created.
    2. In the “type” field, select “A – IPv4 address”, then select “Yes” for alias. This allows you to select the NLB as the alias target.
    3. Select from the alias target menu the NLB you just created and save the record set.

    Quick create record

    Now callers can use the friendly name in the RabbitMQ connection string. This capability improves the developer experience and reduces operational cost when rebuilding the cluster. Since you added multiple VPC endpoints (one per subnet) into the NLB’s target group, the solution has Multi-AZ redundancy.

    Testing with a RabbitMQ client process

    The entire process can be tested using any RabbitMQ client process. One approach is to launch the official Docker image and connect with the native client. The service documentation also provides sample code for authenticating, publishing, and subscribing to RabbitMQ channels.

    To log in to the broker’s RabbitMQ web console, there are three options. Due to the security group rules, only traffic originating from inside the VPC is allowed to the brokers:

    1. Use a VPN connection from your corporate network to the VPC. Many customers use this option but for rapid testing, there is a simpler and more cost-effective method.
    2. Connect to the brokers’ web console through your Route 53 subdomain, which requires creating a separate web console port listener (443) on the existing NLB and creating a separate TLS target group for the brokers.
    3. Use a bastion host to proxy traffic to the web console.

    RabbitMQ

    Conclusion

    In this post, you build a highly available Amazon MQ broker in a private subnet. You layer security by placing the brokers behind a highly scalable Network Load Balancer. You configure routing from a single custom subdomain URL to multiple brokers with a built-in health check.

    For more serverless learning resources, visit Serverless Land.

    Implementing header-based API Gateway versioning with Amazon CloudFront

    Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/implementing-header-based-api-gateway-versioning-with-amazon-cloudfront/

    This post is written by Amir Khairalomoum, Sr. Solutions Architect.

    In this blog post, I show you how to use Lambda@Edge feature of Amazon CloudFront to implement a header-based API versioning solution for Amazon API Gateway.

    Amazon API Gateway is a fully managed service that makes it easier for developers to create, publish, maintain, monitor, and secure APIs at any scale. Amazon CloudFront is a global content delivery network (CDN) service built for high-speed, low-latency performance, security, and developer ease-of-use. Lambda@Edge is a feature of Amazon CloudFront, a compute service that lets you run functions that customize the content that CloudFront delivers.

    The example uses the AWS SAM CLI to build, deploy, and test the solution on AWS. The AWS Serverless Application Model (AWS SAM) is an open-source framework that you can use to build serverless applications on AWS. The AWS SAM CLI lets you locally build, test, and debug your applications defined by AWS SAM templates. You can also use the AWS SAM CLI to deploy your applications to AWS, or create secure continuous integration and deployment (CI/CD) pipelines.

    After an API becomes publicly available, it is used by customers. As a service evolves, its contract also evolves to reflect new changes and capabilities. It’s safe to evolve a public API by adding new features but it’s not safe to change or remove existing features.

    Any breaking changes may impact consumer’s applications and break them at runtime. API versioning is important to avoid breaking backward compatibility and breaking a contract. You need a clear strategy for API versioning to help consumers adopt them.

    Versioning APIs

    Two of the most commonly used API versioning strategies are URI versioning and header-based versioning.

    URI versioning

    This strategy is the most straightforward and the most commonly used approach. In this type of versioning, versions are explicitly defined as part of API URIs. These example URLs show how domain name, path, or query string parameters can be used to specify a version:

    https://api.example.com/v1/myservice
    https://apiv1.example.com/myservice
    https://api.example.com/myservice?v=1

    To deploy an API in API Gateway, the deployment is associated with a stage. A stage is a logical reference to a lifecycle state of your API (for example, dev, prod, beta, v2). As your API evolves, you can continue to deploy it to different stages as different versions of the API.

    Header-based versioning

    This strategy is another commonly used versioning approach. It uses HTTP headers to specify the desired version. It uses the “Accept” header for content negotiation or uses a custom header (for example, “APIVER” to indicate a version):

    Accept:application/vnd.example.v1+json
    APIVER:v1

    This approach allows you to preserve URIs between versions. As a result, you have a cleaner and more understandable set of URLs. It is also easier to add versioning after design. However, you may need to deal with complexity of returning different versions of your resources.

    Overview of solution

    The target architecture for the solution uses Lambda@Edge. It dynamically routes a request to the relevant API version, based on the provided header:

    Architecture overview

    Architecture overview

    In this architecture:

    1. The user sends a request with a relevant header, which can be either “Accept” or another custom header.
    2. This request reaches the CloudFront distribution and triggers the Lambda@Edge Origin Request.
    3. The Lambda@Edge function uses the provided header value and fetches data from an Amazon DynamoDB table. This table contains mappings for API versions. The function then modifies the Origin and the Host header of the request and returns it back to CloudFront.
    4. CloudFront sends the request to the relevant Amazon API Gateway URL.

    In the next sections, I walk you through setting up the development environment and deploying and testing this solution.

    Setting up the development environment

    To deploy this solution on AWS, you use the AWS Cloud9 development environment.

    1. Go to the AWS Cloud9 web console. In the Region dropdown, make sure you’re using N. Virginia (us-east-1) Region.
    2. Select Create environment.
    3. On Step 1 – Name environment, enter a name for the environment, and choose Next step.
    4. On Step 2 – Configure settings, keep the existing environment settings.

      Console view of configuration settings

      Console view of configuration settings

    5. Choose Next step. Choose Create environment.

    Deploying the solution

    Now that the development environment is ready, you can proceed with the solution deployment. In this section, you download, build, and deploy a sample serverless application for the solution using AWS SAM.

    Download the sample serverless application

    The solution sample code is available on GitHub. Clone the repository and download the sample source code to your Cloud9 IDE environment by running the following command in the Cloud9 terminal window:

    git clone https://github.com/aws-samples/amazon-api-gateway-header-based-versioning.git ./api-gateway-header-based-versioning

    This sample includes:

    • template.yaml: Contains the AWS SAM template that defines your application’s AWS resources.
    • hello-world/: Contains the Lambda handler logic behind the API Gateway endpoints to return the hello world message.
    • edge-origin-request/: Contains the Lambda@Edge handler logic to query the API version mapping and modify the Origin and the Host header of the request.
    • init-db/: Contains the Lambda handler logic for a custom resource to populate sample DynamoDB table

    Build your application

    Run the following commands in order to first, change into the project directory, where the template.yaml file for the sample application is located then build your application:

    cd ~/environment/api-gateway-header-based-versioning/
    sam build

    Output:

    Build output

    Build output

    Deploy your application

    Run the following command to deploy the application in guided mode for the first time then follow the on-screen prompts:

    sam deploy --guided

    Output:

    Deploy output

    Deploy output

    The output shows the deployment of the AWS CloudFormation stack.

    Testing the solution

    This application implements all required components for the solution. It consists of two Amazon API Gateway endpoints backed by AWS Lambda functions. The deployment process also initializes the API Version Mapping DynamoDB table with the values provided earlier in the deployment process.

    Run the following commands to see the created mappings:

    STACK_NAME=$(grep stack_name ~/environment/api-gateway-header-based-versioning/samconfig.toml | awk -F\= '{gsub(/"/, "", $2); gsub(/ /, "", $2); print $2}')
    
    DDB_TBL_NAME=$(aws cloudformation describe-stacks --region us-east-1 --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`DynamoDBTableName`].OutputValue' --output text) && echo $DDB_TBL_NAME
    
    aws dynamodb scan --table-name $DDB_TBL_NAME
    

    Output:

    Table scan results

    Table scan results

    When a user sends a GET request to CloudFront, it routes the request to the relevant API Gateway endpoint version according to the provided header value. The Lambda function behind that API Gateway endpoint is invoked and returns a “hello world” message.

    To send a request to the CloudFront distribution, which is created as part of the deployment process, first get its domain name from the deployed AWS CloudFormation stack:

    CF_DISTRIBUTION=$(aws cloudformation describe-stacks --region us-east-1 --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`CFDistribution`].OutputValue' --output text) && echo $CF_DISTRIBUTION

    Output:

    Domain name results

    Domain name results

    You can now send a GET request along with the relevant header you specified during the deployment process to the CloudFront to test the application.

    Run the following command to test the application for API version one. Note that if you entered a different value other than the default value provided during the deployment process, change the --header parameter to match your inputs:

    curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" --header "Accept:application/vnd.example.v1+json" && echo

    Output:

    Curl results

    Curl results

    The response shows that CloudFront successfully routed the request to the API Gateway v1 endpoint as defined in the mapping Amazon DynamoDB table. API Gateway v1 endpoint received the request. The Lambda function behind the API Gateway v1 was invoked and returned a “hello world” message.

    Now you can change the header value to v2 and run the command again this time to test the API version two:

    curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" --header "Accept:application/vnd.example.v2+json" && echo

    Output:

    Curl results after header change

    Curl results after header change

    The response shows that CloudFront routed the request to the API Gateway v2 endpoint as defined in the mapping DynamoDB table. API Gateway v2 endpoint received the request. The Lambda function behind the API Gateway v2 was invoked and returned a “hello world” message.

    This solution requires valid a header value on each individual request, so the application checks and raises an error if the header is missing or the header value is not valid.

    You can remove the header parameter and run the command to test this scenario:

    curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" && echo

    Output:

    No header causes a 403 error

    No header causes a 403 error

    The response shows that Lambda@Edge validated the request and raised an error to inform us that the request did not have a valid header.

    Mitigating latency

    In this solution, Lambda@Edge reads the API version mappings data from the DynamoDB table. Accessing external data at the edge can cause additional latency to the request. In order to mitigate the latency, solution uses following methods:

    1. Cache data in Lambda@Edge memory: As data is unlikely to change across many Lambda@Edge invocations, Lambda@Edge caches API version mappings data in the memory for a certain period of time. It reduces latency by avoiding an external network call for each individual request.
    2. Use Amazon DynamoDB global table: It brings data closer to the CloudFront distribution and reduces external network call latency.

    Cleaning up

    To clean up the resources provisioned as part of the solution:

    1. Run following command to delete the deployed application:
      sam delete
    2. Go to the AWS Cloud9 web console. Select the environment you created then choose Delete.

    Conclusion

    Header-based API versioning is a commonly used versioning strategy. This post shows how to use CloudFront to implement a header-based API versioning solution for API Gateway. It uses the AWS SAM CLI to build and deploy a sample serverless application to test the solution in the AWS Cloud.

    To learn more about API Gateway, visit the API Gateway developer guide documentation, and for CloudFront, refer to Amazon CloudFront developer guide documentation.

    For more serverless learning resources, visit Serverless Land.