Tag Archives: contributed

Visualizing AWS Step Functions workflows from the Amazon Athena console

2021-11-23 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/visualizing-aws-step-functions-workflows-from-the-amazon-athena-console/

This post is written by Dhiraj Mahapatro, Senior Specialist SA, Serverless.

In October 2021, AWS announced visualizing AWS Step Functions from the AWS Batch console. Now you can also visualize Step Functions from the Amazon Athena console.

Amazon Athena is an interactive query service that makes it easier to analyze Amazon S3 data using standard SQL. Athena is a serverless service and can interact directly with data stored in S3. Athena can process unstructured, semistructured, and structured datasets.

AWS Step Functions is a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Step Functions workflows manage failures, retries, parallelization, service integrations, and observability so builders can focus on business logic. Athena is one of the service integrations that are available for Step Functions.

This blog walks through Step Functions integration in Amazon Athena console. It shows how you can visualize and operate Athena queries at scale using Step Functions.

Introducing workflow orchestration in Amazon Athena console

AWS customers store large amounts of historical data on S3 and query the data using Athena to get results quickly. They also use Athena to process unstructured data or analyze structured data as part of a data processing pipeline.

Data processing involves discrete steps for ingesting, processing, storing the transformed data, and post-processing, such as visualizing or analyzing the transformed data. Each step involves multiple AWS services. With Step Functions workflow integration, you can orchestrate these steps. This helps to create repeatable and scalable data processing pipelines as part of a larger business application and visualize the workflows in the Athena console.

With Step Functions, you can run queries on a schedule or based on an event by using Amazon EventBridge. You can poll long-running Athena queries before moving to the next step in the process, and handle errors without writing custom code. Combining these two services provides developers with a single method that is scalable and repeatable.

Step Functions workflows in the Amazon Athena console allow orchestration of Athena queries with Step Functions state machines:

Using Athena query patterns from Step Functions

Execute multiple queries

In Athena, you run SQL queries in the Athena console against Athena workgroups. With Step Functions, you can run Athena queries in a sequence or run independent queries simultaneously in parallel using a parallel state. Step Functions also natively handles errors and retries related to Athena query tasks.

Workflow orchestration in the Athena console provides these capabilities to run and visualize multiple queries in Step Functions. For example:

Choose Get Started from Execute multiple queries.
From the pop-up, choose Create your own workflow and select Continue.

A new browser tab opens with the Step Functions Workflow Studio. The designer shows a workflow pattern template pre-created. The workflow loads data from a data source running two Athena queries in parallel. The results are then published to an Amazon SNS topic.

Alternatively, choosing Deploy a sample project from the Get Started pop-up deploys a sample Step Functions workflow.

This option creates a state machine. You then review the workflow definition, deploy an AWS CloudFormation stack, and run the workflow in the Step Functions console.

Once deployed, the state machine is visible in the Step Functions console as:

Select the AthenaMultipleQueriesStateMachine to land on the details page:

The CloudFormation template provisions the required AWS Glue database, S3 bucket, an Athena workgroup, the required AWS Lambda functions, and the SNS topic for query results notification.

To see the Step Functions workflow in action, choose Start execution. Keep the optional name and input and choose Start execution:

The state machine completes the tasks successfully by Executing multiple queries in parallel using Amazon Athena and Sending query results using the SNS topic:

The state machine used the Amazon Athena StartQueryExecution and GetQueryResults tasks. The Workflow orchestration in Athena console now highlights this newly created Step Functions state machine:

Any state machine that uses this task in Step Functions in this account is listed here as a state machine that orchestrates Athena queries.

Query large datasets

You can also ingest an extensive dataset in Amazon S3, partition it using AWS Glue crawlers, then run Amazon Athena queries against that partition.

Select Get Started from the Query large datasets pop-up, then choose Create your own workflow and Continue. This action opens the Step Functions Workflow studio with the following pattern. The Glue crawler starts and partitions large datasets for Athena to query in the subsequent query execution task:

Step Functions allows you to combine Glue crawler tasks and Athena queries to partition where necessary before querying and publishing the results.

Keeping data up to date

You can also use Athena to query a target table to fetch data, then update it with new data from other sources using Step Functions’ choice state. The choice state in Step Functions provides branching logic for a state machine.

You are not limited to the previous three patterns shown in workflow orchestration in the Athena console. You can start from scratch and build Step Functions state machine by navigating to the bottom right and using Create state machine:

Create State Machine in the Athena console opens a new tab showing the Step Functions console’s Create state machine page.

Refer to building a state machine AWS Step Functions Workflow Studio for additional details.

Step Functions integrates with all Amazon Athena’s API actions

In September 2021, Step Functions announced integration support for 200 AWS services to enable easier workflow automation. With this announcement, Step Functions can integrate with all Amazon Athena API actions today.

Step Functions can automate the lifecycle of an Athena query: Create/read/update/delete/list workGroups; Create/read/update/delete/list data catalogs, and more.

Other AWS service integrations

Step Functions’ integration with the AWS SDK provides support for 200 AWS Services and over 9,000 API actions. Athena tasks in Step Functions can evolve by integrating available AWS services in the workflow for their pre and post-processing needs.

For example, you can read Athena query results that are put to an S3 bucket by using a GetObject S3 task AWS SDK integration in Step Functions. You can combine different AWS services into a single business process so that they can ingest data through Amazon Kinesis, do processing via AWS Lambda or Amazon EMR jobs, and send notifications to interested parties via Amazon SNS or Amazon SQS or Amazon EventBridge to trigger other parts of their business application.

There are multiple ways to decorate around an Amazon Athena job task. Refer to AWS SDK service integrations and optimized integrations for Step Functions for additional details.

Important considerations

Workflow orchestrations in the Athena console only show Step Functions state machines that use Athena’s optimized API integrations. This includes StartQueryExecution, StopQueryExecution, GetQueryExecution, and GetQueryResults.

Step Functions state machines do not show in the Athena console when:

A state machine uses any other AWS SDK Athena API integration task.
The APIs are invoked inside a Lambda function task using an AWS SDK client (like Boto3 or Node.js or Java).

Cleanup

First, empty DataBucket and AthenaWorkGroup to delete the stack successfully. To delete the sample application stack, use the latest version of AWS CLI and run:

aws cloudformation delete-stack --stack-name <stack-name>

Alternatively, delete the sample application stack in the CloudFormation console by selecting the stack and choosing Delete:

Conclusion

Amazon Athena console now provides an integration with AWS Step Functions’ workflows. You can use the provided patterns to create and visualize Step Functions’ workflows directly from the Amazon Athena console. Step Functions’ workflows that use Athena’s optimized API integration appear in the Athena console. To learn more about Amazon Athena, read the user guide.

To get started, open the Workflows page in the Athena console. Select Create Athena jobs with Step Functions Workflows to deploy a sample project, if you are new to Step Functions.

For more serverless learning resources, visit Serverless Land.

Offset lag metric for Amazon MSK as an event source for Lambda

2021-11-23 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/offset-lag-metric-for-amazon-msk-as-an-event-source-for-lambda/

This post written by Adam Wagner, Principal Serverless Solutions Architect.

Last year, AWS announced support for Amazon Managed Streaming for Apache Kafka (MSK) and self-managed Apache Kafka clusters as event sources for AWS Lambda. Today, AWS adds a new OffsetLag metric to Lambda functions with MSK or self-managed Apache Kafka event sources.

Offset in Apache Kafka is an integer that marks the current position of a consumer. OffsetLag is the difference in offset between the last record written to the Kafka topic and the last record processed by Lambda. Kafka expresses this in the number of records, not a measure of time. This metric provides visibility into whether your Lambda function is keeping up with the records added to the topic it is processing.

This blog walks through using the OffsetLag metric along with other Lambda and MSK metrics to understand your streaming application and optimize your Lambda function.

Overview

In this example application, a producer writes messages to a topic on the MSK cluster that is an event source for a Lambda function. Each message contains a number and the Lambda function finds the factors of that number. It outputs the input number and results to an Amazon DynamoDB table.

Finding all the factors of a number is fast if the number is small but takes longer for larger numbers. This difference means the size of the number written to the MSK topic influences the Lambda function duration.

Example application architecture

A Kafka client writes messages to a topic in the MSK cluster.
The Lambda event source polls the MSK topic on your behalf for new messages and triggers your Lambda function with batches of messages.
The Lambda function factors the number in each message and then writes the results to DynamoDB.

In this application, several factors can contribute to offset lag. The first is the volume and size of messages. If more messages are coming in, the Lambda may take longer to process them. Other factors are the number of partitions in the topic, and the number of concurrent Lambda functions processing messages. A full explanation of how Lambda concurrency scales with the MSK event source is in the documentation.

If the average duration of your Lambda function increases, this also tends to increase the offset lag. This lag could be latency in a downstream service or due to the complexity of the incoming messages. Lastly, if your Lambda function errors, the MSK event source retries the identical records set until they succeed. This retry functionality also increases offset lag.

Measuring OffsetLag

To understand how the new OffsetLag metric works, you first need a working MSK topic as an event source for a Lambda function. Follow this blog post to set up an MSK instance.

To find the OffsetLag metric, go to the CloudWatch console, select All Metrics from the left-hand menu. Then select Lambda, followed by By Function Name to see a list of metrics by Lambda function. Scroll or use the search bar to find the metrics for this function and select OffsetLag.

OffsetLag metric example

To make it easier to look at multiple metrics at once, create a CloudWatch dashboard starting with the OffsetLag metric. Select Actions -> Add to Dashboard. Select the Create new button, provide the dashboard a name. Choose Create, keeping the rest of the options at the defaults.

Adding OffsetLag to dashboard

After choosing Add to dashboard, the new dashboard appears. Choose the Add widget button to add the Lambda duration metric from the same function. Add another widget that combines both Lambda errors and invocations for the function. Finally, add a widget for the BytesInPerSec metric for the MSK topic. Find this metric under AWS/Kafka -> Broker ID, Cluster Name, Topic. Finally, click Save dashboard.

After a few minutes, you see a steady stream of invocations, as you would expect when consuming from a busy topic.

Data incoming to dashboard

This example is a CloudWatch dashboard showing the Lambda OffsetLag, Duration, Errors, and Invocations, along with the BytesInPerSec for the MSK topic.

In this example, the OffSetLag metric is averaging about eight, indicating that the Lambda function is eight records behind the latest record in the topic. While this is acceptable, there is room for improvement.

The first thing to look for is Lambda function errors, which can drive up offset lag. The metrics show that there are no errors so the next step is to evaluate and optimize the code.

The Lambda handler function loops through the records and calls the process_msg function on each record:

def lambda_handler(event, context):
    for batch in event['records'].keys():
        for record in event['records'][batch]:
            try:
                process_msg(record)
            except:
                print("error processing record:", record)
    return()

The process_msg function handles base64 decoding, calls a factor function to factor the number, and writes the record to a DynamoDB table:

def process_msg(record):
    #messages are base64 encoded, so we decode it here
    msg_value = base64.b64decode(record['value']).decode()
    msg_dict = json.loads(msg_value)
    #using the number as the hash key in the dynamodb table
    msg_id = f"{msg_dict['number']}"
    if msg_dict['number'] <= MAX_NUMBER:
        factors = factor_number(msg_dict['number'])
        print(f"number: {msg_dict['number']} has factors: {factors}")
        item = {'msg_id': msg_id, 'msg':msg_value, 'factors':factors}
        resp = ddb_table.put_item(Item=item)
    else:
        print(f"ERROR: {msg_dict['number']} is >= limit of {MAX_NUMBER}")

The heavy computation takes place in the factor function:

def factor(number):
    factors = [1,number]
    for x in range(2, (int(1 + number / 2))):
        if (number % x) == 0:
            factors.append(x)
    return factors

The code loops through all numbers up to the factored number divided by two. The code is optimized by only looping up to the square root of the number.

def factor(number):
    factors = [1,number]
    for x in range(2, 1 + int(number**0.5)):
        if (number % x) == 0:
            factors.append(x)
            factors.append(number // x)
    return factors

There are further optimizations and libraries for factoring numbers but this provides a noticeable performance improvement in this example.

Data after optimization

After deploying the code, refresh the metrics after a while to see the improvements:

The average Lambda duration has dropped to single-digit milliseconds and the OffsetLag is now averaging two.

If you see a noticeable change in the OffsetLag metric, there are several things to investigate. The input side of the system, increased messages per second, or a significant increase in the size of the message are a few options.

Conclusion

This post walks through implementing the OffsetLag metric to understand latency between the latest messages in the MSK topic and the records a Lambda function is processing. It also reviews other metrics that help understand the underlying cause of increases to the offset lag. For more information on this topic, refer to the documentation and other MSK Lambda metrics.

For more serverless learning resources, visit Serverless Land.

Expanding cross-Region event routing with Amazon EventBridge

2021-11-22 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/expanding-cross-region-event-routing-with-amazon-eventbridge/

This post is written by Stephen Liedig, Sr Serverless Specialist SA.

In April 2021, AWS announced a new feature for Amazon EventBridge that allows you to route events from any commercial AWS Region to US East (N. Virginia), US West (Oregon), and Europe (Ireland). From today, you can now route events between any AWS Regions, except AWS GovCloud (US) and China.

EventBridge enables developers to create event-driven applications by routing events between AWS services, integrated software as a service (SaaS) applications, and your own applications. This helps you produce loosely coupled, distributed, and maintainable architectures. With these new capabilities, you can now route events across Regions and accounts using the same model used to route events to existing targets.

Cross-Region event routing with Amazon EventBridge makes it easier for customers to develop multi-Region workloads to:

Centralize your AWS events into one Region for auditing and monitoring purposes, such as aggregating security events for compliance reasons in a single account.
Replicate events from source to destinations Regions to help synchronize data in cross-Region data stores.
Invoke asynchronous workflows in a different Region from a source event. For example, you can load balance from a target Region by routing events to another Region.

A previous post shows how cross-Region routing works. This blog post expands on these concepts and discusses a common use case for cross-Region event delivery – event auditing. This example explores how you can manage resources using AWS CloudFormation and EventBridge resource policies.

Multi-Region event auditing example walkthrough

Compliance is an important part of building event-driven applications and reacting to any potential policy or security violations. Customers use EventBridge to route security events from applications and globally distributed infrastructure into a single account for analysis. In many cases, they share specific AWS CloudTrail events with security teams. Customers also audit events from their custom-built applications to monitor sensitive data usage.

In this scenario, a company has their base of operations located in Asia Pacific (Singapore) with applications distributed across US East (N. Virginia) and Europe (Frankfurt). The applications in US East (N. Virginia) and Europe (Frankfurt) are using EventBridge for their respective applications and services. The security team in Asia Pacific (Singapore) wants to analyze events from the applications and CloudTrail events for specific API calls to monitor infrastructure security.

To create the rules to receive these events:

Create a new set of rules directly on all the event buses across the global infrastructure. Alternatively, delegate the responsibility of managing security rules to distributed teams that manage the event bus resources.
Provide the security team with the ability to manage rules centrally, and control the lifecycle of rules on the global infrastructure.

Allowing the security team to manage the resources centrally provides more scalability. It is more consistent with the design principle that event consumers own and manage the rules that define how they process events.

Deploying the example application

The following code snippets are shortened for brevity. The full source code of the solution is in the GitHub repository. The solution uses AWS Serverless Application Model (AWS SAM) for deployment. Clone the repo and navigate to the solution directory:

git clone https://github.com/aws-samples/amazon-eventbridge-resource-policy-samples
cd ./patterns/ cross-region-cross-account-pattern/

To allow the security team to start receiving accounts from any of the cross-Region accounts:

1. Create a security event bus in the Asia Pacific (Singapore) Region with a rule that processes events from the respective event sources.

For simplicity, this example uses an Amazon CloudWatch Logs target to visualize the events arriving from cross-Region accounts:

 SecurityEventBus:
  Type: AWS::Events::EventBus
  Properties:
   Name: !Ref SecurityEventBusName

 # This rule processes events coming in from cross-Region accounts
 SecurityAnalysisRule:
  Type: AWS::Events::Rule
  Properties:
   Name: SecurityAnalysisRule
   Description: Analyze events from cross-Region event buses
   EventBusName: !GetAtt SecurityEventBus.Arn
   EventPattern:
    source:
     - anything-but: com.company.security
   State: ENABLED
   RoleArn: !GetAtt WriteToCwlRole.Arn
   Targets:
    - Id: SendEventToSecurityAnalysisRule
     Arn: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:${SecurityAnalysisRuleTarget}"

In this example, you set the event pattern to process any event from a source that is not from the security team’s own domain. This allows you to process events from any account in any Region. You can filter this further as needed.

2. Set an event bus policy on each default and custom event bus that the security team must receive events from.

This policy allows the security team to create rules to route events to its own security event bus in the Asia Pacific (Singapore) Region. The following policy defines a custom event bus in Account 2 in US East (N. Virginia) and an AWS::Events::EventBusPolicy that sets the Principal as the security team account.

This allows the security team to manage rules on the CustomEventBus:

 CustomEventBus: 
  Type: AWS::Events::EventBus
  Properties: 
    Name: !Ref EventBusName

 SecurityServiceRuleCreationStatement:
  Type: AWS::Events::EventBusPolicy
  Properties:
   EventBusName: !Ref CustomEventBus # If you omit this, the default event bus is used.
   StatementId: "AllowCrossRegionRulesForSecurityTeam"
   Statement:
    Effect: "Allow"
    Principal:
     AWS: !Sub "arn:aws:iam::${SecurityAccountNo}:root"
    Action:
     - "events:PutRule"
     - "events:DeleteRule"
     - "events:DescribeRule"
     - "events:DisableRule"
     - "events:EnableRule"
     - "events:PutTargets"
     - "events:RemoveTargets"
    Resource:
     - !Sub 'arn:aws:events:${AWS::Region}:${AWS::AccountId}:rule/${CustomEventBus.Name}/*'
    Condition:
     StringEqualsIfExists:
      "events:creatorAccount": "${aws:PrincipalAccount}"

3. With the policies set on the cross-Region accounts, now create the rules. Because you cannot create CloudFormation resources across Regions, you must define the rules in separate templates. This also gives the ability to expand to other Regions.

Once the template is deployed to the cross-Region accounts, use EventBridge resource policies to propagate rule definitions across accounts in the same Region. The security account must have permission to create CloudFormation resources in the cross-Region accounts to deploy the rule templates.

There are two parts to the rule templates. The first specifies a role that allows EventBridge to assume a role to send events to the target event bus in the security account:

 # This IAM role allows EventBridge to assume the permissions necessary to send events
 # from the source event buses to the destination event bus.
 SourceToDestinationEventBusRole:
  Type: "AWS::IAM::Role"
  Properties:
   AssumeRolePolicyDocument:
    Version: 2012-10-17
    Statement:
     - Effect: Allow
      Principal:
       Service:
        - events.amazonaws.com
      Action:
       - "sts:AssumeRole"
   Path: /
   Policies:
    - PolicyName: PutEventsOnDestinationEventBus
     PolicyDocument:
      Version: 2012-10-17
      Statement:
       - Effect: Allow
        Action: "events:PutEvents"
        Resource:
         - !Ref SecurityEventBusArn

The second is the definition of the rule resource. This requires the Amazon Resource Name (ARN) of the event bus where you want to put the rule, the ARN of the target event bus in the security account, and a reference to the SourceToDestinationEventBusRole role:

 SecurityAuditRule2:
  Type: AWS::Events::Rule
  Properties:
   Name: SecurityAuditRuleAccount2
   Description: Audit rule for the Security team in Singapore
   EventBusName: !Ref EventBusArnAccount2 # ARN of the custom event bus in Account 2
   EventPattern:
    source:
     - com.company.marketing
   State: ENABLED
   Targets:
    - Id: SendEventToSecurityEventBusArn
     Arn: !Ref SecurityEventBusArn
     RoleArn: !GetAtt SourceToDestinationEventBusRole.Arn

You can use the AWS SAM CLI to deploy this:

sam deploy -t us-east-1-rules.yaml \
  --stack-name us-east-1-rules \
  --region us-east-1 \
  --profile default \
  --capabilities=CAPABILITY_IAM \
  --parameter-overrides SecurityEventBusArn="arn:aws:events:ap-southeast-1:111111111111:event-bus/SecurityEventBus" EventBusArnAccount1="arn:aws:events:us-east-1:111111111111:event-bus/default" EventBusArnAccount2="arn:aws:events:us-east-1:222222222222:event-bus/custom-eventbus-account-2"

Testing the example application

With the rules deployed across the Regions, you can test by sending events to the event bus in Account 2:

Navigate to the applications/account_2 directory. Here you find an events.json file, which you use as input for the put-events API call.
Run the following command using the AWS CLI. This sends messages to the event bus in us-east-1 which are routed to the security event bus in ap-southeast-1:
```
aws events put-events \
 --region us-east-1 \
 --profile [NAMED PROFILE FOR ACCOUNT 2] \
 --entries file://events.json
```
If you have run this successfully, you see:
Entries:
- EventId: a423b35e-3df0-e5dc-b854-db9c42144fa2
- EventId: 5f22aea8-51ea-371f-7a5f-8300f1c93973
- EventId: 7279fa46-11a6-7495-d7bb-436e391cfcab
- EventId: b1e1ecc1-03f7-e3ef-9aa4-5ac3c8625cc7
- EventId: b68cea94-28e2-bfb9-7b1f-9b2c5089f430
- EventId: fc48a303-a1b2-bda8-8488-32daa5f809d8
FailedEntryCount: 0
Navigate to the Amazon CloudWatch console to see a collection of log entries with the events you published. The log group is /aws/events/SecurityAnalysisRule.

Congratulations, you have successfully sent your first events across accounts and Regions!

Conclusion

With cross-Region event routing in EventBridge, you can now route events to and from any AWS Region. This post explains how to manage and configure cross-Region event routing using CloudFormation and EventBridge resource policies to simplify rule propagation across your global event bus infrastructure. Finally, I walk through an example you can deploy to your AWS account.

For more serverless learning resources, visit Serverless Land.

Introducing mutual TLS authentication for Amazon MSK as an event source

2021-11-19 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-mutual-tls-authentication-for-amazon-msk-as-an-event-source/

This post is written by Uma Ramadoss, Senior Specialist Solutions Architect, Integration.

Today, AWS Lambda is introducing mutual TLS (mTLS) authentication for Amazon Managed Streaming for Apache Kafka (Amazon MSK) and self-managed Kafka as an event source.

Many customers use Amazon MSK for streaming data from multiple producers. Multiple subscribers can then consume the streaming data and build data pipelines, analytics, and data integration. To learn more, read Using Amazon MSK as an event source for AWS Lambda.

You can activate any combination of authentication modes (mutual TLS, SASL SCRAM, or IAM access control) on new or existing clusters. This is useful if you are migrating to a new authentication mode or must run multiple authentication modes simultaneously. Lambda natively supports consuming messages from both self-managed Kafka and Amazon MSK through event source mapping.

By default, the TLS protocol only requires a server to authenticate itself to the client. The authentication of the client to the server is managed by the application layer. The TLS protocol also offers the ability for the server to request that the client send an X.509 certificate to prove its identity. This is called mutual TLS as both parties are authenticated via certificates with TLS.

Mutual TLS is a commonly used authentication mechanism for business-to-business (B2B) applications. It’s used in standards such as Open Banking, which enables secure open API integrations for financial institutions. It is one of the popular authentication mechanisms for customers using Kafka.

To use mutual TLS authentication for your Kafka-triggered Lambda functions, you provide a signed client certificate, the private key for the certificate, and an optional password if the private key is encrypted. This establishes a trust relationship between Lambda and Amazon MSK or self-managed Kafka. Lambda supports self-signed server certificates or server certificates signed by a private certificate authority (CA) for self-managed Kafka. Lambda trusts the Amazon MSK certificate by default as the certificates are signed by Amazon Trust Services CAs.

This blog post explains how to set up a Lambda function to process messages from an Amazon MSK cluster using mutual TLS authentication.

Overview

Using Amazon MSK as an event source operates in a similar way to using Amazon SQS or Amazon Kinesis. You create an event source mapping by attaching Amazon MSK as event source to your Lambda function.

The Lambda service internally polls for new records from the event source, reading the messages from one or more partitions in batches. It then synchronously invokes your Lambda function, sending each batch as an event payload. Lambda continues to process batches until there are no more messages in the topic.

The Lambda function’s event payload contains an array of records. Each array item contains details of the topic and Kafka partition identifier, together with a timestamp and base64 encoded message.

Kafka event payload

You store the signed client certificate, the private key for the certificate, and an optional password if the private key is encrypted in the AWS Secrets Manager as a secret. You provide the secret in the Lambda event source mapping.

The steps for using mutual TLS authentication for Amazon MSK as event source for Lambda are:

Create a private certificate authority (CA) using AWS Certificate Manager (ACM) Private Certificate Authority (PCA).
Create a client certificate and private key. Store them as secret in AWS Secrets Manager.
Create an Amazon MSK cluster and a consuming Lambda function using the AWS Serverless Application Model (AWS SAM).
Attach the event source mapping.

This blog walks through these steps in detail.

Prerequisites

Install AWS Command Line Interface (CLI) and AWS SAM CLI.
Install OpenSSL, jq, npm, and Git.

1. Creating a private CA.

To use mutual TLS client authentication with Amazon MSK, create a root CA using AWS ACM Private Certificate Authority (PCA). We recommend using independent ACM PCAs for each MSK cluster when you use mutual TLS to control access. This ensures that TLS certificates signed by PCAs only authenticate with a single MSK cluster.

From the AWS Certificate Manager console, choose Create a Private CA.
In the Select CA type panel, select Root CA and choose Next.

Select Root CA

In the Configure CA subject name panel, provide your certificate details, and choose Next.

Provide your certificate details

From the Configure CA key algorithm panel, choose the key algorithm for your CA and choose Next.

Configure CA key algorithm

From the Configure revocation panel, choose any optional certificate revocation options you require and choose Next.

Configure revocation

Continue through the screens to add any tags required, allow ACM to renew certificates, review your options, and confirm pricing. Choose Confirm and create.
Once the CA is created, choose Install CA certificate to activate your CA. Configure the validity of the certificate and the signature algorithm and choose Next.

Configure certificate

Review the certificate details and choose Confirm and install. Note down the Amazon Resource Name (ARN) of the private CA for the next section.

Review certificate details

2. Creating a client certificate.

You generate a client certificate using the root certificate you previously created, which is used to authenticate the client with the Amazon MSK cluster using mutual TLS. You provide this client certificate and the private key as AWS Secrets Manager secrets to the AWS Lambda event source mapping.

On your local machine, run the following command to create a private key and certificate signing request using OpenSSL. Enter your certificate details. This creates a private key file and a certificate signing request file in the current directory.

openssl req -new -newkey rsa:2048 -days 365 -keyout key.pem -out client_cert.csr -nodes

OpenSSL create a private key and certificate signing request

Use the AWS CLI to sign your certificate request with the private CA previously created. Replace Private-CA-ARN with the ARN of your private CA. The certificate validity value is set to 300, change this if necessary. Save the certificate ARN provided in the response.

aws acm-pca issue-certificate --certificate-authority-arn Private-CA-ARN --csr fileb://client_cert.csr --signing-algorithm "SHA256WITHRSA" --validity Value=300,Type="DAYS"

Retrieve the certificate that ACM signed for you. Replace the Private-CA-ARN and Certificate-ARN with the ARN you obtained from the previous commands. This creates a signed certificate file called client_cert.pem.

aws acm-pca get-certificate --certificate-authority-arn Private-CA-ARN --certificate-arn Certificate-ARN | jq -r '.Certificate + "\n" + .CertificateChain' >> client_cert.pem

Create a new file called secret.json with the following structure

{
"certificate":"",
"privateKey":""
}

Copy the contents of the client_cert.pem in certificate and the content of key.pem in privatekey. Ensure that there are no extra spaces added. The file structure looks like this:

Certificate file structure

Create the secret and save the ARN for the next section.

aws secretsmanager create-secret --name msk/mtls/lambda/clientcert --secret-string file://secret.json

3. Setting up an Amazon MSK cluster with AWS Lambda as a consumer.

Amazon MSK is a highly available service, so it must be configured to run in a minimum of two Availability Zones in your preferred Region. To comply with security best practice, the brokers are usually configured in private subnets in each Region.

You can use AWS CLI, AWS Management Console, AWS SDK and AWS CloudFormation to create the cluster and the Lambda functions. This blog uses AWS SAM to create the infrastructure and the associated code is available in the GitHub repository.

The AWS SAM template creates the following resources:

Amazon Virtual Private Cloud (VPC).
Amazon MSK cluster with mutual TLS authentication.
Lambda function for consuming the records from the Amazon MSK cluster.
IAM roles.
Lambda function for testing the Amazon MSK integration by publishing messages to the topic.

The VPC has public and private subnets in two Availability Zones with the private subnets configured to use a NAT Gateway. You can also set up VPC endpoints with PrivateLink to allow the Amazon MSK cluster to communicate with Lambda. To learn more about different configurations, see this blog post.

The Lambda function requires permission to describe VPCs and security groups, and manage elastic network interfaces to access the Amazon MSK data stream. The Lambda function also needs two Kafka permissions: kafka:DescribeCluster and kafka:GetBootstrapBrokers. The policy template AWSLambdaMSKExecutionRole includes these permissions. The Lambda function also requires permission to get the secret value from AWS Secrets Manager for the secret you configure in the event source mapping.

  ConsumerLambdaFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaMSKExecutionRole
      Policies:
        - PolicyName: SecretAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: "SecretsManager:GetSecretValue"
                Resource: "*"

This release adds two new SourceAccessConfiguration types to the Lambda event source mapping:

1. CLIENT_CERTIFICATE_TLS_AUTH – (Amazon MSK, Self-managed Apache Kafka) The Secrets Manager ARN of your secret key containing the certificate chain (PEM), private key (PKCS#8 PEM), and private key password (optional) used for mutual TLS authentication of your Amazon MSK/Apache Kafka brokers. A private key password is required if the private key is encrypted.

2. SERVER_ROOT_CA_CERTIFICATE – This is only for self-managed Apache Kafka. This contains the Secrets Manager ARN of your secret containing the root CA certificate used by your Apache Kafka brokers in PEM format. This is not applicable for Amazon MSK as Amazon MSK brokers use public AWS Certificate Manager certificates which are trusted by AWS Lambda by default.

Deploying the resources:

To deploy the example application:

Clone the GitHub repository

git clone https://github.com/aws-samples/aws-lambda-msk-mtls-integration.git

Navigate to the aws-lambda-msk-mtls-integration directory. Copy the client certificate file and the private key file to the producer lambda function code.

cd aws-lambda-msk-mtls-integration
cp ../client_cert.pem code/producer/client_cert.pem
cp ../key.pem code/producer/client_key.pem

Navigate to the code directory and build the application artifacts using the AWS SAM build command.

cd code
sam build

Run sam deploy to deploy the infrastructure. Provide the Stack Name, AWS Region, ARN of the private CA created in section 1. Provide additional information as required in the sam deploy and deploy the stack.

sam deploy -g

Running sam deploy -g

The stack deployment takes about 30 minutes to complete. Once complete, note the output values.

Create the event source mapping for the Lambda function. Replace the CONSUMER_FUNCTION_NAME and MSK_CLUSTER_ARN from the output of the stack created by the AWS SAM template. Replace SECRET_ARN with the ARN of the AWS Secrets Manager secret created previously.

aws lambda create-event-source-mapping --function-name CONSUMER_FUNCTION_NAME --batch-size 10 --starting-position TRIM_HORIZON --topics exampleTopic --event-source-arn MSK_CLUSTER_ARN --source-access-configurations '[{"Type": "CLIENT_CERTIFICATE_TLS_AUTH","URI": "SECRET_ARN"}]'

Navigate one directory level up and configure the producer function with the Amazon MSK broker details. Replace the PRODUCER_FUNCTION_NAME and MSK_CLUSTER_ARN from the output of the stack created by the AWS SAM template.

cd ../
./setup_producer.sh MSK_CLUSTER_ARN PRODUCER_FUNCTION_NAME

Verify that the event source mapping state is enabled before moving on to the next step. Replace UUID from the output of step 5.

aws lambda get-event-source-mapping --uuid UUID

Publish messages using the producer. Replace PRODUCER_FUNCTION_NAME from the output of the stack created by the AWS SAM template. The following command creates a Kafka topic called exampleTopic and publish 100 messages to the topic.

./produce.sh PRODUCER_FUNCTION_NAME exampleTopic 100

Verify that the consumer Lambda function receives and processes the messages by checking in Amazon CloudWatch log groups. Navigate to the log group by searching for aws/lambda/{stackname}-MSKConsumerLambda in the search bar.

Consumer function log stream

Conclusion

Lambda now supports mutual TLS authentication for Amazon MSK and self-managed Kafka as an event source. You now have the option to provide a client certificate to establish a trust relationship between Lambda and MSK or self-managed Kafka brokers. It supports configuration via the AWS Management Console, AWS CLI, AWS SDK, and AWS CloudFormation.

To learn more about how to use mutual TLS Authentication for your Kafka triggered AWS Lambda function, visit AWS Lambda with self-managed Apache Kafka and Using AWS Lambda with Amazon MSK.

Publishing messages in batch to Amazon SNS topics

2021-11-18 Talia Nassi

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/publishing-messages-in-batch-to-amazon-sns-topics/

This post is written by Heeki Park (Principal Solutions Architect, Serverless Specialist), Marc Pinaud (Senior Product Manager, Amazon SNS), Amir Eldesoky (Software Development Engineer, Amazon SNS), Jack Li (Software Development Engineer, Amazon SNS), and William Nguyen (Software Development Engineer, Amazon SNS).

Today, we are announcing the ability for AWS customers to publish messages in batch to Amazon SNS topics. Until now, you were only able to publish one message to an SNS topic per Publish API request. With the new PublishBatch API, you can send up to 10 messages at a time in a single API request. This reduces cost for API requests by up to 90%, as you need fewer API requests to publish the same number of messages.

Introducing the PublishBatch API

Consider a log processing application where you process system logs and have different requirements for downstream processing. For example, you may want to do inference on
incoming log data, populate an operational Amazon OpenSearch Service environment, and store log data in an enterprise data lake.

Systems send log data to a standard SNS topic, and Amazon SQS queues and Amazon Kinesis Data Firehose are configured as subscribers. An AWS Lambda function subscribes to the first SQS queue and uses machine learning models to perform inference to detect security incidents or system access anomalies. A Lambda function subscribes to the second SQS queue and emits those log entries to an Amazon OpenSearch Service cluster. The workload uses Kibana dashboards to visualize log data. An Amazon Kinesis Data Firehose delivery stream subscribes to the SNS topic and archives all log data into Amazon S3. This allows data scientists to conduct further investigation and research on those logs.

To do this, the following Java code publishes a set of log messages. In this code, you construct a publish request for a single message to an SNS topic and submit that request via the publish() method:

// tab 1: standard publish example
private static AmazonSNS snsClient;
private static final String MESSAGE_PAYLOAD = " 192.168.1.100 - - [28/Oct/2021:10:27:10 -0500] "GET /index.html HTTP/1.1" 200 3395";

PublishRequest request = new PublishRequest()
    .withTopicArn(topicArn)
    .withMessage(MESSAGE_PAYLOAD);
PublishResult response = snsClient.publish(request);

// tab 2: fifo publish example
private static AmazonSNS snsClient;
private static final String MESSAGE_PAYLOAD = " 192.168.1.100 - - [28/Oct/2021:10:27:10 -0500] "GET /index.html HTTP/1.1" 200 3395";
private static final String MESSAGE_FIFO_GROUP = "server1234";

PublishRequest request = new PublishRequest()
    .withTopicArn(topicArn)
    .withMessage(MESSAGE_PAYLOAD)
    .withMessageGroupId(MESSAGE_FIFO_GROUP)
    .withMessageDeduplicationId(UUID.randomUUID().toString());
PublishResult response = snsClient.publish(request);

If you extended the example above and had 10 log lines that each needed to be published as a message, you would have to write code to construct 10 publish requests, and subsequently submit each of those requests via the publish() method.

With the new ability to publish batch messages, you write the following new code. In the code below, you construct a list of publish entries first, then create a single publish batch request, and subsequently submit that batch request via the new publishBatch() method. In the code below, you use a sample helper method getLoggingPayload(i) to get the appropriate payload for the message, which you can replace with your own business logic.

// tab 1: standard publish example
private static final String MESSAGE_BATCH_ID_PREFIX = "server1234-batch-id-";

List<PublishBatchRequestEntry> entries = IntStream.range(0, 10)
.mapToObj(i -> {
new PublishBatchRequestEntry()
.withId(MESSAGE_BATCH_ID_PREFIX + i)
.withMessage(getLoggingPayload(i));
})
.collect(Collectors.toList());
PublishBatchRequest request = new PublishBatchRequest()
.withTopicArn(topicArn)
.withPublishBatchRequestEntries(entries);
PublishBatchResult response = snsClient.publishBatch(request);

// tab 2: fifo publish example
private static final String MESSAGE_BATCH_ID_PREFIX = "server1234-batch-id-";
private static final String MESSAGE_FIFO_GROUP = "server1234";


List<PublishBatchRequestEntry> entries = IntStream.range(0, 10)
.mapToObj(i -> {
new PublishBatchRequestEntry()
.withId(MESSAGE_BATCH_ID_PREFIX + i)
.withMessage(getLoggingPayload(i))
.withMessageGroupId(MESSAGE_FIFO_GROUP)
.withMessageDeduplicationId(UUID.randomUUID().toString());
})
.collect(Collectors.toList());
PublishBatchRequest request = new PublishBatchRequest()
.withTopicArn(topicArn)
.withPublishBatchRequestEntries(entries);
PublishBatchResult response = snsClient.publishBatch(request);

In the list of publish requests, the application must assign a unique batch ID (up to 80 characters) to each publish request within that batch. When the SNS service successfully receives a message, the SNS service assigns a unique message ID and returns that message ID in the response object.

If publishing to a FIFO topic, the SNS service additionally returns a sequence number in the response. When publishing a batch of messages, the PublishBatchResult object returns a list of response objects for successful and failed messages. If you iterate through the list of response objects for successful messages, you might see the following:

// tab 1: standard publish output
{
"Id": "server1234-batch-id-0",
"MessageId": "fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb",
...
}

// tab 2: fifo publish output
{
"Id": "server1234-batch-id-0",
"MessageId": "fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb",
"SequenceNumber": "10000000000000003000",
...
}

When receiving the message from SNS in the SQS queue, the application reads the following message:

// tab 1: standard publish output
{
"Type" : "Notification",
"MessageId" : "fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb",
"TopicArn" : "arn:aws:sns:us-east-1:112233445566:publishBatchTopic",
"Message" : "payload-0",
"Timestamp" : "2021-10-28T22:58:12.862Z",
"UnsubscribeURL" : "http://sns.us-east-1.amazon.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-1:112233445566:publishBatchTopic:ff78260a-0953-4b60-9c2c-122ebcb5fc96"
}

// tab 2: fifo publish output
{
"Type" : "Notification",
"MessageId" : "fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb",
"SequenceNumber" : "10000000000000003000",
"TopicArn" : "arn:aws:sns:us-east-1:112233445566:publishBatchTopic",
"Message" : "payload-0",
"Timestamp" : "2021-10-28T22:58:12.862Z",
"UnsubscribeURL" : "http://sns.us-east-1.amazon.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-1:112233445566:publishBatchTopic.fifo:ff78260a-0953-4b60-9c2c-122ebcb5fc96"
}

In the standard publish example, the MessageId of fcaef5b3-e9e3-5c9e-b761-ac46c4a779bb is propagated down to the message in SQS. In the FIFO publish example, the SequenceNumber of 10000000000000003000 is also propagated down to the message in SQS.

Handling errors and quotas

When publishing messages in batch, the application must handle errors that may have occurred during the publish batch request. Errors can occur at two different levels. The first is when publishing the batch request to the SNS topic. For example, if the application does not specify a unique message batch ID, it fails with the following error:

com.amazonaws.services.sns.model.BatchEntryIdsNotDistinctException: Two or more batch entries in the request have the same Id. (Service: AmazonSNS; Status Code: 400; Error Code: BatchEntryIdsNotDistinct; Request ID: 44cdac03-eeac-5760-9264-f5f99f4914ad; Proxy: null)

The second is within the batch request at the message level. The application must inspect the returned PublishBatchResult object by iterating through successful and failed responses:

PublishBatchResult publishBatchResult = snsClient.publishBatch(request);
publishBatchResult.getSuccessful().forEach(entry -> {
System.out.println(entry.toString());
});

publishBatchResult.getFailed().forEach(entry -> {
System.out.println(entry.toString());
});

With respect to quotas, the overall message throughput for an SNS topic remains the same. For example, in US East (N. Virginia), standard topics support up to 30,000 messages per second. Before this feature, 30,000 messages also meant 30,000 API requests per second. Because SNS now supports up to 10 messages per request, you can publish the same number of messages using only 3,000 API requests. With FIFO topics, the message throughput remains the same at 300 messages per second, but you can now send that volume of messages using only 30 API requests, thus reducing your messaging costs with SNS.

Conclusion

SNS now supports the ability to publish up to 10 messages in a single API request, reducing costs for publishing messages into SNS. Your applications can validate the publish status of each of the messages sent in the batch and handle failed publish requests accordingly. Message throughput to SNS topics remains the same for both standard and FIFO topics.

Learn more about this ability in the SNS Developer Guide.
Learn more about the details of the API request in the SNS API reference.
Learn more about SNS quotas.

For more serverless learning resources, visit Serverless Land.

Deploying AWS Lambda layers automatically across multiple Regions

2021-11-18 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/deploying-aws-lambda-layers-automatically-across-multiple-regions/

This post is written by Ben Freiberg, Solutions Architect, and Markus Ziller, Solutions Architect.

Many developers import libraries and dependencies into their AWS Lambda functions. These dependencies can be zipped and uploaded as part of the build and deployment process but it’s often easier to use Lambda layers instead.

A Lambda layer is an archive containing additional code, such as libraries or dependencies. Layers are deployed as immutable versions, and the version number increments each time you publish a new layer. When you include a layer in a function, you specify the layer version you want to use.

Lambda layers simplify and speed up the development process by providing common dependencies and reducing the deployment size of your Lambda functions. To learn more, refer to Using Lambda layers to simplify your development process.

Many customers build Lambda layers for use across multiple Regions. But maintaining up-to-date and consistent layer versions across multiple Regions is a manual process. Layers are set as private automatically but they can be shared with other AWS accounts or shared publicly. Permissions only apply to a single version of a layer. This solution automates the creation and deployment of Lambda layers across multiple Regions from a centralized pipeline.

Overview of the solution

This solution uses AWS Lambda, AWS CodeCommit, AWS CodeBuild and AWS CodePipeline.

This diagram outlines the workflow implemented in this blog:

A CodeCommit repository contains the language-specific definition of dependencies and libraries that the layer contains, such as package.json for Node.js or requirements.txt for Python. Each commit to the main branch triggers an execution of the surrounding CodePipeline.
A CodeBuild job uses the provided buildspec.yaml to create a zipped archive containing the layer contents. CodePipeline automatically stores the output of the CodeBuild job as artifacts in a dedicated Amazon S3 bucket.
A Lambda function is invoked for each configured Region.
The function first downloads the zip archive from S3.
Next, the function creates the layer version in the specified Region with the configured permissions.

Walkthrough

The following walkthrough explains the components and how the provisioning can be automated via CDK. For this walkthrough, you need:

An AWS account.
Installed and authenticated AWS CLI (authenticate with an IAM user or an AWS STS security token).
Installed Node.js, TypeScript.
Installed git.
AWS CDK installed.

To deploy the sample stack:

Clone the associated GitHub repository by running the following command in a local directory:
git clone https://github.com/aws-samples/multi-region-lambda-layers
Open the repository in your preferred editor and review the contents of the src and cdk folder.
Follow the instructions in the README.md to deploy the stack.
Check the execution history of your pipeline in the AWS Management Console. The pipeline has been started once already and published a first version of the Lambda layer.

Code repository

The source code of the Lambda layers is stored in AWS CodeCommit. This is a secure, highly scalable, managed source control service that hosts private Git repositories. This example initializes a new repository as part of the CDK stack:

    const asset = new Asset(this, 'SampleAsset', {
      path: path.join(__dirname, '../../res')
    });

    const cfnRepository = new codecommit.CfnRepository(this, 'lambda-layer-source', {
      repositoryName: 'lambda-layer-source',
      repositoryDescription: 'Contains the source code for a nodejs12+14 Lambda layer.',
      code: {
        branchName: 'main',
        s3: {
          bucket: asset.s3BucketName,
          key: asset.s3ObjectKey
        }
      },
    });

This code uploads the contents of the ./cdk/res/ folder to an S3 bucket that is managed by the CDK. The CDK then initializes a new CodeCommit repository with the contents of the bucket. In this case, the repository gets initialized with the following files:

LICENSE: A text file describing the license for this Lambda layer
package.json: In Node.js, the package.json file is a manifest for projects. It defines dependencies, scripts, and metainformation about the project. The npm install command installs all project dependencies in a node_modules folder. This is where you define the contents of the Lambda layer.

The default package.json in the sample code defines a Lambda layer with the latest version of the AWS SDK for JavaScript:

{
    "name": "lambda-layer",
    "version": "1.0.0",
    "description": "Sample AWS Lambda layer",
    "dependencies": {
      "aws-sdk": "latest"
    }
}

To see what is included in the layer, run npm install in the ./cdk/res/ directory. This shows the files that are bundled into the Lambda layer. The contents of this folder initialize the CodeCommit repository, so delete node_modules and package-lock.json inspecting these files.

This blog post uses a new CodeCommit repository as the source but you can adapt this to other providers. CodePipeline also supports repositories on GitHub and Bitbucket. To connect to those providers, see the documentation.

CI/CD Pipeline

CodePipeline automates the process of building and distributing Lambda layers across Region for every change to the main branch of the source repository. It is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. CodePipeline automates the build, test, and deploy phases of your release process every time there is a code change, based on the release model you define.

The CDK creates a pipeline in CodePipeline and configures it so that every change to the code base of the Lambda layer runs through the following three stages:

new codepipeline.Pipeline(this, 'Pipeline', {
      pipelineName: 'LambdaLayerBuilderPipeline',
      stages: [
        {
          stageName: 'Source',
          actions: [sourceAction]
        },
        {
          stageName: 'Build',
          actions: [buildAction]
        },
        {
          stageName: 'Distribute',
          actions: parallel,
        }
      ]
    });

Source

The source phase is the first phase of every run of the pipeline. It is typically triggered by a new commit to the main branch of the source repository. You can also start the source phase manually with the following AWS CLI command:

aws codepipeline start-pipeline-execution --name LambdaLayerBuilderPipeline

When started manually, the current head of the main branch is used. Otherwise CodePipeline checks out the code in the revision of the commit that triggered the pipeline execution.

CodePipeline stores the code locally and uses it as an output artifact of the Source stage. Stages use input and output artifacts that are stored in the Amazon S3 artifact bucket you chose when you created the pipeline. CodePipeline zips and transfers the files for input or output artifacts as appropriate for the action type in the stage.

Build

In the second phase of the pipeline, CodePipeline installs all dependencies and packages according to the specs of the targeted Lambda runtime. CodeBuild is a fully managed build service in the cloud. It reduces the need to provision, manage, and scale your own build servers. It provides prepackaged build environments for popular programming languages and build tools like npm for Node.js.

In CodeBuild, you use build specifications (buildspecs) to define what commands need to run to build your application. Here, it runs commands in a provisioned Docker image with Amazon Linux 2 to do the following:

Create the folder structure expected by Lambda Layer.
Run npm install to install all Node.js dependencies.
Package the code into a layer.zip file and define layer.zip as output of the Build stage.

The following CDK code highlights the specifications of the CodeBuild project.

const buildAction = new codebuild.PipelineProject(this, 'lambda-layer-builder', {
      buildSpec: codebuild.BuildSpec.fromObject({
        version: '0.2',
        phases: {
          install: {
            commands: [
              'mkdir -p node_layer/nodejs',
              'cp package.json ./node_layer/nodejs/package.json',
              'cd ./node_layer/nodejs',
              'npm install',
            ]
          },
          build: {
            commands: [
              'rm package-lock.json',
              'cd ..',
              'zip ../layer.zip * -r',
            ]
          }
        },
        artifacts: {
          files: [
            'layer.zip',
          ]
        }
      }),
      environment: {
        buildImage: codebuild.LinuxBuildImage.STANDARD_5_0
      }
    })

Distribute

In the final stage, Lambda uses layer.zip to create and publish a Lambda layer across multiple Regions. The sample code defines four Regions as targets for the distribution process:

regionCodesToDistribute: ['eu-central-1', 'eu-west-1', 'us-west-1', 'us-east-1']

The Distribution phase consists of n (one per Region) parallel invocations of the same Lambda function, each with userParameter.region set to the respective Region. This is defined in the CDK stack:

const parallel = props.regionCodesToDistribute.map((region) => new codepipelineActions.LambdaInvokeAction({
      actionName: `distribute-${region}`,
      lambda: distributor,
      inputs: [buildOutput],
      userParameters: { region, layerPrincipal: props.layerPrincipal }
}));

Each Lambda function runs the following code to publish a new Lambda layer in each Region:

const parallel = props.regionCodesToDistribute.map((region) => new codepipelineActions.LambdaInvokeAction({
      actionName: `distribute-${region}`,
      lambda: distributor,
      inputs: [buildOutput],
      userParameters: { region, layerPrincipal: props.layerPrincipal }
}));

Each Lambda function runs the following code to publish a new Lambda layer in each Region:

// Simplified code for brevity
// Omitted error handling, permission management and logging 
// See code samples for full code.
export async function handler(event: any) {
    // #1 Get job specific parameters (e.g. target region)
    const { location } = event['CodePipeline.job'].data.inputArtifacts[0];
    const { region, layerPrincipal } = JSON.parse(event["CodePipeline.job"].data.actionConfiguration.configuration.UserParameters);
    
    // #2 Get location of layer.zip and download it locally
    const layerZip = s3.getObject(/* Input artifact location*/);
    const lambda = new Lambda({ region });
    // #3 Publish a new Lambda layer version based on layer.zip
    const layer = lambda.publishLayerVersion({
        Content: {
            ZipFile: layerZip.Body
        },
        LayerName: 'sample-layer',
        CompatibleRuntimes: ['nodejs12.x', 'nodejs14.x']
    })
    
    // #4 Report the status of the operation back to CodePipeline
    return codepipeline.putJobSuccessResult(..);
}

After each Lambda function completes successfully, the pipeline ends. In a production application, you likely would have additional steps after publishing. For example, it may send notifications via Amazon SNS. To learn more about other possible integrations, read Working with pipeline in CodePipeline.

Testing the workflow

With this automation, you can release a new version of the Lambda layer by changing package.json in the source repository.

Add the AWS X-Ray SDK for Node.js as a dependency to your project, by making the following changes to package.json and committing the new version to your main branch:

{
    "name": "lambda-layer",
    "version": "1.0.0",
    "description": "Sample AWS Lambda layer",
    "dependencies": {
        "aws-sdk": "latest",
        "aws-xray-sdk": "latest"
    }
}

After committing the new version to the repository, the pipeline is triggered again. After a while, you see that an updated version of the Lambda layer is published to all Regions:

Cleaning up

Many services in this blog post are available in the AWS Free Tier. However, using this solution may incur cost and you should tear down the stack if you don’t need it anymore. Cleaning up steps are included in the readme in the repository.

Conclusion

This blog post shows how to create a centralized pipeline to build and distribute Lambda layers consistently across multiple Regions. The pipeline is configurable and allows you to adapt the Regions and permissions according to your use-case.

For more serverless learning resources, visit Serverless Land.

Modernizing deployments with container images in AWS Lambda

2021-11-17 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/modernizing-deployments-with-container-images-in-aws-lambda/

This post is written by Joseph Keating, AWS Modernization Architect, and Virginia Chu, Sr. DevSecOps Architect.

Container image support for AWS Lambda enables developers to package function code and dependencies using familiar patterns and tools. With this pattern, developers use standard tools like Docker to package their functions as container images and deploy them to Lambda.

In a typical deployment process for image-based Lambda functions, the container and Lambda function are created or updated in the same process. However, some use cases require developers to create the image first, and then update one or more Lambda functions from that image. In these situations, organizations may mandate that infrastructure components such as Amazon S3 and Amazon Elastic Container Registry (ECR) are centralized and deployed separately from their application deployment pipelines.

This post demonstrates how to use AWS continuous integration and deployment (CI/CD) services and Docker to separate the container build process from the application deployment process.

Overview

There is a sample application that creates two pipelines to deploy a Java application. The first pipeline uses Docker to build and deploy the container image to the Amazon ECR. The second pipeline uses AWS Serverless Application Model (AWS SAM) to deploy a Lambda function based on the container from the first process.

This shows how to build, manage, and deploy Lambda container images automatically with infrastructure as code (IaC). It also covers automatically updating or creating Lambda functions based on a container image version.

Example architecture

The example application uses AWS CloudFormation to configure the AWS Lambda container pipelines. Both pipelines use AWS CodePipeline, AWS CodeBuild, and AWS CodeCommit. The lambda-container-image-deployment-pipeline builds and deploys a container image to ECR. The sam-deployment-pipeline updates or deploys a Lambda function based on the new container image.

The pipeline deploys the sample application:

The developer pushes code to the main branch.
An update to the main branch invokes the pipeline.
The pipeline clones the CodeCommit repository.
Docker builds the container image and assigns tags.
Docker pushes the image to ECR.
The lambda-container-image-pipeline completion triggers an Amazon EventBridge event.
The pipeline clones the CodeCommit repository.
AWS SAM builds the Lambda-based container image application.
AWS SAM deploys the application to AWS Lambda.

Prerequisites

To provision the pipeline deployment, you must have the following prerequisites:

A Git client to clone the source code in a repository.
An IAM user with Git credentials.
An AWS account with local credentials properly configured (typically under ~/.aws/credentials).
The latest version of the AWS CLI. For more information, see installing, updating, and uninstalling the AWS CLI.

Infrastructure configuration

The pipeline relies on infrastructure elements like AWS Identity and Access Management roles, S3 buckets, and an ECR repository. Due to security and governance considerations, many organizations prefer to keep these infrastructure components separate from their application deployments.

To start, deploy the core infrastructure components using CloudFormation and the AWS CLI:

Create a local directory called BlogDemoRepo and clone the source code repository found in the following location:

mkdir -p $HOME/BlogDemoRepo
cd $HOME/BlogDemoRepo
git clone https://github.com/aws-samples/modernize-deployments-with-container-images-in-lambda

Change directory into the cloned repository:

cd modernize-deployments-with-container-images-in-lambda/

Deploy the s3-iam-config CloudFormation template, keeping the following CloudFormation template names:

aws cloudformation create-stack \
  --stack-name s3-iam-config \
  --template-body file://templates/s3-iam-config.yml \
  --parameters file://parameters/s3-iam-config-params.json \
  --capabilities CAPABILITY_NAMED_IAM

The output should look like the following:

Output example for stack creation

Application overview

The application uses Docker to build the container image and an ECR repository to store the container image. AWS SAM deploys the Lambda function based on the new container.

The example application in this post uses a Java-based container image using Amazon Corretto. Amazon Corretto is a no-cost, multi-platform, production-ready Open Java Development Kit (OpenJDK).

The Lambda container-image base includes the Amazon Linux operating system, and a set of base dependencies. The image also consists of the Lambda Runtime Interface Client (RIC) that allows your runtime to send and receive to the Lambda service. Take some time to review the Dockerfile and how it configures the Java application.

Configure the repository

The CodeCommit repository contains all of the configurations the pipelines use to deploy the application. To configure the CodeCommit repository:

Get metadata about the CodeCommit repository created in a previous step. Run the following command from the BlogDemoRepo directory created in a previous step:
```
aws codecommit get-repository \
  --repository-name DemoRepo \
  --query repositoryMetadata.cloneUrlHttp \
  --output text
```
The output should look like the following:

Output example for get repository
In your terminal, paste the Git URL from the previous step and clone the repository:
```
git clone <insert_url_from_step_1_output>
```
You receive a warning because the repository is empty.

Empty repository warning
Create the main branch:
```
cd DemoRepo
git checkout -b main
```
Copy all of the code from the cloned GitHub repository to the CodeCommit repository:
```
cp -r ../modernize-deployments-with-container-images-in-lambda/* .
```

Commit and push the changes:

git add .
git commit -m "Initial commit"
git push -u origin main

Pipeline configuration

This example deploys two separate pipelines. The first is called the modernize-deployments-with-container-images-in-lambda, which consists of building and deploying a container-image to ECR using Docker and the AWS CLI. An EventBridge event starts the pipeline when the CodeCommit branch is updated.

The second pipeline, sam-deployment-pipeline, is where the container image built from lambda-container-image-deployment-pipeline is deployed to a Lambda function using AWS SAM. This pipeline is also triggered using an Amazon EventBridge event. Successful completion of the lambda-container-image-deployment-pipeline invokes this second pipeline through Amazon EventBridge.

Both pipelines consist of AWS CodeBuild jobs configured with a buildspec file. The buildspec file enables developers to run bash commands and scripts to build and deploy applications.

Deploy the pipeline

You now configure and deploy the pipelines and test the configured application in the AWS Management Console.

Change directory back to modernize-serverless-deployments-leveraging-lambda-container-images directory and deploy the lambda-container-pipeline CloudFormation Template:

cd $HOME/BlogDemoRepo/modernize-deployments-with-container-images-in-lambda/
aws cloudformation create-stack \
  --stack-name lambda-container-pipeline \
  --template-body file://templates/lambda-container-pipeline.yml \
  --parameters file://parameters/lambda-container-params.json  \
  --capabilities CAPABILITY_IAM \
  --region us-east-1

The output appears:

Output example for stack creation

Wait for the lambda-container-pipeline stack from the previous step to complete and deploy the sam-deployment-pipeline CloudFormation template:

aws cloudformation create-stack \
  --stack-name sam-deployment-pipeline \
  --template-body file://templates/sam-deployment-pipeline.yml \
  --parameters file://parameters/sam-deployment-params.json  \
  --capabilities CAPABILITY_IAM \
  --region us-east-1

The output appears:

Output example of stack creation

In the console, select CodePipeline → pipelines:
Wait for the status of both pipelines to show Succeeded:
Navigate to the ECR console and choose demo-java. This shows that the pipeline is built and the image is deployed to ECR.
Navigate to the Lambda console and choose the MyCustomLambdaContainer function.
The Image configuration panel shows that the function is configured to use the image created earlier.
To test the function, choose Test.
Keep the default settings and choose Test.

This completes the walkthrough. To further test the workflow, modify the Java application and commit and push your changes to the main branch. You can then review the updated resources you have deployed.

Conclusion

This post shows how to use AWS services to automate the creation of Lambda container images. Using CodePipeline, you create a CI/CD pipeline for updates and deployments of Lambda container-images. You then test the Lambda container-image in the AWS Management Console.

For more serverless content visit Serverless Land.

Understanding how AWS Lambda scales with Amazon SQS standard queues

2021-11-11 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/understanding-how-aws-lambda-scales-when-subscribed-to-amazon-sqs-queues/

This post is written by John Lee, Solutions Architect, and Isael Pelletier, Senior Solutions Architect.

Many architectures use Amazon SQS, a fully managed message queueing service, to decouple producers and consumers. SQS is a fundamental building block for building decoupled architectures. AWS Lambda is also fully managed by AWS and is a common choice as a consumer as it supports native integration with SQS. This combination of services allows you to write and maintain less code and unburden you from the heavy lifting of server management.

This blog post looks into optimizing the scaling behavior of Lambda functions when subscribed to an SQS standard queue. It discusses how Lambda’s scaling works in this configuration and reviews best practices for maximizing message throughput. The post provides insight into building your own scalable workload and guides you in building well-architected workloads.

Scaling Lambda functions

When a Lambda function subscribes to an SQS queue, Lambda polls the queue as it waits for messages to arrive. Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time.

If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages. This means that Lambda can scale up to 1,000 concurrent Lambda functions processing messages from the SQS queue.

Figure 1. The Lambda service polls the SQS queue and batches messages that are processed by automatic scaling Lambda functions. You start with five concurrent Lambda functions.

This scaling behavior is managed by AWS and cannot be modified. To process more messages, you can optimize your Lambda configuration for higher throughput. There are several strategies you can implement to do this.

Increase the allocated memory for your Lambda function

The simplest way to increase throughput is to increase the allocated memory of the Lambda function. While you do not have control over the scaling behavior of the Lambda functions subscribed to an SQS queue, you control the memory configuration.

Faster Lambda functions can process more messages and increase throughput. This works even if a Lambda function’s memory utilization is low. This is because increasing memory also increases vCPUs in proportion to the amount configured. Each function now supports up to 10 GB of memory and you can access up to six vCPUs per function.

You may need to modify code to take advantage of the extra vCPUs. This consists of implementing multithreading or parallel processing to use all the vCPUs. You can find a Python example in this blog post.

To see the average cost and execution speed for each memory configuration before making a decision, Lambda Power Tuning tool helps to visualize the tradeoffs.

Optimize batching behavior

Batching can increase message throughput. By default, Lambda batches up to 10 messages in a queue to process them during a single Lambda execution. You can increase this number up to 10,000 messages, or up to 6 MB of messages in a single batch for standard SQS queues.

If each payload size is 256 KB (the maximum message size for SQS), Lambda can only take 23 messages per batch, regardless of the batch size setting. Similar to increasing memory for a Lambda function, processing more messages per batch can increase throughput.

However, increasing the batch size does not always achieve this. It is important to understand how message batches are processed. All messages in a failed batch return to the queue. This means that if a Lambda function with five messages fails while processing the third message, all five messages are returned to the queue, including the successfully processed messages. The Lambda function code must be able to process the same message multiple times without side effects.

Figure 2. A Lambda function returning all five messages to the queue after failing to process the third message.

To prevent successfully processed messages from being returned to SQS, you can add code to delete the processed messages from the queue manually. You can also use existing open source libraries, such as Lambda Powertools for Python or Lambda Powertools for Java that provide this functionality.

Catch errors from the Lambda function

The Lambda service scales to process messages from the SQS queue when there are sufficient messages in the queue.

However, there is a case where Lambda scales down the number of functions, even when there are messages remaining in the queue. This is when a Lambda function throws errors. The Lambda function scales down to minimize erroneous invocations. To sustain or increase the number of concurrent Lambda functions, you must catch the errors so the function exits successfully.

To retain the failed messages, use an SQS dead-letter queue (DLQ). There are caveats to this approach. Catching errors without proper error handling and tracking mechanisms can result in errors being ignored instead of raising alerts. This may lead to silent failures while Lambda continues to scale and process messages in the queue.

Relevant Lambda configurations

There are several Lambda configuration settings to consider for optimizing Lambda’s scaling behavior. Paying attention to the following configurations can help prevent throttling and increase throughput of your Lambda function.

Reserved concurrency

If you use reserved concurrency for a Lambda function, set this value greater than five. This value sets the maximum number of concurrent Lambda functions that can run. Lambda allocates five functions to consume five batches at a time. If the reserved concurrency value is lower than five, the function is throttled when it tries to process more than this value concurrently.

Batching Window

For larger batch sizes, set the MaximumBatchingWindowInSeconds parameter to at least 1 second. This is the maximum amount of time that Lambda spends gathering records before invoking the function. If this value is too small, Lambda may invoke the function with a batch smaller than the batch size. If this value is too large, Lambda polls for a longer time before processing the messages in the batch. You can adjust this value to see how it affects your throughput.

Queue visibility timeout

All SQS messages have a visibility timeout that determines how long the message is hidden from the queue after being selected up by a consumer. If the message is not successfully processed or deleted, the message reappears in the queue when the visibility timeout ends.

Give your Lambda function enough time to process the message by setting the visibility timeout based on your function-specific metrics. You should set this value to six times the Lambda function timeout plus the value of MaximumBatchingWindowInSeconds. This prevents other functions from unnecessarily processing the messages while the message is already being processed.

Dead-letter queues (DLQs)

Failed messages are placed back in the queue to be retried by Lambda. To prevent failed messages from getting added to the queue multiple times, designate a DLQ and send failed messages there.

The number of times the messages should be retried is set by the Maximum receives value for the DLQ. Once the message is re-added to the queue more than the Maximum receives value, it is placed in the DLQ. You can then process this message at a later time.

This allows you to avoid situations where many failed messages are continuously placed back into the queue, consuming Lambda resources. Failed messages scale down the Lambda function and add the entire batch to the queue, which can worsen the situation. To ensure smooth scaling of the Lambda function, move repeatedly failing messages to the DLQ.

Conclusion

This post explores Lambda’s scaling behavior when subscribed to SQS standard queues. It walks through several ways to scale faster and maximize Lambda throughput when needed. This includes increasing the memory allocation for the Lambda function, increasing batch size, catching errors, and making configuration changes. Better understanding the levers available for SQS and Lambda interaction can help in meeting your scaling needs.

To learn more about building decoupled architectures, see these videos on Amazon SQS. For more serverless learning resources, visit https://serverlessland.com.

Token-based authentication for iOS applications with Amazon SNS

2021-11-10 Talia Nassi

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/token-based-authentication-for-ios-applications-with-amazon-sns/

This post is co-written by Karen Hong, Software Development Engineer, AWS Messaging.

To use Amazon SNS to send mobile push notifications, you must provide a set of credentials for connecting to the supported push notification service (see prerequisites for push). For the Apple Push Notification service (APNs), SNS now supports using token-based authentication (.p8), in addition to the existing certificate-based method.

You can now use a .p8 file to create or update a platform application resource through the SNS console or programmatically. You can publish messages (directly or from a topic) to platform application endpoints configured for token-based authentication.

In this tutorial, you set up an example iOS application. You retrieve information from your Apple developer account and learn how to register a new signing key. Next, you use the SNS console to set up a platform application and a platform endpoint. Finally, you test the setup and watch a push notification arrive on your device.

Advantages of token-based authentication

Token-based authentication has several benefits compared to using certificates. The first is that you can use the same signing key from multiple provider servers (iOS,VoIP, and MacOS), and you can use one signing key to distribute notifications for all of your company’s application environments (sandbox, production). In contrast, a certificate is only associated with a particular subset of these channels.

A pain point for customers using certificate-based authentication is the need to renew certificates annually, an inconvenient procedure which can lead to production issues when forgotten. Your signing key for token-based authentication, on the other hand, does not expire.

Token-based authentication improves the security of your certificates. Unlike certificate-based authentication, the credential does not transfer. Hence, it is less likely to be compromised. You establish trust through encrypted tokens that are frequently regenerated. SNS manages the creation and management of these tokens.

You configure APNs platform applications for use with both .p8 and .p12 certificates, but only 1 authentication method is active at any given time.

Setting up your iOS application

To use token-based authentication, you must set up your application.

Prerequisites: An Apple developer account

Create a new XCode project. Select iOS as the platform and use the App template.
Select your Apple Developer Account team and your organization identifier.
Go to Signing & Capabilities and select + Capability. This step creates resources on your Apple Developer Account.
Add the Push Notification Capability.

In SNSPushDemoApp.swift , add the following code to print the device token and receive push notifications.

import SwiftUI

@main
struct SNSPushDemoApp: App {
    
    @UIApplicationDelegateAdaptor private var appDelegate: AppDelegate
    
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

class AppDelegate: NSObject, UIApplicationDelegate, UNUserNotificationCenterDelegate {
    
    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey : Any]? = nil) -> Bool {
        UNUserNotificationCenter.current().delegate = self
        return true
    }
    
    func application(_ application: UIApplication,
                     didRegisterForRemoteNotificationsWithDeviceToken deviceToken: Data) {
        let tokenParts = deviceToken.map { data in String(format: "%02.2hhx", data) }
        let token = tokenParts.joined()
        print("Device Token: \(token)")
    };
    
    func application(_ application: UIApplication, didFailToRegisterForRemoteNotificationsWithError error: Error) {
       print(error.localizedDescription)
    }
    
    func userNotificationCenter(_ center: UNUserNotificationCenter, willPresent notification: UNNotification, withCompletionHandler completionHandler: @escaping (UNNotificationPresentationOptions) -> Void) {
        completionHandler([.banner, .badge, .sound])
    }
}

In ContentView.swift, add the code to request authorization for push notifications and register for notifications.

import SwiftUI

struct ContentView: View {
    init() {
        requestPushAuthorization();
    }
    
    var body: some View {
        Button("Register") {
            registerForNotifications();
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

func requestPushAuthorization() {
    UNUserNotificationCenter.current().requestAuthorization(options: [.alert, .badge, .sound]) { success, error in
        if success {
            print("Push notifications allowed")
        } else if let error = error {
            print(error.localizedDescription)
        }
    }
}

func registerForNotifications() {
    UIApplication.shared.registerForRemoteNotifications()
}

Build and run the app on an iPhone. The push notification feature does not work with a simulator.
On your phone, select allow notifications when the prompt appears. The debugger prints out “Push notifications allowed” if it is successful.
On your phone, choose the Register button. The debugger prints out the device token.
You have set up an iOS application that can receive push notifications and prints the device token. We can now use this app to test sending push notifications with SNS configured for token-based authentication.

Retrieving your Apple resources

After setting up your application, you retrieve your Apple resources from your Apple developer account. There are four pieces of information you need from your Apple Developer Account: Bundle ID, Team ID, Signing Key, and Signing Key ID.

The signing key and signing key ID are credentials that you manage through your Apple Developer Account. You can register a new key by selecting the Keys tab under the Certificates, Identifiers & Profiles menu. Your Apple developer account provides the signing key in the form of a text file with a .p8 extension.

Find the team ID under Membership Details. The bundle ID is the unique identifier that you set up when creating your application. Find this value in the Identifiers section under the Certificates, Identifiers & Profiles menu.

Amazon SNS uses a token constructed from the team ID, signing key, and signing key ID to authenticate with APNs for every push notification that you send. Amazon SNS manages tokens on your behalf and renews them when necessary (within an hour). The request header includes the bundle ID and helps identify where the notification goes.

Creating a new platform application using APNs token-based authentication

Prerequisites

In order to implement APNs token-based authentication, you must have:

An Apple Developer Account
A mobile application

To create a new platform application:

Navigate to the Amazon SNS console and choose Push notifications. Then choose Create platform application.
Enter a name for your application. In the Push notification platform dropdown, choose Apple iOS/VoIP/Mac.
For the Push service, choose iOS, and for the Authentication method, choose Token. Select the check box labeled Used for development in sandbox. Then, input the fields from your Apple Developer Account.
You have successfully created a platform application using APNs token-based authentication.

Creating a new platform endpoint using APNs token-based authentication

A platform application stores credentials, sending configuration, and other settings but does not contain an exact sending destination. Create a platform endpoint resource to store the information to allow SNS to target push notifications to the proper application on the correct mobile device.

Any iOS application that is capable of receiving push notifications must register with APNs. Upon successful registration, APNs returns a device token that uniquely identifies an instance of an app. SNS needs this device token in order to send to that app. Each platform endpoint belongs to a specific platform application and uses the credentials and settings set in the platform application to complete the sending.

In this tutorial, you create the platform endpoint manually through the SNS console. In a real system, upon receiving the device token, you programmatically call SNS from your application server to create or update your platform endpoints.

These are the steps to create a new platform endpoint:

From the details page of the platform application in the SNS console, choose Create application endpoint.
From the iOS app that you set up previously, find the device token in the application logs. Enter the device token and choose Create application endpoint.
You have successfully created a platform application endpoint.

Testing a push notification from your device

In this section, you test a push notification from your device.

From the details page of the application endpoint you just created, (this is the page you end up at immediately after creating the endpoint), choose Publish message.
Enter a message to send and choose Publish message.
The notification arrives on your iOS app.

Conclusion

Developers sending mobile push notifications can now use a .p8 key to authenticate an Apple device endpoint. Token-based authentication is more secure, and reduces operational burden of renewing the certificates every year. In this post, you learn how to set up your iOS application for mobile push using token-based authentication, by creating and configuring a new platform endpoint in the Amazon SNS console.

To learn more about APNs token-based authentication with Amazon SNS, visit the Amazon SNS Developer Guide. For more serverless content, visit Serverless Land.

Creating static custom domain endpoints with Amazon MQ for RabbitMQ

2021-11-09 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-static-custom-domain-endpoints-with-amazon-mq-for-rabbitmq/

This post is written by Nate Bachmeier, Senior Solutions Architect, Wallace Printz, Senior Solutions Architect, Christian Mueller, Principal Solutions Architect.

Many cloud-native application architectures take advantage of the point-to-point and publish-subscribe, or “pub-sub”, model of message-based communication between application components. Not only is this architecture generally more resilient to failure because of the loose coupling and because message processing failures can be retried, it is also more efficient because individual application components can independently scale up or down to maintain message processing SLAs, compared to monolithic application architectures.

Synchronous (REST-based) systems are tightly coupled. A problem in a synchronous downstream dependency has immediate impact on the upstream callers. Retries from upstream callers can fan out and amplify problems.

For applications requiring messaging protocols including JMS, NMS, AMQP, STOMP, MQTT, and WebSocket, Amazon provides Amazon MQ. This is a managed message broker service for Apache ActiveMQ and RabbitMQ that makes it easier to set up and operate message brokers in the cloud.

Amazon MQ provides two managed broker deployment connection options: public brokers and private brokers. Public brokers receive internet-accessible IP addresses while private brokers receive only private IP addresses from the corresponding CIDR range in their VPC subnet. In some cases, for security purposes, customers may prefer to place brokers in a private subnet, but also allow access to the brokers through a persistent public endpoint, such as a subdomain of their corporate domain like ‘mq.example.com’.

This blog explains how to provision private Amazon MQ brokers behind a secure public load balancer endpoint using an example subdomain.

AmazonMQ also supports ActiveMQ – to learn more, read Creating static custom domain endpoints with Amazon MQ to simplify broker modification and scaling.

Overview

There are several reasons one might want to deploy this architecture beyond the security aspects. First, human-readable URLs are easier for people to parse when reviewing operations and troubleshooting, such as deploying updates to ‘mq-dev.example.com’ before ‘mq-prod.example.com’. Additionally, maintaining static URLs for your brokers helps reduce the necessity of modifying client code when performing maintenance on the brokers.

The following diagram shows the solutions architecture. This blog post assumes some familiarity with AWS networking fundamentals, such as VPCs, subnets, load balancers, and Amazon Route 53. For additional information on these topics, see the Elastic Load Balancing documentation.

The client service tries to connect with a RabbitMQ connection string to the domain endpoint setup in Route 53.
The client looks up the domain name from Route 53, which returns the IP address of the Network Load Balancer (NLB).
The client creates a Transport Layer Security (TLS) connection to the NLB with a secure socket layer (SSL) certificate provided from AWS Certificate Manager (ACM).
The NLB chooses a healthy endpoint from the target group and creates a separate SSL connection. This provides secure, end-to-end SSL encrypted messaging between client and brokers.

To build this architecture, you build the network segmentation first, then add the Amazon MQ brokers, and finally the network routing. You need a VPC, one private subnet per Availability Zone, and one public subnet for your bastion host (if desired).

This demonstration VPC uses the 10.0.0.0/16 CIDR range. Additionally, you must create a custom security group for your brokers. You must set up this security group to allow traffic from your Network Load Balancer to the RabbitMQ brokers.

This example does not use this VPC for other workloads so it allows all incoming traffic that originates within the VPC (which includes the NLB) through to the brokers on the AMQP port of 5671 and the web console port of 443.

Adding the Amazon MQ brokers

With the network segmentation set up, add the Amazon MQ brokers:

Choose Create brokers on the Active Amazon MQ home page.
Toggle the radio button next to RabbitMQ and choose Next.
Choose a deployment mode of either single-instance broker (for development environments), or cluster deployment (for production environments).
In Configure settings, specify the broker name and instance type.
Confirm that the broker engine version is 3.8.22 or higher and set Access type to Private access.
Specify the VPC, private subnets, and security groups before choosing Next.

Finding the broker’s IP address

Before configuring the NLB’s target groups, you must look up the broker’s IP address. Unlike Amazon MQ for Apache ActiveMQ, RabbitMQ does not show its private IP addresses, though you can reliably find its VPC endpoints using DNS. Amazon MQ creates one VPC endpoint in each subnet with a static address that won’t change until you delete the broker.

Navigate to the broker’s details page and scroll to the Connections panel.
Find the endpoint’s fully qualified domain name. It is formatted like broker-id.mq.region.amazonaws.com.
Open a command terminal on your local workstation.
Retrieve the ‘A’ record values using the host (Linux) or nslookup (Windows) command.
Record these values for the NLB configuration steps later.

Configure the load balancer’s target group

The next step in the build process is to configure the load balancer’s target group. You use the private IP addresses of the brokers as targets for the NLB. Create a Target Group, select the target type as IP, and make sure to choose the TLS protocol and for each required port, as well as the VPC your brokers reside in.

It is important to configure the health check settings so traffic is only routed to active brokers. Select the TCP protocol and override the health check port to 443 Rabbit MQ’s console port. Also, configure the healthy threshold to 2 with a 10-second check interval so the NLB detects faulty hosts within 20 seconds.

Be sure not to use RabbitMQ’s AMQP port as the target group health check port. The NLB may not be able to recognize the host as healthy on that port. It is better to use the broker’s web console port.

Add the VPC endpoint addresses as NLB targets. The NLB routes traffic across the endpoints and provides networking reliability if an AZ is offline. Finally, configure the health checks to use the web console (TCP port 443).

Creating a Network Load Balancer

Next, you create a Network Load Balancer. This is an internet-facing load balancer with TLS listeners on port 5671 (AMQP), routing traffic to the brokers’ VPC and private subnets. You select the target group you created, selecting TLS for the connection between the NLB and the brokers. To allow clients to connect to the NLB securely, select an ACM certificate for the subdomain registered in Route 53 (for example ‘mq.example.com’).

To learn about ACM certificate provisioning, read more about the process here. Make sure that the ACM certificate is provisioned in the same Region as the NLB or the certificate is not shown in the dropdown menu.

Optionally configure IP filtering

The NLB is globally accessible and this may be overly permissive for some workloads. You can restrict incoming traffic to specific IP ranges on the NLB’s public subnet by using network access control list (NACL) configuration:

Navigate to the AWS Management Console to the VPC service and choose Subnets.
Select your public subnet and then switch to the Network ACL tab.
Select the link to the associated network ACL (e.g., acl-0d6fxxxxxxxx) details.
Activate this item and choose Edit inbound rules in the Action menu.
Specify the desired IP range and then choose Save changes.

Configuring Route 53

Finally, configure Route 53 to serve traffic at the subdomain of your choice to the NLB:

Go to the Route 53 hosted zone and create a new subdomain record set, such as mq.example.com, that matches the ACM certificate that you previously created.
In the “type” field, select “A – IPv4 address”, then select “Yes” for alias. This allows you to select the NLB as the alias target.
Select from the alias target menu the NLB you just created and save the record set.

Now callers can use the friendly name in the RabbitMQ connection string. This capability improves the developer experience and reduces operational cost when rebuilding the cluster. Since you added multiple VPC endpoints (one per subnet) into the NLB’s target group, the solution has Multi-AZ redundancy.

Testing with a RabbitMQ client process

The entire process can be tested using any RabbitMQ client process. One approach is to launch the official Docker image and connect with the native client. The service documentation also provides sample code for authenticating, publishing, and subscribing to RabbitMQ channels.

To log in to the broker’s RabbitMQ web console, there are three options. Due to the security group rules, only traffic originating from inside the VPC is allowed to the brokers:

Use a VPN connection from your corporate network to the VPC. Many customers use this option but for rapid testing, there is a simpler and more cost-effective method.
Connect to the brokers’ web console through your Route 53 subdomain, which requires creating a separate web console port listener (443) on the existing NLB and creating a separate TLS target group for the brokers.
Use a bastion host to proxy traffic to the web console.

Conclusion

In this post, you build a highly available Amazon MQ broker in a private subnet. You layer security by placing the brokers behind a highly scalable Network Load Balancer. You configure routing from a single custom subdomain URL to multiple brokers with a built-in health check.

For more serverless learning resources, visit Serverless Land.

Implementing header-based API Gateway versioning with Amazon CloudFront

2021-11-08 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/implementing-header-based-api-gateway-versioning-with-amazon-cloudfront/

This post is written by Amir Khairalomoum, Sr. Solutions Architect.

In this blog post, I show you how to use Lambda@Edge feature of Amazon CloudFront to implement a header-based API versioning solution for Amazon API Gateway.

Amazon API Gateway is a fully managed service that makes it easier for developers to create, publish, maintain, monitor, and secure APIs at any scale. Amazon CloudFront is a global content delivery network (CDN) service built for high-speed, low-latency performance, security, and developer ease-of-use. Lambda@Edge is a feature of Amazon CloudFront, a compute service that lets you run functions that customize the content that CloudFront delivers.

The example uses the AWS SAM CLI to build, deploy, and test the solution on AWS. The AWS Serverless Application Model (AWS SAM) is an open-source framework that you can use to build serverless applications on AWS. The AWS SAM CLI lets you locally build, test, and debug your applications defined by AWS SAM templates. You can also use the AWS SAM CLI to deploy your applications to AWS, or create secure continuous integration and deployment (CI/CD) pipelines.

After an API becomes publicly available, it is used by customers. As a service evolves, its contract also evolves to reflect new changes and capabilities. It’s safe to evolve a public API by adding new features but it’s not safe to change or remove existing features.

Any breaking changes may impact consumer’s applications and break them at runtime. API versioning is important to avoid breaking backward compatibility and breaking a contract. You need a clear strategy for API versioning to help consumers adopt them.

Versioning APIs

Two of the most commonly used API versioning strategies are URI versioning and header-based versioning.

URI versioning

This strategy is the most straightforward and the most commonly used approach. In this type of versioning, versions are explicitly defined as part of API URIs. These example URLs show how domain name, path, or query string parameters can be used to specify a version:

https://api.example.com/v1/myservice
https://apiv1.example.com/myservice
https://api.example.com/myservice?v=1

To deploy an API in API Gateway, the deployment is associated with a stage. A stage is a logical reference to a lifecycle state of your API (for example, dev, prod, beta, v2). As your API evolves, you can continue to deploy it to different stages as different versions of the API.

Header-based versioning

This strategy is another commonly used versioning approach. It uses HTTP headers to specify the desired version. It uses the “Accept” header for content negotiation or uses a custom header (for example, “APIVER” to indicate a version):

Accept:application/vnd.example.v1+json
APIVER:v1

This approach allows you to preserve URIs between versions. As a result, you have a cleaner and more understandable set of URLs. It is also easier to add versioning after design. However, you may need to deal with complexity of returning different versions of your resources.

Overview of solution

The target architecture for the solution uses Lambda@Edge. It dynamically routes a request to the relevant API version, based on the provided header:

Architecture overview

In this architecture:

The user sends a request with a relevant header, which can be either “Accept” or another custom header.
This request reaches the CloudFront distribution and triggers the Lambda@Edge Origin Request.
The Lambda@Edge function uses the provided header value and fetches data from an Amazon DynamoDB table. This table contains mappings for API versions. The function then modifies the Origin and the Host header of the request and returns it back to CloudFront.
CloudFront sends the request to the relevant Amazon API Gateway URL.

In the next sections, I walk you through setting up the development environment and deploying and testing this solution.

Setting up the development environment

To deploy this solution on AWS, you use the AWS Cloud9 development environment.

Go to the AWS Cloud9 web console. In the Region dropdown, make sure you’re using N. Virginia (us-east-1) Region.
Select Create environment.
On Step 1 – Name environment, enter a name for the environment, and choose Next step.
On Step 2 – Configure settings, keep the existing environment settings.

Console view of configuration settings
Choose Next step. Choose Create environment.

Deploying the solution

Now that the development environment is ready, you can proceed with the solution deployment. In this section, you download, build, and deploy a sample serverless application for the solution using AWS SAM.

Download the sample serverless application

The solution sample code is available on GitHub. Clone the repository and download the sample source code to your Cloud9 IDE environment by running the following command in the Cloud9 terminal window:

git clone https://github.com/aws-samples/amazon-api-gateway-header-based-versioning.git ./api-gateway-header-based-versioning

This sample includes:

template.yaml: Contains the AWS SAM template that defines your application’s AWS resources.
hello-world/: Contains the Lambda handler logic behind the API Gateway endpoints to return the hello world message.
edge-origin-request/: Contains the Lambda@Edge handler logic to query the API version mapping and modify the Origin and the Host header of the request.
init-db/: Contains the Lambda handler logic for a custom resource to populate sample DynamoDB table

Build your application

Run the following commands in order to first, change into the project directory, where the template.yaml file for the sample application is located then build your application:

cd ~/environment/api-gateway-header-based-versioning/
sam build

Output:

Build output

Deploy your application

Run the following command to deploy the application in guided mode for the first time then follow the on-screen prompts:

sam deploy --guided

Output:

Deploy output

The output shows the deployment of the AWS CloudFormation stack.

Testing the solution

This application implements all required components for the solution. It consists of two Amazon API Gateway endpoints backed by AWS Lambda functions. The deployment process also initializes the API Version Mapping DynamoDB table with the values provided earlier in the deployment process.

Run the following commands to see the created mappings:

STACK_NAME=$(grep stack_name ~/environment/api-gateway-header-based-versioning/samconfig.toml | awk -F\= '{gsub(/"/, "", $2); gsub(/ /, "", $2); print $2}')

DDB_TBL_NAME=$(aws cloudformation describe-stacks --region us-east-1 --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`DynamoDBTableName`].OutputValue' --output text) && echo $DDB_TBL_NAME

aws dynamodb scan --table-name $DDB_TBL_NAME

Output:

Table scan results

When a user sends a GET request to CloudFront, it routes the request to the relevant API Gateway endpoint version according to the provided header value. The Lambda function behind that API Gateway endpoint is invoked and returns a “hello world” message.

To send a request to the CloudFront distribution, which is created as part of the deployment process, first get its domain name from the deployed AWS CloudFormation stack:

CF_DISTRIBUTION=$(aws cloudformation describe-stacks --region us-east-1 --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`CFDistribution`].OutputValue' --output text) && echo $CF_DISTRIBUTION

Output:

Domain name results

You can now send a GET request along with the relevant header you specified during the deployment process to the CloudFront to test the application.

Run the following command to test the application for API version one. Note that if you entered a different value other than the default value provided during the deployment process, change the --header parameter to match your inputs:

curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" --header "Accept:application/vnd.example.v1+json" && echo

Output:

Curl results

The response shows that CloudFront successfully routed the request to the API Gateway v1 endpoint as defined in the mapping Amazon DynamoDB table. API Gateway v1 endpoint received the request. The Lambda function behind the API Gateway v1 was invoked and returned a “hello world” message.

Now you can change the header value to v2 and run the command again this time to test the API version two:

curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" --header "Accept:application/vnd.example.v2+json" && echo

Output:

Curl results after header change

The response shows that CloudFront routed the request to the API Gateway v2 endpoint as defined in the mapping DynamoDB table. API Gateway v2 endpoint received the request. The Lambda function behind the API Gateway v2 was invoked and returned a “hello world” message.

This solution requires valid a header value on each individual request, so the application checks and raises an error if the header is missing or the header value is not valid.

You can remove the header parameter and run the command to test this scenario:

curl -i -o - --silent -X GET "https://${CF_DISTRIBUTION}/hello" && echo

Output:

No header causes a 403 error

The response shows that Lambda@Edge validated the request and raised an error to inform us that the request did not have a valid header.

Mitigating latency

In this solution, Lambda@Edge reads the API version mappings data from the DynamoDB table. Accessing external data at the edge can cause additional latency to the request. In order to mitigate the latency, solution uses following methods:

Cache data in Lambda@Edge memory: As data is unlikely to change across many Lambda@Edge invocations, Lambda@Edge caches API version mappings data in the memory for a certain period of time. It reduces latency by avoiding an external network call for each individual request.
Use Amazon DynamoDB global table: It brings data closer to the CloudFront distribution and reduces external network call latency.

Cleaning up

To clean up the resources provisioned as part of the solution:

Run following command to delete the deployed application:
```
sam delete
```
Go to the AWS Cloud9 web console. Select the environment you created then choose Delete.

Conclusion

Header-based API versioning is a commonly used versioning strategy. This post shows how to use CloudFront to implement a header-based API versioning solution for API Gateway. It uses the AWS SAM CLI to build and deploy a sample serverless application to test the solution in the AWS Cloud.

To learn more about API Gateway, visit the API Gateway developer guide documentation, and for CloudFront, refer to Amazon CloudFront developer guide documentation.

For more serverless learning resources, visit Serverless Land.

Introducing cross-account Amazon ECR access for AWS Lambda

2021-11-04 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/introducing-cross-account-amazon-ecr-access-for-aws-lambda/

This post is written by Brian Zambrano, Enterprise Solutions Architect and Indranil Banerjee, Senior Solution Architect.

In December 2020, AWS announced support for packaging AWS Lambda functions using container images. Customers use the container image packaging format for workloads like machine learning inference made possible by the 10 GB container size increase and familiar container tooling.

Many customers use multiple AWS accounts for application development but centralize Amazon Elastic Container Registry (ECR) images to a single account. Until today, a Lambda function had to reside in the same AWS account as the ECR repository that owned the container image. Cross-account ECR access with AWS Lambda functions has been one of the most requested features since launch.

From today, you can now deploy Lambda functions that reference container images from an ECR repository in a different account within the same AWS Region.

Overview

The example demonstrates how to use the cross-account capability using two AWS example accounts:

ECR repository owner: Account ID 111111111111
Lambda function owner: Account ID 222222222222

The high-level process consists of the following steps:

Create an ECR repository using Account 111111111111 that grants Account 222222222222 appropriate permissions to use the image
Build a Lambda-compatible container image and push it to the ECR repository
Deploy a Lambda function in account 222222222222 and reference the container image in the ECR repository from account 111111111111

This example uses the AWS Serverless Application Model (AWS SAM) to create the ECR repository and its repository permissions policy. AWS SAM provides an easier way to manage AWS resources with CloudFormation.

To build the container image and upload it to ECR, use Docker and the AWS Command Line Interface (CLI). To build and deploy a new Lambda function that references the ECR image, use AWS SAM. Find the example code for this project in the GitHub repository.

Create an ECR repository with a cross-account access policy

Using AWS SAM, I create a new ECR repository named cross-account-function in the us-east-1 Region with account 111111111111. In the template.yaml file, RepositoryPolicyText defines the permissions for the ECR Repository. This template grants account 222222222222 access so that a Lambda function in that account can reference images in the ECR repository:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: SAM Template for cross-account-function ECR Repo

Resources:
  HelloWorldRepo:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: cross-account-function
      RepositoryPolicyText:
        Version: "2012-10-17"
        Statement:
          - Sid: CrossAccountPermission
            Effect: Allow
            Action:
              - ecr:BatchGetImage
              - ecr:GetDownloadUrlForLayer
            Principal:
              AWS:
                - arn:aws:iam::222222222222:root
          - Sid: LambdaECRImageCrossAccountRetrievalPolicy
            Effect: Allow
            Action:
              - ecr:BatchGetImage
              - ecr:GetDownloadUrlForLayer
            Principal:
              Service: lambda.amazonaws.com
            Condition:
              StringLike:
                aws:sourceArn:
                  - arn:aws:lambda:us-east-1:222222222222:function:*

Outputs:
  ERCRepositoryUri:
    Description: "ECR RepositoryUri which may be referenced by Lambda functions"
    Value: !GetAtt HelloWorldRepo.RepositoryUri

The RepositoryPolicyText has two statements that are required for Lambda functions to work as expected:

CrossAccountPermission – Allows account 222222222222 to create and update Lambda functions that reference this ECR repository
LambdaECRImageCrossAccountRetrievalPolicy – Lambda eventually marks a function as INACTIVE when not invoked for an extended period. This statement is necessary so that Lambda service in account 222222222222 can pull the image again for optimization and caching.

To deploy this stack, run the following commands:

git clone https://github.com/aws-samples/lambda-cross-account-ecr.git
cd sam-ecr-repo
sam build

AWS SAM build results


sam deploy --guided

AWS SAM deploy results

Once AWS SAM deploys the stack, a new ECR repository named cross-account-function exists. The repository has a permissions policy that allows Lambda functions in account 222222222222 to access the container images. You can verify this in the ECR console for this repository:

Permissions displayed in the console

You can also extend this policy to enable multiple accounts by adding additional account IDs to the Principal and Condition evaluations lists in the CrossAccountPermission and LambdaECRImageCrossAccountRetrievalPolicy permissions policy. Narrowing the ECR permission policy is a best practice. With this launch, if you are working with multiple accounts in an AWS Organization we recommend enumerating your account IDs in the ECR permissions policy.

Amazon ECR repository policies use a subset of IAM policies to control access to individual ECR repositories. Refer to the ECR repository policies documentation to learn more.

Build a Lambda-compatible container image

Next, you build a container image using Docker and the AWS CLI. For this step, you need Docker, a Dockerfile, and Python code that responds to Lambda invocations.

Use the AWS-maintained Python 3.9 container image as the basis for the Dockerfile:

FROM public.ecr.aws/lambda/python:3.9
COPY app.py ${LAMBDA_TASK_ROOT}
CMD ["app.handler"]

The code for this example, in app.py, is a Hello World application.

import json
def handler(event, context):
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "hello world!"}),
    }

To build and tag the image and push it to ECR using the same name as the repository (cross-account-function) for the image name and 01 as the tag, run:
```
$ docker build -t cross-account-function:01 .
```
Docker build results
Tag the image for upload to the ECR. The command parameters vary depending on the account id and Region. If you’re unfamiliar with the tagging steps for ECR, view the exact commands for your repository using the View push commands button from the ECR repository console page:
```
$ docker tag cross-account-function:01 111111111111.dkr.ecr.us-east-1.amazonaws.com/cross-account-function:01
```

$ aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 111111111111.dkr.ecr.us-east-1.amazonaws.com
$ docker push 111111111111.dkr.ecr.us-east-1.amazonaws.com/cross-account-function:01

Docker push results

Deploying a Lambda Function

The last step is to build and deploy a new Lambda function in account 222222222222. The AWS SAM template below, saved to a file named template.yaml, references the ECR image for the Lambda function’s ImageUri. This template also instructs AWS SAM to create an Amazon API Gateway REST endpoint integrating the Lambda function.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Sample SAM Template for sam-ecr-cross-account-demo

Globals:
  Function:
    Timeout: 3
Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      ImageUri: 111111111111.dkr.ecr.us-east-1.amazonaws.com/cross-account-function:01
      Architectures:
        - x86_64
      Events:
        HelloWorld:
          Type: Api
          Properties:
            Path: /hello
            Method: get

Outputs:
  HelloWorldApi:
    Description: "API Gateway endpoint URL for Prod stage for Hello World function"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/hello/"

Use AWS SAM to deploy this template:

cd ../sam-cross-account-lambda
sam build

AWS SAM build results

sam deploy --guided

SAM deploy results

Now that the Lambda function is deployed, test using the API Gateway endpoint that AWS SAM created:

Testing the endpoint

Because it references a container image with the ImageUri parameter in the AWS SAM template, subsequent deployments must use the –resolve-image-repos parameter:

sam deploy --resolve-image-repos

Conclusion

This post demonstrates how to create a Lambda-compatible container image in one account and reference it from a Lambda function in another account. It shows an example of an ECR policy to enable cross-account functionality. It also shows how to use AWS SAM to deploy container-based functions using the ImageUri parameter.

To learn more about serverless and AWS SAM, visit the Sessions with SAM series and find more resources at Serverless Land.

#ServerlessForEveryone

Choosing between storage mechanisms for ML inferencing with AWS Lambda

2021-11-02 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/choosing-between-storage-mechanisms-for-ml-inferencing-with-aws-lambda/

This post is written by Veda Raman, SA Serverless, Casey Gerena, Sr Lab Engineer, Dan Fox, Principal Serverless SA.

For real-time machine learning inferencing, customers often have several machine learning models trained for specific use-cases. For each inference request, the model must be chosen dynamically based on the input parameters.

This blog post walks through the architecture of hosting multiple machine learning models using AWS Lambda as the compute platform. There is a CDK application that allows you to try these different architectures in your own account. Finally, it then discusses the different storage options for hosting the models and the benefits of each.

Overview

The serverless architecture for inferencing uses AWS Lambda and API Gateway. The machine learning models are stored either in Amazon S3 or Amazon EFS. Alternatively, they are part of the Lambda function deployed as a container image and stored in Amazon ECR.

All three approaches package and deploy the machine learning inference code as Lambda function along with the dependencies as a container image. More information on how to deploy Lambda functions as container images can be found here.

A user sends a request to Amazon API Gateway requesting a machine learning inference.
API Gateway receives the request and triggers Lambda function with the necessary data.
Lambda loads the container image from Amazon ECR. This container image contains the inference code and business logic to run the machine learning model. However, it does not store the machine learning model (unless using the container hosted option, see step 6).
Model storage option: For S3, when the Lambda function is triggered, it downloads the model files from S3 dynamically and performs the inference.
Model storage option: For EFS, when the Lambda function is triggered, it accesses the models via the local mount path set in the Lambda file system configuration and performs the inference.
Model storage option: If using the container hosted option, you must package the model in Amazon ECR with the application code defined for the Lambda function in step 3. The model runs in the same container as the application code. In this case, choosing the model happens at build-time as opposed to runtime.
Lambda returns the inference prediction to API Gateway and then to the user.

The storage option you choose, either Amazon S3, Amazon EFS, or Amazon ECR via Lambda OCI deployment, to host the models influences the inference latency, cost of the infrastructure and DevOps deployment strategies.

Comparing single and multi-model inference architectures

There are two types of ML inferencing architectures, single model and multi-model. In single model architecture, you have a single ML inference model that performs the inference for all incoming requests. The model is stored either in S3, ECR (via OCI deployment with Lambda), or EFS and is then used by a compute service such as Lambda.

The key characteristic of a single model is that each has its own compute. This means that for every Lambda function there is a single model associated with it. It is a one-to-one relationship.

Multi-model inferencing architecture is where there are multiple models to be deployed and the model to perform the inference should be selected dynamically based on the type of request. So you may have four different models for a single application and you want a Lambda function to choose the appropriate model at invocation time. It is a many-to-one relationship.

Regardless of whether you use single or multi-model, the models must be stored in S3, EFS, or ECR via Lambda OCI deployments.

Should I load a model outside the Lambda handler or inside?

It is a general best practice in Lambda to load models and anything else that may take a longer time to process outside of the Lambda handler. For example, loading a third-party package dependency. This is due to cold start invocation times – for more information on performance, read this blog.

However, if you are running a multi-model inference, you may want to load inside the handler so you can load a model dynamically. This means you could potentially store 100 models in EFS and determine which model to load at the time of invocation of the Lambda function.

In these instances, it makes sense to load the model in the Lambda handler. This can increase the processing time of your function, since you are loading the model at the time of request.

Deploying the solution

The example application is open-sourced. It performs NLP question/answer inferencing using the HuggingFace BERT model using the PyTorch framework (expanding upon previous work found here). The inference code and the PyTorch framework are packaged as a container image and then uploaded to ECR and the Lambda service.

The solution has three stacks to deploy:

MlEfsStack – Stores the inference models inside of EFS and loads two models inside the Lambda handler, the model is chosen at invocation time.
MlS3Stack – Stores the inference model inside of S3 and loads a single model outside of the Lambda handler.
MlOciStack – Stores the inference models inside of the OCI container loads two models outside of the Lambda handler, the model is chosen at invocation time.

To deploy the solution, follow along the README file on GitHub.

Testing the solution

To test the solution, you can either send an inference request through API Gateway or invoke the Lambda function through the CLI. To send a request to the API, run the following command in a terminal (be sure to replace with your API endpoint and Region):

curl --location --request POST 'https://asdf.execute-api.us-east-1.amazonaws.com/develop/' --header 'Content-Type: application/json' --data-raw '{"model_type": "nlp1","question": "When was the car invented?","context": "Cars came into global use during the 20th century, and developed economies depend on them. The year 1886 is regarded as the birth year of the modern car when German inventor Karl Benz patented his Benz Patent-Motorwagen. Cars became widely available in the early 20th century. One of the first cars accessible to the masses was the 1908 Model T, an American car manufactured by the Ford Motor Company. Cars were rapidly adopted in the US, where they replaced animal-drawn carriages and carts, but took much longer to be accepted in Western Europe and other parts of the world."}'

General recommendations for model storage

For single model architectures, you should always load the ML model outside of the Lambda handler for increased performance on subsequent invocations after the initial cold start, this is true regardless of the model storage architecture that is chosen.

For multi-model architectures, if possible, load your model outside of the Lambda handler; however, if you have too many models to load in advance then load them inside of the Lambda handler. This means that a model will be loaded at every invocation of Lambda, increasing the duration of the Lambda function.

Recommendations for model hosting on S3

S3 is a good option if you need a simpler, low-cost storage option to store models. S3 is recommended when you cannot predict your application traffic volume for inference.

Additionally, if you must retrain the model, you can upload the retrained model to the S3 bucket without redeploying the Lambda function.

Recommendations for model hosting on EFS

EFS is a good option if you have a latency-sensitive workload for inference or you are already using EFS in your environment for other machine learning related activities (for example, training or data preparation).

With EFS, you must VPC-enable the Lambda function to mount the EFS filesystem, which requires an additional configuration.

For EFS, it’s recommended that you perform throughput testing with both EFS burst mode and provisioned throughput modes. Depending on inference request traffic volume, if the burst mode is not able to provide the desired performance, you must provision throughput for EFS. See the EFS burst throughput documentation for more information.

Recommendations for container hosted models

This is the simplest approach since all the models are available in the container image uploaded to Lambda. This also has the lowest latency since you are not downloading models from external storage.

However, it requires that all the models are packaged into the container image. If you have too many models that cannot fit into the 10 GB of storage space in the container image, then this is not a viable option.

One drawback of this approach is that anytime a model changes, you must re-package the models with the inference Lambda function code.

This approach is recommended if your models can fit in the 10 GB limit for container images and you are not re-training models frequently.

Cleaning up

To clean up resources created by the CDK templates, run “cdk destroy <StackName>”

Conclusion

Using a serverless architecture for real-time inference can scale your application for any volume of traffic while removing the operational burden of managing your own infrastructure.

In this post, we looked at the serverless architecture that can be used to perform real-time machine learning inference. We then discussed single and multi-model architectures and how to load the models in the Lambda function. We then looked at the different storage mechanisms available to host the machine learning models. We compared S3, EFS, and container hosting for storing models and provided our recommendations of when to use each.

For more learning resources on serverless, visit Serverless Land.

Build workflows for Amazon Forecast with AWS Step Functions

2021-11-01 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/build-workflows-for-amazon-forecast-with-aws-step-functions/

This post is written by Péter Molnár, Data Scientist, ML ProServe and Sachin Doshi, Senior Application Architect, ProServe.

This blog builds a full lifecycle workflow for Amazon Forecast to predict household electricity consumption from historic data. Previously, developers used AWS Lambda function to build workflows for Amazon Forecast. You can now use AWS Step Functions with AWS SDK integrations to reduce cost and complexity.

Step Functions recently expanded the number of supported AWS service integrations from 17 to over 200 with AWS API actions from 46 to over 9,000 with AWS SDK service integrations.

Step Functions is a low-code visual workflow service used for workflow automation and service orchestration. Developers use Step Functions with managed services such as artificial intelligence services, Amazon S3, and AWS Glue.

You can create state machines that use AWS SDK Service Integrations with Amazon States Language (ASL), AWS Cloud Development Kit (AWS CDK), AWS Serverless Application Model (AWS SAM), or visually using AWS Step Function Workflow Studio.

To create workflows for AWS AI services like Forecast, you can use Step Functions AWS SDK service integrations. This approach can be simpler because it allows users to build solutions without writing JavaScript or Python code.

Workflow for Amazon Forecast

The solution includes four components:

IAM role granting Step Functions control over Forecast.
IAM role granting Forecast access to S3 storage locations.
S3 bucket with input data
Define Step Functions state machine and parameters for Forecast.

The repo provides an AWS SAM template to deploy these resources in your AWS account.

Understanding Amazon Forecast

Amazon Forecast is a fully managed service for time series forecasting. Forecast uses machine learning to combine time series data with additional variables to build forecasts.

Using Amazon Forecast involves steps that may take from minutes to hours. Instead of executing each step and waiting for its completion, you use Step Functions to define the steps of the forecasting process.

These are the individual steps of the Step Functions workflow:

Create a dataset: In Forecast, there are three types of datasets, target time series, related time series, and item metadata. The target time series is required and the others provide additional context with certain algorithms.
Import data: This moves the information from S3 into a storage volume where the data is used for training and validation.
Create a dataset group: This is the large box that isolates models and the data they are trained on from each other.
Train a model: Forecast automates this process for you but you can also select algorithms. You can provide your own hyper parameters or use hyperparameter optimization (HPO) to determine the most performant values.
Export back-test results: this creates a detailed table of the model performance.
Deploy a predictor to deploy the model to use it to generate a forecast.
Export forecast to create future predictions.

For more details, read the documentation of the Forecast APIs.

Example dataset

This example uses the individual household electric power consumption dataset. This dataset is available from the UCI Machine Learning Repository. We have aggregated the usage data to hourly intervals.

The dataset has three columns: the timestamp, value, and item ID. These are the minimum required to generate a forecast with Amazon Forecast.

Read more about the data and parameters in https://github.com/aws-samples/amazon-forecast-samples/tree/master/notebooks/basic/Tutorial.

Step Functions AWS SDK integrations

The AWS SDK integrations of Step Functions reduce the need for Lambda functions that call the Forecast APIs. You can call any AWS SDK-compatible service directly from the ASL. Use the following syntax in the resource field of a Step Functions task:

arn:aws:states:::aws-sdk:serviceName:apiAction.[serviceIntegrationPattern]

The following example compares using Step Functions AWS SDK service integrations with calling the boto3 Python method to create a dataset with a corresponding resource in the state machine definition. This is the ASL of a Step Functions state machine:

"States": {
  "Create-Dataset": {
    "Resource": "arn:aws:states:::aws-sdk:forecast:createDataset",
    "Parameters": {
      "DatasetName": "blog_example",
      "DataFrequency": "H",
      "Domain": "CUSTOM",
      "DatasetType": "TARGET_TIME_SERIES",
      "Schema": {
        "Attributes": [
          {
            "AttributeName": "timestamp",
            "AttributeType": "timestamp"
          },
          {
            "AttributeName": "target_value",
            "AttributeType": "float"
          },
          {
            "AttributeName": "item_id",
            "AttributeType": "string"
          }
        ]
      }
    },
    "ResultPath": "$.createDatasetResult",
    "Next": "Import-Data"
  }

The structure is similar to the corresponding boto3 methods. Compare the Python code with the state machine code – it uses the same parameters as calling the Python API:

forecast = boto3.client('forecast')
response = forecast.create_dataset(
             Domain='CUSTOM',
             DatasetType='TARGET_TIME_SERIES',
             DatasetName='blog_example',
             DataFrequency='H', 
             Schema={
               'Attributes': [
                  {
                    'AttributeName': 'timestamp',
                    'AttributeType': 'timestamp'
                  },
                  {
                    'AttributeName': 'target_value',
                    'AttributeType': 'float'
                  },
                  {
                    'AttributeName': 'item_id',
                    'AttributeType': 'string'
                  }
               ]
             }
           )

Handling asynchronous API calls

Several Forecast APIs run asynchronously, such as createDatasetImportJob and createPredictor. This means that your workflow must wait until the import job is completed.

You can use one of two methods in the state machine: create a wait loop, or allow any following task that depends on the completion of the previous task to retry.

In general, it is good practice to allow any task to retry for a few times. For simplicity this example does not include general error handling. Read the blog Handling Errors, Retries, and adding Alerting to Step Function State Machine Executions to learn more about writing robust state machines.

1. State machine wait loop

To wait for an asynchronous task to complete, use the services’ Describe* API methods to get the status of current job. You can implement the wait loop with the native Step Function tasks Choice and Wait.

Here, the task “Check-Data-Import” calls the describeDatasetImportJob API to receive a status value of the running job:

"Check-Data-Import": {
  "Type": "Task",
  "Resource": "arn:aws:states:::aws-sdk:forecast:describeDatasetImportJob",
  "Parameters": {
    "DatasetImportJobArn.$": "$.createDatasetImportJobResult.DatasetImportJobArn"
   },
   "ResultPath": "$.describeDatasetImportJobResult",
   "Next": "Fork-Data-Import"
 },
 "Fork-Data-Import": {
   "Type": "Choice",
   "Choices": [
     {
       "Variable": "$.describeDatasetImportJobResult.Status",
       "StringEquals": "ACTIVE",
       "Next": "Done-Data-Import"
     }
   ],
   "Default": "Wait-Data-Import"
 },
 "Wait-Data-Import": {
   "Type": "Wait",
   "Seconds": 60,
   "Next": "Check-Data-Import"
 },
 "Done-Data-Import": {
   "Type": "Pass",
   "Next": "Create-Predictor"
 },

2. Fail and retry

Alternatively, use the Retry parameter to specify how to repeat the API call in case of an error. This example shows how the attempt to create the forecast is repeated if the resource that it depends on is not created. In this case, the preceding task of creating the predictor.

The time between retries is set to 180 seconds and the number of retries must not exceed 100. This means that the workflow waits 3 minutes before trying again. The longest time to wait for the ML training is five hours.

With the BackoffRate set to 1, the wait interval of 3 minutes remains constant. Value greater that 1 may reduce the number of retries but may also add increased wait time for training jobs that run for several hours:

"Create-Forecast": {
  "Type": "Task",
  "Resource": "arn:aws:states:::aws-sdk:forecast:createForecast",
  "Parameters": {
    "ForecastName.$": "States.Format('{}_forecast', $.ProjectName)",
    "PredictorArn.$": "$.createPredictorResult.PredictorArn"
   },
   "ResultPath": "$.createForecastResult",
   "Retry": [
     {
       "ErrorEquals": ["Forecast.ResourceInUseException"],
       "IntervalSeconds": 180,
       "BackoffRate": 1.0,
       "MaxAttempts": 100
     }
   ],
   "Next": "Forecast-Export"
 },

Deploying the workflow

The AWS Serverless Application Model Command Line Interface (AWS SAM CLI) is an extension of the AWS CLI that adds functionality for building and testing serverless applications. Follow the instructions to install the AWS SAM CLI.

To build and deploy the application:

Clone the GitHub repo:
git clone https://github.com/aws-samples/aws-stepfunctions-examples/tree/main/sam/demo-forecast-service-integration
Change directory to cloned repo.
cd demo-forecast-service-integration
Enable execute permissions for the deployment script
chmod 700 ./bootstrap_deployment_script.sh
Execute the script with a stack name of your choosing as parameter.
./bootstrap_deployment_script.sh <Here goes your stack name>

The script builds the AWS SAM template and deploys the stack under the given name. The AWS SAM template creates the underlying resources like S3 bucket, IAM policies, and Step Functions workflow. The script also copies the data file used for training to the newly created S3 bucket.

Running the workflow

After the AWS SAM Forecast workflow application is deployed, run the workflow to train a forecast predictor for the energy consumption:

Navigate to the AWS Step Functions console.
Select the state machine named “ForecastWorkflowStateMachine-*” and choose New execution.
Define the “ProjectName” parameter in form of a JSON structure. The name for the Forecast Dataset group is “household_energy_forecast”.
Choose Start execution.

Viewing resources in the Amazon Forecast console

Navigate to the Amazon Forecast console and select the data set group “household_energy_forecast”. You can see the details of the Forecast resource as they are created. The provided state machine executes every step in the workflow and then deletes all resources, leaving the output files in S3.

You can disable the clean-up process by editing the state machine:

Choose Edit to open the editor.
Find the tasks “Clean-Up” and change the “Next” state from “Delete-Forecast_export” to “SuccessState”.
```
"Clean-Up": {
   "Type": "Pass",
   "Next": "SuccessState"
},
```
Delete all tasks named Delete-*.

Remember to delete the dataset group manually if you bypass the clean-up process of the workflow.

Analyzing Forecast Results

The forecast workflow creates a folder “forecast_results” for all of its output files. In there you find the subfolders “backtestexport” with data produced by Backtest-Export task, and “forecast” with the predicted energy demand forecast produced by the Forecast-Export job.

The “backtestexport” folder contains two tables: “accuracy-metrics-values” with the model performance accuracy metrics, and “forecast-values” with the predicted forecast values of the training set. Read the blog post Amazon Forecast now supports accuracy measurements for individual items for details.

The forecast predictions are stored in the “forecast” folder. The table contains forecasts at three different quantiles: 10%, 50% and 90%.

The data files are partitioned into multiple CSV files. In order to analyze them, first download and merge the files into proper tables. Use the AWS CLI command to download

BUCKET="<your account number>-<your region>-sf-forecast-workflow"
aws s3 cp s3://$BUCKET/forecast_results . –recursive

Alternatively, you may import and analyze the data into Amazon Athena.

Cleaning up

To delete the application that you created, use the AWS SAM CLI.

sam delete --stack-name <Here goes your stack name>

Also delete the data files in the S3 bucket. If you skipped the clean-up tasks in your workflow, you must delete the dataset group from the Forecast console.

Important things to know

Here are things know, that will help you to use AWS SDK service integration:

Call AWS SDK services directly from the ASL in the resource field of a task state. To do this, use the following syntax: arn:aws:states:::aws-sdk:serviceName:apiAction.[serviceIntegrationPattern]
Use camelCase for apiAction names in the Resource field, such as “copyObject”, and use PascalCase for parameter names in the Parameters field, such as “CopySource”.
Step Functions cannot generate IAM policies for most AWS SDK service integrations. You must add those to the IAM role of the state machine explicitly.

Learn more about this new capability by reading its documentation.

Conclusion

This post shows how to create a Step Functions workflow for Forecast using AWS SDK service integrations, which allows you to use over 200 with AWS API actions. It shows two patterns for handling asynchronous tasks. The first pattern queries the describe-* API repeatedly and the second pattern uses the “Retry” option. This simplifies the development of workflows because in many cases they can replace Lambda functions.

For more serveless learning resources, visit Serverless Land.

Creating AWS Lambda environment variables from AWS Secrets Manager

2021-10-28 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-aws-lambda-environmental-variables-from-aws-secrets-manager/

This post is written by Andy Hall, Senior Solutions Architect.

AWS Lambda layers and extensions are used by third-party software providers for monitoring Lambda functions. A monitoring solution needs environmental variables to provide configuration information to send metric information to an endpoint.

Managing this information as environmental variables across thousands of Lambda functions creates operational overhead. Instead, you can use the approach in this blog post to create environmental variables dynamically from information hosted in AWS Secrets Manager.

This can help avoid managing secret rotation for individual functions. It ensures that values stay encrypted until runtime, and abstracts away the management of the environmental variables.

Overview

This post shows how to create a Lambda layer for Node.js, Python, Ruby, Java, and .NET Core runtimes. It retrieves values from Secrets Manager and converts the secret into an environmental variable that can be used by other layers and functions. The Lambda layer uses a wrapper script to fetch information from Secrets Manager and create environmental variables.

The steps in the process are as follows:

The Lambda service responds to an event and initializes the Lambda context.
The wrapper script is called as part of the Lambda init phase.
The wrapper script calls a Golang executable passing in the ARN for the secret to retrieve.
The Golang executable uses the Secrets Manager API to retrieve the decrypted secret.
The wrapper script converts the information into environmental variables and calls the next step in processing.

All of the code for this post is available from this GitHub repo.

The wrapper script

The wrapper script is the main entry-point for the extension and is called by the Lambda service as part of the init phase. During this phase, the wrapper script will read in basic information from the environment and call the Golang executable. If there was an issue with the Golang executable, the wrapper script will log a statement and exit with an error.

# Get the secret value by calling the Go executable
values=$(${fullPath}/go-retrieve-secret -r "${region}" -s "${secretArn}" -a "${roleName}" -t ${timeout})
last_cmd=$?

# Verify that the last command was successful
if [[ ${last_cmd} -ne 0 ]]; then
    echo "Failed to setup environment for Secret ${secretArn}"
    exit 1
fi

Golang executable

This uses Golang to invoke the AWS APIs since the Lambda execution environment does natively provide the AWS Command Line Interface. The Golang executable can be included in a layer so that the layer works with a number of Lambda runtimes.

The Golang executable captures and validates the command line arguments to ensure that required parameters are supplied. If Lambda does not have permissions to read and decrypt the secret, you can supply an ARN for a role to assume.

The following code example shows how the Golang executable retrieves the necessary information to assume a role using the AWS Security Token Service:

client := sts.NewFromConfig(cfg)

return client.AssumeRole(ctx,
    &sts.AssumeRoleInput{
        RoleArn: &roleArn,
        RoleSessionName: &sessionName,
    },
)

After obtaining the necessary permissions, the secret can be retrieved using the Secrets Manager API. The following code example uses the new credentials to create a client connection to Secrets Manager and the secret:

client := secretsmanager.NewFromConfig(cfg, func(o *secretsmanager.Options) {
    o.Credentials = aws.NewCredentialsCache(credentials.NewStaticCredentialsProvider(*assumedRole.Credentials.AccessKeyId, *assumedRole.Credentials.SecretAccessKey, *assumedRole.Credentials.SessionToken))
})
return client.GetSecretValue(ctx, &secretsmanager.GetSecretValueInput{
    SecretId: aws.String(secretArn),
})

After retrieving the secret, the contents must be converted into a format that the wrapper script can use. The following sample code covers the conversion from a secret string to JSON by storing the data in a map. Once the data is in a map, a loop is used to output the information as key-value pairs.

// Convert the secret into JSON
var dat map[string]interface{}

// Convert the secret to JSON
if err := json.Unmarshal([]byte(*result.SecretString), &dat); err != nil {
    fmt.Println("Failed to convert Secret to JSON")
    fmt.Println(err)
    panic(err)
}

// Get the secret value and dump the output in a manner that a shell script can read the
// data from the output
for key, value := range dat {
    fmt.Printf("%s|%s\n", key, value)
}

Conversion to environmental variables

After the secret information is retrieved by using Golang, the wrapper script can now loop over the output, populate a temporary file with export statements, and execute the temporary file. The following code covers these steps:

# Read the data line by line and export the data as key value pairs 
# and environmental variables
echo "${values}" | while read -r line; do 
    
    # Split the line into a key and value
    ARRY=(${line//|/ })

    # Capture the kay value
    key="${ARRY[0]}"

    # Since the key had been captured, no need to keep it in the array
    unset ARRY[0]

    # Join the other parts of the array into a single value.  There is a chance that
    # The split man have broken the data into multiple values.  This will force the
    # data to be rejoined.
    value="${ARRY[@]}"
    
    # Save as an env var to the temp file for later processing
    echo "export ${key}=\"${value}\"" >> ${tempFile}
done

# Source the temp file to read in the env vars
. ${tempFile}

At this point, the information stored in the secret is now available as environmental variables to layers and the Lambda function.

Deployment

To deploy this solution, you must build on an instance that is running an Amazon Linux 2 AMI. This ensures that the compiled Golang executable is compatible with the Lambda execution environment.

The easiest way to deploy this solution is from an AWS Cloud9 environment but you can also use an Amazon EC2 instance. To build and deploy the solution into your environment, you need the ARN of the secret that you want to use. A build script is provided to ease deployment and perform compilation, archival, and AWS CDK execution.

To deploy, run:

./build.sh <ARN of the secret to use>

Once the build is complete, the following resources are deployed into your AWS account:

A Lambda layer (called get-secrets-layer)
A second Lambda layer for testing (called second-example-layer)
A Lambda function (called example-get-secrets-lambda)

Testing

To test the deployment, create a test event to send to the new example-get-secrets-lambda Lambda function using the AWS Management Console. The test Lambda function uses both the get-secrets-layer and second-example-layer Lambda layers, and the secret specified from the build. This function logs the values of environmental variables that were created by the get-secrets-layer and second-example-layer layers:

The secret contains the following information:

{
  "EXAMPLE_CONNECTION_TOKEN": "EXAMPLE AUTH TOKEN",
  "EXAMPLE_CLUSTER_ID": "EXAMPLE CLUSTER ID",
  "EXAMPLE_CONNECTION_URL": "EXAMPLE CONNECTION URL",
  "EXAMPLE_TENANT": "EXAMPLE TENANT",
  "AWS_LAMBDA_EXEC_WRAPPER": "/opt/second-example-layer"
}

This is the Python code for the example-get-secrets-lambda function:

import os
import json
import sys

def lambda_handler(event, context):
    print(f"Got event in main lambda [{event}]",flush=True)
    
    # Return all of the data
    return {
        'statusCode': 200,
        'layer': {
            'EXAMPLE_AUTH_TOKEN': os.environ.get('EXAMPLE_AUTH_TOKEN', 'Not Set'),
            'EXAMPLE_CLUSTER_ID': os.environ.get('EXAMPLE_CLUSTER_ID', 'Not Set'),
            'EXAMPLE_CONNECTION_URL': os.environ.get('EXAMPLE_CONNECTION_URL', 'Not Set'),
            'EXAMPLE_TENANT': os.environ.get('EXAMPLE_TENANT', 'Not Set'),
            'AWS_LAMBDA_EXEC_WRAPPER': os.environ.get('AWS_LAMBDA_EXEC_WRAPPER', 'Not Set')
        },
        'secondLayer': {
            'SECOND_LAYER_EXECUTE': os.environ.get('SECOND_LAYER_EXECUTE', 'Not Set')
        }
    }

When running a test using the AWS Management Console, you see the following response returned from the Lambda in the AWS Management Console:

{
  "statusCode": 200,
  "layer": {
    "EXAMPLE_AUTH_TOKEN": "EXAMPLE AUTH TOKEN",
    "EXAMPLE_CLUSTER_ID": "EXAMPLE CLUSTER ID",
    "EXAMPLE_CONNECTION_URL": "EXAMPLE CONNECTION URL",
    "EXAMPLE_TENANT": "EXAMPLE TENANT",
    "AWS_LAMBDA_EXEC_WRAPPER": "/opt/second-example-layer"
  },
  "secondLayer": {
    "SECOND_LAYER_EXECUTE": "true"
  }
}

When the secret changes, there is a delay before those changes are available to the Lambda layers and function. This is because the layer only executes in the init phase of the Lambda lifecycle. After the Lambda execution environment is recreated and initialized, the layer executes and creates environmental variables with the new secret information.

Conclusion

This solution provides a way to convert information from Secrets Manager into Lambda environment variables. By following this approach, you can centralize the management of information through Secrets Manager, instead of at the function level.

For more information about the Lambda lifecycle, see the Lambda execution environment lifecycle documentation.

The code for this post is available from this GitHub repo.

For more serverless learning resources, visit Serverless Land.

Accelerating serverless development with AWS SAM Accelerate

2021-10-27 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/accelerating-serverless-development-with-aws-sam-accelerate/

Building a serverless application changes the way developers think about testing their code. Previously, developers would emulate the complete infrastructure locally and only commit code ready for testing. However, with serverless, local emulation can be more complex.

In this post, I show you how to bypass most local emulation by testing serverless applications in the cloud against production services using AWS SAM Accelerate. AWS SAM Accelerate aims to increase infrastructure accuracy for testing with sam sync, incremental builds, and aggregated feedback for developers. AWS SAM Accelerate brings the developer to the cloud and not the cloud to the developer.

AWS SAM Accelerate

The AWS SAM team has listened to developers wanting a better way to emulate the cloud on their local machine and we believe that testing against the cloud is the best path forward. With that in mind, I am happy to announce the beta release of AWS SAM Accelerate!

Previously, the latency of deploying after each change has caused developers to seek other options. AWS SAM Accelerate is a set of features to reduce that latency and enable developers to test their code quickly against production AWS services in the cloud.

To demonstrate the different options, this post uses an example application called “Blog”. To follow along, create your version of the application by downloading the demo project. Note, you need the latest version of AWS SAM and Python 3.9 installed. AWS SAM Accelerate works with other runtimes, but this example uses Python 3.9.

After installing the pre-requisites, set up the demo project with the following commands:

Create a folder for the project called blog
mkdir blog && cd blog
Initialize a new AWS SAM project:
sam init
Chose option 2 for Custom Template Location.
Enter https://github.com/aws-samples/aws-sam-accelerate-demo as the location.

AWS SAM downloads the sample project into the current folder. With the blog application in place, you can now try out AWS SAM Accelerate.

AWS SAM sync

The first feature of AWS SAM Accelerate is a new command called sam sync. This command synchronizes your project declared in an AWS SAM template to the AWS Cloud. However, sam sync differentiates between code and configuration.

AWS SAM defines code as the following:

AWS Lambda function code.
AWS Lambda layer resources.
AWS Step Functions templates in Amazon States Language form.
Amazon API Gateway OpenAPI documents identified in the CodeUri parameter.

Anything else is considered configuration. The following description of the sam sync options explains how sam sync differentiates between configuration synchronization and code synchronization. The resulting patterns are the fastest way to test code in the cloud with AWS SAM.

Using sam sync (no options)

The sam sync command with no options deploys or updates all infrastructure and code like the sam deploy command. However, unlike sam deploy, sam sync bypasses the AWS CloudFormation changeset process. To see this, run:

sam sync --stack-name blog

AWS SAM sync with no options

First, sam sync builds the code using the sam build command and then the application is synchronized to the cloud.

Successful sync

Using SAM sync code, resource, resource-id flags

The sam sync command can also synchronize code changes to the cloud without updating the infrastructure. This code synchronization uses the service APIs and bypasses CloudFormation, allowing AWS SAM to update the code in seconds instead of minutes.

To synchronize code, use the --code flag, which instructs AWS SAM to sync all the code resources in the stack:

sam sync --stack-name blog --code

AWS SAM sync with the code flag

The sam sync command verifies each of the code types present and synchronizes the sources to the cloud. This example uses an API Gateway REST API and two Lambda functions. AWS SAM skips the REST API because there is no external OpenAPI file for this project. However, the Lambda functions and their dependencies are synchronized.

You can limit the synchronized resources by using the --resource flag with the --code flag:

sam sync --stack-name blog --code --resource AWS::Serverless::Function

SAM sync specific resource types

This command limits the synchronization to Lambda functions. Other available resources are AWS::Serverless::Api, AWS::Serverless::HttpApi, and AWS::Serverless::StateMachine.

You can target one specific resource with the --resource-id flag to get more granular:

sam sync --stack-name blog --code --resource-id HelloWorldFunction

SAM sync specific resource

This time sam sync ignores the GreetingFunction and only updates the HelloWorldFunction declared with the command’s --resource-id flag.

Using the SAM sync watch flag

The sam sync --watch option tells AWS SAM to monitor for file changes and automatically synchronize when changes are detected. If the changes include configuration changes, AWS SAM performs a standard synchronization equivalent to the sam sync command. If the changes are code only, then AWS SAM synchronizes the code with the equivalent of the sam sync --code command.

The first time you run the sam sync command with the --watch flag, AWS SAM ensures that the latest code and infrastructure are in the cloud. It then monitors for file changes until you quit the command:

sam sync --stack-name blog --watch

Initial sync

To see a change, modify the code in the HelloWorldFunction (hello_world/app.py) by updating the response to the following:

return {
  "statusCode": 200,
  "body": json.dumps({
    "message": "hello world, how are you",
    # "location": ip.text.replace("\n", "")
  }),
}

Once you save the file, sam sync detects the change and syncs the code for the HelloWorldFunction to the cloud.

AWS SAM detects changes

Auto dependency layer nested stack

During the initial sync, there is a logical resource name called AwsSamAutoDependencyLayerNestedStack. This feature helps to synchronize code more efficiently.

When working with Lambda functions, developers manage the code for the Lambda function and any dependencies required for the Lambda function. Before AWS SAM Accelerate, if a developer does not create a Lambda layer for dependencies, then the dependencies are re-uploaded with the function code on every update. However, with sam sync, the dependencies are automatically moved to a temporary layer to reduce latency.

Auto dependency layer in change set

During the first synchronization, sam sync creates a single nested stack that maintains a Lambda layer for each Lambda function in the stack.

Auto dependency layer in console

These layers are only updated when the dependencies for one of the Lambda functions are updated. To demonstrate, change the requirements.txt (greeting/requirements.txt) file for the GreetingFunction to the following:

Requests
boto3

AWS SAM detects the change, and the GreetingFunction and its temporary layer are updated:

Auto dependency layer synchronized

The Lambda function changes because the Lambda layer version must be updated.

Incremental builds with sam build

The second feature of AWS SAM Accelerate is an update to the SAM build command. This change separates the cache for dependencies from the cache for the code. The build command now evaluates these separately and only builds artifacts that have changed.

To try this out, build the project with the cached flag:

sam build --cached

The first build establishes cache

The first build recognizes that there is no cache and downloads the dependencies and builds the code. However, when you rerun the command:

The second build uses existing cached artifacts

The sam build command verifies that the dependencies have not changed. There is no need to download them again so it builds only the application code.

Finally, update the requirements file for the HelloWorldFunction (hello_w0rld/requirements.txt) to:

Requests
boto3

Now rerun the build command:

AWS SAM build detects dependency changes

The sam build command detects a change in the dependency requirements and rebuilds the dependencies and the code.

Aggregated feedback for developers

The final part of AWS SAM Accelerate’s beta feature set is aggregating logs for developer feedback. This feature is an enhancement to the already existing sam logs command. In addition to pulling Amazon CloudWatch Logs or the Lambda function, it is now possible to retrieve logs for API Gateway and traces from AWS X-Ray.

To test this, start the sam logs:

sam logs --stack-name blog --include-traces --tail

Invoke the HelloWorldApi endpoint returned in the outputs on syncing:

curl https://112233445566.execute-api.us-west-2.amazonaws.com/Prod/hello

The sam logs command returns logs for the AWS Lambda function, Amazon API Gateway REST execution logs, and AWS X-Ray traces.

AWS Lambda logs from Amazon CloudWatch

Amazon API Gateway execution logs from Amazon CloudWatch

Traces from AWS X-Ray

The full picture

Development diagram for AWS SAM Accelerate

With AWS SAM Accelerate, creating and testing an application is easier and faster. To get started:

Start a new project:
sam init
Synchronize the initial project with a development environment:
sam sync --stack-name <project name> --watch
Start monitoring for logs:
sam logs --stack-name <project name> --include-traces --tail
Test using response data or logs.
Iterate.
Rinse and repeat!

Some caveats

AWS SAM Accelerate is in beta as of today. The team has worked hard to implement a solid minimum viable product (MVP) to get feedback from our community. However, there are a few caveats.

Amazon State Language (ASL) code updates for Step Functions does not currently support DefinitionSubstitutions.
API Gateway OpenAPI template must be defined in the DefiitionUri parameter and does not currently support pseudo parameters and intrinsic functions at this time
The sam logs command only supports execution logs on REST APIs and access logs on HTTP APIs.
Function code cannot be inline and must be defined as a separate file in the CodeUri parameter.

Conclusion

When testing serverless applications, developers must get to the cloud as soon as possible. AWS SAM Accelerate helps developers escape from emulating the cloud locally and move to the fidelity of testing in the cloud.

In this post, I walk through the philosophy of why the AWS SAM team built AWS SAM Accelerate. I provide an example application and demonstrate the different features designed to remove barriers from testing in the cloud.

We invite the serverless community to help improve AWS SAM for building serverless applications. As with AWS SAM and the AWS SAM CLI (which includes AWS SAM Accelerate), this project is open source and you can contribute to the repository.

For more serverless content, visit Serverless Land.

Monitoring and tuning federated GraphQL performance on AWS Lambda

2021-10-26 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/monitoring-and-tuning-federated-graphql-performance-on-aws-lambda/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

Our federated GraphQL at IMDb distributes requests across 19 subgraphs (graphlets). To ensure reliability for customers, IMDb monitors availability and performance across the whole stack. This article focuses on this challenge and concludes a 3-part federated GraphQL series:

Part 1 presents the migration from a monolithic REST API to a federated GraphQL (GQL) endpoint running on AWS Lambda.
Part 2 describes schema management in federated GQL systems.

This presents an approach towards performance tuning. It compares graphlets with the same logic and different runtime (for example, Java and Node.js) and shows best practices for AWS Lambda tuning.

The post describes IMDb’s test strategy that emphasizes the areas of ownership for the Gateway and Graphlet teams. In contrast to the legacy monolithic system described in part 1, the federated GQL gateway does not own any business logic. Consequently, the gateway integration tests focus solely on platform features, leaving the resolver logic entirely up to the graphlets.

Monitoring and alarming

Efficient monitoring of a distributed system requires you to track requests across all components. To correlate service issues with issues in the gateway or other services, you must pass and log the common request ID.

Capture both error and latency metrics for every network call. In Lambda, you cannot send a response to the client until all work for that request is complete. As a result, this can add latency to a request.

The recommended way to capture metrics is Amazon CloudWatch embedded metric format (EMF). This scales with Lambda and helps avoid throttling by the Amazon CloudWatch PutMetrics API. You can also search and analyze your metrics and logs more easily using CloudWatch Logs Insights.

Lambda configured timeouts emit a Lambda invocation error metric, which can make it harder to separate timeouts from errors thrown during invocation. By specifying a timeout in-code, you can emit a custom metric to alarm on to treat timeouts differently from unexpected errors. With EMF, you can flush metrics before timing out in code, unlike the Lambda-configured timeout.

Running out of memory in a Lambda function also appears as a timeout. Use CloudWatch Insights to see if there are Lambda invocations that are exceeding the memory limits.

You can enable AWS X-Ray tracing for Lambda with a small configuration change to enable tracing. You can also trace components like SDK calls or custom sub segments.

Gateway integration tests

The Gateway team wants tests to be independent from the underlying data served by the graphlets. At the same time, they must test platform features provided by the Gateway – such as graphlet caching.

To simulate the real gateway-graphlet integration, IMDb uses a synthetic test graphlet that serves mock data. Given the graphlet’s simplicity, this reduces the risk of unreliable graphlet data. We can run tests asserting only platform features with the assumption of stable and functional, improving confidence that failing tests indicate issues with the platform itself.

This approach helps to reduce false positives in pipeline blockages and improves the continuous delivery rate. The gateway integration tests are run against the exposed endpoint (for example, a content delivery network) or by invoking the gateway Lambda function directly and passing the appropriate payload.

The former approach allows you to detect potential issues with the infrastructure setup. This is useful when you use infrastructure as code (IaC) tools like AWS CDK. The latter further narrows down the target of the tests to the gateway logic, which may be appropriate if you have extensive infrastructure monitoring and testing already in place.

Graphlet integration tests

The Graphlet team focuses only on graphlet-specific features. This usually means the resolver logic for the graph fields they own in the overall graph. All the platform features – including query federation and graphlet response caching – are already tested by the Gateway Team.

The best way to test the specific graphlet is to run the test suite by directly invoking the Lambda function. If there is any issue with the gateway itself, it does cause a false-positive failure for the graphlet team.

Load tests

It’s important to determine the maximum traffic volume your system can handle before releasing to production. Before the initial launch and before any high traffic events (for example, the Oscars or Golden Globes), IMDb conducts thorough load testing of our systems.

To perform meaningful load testing, the workload captures traffic logs to IMDb pages. We later replay the real customer traffic at the desired transaction-per-second (TPS) volume. This ensures that our tests approximate real-life usage. It reduces the risk of skewing test results due to over-caching and disproportionate graphlet usage. Vegeta is an example of a tool you can use to run the load test against your endpoint.

Canary tests

Canary testing can also help ensure high availability of an endpoint. The canary produces the traffic. This is a configurable script that runs on a schedule. You configure the canary script to follow the same routes and perform the same actions as a user, which allows you to continually verify the user experience even without live traffic.

Canaries should emit success and failure metrics that you can alarm on. For example, if a canary runs 100 times per minute and the success rate drops below 90% in three consecutive data points, you may choose to notify a technician about a potential issue.

Compared with integration tests, canary tests run continuously and do not require any code changes to trigger. They can be a useful tool to detect issues that are introduced outside the code change. For example, through manual resource modification in the AWS Management Console or an upstream service outage.

Performance tuning

There is a per-account limit on the number of concurrent Lambda invocations shared across all Lambda functions in a single account. You can help to manage concurrency by separating high-volume Lambda functions into different AWS accounts. If there is a traffic surge to any one of the Lambda functions, this isolates the concurrency used to a single AWS account.

Lambda compute power is controlled by the memory setting. With more memory comes more CPU. Even if a function does not require much memory, you can adjust this parameter to get more CPU power and improve processing time.

When serving real-time traffic, Provisioned Concurrency in Lambda functions can help to avoid cold start latency. (Note that you should use max, not average for your auto scaling metric to keep it more responsive for traffic increases.) For Java functions, code in static blocks is run before the function is invoked. Provisioned Concurrency is different to reserved concurrency, which sets a concurrency limit on the function and throttles invocations above the hard limit.

Use the maximum number of concurrent executions in a load test to determine the account concurrency limit for high-volume Lambda functions. Also, configure a CloudWatch alarm for when you are nearing the concurrency limit for the AWS account.

There are concurrency limits and burst limits for Lambda function scaling. Both are per-account limits. When there is a traffic surge, Lambda creates new instances to handle the traffic. “Burst limit = 3000” means that the first 3000 instances can be obtained at a much faster rate (invocations increase exponentially). The remaining instances are obtained at a linear rate of 500 per minute until reaching the concurrency limit.

An alternative way of thinking this is that the rate at which concurrency can increase is 500 per minute with a burst pool of 3000. The burst limit is fixed, but the concurrency limit can be increased by requesting a quota increase.

You can further reduce cold start latency by removing unused dependencies, selecting lightweight libraries for your project, and favoring compile-time over runtime dependency injection.

Impact of Lambda runtime on performance

Choice of runtime impacts the overall function performance. We migrated a graphlet from Java to Node.js with complete feature parity. The following graph shows the performance comparison between the two:

To illustrate the performance difference, the graph compares the slowest latencies for Node.js and Java – the P80 latency for Node.js was lower than the minimal latency we recorded for Java.

Conclusion

There are multiple factors to consider when tuning a federated GQL system. You must be aware of trade-offs when deciding on factors like the runtime environment of Lambda functions.

An extensive testing strategy can help you scale systems and narrow down issues quickly. Well-defined testing can also keep pipelines clean of false-positive blockages.

Using CloudWatch EMF helps to avoid PutMetrics API throttling and allows you to run CloudWatch Logs Insights queries against metric data.

For more serverless learning resources, visit Serverless Land.

Creating AWS Serverless batch processing architectures

2021-10-21 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-aws-serverless-batch-processing-architectures/

This post is written by Reagan Rosario, AWS Solutions Architect and Mark Curtis, Solutions Architect, WWPS.

Batch processes are foundational to many organizations and can help process large amounts of information in an efficient and automated way. Use cases include file intake processes, queue-based processing, and transactional jobs, in addition to heavy data processing jobs.

This post explains a serverless solution for batch processing to implement a file intake process. This example uses AWS Step Functions for orchestration, AWS Lambda functions for on-demand instance compute, Amazon S3 for storing the data, and Amazon SES for sending emails.

Overview

This post’s example takes a common use-case of a business’s need to process data uploaded as a file. The test file has various data fields such as item ID, order date, order location. The data must be validated, processed, and enriched with related information such as unit price. Lastly, this enriched data may need to be sent to a third-party system.

Step Functions allows you to coordinate multiple AWS services in fully managed workflows to build and update applications quickly. You can also create larger workflows out of smaller workflows by using nesting. This post’s architecture creates a smaller and modular Chunk processor workflow, which is better for processing smaller files.

As the file size increases, the size of the payload passed between states increases. Executions that pass large payloads of data between states can be stopped if they exceed the maximum payload size of 262,144 bytes.

To process large files and to make the workflow modular, I split the processing between two workflows. One workflow is responsible for splitting up a larger file into chunks. A second nested workflow is responsible for processing records in individual chunk files. This separation of high-level workflow steps from low-level workflow steps also allows for easier monitoring and debugging.

Splitting the files in multiple chunks can also improve performance by processing each chunk in parallel. You can further improve the performance by using dynamic parallelism via the map state for each chunk.

The file upload to an S3 bucket triggers the S3 event notification. It invokes the Lambda function asynchronously with an event that contains details about the object.
Lambda function calls the Main batch orchestrator workflow to start the processing of the file.
Main batch orchestrator workflow reads the input file and splits it into multiple chunks and stores them in an S3 bucket.
Main batch orchestrator then invokes the Chunk Processor workflow for each split file chunk.
Each Chunk processor workflow execution reads and processes a single split chunk file.
Chunk processor workflow writes the processed chunk file back to the S3 bucket.
Chunk processor workflow writes the details about any validation errors in an Amazon DynamoDB table.
Main batch orchestrator workflow then merges all the processed chunk files and saves it to an S3 bucket.
Main batch orchestrator workflow then emails the consolidated files to the intended recipients using Amazon Simple Email Service.

The Main batch orchestrator workflow orchestrates the processing of the file.
1. The first task state Split Input File into chunks calls a Lambda function. It splits the main file into multiple chunks based on the number of records and stores each chunk into an S3 bucket.
2. The next task state Call Step Functions for each chunk invokes a Lambda function. It triggers a workflow for each chunk of the file. It passes information such as the name of bucket and the key where the chunk file to be processed is present.
3. Then we wait for all the child workflow executions to complete.
4. Once all the child workflows are processed successfully, the next task state is Merge all Files. This combines all the processed chunks into a single file and then stores the file back to the S3 bucket.
5. The next task state Email the file takes the output file. It generates an S3 presigned URL for the file and sends an email with the S3 presigned URL.
The Chunk processor workflow is responsible for processing each row from the chunk file that was passed.
1. The first task state Read reads the chunked file from S3 and converts it to an array of JSON objects. Each JSON object represents a row in the chunk file.
2. The next state is a map state called Process messages (not shown in the preceding visual workflow). It runs a set of steps for each element of an input array. The input to the map state is an array of JSON objects passed by the previous task.
3. Within the map state, Validate Data is the first state. It invokes a Lambda function that validates each JSON object using the rules that you have created. Records that fail validation are stored in an Amazon DynamoDB table.
4. The next state Get Financial Data invokes Amazon API Gateway endpoints to enrich the data in the file with data from a DynamoDB table.
5. When the map state iterations are complete, the Write output file state triggers a task. It calls a Lambda function, which converts the JSON data back to CSV and writes the output object to S3.

Prerequisites

AWS account.
AWS SAM CLI.
Python 3.
An AWS Identity and Access Management (IAM) role with appropriate access.

Deploying the application

Clone the repository.
Change to the directory and build the application source:
sam build
Package and deploy the application to AWS. When prompted, input the corresponding parameters as shown below:
sam deploy –guided
Note the template parameters:
- SESSender: The sender email address for the output file email.
- SESRecipient: The recipient email address for the output file email.
- SESIdentityName: An email address or domain that Amazon SES users use to send email.
- InputArchiveFolder: Amazon S3 folder where the input file will be archived after processing.
- FileChunkSize: Size of each of the chunks, which is split from the input file.
- FileDelimiter: Delimiter of the CSV file (for example, a comma).
After the stack creation is complete, you see the source bucket created in Outputs.
Review the deployed components in the AWS CloudFormation Console.

Testing the solution

Before you can send an email using Amazon SES, you must verify each identity that you’re going to use as a “From”, “Source”, “Sender”, or “Return-Path” address to prove that you own it. Refer Verifying identities in Amazon SES for more information.
Locate the S3 bucket (SourceBucket) in the Resources section of the CloudFormation stack. Choose the physical ID.
In the S3 console for the SourceBucket, choose Create folder. Name the folder input and choose Create folder.
The S3 event notification on the SourceBucket uses “input” as the prefix and “csv” as the suffix. This triggers the notification Lambda function. This is created as a part of the custom resource in the AWS SAM template.
In the S3 console for the SourceBucket, choose the Upload button. Choose Add files and browse to the input file (testfile.csv). Choose Upload.
Review the data in the input file testfile.csv.
After the object is uploaded, the event notification triggers the Lambda Function. This starts the main orchestrator workflow. In the Step Functions console, you see the workflow is in a running state.
Choose an individual state machine to see additional information.
After a few minutes, both BlogBatchMainOrchestrator and BlogBatchProcessChunk workflows have completed all executions. There is one execution for the BlogBatchMainOrchestrator workflow and multiple invocations of the BlogBatchProcessChunk workflow. This is because the BlogBatchMainOrchestrator invokes the BlogBatchProcessChunk for each of the chunked files.

Checking the output

Open the S3 console and verify the folders created after the process has completed.

The following subfolders are created after the processing is complete:
– input_archive – Folder for archival of the input object.
– 0a47ede5-4f9a-485e-874c-7ff19d8cadc5 – Subfolder with a unique UUID in the name. This is created for storing the objects generated during batch execution.
Select the folder 0a47ede5-4f9a-485e-874c-7ff19d8cadc5.

output – This folder contains the completed output objects, some housekeeping files, and processed chunk objects.
to_process – This folder contains all the split objects from the original input file.
Open the processed object from the output/completed folder.

Inspect the output object testfile.csv. It is enriched with additional data (columns I through N) from the DynamoDB table fetched through an API call.

Viewing a completed workflow

Open the Step Functions console and browse to the BlogBatchMainOrchestrator and BlogBatchProcessChunk state machines. Choose one of the executions of each to locate the Graph Inspector. This shows the execution results for each state.

BlogBatchMainOrchestrator:

BlogBatchProcessChunk:

Batch performance

For this use case, this is the time taken for the batch to complete, based on the number of input records:

No. of records	Time for batch completion
10 k	5 minutes
100 k	7 minutes

The performance of the batch depends on other factors such as the Lambda memory settings and data in the file. Read more about Profiling functions with AWS Lambda Power Tuning.

Conclusion

This blog post shows how to use Step Functions’ features and integrations to orchestrate a batch processing solution. You use two Steps Functions workflows to implement batch processing, with one workflow splitting the original file and a second workflow processing each chunk file.

The overall performance of our batch processing application is improved by splitting the input file into multiple chunks. Each chunk is processed by a separate state machine. Map states further improve the performance and efficiency of workflows by processing individual rows in parallel.

Download the code from this repository to start building a serverless batch processing system.

Additional Resources:

For more serverless learning resources, visit Serverless Land.

Operating serverless at scale: Keeping control of resources – Part 3

2021-10-19 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-serverless-at-scale-keeping-control-of-resources-part-3/

This post is written by Jerome Van Der Linden, Solutions Architect.

In the previous part of this series, I provide application archetypes for developers to follow company best practices and include libraries needed for compliance. But using these archetypes is optional and teams can still deploy resources without them. Even if they use them, the templates can be modified. Developers can remove a layer, over-permission functions, or allow access to APIs without appropriate authorization.

To avoid this, you must define guardrails. Templates are good for providing guidance, best practices and to improve productivity. But they do not prevent actions like guardrails do. There are two kinds of guardrails:

Proactive: you define rules and permissions that avoid some specific actions.
Reactive: you define controls that detect if something happens and trigger notifications to alert someone or remediate actions.

This third part on serverless governance describes different guardrails and ways to implement them.

Implementing proactive guardrails

Proactive guardrails are often the most efficient but also the most restrictive. Be sure to apply them with caution as you could reduce developers’ agility and productivity. For example, test in a sandbox account before applying more broadly.

In this category, you typically find IAM policies and service control policies. This section explores some examples applied to serverless applications.

Controlling access through policies

Part 2 discusses Lambda layers, to include standard components and ensure compliance of Lambda functions. You can enforce the use of a Lambda layer when creating or updating a function, using the following policy. The condition checks if a layer is configured with the appropriate layer ARN:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ConfigureFunctions",
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction",
                "lambda:UpdateFunctionConfiguration"
            ],
            "Resource": "*",
            "Condition": {
                "ForAllValues:StringLike": {
                    "lambda:Layer": [
                        "arn:aws:lambda:*:123456789012:layer:my-company-layer:*"
                    ]
                }
            }
        }
    ]
}

When deploying Lambda functions, some companies also want to control the source code integrity and verify it has not been altered. Using code signing for AWS Lambda, you can sign the package and verify its signature at deployment time. If the signature is not valid, you can be warned or even block the deployment.

An administrator must first create a signing profile (you can see it as a trusted publisher) using AWS Signer. Then, a developer can reference this profile in its AWS SAM template to sign the Lambda function code:

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9
      CodeSigningConfigArn: !Ref MySignedFunctionCodeSigningConfig

  MySignedFunctionCodeSigningConfig:
    Type: AWS::Lambda::CodeSigningConfig
    Properties:
      AllowedPublishers:
        SigningProfileVersionArns:
          - arn:aws:signer:eu-central-1:123456789012:/signing-profiles/MySigningProfile
      CodeSigningPolicies:
        UntrustedArtifactOnDeployment: "Enforce"

Using the AWS SAM CLI and the --signing-profile option, you can package and deploy the Lambda function using the appropriate configuration. Read the documentation for more details.

You can also enforce the use of code signing by using a policy so that every function must be signed before deployment. Use the following policy and a condition requiring a CodeSigningConfigArn:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ConfigureFunctions",
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "lambda:CodeSigningConfigArn": "arn:aws:lambda:eu-central-1:123456789012:code-signing-config:csc-0c44689353457652"
                }
            }
        }
    ]
}

When using Amazon API Gateway, you may want to use a standard authorization mechanism. For example, a Lambda authorizer to validate a JSON Web Token (JWT) issued by your company identity provider. You can do that using a policy like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyWithoutJWTLambdaAuthorizer",
      "Effect": "Deny",
      "Action": [
        "apigateway:PUT",
        "apigateway:POST",
        "apigateway:PATCH"
      ],
      "Resource": [
        "arn:aws:apigateway:eu-central-1::/apis",
        "arn:aws:apigateway:eu-central-1::/apis/??????????",
        "arn:aws:apigateway:eu-central-1::/apis/*/authorizers",
        "arn:aws:apigateway:eu-central-1::/apis/*/authorizers/*"
      ],
      "Condition": {
        "ForAllValues:StringNotEquals": {
          "apigateway:Request/AuthorizerUri": 
            "arn:aws:apigateway:eu-central-1:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-central-1:123456789012:function:MyCompanyJWTAuthorizer/invocations"
        }
      }
    }
  ]
}

To enforce the use of mutual authentication (mTLS) and TLS version 1.2 for APIs, use the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnforceTLS12",
      "Effect": "Allow",
      "Action": [
        "apigateway:POST"
      ],
      "Resource": [
        "arn:aws:apigateway:eu-central-1::/domainnames",
        "arn:aws:apigateway:eu-central-1::/domainnames/*"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
            "apigateway:Request/SecurityPolicy": "TLS_1_2"
        }
      }
    }
  ]
}

You can apply other guardrails for Lambda, API Gateway, or another service. Read the available policies and conditions for your service here.

Securing self-service with permissions boundaries

When creating a Lambda function, developers must create a role that the function will assume when running. But by giving the ability to create roles to developers, one could elevate their permission level. In the following diagram, you can see that an admin gives this ability to create roles to developers:

Developer 1 creates a role for a function. This only allows Amazon DynamoDB read/write access and a basic execution role for Lambda (for Amazon CloudWatch Logs). But developer 2 is creating a role with administrator permission. Developer 2 cannot assume this role but can pass it to the Lambda function. This role could be used to create resources on Amazon EC2, delete an Amazon RDS database or an Amazon S3 bucket, for example.

To avoid users elevating their permissions, define permissions boundaries. With these, you can limit the scope of a Lambda function’s permissions. In this example, an admin still gives the same ability to developers to create roles but this time with a permissions boundary attached. Now the function cannot perform actions that exceed this boundary:

The admin must first define the permissions boundaries within an IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LambdaDeveloperBoundary",
            "Effect": "Allow",
            "Action": [
                "s3:List*",
                "s3:Get*",
                "logs:*",
                "dynamodb:*",
                "lambda:*"
            ],
            "Resource": "*"
        }
    ]
}

Note that this boundary is still too permissive and you should reduce and adopt a least privilege approach. For example, you may not want to grant the dynamodb:DeleteTable permission or restrict it to a specific table.

The admin can then provide the CreateRole permission with this boundary using a condition:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CreateRole",
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole"
            ],
            "Resource": "arn:aws:iam::123456789012:role/lambdaDev*",
            "Condition": {
                "StringEquals": {
                    "iam:PermissionsBoundary": "arn:aws:iam::123456789012:policy/lambda-dev-boundary"
                }
            }
        }
    ]
}

Developers assuming a role lambdaDev* can create a role for their Lambda functions but these functions cannot have more permissions than defined in the boundary.

Deploying reactive guardrails

The principle of least privilege is not always easy to accomplish. To achieve it without this permission management burden, you can use reactive guardrails. Actions are allowed but they are detected and trigger a notification or a remediation action.

To accomplish this on AWS, use AWS Config. It continuously monitors your resources and their configurations. It assesses them against compliance rules that you define and can notify you or automatically remediate to non-compliant resources.

AWS Config has more than 190 built-in rules and some are related to serverless services. For example, you can verify that an API Gateway REST API is configured with SSL or protected by a web application firewall (AWS WAF). You can ensure that a DynamoDB table has back up configured in AWS Backup or that data is encrypted.

Lambda also has a set of rules. For example, you can ensure that functions have a concurrency limit configured, which you should. Most of these rules are part of the “Operational Best Practices for Serverless” conformance pack to ease their deployment as a single entity. Otherwise, setting rules and remediation can be done in the AWS Management Console or AWS CLI.

If you cannot find a rule for your use case in the AWS Managed Rules, you can find additional ones on GitHub or write your own using the Rule Development Kit (RDK). For example, enforcing the use of a Lambda layer for functions. This is possible using a service control policy but it denies the creation or modification of the function if the layer is not provided. You can use this policy in production but you may only want to notify the developers in their sandbox accounts or test environments.

By using the RDK CLI, you can bootstrap a new rule:

rdk create LAMBDA_LAYER_CHECK --runtime python3.9 \
--resource-types AWS::Lambda::Function \
--input-parameters '{"LayerArn":"arn:aws:lambda:region:account:layer:layer_name", "MinLayerVersion":"1"}'

It generates a Lambda function, some tests, and a parameters.json file that contains the configuration for the rule. You can then edit the Lambda function code and the evaluate_compliance method. To check for a layer:

LAYER_REGEXP = 'arn:aws:lambda:[a-z]{2}((-gov)|(-iso(b?)))?-[a-z]+-\d{1}:\d{12}:layer:[a-zA-Z0-9-_]+'

def evaluate_compliance(event, configuration_item, valid_rule_parameters):
    pkg = configuration_item['configuration']['packageType']
    if not pkg or pkg != "Zip":
        return build_evaluation_from_config_item(configuration_item, 'NOT_APPLICABLE',
                                                 annotation='Layers can only be used with functions using Zip package type')

    layers = configuration_item['configuration']['layers']
    if not layers:
        return build_evaluation_from_config_item(configuration_item, 'NON_COMPLIANT',
                                                 annotation='No layer is configured for this Lambda function')

    regex = re.compile(LAYER_REGEXP + ':(.*)')
    annotation = 'Layer ' + valid_rule_parameters['LayerArn'] + ' not used for this Lambda function'
    for layer in layers:
        arn = layer['arn']
        version = regex.search(arn).group(5)
        arn = re.sub('\:' + version + '$', '', arn)
        if arn == valid_rule_parameters['LayerArn']:
            if version >= valid_rule_parameters['MinLayerVersion']:
                return build_evaluation_from_config_item(configuration_item, 'COMPLIANT')
            else:
                annotation = 'Wrong layer version (was ' + version + ', expected ' + valid_rule_parameters['MinLayerVersion'] + '+)'

    return build_evaluation_from_config_item(configuration_item, 'NON_COMPLIANT',
                                             annotation=annotation)

You can find the complete source of this AWS Config rule and its tests on GitHub.

Once the rule is ready, use the command rdk deploy to deploy it on your account. To deploy it across multiple accounts, see the documentation. You can then define remediation actions. For example, automatically add the missing layer to the function or send a notification to the developers using Amazon Simple Notification Service (SNS).

Conclusion

This post describes guardrails that you can set up in your accounts or across the organization to keep control over deployed resources. These guardrails can be more or less restrictive according to your requirements.

Use proactive guardrails with service control policies to define coarse-grained permissions and block everything that must not be used. Define reactive guardrails for everything else to aid agility and productivity but still be informed of the activity and potentially remediate.

This concludes this series on serverless governance:

Standardization is an important aspect of the governance to speed up teams and ensure that deployed applications are operable and compliant with your internal rules. Use templates, layers, and other mechanisms to create shareable archetypes to apply these standards and rules at the enterprise level.
It’s important to keep visibility and control on your resources, to understand how your environment evolves and to be able to operate and act if needed. Tags and guardrails are helpful to achieve this and they should evolve as your maturity with the cloud evolves.

Find more SCP examples and all the AWS managed AWS Config rules in the documentation.

For more serverless learning resources, visit Serverless Land.

Building dynamic Amazon SNS subscriptions for auto scaling container workloads

2021-10-18 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-dynamic-amazon-sns-subscriptions-for-auto-scaling-container-workloads/

This post is written by Mithun Mallick, Senior Specialist Solutions Architect, App Integration.

Amazon Simple Notification Service (SNS) is a serverless publish subscribe messaging service. It supports a push-based subscriptions model where subscribers must register an endpoint to receive messages. Amazon Simple Queue Service (SQS) is one such endpoint, which is used by applications to receive messages published on an SNS topic.

With containerized applications, the container instances poll the queue and receive the messages. However, containerized applications can scale out for a variety of reasons. The creation of an SQS queue for each new container instance creates maintenance overhead for customers. You must also clean up the SNS-SQS subscription once the instance scales in.

This blog walks through a dynamic subscription solution, which automates the creation, subscription, and deletion of SQS queues for an Auto Scaling group of containers running in Amazon Elastic Container Service (ECS).

Overview

The solution is based on the use of events to achieve the dynamic subscription pattern. ECS uses the concept of tasks to create an instance of a container. You can find more details on ECS tasks in the ECS documentation.

This solution uses the events generated by ECS to manage the complete lifecycle of an SNS-SQS subscription. It uses the task ID as the name of the queue that is used by the ECS instance for pulling messages. More details on the ECS task ID can be found in the task documentation.

This also uses Amazon EventBridge to apply rules on ECS events and trigger an AWS Lambda function. The first rule detects the running state of an ECS task and triggers a Lambda function, which creates the SQS queue with the task ID as queue name. It also grants permission to the queue and creates the SNS subscription on the topic.

As the container instance starts up, it can send a request to its metadata URL and retrieve the task ID. The task ID is used by the container instance to poll for messages. If the container instance terminates, ECS generates a task stopped event. This event matches a rule in Amazon EventBridge and triggers a Lambda function. The Lambda function retrieves the task ID, deletes the queue, and deletes the subscription from the SNS topic. The solution decouples the container instance from any overhead in maintaining queues, applying permissions, or managing subscriptions. The security permissions for all SNS-SQS management are handled by the Lambda functions.

This diagram shows the solution architecture:

Events from ECS are sent to the default event bus. There are various events that are generated as part of the lifecycle of an ECS task. You can find more on the various ECS task states in ECS task documentation. This solution uses ECS as the container orchestration service but you can also use Amazon Elastic Kubernetes Service.(EKS). For EKS, you must apply the rules for EKS task state events.

Walkthrough of the implementation

The code snippets are shortened for brevity. The full source code of the solution is in the GitHub repository. The solution uses AWS Serverless Application Model (AWS SAM) for deployment.

SNS topic

The SNS topic is used to send notifications to the ECS tasks. The following snippet from the AWS SAM template shows the definition of the SNS topic:

  SNSDynamicSubsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Ref DynamicSubTopicName

Container instance

The container instance subscribes to the SNS topic using an SQS queue. The container image is a Java class that reads messages from an SQS queue and prints them in the logs. The following code shows some of the message processor implementation:

AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
AmazonSQSResponder responder = AmazonSQSResponderClientBuilder.standard()
        .withAmazonSQS(sqs)
        .build();

SQSMessageConsumer consumer = SQSMessageConsumerBuilder.standard()
        .withAmazonSQS(responder.getAmazonSQS())
        .withQueueUrl(queue_url)
        .withConsumer(message -> {
            System.out.println("The message is " + message.getBody());
            sqs.deleteMessage(queue_url,message.getReceiptHandle());

        }).build();
consumer.start();

The queue_url highlighted is the task ID of the ECS task. It is retrieved in the constructor of the class:

String metaDataURL = map.get("ECS_CONTAINER_METADATA_URI_V4");

HttpGet request = new HttpGet(metaDataURL);
CloseableHttpResponse response = httpClient.execute(request);

HttpEntity entity = response.getEntity();
if (entity != null) {
    String result = EntityUtils.toString(entity);
    String taskARN = JsonPath.read(result, "$['Labels']['com.amazonaws.ecs.task-arn']").toString();
    String[] arnTokens = taskARN.split("/");
    taskId = arnTokens[arnTokens.length-1];
    System.out.println("The task arn : "+taskId);
}

queue_url = sqs.getQueueUrl(taskId).getQueueUrl();

The queue URL is constructed from the task ID of the container. Each queue is dedicated to each of the tasks or the instances of the container running in ECS.

EventBridge rules

The following event pattern on the default event bus captures events that match the start of the container instance. The rule triggers a Lambda function:

      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - "ECS Task State Change"
        detail:
          desiredStatus:
            - "RUNNING"
          lastStatus:  
            - "RUNNING"

The start rule routes events to a Lambda function that creates a queue with the name as the task ID. It creates the subscription to the SNS topic and grants permission on the queue to receive messages from the topic.

This event pattern matches STOPPED events of the container task. It also triggers a Lambda function to delete the queue and the associated subscription:

      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - "ECS Task State Change"
        detail:
          desiredStatus:
            - "STOPPED"
          lastStatus:  
            - "STOPPED"

Lambda functions

There are two Lambda functions that perform the queue creation, subscription, authorization, and deletion.

The SNS-SQS-Subscription-Service

The following code creates the queue based on the task id, applies policies, and subscribes it to the topic. It also stores the subscription ARN in a Amazon DynamoDB table:

# get the task id from the event
taskArn = event['detail']['taskArn']
taskArnTokens = taskArn.split('/')
taskId = taskArnTokens[len(taskArnTokens)-1]

create_queue_resp = sqs_client.create_queue(QueueName=queue_name)

response = sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

ddbresponse = dynamodb.update_item(
    TableName=SQS_CONTAINER_MAPPING_TABLE,
    Key={
        'id': {
            'S' : taskId.strip()
        }
    },
    AttributeUpdates={
        'SubscriptionArn':{
            'Value': {
                'S': subscription_arn
            }
        }
    },
    ReturnValues="UPDATED_NEW"
)

The cleanup service

The cleanup function is triggered when the container instance is stopped. It fetches the subscription ARN from the DynamoDB table based on the taskId. It deletes the subscription from the topic and deletes the queue. You can modify this code to include any other cleanup actions or trigger a workflow. The main part of the function code is:

taskId = taskArnTokens[len(taskArnTokens)-1]

ddbresponse = dynamodb.get_item(TableName=SQS_CONTAINER_MAPPING_TABLE,Key={'id': { 'S' : taskId}})
snsresp = sns.unsubscribe(SubscriptionArn=subscription_arn)

queuedelresp = sqs_client.delete_queue(QueueUrl=queue_url)

Conclusion

This blog shows an event driven approach to handling dynamic SNS subscription requirements. It relies on the ECS service events to trigger appropriate Lambda functions. These create the subscription queue, subscribe it to a topic, and delete it once the container instance is terminated.

The approach also allows the container application logic to focus only on consuming and processing the messages from the queue. It does not need any additional permissions to subscribe or unsubscribe from the topic or apply any additional permissions on the queue. Although the solution has been presented using ECS as the container orchestration service, it can be applied for EKS by using its service events.

For more serverless learning resources, visit Serverless Land.