Tag Archives: serverless

Orchestrating AWS Glue crawlers using AWS Step Functions

2022-06-27 Benjamin Smith

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/orchestrating-aws-glue-crawlers-using-aws-step-functions/

This blog post is written by Justin Callison, General Manager, AWS Workflow.

Organizations generate terabytes of data every day in a variety of semistructured formats. AWS Glue and Amazon Athena can give you a simpler and more cost-effective way to analyze this data with no infrastructure to manage. AWS Glue crawlers identify the schema of your data and manage the metadata required to analyze the data in place, without the need to transform this data and load into a data warehouse.

The timing of when your crawlers run and complete is important. You must ensure the crawler runs after your data has updated and before you query it with Athena or analyze with an AWS Glue job. If not, your analysis may experience errors or return incomplete results.

In this blog, you learn how to use AWS Step Functions, a low-code visual workflow service that integrates with over 220 AWS services. The service orchestrates your crawlers to control when they start, confirm completion, and combine them into end-to-end, serverless data processing workflows.

Using Step Functions to orchestrate multiple AWS Glue crawlers, provides a number of benefits when compared to implementing a solution directly with code. Firstly, the workflow provides an instant visual understanding of the application, and any errors that might occur during execution. Step Functions’ ability to run nested workflows inside a Map state helps to decouple and reuse application components with native array iteration. Finally, the Step Functions wait state lets the workflow periodically poll the status of the crawl job, without incurring additional cost for idol wait time.

Deploying the example

With this example, you create three datasets in Amazon S3, then use Step Functions to orchestrate AWS Glue crawlers to analyze the datasets and make them available to query using Athena.

You deploy the example with AWS CloudFormation using the following steps:

Download the template.yaml file from here.
Log in to the AWS Management Console and go to AWS CloudFormation.
Navigate to Stacks -> Create stack and select With new resources (standard).
Select Template is ready and Upload a template file, then Choose File and select the template.yaml file that you downloaded in Step 1 and choose Next.
Enter a stack name, such as glue-stepfunctions-demo, and choose Next.
Choose Next, check the acknowledgement boxes in the Capabilities and transforms section, then choose Create stack.
After deployment, the status updates to CREATE_COMPLETE.

Create your datasets

Navigate to Step Functions in the AWS Management Console and select the create-dataset state machine from the list. This state machine uses Express Workflows and the Parallel state to build three datasets concurrently in S3. The first two datasets include information by user and location respectively and include files per day over the 5-year period from 2016 to 2020. The third dataset is a simpler, all-time summary of data by location.

To create the datasets, you choose Start execution from the toolbar for the create-dataset state machine, then choose Start execution again in the dialog box. This runs the state machine and creates the datasets in S3.

Navigate to the S3 console and view the glue-demo-databucket created for this example. In this bucket, in a folder named data, there are three subfolders, each containing a dataset.

The all-time-location-summaries folder contains a set of JSON files, one for each location.

The daily-user-summaries and daily-location-summaries contain a folder structure with nested folders for each year, month, and date. In addition to making this data easier to navigate via the console, this folder structure provides hints to AWS Glue that it can use to partition this dataset and make it more efficient to query.

Crawling

You now use AWS Glue crawlers to analyze these datasets and make them available to query. Navigate to the AWS Glue console, select Crawlers to see the list of Crawlers that you created when you deployed this example. Select the daily-user-summaries crawler to view details and note that they have tags assigned to indicate metadata such as the datatype of the data and whether the dataset is-partitioned.

Now, return to the Step Functions console and view the run-crawlers-with-tags state machine. This state machine uses AWS SDK service integrations to get a list of all crawlers matching the tag criteria you enter. It then uses the map state and the optimized service integration for Step Functions to execute the run-crawler state machine for each of the matching crawlers concurrently. The run-crawler state machine starts each crawler and monitors status until the crawler completes. Once each of the individual crawlers have completed, the run-crawlers-with-tags state machine also completes.

To initiate the crawlers:

Choose Start execution from the top of the page when viewing the run-crawlers-with-tags state machine
Provide the following as Input
{"tags": {"datatype": "json"}}
Choose Start execution.

After 2-3 minutes, the execution finishes with a Succeeded status once all three crawlers have completed. During this time, you can navigate to the run-crawler state machine to view the individual, nested executions per crawler or to the AWS Glue console to see the status of the crawlers.

Querying the data using Amazon Athena

Now, navigate to the Athena console where you can see the database and tables created by your crawlers. Note that AWS Glue recognized the partitioning scheme and included fields for year, month, and date in addition to user and usage fields for the data contained in the JSON files.

If you have not used Athena in this account before, you see a message instructing you to set a query result location. Choose View settings -> Manage -> Browse S3 and select the athena-results bucket that you created when you deployed the example. Choose Save then return to the Editor to continue.

You can now run queries such as the following, to calculate the total usage for all users over 5 years.

SELECT SUM(usage) all_time_usage FROM “daily_user_summaries”

You can also add filters, as shown in the following example, which limit results to those from 2016.

SELECT SUM(usage) all_time_usage FROM “daily_user_summaries” WHERE year = ‘2016’

Note this second query scanned only 17% as much data (133 KB vs 797 KB) and completed faster. This is because Athena used the partitioning information to avoid querying the full dataset. While the differences in this example are small, for real-world datasets with terabytes of data, your cost and latency savings from partitioning data can be substantial.

The disadvantage of a partitioning scheme is that new folders are not included in query results until you add new partitions. Re-running your crawler identifies and adds the new partitions and using Step Functions to orchestrate these crawlers makes that task simpler.

Extending the example

You can use these example state machines as they are in your AWS accounts to manage your existing crawlers. You can use Amazon S3 event notifications with Amazon EventBridge to trigger crawlers based on data changes. With the Optimized service integration for Amazon Athena, you can extend your workflows to execute queries against these crawled datasets. And you can use these examples to integrate crawler execution into your end-to-end data processing workflows, creating reliable, auditable workflows from ingestion through to analysis.

Conclusion

In this blog post, you learn how to use Step Functions to orchestrate AWS Glue crawlers. You deploy an example that generates three datasets, then uses Step Functions to start and coordinate crawler runs that analyze this data and make it available to query using Athena.

To learn more about Step Functions, visit Serverless Land.

Sending Amazon EventBridge events to private endpoints in a VPC

2022-06-23 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/sending-amazon-eventbridge-events-to-private-endpoints-in-a-vpc/

This post is written by Emily Shea, Senior GTM Specialist, Event-Driven Architectures.

Building with events can help you accelerate feature velocity and build scalable, fault tolerant applications. You can achieve loose coupling in your application using asynchronous communication via events. Loose coupling allows each development team to build and deploy independently and each component to scale and fail without impacting the others. This approach is referred to as event-driven architecture.

Amazon EventBridge helps you build event-driven architectures. You can publish events to the EventBridge event bus and EventBridge routes those events to targets. You can write rules to filter events and only send them to the interested targets. For example, an order fulfillment service may only be interested in events of type ‘new order created.’

EventBridge is serverless, so there is no infrastructure to manage and the service scales automatically. EventBridge has native integrations with over 100 AWS services and over 40 SaaS providers.

Amazon EventBridge has a native integration with AWS Lambda, and many AWS customers use events to trigger Lambda functions to process events. You may also want to send events to workloads running on Amazon EC2 or containerized workloads deployed with Amazon ECS or Amazon EKS. These services are deployed into an Amazon Virtual Private Cloud, or VPC.

For some use cases, you may be able to expose public endpoints for your VPC. You can use EventBridge API destinations to send events to any public HTTP endpoint. API destinations include features like OAuth support and rate limiting to control the number of events you are sending per second.

However, some customers are not able to expose public endpoints for security or compliance purposes. This tutorial shows you how to send EventBridge events to a private endpoint in a VPC using a Lambda function to relay events. This solution deploys the Lambda function connected to the VPC and uses IAM permissions to enable EventBridge to invoke the Lambda function. Learn more about Lambda VPC connectivity here.

In this blog post, you learn how to send EventBridge events to a private endpoint in a VPC. You set up an example application with an EventBridge event bus, a Lambda function to relay events, a Flask application running in an EKS cluster to receive events behind an Application Load Balancer (ALB), and a secret stored in Secrets Manager for authenticating requests. This application uses EKS and Secrets Manager to demonstrate sending and authenticating requests to a containerized workload, but the same pattern applies for other container orchestration services like ECS and your preferred secret management solution.

Continue reading for the full example application and walkthrough. If you have an existing application in a VPC, you can deploy just the event relay portion and input your VPC details as parameters.

Solution overview

An event is sent to the EventBridge bus.
If the event matches a certain pattern (ex, if ‘detail-type’ is ‘inbound-event-sent’), an EventBridge rule uses EventBridge’s input transformer to format the event as an HTTP call.
The EventBridge rule pushes the event to a Lambda function connected to the VPC and a CloudWatch Logs group for debugging.
The Lambda function calls Secrets Manager and retrieves a secret key. It appends the secret key to the event headers and then makes an HTTP call to the ALB URL endpoint.
ALB routes this HTTP call to a node group in the EKS cluster. The Flask application running on the EKS cluster calls Secret Manager, confirms that the secret key is valid, and then processes the event.
The Lambda function receives a response from ALB.
1. If the Flask application fails to process the event for any reason, the Lambda function raises an error. The function’s failure destination is configured to send the event and the error message to an SQS dead letter queue.
2. If the Flask application successfully processes the event and the ‘return-response-event’ flag in the event was set to ‘true’, then the Lambda function publishes a new ‘outbound-event-sent’ event to the same EventBridge bus.
Another EventBridge rule matches detail-type ‘outbound-event-sent’ events and routes these to the CloudWatch Logs group for debugging.

Prerequisites

To run the application, you must install the AWS CLI, Docker CLI, eksctl, kubectl, and AWS SAM CLI.

To clone the repository, run:

git clone https://github.com/aws-samples/eventbridge-events-to-vpc.git

Creating the EKS cluster

In the example-vpc-application directory, use eksctl to create the EKS cluster using the config file.
```
cd example-vpc-application
eksctl create cluster --config-file eksctl_config.yaml
```
This takes a few minutes. This step creates an EKS cluster with one node group in us-east-1. The EKS cluster has a service account with IAM permissions to access the Secrets Manager secret you create later.
Use your AWS account’s default Amazon Elastic Container Registry (ECR) private registry to store the container image. First, follow these instructions to authenticate Docker to ECR. Next, run this command to create a new ECR repository. The create-repository command returns a repository URI (for example, 123456789.dkr.ecr.us-east-1.amazonaws.com/events-flask-app).
```
aws ecr create-repository --repository-name events-flask-app 
```
Use the repository URI in the following commands to build, tag, and push the container image to ECR.
```
docker build --tag events-flask-app .
docker tag events-flask-app:latest {repository-uri}:1
docker push {repository-uri}:1
```
In the Kuberenetes deployment manifest file (/example-vpc-application/manifests/deployment.yaml), fill in your repository URI and container image version (for example, 123456789.dkr.ecr.us-east-1.amazonaws.com/events-flask-app:1)

Deploy the Flask application and Application Load Balancer

Within the example-vpc-application directory, use kubectl to apply the Kubernetes manifest files. This step deploys the ALB, which takes time to create and you may receive an error message during the deployment (‘no endpoints available for service “aws-load-balancer-webhook-service”‘). Rerun the same command until the ALB is deployed and you no longer receive the error message.
```
kubectl apply --kustomize manifests/
```
Once the deployment is completed, verify that the Flask application is running by retrieving the Kubernetes pod logs. The first command retrieves a pod name to fill in for the second command.
```
kubectl get pod --namespace vpc-example-app
kubectl logs --namespace vpc-example-app {pod-name} --follow
```
You should see the Flask application outputting ‘Hello from my container!’ in response to GET request health checks.

Get VPC and ALB details

Next, you retrieve the security group ID, private subnet IDs, and ALB DNS Name to deploy the Lambda function connected to the same VPC and private subnet and send events to the ALB.

In the AWS Management Console, go to the VPC dashboard and find Subnets. Copy the subnet IDs for the two private subnets (for example, subnet name ‘eksctl-events-cluster/SubnetPrivateUSEAST1A’).
In the VPC dashboard, under Security, find the Security Groups tab. Copy the security group ID for ‘eksctl-events-cluster/ClusterSharedNodeSecurityGroup’.
Go to the EC2 dashboard. Under Load Balancing, find the Load Balancer tab. There is a load balancer associated with your VPC ID. Copy the DNS name for the load balancer, adding ‘http://’ as a prefix (for example, http://internal-k8s-vpcexamp-vpcexamp-c005e07d1a-1074647274.us-east-1.elb.amazonaws.com).

Create the Secrets Manager VPC endpoint

You need a VPC endpoint for your application to call Secrets Manager.

In the VPC dashboard, find the Endpoints tab and choose Create Endpoint. Select Secrets Manager as the service, and then select the VPC, private subnets, and security group that you copied in the previous step. Choose Create.

Deploy the event relay application

Deploy the event relay application using the AWS Serverless Application Model (AWS SAM) CLI:

Open a new terminal window and navigate to the event-relay directory. Run the following AWS SAM CLI commands to build the application and step through a guided deployment.
```
cd event-relay
sam build
sam deploy --guided
```
The guided deployment process prompts for input parameters. Enter ‘event-relay-app’ as the stack name and accept the default Region. For other parameters, submit the ALB and VPC details you copied: Url (ALB DNS name), security group ID, and private subnet IDs. For the Secret parameter, pass any value.The AWS SAM template saves this value as a Secrets Manager secret to authenticate calls to the container application. This is an example of how to pass secrets in the event relay HTTP call. Replace this with your own authentication method in production environments.
Accept the defaults for the remaining options. For ‘Deploy this changeset?’, select ‘y’. Here is an example of the deployment parameters.

Test the event relay application

Both the Flask application in a VPC and the event relay application are now deployed. To test the event relay application, keep the Kubernetes pod logs from a previous step open to monitor requests coming into the Flask application.

You can open a new terminal window and run this AWS CLI command to put an event on the bus, or go to the EventBridge console, find your event bus, and use the Send events UI.
```
aws events put-events \
--entries '[{"EventBusName": "event-relay-bus" ,"Source": "eventProducerApp", "DetailType": "inbound-event-sent", "Detail": "{ \"event-id\": \"123\", \"return-response-event\": true }"}]'
```
When the event is relayed to the Flask application, a POST request in the Kubernetes pod logs confirms that the application processed the event.
Navigate to the CloudWatch Logs groups in the AWS Management Console. In the ‘/aws/events/event-bus-relay-logs’ group, there are logs for the EventBridge events. In ‘/aws/lambda/EventRelayFunction’ stream, the Lambda function relays the inbound event and puts a new outbound event on the EventBridge bus.
You can test the SQS dead letter queue by creating an error. For example, you can manually change the Lambda function code in the console to pass an incorrect value for the secret. After sending a test event, navigate to the SQS queue in the console and poll for messages. The message shows the error message from the Flask application and the full event that failed to process.

Cleaning up

In the VPC dashboard in the AWS Management Console, find the Endpoints tab and delete the Secrets Manager VPC endpoint. Next, run the following commands to delete the rest of the example application. Be sure to run the commands in this order as some of the resources have dependencies on one another.

sam delete --stack-name event-relay-app
kubectl --namespace vpc-example-app delete ingress vpc-example-app-ingress

From the example-vpc-application directory, run this command.

eksctl delete cluster --config-file eksctl_config.yaml

Conclusion

Event-driven architectures and EventBridge can help you accelerate feature velocity and build scalable, fault tolerant applications. This post demonstrates how to send EventBridge events to a private endpoint in a VPC using a Lambda function to relay events and emit response events.

To learn more, read Getting started with event-driven architectures and visit EventBridge tutorials on Serverless Land.

Managing multi-tenant APIs using Amazon API Gateway

2022-06-22 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/managing-multi-tenant-apis-using-amazon-api-gateway/

This post is written by Satish Mane, Solutions Architect.

Many ISVs provide platforms as a service in a multi-tenant environment. You can achieve multi-tenancy by partitioning the platform based on customer identifiers such as customer ID or account ID. The architecture for multi-tenant environments is often composed of authentication, authorization, a service layer, queues, and databases.

The primary focus of these architectures is to simplify the addition of more features. The multi-tenant design pattern has opened up new challenges and opportunities for software vendors thanks to microservice architectures gaining popularity. The challenge in a multi-tenant environment is that excessive load by a single customer, because of many requests to an API, can affect the entire platform.

This blog post looks at how to protect and monetize multi-tenant APIs using Amazon API Gateway. It describes a multi-tenant architecture design pattern based on a custom tenant ID to onboard customers. A tenant in a multi-tenant platform represents the customer having a group of users with common access, but individuals having specific permissions to the platform.

Overview

This example protects multi-tenant platform REST APIs using Amazon Cognito, Amazon API Gateway, and AWS Lambda.

In the following sections, you learn how to use the API Gateway’s usage plans to protect and productize multi-tenant platforms. Usage plans enable throttling of excessive API requests and apply an API usage quota policy. The user authenticates using Amazon Cognito to get a JSON Web Token (JWT) that is passed to API Gateway for authorization.

The multi-tenant platform that exposes REST APIs has clients such as a mobile app, a web application, and API clients that consume the REST APIs. This post focuses on protecting REST APIs with Amazon Cognito as the security layer for authenticating users and issuing tokens using OpenID Connect. The token contains the customer identity information, such as the tenant ID to which the users belong. API Gateway throttles the requests from a tenant only after the limit defined in the usage plans exceeds.

This architecture shows the flow of user requests:

The client application sends a request to Amazon Cognito using the /oauth/authorize or /login API. Amazon Cognito authenticates the user credentials.
Amazon Cognito redirects using an authorization code grant and prompts the user to enter credentials. After authentication, it returns the authorization code.
It then passes the authorization code to obtain a JWT from Amazon Cognito.
Upon successful authentication, Amazon Cognito returns a JWT, such as acccess_token, id_token, refresh_token. The access/id token stores information about the granted permissions including tenant ID to which this user belongs to.
The client application invokes the REST API that is deployed in API Gateway. The API request passes the JWT as a bearer token in the API request Authorization header.
Since the tenant ID is hidden in the encrypted JWT token, the Lambda authorizer function validates and decodes the token, and extracts the tenant ID from the JWT.
The Lambda token authorizer function returns an IAM policy along with tenant ID from the decoded token to which a user belongs.
The application’s REST API is configured with usage plans against a custom API key, which is the tenant ID in API Gateway. API Gateway evaluates the IAM policy and looks up the usage policy using the API key. It throttles API requests if the number of requests exceed the throttle or quota limits in the usage policy.
If the number of API requests is within the limit, then API Gateway sends requests to the downstream application REST API. This could be deployed using containers, Lambda, or an Amazon EC2 instance.

Customer (tenant) onboarding

There are multiple ways to set up multi-tenant applications. You can either create tenant-specific pools or add tenant ID as a custom attribute in each user profile. This blog uses the latter approach. The tenant ID is added to the JWT after successful authentication.

Since, tenant ID is an API key in API Gateway, the length of tenant ID must be a minimum of 20 characters. You can define the structure of tenant ID such as <customer id>-<random string>. As part of tenant onboarding, you can automate configuring the API key and usage plans in API Gateway using CDK APIs. Here, you configure the API key and usage plan as part of the solution deployment itself.

Authentication and authorization

You need a user pool and application client enabled with the authorization code mechanism for authenticating users. API Gateway can verify JWT OAuth tokens against single Amazon Cognito user pools. To get tenant information (tenant ID), use a custom Lambda authorizer function in API Gateway to verify the token, extract the tenant id, and return to API Gateway.

API Gateway usage plans

API Gateway supports the usage plan feature for REST APIs only. This solution uses an integration point as a MOCK integration type. You can use the usage plan to set the throttle and quota limit that are associated with API keys. API keys can be generated or you can use a custom key. To enforce usage plans for each tenant separately, use tenant ID as a prefix to a uniquely generated value to prepare the custom API key.

Configure API Gateway to integrate API key and Usage plan

You need to enable REST API to use the API key and set the source to AUTHORIZER. There are two ways to accept API keys for every incoming request. You can supply it as part of the incoming request HEADER or via a custom authorizer Lambda function. This example uses a custom authorizer Lambda function to retrieve the API key that is extracted from the JWT received through an incoming API request. Customers only pass encrypted JWTs in the request authorization header. These steps are automated using the AWS CDK.

Pre-requisites

An AWS Account
An AWS Identity and Access Management (IAM) administrator access
The AWS Command Line Interface (AWS CLI)
Install AWS CDK

Deploying the example

The example source code is available on GitHub. To deploy and configure solution:

Clone the repository to your local machine.

git clone https://github.com/aws-samples/api-gateway-usage-policy-based-api-protection

Prepare the deployment package.

cdk synth
npm run build
npm install --prefix aws-usage-policy-stack/lambda/src

Configure the user pool in Amazon Cognito.
```
npx cdk deploy CognitoStack
```
Open the AWS Management Console and navigate to Amazon Cognito. Choose Manage user pool and select your user pool. Note down the pool ID under general settings.

Create a user with a tenant ID.

aws cognito-idp admin-create-user --user-pool-id <REPLACE WITH COGNITO POOL ID> --username <REPLACE WITH USERNAME> \
--user-attributes Name="given_name",Value="<REPLACE WITH FIRST NAME>" Name="family_name",Value="<REPLACE WITH LAST NAME>" " Name="custom:tenant_id",Value="<REPLACE WITH CUSTOMER ID>" \
--temporary-password change1t

To simplify testing the OAuth flow, use https://openidconnect.net/. In the configuration, set the JWKS well known URI.
https://cognito-idp.<REPLACE WITH AWS REGION>.amazonaws.com/<REPLACE WITH COGNITO POOL ID>/.well-known/openid-configuration
Test the OAuth flow with https://openidconnect.net/ to fetch the JWT ID token. Save the token in a text editor for later use.
Open aws-usage-policy-stack/app.ts in an IDE and replace “NOT_DEFINED” with the 20-character long tenant ID from the previous section.
Configure the user pool in API Gateway and create the Lambda function:
```
npx cdk deploy ApigatewayStack
```
After successfully deploying the API Gateway stack, open the AWS Management Console and select API Gateway. Locate ProductRestApi in the name column and note its ID.

Testing the example

Test the example using the following curl command. It throttles the requests to the deployed API based on defined limits and quotas. The following thresholds are preset: API quota limit of 5 requests/day, throttle limit of 10 requests/second, and a burst limit of 2 requests/second.

To simulate the scenario and start throttling requests.

Open a terminal window.
Install the curl utility if necessary.

Run the following command six times after replacing placeholders with the correct values.

curl -H "Authorization: Bearer <REPLACE WITH ID_TOKEN received in step 7 of Deploy Amazon Cognito Resources>" -X GET https://<REPLACE WITH REST API ID noted in step 10 of Deploy Amazon API Gateway resources>.execute-api.eu-west-1.amazonaws.com/dev/products.

You receive the message {“message”: “Limit Exceeded”} after you run the command for the sixth time. To repeat the tests, navigate to the API Gateway console. Change the quota limits in the usage plan and run the preceding command again. You can monitor HTTP/2 429 exceptions (Limit Exceeded) in API Gateway dashboard.

Any changes to usage plan limits do not need redeployment of the API in API Gateway. You can change limits dynamically. Changes take a few seconds to become effective.

Cleaning up

To avoid incurring future charges, clean up the resources created. To delete the CDK stack, use the following command. Since there are multiple stacks, you must explicitly specify the name of the stacks.

cdk destroy CognitoStack ApigatewayStack

Conclusion

This post covers the API Gateway usage plan feature to protect multi-tenant APIs from excessive request loads and also as a product offering that enforces customer specific usage quotas.

To learn more about Amazon API Gateway, refer to Amazon API Gateway documentation. For more serverless learning resources, visit Serverless Land.

Combining Amazon AppFlow with AWS Step Functions to maximize application integration benefits

2022-06-15 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/combining-amazon-appflow-with-aws-step-functions-to-maximize-application-integration-benefits/

This post is written by Ahmad Aboushady, Senior Technical Account Manager and Kamen Sharlandjiev, Senior Specialist Solution Architect, Integration.

In this blog post, you learn how to orchestrate AWS service integrations to reduce the manual steps in your workflow. The example uses AWS Step Functions SDK integration to integrate Amazon AppFlow and AWS Glue catalog without writing custom code. It automatically uses Amazon EventBridge to trigger Step Functions every time a new Amazon AppFlow flow finishes running.

Amazon AppFlow enables customers to transfer data securely between software as a service (SaaS) applications, like Salesforce, SAP, Zendesk, Slack, ServiceNow, and multiple AWS services.

An everyday use case of Amazon AppFlow is creating a customer-360 by integrating marketing, customer support, and sales data. For example, analyze the revenue impact of different marketing channels by synchronizing the revenue data from Salesforce with marketing data from Adobe Marketo.

This involves setting up flows to ingest data from different data sources and SaaS applications to AWS Data Lake based on Amazon S3. It uses AWS Glue to crawl and catalog this data. Customers use this catalog to access data quickly in several ways.

For example, they query the data using Amazon Athena or Amazon QuickSight for visualizations, business intelligence and anomaly detection. You can create those data flows quickly with no code required. However, to complete the next set of requirements, customers often go through multiple manual steps of provisioning and configuring different AWS resources. One such step requires creating AWS Glue crawler and running it with every Amazon AppFlow flow execution.

Step Functions can help us automate this process. This is a low-code workflow orchestration service that offers a visual workflow designer. You can quickly build workflows using the built-in drag-and-drop interface available in the AWS Management Console.

You can follow this blog and build your end-to-end state machine using the Step Functions Workflow Studio, or use the AWS Serverless Application Model (AWS SAM) template to deploy the example. The Step Functions state machine uses SDK integration with other AWS Services, so you don’t need to write any custom integration code.

Overview

The following diagram depicts the workflow with the different states in the state machine. You can group these into three phases: preparation, processing, and configuration.

The preparation phase captures all the configuration parameters and collects information about the metadata of the data, ingested by Amazon AppFlow.
The processing phase generates the AWS Glue table definition and sets the required parameters based on the destination file type. It iterates through the different columns and adds them as part of the table definition.
The last phase provides the Glue Catalog resources by creating or updating an existing AWS Glue table. With each Amazon AppFlow flow execution, the state machine determines if a new Glue table partition is required.

Preparation phase

The first state, “SetDatabaseAndContext”, is a pass state where you set the configuration parameters used in later states. Set the AWS Glue database and table name and capture the details of the data flow. You can do this by using the parameters filter to build a new JSON payload using parts of the state input similar to:

"Parameters": {
        "Config": {
          "Database": "<Glue-Database-Name>",
          "TableName.$": "$.detail['flow-name']",
          "detail.$": "$.detail"
        }
}

The following state, “DatabaseExist?” is an AWS SDK integration using a “GetDatabase” call to AWS Glue to ensure that the database exists. Here, the state uses error handling to intercept exception messages from the SDK call. This feature splits the workflow and adds an extra step if needed.

In this case, the SDK call returns an exception if the database does not exist, and the workflow invokes the “CreateDatabase” state. It moves to the “CleanUpError” state to clean up any errors and set the configuration parameters accordingly. Afterwards, with the database in place, the workflow continues to the next state: “DescribeFlow”. This returns the metadata of the Amazon AppFlow flow. Part of this metadata is the list of the object fields, which you must create in the Glue table and partitions.

Here is an error handling state that catches exceptions and routes the flow to execute an extra step:

"Catch": [
  {
    "ErrorEquals": [
      "States.ALL"
    ],
    "Comment": "Create Glue Database",
    "Next": "CreateDatabase",
    "ResultPath": "$.error"
  }
]

In the next state, “DescribeFlow”, you use the AWS SDK integration to get the Amazon AppFlow flow configuration. This uses the Amazon AppFlow “DescribeFlow” API call. It moves to “S3AsDestination?”, which is a choice state to check if S3 is a destination for the flow. Amazon AppFlow allows you to bring data into different purpose-built data stores, such as S3, Amazon Redshift, or external SaaS or data warehouse applications. This automation can only continue if the configured destination is S3.

The choice definition is:

"Choices": [
  {
    "Variable": "$.FlowConfig.DestinationFlowConfigList[0].ConnectorType",
    "StringEquals": "S3",
    "Next": "GenerateTableDefinition"
  }
],
"Default": "S3NotDestination"

Processing phase

The following state generates the base AWS Glue table definition based on the destination file type. Then it uses a map state to iterate and transform the Amazon AppFlow schema output into what the AWS Glue Data Catalog expects as input.

Next, add the “GenerateTableDefinition” state and use the parameters filter to build a new JSON payload output. Finally, use the information from the “DescribeFlow” state similar to:

"Parameters": {
  "Config.$": "$.Config",
  "FlowConfig.$": "$.FlowConfig",
  "TableInput": {
    "Description": "Created by AmazonAppFlow",
    "Name.$": "$.Config.TableName",
    "PartitionKeys": [
      {
        "Name": "partition_0",
        "Type": "string"
      }
    ],
    "Retention": 0,
    "Parameters": {
      "compressionType": "none",
      "classification.$": "$.FlowConfig.DestinationFlowConfigList[0].DestinationConnectorProperties['S3'].S3OutputFormatConfig.FileType",
      "typeOfData": "file"
    },
    "StorageDescriptor": {
      "BucketColumns": [],
      "Columns.$": "$.FlowConfig.Tasks[?(@.TaskType == 'Map')]",
      "Compressed": false,
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "Location.$": "States.Format('{}/{}/', $.Config.detail['destination-object'], $.FlowConfig.FlowName)",
      "NumberOfBuckets": -1,
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "SortColumns": [],
      "StoredAsSubDirectories": false
    },
    "TableType": "EXTERNAL_TABLE"
  }
}

The following state, “DestinationFileFormatEvaluator”, is a choice state to change the JSON payload according to the destination file type. Amazon AppFlow supports different file type conversions when S3 is the destination of your data. These formats are CSV, Parquet, and JSON Lines. AWS Glue uses various serialization libraries according to the file type.

You iterate within a map state to transform the AWS Glue table schema and set the column type to a known AWS Glue format. If the file type is unrecognized or does not have an equivalent in Glue, default this field to string. The map state configuration is defined as:

"Iterator": {
        "StartAt": "KnownFIleFormat?",
        "States": {
          "KnownFIleFormat?": {
            "Type": "Choice",
            "Choices": [
              {
                "Or": [
                  {
                    "Variable": "$.TaskProperties.SOURCE_DATA_TYPE",
                    "StringEquals": "boolean"
                  },
                  {
                    "Variable": "$.TaskProperties.SOURCE_DATA_TYPE",
                    "StringEquals": "double"
                  },
                  .
                  .
                  .
                  .
                  {
                    "Variable": "$.TaskProperties.SOURCE_DATA_TYPE",
                    "StringEquals": "timestamp"
                  }
                ],
                "Next": "1:1 mapping"
              }
            ],
            "Default": "Cast to String"
          },
          "1:1 mapping": {
            "Type": "Pass",
            "End": true,
            "Parameters": {
              "Name.$": "$.DestinationField",
              "Type.$": "$.TaskProperties.SOURCE_DATA_TYPE"
            }
          },
          "Cast to String": {
            "Type": "Pass",
            "End": true,
            "Parameters": {
              "Name.$": "$.DestinationField",
              "Type": "string"
            }
          }
        }
      },
"ItemsPath": "$.TableInput.StorageDescriptor.Columns",
"ResultPath": "$.TableInput.StorageDescriptor.Columns",

Configuration phase

The next stage in the workflow is “TableExist?”, which checks if the Glue table exists. If the state machine detects any error because the table does not exist, it moves to the “CreateTable” state. Alternatively, it goes to the “UpdateTable” state.

Both states use the AWS SDK integration to create or update the AWS Glue table definition using the “TableInput” parameter. AWS Glue operates with partitions. Every time you have new data stored in a new S3 prefix, you must update the table and add a new partition showing where the data sits.

You need an extra step to check if Amazon AppFlow has stored the data into a new S3 prefix or an existing one. In the “AddPartition?” State, you must review and determine the next step of your workflow. For example, you must validate that the flow executed successfully and processed data.

A choice state helps with those checks:

"And": [
            {
              "Variable": "$.Config.detail['execution-id']",
              "IsPresent": true
            },
            {
              "Variable": "$.Config.detail['status']",
              "StringEquals": "Execution Successful"
            },
            {
              "Not": {
                "Variable": "$.Config.detail['num-of-records-processed']",
                "StringEquals": "0"
              }
            }
          ]

Amazon AppFlow supports different types of flow execution. With scheduled flows, you can regularly configure Amazon AppFlow to hydrate a data lake by bringing only new data since its last execution. Sometimes, after a successful flow execution, there is no new data to ingest. The workflow concludes and moves to the success state in such cases. However, if there is new data, the state machine continues to the next state, “SingleFileAggregation?”.

Amazon AppFlow supports different file aggregation strategies and allows you to aggregate all ingested records into a single or multiple files. Depending on your flow configuration, it may store your data in a different S3 prefix with each flow execution.

In this state, you check this configuration to decide if you need a new partition for your AWS Glue table.

"Variable": "$.FlowConfig.DestinationFlowConfigList[0].DestinationConnectorProperties.S3.S3OutputFormatConfig.AggregationConfig.AggregationType",
"StringEquals": "SingleFile"

If the data flow aggregates all records into a single file per flow execution, it stores all data into a single S3 prefix. In this case, there is a single partition in your AWS Glue table. You must create that single partition the first time this state machine executes for a specific flow.

Use the AWS SDK integration to get the table partition from the AWS Glue in the “IsPartitionExist?” state. Conclude the workflow and move to the “Success” state if the partition exists. Otherwise, create that single partition in another state, “CreateMainPartition”.

If the flow run does not aggregate files, every flow run generates multiple files into a new S3 prefix. In this case, you add a new partition to the AWS Glue table. A pass state, “ConfigureDestination”, configures the required parameters for the partition creation:

"Parameters": {
        "InputFormat.$": "$.TableInput.StorageDescriptor.InputFormat",
        "OutputFormat.$": "$.TableInput.StorageDescriptor.OutputFormat",
        "Columns.$": "$.TableInput.StorageDescriptor.Columns",
        "Compressed.$": "$.TableInput.StorageDescriptor.Compressed",
        "SerdeInfo.$": "$.TableInput.StorageDescriptor.SerdeInfo",
        "Location.$": "States.Format('{}{}', $.TableInput.StorageDescriptor.Location, $.Config.detail['execution-id'])"
      },
 "ResultPath": "$.TableInput.StorageDescriptor"

Next, move to the “CreateNewPartition” state to use the AWS SDK integration to create a new partition to the Glue table similar to:

"Parameters": {
        "DatabaseName.$": "$.Config.Database",
        "TableName.$": "$.Config.TableName",
        "PartitionInput": {
          "Values.$": "States.Array($.Config.detail['execution-id'])",
          "StorageDescriptor.$": "$.TableInput.StorageDescriptor"
        }
      },
"Resource": "arn:aws:states:::aws-sdk:glue:createPartition"

This concludes the workflow with a “Succeed” state after configuring the AWS Glue table in response to the new Amazon AppFlow flow run.

Conclusion

This blog post explores how to integrate Amazon AppFlow and AWS Glue using Step Functions to automate your business requirements. You can use AWS Lambda to simplify the configuration phase and reduce state transitions or create complex checks, filters, or even data cleansing and preparation.

This approach allows you to tailor the schema conversion to your business requirements. Use this AWS SAM template, to deploy this example. This provides the Step Functions workflow described in this post and the EventBridge rule to trigger the state machine after each Amazon AppFlow flow run. The template also includes all required IAM roles and permissions.

For more serverless learning resources, visit Serverless Land.

Extending PowerShell on AWS Lambda with other services

2022-06-02 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/extending-powershell-on-aws-lambda-with-other-services/

This post expands on the functionality introduced with the PowerShell custom runtime for AWS Lambda. The previous blog explains how the custom runtime approach makes it easier to run Lambda functions written in PowerShell.

You can add additional functionality to your PowerShell serverless applications by importing PowerShell modules, which are shareable packages of code. Build your own modules or import from the wide variety of existing vendor modules to manage your infrastructure and applications.

You can also take advantage of the event-driven nature of Lambda, which allows you to run Lambda functions in response to events. Events can include an object being uploaded to Amazon S3, a message placed on an Amazon SQS queue, a scheduled task using Amazon EventBridge, or an HTTP request from Amazon API Gateway. Lambda functions support event triggers from over 200 AWS services and software as a service (SaaS) applications.

Adding PowerShell modules

You can add PowerShell modules from a number of locations. These can include modules from the AWS Tools for PowerShell, from the PowerShell Gallery, or your own custom modules. Lambda functions access these PowerShell modules within specific folders within the Lambda runtime environment.

You can include PowerShell modules via Lambda layers, within your function code package, or container image. When using .zip archive functions, you can use layers to package and share modules to use with your functions. Layers reduce the size of uploaded deployment archives and can make it faster to deploy your code. You can attach up to five layers to your function, one of which must be the PowerShell custom runtime layer. You can include multiple modules per layer.

The custom runtime configures PowerShell’s PSModulePath environment variable, which contains the list of folder locations to search to find modules. The runtime searches the folders in the following order:

1. User supplied modules as part of function package

You can include PowerShell modules inside the published Lambda function package in a /modules subfolder.

2. User supplied modules as part of Lambda layers

You can publish Lambda layers that include PowerShell modules in a /modules subfolder. This allows you to share modules across functions and accounts. Lambda extracts layers to /opt within the Lambda runtime environment so the modules are located in /opt/modules. This is the preferred solution to use modules with multiple functions.

3. Default/user supplied modules supplied with PowerShell

You can also include additional default modules and add them within a /modules folder within the PowerShell custom runtime layer.

For example, the following function includes four Lambda layers. One layer includes the custom runtime. Three additional layers include further PowerShell modules; the AWS Tools for PowerShell, your own custom modules, and third-party modules. You can also include additional modules with your function code.

Lambda layers

Within your PowerShell code, you can load modules during the function initialization (init) phase. This initializes the modules before the handler function runs, which speeds up subsequent warm-start invocations.

Adding modules from the AWS Tools for PowerShell

This post shows how to use the AWS Tools for PowerShell to manage your AWS services and resources. The tools are packaged as a set of PowerShell modules that are built on the functionality exposed by the AWS SDK for .NET. You can follow similar packaging steps to add other modules to your functions.

The AWS Tools for PowerShell are available as three distinct packages:

The AWS.Tools package is the preferred modularized version, which allows you to load only the modules for the services you want to use. This reduces package size and function memory usage. The AWS.Tools cmdlets support auto-importing modules without having to call Import-Module first. However, specifically importing the modules during the function init phase is more efficient and can reduce subsequent invoke duration. The AWS.Tools.Common module is required and provides cmdlets for configuration and authentication that are not service specific.

The accompanying GitHub repository contains the code for the custom runtime, along with a number of example applications. There are also module build instructions for adding a number of common PowerShell modules as Lambda layers, including AWS.Tools.

Building an event-driven PowerShell function

The repository contains an example of an event-driven demo application that you can build using serverless services.

A clothing printing company must manage its t-shirt size and color inventory. The printers store t-shirt orders for each day in a CSV file. The inventory service is one service that must receive the CSV file. It parses the file and, for each order, records the details to manage stock deliveries.

The stores upload the files to S3. This automatically invokes a PowerShell Lambda function, which is configured to respond to the S3 ObjectCreated event. The Lambda function receives the S3 object location as part of the $LambdaInput event object. It uses the AWS Tools for PowerShell to download the file from S3. It parses the contents and, for each line in the CSV file, sends the individual order details as an event to an EventBridge event bus.

In this example, there is a single rule to log the event to Amazon CloudWatch Logs to show the received event. However, you could route each order, depending on the order details, to different targets. For example, you can send different color combinations to SQS queues, which the dyeing service can use to order dyes. You could send particular size combinations to another Lambda function that manages cloth orders.

Example event-driven application

The previous blog post shows how to use the AWS Serverless Application Model (AWS SAM) to build a Lambda layer, which includes only the AWS.Tools.Common module to run Get-AWSRegion. To build a PowerShell application to process objects from S3 and send events to EventBridge, you can extend this functionality by also including the AWS.Tools.S3 and AWS.Tools.EventBridge modules in a Lambda layer.

Lambda layers, including S3 and EventBridge

Building the AWS Tools for PowerShell layer

You could choose to add these modules and rebuild the existing layer. However, the example in this post creates a new Lambda layer to show how you can have different layers for different module combinations of AWS.Tools. The example also adds the Lambda layer Amazon Resource Name (ARN) to AWS Systems Manager Parameter Store to track deployed layers. This allows you to reference them more easily in infrastructure as code tools.

The repository includes build scripts for both Windows and non-Windows developers. Windows does not natively support Makefiles. When using Windows, you can use either Windows Subsystem for Linux (WSL), Docker Desktop, or native PowerShell.

When using Linux, macOS, WSL, or Docker, the Makefile builds the Lambda layers. After downloading the modules, it also extracts the additional AWS.Tools.S3 and AWS.Tools.EventBridge modules.

# Download AWSToolsLayer module binaries
curl -L -o $(ARTIFACTS_DIR)/AWS.Tools.zip https://sdk-for-net.amazonwebservices.com/ps/v4/latest/AWS.Tools.zip
mkdir -p $(ARTIFACTS_DIR)/modules

# Extract select AWS.Tools modules (AWS.Tools.Common required)
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.Common/**/*' -d $(ARTIFACTS_DIR)/modules/
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.S3/**/*' -d $(ARTIFACTS_DIR)/modules/
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.EventBridge/**/*' -d $(ARTIFACTS_DIR)/modules/

When using native PowerShell on Windows to build the layer, the build-AWSToolsLayer.ps1 script performs the same file copy functionality as the Makefile. You can use this option for Windows without WSL or Docker.

### Extract entire AWS.Tools modules to stage area but only move over select modules
…
Move-Item "$PSScriptRoot\stage\AWS.Tools.Common" "$PSScriptRoot\modules\" -Force
Move-Item "$PSScriptRoot\stage\AWS.Tools.S3" "$PSScriptRoot\modules\" -Force
Move-Item "$PSScriptRoot\stage\AWS.Tools.EventBridge" "$PSScriptRoot\modules\" -Force

The Lambda function code imports the required modules in the function init phase.

Import-Module "AWS.Tools.Common"
Import-Module "AWS.Tools.S3"
Import-Module "AWS.Tools.EventBridge"

For other combinations of AWS.Tools, amend the example build-AWSToolsLayer.ps1 scripts to add the modules you require. You can use a similar download and copy process, or PowerShell’s Save-Module to build layers for modules from other locations.

Building and deploying the event-driven serverless application

Follow the instructions in the GitHub repository to build and deploy the application.

The demo application uses AWS SAM to deploy the following resources:

PowerShell custom runtime.
Additional Lambda layer containing the AWS.Tools.Common, AWS.Tools.S3, and AWS.Tools.EventBridge modules from AWS Tools for PowerShell. The layer ARN is stored in Parameter Store.
S3 bucket to store CSV files.
Lambda function triggered by S3 upload.
Custom EventBridge event bus and rule to send events to CloudWatch Logs.

Testing the event-driven application

Use the AWS CLI or AWS Tools for PowerShell to copy the sample CSV file to S3. Replace BUCKET_NAME with your S3 SourceBucket Name from the AWS SAM outputs.

AWS CLI

aws s3 cp .\test.csv s3://BUCKET_NAME

AWS Tools for PowerShell

Write-S3Object -BucketName BUCKET_NAME -File .\test.csv

The S3 file copy action generates an S3 notification event. This invokes the PowerShell Lambda function, passing the S3 file location details as part of the function $LambdaInput event object.

The function downloads the S3 CSV file, parses the contents, and sends the individual lines to EventBridge, which logs the events to CloudWatch Logs.

Navigate to the CloudWatch Logs group /aws/events/demo-s3-lambda-eventbridge.

You can see the individual orders logged from the CSV file.

EventBridge logs showing CSV lines

Conclusion

You can extend PowerShell Lambda applications to provide additional functionality.

This post shows how to import your own or vendor PowerShell modules and explains how to build Lambda layers for the AWS Tools for PowerShell.

You can also take advantage of the event-driven nature of Lambda to run Lambda functions in response to events. The demo application shows how a clothing printing company builds a PowerShell serverless application to manage its t-shirt size and color inventory.

See the accompanying GitHub repository, which contains the code for the custom runtime, along with additional installation options and additional examples.

Start running PowerShell on Lambda today.

For more serverless learning resources, visit Serverless Land.

Optimizing your AWS Lambda costs – Part 2

2022-06-02 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-your-aws-lambda-costs-part-2/

This post is written by Chris Williams, Solutions Architect and Thomas Moore, Solutions Architect, Serverless.

Part 1 of this blog series looks at optimizing AWS Lambda costs through right-sizing a function’s memory, and code tuning. We also explore how using Graviton2, Provisioned Concurrency and Compute Savings Plans can offer a reduction in per-millisecond billing.

Part 2 continues to explore cost optimization techniques for Lambda with a focus on architectural improvements and cost-effective logging.

Event filtering

A common serverless architecture pattern is Lambda reading events from a queue or a stream, such as Amazon SQS or Amazon Kinesis Data Streams. This uses an event source mapping, which defines how the Lambda service handles incoming messages or records from the event source.

Sometimes you don’t want to process every message in the queue or stream because the data is not relevant. For example, if IoT vehicle data is sent to a Kinesis Stream and you only want to process events where tire_pressure is < 32, then the Lambda code may look like this:

def lambda_handler(event, context):
   if(event[“tire_pressure”] >=32):
      return

# business logic goes here

This is inefficient as you are paying for Lambda invocations and execution time when there is no business value beyond filtering.

Lambda now supports the ability to filter messages before invocation, simplifying your code and reducing costs. You only pay for Lambda when the event matches the filter criteria and triggers an invocation.

Filtering is supported for Kinesis Streams, Amazon DynamoDB Streams and SQS by specifying filter criteria when setting up the event source mapping. For example, using the following AWS CLI command:

aws lambda create-event-source-mapping \
--function-name fleet-tire-pressure-evaluator \
--batch-size 100 \
--starting-position LATEST \
--event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/fleet-telemetry \
--filter-criteria '{"Filters": [{"Pattern": "{\"tire_pressure\": [{\"numeric\": [\"<\", 32]}]}"}]}'

After applying the filter, Lambda is only invoked when tire_pressure is less than 32 in messages received from the Kinesis Stream. In this example, it may indicate a problem with the vehicle and require attention.

For more information on how to create filters, refer to examples of event pattern rules in EventBridge, as Lambda filters messages in the same way. Event filtering is explored in greater detail in the Lambda event filtering launch blog.

Avoid idle wait time

Lambda function duration is one dimension used for calculating billing. When function code makes a blocking call, you are billed for the time that it waits to receive a response.

This idle wait time can grow when Lambda functions are chained together, or a function is acting as an orchestrator for other functions. For customers who have workflows such as batch operations or order delivery systems, this adds management overhead. Additionally, it may not be possible to complete all workflow logic and error handling within the maximum Lambda timeout of 15 minutes.

Instead of handling this logic in function code, re-architect your solution to use AWS Step Functions as an orchestrator of the workflow. When using a standard workflow, you are billed for each state transition within the workflow rather than the total duration of the workflow. In addition, you can move support for retries, wait conditions, error workflows and callbacks into the state condition allowing your Lambda functions to focus on business logic.

The following example shows an example Step Functions state machine, where a single Lambda function is split into multiple states. During the wait period, there is no charge. You are only billed on state transition.

Direct integrations

If a Lambda function is not performing custom logic when it integrates with other AWS services, it may be unnecessary and could be replaced by a lower-cost direct integration.

For example, you may be using API Gateway together with a Lambda function to read from a DynamoDB table:

This could be replaced using a direct integration, removing the Lambda function:

API Gateway supports transformations to return the output response in a format the client expects. This avoids having to use a Lambda function to do the transformation. You can find more detailed instructions on creating an API Gateway with an AWS service integration in the documentation.

You can also benefit from direct integration when using Step Functions. Today, Step Functions supports over 200 AWS services and 9,000 API actions. This gives greater flexibility for direct service integration and in many cases removes the need for a proxy Lambda function. This can simplify Step Function workflows and may reduce compute costs.

Reduce logging output

Lambda automatically stores logs that the function code generates through Amazon CloudWatch Logs. This may be useful for understanding what is happening within your application in near real-time. CloudWatch Logs includes a charge for the total data ingested throughout the month. Therefore, reducing output to include only necessary information can help reduce costs.

When you deploy workloads into production, review the logging level of your application. For example, in a pre-production environment, debug logs can be beneficial in providing additional information to tune the function. Within your production workloads, you may disable debug level logs and use a logging library (such as the Lambda Powertools Python Logger). You can define a minimum logging level to output by using an environment variable, which allows configuration outside of the function code.

Structuring your log format enforces a standard set of information through a defined schema, instead of allowing variable formats or large volumes of text. Defining structures such as error codes and adding accompanying metrics leads to a reduction in the volume of text that repeats throughout your logs. This also improves the ability to filter logs for specific error types and reduces the risk of a mistyped character in a log message.

Use Cost-Effective Storage for Logs

Once CloudWatch Logs ingests data, by default it is persisted forever with a per-GB monthly storage fee. As log data ages, it typically becomes less valuable in the immediate time frame, and is instead reviewed historically on an ad-hoc basis. However, the storage pricing within CloudWatch Logs remains the same.

To avoid this, set retention policies on your CloudWatch Logs log groups to delete old log data automatically. This retention policy applies to both your existing and future log data.

Some applications logs may need to persist for months or years for compliance or regulatory requirements. Instead of keeping the logs in CloudWatch Logs, export them to Amazon S3. By doing this, you can take advantage of lower-cost storage object classes while factoring in any expected usage patterns for how or when data is accessed.

Conclusion

Cost optimization is an important part of creating well-architected solutions and this is no different when using serverless. This blog series explores some best practice techniques to help reduce your Lambda bill.

If you are already running AWS Lambda applications in production today, some techniques are easier to implement than others. For example, you can purchase Savings Plans with zero code or architecture changes, whereas avoiding idle wait time will require new services and code changes. Evaluate which technique is right for your workload in a development environment before applying changes to production.

If you are still in the design and development stages, use this blog series as a reference to incorporate these cost optimization techniques at an early stage. This ensures that your solutions are optimized from day one.

To get hands-on implementing some of the techniques discussed, take the Serverless Optimization Workshop.

For more serverless learning resources, visit Serverless Land.

Optimizing your AWS Lambda costs – Part 1

2022-06-02 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-your-aws-lambda-costs-part-1/

This post is written by Chris Williams, Solutions Architect and Thomas Moore, Solutions Architect, Serverless.

When you develop and architect solutions, cost-optimization should always be part of the process. This is no different for serverless applications that have been developed using AWS Lambda.

Your workloads may vary in terms of complexity, usage patterns, and technology. However, the following advice is applicable to all customers when deciding how to prioritize cost optimization with the tradeoffs of developing the application:

Efficient code makes better use of resources.
Consider the downstream services in architectural decisions.
Optimization should be a continuous cycle of improvement.
Prioritize changes that make the greatest improvements first.

The Optimizing your AWS Lambda costs blog series reviews operational and architectural guidance. It can be applied to existing Lambda functions and those that you develop in the future.

Introduction to Lambda pricing

Lambda pricing is calculated as a combination of:

Total number of requests
Total duration of invocations
Configured memory allocated

When you optimize Lambda functions, each of these components impacts the total monthly cost. This pricing applies after you exceed the AWS Free Tier that is offered for Lambda.

Right-Sizing

Right-sizing is a good starting point in the process of cost optimization. This exercise helps to identify the lowest cost for applications without affecting performance or requiring code changes.

For Lambda, this is accomplished by configuring the memory for a function, ranging anywhere from 128 MB up to 10,240 MB (10 GB). By adjusting the memory configuration, you also adjust the amount of vCPU that is available to the function during invocation. Tuning these settings provides memory- or CPU-bound applications with access to additional resources during the execution, which may lead to an overall reduced duration of invocation.

Identifying the optimal configuration for your Lambda functions may be manually intensive, especially if changes are made frequently. The AWS Lambda Power Tuning tool provides a solution powered by AWS Step Functions that can help identify the appropriate configuration. It analyzes a set of memory configurations against an example payload.

In the following example, as the memory increases for this Lambda function, the total invocation time improves. This leads to a reduction in the cost for the total execution without affecting the original performance of the function. For this function, the optimal memory configuration for the function is 512 MB, as this is where the resource utilization is most efficient for the total cost of each invocation. This varies per function, and using the tool on your Lambda functions can identify if they benefit from right-sizing.

This exercise should be completed regularly, especially if code is released frequently. Once you have identified the appropriate memory setting for your Lambda functions, you should add right-sizing to your processes. The AWS Lambda Power Tuning tool generates programmatic output that can be used by your CI/CD workflows during the release of new code, allowing the automation of memory configuration.

Performance efficiency

A key component to Lambda pricing is the total duration of the invocation. The longer the function takes to run, the more it costs and the higher the latency in your application. For this reason, it is important to ensure the code you write is as efficient as possible and follows the Lambda best practices.

At a high level, to optimize code :

Minimize deployment package size to its runtime necessities. This reduces the amount of time it takes for the package to be downloaded and unpacked.
Minimize the complexity of dependencies. Simpler frameworks often load faster.
Take advantage of execution reuse. Initialize SDK clients and database connections outside the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations can reuse open connections and resources in memory and in /tmp.
Follow general coding performance best practices for your chosen language and runtime.

To help visualize the components of your application and identify performance bottlenecks, use AWS X-Ray with Lambda. You can enable X-Ray active tracing on new and existing functions by editing the function configuration. For example, with the AWS CLI:

aws lambda update-function-configuration --function-name my-function \
--tracing-config Mode=Active

The AWS X-Ray SDK can be used to trace all AWS SDK calls inside Lambda functions. This helps to identify any bottlenecks in the application performance. The X-Ray SDK for Python can be used to capture data for other libraries such as requests, sqlite3, and httplib, as shown in the following example:

Amazon CodeGuru Profiler is another tool that can help with code optimization. It uses machines learning algorithms to help find the most expensive lines of code and suggests ways to improve efficiency. It can be enabled on existing Lambda functions by enabling code profiling in the function configuration.

The CodeGuru console shows the results as a series of visualizations and recommendations.

Use these tools together with the documented best practices to evaluate your code’s performance when developing your serverless applications. Efficient code often means faster applications and reduced costs.

AWS Graviton2

In September 2021, Lambda functions powered by Arm-based AWS Graviton2 processors became generally available. Graviton2 functions are designed to deliver up to 19% better performance at 20% lower cost than x86. In addition to the lower billing cost when using Arm, you could also see a reduction in the function duration due to the CPU performance improvement, reducing costs even further.

You can configure both new and existing functions to target the AWS Graviton2 processor. Functions are invoked in the same way and integrations with services, applications and tools are not affected by the architecture change. Many functions only need the configuration change to take advantage of the price/performance of Graviton2. Others may require repackaging to use Arm-specific dependencies.

It’s always recommended to test your workloads before making the change. To see how much your code benefits from using Graviton2, use the Lambda Power Tuning tool to compare against x86. The tool allows you to compare two results on the same chart:

Provisioned concurrency

Where customers are looking to reduce their Lambda function cold starts or avoid burst throttling, provisioned concurrency provides execution environments that are ready to be invoked. It can also reduce total Lambda cost when there is a consistent volume of traffic. This is because the provisioned concurrency pricing model offers a lower total price when it is fully used.

Similar to the standard Lambda function pricing, there are price components for total requests, total duration, and memory configuration. In addition, there is the cost of each provisioned concurrency environment (based on its memory configuration). When this execution environment is fully utilized, the combined cost of the invocation and the execution environment can offer up to 16% savings on duration cost when compared to the regular on-demand pricing.

If you cannot maximize usage in an execution environment, provisioned concurrency can still offer a lower total price per invocation. In the following example, once it’s consumed for more than 60% of the available time, it becomes cheaper than using the on-demand pricing model. The savings increase in line with capacity usage.

To identify the invocation baseline of a Lambda function, look at the average concurrent execution metrics per hour over the previous 24 hours. This helps you to find a consistent baseline throughout the day where you are consistently using multiple execution environments.

For Lambda functions where peak invocation levels are only expected during particular windows of time, take advantage of a scheduled scaling action. Where traffic patterns are not easy to determine, you can implement Application Auto Scaling to adjust based on the current level of utilization.

Compute savings plans

AWS Savings Plans is a flexible pricing model offering lower prices compared to on-demand pricing, in exchange for a specific usage commitment (measured in $/hour) for a one- or three-year period.

Compute Savings Plans include Amazon EC2, AWS Fargate and Lambda. Lambda usage for duration and provisioned concurrency are charged at a discounted Savings Plans rate of up to 17% for a 1- or 3-year term.

You can implement Savings Plans without any function code or configuration changes. They can be a simpler way to save money for Lambda-based workloads. Before deciding to use a savings plan, analyze previous patterns to understand any variations in your month to month usage.

Conclusion

This blog post explains how Lambda pricing works and how right-sizing applications and tuning them for performance efficiency offers a more cost-efficient utilization model. The results can also reduce latency, creating a better experience for your end users.

The post explores how architectural changes such as moving to Graviton2 and configuring provisioned concurrency can provide cost reductions for the same operations. Finally, you can use Compute Savings Plans to add an additional cost reduction once you establish a baseline of usage per month.

Part 2 introduces further optimization opportunities for reducing Lambda invocations, moving to an asynchronous model, and reducing logging costs.

For more serverless learning resources, visit Serverless Land.

Testing Amazon EventBridge events using AWS Step Functions

2022-06-01 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/testing-amazon-eventbridge-events-using-aws-step-functions/

This post is written by Siarhei Kazhura, Solutions Architect and Riaz Panjwani, Solutions Architect.

Amazon EventBridge is a serverless event bus that can be used to ingest and process events from a variety of sources, such as AWS services and SaaS applications. With EventBridge, developers can build loosely coupled and independently scalable event-driven applications.

It can be useful to know with EventBridge when events are not able to reach the desired destination. This can be caused by multiple factors, such as:

Event pattern does not match the event rule
Event transformer failure
Event destination expects a different payload (for example, API destinations) and returns an error

EventBridge sends metrics to Amazon CloudWatch, which allows for the detection of failed invocations on a given event rule. You can also use EventBridge rules with a dead-letter queue (DLQ) to identify any failed event deliveries. The messages delivered to the queue contain additional metadata such as error codes, error messages, and the target ARN for debugging.

However, understanding why events fail to deliver is still a manual process. Checking CloudWatch metrics for failures, and then the DLQ takes time. This is evident when developing new functionality, when you must constantly update the event matching patterns and event transformers, and run tests to see if they provide the desired effect. EventBridge sandbox functionality can help with manual testing but this approach does not scale or help with automated event testing.

This post demonstrates how to automate testing for EventBridge events. It uses AWS Step Functions for orchestration, along with Amazon DynamoDB and Amazon S3 to capture the results of your events, Amazon SQS for the DLQ, and AWS Lambda to invoke the workflows and processing.

Overview

Using the solution provided in this post, users can track events from its inception to delivery and identify where any issues or errors are occurring. This solution is also customizable, and can incorporate integration tests against events to test pattern matching and transformations.

At a high level:

The event testing workflow is exposed via an API Gateway endpoint, and users can send a request.
This request is validated and routed to a Step Functions EventTester workflow, which performs the event test.
The EventTester workflow creates a sample event based on the received payload, and performs multiple tests on the sample event.
The sample event is matched against the rule that is being tested. The results are stored in an Amazon DynamoDB EventTracking table, and the transformed event payload is stored in the TransformedEventPayload Amazon S3 bucket.
The EventTester workflow has an embedded AWS Step Functions workflow called EventStatusPoller. The EventStatusPoller workflow polls the EventTracking table.
The EventStatusPoller workflow has a customizable 10-second timeout. If the timeout is reached, this may indicate that the event pattern does not match. EventBridge tests if the event does not match against a given pattern, using the AWS SDK for EventBridge.
After completing the tests, the response is formatted and sent back to the API Gateway. By default, the timeout is set to 15 seconds.
API Gateway processes the response, strips the unnecessary elements, and sends the response back to the issuer. You can use this response to verify if the test event delivery is successful, or identify the reason a failure occurred.

EventTester workflow

After an API call, this event is sent to the EventTester express workflow. This orchestrates the automated testing, and returns the results of the test.

In this workflow:

1. The test event is sent to EventBridge to see if the event matches the rule and can be transformed. The result is stored in a DynamoDB table.
2. The PollEventStatus synchronous Express Workflow is invoked. It polls the DynamoDB table until a record with the event ID is found or it reaches the timeout. The configurable timeout is 15 seconds by default.
3. If a record is found, it checks the event status.

From here, there are three possible states. In the first state, if the event status has succeeded:

4. The response from the PollEventStatus workflow is parsed and the payload is formatted.
5. The payload is stored in an S3 bucket.
6. The final response is created, which includes the payload, the event ID, and the event status.
7. The execution is successful, and the final response is returned to the user.

In the second state, if no record is found in the table and the PollEventStatus workflow reaches the timeout:

8. The most likely explanation for reaching the timeout is that the event pattern does not match the rule, so the event is not processed. You can build a test to verify if this is the issue.
9. From the EventBridge SDK, the TestEventPattern call is made to see if the event pattern matches the rule.
10. The results of the TestEventPattern call are checked.
11. If the event pattern does not match the rule, then the issue has been successfully identified and the response is created to be sent back to the user. If the event pattern matches the rule, then the issue has not been identified.
12. The response shows that this is an unexpected error.

In the third state, this acts as a catch-all to any other errors that may occur:

13. The response is created with the details of the unexpected error.
14. The execution has failed, and the final response is sent back to the user.

Event testing process

The following diagram shows how events are sent to EventBridge and their results are captured in S3 and DynamoDB. This is the first step of the EventTester workflow:

When the event is tested:

The sample event is received and sent to the EventBridge custom event bus.
A CatchAll rule is triggered, which captures all events on the custom event bus.
All events from the CatchAll rule are sent to a CloudWatch log group, which allows for an original payload inspection.
The event is also propagated to the EventTesting rule. The event is matched against the rule pattern, and if successful the event is transformed based on the transformer provided.
If the event is matched and transformed successfully, the Lambda function EventProcessor is invoked to process the transformed event payload. You can add additional custom code to this function for further testing of the event (for example, API integration with the transformed payload).
The event status is updated to SUCCESS and the event metadata is saved to the EventTracking DynamoDB table.
The transformed event payload is saved to the TransformedEventPayload S3 bucket.
If there’s an error, EventBridge sends the event to the SQS DLQ.
The Lambda function ErrorHandler polls the DLQ and processes the errors in batches.
The event status is updated to ERROR and the event metadata is saved to the EventTracking DynamoDB table.
The event payload is saved to the TransformedEventPayload S3 bucket.

EventStatusPoller workflow

When the poller runs:

It checks the DynamoDB table to see if the event has been processed.
The result of the poll is checked.
If the event has not been processed, the workflow loops and polls the DynamoDB table again.
If the event has been processed, the results of the event are passed to next step in the Event Testing workflow.

Visit Composing AWS Step Functions to abstract polling of asynchronous services for additional details.

Testing at scale

The EventTester workflow uses Express Workflows, which can handle testing high volume event workloads. For example, you can run the solution against large volumes of historical events stored in S3 or CloudWatch.

This can be achieved by using services such as Lambda or AWS Fargate to read the events in batches and run tests simultaneously. To achieve optimal performance, some performance tuning may be required depending on the scale and events that are being tested.

To minimize the cost of the demo, the DynamoDB table is provisioned with 5 read capacity units and 5 write capacity units. For a production system, consider using on-demand capacity, or update the provisioned table capacity.

Event sampling

In this implementation, the EventBridge EventTester can be used to periodically sample events from your system for testing:

Any existing rules that must be tested are provisioned via the AWS CDK.
The sampling rule is added to an existing event bus, and has the same pattern as the rule that is tested. This filters out events that are not processed by the tested rule.
SQS queue is used for buffering.
Lambda function processes events in batches, and can optionally implement sampling. For example, setting a 10% sampling rate will take one random message out of 10 messages in a given batch.
The event is tested against the endpoint provided. Note that the EventTesting rule is also provisioned via AWS CDK from the same code base as the tested rule. The tested rule is replicated into the EventTesting workflow.
The result is returned to a Lambda function, and is then sent to CloudWatch Logs.
A metric is set based on the number of ERROR responses in the logs.
An alarm is configured when the ERROR metric crosses a provided threshold.

This sampling can complement existing metrics exposed for EventBridge via CloudWatch.

Solution walkthrough

To follow the solution walkthrough, visit the solution repository. The walkthrough explains:

Prerequisites required.
Detailed solution deployment walkthrough.
Solution customization and testing.
Cleanup process.
Cost considerations.

Conclusion

This blog post outlines how to use Step Functions, Lambda, SQS, DynamoDB, and S3 to create a workflow that automates the testing of EventBridge events. With this example, you can send events to the EventBridge Event Tester endpoint to verify that event delivery is successful or identify the root cause for event delivery failures.

For more serverless learning resources, visit Serverless Land.

Amazon EMR Serverless Now Generally Available – Run Big Data Applications without Managing Servers

2022-06-01 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/amazon-emr-serverless-now-generally-available-run-big-data-applications-without-managing-servers/

At AWS re:Invent 2021, we introduced three new serverless options for our data analytics services – Amazon EMR Serverless, Amazon Redshift Serverless, and Amazon MSK Serverless – that make it easier to analyze data at any scale without having to configure, scale, or manage the underlying infrastructure.

Today we announce the general availability of Amazon EMR Serverless, a serverless deployment option for customers to run big data analytics applications using open-source frameworks like Apache Spark and Hive without configuring, managing, and scaling clusters or servers.

With EMR Serverless, you can run analytics workloads at any scale with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements. EMR Serverless automatically scales resources up and down to provide just the right amount of capacity for your application, and you only pay for what you use.

During the preview, we heard from customers that EMR Serverless is cost-effective because they do not incur cost from having to overprovision resources to deal with demand spikes. They do not have to worry about right-sizing instances or applying OS updates, and can focus on getting products to market faster.

Amazon EMR provides various deployment options to run applications to fit varied needs such as EMR clusters on Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Kubernetes Service (Amazon EKS) clusters, AWS Outposts, or EMR Serverless.

EMR on Amazon EC2 clusters is suitable for customers that need maximum control and flexibility over how to run their application. With EMR clusters, customers can choose the EC2 instance type to enhance the performance of certain applications, customize the Amazon Machine Image (AMI), choose EC2 instance configuration, customize, and extend open-source frameworks and install additional custom software on cluster instances.
EMR on Amazon EKS is suitable for customers that want to standardize on EKS to manage clusters across applications or use different versions of an open-source framework on the same cluster.
EMR on AWS Outposts is for customers who want to run EMR closer to their data center within an Outpost.
EMR Serverless is suitable for customers that want to avoid managing and operating clusters, and simply want to run applications using open-source frameworks.

Also, when you build an application using an EMR release (for example, a Spark job using EMR release 6.4), you can choose to run it on an EMR cluster, EMR on EKS, or EMR Serverless without having to rewrite the application. This allows you to build applications for a given framework version and retain the flexibility to change the deployment model based on future operational needs.

Getting Started with Amazon EMR Serverless
To get started with EMR Serverless, you can use Amazon EMR Studio, a free EMR feature which provides an end to end development and debugging experience. With EMR Studio, you can create EMR Serverless applications (Spark or Hive), choose the version of open-source software for your application, submit jobs, check the status of running jobs, and invoke Spark UI or Tez UI for job diagnostics.

When you select the Get started button in the EMR Serverless Console, you can create and set up EMR Studio with preconfigured EMR Serverless applications.

In EMR Studio, when you choose Applications in the Serverless menu, you can create one or more EMR Serverless applications and choose the open source framework and version for your use case. If you want separate logical environments for test and production or for different line-of-business use cases, you can create separate applications for each logical environment.

An EMR Serverless application is a combination of (a) the EMR release version for the open-source framework version you want to use and (b) the specific runtime that you want your application to use, such as Apache Spark or Apache Hive.

When you choose Create application, you can set your application Name, Type of either Spark or Hive, and supported Release version. You can also select the option of default or custom settings for pre-initialized capacity, application limits, and Amazon Virtual Private Cloud (Amazon VPC) connectivity options. Each EMR Serverless application is isolated from other applications and runs within a secure VPC.

Use the default option if you want jobs to start immediately. But charges apply for each worker when the application is started. To learn more about pre-initialized capacity, see Configuring and managing pre-initialized capacity.

When you select Start application, your application is setup to start with pre-initialized capacity of 1 Spark driver and 1 Spark executor. Your application is by default configured to start when jobs are submitted and stop when the application is idle for more than 15 minutes.

You can customize these settings and setup different application limits by selecting Choose custom settings.

In the Job runs menu, you can see a list of run jobs for your application.

Choose Submit job and set up job details such as the name, AWS Identity and Access Management (IAM) role used by the job, script location, and arguments of the JAR or Python script in the Amazon Simple Storage Service (Amazon S3) bucket that you want to run.

If you want logs for your Spark or Hive jobs to be submitted to your S3 bucket, you will need to setup the S3 bucket in the same Region where you are running EMR Serverless jobs.

Optionally, you can set additional configuration properties that you can specify for each job, such as Spark properties, job configurations to override the default configurations for applications (such as using the AWS Glue Data Catalog as its metastore), storing logs to Amazon S3, and retaining logs for 30 days.

The following is an example of running a Python script using the StartJobRun API.

$ aws emr-serverless start-job-run \
    --application-id <application_id> \
    --execution-role-arn <iam_role_arn> \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://spark-scripts/scripts/spark-etl.py",
            "entryPointArguments": "s3://spark-scripts/output",
            "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1"
        }
    }' \
    --configuration-overrides '{
        "monitoringConfiguration": {
           "s3MonitoringConfiguration": {
             "logUri": "s3://spark-scripts/logs/"
           }
        }
    }'

You can check on job results in your S3 bucket. For details, you can use Spark UI for Spark Application, and Hive/Tez UI in the Job runs menu to understand how the job ran or to debug it if it failed.

For more debugging, EMR Serverless will push event logs to the sparklogs folder in your S3 log destination for Spark applications. In the case of Hive applications, EMR Serverless will continuously upload the Hive driver and Tez tasks logs to the HIVE_DRIVER or TEZ_TASK folders of your S3 log destination. To learn more, see Logging in the AWS documentation.

Things to Know
With EMR Serverless, you can get all the benefits of running Amazon EMR. I want to quote some things to know about EMR Serverless from an AWS Big Data Blog post of preview announcements:

Automatic and fine-grained scaling – EMR Serverless automatically scales up workers at each stage of processing your job and scales them down when they’re not required. You’re charged for aggregate vCPU, memory, and storage resources used from the time a worker starts running until it stops, rounded up to the nearest second with a 1-minute minimum. For example, your job may require 10 workers for the first 10 minutes of processing the job and 50 workers for the next 5 minutes. With fine-grained automatic scaling, you only incur cost for 10 workers for 10 minutes and 50 workers for 5 minutes. As a result, you don’t have to pay for underutilized resources.
Resilience to Availability Zone failures – EMR Serverless is a Regional service. When you submit jobs to an EMR Serverless application, it can run in any Availability Zone in the Region. In case an Availability Zone is impaired, a job submitted to your EMR Serverless application is automatically run in a different (healthy) Availability Zone. When using resources in a private VPC, EMR Serverless recommends that you specify the private VPC configuration for multiple Availability Zones so that EMR Serverless can automatically select a healthy Availability Zone.
Enable shared applications – When you submit jobs to an EMR Serverless application, you can specify the IAM role that must be used by the job to access AWS resources such as S3 objects. As a result, different IAM principals can run jobs on a single EMR Serverless application, and each job can only access the AWS resources that the IAM principal is allowed to access. This enables you to set up scenarios where a single application with a pre-initialized pool of workers is made available to multiple tenants wherein each tenant can submit jobs using a different IAM role but use the common pool of pre-initialized workers to immediately process requests.

Now Available
Amazon EMR Serverless is available in US East (N. Virginia), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) Regions. With EMR Serverless, there are no upfront costs, and you pay only for the resources you use. You pay for the amount of vCPU, memory, and storage resources consumed by your applications. For pricing details, see the EMR Serverless pricing page.

To learn more, visit the Amazon EMR Serverless User Guide. Please send feedback to AWS re:Post for Amazon EMR Serverless or through your usual AWS support contacts.

Learn all the details about Amazon EMR Serverless and get started today.

– Channy

Building resilient private APIs using Amazon API Gateway

2022-05-27 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-resilient-private-apis-using-amazon-api-gateway/

This post written by Giedrius Praspaliauskas, Senior Solutions Architect, Serverless.

Modern architectures meet recovery objectives (recovery time objective, RTO, and recovery point objective, RPO) by being resilient to routine and unexpected infrastructure disruptions. Depending on the recovery objectives and regulatory requirements, developers must choose the disaster recovery strategy. For more on disaster recovery strategies, see this whitepaper.

AWS’ global infrastructure enables customers to support applications with near zero RTO requirements. Customers can run workloads in multiple Regions, in a multi-site active/active manner, and serve traffic from all Regions. To do so, developers often must implement private multi-regional APIs for use by the applications.

This blog describes how to implement this solution using Amazon API Gateway and Amazon Route 53.

Overview

The first step is to build a REST API that uses private API Gateway endpoints with custom domain names as described in this sample. The next step is to deploy APIs in two AWS Regions and configure Route 53, following a disaster recovery strategy.
This architecture uses the following resources in each Region:

Two region architecture example

API Gateway with an AWS Lambda function as integration target.
Amazon Virtual Private Cloud (Amazon VPC) with two private subnets, used to deploy VPC endpoint and Network Load Balancers (NLB).
AWS Transit Gateway to establish connectivity between the two VPCs in different Regions.
VPC endpoint to access API Gateway from a private VPC.
Elastic network interfaces (ENIs) created by the VPC endpoint.
A Network Load Balancer with ENIs in its target group and a TLS listener with an AWS Certificate Manager (ACM) certificate, used as a facade for the API.
ACM issues and manages certificates used by the NLB TLS listener and API Gateway custom domain.
Route 53 with private hosted zones used for DNS resolution.
Amazon CloudWatch with alarms used for Route 53 health checks.

This sample implementation uses a Lambda function as an integration target, though it can target on-premises resources accessible via AWS Direct Connect, and applications running on Amazon EKS. For more information on best practices designing API Gateway private APIs, see this whitepaper.

This post uses AWS Transit Gateway with inter-Region peering to establish connectivity between the two VPCs. Depending on the networking needs and infrastructure already in place, you may tailor the architecture and use a different approach. Read this whitepaper for more information on available VPC-to-VPC connectivity options.

Implementation

Prerequisites

You can use existing infrastructure to deploy private APIs. Otherwise check the sample repository for templates and detailed instructions on how to provision the necessary infrastructure.

This post uses the AWS Serverless Application Model (AWS SAM) to deploy private APIs with custom domain names. Visit the documentation to install it in your environment.

Deploying private APIs into multiple Regions

To deploy private APIs with custom domain names and CloudWatch alarms for health checks into the two AWS Regions:

Download the AWS SAM template file api.yaml from the sample repository. Replace default parameter values in the template with ones that match your environment or provide them during the deployment step.

Navigate to the directory containing the template. Run following commands to deploy the API stack in the us-east-1 Region:

sam build --template-file api.yaml
sam deploy --template-file api.yaml --guided --stack-name private-api-gateway --region us-east-1

Repeat the deployment in the us-west-2 Region. Update the parameters values in the template to match the second Region (or provide them as an input during the deployment step). Run the following commands:
```
sam build --template-file api.yaml
sam deploy --template-file api.yaml --guided --stack-name private-api-gateway --region us-west-2
```

Setting up Route 53

With the API stacks deployed in both Regions, create health checks to use them in Route 53:

Navigate to Route 53 in the AWS Management Console. Use the CloudWatch alarms created in the previous step to create health checks (one per Region):

Configuring the health check for region 1

Configuring the health check for region 2

This Route 53 health check uses the CloudWatch alarms that are based on a static threshold of the NLB healthy host count metric in each Region. In your implementation, you may need more sophisticated API health status tracking. It can use anomaly detection, include metric math, or use a composite state of the multiple alarms. Check this documentation article for more information on CloudWatch alarms. You can also use the approach documented in this blog post as an alternative. It will help you to create a health check solution that observes the state of the private resources and creates custom metrics that are more specific to your use case. For example, such metrics could include increased database transactions’ failure rate, number of timed out requests to a downstream legacy system, status of an external system that your workload depends on, etc.
Create a private Route 53 hosted zone. Associate this with the VPCs where the client applications that access private APIs reside:

Create a private hosted zone
Create Route 53 private zone alias records, pointing to the NLBs in both Regions and VPCs, using the same name for both records (for example, private.internal.example.com):

Create alias records

This post uses Route 53 latency-based routing for private DNS to implement resilient active-active private API architecture. Depending on your use case and disaster recovery strategy, you can change this approach and use geolocation-based routing, failover, or weighted routing. See the documentation for more details on supported routing policies for the records in a private hosted zone.

In this implementation, client applications that connect to the private APIs reside in the VPCs and can access Route 53 private hosted zones. You may also operate an application that runs on-premises and must access the private APIs. Read this blog post for more information on how to create DNS naming that spans the entire network.

Validating the configuration

To validate this implementation, I use a bastion instance in each of the VPCs. I connect to them using SSH or AWS Systems Manager Session Manager (see this documentation article for details).

Run the following command from both bastion instances:
```
dig +short private.internal.com
```
The response should contain IP addresses of the NLB in one VPC:
```
10.2.2.188
10.2.1.211
```
After DNS resolution verification, run the following command in each of the VPCs:
```
curl -s -XGET https://private.internal.example.com/demo
```
The response should include event data as the Lambda function received it.
To simulate an outage, navigate to Route 53 in the AWS Management Console. Select the health check that corresponds to the Region where you received the response, and invert the health status:
After a few minutes, retry the same DNS resolution and API response validation steps. This time, it routes all your requests to the remaining healthy API stack.

Cleaning Up

To avoid incurring further charges, delete all the resources that you have created in Route 53 records, health checks, and private hosted zones.

Run the following commands to delete API stacks in both Regions:

sam delete --stack-name private-api-gateway --region us-west-2
sam delete --stack-name private-api-gateway --region us-east-1

If you used the sample templates to provision the infrastructure, follow the steps listed in the Cleanup section of the sample repository instructions.

Conclusion

This blog post walks through implementing a multi-Regional private API using API Gateway with custom domain names. This approach allows developers to make their internal applications and workloads more resilient, react to disruptions, and meet their disaster recovery scenario objectives.

For additional guidance on such architectures, including multi-Region application architecture, see this solution, blog post, re:Invent presentation.

For more serverless learning resources, visit Serverless Land.

Introducing the PowerShell custom runtime for AWS Lambda

2022-05-25 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-the-powershell-custom-runtime-for-aws-lambda/

The new PowerShell custom runtime for AWS Lambda makes it even easier to run Lambda functions written in PowerShell to process events. Using this runtime, you can run native PowerShell code in Lambda without having to compile code, which simplifies deployment and testing. You can also now view PowerShell code in the AWS Management Console, and have more control over function output and logging.

Lambda has supported running PowerShell since 2018. However, the existing solution uses the .NET Core runtime implementation for PowerShell. It uses the additional AWSLambdaPSCore modules for deployment and publishing, which require compiling the PowerShell code into C# binaries to run on .NET. This adds additional steps to the development process.

This blog post explains the benefits of using a custom runtime and information about the runtime options. You can deploy and run an example native PowerShell function in Lambda.

PowerShell custom runtime benefits

The new custom runtime for PowerShell uses native PowerShell, instead of compiling PowerShell and hosting it on the .NET runtime. Using native PowerShell means the function runtime environment matches a standard PowerShell session, which simplifies the development and testing process.

You can now also view and edit PowerShell code within the Lambda console’s built-in code editor. You can embed PowerShell code within an AWS CloudFormation template, or other infrastructure as code tools.

PowerShell code in Lambda console

This custom runtime returns everything placed on the pipeline as the function output, including the output of Write-Output. This gives you more control over the function output, error messages, and logging. With the previous .NET runtime implementation, your function returns only the last output from the PowerShell pipeline.

Building and packaging the custom runtime

The custom runtime is based on Lambda’s provided.al2 runtime, which runs in an Amazon Linux environment. This includes AWS credentials from an AWS Identity and Access Management (IAM) role that you manage. You can build and package the runtime as a Lambda layer, or include it in a container image. When packaging as a layer, you can add it to multiple functions, which can simplify deployment.

The runtime is based on the cross-platform PowerShell Core. This means that you can develop Lambda functions for PowerShell on Windows, Linux, or macOS.

You can build the custom runtime using a number of tools, including the AWS Command Line Interface (AWS CLI), AWS Tools for PowerShell, or with infrastructure as code tools such as AWS CloudFormation, AWS Serverless Application Model (AWS SAM), Serverless Framework, and AWS Cloud Development Kit (AWS CDK).

Showing the PowerShell custom runtime in action

In the following demo, you learn how to deploy the runtime and explore how it works with a PowerShell function.

The accompanying GitHub repository contains the code for the custom runtime, along with additional installation options and a number of examples.

This demo application uses AWS SAM to deploy the following resources:

PowerShell custom runtime based on provided.al2 as a Lambda layer.
Additional Lambda layer containing the AWS.Tools.Common module from AWS Tools for PowerShell.
Both layers store their Amazon Resource Names (ARNs) as parameters in AWS Systems Manager Parameter Store which can be referenced in other templates.
Lambda function with three different handler options.

To build the custom runtime and the AWS Tools for PowerShell layer for this example, AWS SAM uses a Makefile. This downloads the specified version of PowerShell from GitHub and the AWSTools.

Windows does not natively support Makefiles. When using Windows, you can use either Windows Subsystem for Linux (WSL), Docker Desktop, or native PowerShell.

Pre-requisites:

AWS SAM
If building on Windows:

Clone the repository and change into the example directory

git clone https://github.com/awslabs/aws-lambda-powershell-runtime
cd examples/demo-runtime-layer-function

Use one of the following Build options, A, B, C, depending on your operating system and tools.

A) Build using Linux or WSL

Build the custom runtime, Lambda layer, and function packages using native Linux or WSL.

sam build --parallel

sam build –parallel

B) Build using Docker

You can build the custom runtime, Lambda layer, and function packages using Docker. This uses a Linux-based Lambda-like Docker container to build the packages. Use this option for Windows without WSL or as an isolated Mac/Linux build environment.

sam build --parallel --use-container

sam build –parallel –use-container

C) Build using PowerShell for Windows

You can use native PowerShell for Windows to download and extract the custom runtime and Lambda layer files. This performs the same file copy functionality as the Makefile. It adds the files to the source folders rather than a build location for subsequent deployment with AWS SAM. Use this option for Windows without WSL or Docker.

.\build-layers.ps1

Build layers using PowerShell

Test the function locally

Once the build process is complete, you can use AWS SAM to test the function locally.

sam local invoke

This uses a Lambda-like environment to run the function locally and returns the function response, which is the result of Get-AWSRegion.

sam local invoke

Deploying to the AWS Cloud

Use AWS SAM to deploy the resources to your AWS account. Run a guided deployment to set the default parameters for the first deploy.

sam deploy -g

For subsequent deployments you can use sam deploy.

Enter a Stack Name and accept the remaining initial defaults.

AWS SAM deploy –g

AWS SAM deploys the infrastructure and outputs the details of the resources.

AWS SAM resources

View, edit, and invoke the function in the AWS Management Console

You can view, edit code, and invoke the Lambda function in the Lambda Console.

Navigate to the Functions page and choose the function specified in the sam deploy Outputs.

Using the built-in code editor, you can view the function code.

This function imports the AWS.Tools.Common module during the init process. The function handler runs and returns the output of Get-AWSRegion.

To invoke the function, select the Test button and create a test event. For more information on invoking a function with a test event, see the documentation.

You can see the results in the Execution result pane.

Lambda console test

You can also view a snippet of the generated Function Logs below the Response. View the full logs in Amazon CloudWatch Logs. You can navigate directly via the Monitor tab.

Invoke the function using the AWS CLI

From a command prompt invoke the function. Amend the --function-name and --region values for your function. This should return "StatusCode": 200 for a successful invoke.

aws lambda invoke --function-name "aws-lambda-powershell-runtime-Function-6W3bn1znmW8G" --region us-east-1 invoke-result

View the function results which are outputted to invoke-result.

cat invoke-result

cat invoke result

Invoke the function using the AWS Tools for PowerShell

You can invoke the Lambda function using the AWS Tools for PowerShell and capture the response in a variable. The response is available in the Payload property of the $Response object, which can be read using the .NET StreamReader class.

$Response = Invoke-LMFunction -FunctionName aws-lambda-powershell-runtime-PowerShellFunction-HHdKLkXxnkUn -LogType Tail
[System.IO.StreamReader]::new($Response.Payload).ReadToEnd()

This outputs the result of AWS-GetRegion.

Cleanup

Use AWS SAM to delete the AWS resources created by this template.

sam delete

PowerShell runtime information

Variables

The runtime defines the following variables which are made available to the Lambda function.

$LambdaInput: A PSObject that contains the Lambda function input event data.
$LambdaContext: An object that provides methods and properties with information about the invocation, function, and runtime environment. For more information, see the GitHub repository.

PowerShell module support

You can include additional PowerShell modules either via a Lambda Layer, or within your function code package, or container image. Using Lambda layers provides a convenient way to package and share modules that you can use with your Lambda functions. Layers reduce the size of uploaded deployment archives and make it faster to deploy your code.

The PSModulePath environment variable contains a list of folder locations that are searched to find user-supplied modules. This is configured during the runtime initialization. Folders are specified in the following order:

Modules as part of function package in a /modules subfolder.
Modules as part of Lambda layers in a /modules subfolder.
Modules as part of the PowerShell custom runtime layer in a /modules subfolder.

Lambda handler options

There are three different Lambda handler formats supported with this runtime.

<script.ps1>

You provide a PowerShell script that is the handler. Lambda runs the entire script on each invoke.

<script.ps1>::<function_name>

You provide a PowerShell script that includes a PowerShell function name. The PowerShell function name is the handler. The PowerShell runtime dot-sources the specified <script.ps1>. This allows you to run PowerShell code during the function initialization cold start process. Lambda then invokes the PowerShell handler function <function_name>. On subsequent invokes using the same runtime environment, Lambda invokes only the handler function <function_name>.

Module::<module_name>::<function_name>

You provide a PowerShell module. You include a PowerShell function as the handler within the module. Add the PowerShell module using a Lambda Layer or by including the module in the Lambda function code package. The PowerShell runtime imports the specified <module_name>. This allows you to run PowerShell code during the module initialization cold start process. Lambda then invokes the PowerShell handler function <function_name> in a similar way to the <script.ps1>::<function_name> handler method.

Function logging and metrics

Lambda automatically monitors Lambda functions on your behalf and sends function metrics to Amazon CloudWatch. Your Lambda function comes with a CloudWatch Logs log group and a log stream for each instance of your function. The Lambda runtime environment sends details about each invocation to the log stream, and relays logs and other output from your function’s code. For more information, see the documentation.

Output from Write-Host, Write-Verbose, Write-Warning, and Write-Error is written to the function log stream. The output from Write-Output is added to the pipeline, which you can use with your function response.

Error handling

The runtime can terminate your function because it ran out of time, detected a syntax error, or failed to marshal the response object into JSON.

Your function code can throw an exception or return an error object. Lambda writes the error to CloudWatch Logs and, for synchronous invocations, also returns the error in the function response output.

See the documentation on how to view Lambda function invocation errors for the PowerShell runtime using the Lambda console and the AWS CLI.

Conclusion

The PowerShell custom runtime for Lambda makes it even easier to run Lambda functions written in PowerShell.

The custom runtime runs native PowerShell. You can view, and edit your code within the Lambda console. The runtime supports a number of different handler options, and you can include additional PowerShell modules.

See the accompanying GitHub repository which contains the code for the custom runtime, along with installation options and a number of examples. Start running PowerShell on Lambda today.

For more serverless learning resources, visit Serverless Land.

Acknowledgements

This custom runtime builds on the work of Norm Johanson, Kevin Marquette, Andrew Pearce, Jonathan Nunn, and Afroz Mohammed.

AWS Week in Review – May 16, 2022

2022-05-16 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-week-in-review-may-16-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

I had been on the road for the last five weeks and attended many of the AWS Summits in Europe. It was great to talk to so many of you in person. The Serverless Developer Advocates are going around many of the AWS Summits with the Serverlesspresso booth. If you attend an event that has the booth, say “Hi ” to my colleagues, and have a coffee while asking all your serverless questions. You can find all the upcoming AWS Summits in the events section at the end of this post.

Last week’s launches
Here are some launches that got my attention during the previous week.

AWS Step Functions announced a new console experience to debug your state machine executions – Now you can opt-in to the new console experience of Step Functions, which makes it easier to analyze, debug, and optimize Standard Workflows. The new page allows you to inspect executions using three different views: graph, table, and event view, and add many new features to enhance the navigation and analysis of the executions. To learn about all the features and how to use them, read Ben’s blog post.

Example on how the Graph View looks

AWS Lambda now supports Node.js 16.x runtime – Now you can start using the Node.js 16 runtime when you create a new function or update your existing functions to use it. You can also use the new container image base that supports this runtime. To learn more about this launch, check Dan’s blog post.

AWS Amplify announces its Android library designed for Kotlin – The Amplify Android library has been rewritten for Kotlin, and now it is available in preview. This new library provides better debugging capacities and visibility into underlying state management. And it is also using the new AWS SDK for Kotlin that was released last year in preview. Read the What’s New post for more information.

Three new APIs for batch data retrieval in AWS IoT SiteWise – With this new launch AWS IoT SiteWise now supports batch data retrieval from multiple asset properties. The new APIs allow you to retrieve current values, historical values, and aggregated values. Read the What’s New post to learn how you can start using the new APIs.

AWS Secrets Manager now publishes secret usage metrics to Amazon CloudWatch – This launch is very useful to see the number of secrets in your account and set alarms for any unexpected increase or decrease in the number of secrets. Read the documentation on Monitoring Secrets Manager with Amazon CloudWatch for more information.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other launches and news that you may have missed:

IBM signed a deal with AWS to offer its software portfolio as a service on AWS. This allows customers using AWS to access IBM software for automation, data and artificial intelligence, and security that is built on Red Hat OpenShift Service on AWS.

Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish. This week’s episode introduces you to Amazon DynamoDB and shares stories on how different customers use this database service. You can listen to all the episodes directly from your favorite podcast app or the podcast web page.

AWS Open Source News and Updates – Ricardo Sueiras, my colleague from the AWS Developer Relation team, runs this newsletter. It brings you all the latest open-source projects, posts, and more. Read edition #112 here.

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

May 18, AWS Summit Tel Aviv (in-person)
May 18, AWS Summit ASEAN (online)
May 18–19, AWS Summit Australia and New Zealand (online)
May 18–19, AWS Summit Atlanta (in-person)
May 23–25, AWS Summit Washington, DC (in-person)

You can register for re:MARS to get fresh ideas on topics such as machine learning, automation, robotics, and space. The conference will be in person in Las Vegas, June 21–24.

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia

Send email using Workers with MailChannels

2022-05-13 Erwin van der Koogh

Post Syndicated from Erwin van der Koogh original https://blog.cloudflare.com/sending-email-from-workers-with-mailchannels/

Send email using Workers with MailChannels

Here at Cloudflare we often talk about HTTP and related protocols as we work to help build a better Internet. However, the Simple Mail Transfer Protocol (SMTP) — used to send emails — is still a massive part of the Internet too.

Even though SMTP is turning 40 years old this year, most businesses still rely on email to validate user accounts, send notifications, announce new features, and more.

Sending an email is simple from a technical standpoint, but getting an email actually delivered to an inbox can be extremely tricky. Because of the enormous amount of spam that is sent every single day, all major email providers are very wary of things like new domains and IP addresses that start sending emails.

That is why we are delighted to announce a partnership with MailChannels. MailChannels has created an email sending service specifically for Cloudflare Workers that removes all the friction associated with sending emails. To use their service, you do not need to validate a domain or create a separate account. MailChannels filters spam before sending out an email, so you can feel safe putting user-submitted content in an email and be confident that it won’t ruin your domain reputation with email providers. But the absolute best part? Thanks to our friends at MailChannels, it is completely free to send email.

In the words of their CEO Ken Simpson: “Cloudflare Workers and Pages are changing the game when it comes to ease of use and removing friction to get started. So when we sat down to see what friction we could remove from sending out emails, it turns out that with our incredible anti-spam and anti-phishing, the answer is “everything”. We can’t wait to see what applications the community is going to build on top of this.”

The only constraint currently is that the integration only works when the request comes from a Cloudflare IP address. So it won’t work yet when you are developing on your local machine or running a test on your build server.

First let’s walk you through how to send out your first email using a Worker.

export default {
  async fetch(request) {
    send_request = new Request('https://api.mailchannels.net/tx/v1/send', {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        personalizations: [
          {
            to: [{ email: '[email protected]', name: 'Test Recipient' }],
          },
        ],
        from: {
          email: '[email protected]',
          name: 'Workers - MailChannels integration',
        },
        subject: 'Look! No servers',
        content: [
          {
            type: 'text/plain',
            value: 'And no email service accounts and all for free too!',
          },
        ],
      }),
    })
  },
}

That is all there is to it. You can modify the example to make it send whatever email you want.

The MailChannels integration makes it easy to send emails to and from anywhere with Workers. However, we also wanted to make it easier to send emails to yourself from a form on your website. This is perfect for quickly and painlessly setting up pages such as “Contact Us” forms, landing pages, and sales inquiries.

The Pages Plugin Framework that we announced earlier this week allows other people to email you without exposing your email address.

The only thing you need to do is copy and paste the following code snippet in your /functions/_middleware.ts file. Now, every form that has the data-static-form-name attribute will automatically be emailed to you.

import mailchannelsPlugin from "@cloudflare/pages-plugin-mailchannels";

export const onRequest = mailchannelsPlugin({
  personalizations: [
    {
      to: [{ name: "ACME Support", email: "[email protected]" }],
    },
  ],
  from: { name: "Enquiry", email: "[email protected]" },
  respondWith: () =>
    new Response(null, {
      status: 302,
      headers: { Location: "/thank-you" },
    }),
});

Here is an example of what such a form would look like. You can make the form as complex as you like, the only thing it needs is the data-static-form-name attribute. You can give it any name you like to be able to distinguish between different forms.

<!DOCTYPE html>
<html>
  <body>
    <h1>Contact</h1>
    <form data-static-form-name="contact">
      <div>
        <label>Name<input type="text" name="name" /></label>
      </div>
      <div>
        <label>Email<input type="email" name="email" /></label>
      </div>
      <div>
        <label>Message<textarea name="message"></textarea></label>
      </div>
      <button type="submit">Send!</button>
    </form>
  </body>
</html>

So as you can see there is no barrier left when it comes to sending out emails. You can copy and paste the above Worker or Pages code into your projects and immediately start to send email for free.

If you have any questions about using MailChannels in your Workers, or want to learn more about Workers in general, please join our Cloudflare Developer Discord server.

Node.js 16.x runtime now available in AWS Lambda

2022-05-12 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/node-js-16-x-runtime-now-available-in-aws-lambda/

This post is written by Dan Fox, Principal Specialist Solutions Architect, Serverless.

You can now develop AWS Lambda functions using the Node.js 16 runtime. This version is in active LTS status and considered ready for general use. To use this new version, specify a runtime parameter value of nodejs16.x when creating or updating functions or by using the appropriate container base image.

The Node.js 16 runtime includes support for ES modules and top-level await that was added to the Node.js 14 runtime in January 2022. This is especially useful when used with Provisioned Concurrency to reduce cold start times.

This runtime version is supported by functions running on either Arm-based AWS Graviton2 processors or x86-based processors. Using the Graviton2 processor architecture option allows you to get up to 34% better price performance.

We recognize that customers have been waiting for some time for this runtime release. We hear your feedback and plan to release the next Node.js runtime version in a timelier manner.

AWS SDK for JavaScript

The Node.js 16 managed runtimes and container base images bundle the AWS JavaScript SDK version 2. Using the bundled SDK is convenient for a few use cases. For example, developers writing short functions via the Lambda console or inline functions via CloudFormation templates may find it useful to reference the bundled SDK.

In general, however, including the SDK in the function’s deployment package is good practice. Including the SDK pins a function to a specific minor version of the package, insulating code from SDK API or behavior changes. To include the SDK, refer to the SDK package and version in the dependency object of the package.json file. Use a package manager like npm or yarn to download and install the library locally before building and deploying your deployment package to the AWS Cloud.

Customers who take this route should consider using the JavaScript SDK, version 3. This version of the SDK contains modular packages. This can reduce the size of your deployment package, improving your function’s performance. Additionally, version 3 contains improved TypeScript compatibility, and using it will maximize compatibility with future runtime releases.

Language updates

With this release, Lambda customers can take advantage of new Node.js 16 language features, including:

Toolchain and compiler upgrades, including prebuilt binaries for Apple Silicon.
Stable timers promises API.
RegExp match indices.
Faster calls with argument size mismatch.
An update to V8 release 9.4

Prebuilt binaries for Apple Silicon

Node.js 16 is the first runtime release to ship prebuilt binaries for Apple Silicon. Customers using M1 processors in Apple computers may now develop Lambda functions locally using this runtime version.

Stable timers promises API

The timers promises API offers timer functions that return promise objects, improving the functionality for managing timers. This feature is available for both ES modules and CommonJS.

You may designate your function as an ES module by changing the file name extension of your handler file to .mjs, or by specifying “type” as “module” in the function’s package.json file. Learn more about using Node.js ES modules in AWS Lambda.


  // index.mjs
  
  import { setTimeout } from 'timers/promises';

  export async function handler() {
    await setTimeout(2000);
    return;
  }

RegExp match indices

The RegExp match indices feature allows developers to get an array of the start and end indices of the captured string in a regular expression. Use the “/d” flag in your regular expression to access this feature.

  // handler.js
  
  exports.lambdaHandler = async () => {
    const matcher = /(AWS )(Lambda)/d.exec('AWS Lambda');
    console.log("match: " + matcher.indices[0]) // 0,10
    console.log("first capture group: " + matcher.indices[1]) // 0,4
    console.log("second capture group: " + matcher.indices[2]) // 4,10
  }

Working with TypeScript

Many developers using Node.js runtimes in Lambda develop their code using TypeScript. To better support TypeScript developers, we have recently published new documentation on using TypeScript with Lambda, and added beta TypeScript support to the AWS SAM CLI.

We are also working on a TypeScript version of Lambda PowerTools. This is a suite of utilities for Lambda developers to simplify the adoption of best practices, such as tracing, structured logging, custom metrics, and more. Currently, AWS Lambda Powertools for TypeScript is in beta developer preview.

Runtime updates

To help keep Lambda functions secure, AWS will update Node.js 16 with all minor updates released by the Node.js community when using the zip archive format. For Lambda functions packaged as a container image, pull, rebuild, and deploy the latest base image from DockerHub or the Amazon ECR Public Gallery.

Amazon Linux 2

The Node.js 16 managed runtime, like Node.js 14, Java 11, and Python 3.9, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.

Conclusion

Lambda now supports Node.js 16. Get started building with Node.js 16 by specifying a runtime parameter value of nodejs16.x when creating your Lambda functions using the zip archive packaging format.

You can also build Lambda functions in Node.js 16 by deploying your function code as a container image using the Node.js 16 AWS base image for Lambda. You can read about the Node.js programming model in the AWS Lambda documentation to learn more about writing functions in Node.js 16.

For existing Node.js functions, review your code for compatibility with Node.js 16 including deprecations, then migrate to the new runtime by changing the function’s runtime configuration to nodejs16.x.

For more serverless learning resources, visit Serverless Land.

Orchestrate big data jobs on on-premises clusters with AWS Step Functions

2022-05-11 Göksel SARIKAYA

Post Syndicated from Göksel SARIKAYA original https://aws.amazon.com/blogs/big-data/orchestrate-big-data-jobs-on-on-premises-clusters-with-aws-step-functions/

Customers with specific needs to run big data compute jobs on an on-premises infrastructure often require a scalable orchestration solution. For large-scale distributed compute clusters, the orchestration of jobs must be scalable to maximize their utilization, while at the same time remain resilient to any failures to prevent blocking the ever-growing influx of data and jobs. Moreover, on-premises compute resources can’t be extended on demand, therefore, the jobs may be competing for the same resources with different priorities.

This post showcases serverless building blocks for orchestrating big data jobs using AWS Step Functions, AWS Lambda, and Amazon DynamoDB with a focus on reliability, maintainability, and monitoring. In this solution, Step Functions enables thousands of workflows to run parallel. Additionally, Lambda provides flexibility implementing arbitrary interfaces to the on-premises infrastructure and its compute resources. With additional steps in the orchestration, the solution also allows operations to monitor thousands of parallel jobs in a visual interface for better debugging.

Architecture

The proposed serverless solution consists of the following main components:

Job trigger – Requests new compute jobs to run on the on-premises cluster. For simplicity, in this architecture we assume that the trigger is a client calling Step Functions directly. However, you could extend this to include Amazon API Gateway to create a job API to interface with the orchestration solution or a rule engine to trigger jobs when relevant data becomes available.
Job manager – This Step Functions workflow runs once per compute job, with multiple workflows running in parallel. It tracks the status of a job from queueing, scheduling, running, retrying, all the way to its completion. Ideally, a job can be scheduled immediately, but workflows can run for days if a job is very low priority and compute resources are sparse. The job manager delegates the decision when or where to run the job to the job queue manager. Communication to the on-premises cluster is abstracted through a Lambda adapter.
Job queue manager – Maintains a queue of all jobs. With the given job properties (for example based on priority), the job queue manager decides the running time of jobs, and the cluster on which they run. To illustrate the concept, the architecture considers real-time information on the resource utilization of the compute clusters (memory, CPU) for scheduling. However, you could apply different scheduling algorithms as required given the flexibility of Lambda.
On-premises compute cluster – Provides the computing resources, data nodes, and tools to run compute jobs.

The following diagram illustrates the solution architecture.

The main process of the solution consists of seven steps:

The job trigger runs a new Step Functions workflow to run a compute job on premises and provides the necessary information (such as priority and required resources).
The job manager creates a new record in DynamoDB to add the job to the queue of the job queue manager, and the workflow waits for the job queue manager to call back.
Amazon EventBridge triggers a scheduled Lambda function in the job queue manager periodically (for example, every 5 minutes), decoupled from the job requests.
The job scheduler Lambda function retrieves real-time information from cluster metrics to see whether jobs can be scheduled at this point in time.
The job scheduler function fetches all queued jobs from DynamoDB and tries to schedule as many of those jobs as possible to available compute resources based on priority, as well as memory and CPU demands.
For each job that can be scheduled to the compute cluster, the job scheduler function communicates back to the job manager to signal the workflow to continue and that the job can be run.
The job manager communicates with the on-premises cluster through the compute cluster adapter Lambda function to run the job, track its status periodically, and retry in case of errors.

On-premises compute cluster

In this post, we assume the on-premises compute cluster offers interfaces to interact with the compute resources. For example, customers could run a Spark compute cluster on premises that allows the following basic interactions through an API:

Upload and trigger a compute job on a cluster (for example, upload a Spark JAR file and submit)
Get the status of a compute job (such as running, stopped, or error)
Get error output in case of failures in the compute job (for example, the job failed due to access denied)

In addition, we assume the cluster can provide metrics on its current utilization. For example, the cluster could provide Prometheus metrics as aggregates over all resources within a compute cluster:

Memory utilization (for example, 2 TB with 80% utilization)
CPU utilization (for example, 5,000 cores with 50% utilization)

We use the terminology introduced here for the example in this post. Depending on the capabilities of the on-premises cluster, you can adjust these concepts. For example, the compute cluster could use Kubernetes or SLURM instead of Spark.

Job manager

The job manager is responsible for communicating with on-premises clusters to trigger big data jobs and query their status. It’s a Step Functions state machine that consists of three steps, as illustrated in the following figure.

The first step is JobQueueRequest, which makes a request to the job queue manager component and waits for the callback. When the job queue manager sends OK to the waiting step with a callback pattern, the second step StartJobRun runs.

The StartJobRun step communicates with the on-premises environment (for example, via HTTP post to a REST API endpoint) to trigger an on-premises job.

The third step GetJobStatus queries the job status from the on-premises cluster. If the job status is InProgress, the state machine waits for a configured time. When the Wait state is over, it returns to the GetJobStatus step to query the job status again in a loop. When the job returns a successful state from the on-premises cluster, the state machine completes its cycle with a Success state. If the job fails with a timeout or with an error, the state machine completes its cycle with a Fail state.

The following screenshot shows the details of the state machine on the Step Functions console.

Job queue manager

The job queue manager is responsible for managing job queues based on job priorities and cluster utilization. It consists of DynamoDB, Lambda, and EventBridge.

The JobQueue table keeps data of waiting jobs, including jobId as the primary key, priority as the sort key, needed memory and CPU consumptions, callbackId, and timestamp information. You can add further information to the table dynamically if required by the scheduling algorithm.

The following screenshot shows the attribute details of the JobQueue table.

EventBridge triggers the job scheduler Lambda function on a regular bases in a configured interval. First, the job scheduler function gets waiting jobs data from the JobQueue table in DynamoDB. Then it establishes a connection with the on-premises cluster to fetch cluster metrics such as memory and CPU utilization. Based on this information, the function decides which jobs are ready to be triggered on the on-premises cluster.

The scheduling algorithm proposed here follows a simple concept to maximize resource utilization, while respecting the job priority. Essentially, for an on-premises cluster (we could potentially have multiple in different geographies), the job scheduler Lambda function builds a queue of jobs according to their priority, while allocating the first job in the queue to compute resources on the cluster. If enough resources are available, the scheduler moves to the next job in the queue and repeats.

Due to the flexibility of Lambda functions, you can tailor the scheduling algorithm for a specific use case. Cluster scheduling algorithms are still an open research topic with different optimization goals, such as throughput, data location, fairness, deadlines, and more.

Get started

In this section, we provide a starting point for the solution described in this post. The steps walk you through creating a Step Functions state machine with the appropriate template, and the necessary Lambda and DynamoDB interactions to create the job manager and job queue manager building blocks. Example code for the Lambda functions is excluded from this post, because the communication with the on-premises cluster to trigger jobs can vary depending on your on-premises interface.

On the Step Functions console, choose State machines.
Choose Create state machine.
Select Run a sample project.
Select Job Poller.
Scroll down to see the sample projects, which are defined using Amazon States Language (ASL).
Review the example definition, then choose Next.
Choose Deploy resources.
Deployment can take up to 10 minutes.
The deployment creates the state machine that is responsible for job management. After you deploy the resources, you need to edit the sample ASL code to add the extra JobQueueRequest step in the state machine.
Select the created state machine.
Choose Edit to add ARNs of the three Lambda functions to make a request in the job queue manager (Job Queue Request), to submit a job to the on-premises cluster (Submit Job), and to poll the status of the jobs (Get Job Status).
Now you’re ready to create the job queue manager.
On the DynamoDB console, create a table for storing job metadata.
On the EventBridge console, create a scheduled rule that triggers the Lambda function at a configured interval.
On the Lambda console, create the function that communicates with the on-premises cluster to fetch cluster metrics. It also gets jobs from the DynamoDB table to retrieve information including job priorities, required memory, and CPU to run the job on the on-premises cluster.

Limitations

This solution uses Step Functions to track all jobs until completion, and therefore the Step Functions quotas must be considered for potential use cases. Mainly, a workflow can run for a maximum of 1 year (cannot be increased) and by default 1 million parallel runs can run in a single account (can be increased to millions). See Quotas for further details.

Conclusion

This post described how to orchestrate big data jobs running in parallel on on-premises clusters with a Step Functions workflow. To learn more about how to use Step Functions workflows for serverless orchestration, visit Serverless Land.

About the Authors

Göksel Sarikaya is a Senior Cloud Application Architect at AWS Professional Services. He enables customers to design scalable, high-performance, and cost effective applications using the AWS Cloud. He helps them to be more flexible and competitive during their digital transformation journey.

Nicolas Jacob Baer is a Senior Cloud Application Architect with a strong focus on data engineering and machine learning, based in Switzerland. He works closely with enterprise customers to design data platforms and build advanced analytics/ml use-cases.

Shukhrat Khodjaev is a Senior Engagement Manager at AWS ProServe, based out of Berlin. He focuses on delivering engagements in the field of Big Data and AI/ML that enable AWS customers to uncover and to maximize their value through efficient use of data.

Durable Objects Alarms — a wake-up call for your applications

2022-05-11 Matt Alonso

Post Syndicated from Matt Alonso original https://blog.cloudflare.com/durable-objects-alarms/

Durable Objects Alarms — a wake-up call for your applications

Since we launched Durable Objects, developers have leveraged them as a novel building block for distributed applications.

Durable Objects provide globally unique instances of a JavaScript class a developer writes, accessed via a unique ID. The Durable Object associated with each ID implements some fundamental component of an application — a banking application might have a Durable Object representing each bank account, for example. The bank account object would then expose methods for incrementing a balance, transferring money or any other actions that the application needs to do on the bank account.

Durable Objects work well as a stateful backend for applications — while Workers can instantiate a new instance of your code in any of Cloudflare’s data centers in response to a request, Durable Objects guarantee that all requests for a given Durable Object will reach the same instance on Cloudflare’s network.

Each Durable Object is single-threaded and has access to a stateful storage API, making it easy to build consistent and highly-available distributed applications on top of them.

This system makes distributed systems’ development easier — we’ve seen some impressive applications launched atop Durable Objects, from collaborative whiteboarding tools to conflict-free replicated data type (CRDT) systems for coordinating distributed state launch.

However, up until now, there’s been a piece missing — how do you invoke a Durable Object when a client Worker is not making requests to it?

As with any distributed system, Durable Objects can become unavailable and stop running. Perhaps the machine you were running on was unplugged, or the datacenter burned down and is never coming back, or an individual object exceeded its memory limit and was reset. Before today, a subsequent request would reinitialize the Durable Object on another machine, but there was no way to programmatically wake up an Object.

Durable Objects Alarms are here to change that, unlocking new use cases for Durable Objects like queues and deferred processing.

What is a Durable Object Alarm?

Durable Object Alarms allow you, from within your Durable Object, to schedule the object to be woken up at a time in the future. When the alarm’s scheduled time comes, the Durable Object’s alarm() handler will be called. If this handler throws an exception, the alarm will be automatically retried using exponential backoff until it succeeds — alarms have guaranteed at-least-once execution.

How are Alarms different from Workers Cron Triggers?

Alarms are more fine-grained than Cron Triggers. While a Workers service can have up to three Cron Triggers configured at once, it can have an unlimited amount of Durable Objects, each of which can have a single alarm active at a time.

Alarms are directly scheduled from and invoke a function within your Durable Object. Cron Triggers, on the other hand, are not programmatic — they execute based on their schedules, which have to be configured via the Cloudflare Dashboard or centralized configuration APIs.

How do I use Alarms?

First, you’ll need to add the durable_object_alarms compatibility flag to your wrangler.toml.

compatibility_flags = ["durable_object_alarms"]

Next, implement an alarm() handler in your Durable Object that will be called when the alarm executes. From anywhere else in your Durable Object, call state.storage.setAlarm() and pass in a time for the alarm to run at. You can use state.storage.getAlarm() to retrieve the currently set alarm time.

In this example, we implemented an alarm handler that wakes the Durable Object up once every 10 seconds to batch requests to a single Durable Object, deferring processing until there is enough work in the queue for it to be worthwhile to process them.

export default {
  async fetch(request, env) {
    let id = env.BATCHER.idFromName("foo");
    return await env.BATCHER.get(id).fetch(request);
  },
};

const SECONDS = 1000;

export class Batcher {
  constructor(state, env) {
    this.state = state;
    this.storage = state.storage;
    this.state.blockConcurrencyWhile(async () => {
      let vals = await this.storage.list({ reverse: true, limit: 1 });
      this.count = vals.size == 0 ? 0 : parseInt(vals.keys().next().value);
    });
  }
  async fetch(request) {
    this.count++;

    // If there is no alarm currently set, set one for 10 seconds from now
    // Any further POSTs in the next 10 seconds will be part of this kh.
    let currentAlarm = await this.storage.getAlarm();
    if (currentAlarm == null) {
      this.storage.setAlarm(Date.now() + 10 * SECONDS);
    }

    // Add the request to the batch.
    await this.storage.put(this.count, await request.text());
    return new Response(JSON.stringify({ queued: this.count }), {
      headers: {
        "content-type": "application/json;charset=UTF-8",
      },
    });
  }
  async alarm() {
    let vals = await this.storage.list();
    await fetch("http://example.com/some-upstream-service", {
      method: "POST",
      body: Array.from(vals.values()),
    });
    await this.storage.deleteAll();
    this.count = 0;
  }
}

Once every 10 seconds, the alarm() handler will be called. In the event an unexpected error terminates the Durable Object, it will be re-instantiated on another machine, following a short delay, after which it can continue processing.

Under the hood, Alarms are implemented by making reads and writes to the storage layer. This means Alarm get and set operations follow the same rules as any other storage operation – writes are coalesced with other writes, and reads have a defined ordering. See our blog post on the caching layer we implemented for Durable Objects for more information.

Durable Objects Alarms guarantee fault-tolerance

Alarms are designed to have no single point of failure and to run entirely on our edge – every Cloudflare data center running Durable Objects is capable of running alarms, including migrating Durable Objects from unhealthy data centers to healthy ones as necessary to ensure that their Alarm executes. Single failures should resolve in under 30 seconds, while multiple failures may take slightly longer.

We achieve this by storing alarms in the same distributed datastore that backs the Durable Object storage API. This allows alarm reads and writes to behave identically to storage reads and writes and to be performed atomically with them, and ensures that alarms are replicated across multiple datacenters.

Within each data center capable of running Durable Objects, there are multiple processes responsible for tracking upcoming alarms and triggering them, providing fault tolerance and scalability within the data center. A single elected leader in each data center is responsible for detecting failure of other data centers and assigning responsibility of those alarms to healthy local processes in its own data center. In the event of leader failure, another leader will be elected and become responsible for executing Alarms in the data center. This allows us to guarantee at-least-once execution for all Alarms.

How do I get started?

Alarms are a great way to build new distributed primitives, like queues, atop Durable Objects. They also provide a method for guaranteeing work within a Durable Object will complete, without relying on a client request to “kick” the Object.

You can get started with Alarms now by enabling Durable Objects in the Cloudflare dashboard. For more info, check the developer docs or jump in our Discord.

Come join us at Cloudflare Connect New York this Thursday!

2022-05-10 Jen Taylor

Post Syndicated from Jen Taylor original https://blog.cloudflare.com/cloudflare-connect-nyc-2022/

Come join us at Cloudflare Connect New York this Thursday!

We take a break from Platform Week to share big news – we’re going to New York this week for our Cloudflare Connect customer event.

We’re packing our bags, getting on planes and heading to New York to do our first live customer event since 2019 and we could not be more excited. It is time with you – the people building, delivering and securing the apps and networks we know and trust – that are the inspiration for the innovation we deliver. We can’t wait to spend time with you.

Our co-founder and CEO Matthew Prince will kick off the day with his view from the top. We’ll then be breaking out into focused conversations to dig in on our latest product news and roadmaps.

Excited about what we’re talking about for Platform Week? Come chat with the Workers team in person and hear more about the roadmap.

Intrigued by the latest DDoS stats we posted and want to learn more? Meet with the team analyzing the attacks and learn about where we go from here.

Not sure where to start your Zero Trust journey? We’ll talk you through what we’re seeing and introduce you to other customers who are in the process of rolling out Zero Trust solutions for their teams so you can learn from each other.

Don’t miss it! Register now – use the code BetterInternet to join us in-person for free. Not in New York? No worries – we’re coming to London, Sydney and San Francisco later this year.

Debugging AWS Step Functions executions with the new console experience

2022-05-09 Benjamin Smith

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/debugging-aws-step-functions-executions-with-the-new-console-experience/

Today, AWS Step Functions introduces a new opt-in console experience that makes it easier to analyze, debug, and optimize Standard Workflows.

Builders create Step Functions workflows to orchestrate multiple services into business-critical applications with minimal code. Customers wanted better ways to debug workflow executions and analyze the payload as it passes through each state.

This blog post explains the new capabilities of the enhanced Step Functions executions page. It shows how to debug workflows quickly, sort and filter on state events, and view the input and output path processing for each state.

Overview

The new Executions Details page allows you to inspect executions using three different view types: Graph view, Table view, and Event view. It has multi-level navigation enhancements for analyzing the map state feature, the ability to search execution history based on unique attributes and improved events, and table navigation with filtering, sorting, and pagination. For a full list of all the feature enhancements, see the Step Functions documentation.

Getting started

To get started, go to the Step Functions state machines page in the AWS Management Console:

Choose any standard workflow from the list.
From the workflows executions list page, choose an execution to analyze.
Choose the New executions page button:

An enhanced execution summary section at the top of the page contains some new information:

You can copy the execution ARN to the clipboard by choosing the copy icon.
This section now shows the number of state transitions for the execution. This is helpful in optimizing the cost of your workflows. With Step Functions, you pay for the number of state transitions used per month for each state transition over the AWS Free Tier.
You can view the total execution duration.

The following execution summary is a new section that displays execution errors. This helps to find the root cause of any workflow faults or failures:

Choosing the between execution views

The following examples show how to use the new execution views to inspect a workflow execution. This example focuses on the Order Processing Workflow that powers Serverlesspresso. Serverlesspresso is an interactive serverless application showcased at AWS re:Invent and AWS Summits. It allows attendees to order coffee from their smartphones. Each order starts a workflow execution.

Graph view

The graph view provides a visual representation of the workflow execution path. It shows which states succeeded, failed, or are currently in progress, and any errors caught. The legend at the bottom of the graph helps to decode each color.

To access the Graph view, choose the Graph view from the view navigation, as shown in the following screenshot. There is a new option to render the graph vertically or horizontally, which you can choose from the Layout option.

The graph view shows that the workflow caught an error at the Emit–Workflow Started TT state. I choose this state from the graph to view more details about it.

The Events tab

Each state in a workflow execution moves through a sequence of events, from TaskStateEntered to TaskStateExited. The Events tab, shown in the following image, displays all the events for the selected state, with their corresponding timestamp.

You can drill down into an event to see its output. In this case, the TaskTimedOut event happens to the selected state with the error message “States.Timeout”. This corresponds with the Caught error shown in the Graph view.

The Input & Output tab

Amazon States Language (ASL) enables you to filter and manipulate data at various stages of a workflow state’s execution using paths. A path is a string beginning with $ that lets you identify and filter subsets of JSON text. Learning how to apply these filters helps to build efficient workflows with minimal state transitions.

Select the Input & Output tab, and then toggle the Advanced view option to display the payload after each path process is applied.

This is useful for checking what each JSON path evaluates to. For example, the following images show how I configure the state Parameters, along with the Task input. This is what the parameters evaluate to after Step Functions applies the JSON path processing:

State Details and Definition tabs

The new execution page lets you view a state’s definition and execution details in isolation from the other states. The task details section contains additional information such as the Duration, Heartbeat, Started After and Timeout values.

Table view

The Table view provides a tabular representation of each state. Use this view to access information quickly about a state’s duration, resources, or status. A new timeline column shows the relative duration that each state took to complete. You can configure which columns are displayed.

To search and filter the table based on unique attributes such as state name or error type, start typing into the search input. Use the predictive autocomplete to define the search criteria. You can also choose a relative or absolute time range to filter by.

Event view

Step Functions stores all changes to state as a sequence of events to give you more visibility into the execution. The event view allows you to find and investigate a particular state event quickly by using the search and sort options:

Choose the Timestamp column header to list events in reverse order of occurrence.
Choose the arrow in the left-most column to reveal more details about each event
Use the search input to search by keyword, state name, event type, or attribute.
The date button lets you filter events by date or time.

Map State

The map state (“Type”: “Map”) allows you to run a set of steps for each element of an input array. The Step Functions execution page now helps you investigate and debug workflows using the Map state with the following enhancements:

A hierarchical table view of steps for each iteration.
An integrations overview, showing the execution summary at a glance.
A paginated list of every event across all iterations.

The following Serverlesspresso workflow processes orders in batches. The map state iterates over each order, sanitizes each item, and checks it is currently in stock. If the item is not in stock, the map state throws a failure. If it is in stock, the order is recorded to a database, and a new event is emitted onto the serverless event bus.

The workflow runs for a new batch of orders and produces the following executions results page:

Using the Graph view:

You can step through each iteration using the Map iteration viewer
See the summary status in the iterations overview section

The Table view shows a hierarchical list of each iteration and I can drill down into the execution that failed to investigate further.

Use the Event view to search for failed executions to quickly filter the results:

Conclusion

Today, Step Functions is launching a new opt-in console experience to help builders analyse, debug, and optimize Step Functions Standard Workflows. These enhancements include three different ways to view your workflow executions, better visibility into workflows that use the Map state, fast access to workflow faults and failures, and new tools to search and sort workflow executions. This blog post shows how to use the new views to debug workflows, sort and filter on state events, and view the input and output path processing for each state.

The new console executions experience is generally available in AWS Regions where Step Functions is available.

For more information on building applications with Step Functions, visit serverlessland.com.

Benefits of migrating to event-driven architecture

2022-05-09 Talia Nassi

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/benefits-of-migrating-to-event-driven-architecture/

Two common options when building applications are request-response and event-driven architecture. In request-response architecture, an application’s components communicate via API calls. The client sends a request and expects a response before performing the next task. In event-driven architecture, the client generates an event and can immediately move on to its next task. Different parts of the application then respond to the event as needed.

In this post, you learn about reasons to consider moving from request-response architecture to an event-driven architecture.

Challenges with request-response architecture

When starting to a build a new application, many developers default to a request-response architecture. A request-response architecture may tightly integrate components and those components communicate via synchronous calls. While a request-response approach is often easier to get started with, it can become challenging as your application grows in complexity.

In this post, I review an example request-response ecommerce application and demonstrate the challenges of tightly coupled integrations. Then I show you how building the same application with an event-driven architecture can give you increased scalability, fault tolerance, and developer velocity.

Close coordination between microservices

In a typical ecommerce application that uses a synchronous API, the client makes a request to place an order and the order service sends the request downstream to an invoice service. If successful, the order service responds with a success message or confirmation number.

In this initial stage, this is a straightforward connection between the two services. The challenge comes when you add more services that integrate with the order service.

If you add a fulfillment service and a forecasting service, the order service has more responsibilities and more complexity. The order service must know how to call each service’s API, from the API call structure to the API’s retry semantics. If there are any backwards incompatible changes to the APIs, the order service team must update them. The system forwards heavy traffic spikes to the order service’s dependency, which may not have the same scaling capabilities. Also, dependent services may transmit errors back up the stack to the client.

Error handling and retries

Now, you add new downstream services for fulfillment and shipping orders to the ecommerce application.

In the happy path, everything works as expected: The order service triggers invoicing, payment systems, and updates forecasting. Once payment clears, this triggers the fulfillment and packing of the order, and then informs the shipping service to request tracking information.

However, if the fulfillment center cannot find the product because they are out of stock, then fulfillment might have to alert the invoice service, then reverse the payment or issue a refund. If fulfillment fails, then the system that triggers shipping might fail as well. Forecasting must also be updated to reflect the change. This remediation workflow is all just to address one of the many potential “unhappy paths” that can occur in this API-driven ecommerce application.

Close coordination between development teams

In a synchronously integrated application, teams must coordinate any new services that are added to the application. This can slow down each development team’s ability to release new features. Imagine your team works on the payment service but you weren’t told that another team added a new rewards service. What now happens when the fulfillment service errors?

Fulfillment may orchestrate all the other services. Your payments team gets a message and you undo the payment, but you may not know who handles retries and error logic. If the rewards service changes vendors and has a new API, and does not tell your team, you may not be aware of the new service.

Ultimately, it can be hard to coordinate these orchestrations and workflows as systems become more complex and management adds more services. This is one reason that it can be beneficial to migrate to event-driven architecture.

Benefits of event-driven architecture

Event-driven architecture can help solve the problems of the close coordination of microservices, error handling and retries, and coordination between development teams.

Close coordination between microservices

In event-driven architecture, the publisher emits an event, which is acknowledged by the event bus. The event bus routes events to subscribers, which process events with self-contained business logic. There is no direct communication between publishers and subscribers.

Decoupled applications enable teams to act more independently, which can increase their velocity. For example, with an API-based integration, if your team wants to know about a change that happened in another team’s microservice, you might have to ask that team to make an API call to your service. Consequently, you may have to account for authentication, coordination with the other team over the structure of the API call. This causes back and forth between teams, which slows down development time. With an event-driven application, you can subscribe to events sent from your microservice and the event bus (for example, Amazon EventBridge) takes care of routing the event and handling authentication.

Error handling and retries

Another reason to migrate to event-driven architecture is to handle unpredictable traffic. Ecommerce websites like Amazon.com have variable amounts of traffic depending on the day. Once you place an order, several things happen.

First, Amazon checks your credit card to make sure that funds are available. Then, Amazon has to pack the merchandise and load onto trucks. That all happens in an Amazon fulfillment center. There is no synchronous API call for the Amazon backend to package and ship products. After the system confirms your payment, the front end puts together some information describing the event and puts your account number, credit card info, and what you bought in a packaged event and put it into the cloud and onto a queue. Later, another piece of software removes the event from the queue and starts the packaging and shipping.

The key point about this process is that these processes can all run at different rates. Normally, the rate at which customers place orders and the rate at which the warehouses can get the boxes packed are roughly equivalent. However, on busy days like Prime Day, customers place orders much more quickly than the warehouses can operate.

Ecommerce applications, like Amazon.com, must be able to scale up to handle unpredictable traffic. When a customer places an order, an event bus like Amazon EventBridge receives the event and all of the downstream microservices are able to select the order event for processing. Because each of the microservices can fail independently, there are no single points of failure.

Loose coordination between development teams

Event-driven architectures promote development team independence due to loose coupling between publishers and subscribers. Applications can subscribe to events with routing requirements and business logic that are separate from the publisher and other subscribers. This allows publishers and subscribers to change independently of each other, providing more flexibility to the overall architecture.

Decoupled applications also allow you to build new features faster. Adding new features or extending existing ones can be simpler with event-driven architectures because you either add new events, or modify existing ones. This process removes complexity in your application.

Conclusion

In this post, you learn about the challenges of developing applications with request-response architecture. In request-response architecture, the client must send a request and wait for a response before moving on to its next task. As applications grow in complexity, this tightly coupled architecture can cause issues. Event-driven architectures can increase scalability, fault tolerance, and developer velocity by decoupling components of your application.

For more serverless content, go to serverlessland.com.

Orchestrating Amazon S3 Glacier Deep Archive object retrieval using AWS Step Functions

2022-05-06 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/orchestrating-amazon-s3-glacier-deep-archive-object-retrieval-using-aws-step-functions/

This blog was written by Monica Cortes Sack, Solutions Architect, Oskar Neumann, Partner Solutions Architect, and Dhiraj Mahapatro, Principal Specialist SA, Serverless.

AWS Step Functions now support over 220 services and over 10,000 AWS API actions. This enables you to use the AWS SDK integration directly instead of writing an AWS Lambda function as a proxy.

One such service integration is with Amazon S3. Currently, you write scripts using AWS CLI S3 commands to achieve automation around running S3 tasks. For example, S3 integrates with AWS Transfer Family, builds a custom security check, takes action on S3 buckets on S3 object creations, or orchestrates a workflow around S3 Glacier Deep Archive object retrieval. These script executions do not provide an execution history or an easy way to validate the behavior.

Step Functions’ AWS SDK integration with S3 declaratively creates serverless workflows around S3 tasks without relying on those scripts. You can validate the execution history and behavior of a Step Functions workflow.

This blog highlights one of the S3 use cases. It shows how to orchestrate workflows around S3 Glacier Deep Archive object retrieval, cost estimation, and interaction with the requester using Step Functions. The demo application provides additional details on the entire architecture.

S3 Glacier Deep Archive is a storage class in S3 used for data that is rarely accessed. The service provides durable and secure long-term storage, trading immediate access for cost-effectiveness. You must restore archived objects before they are downloadable. It supports two options for object retrieval:

Standard – Access objects within 12 hours of the start of the restoration process.
Bulk – Access objects within 48 hours of the start of the restoration process.

Business use case

Consider a research institute that stores backups on S3 Glacier Deep Archive. The backups are maintained in S3 Glacier Deep Archive for redundancy. The institute has multiple researchers with one central IT team. When a researcher requests an object from S3 Glacier Deep Archive, the central IT team retrieves it and charges the corresponding research group for retrieval and data transfer costs.

Researchers are the end users and do not operate in the AWS Cloud. They run computing clusters on-premises and depend on the central IT team to provide them with the restored archive. A member of the research team requesting an object retrieval provides the following information to the central IT team:

Object key to be restored.
The number of days the researcher needs the object accessible for download.
Researcher’s email address.
Retrieve within 12 or 48 hours SLA. This determines whether “Standard” or “Bulk” retrieval respectively.

The following overall architecture explains the setup on AWS and the interaction between a researcher and the central IT team’s architecture.

Architecture overview

Architecture diagram

The researcher uses a front-end application to request object retrieval from S3 Glacier Deep Archive.
Amazon API Gateway synchronously invokes AWS Step Functions Express Workflow.
Step Functions initiates RestoreObject from S3 Glacier Deep Archive.
Step Functions stores the metadata of this retrieval in an Amazon DynamoDB table.
Step Functions uses Amazon SES to email the researcher about archive retrieval initiation.
Upon completion, S3 sends the RestoreComplete event to Amazon EventBridge.
EventBridge rule triggers another Step Functions for post-processing after the restore is complete.
A Lambda function inside the Step Functions calculates the estimated cost (retrieval and data transfer out) and updates existing metadata in the DynamoDB table.
Sync data from DynamoDB table using Amazon Athena Federated Queries to generate reports dashboard in Amazon QuickSight.
Step Functions uses SES to email the researcher with cost details.
Once the researcher receives an email, the researcher uses the front-end application to call the /download API endpoint.
API Gateway invokes a Lambda function that generates a pre-signed S3 URL of the retrieved object and returns it in the response.

Setup

To run the sample application, you must install CDK v2, Node.js, and npm.

To clone the repository, run:

git clone https://github.com/aws-samples/aws-stepfunctions-examples.git
cd cdk/app-glacier-deep-archive-retrieval

To deploy the application, run:
```
cdk deploy --all
```

Identifying workflow components

Starting the restore object workflow

The first component is accepting the researcher’s request to start the archive retrieval process. The sample application created from the demo provides a basic front-end app that shows the files from an S3 bucket that has objects stored in S3 Glacier Deep Archive. The researcher retrieves file requests from the front-end application reached by the sample application’s Amazon CloudFront URL.

Glacier Deep Archive Retrieval menu

The front-end app asks the researcher for an email address, the number of days the researcher wants the object to be available for download, and their ETA on retrieval speed. Based on the retrieval speed, the researcher accepts either Standard or Bulk object retrieval. To test this, put objects in the data bucket under the S3 Glacier Deep Archive storage class and use the front-end application to retrieve them.

Item retrieval prompt

The researcher then chooses the Retrieve file. This action invokes an API endpoint provided by API Gateway. The API Gateway endpoint synchronously invokes a Step Functions Express Workflow. This validates the restore object request, gets the object metadata, and starts to restore the object from S3 Glacier Deep Archive.

The state machine stores the metadata of the restore object AWS SDK call in a DynamoDB table for later use. You can use this metadata to build a dashboard in Amazon QuickSight for reporting and administration purposes. Finally, the state machine uses Amazon SES to email the researcher, notifying them about the restore object initiation process:

Restore object initiated

The following state machine shows the workflow:

Workflow diagram

The ability to use S3 APIs declaratively using AWS SDK from Step Functions makes it convenient to integrate with S3. This approach avoids writing a Lambda function to wrap the SDK calls. The following portion of the state machine definition shows the usage of S3 HeadObject and RestoreObject APIs:

"Get Object Metadata": {
    "Next": "Initiate Restore Object from Deep Archive",
    "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "Bad Request"
    }],
    "Type": "Task",
    "ResultPath": "$.result.metadata",
    "Resource": "arn:aws:states:::aws-sdk:s3:headObject",
    "Parameters": {
        "Bucket": "glacierretrievalapp-databucket-abc123",
        "Key.$": "$.fileKey"
    }
}, 
"Initiate Restore Object from Deep Archive": {
    "Next": "Update restore operation metadata",
    "Type": "Task",
    "ResultPath": null,
    "Resource": "arn:aws:states:::aws-sdk:s3:restoreObject",
    "Parameters": {
        "Bucket": "glacierretrievalapp-databucket-abc123",
        "Key.$": "$.fileKey",
        "RestoreRequest": {
            "Days.$": "$.requestedForDays"
        }
    }
}

You can extend the previous workflow and build your own Step Functions workflows to orchestrate other S3 related workflows.

Processing after object restoration completion

S3 RestoreObject is a long-running process for S3 Glacier Deep Archive objects. S3 emits a RestoreCompleted event notification on the object restore completion to EventBridge. You set up an EventBridge rule to trigger another Step Functions workflow as a target for this event. This workflow takes care of the object restoration post-processing.

cfnDataBucket.addPropertyOverride('NotificationConfiguration.EventBridgeConfiguration.EventBridgeEnabled', true);

An EventBridge rule triggers the following Step Functions workflow and passes the event payload as an input to the Step Functions execution:

new aws_events.Rule(this, 'invoke-post-processing-rule', {
  eventPattern: {
    source: ["aws.s3"],
    detailType: [
      "Object Restore Completed"
    ],
    detail: {
      bucket: {
        name: [props.dataBucket.bucketName]
      }
    }
  },
  targets: [new aws_events_targets.SfnStateMachine(this.stateMachine, {
    input: aws_events.RuleTargetInput.fromObject({
      's3Event': aws_events.EventField.fromPath('$')
    })
  })]
});

The Step Functions workflow gets object metadata from the DynamoDB table and then invokes a Lambda function to calculate the estimated cost. The Lambda function calculates the estimated retrieval and the data transfer costs using the contentLength of the retrieved object and the Price List API for the unit cost. The workflow then updates the calculated cost in the DynamoDB table.

The retrieval cost and the data transfer out cost are proportional to the size of the retrieved object. The Step Functions workflow also invokes a Lambda function to create the download API URL for object retrieval. Finally, it emails the researcher with the estimated cost and the download URL as a restoration completion notification.

Workflow studio diagram

The email notification to the researcher looks like:

Email example

Downloading the restored object

Once the object restoration is complete, the researcher can download the object from the front-end application.

Front end retrieval menu

The researcher chooses the Download action, which invokes another API Gateway endpoint. The endpoint integrates with a Lambda function as a backend that creates a pre-signed S3 URL sent as a response to the browser.

Administering object restoration usage

This architecture also provides a view for the central IT team to understand object restoration usage. You achieve this by creating reports and dashboards from the metadata stored in DynamoDB.

The sample application uses Amazon Athena Federated Queries and Amazon Athena DynamoDB Connector to generate a reports dashboard in Amazon QuickSight. You can also use Step Functions AWS SDK integration with Amazon Athena and visualize the workflows in the Athena console.

The following QuickSight visualization shows the count of restored S3 Glacier Deep Archive objects by their contentType:

QuickSight visualization

Considerations

With the preceding approach, you should consider that:

You must start the object retrieval in the same Region as the Region of the archived object.
S3 Glacier Deep Archive only supports standard and bulk retrievals.
You must enable the “Object Restore Completed” event notification on the S3 bucket with the S3 Glacier Deep Archive object.
The researcher’s email must be verified in SES.
Use a Lambda function for the Price List GetProducts API as the service endpoints are available in specific Regions.

Cleanup

To clean up the infrastructure used in this sample application, run:

cdk destroy --all

Conclusion

Step Functions’ AWS SDK integration opens up different opportunities to orchestrate a workflow. Step Functions provides native support for retries and error handling, which offloads the heavy lifting of handling them manually in scripts.

This blog shows one example use case with S3 Glacier Deep Archive. With AWS SDK integration in Step Functions, you can build any workflow orchestration using S3 or S3 control APIs. For example, a workflow to enforce AWS Key Management Service encryption based on an S3 event, or create static website hosting on-demand in a few steps.

With different S3 API calls available in Step Functions’ Workflow Studio, you can declaratively build a Step Functions workflow instead of imperatively calling each S3 API from a shell script or command line. Refer to the demo application for more details.

For more serverless learning resources, visit Serverless Land.