Tag Archives: Amazon Simple Notification Service (SNS)

Building a Serverless Streaming Pipeline to Deliver Reliable Messaging

2024-03-06 Chris McPeek

Post Syndicated from Chris McPeek original https://aws.amazon.com/blogs/compute/building-a-serverless-streaming-pipeline-to-deliver-reliable-messaging/

This post is written by Jeff Harman, Senior Prototyping Architect, Vaibhav Shah, Senior Solutions Architect and Erik Olsen, Senior Technical Account Manager.

Many industries are required to provide audit trails for decision and transactional systems. AI assisted decision making requires monitoring the full inputs to the decision system in near real time to prevent fraud, detect model drift, and discrimination. Modern systems often use a much wider array of inputs for decision making, including images, unstructured text, historical values, and other large data elements. These large data elements pose a challenge to traditional audit systems that deal with relatively small text messages in structured formats. This blog shows the use of serverless technology to create a reliable, performant, traceable, and durable streaming pipeline for audit processing.

Overview

Consider the following four requirements to develop an architecture for audit record ingestion:

Audit record size: Store and manage large payloads (256k – 6 MB in size) that may be heterogeneous, including text, binary data, and references to other storage systems.
Audit traceability: The data stored has full traceability of the payload and external processes to monitor the process via subscription-based events.
High Performance: The time required for blocking writes to the system is limited to the time it takes to transmit the audit record over the network.
High data durability: Once the system sends a payload receipt, the payload is at very low risk of loss because of system failures.

The following diagram shows an architecture that meets these requirements and models the flow of the audit record through the system.

The primary source of latency is the time it takes for an audit record to be transmitted across the network. Applications sending audit records make an API call to an Amazon API Gateway endpoint. An AWS Lambda function receives the message and an Amazon ElastiCache for Redis cluster provides a low latency initial storage mechanism for the audit record. Once the data is stored in ElastiCache, the AWS Step Functions workflow then orchestrates the communication and persistence functions.

Subscribers receive four Amazon Simple Notification Service (Amazon SNS) notifications pertaining to arrival and storage of the audit record payload, storage of the audit record metadata, and audit record archive completion. Users can subscribe an Amazon Simple Queue Service (SQS) queue to the SNS topic and use fan out mechanisms to achieve high reliability.

The Ingest Message Lambda function sends an initial receipt notification
The Message Archive Handler Lambda function notifies on storage of the audit record from ElastiCache to Amazon Simple Storage Service (Amazon S3)
The Message Metadata Handler Lambda function notifies on storage of the message metadata into Amazon DynamoDB
The Final State Aggregation Lambda function notifies that the audit record has been archived.

Any failure by the three fundamental processing steps: Ingestion, Data Archive, and Metadata Archive triggers a message in an SQS Dead Letter Queue (DLQ) which contains the original request and an explanation of the failure reason. Any failure in the Ingest Message function invokes the Ingest Message Failure function, which stores the original parameters to the S3 Failed Message Storage bucket for later analysis.

The Step Functions workflow provides orchestration and parallel path execution for the system. The detailed workflow below shows the execution flow and notification actions. The transformer steps convert the internal data structures into the format required for consumers.

Data structures

There are types three events and messages managed by this system:

Incoming message: This is the message the producer sends to an API Gateway endpoint.
Internal message: This event contains the message metadata allowing subsequent systems to understand the originating message producer context.
Notification message: Messages that allow downstream subscribers to act based on the message.

Solution walkthrough

The message producer calls the API Gateway endpoint, which enforces the security requirements defined by the business. In this implementation, API Gateway uses an API key for providing more robust security. API Gateway also creates a security header for consumption by the Ingest Message Lambda function. API Gateway can be configured to enforce message format standards, see Use request validation in API Gateway for more information.

The Ingest Message Lambda function generates a message ID that tracks the message payload throughout its lifecycle. Then it stores the full message in the ElastiCache for Redis cache. The Ingest Message Lambda function generates an internal message with all the elements necessary as described above. Finally, the Lambda function handler code starts the Step Functions workflow with the internal message payload.

If the Ingest Message Lambda function fails for any reason, the Lambda function invokes the Ingestion Failure Handler Lambda function. This Lambda function writes any recoverable incoming message data to an S3 bucket and sends a notification on the Ingest Message dead letter queue.

The Step Functions workflow then runs three processes in parallel.

The Step Functions workflow triggers the Message Archive Data Handler Lambda function to persist message data from the ElastiCache cache to an S3 bucket. Once stored, the Lambda function returns the S3 bucket reference and state information. There are two options to remove the internal message from the cache. Remove the message from cache immediately before sending the internal message and updating the ElastiCache cache flag or wait for the ElastiCache lifecycle to remove a stale message from cache. This solution waits for the ElastiCache lifecycle to remove the message.
The workflow triggers the Message Metadata Handler Lambda function to write all message metadata and security information to DynamoDB. The Lambda function replies with the DynamoDB reference information.
Finally, the Step Functions workflow sends a message to the SNS topic to inform subscribers that the message has arrived and the data persistence processes have started.

After each of the Lambda functions’ processes complete, the Lambda function sends a notification to the SNS notification topic to alert subscribers that each action is complete. When both Message Metadata and Message Archive Lambda functions are done, the Final Aggregation function makes a final update to the metadata in DynamoDB to include S3 reference information and to remove the ElastiCache Redis reference.

Deploying the solution

Prerequisites:

AWS Serverless Application Model (AWS SAM) is installed (see Getting started with AWS SAM)
AWS User/Credentials with appropriate permissions to run AWS CloudFormation templates in the target AWS account
Python 3.8 – 3.10
The AWS SDK for Python (Boto3) is installed
The requests python library is installed

The source code for this implementation can be found at https://github.com/aws-samples/blog-serverless-reliable-messaging

Installing the Solution:

Clone the git repository to a local directory
git clone https://github.com/aws-samples/blog-serverless-reliable-messaging.git
Change into the directory that was created by the clone operation, usually blog_serverless_reliable_messaging
Execute the command: sam build
Execute the command: sam deploy –-guided. You are asked to supply the following parameters:
1. Stack Name: Name given to this deployment (example: serverless-streaming)
2. AWS Region: Where to deploy (example: us-east-1)
3. ElasticacheInstanceClass: EC2 cache instance type to use with (example: cache.t3.small)
4. ElasticReplicaCount: How many replicas should be used with ElastiCache (recommended minimum: 2)
5. ProjectName: Used for naming resources in account (example: serverless-streaming)
6. MultiAZ: True/False if multiple Availability Zones should be used (recommend: True)
7. The default parameters can be selected for the remainder of questions

Testing:

Once you have deployed the stack, you can test it through the API gateway endpoint with the API key that is referenced in the deployment output. There are two methods for retrieving the API key either via the AWS console (from the link provided in the output – ApiKeyConsole) or via the AWS CLI (from the AWS CLI reference in the output – APIKeyCLI).

You can test directly in the Lambda service console by invoking the ingest message function.

A test message is available at the root of the project test_message.json for direct Lambda function testing of the Ingest function.

In the console navigate to the Lambda service
From the list of available functions, select the “<project name> -IngestMessageFunction-xxxxx” function
Under the “Function overview” select the “Test” tab
Enter an event name of your choosing
Copy and paste the contents of test_message.json into the “Event JSON” box
Click “Save” then after it has saved, click the “Test”

If successful, you should see something similar to the below in the details:

{
"isBase64Encoded": false,
"statusCode": 200,
"headers": {
"Access-Control-Allow-Headers": "Content-Type",
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Methods": "OPTIONS,POST"
},
"body": "{\"messageID\": \"XXXXXXXXXXXXXX\"}"
}

In the S3 bucket “<project name>-s3messagearchive-xxxxxx“, find the payload of the original json with a key based on the date and time of the script execution, e.g.: YEAR/MONTH/DAY/HOUR/MINUTE with a file name of the messageID
In a DynamoDB table named metaDataTable, you should find a record with a messageID equal to the messageID from above that contains all of the metadata related to the payload

A python script is included with the code in the test_client folder

Replace the <Your API key key here> and the <Your API Gateway URL here (IngestMessageApi)> values with the correct ones for your environment in the test_client.py file
Execute the test script with Python 3.8 or higher with the requests package installed
Example execution (from main directory of git clone):
python3 -m pip install -r ./test_client/requirements.txt
python3 ./test_client/test_client.py
Successful output shows the messageID and the header JSON payload:
```
{
"messageID": " XXXXXXXXXXXXXX"
}
```
In the S3 bucket “<project name>-s3messagearchive-xxxxxx“, you should be able to find the payload of the original json with a key based on the date and time of the script execution, e.g.: YEAR/MONTH/DAY/HOUR/MINUTE with a file name of the messageID
In a DynamoDB table named metaDataTable, you should find a record with a messageID equal to the messageID from above that contains all of the meta data related to the payload

Conclusion

This blog describes architectural patterns, messaging patterns, and data structures that support a highly reliable messaging system for large messages. The use of serverless services including Lambda functions, Step Functions, ElastiCache, DynamoDB, and S3 meet the requirements of modern audit systems to be scalable and reliable. The architecture shared in this blog post is suitable for a highly regulated environment to store and track messages that are larger than typical logging systems, records sized between 256k and 6MB. The architecture serves as a blueprint that can be extended and adapted to fit further serverless use cases.

For serverless learning resources, visit Serverless Land.

Serverless ICYMI Q4 2023

2024-01-09 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/serverless-icymi-q4-2023/

Welcome to the 24th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

2023 Q4 Calendar

ServerlessVideo

ServerlessVideo at re:Invent 2024

ServerlessVideo is a demo application built by the AWS Serverless Developer Advocacy team to stream live videos and also perform advanced post-video processing. It uses several AWS services including AWS Step Functions, Amazon EventBridge, AWS Lambda, Amazon ECS, and Amazon Bedrock in a serverless architecture that makes it fast, flexible, and cost-effective. Key features include an event-driven core with loosely coupled microservices that respond to events routed by EventBridge. Step Functions orchestrates using both Lambda and ECS for video processing to balance speed, scale, and cost. There is a flexible plugin-based architecture using Step Functions and EventBridge to integrate and manage multiple video processing workflows, which include GenAI.

ServerlessVideo allows broadcasters to stream video to thousands of viewers using Amazon IVS. When a broadcast ends, a Step Functions workflow triggers a set of configured plugins to process the video, generating transcriptions, validating content, and more. The application incorporates various microservices to support live streaming, on-demand playback, transcoding, transcription, and events. Learn more about the project and watch videos from reinvent 2023 at video.serverlessland.com.

AWS Lambda

AWS Lambda enabled outbound IPv6 connections from VPC-connected Lambda functions, providing virtually unlimited scale by removing IPv4 address constraints.

The AWS Lambda and AWS SAM teams also added support for sharing test events across teams using AWS SAM CLI to improve collaboration when testing locally.

AWS Lambda introduced integration with AWS Application Composer, allowing users to view and export Lambda function configuration details for infrastructure as code (IaC) workflows.

AWS added advanced logging controls enabling adjustable JSON-formatted logs, custom log levels, and configurable CloudWatch log destinations for easier debugging. AWS enabled monitoring of errors and timeouts occurring during initialization and restore phases in CloudWatch Logs as well, making troubleshooting easier.

For Kafka event sources, AWS enabled failed event destinations to prevent functions stalling on failing batches by rerouting events to SQS, SNS, or S3. AWS also enhanced Lambda auto scaling for Kafka event sources in November to reach maximum throughput faster, reducing latency for workloads prone to large bursts of messages.

AWS launched support for Python 3.12 and Java 21 Lambda runtimes, providing updated libraries, smaller deployment sizes, and better AWS service integration. AWS also introduced a simplified console workflow to automate complex network configuration when connecting functions to Amazon RDS and RDS Proxy.

Additionally in December, AWS enabled faster individual Lambda function scaling allowing each function to rapidly absorb traffic spikes by scaling up to 1000 concurrent executions every 10 seconds.

Amazon ECS and AWS Fargate

In Q4 of 2023, AWS introduced several new capabilities across its serverless container services including Amazon ECS, AWS Fargate, AWS App Runner, and more. These features help improve application resilience, security, developer experience, and migration to modern containerized architectures.

In October, Amazon ECS enhanced its task scheduling to start healthy replacement tasks before terminating unhealthy ones during traffic spikes. This prevents going under capacity due to premature shutdowns. Additionally, App Runner launched support for IPv6 traffic via dual-stack endpoints to remove the need for address translation.

In November, AWS Fargate enabled ECS tasks to selectively use SOCI lazy loading for only large container images in a task instead of requiring it for all images. Amazon ECS also added idempotency support for task launches to prevent duplicate instances on retries. Amazon GuardDuty expanded threat detection to Amazon ECS and Fargate workloads which users can easily enable.

Also in November, the open source Finch container tool for macOS became generally available. Finch allows developers to build, run, and publish Linux containers locally. A new website provides tutorials and resources to help developers get started.

Finally in December, AWS Migration Hub Orchestrator added new capabilities for replatforming applications to Amazon ECS using guided workflows. App Runner also improved integration with Route 53 domains to automatically configure required records when associating custom domains.

AWS Step Functions

In Q4 2023, AWS Step Functions announced the redrive capability for Standard Workflows. This feature allows failed workflow executions to be redriven from the point of failure, skipping unnecessary steps and reducing costs. The redrive functionality provides an efficient way to handle errors that require longer investigation or external actions before resuming the workflow.

Step Functions also launched support for HTTPS endpoints in AWS Step Functions, enabling easier integration with external APIs and SaaS applications without needing custom code. Developers can now connect to third-party HTTP services directly within workflows. Additionally, AWS released a new test state capability that allows testing individual workflow states before full deployment. This feature helps accelerate development by making it faster and simpler to validate data mappings and permissions configurations.

AWS announced optimized integrations between AWS Step Functions and Amazon Bedrock for orchestrating generative AI workloads. Two new API actions were added specifically for invoking Bedrock models and training jobs from workflows. These integrations simplify building prompt chaining and other techniques to create complex AI applications with foundation models.

Finally, the Step Functions Workflow Studio is now integrated in the AWS Application Composer. This unified builder allows developers to design workflows and define application resources across the full project lifecycle within a single interface.

Amazon EventBridge

Amazon EventBridge announced support for new partner integrations with Adobe and Stripe. These integrations enable routing events from the Adobe and Stripe platforms to over 20 AWS services. This makes it easier to build event-driven architectures to handle common use cases.

Amazon SNS

In Q4, Amazon SNS added native in-place message archiving for FIFO topics to improve event stream durability by allowing retention policies and selective replay of messages without provisioning separate resources. Additional message filtering operators were also introduced including suffix matching, case-insensitive equality checks, and OR logic for matching across properties to simplify routing logic implementation for publishers and subscribers. Finally, delivery status logging was enabled through AWS CloudFormation.

Amazon SQS

Amazon SQS has introduced several major new capabilities and updates. These improve visibility, throughput, and message handling for users. Specifically, Amazon SQS enabled AWS CloudTrail logging of key SQS APIs. This gives customers greater visibility into SQS activity. Additionally, SQS increased the throughput quota for the high throughput mode of FIFO queues. This was significantly increased in certain Regions. It also boosted throughput in Asia Pacific Regions. Furthermore, Amazon SQS added dead letter queue redrive support. This allows you to redrive messages that failed and were sent to a dead letter queue (DLQ).

Serverless at AWS re:Invent

Serverless videos from re:Invent

Visit the Serverless Land YouTube channel to find a list of serverless and serverless container sessions from reinvent 2023. Hear from experts like Chris Munns and Julian Wood in their popular session, Best practices for serverless developers, or Nathan Peck and Jessica Deen in Deploying multi-tenant SaaS applications on Amazon ECS and AWS Fargate.

EDA Day Nashville

The AWS Serverless Developer Advocacy team hosted an event-driven architecture (EDA) day conference on October 26, 2022 in Nashville, Tennessee. This inaugural GOTO EDA day convened over 200 attendees ranging from prominent EDA community members to AWS speakers and product managers. Attendees engaged in 13 sessions, two workshops, and panels covering EDA adoption best practices. The event built upon 2022 content by incorporating additional topics like messaging, containers, and machine learning. It also created opportunities for students and underrepresented groups in tech to participate. The full-day conference facilitated education, inspiration, and thoughtful discussion around event-driven architectural patterns and services on AWS.

Videos from EDA Day are now available on the Serverless Land YouTube channel.

Serverless blog posts

October

November

December

Serverless container blog posts

October

November

December

Serverless Office Hours

October

Oct 3 – Governance in depth for serverless apps
Oct 10 – Serverless observability
Oct 17 – Super serverless tools with Lars Jacobsson
Oct 24 – Building GenAI apps
Oct 31 – Visually build AWS applications

November

December

Dec 5 – Step Functions: what’s new
Dec 19 – 2023 Year in review

Containers from the Couch

FooBar

October

November

December

Dec 7 – Build generative AI apps using AWS Step Functions and Amazon Bedrock
Dec 14 – Test State API for Step Functions
Dec 21 – Invoke external endpoints from AWS Step Functions

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

James Beswick: @jbesw
Eric Johnson: @edjgeek
Ben Smith: @benjamin_l_s
Julian Wood: @julian_wood
Marcia Villalba: @mavi888uy
David Boyne: @boyney123

Jeramiah Dooley @jdooley_clt
Jessica Deen @jldeen
Kyle Davis @linux_mclinuxface
Maish Saidel-Keesing @maishsk
Nathan Peck @nathanpeck
Olly Pomeroy @oliver-p
Scott Coulton @scottcoulton

And finally, visit the Serverless Land and Containers on AWS websites for all your serverless and serverless container needs.

AWS Weekly Roundup — AWS Lambda, AWS Amplify, Amazon OpenSearch Service, Amazon Rekognition, and more — December 18, 2023

2023-12-18 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-lambda-aws-amplify-amazon-opensearch-service-amazon-rekognition-and-more-december-18-2023/

My memories of Amazon Web Services (AWS) re:Invent 2023 are still fresh even when I’m currently wrapping up my activities in Jakarta after participating in AWS Community Day Indonesia. It was a great experience, from delivering chalk talks and having thoughtful discussions with AWS service teams, to meeting with AWS Heroes, AWS Community Builders, and AWS User Group leaders. AWS re:Invent brings the global AWS community together to learn, connect, and be inspired by innovation. For me, that spirit of connection is what makes AWS re:Invent always special.

Here’s a quick look of my highlights at AWS re:Invent and AWS Community Day Indonesia:

If you missed AWS re:Invent, you can watch the keynotes and sessions on demand. Also, check out the AWS News Editorial Team’s Top announcements of AWS re:Invent 2023 for all the major launches.

Recent AWS launches
Here are some of the launches that caught my attention in the past two weeks:

Query MySQL and PostgreSQL with AWS Amplify – In this post, Channy wrote how you can now connect your MySQL and PostgreSQL databases to AWS Amplify with just a few clicks. It generates a GraphQL API to query your database tables using AWS CDK.

Migration Assistant for Amazon OpenSearch Service – With this self-service solution, you can smoothly migrate from your self-managed clusters to Amazon OpenSearch Service managed clusters or serverless collections.

AWS Lambda simplifies connectivity to Amazon RDS and RDS Proxy – Now you can connect your AWS Lambda to Amazon RDS or RDS proxy using the AWS Lambda console. With a guided workflow, this improvement helps to minimize complexities and efforts to quickly launch a database instance and correctly connect a Lambda function.

New no-code dashboard application to visualize IoT data – With this announcement, you can now visualize and interact with operational data from AWS IoT SiteWise using a new open source Internet of Things (IoT) dashboard.

Amazon Rekognition improves Face Liveness accuracy and user experience – This launch provides higher accuracy in detecting spoofed faces for your face-based authentication applications.

AWS Lambda supports additional concurrency metrics for improved quota monitoring – Add CloudWatch metrics for your Lambda quotas, to improve visibility into concurrency limits.

AWS Malaysia now supports 3D-Secure authentication – This launch enables 3DS2 transaction authentication required by banks and payment networks, facilitating your secure online payments.

Announcing AWS CloudFormation template generation for Amazon EventBridge Pipes – With this announcement, you can now streamline the deployment of your EventBridge resources with CloudFormation templates, accelerating event-driven architecture (EDA) development.

Enhanced data protection for CloudWatch Logs – With the enhanced data protection, CloudWatch Logs helps identify and redact sensitive data in your logs, preventing accidental exposure of personal data.

Send SMS via Amazon SNS in Asia Pacific – With this announcement, now you can use SMS messaging across Asia Pacific from the Jakarta Region.

Lambda adds support for Python 3.12 – This launch brings the latest Python version to your Lambda functions.

CloudWatch Synthetics upgrades Node.js runtime – Now you can use Node.js 16.1 runtimes for your canary functions.

Manage EBS Volumes for your EC2 fleets – This launch simplifies attaching and managing EBS volumes across your EC2 fleets.

See you next year!
This is the last AWS Weekly Roundup for this year, and we’d like to thank you for being our wonderful readers. We’ll be back to share more launches for you on January 8, 2024.

Happy holidays!

— Donnie

Use Snowflake with Amazon MWAA to orchestrate data pipelines

2023-10-31 Payal Singh

Post Syndicated from Payal Singh original https://aws.amazon.com/blogs/big-data/use-snowflake-with-amazon-mwaa-to-orchestrate-data-pipelines/

This blog post is co-written with James Sun from Snowflake.

Customers rely on data from different sources such as mobile applications, clickstream events from websites, historical data, and more to deduce meaningful patterns to optimize their products, services, and processes. With a data pipeline, which is a set of tasks used to automate the movement and transformation of data between different systems, you can reduce the time and effort needed to gain insights from the data. Apache Airflow and Snowflake have emerged as powerful technologies for data management and analysis.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed workflow orchestration service for Apache Airflow that you can use to set up and operate end-to-end data pipelines in the cloud at scale. The Snowflake Data Cloud provides a single source of truth for all your data needs and allows your organizations to store, analyze, and share large amounts of data. The Apache Airflow open-source community provides over 1,000 pre-built operators (plugins that simplify connections to services) for Apache Airflow to build data pipelines.

In this post, we provide an overview of orchestrating your data pipeline using Snowflake operators in your Amazon MWAA environment. We define the steps needed to set up the integration between Amazon MWAA and Snowflake. The solution provides an end-to-end automated workflow that includes data ingestion, transformation, analytics, and consumption.

Overview of solution

The following diagram illustrates our solution architecture.

The data used for transformation and analysis is based on the publicly available New York Citi Bike dataset. The data (zipped files), which includes rider demographics and trip data, is copied from the public Citi Bike Amazon Simple Storage Service (Amazon S3) bucket in your AWS account. Data is decompressed and stored in a different S3 bucket (transformed data can be stored in the same S3 bucket where data was ingested, but for simplicity, we’re using two separate S3 buckets). The transformed data is then made accessible to Snowflake for data analysis. The output of the queried data is published to Amazon Simple Notification Service (Amazon SNS) for consumption.

Amazon MWAA uses a directed acyclic graph (DAG) to run the workflows. In this post, we run three DAGs:

DAG1 sets up the Amazon MWAA connection to authenticate to Snowflake. The Snowflake connection string is stored in AWS Secrets Manager, which is referenced in the DAG file.
DAG2 creates the required Snowflake objects (database, table, storage integration, and stage).
DAG3 runs the data pipeline.

The following diagram illustrates this workflow.

See the GitHub repo for the DAGs and other files related to the post.

Note that in this post, we’re using a DAG to create a Snowflake connection, but you can also create the Snowflake connection using the Airflow UI or CLI.

Prerequisites

To deploy the solution, you should have a basic understanding of Snowflake and Amazon MWAA with the following prerequisites:

An AWS account in an AWS Region where Amazon MWAA is supported.
A Snowflake account with admin credentials. If you don’t have an account, sign up for a 30-day free trial. Select the Snowflake enterprise edition for the AWS Cloud platform.
Access to Amazon MWAA, Secrets Manager, and Amazon SNS.
In this post, we’re using two S3 buckets, called airflow-blog-bucket-ACCOUNT_ID and citibike-tripdata-destination-ACCOUNT_ID. Amazon S3 supports global buckets, which means that each bucket name must be unique across all AWS accounts in all the Regions within a partition. If the S3 bucket name is already taken, choose a different S3 bucket name. Create the S3 buckets in your AWS account. We upload content to the S3 bucket later in the post. Replace ACCOUNT_ID with your own AWS account ID or any other unique identifier. The bucket details are as follows:
- airflow-blog-bucket-ACCOUNT_ID – The top-level bucket for Amazon MWAA-related files.
- airflow-blog-bucket-ACCOUNT_ID/requirements – The bucket used for storing the requirements.txt file needed to deploy Amazon MWAA.
- airflow-blog-bucket-ACCOUNT_ID/dags – The bucked used for storing the DAG files to run workflows in Amazon MWAA.
- airflow-blog-bucket-ACCOUNT_ID/dags/mwaa_snowflake_queries – The bucket used for storing the Snowflake SQL queries.
- citibike-tripdata-destination-ACCOUNT_ID – The bucket used for storing the transformed dataset.

When implementing the solution in this post, replace references to airflow-blog-bucket-ACCOUNT_ID and citibike-tripdata-destination-ACCOUNT_ID with the names of your own S3 buckets.

Set up the Amazon MWAA environment

First, you create an Amazon MWAA environment. Before deploying the environment, upload the requirements file to the airflow-blog-bucket-ACCOUNT_ID/requirements S3 bucket. The requirements file is based on Amazon MWAA version 2.6.3. If you’re testing on a different Amazon MWAA version, update the requirements file accordingly.

Complete the following steps to set up the environment:

On the Amazon MWAA console, choose Create environment.
Provide a name of your choice for the environment.
Choose Airflow version 2.6.3.
For the S3 bucket, enter the path of your bucket (s3:// airflow-blog-bucket-ACCOUNT_ID).
For the DAGs folder, enter the DAGs folder path (s3:// airflow-blog-bucket-ACCOUNT_ID/dags).
For the requirements file, enter the requirements file path (s3:// airflow-blog-bucket-ACCOUNT_ID/ requirements/requirements.txt).
Choose Next.
Under Networking, choose your existing VPC or choose Create MWAA VPC.
Under Web server access, choose Public network.
Under Security groups, leave Create new security group selected.
For the Environment class, Encryption, and Monitoring sections, leave all values as default.
In the Airflow configuration options section, choose Add custom configuration value and configure two values:
1. Set Configuration option to secrets.backend and Custom value to airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend.
2. Set Configuration option to secrets.backend_kwargs and Custom value to {"connections_prefix" : "airflow/connections", "variables_prefix" : "airflow/variables"}.
In the Permissions section, leave the default settings and choose Create a new role.
Choose Next.
When the Amazon MWAA environment us available, assign S3 bucket permissions to the AWS Identity and Access Management (IAM) execution role (created during the Amazon MWAA install).

This will direct you to the created execution role on the IAM console.

For testing purposes, you can choose Add permissions and add the managed AmazonS3FullAccess policy to the user instead of providing restricted access. For this post, we provide only the required access to the S3 buckets.

On the drop-down menu, choose Create inline policy.
For Select Service, choose S3.
Under Access level, specify the following:
1. Expand List level and select ListBucket.
2. Expand Read level and select GetObject.
3. Expand Write level and select PutObject.
Under Resources, choose Add ARN.
On the Text tab, provide the following ARNs for S3 bucket access:
1. arn:aws:s3:::airflow-blog-bucket-ACCOUNT_ID (use your own bucket).
2. arn:aws:s3:::citibike-tripdata-destination-ACCOUNT_ID (use your own bucket).
3. arn:aws:s3:::tripdata (this is the public S3 bucket where the Citi Bike dataset is stored; use the ARN as specified here).
Under Resources, choose Add ARN.
On the Text tab, provide the following ARNs for S3 bucket access:
1. arn:aws:s3:::airflow-blog-bucket-ACCOUNT_ID/* (make sure to include the asterisk).
2. arn:aws:s3:::citibike-tripdata-destination-ACCOUNT_ID /*.
3. arn:aws:s3:::tripdata/* (this is the public S3 bucket where the Citi Bike dataset is stored, use the ARN as specified here).
Choose Next.
For Policy name, enter S3ReadWrite.
Choose Create policy.
Lastly, provide Amazon MWAA with permission to access Secrets Manager secret keys.

This step provides the Amazon MWAA execution role for your Amazon MWAA environment read access to the secret key in Secrets Manager.

The execution role should have the policies MWAA-Execution-Policy*, S3ReadWrite, and SecretsManagerReadWrite attached to it.

When the Amazon MWAA environment is available, you can sign in to the Airflow UI from the Amazon MWAA console using link for Open Airflow UI.

Set up an SNS topic and subscription

Next, you create an SNS topic and add a subscription to the topic. Complete the following steps:

On the Amazon SNS console, choose Topics from the navigation pane.
Choose Create topic.
For Topic type, choose Standard.
For Name, enter mwaa_snowflake.
Leave the rest as default.
After you create the topic, navigate to the Subscriptions tab and choose Create subscription.
For Topic ARN, choose mwaa_snowflake.
Set the protocol to Email.
For Endpoint, enter your email ID (you will get a notification in your email to accept the subscription).

By default, only the topic owner can publish and subscribe to the topic, so you need to modify the Amazon MWAA execution role access policy to allow Amazon SNS access.

On the IAM console, navigate to the execution role you created earlier.
On the drop-down menu, choose Create inline policy.
For Select service, choose SNS.
Under Actions, expand Write access level and select Publish.
Under Resources, choose Add ARN.
On the Text tab, specify the ARN arn:aws:sns:<<region>>:<<our_account_ID>>:mwaa_snowflake.
Choose Next.
For Policy name, enter SNSPublishOnly.
Choose Create policy.

Configure a Secrets Manager secret

Next, we set up Secrets Manager, which is a supported alternative database for storing Snowflake connection information and credentials.

To create the connection string, the Snowflake host and account name is required. Log in to your Snowflake account, and under the Worksheets menu, choose the plus sign and select SQL worksheet. Using the worksheet, run the following SQL commands to find the host and account name.

Run the following query for the host name:

WITH HOSTLIST AS 
(SELECT * FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SYSTEM$allowlist()))))
SELECT REPLACE(VALUE:host,'"','') AS HOST
FROM HOSTLIST
WHERE VALUE:type = 'SNOWFLAKE_DEPLOYMENT_REGIONLESS';

Run the following query for the account name:

WITH HOSTLIST AS 
(SELECT * FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SYSTEM$allowlist()))))
SELECT REPLACE(VALUE:host,'.snowflakecomputing.com','') AS ACCOUNT
FROM HOSTLIST
WHERE VALUE:type = 'SNOWFLAKE_DEPLOYMENT';

Next, we configure the secret in Secrets Manager.

On the Secrets Manager console, choose Store a new secret.
For Secret type, choose Other type of secret.
Under Key/Value pairs, choose the Plaintext tab.
In the text field, enter the following code and modify the string to reflect your Snowflake account information:

{"host": "<<snowflake_host_name>>", "account":"<<snowflake_account_name>>","user":"<<snowflake_username>>","password":"<<snowflake_password>>","schema":"mwaa_schema","database":"mwaa_db","role":"accountadmin","warehouse":"dev_wh"}

For example:

{"host": "xxxxxx.snowflakecomputing.com", "account":"xxxxxx" ,"user":"xxxxx","password":"*****","schema":"mwaa_schema","database":"mwaa_db", "role":"accountadmin","warehouse":"dev_wh"}

The values for the database name, schema name, and role should be as mentioned earlier. The account, host, user, password, and warehouse can differ based on your setup.

Choose Next.

For Secret name, enter airflow/connections/snowflake_accountadmin.
Leave all other values as default and choose Next.
Choose Store.

Take note of the Region in which the secret was created under Secret ARN. We later define it as a variable in the Airflow UI.

Configure Snowflake access permissions and IAM role

Next, log in to your Snowflake account. Ensure the account you are using has account administrator access. Create a SQL worksheet. Under the worksheet, create a warehouse named dev_wh.

The following is an example SQL command:

CREATE OR REPLACE WAREHOUSE dev_wh 
 WITH WAREHOUSE_SIZE = 'xsmall' 
 AUTO_SUSPEND = 60 
 INITIALLY_SUSPENDED = true
 AUTO_RESUME = true
 MIN_CLUSTER_COUNT = 1
 MAX_CLUSTER_COUNT = 5;

For Snowflake to read data from and write data to an S3 bucket referenced in an external (S3 bucket) stage, a storage integration is required. Follow the steps defined in Option 1: Configuring a Snowflake Storage Integration to Access Amazon S3(only perform Steps 1 and 2, as described in this section).

Configure access permissions for the S3 bucket

While creating the IAM policy, a sample policy document code is needed (see the following code), which provides Snowflake with the required permissions to load or unload data using a single bucket and folder path. The bucket name used in this post is citibike-tripdata-destination-ACCOUNT_ID. You should modify it to reflect your bucket name.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3::: citibike-tripdata-destination-ACCOUNT_ID/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::citibike-tripdata-destination-ACCOUNT_ID"
    }
  ]
}

Create the IAM role

Next, you create the IAM role to grant privileges on the S3 bucket containing your data files. After creation, record the Role ARN value located on the role summary page.

Configure variables

Lastly, configure the variables that will be accessed by the DAGs in Airflow. Log in to the Airflow UI and on the Admin menu, choose Variables and the plus sign.

Add four variables with the following key/value pairs:

Key aws_role_arn with value <<snowflake_aws_role_arn>> (the ARN for role mysnowflakerole noted earlier)
Key destination_bucket with value <<bucket_name>> (for this post, the bucket used in citibike-tripdata-destination-ACCOUNT_ID)
Key target_sns_arn with value <<sns_Arn>> (the SNS topic in your account)
Key sec_key_region with value <<region_of_secret_deployment>> (the Region where the secret in Secrets Manager was created)

The following screenshot illustrates where to find the SNS topic ARN.

The Airflow UI will now have the variables defined, which will be referred to by the DAGs.

Congratulations, you have completed all the configuration steps.

Run the DAG

Let’s look at how to run the DAGs. To recap:

DAG1 (create_snowflake_connection_blog.py) – Creates the Snowflake connection in Apache Airflow. This connection will be used to authenticate with Snowflake. The Snowflake connection string is stored in Secrets Manager, which is referenced in the DAG file.
DAG2 (create-snowflake_initial-setup_blog.py) – Creates the database, schema, storage integration, and stage in Snowflake.
DAG3 (run_mwaa_datapipeline_blog.py) – Runs the data pipeline, which will unzip files from the source public S3 bucket and copy them to the destination S3 bucket. The next task will create a table in Snowflake to store the data. Then the data from the destination S3 bucket will be copied into the table using a Snowflake stage. After the data is successfully copied, a view will be created in Snowflake, on top of which the SQL queries will be run.

To run the DAGs, complete the following steps:

Upload the DAGs to the S3 bucket airflow-blog-bucket-ACCOUNT_ID/dags.
Upload the SQL query files to the S3 bucket airflow-blog-bucket-ACCOUNT_ID/dags/mwaa_snowflake_queries.
Log in to the Apache Airflow UI.
Locate DAG1 (create_snowflake_connection_blog), un-pause it, and choose the play icon to run it.

You can view the run state of the DAG using the Grid or Graph view in the Airflow UI.

After DAG1 runs, the Snowflake connection snowflake_conn_accountadmin is created on the Admin, Connections menu.

Locate and run DAG2 (create-snowflake_initial-setup_blog).

After DAG2 runs, the following objects are created in Snowflake:

The database mwaa_db
The schema mwaa_schema
The storage integration mwaa_citibike_storage_int
The stage mwaa_citibike_stg

Before running the final DAG, the trust relationship for the IAM user needs to be updated.

Log in to your Snowflake account using your admin account credentials.
Open your SQL worksheet created earlier and run the following command:

DESC INTEGRATION mwaa_citibike_storage_int;

mwaa_citibike_storage_int is the name of the integration created by the DAG2 in the previous step.

From the output, record the property value of the following two properties:

STORAGE_AWS_IAM_USER_ARN – The IAM user created for your Snowflake account.
STORAGE_AWS_EXTERNAL_ID – The external ID that is needed to establish a trust relationship.

Now we grant the Snowflake IAM user permissions to access bucket objects.

On the IAM console, choose Roles in the navigation pane.
Choose the role mysnowflakerole.
On the Trust relationships tab, choose Edit trust relationship.
Modify the policy document with the DESC STORAGE INTEGRATION output values you recorded. For example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::5xxxxxxxx:user/mgm4-s- ssca0079"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "AWSPARTNER_SFCRole=4_vsarJrupIjjJh77J9Nxxxx/j98="
        }
      }
    }
  ]
}

The AWS role ARN and ExternalId will be different for your environment based on the output of the DESC STORAGE INTEGRATION query

Locate and run the final DAG (run_mwaa_datapipeline_blog).

At the end of the DAG run, the data is ready for querying. In this example, the query (finding the top start and destination stations) is run as part of the DAG and the output can be viewed from the Airflow XCOMs UI.

In the DAG run, the output is also published to Amazon SNS and based on the subscription, an email notification is sent out with the query output.

Another method to visualize the results is directly from the Snowflake console using the Snowflake worksheet. The following is an example query:

SELECT START_STATION_NAME,
COUNT(START_STATION_NAME) C FROM MWAA_DB.MWAA_SCHEMA.CITIBIKE_VW 
GROUP BY 
START_STATION_NAME ORDER BY C DESC LIMIT 10;

There are different ways to visualize the output based on your use case.

As we observed, DAG1 and DAG2 need to be run only one time to set up the Amazon MWAA connection and Snowflake objects. DAG3 can be scheduled to run every week or month. With this solution, the user examining the data doesn’t have to log in to either Amazon MWAA or Snowflake. You can have an automated workflow triggered on a schedule that will ingest the latest data from the Citi Bike dataset and provide the top start and destination stations for the given dataset.

Clean up

To avoid incurring future charges, delete the AWS resources (IAM users and roles, Secrets Manager secrets, Amazon MWAA environment, SNS topics and subscription, S3 buckets) and Snowflake resources (database, stage, storage integration, view, tables) created as part of this post.

Conclusion

In this post, we demonstrated how to set up an Amazon MWAA connection for authenticating to Snowflake as well as to AWS using AWS user credentials. We used a DAG to automate creating the Snowflake objects such as database, tables, and stage using SQL queries. We also orchestrated the data pipeline using Amazon MWAA, which ran tasks related to data transformation as well as Snowflake queries. We used Secrets Manager to store Snowflake connection information and credentials and Amazon SNS to publish the data output for end consumption.

With this solution, you have an automated end-to-end orchestration of your data pipeline encompassing ingesting, transformation, analysis, and data consumption.

To learn more, refer to the following resources:

About the authors

Payal Singh is a Partner Solutions Architect at Amazon Web Services, focused on the Serverless platform. She is responsible for helping partner and customers modernize and migrate their applications to AWS.

James Sun is a Senior Partner Solutions Architect at Snowflake. He actively collaborates with strategic cloud partners like AWS, supporting product and service integrations, as well as the development of joint solutions. He has held senior technical positions at tech companies such as EMC, AWS, and MapR Technologies. With over 20 years of experience in storage and data analytics, he also holds a PhD from Stanford University.

Bosco Albuquerque is a Sr. Partner Solutions Architect at AWS and has over 20 years of experience working with database and analytics products from enterprise database vendors and cloud providers. He has helped technology companies design and implement data analytics solutions and products.

Manuj Arora is a Sr. Solutions Architect for Strategic Accounts in AWS. He focuses on Migration and Modernization capabilities and offerings in AWS. Manuj has worked as a Partner Success Solutions Architect in AWS over the last 3 years and worked with partners like Snowflake to build solution blueprints that are leveraged by the customers. Outside of work, he enjoys traveling, playing tennis and exploring new places with family and friends.

AWS Weekly Roundup – re:Post Selections, SNS and SQS FIFO improvements, multi-VPC ENI attachments, and more – October 30, 2023

2023-10-30 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-repost-selections-sns-and-sqs-fifo-improvements-multi-vpc-eni-attachments-and-more-october-30-2023/

It’s less than a month to AWS re:Invent, but interesting news doesn’t slow down in the meantime. This week is my turn to help keep you up to date!

Last week’s launches
Here are some of the launches that caught my attention last week:

AWS re:Post – With re:Post, you have access to a community of experts that helps you become even more successful on AWS. With Selections, community members can organize knowledge in an aggregated view to create learning paths or curated content sets.

Amazon SNS – First-in-First-out (FIFO) topics now support the option to store and replay messages without needing to provision a separate archival resource. This improves the durability of your event-driven applications and can help you recover from downstream failure scenarios. Find out more in this AWS Comput Blog post – Archiving and replaying messages with Amazon SNS FIFO. Also, you can now use custom data identifiers to protect not only common sensitive data (such as names, addresses, and credit card numbers) but also domain-specific sensitive data, such as your company’s employee IDs. You can find additional info on this feature in this AWS Security blog post – Mask and redact sensitive data published to Amazon SNS using managed and custom data identifiers.

Amazon SQS – With the increased throughput quota for FIFO high throughput mode, you can process up to 18,000 transactions per second, per API action. Note the throughput quota depends on the AWS Region.

Amazon OpenSearch Service – OpenSearch Serverless now supports automated time-based data deletion with new index lifecycle policies. To determine the best strategy to deliver accurate and low latency vector search queries, OpenSearch can now intelligently evaluate optimal filtering strategies, like pre-filtering with approximate nearest neighbor (ANN) or filtering with exact k-nearest neighbor (k-NN). Also, OpenSearch Service now supports Internet Protocol Version 6 (IPv6).

Amazon EC2 – With multi-VPC ENI attachments, you can launch an instance with a primary elastic network interface (ENI) in one virtual private cloud (VPC) and attach a secondary ENI from another VPC. This helps maintain network-level segregation, but still allows specific workloads (like centralized appliances and databases) to communicate between them.

AWS CodePipeline – With parameterized pipelines, you can dynamically pass input parameters to a pipeline execution. You can now start a pipeline execution when a specific git tag is applied to a commit in the source repository.

Amazon MemoryDB – Now supports Graviton3-based R7g nodes that deliver up to 28 percent increased throughput compared to R6g. These nodes also deliver higher networking bandwidth.

Other AWS news
Here are a few posts from some of the other AWS and cloud blogs that I follow:

Networking & Content Delivery Blog – Some of the technical management and hardware decisions we make when building AWS network infrastructure: A Continuous Improvement Model for Interconnects within AWS Data Centers

DevOps Blog – To help enterprise customers understand how many of developers use CodeWhisperer, how often they use it, and how often they accept suggestions: Introducing Amazon CodeWhisperer Dashboard and CloudWatch Metrics

Front-End Web & Mobile Blog – How to restrict access to your GraphQL APIs to consumers within a private network: Architecture Patterns for AWS AppSync Private APIs

Architecture Blog – Another post in this super interesting series: Let’s Architect! Designing systems for stream data processing

From Community.AWS: Load Testing WordPress Amazon Lightsail Instances and Future-proof Your .NET Apps With Foundation Model Choice and Amazon Bedrock.

Don’t miss the latest AWS open source newsletter by my colleague Ricardo.

Upcoming AWS events
Check your calendars and sign up for these AWS events

AWS Community Days – Join a community-led conference run by AWS user group leaders in your region: Jaipur (November 4), Vadodara (November 4), Brasil (November 4), Central Asia (Kazakhstan, Uzbekistan, Kyrgyzstan, and Mongolia on November 17-18), and Guatemala (November 18).

AWS re:Invent (November 27 – December 1) – Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community. Browse the session catalog and attendee guides and check out the highlights for generative AI.

Here you can browse all upcoming AWS-led in-person and virtual events and developer-focused events.

And that’s all from me for this week. On to the next one!

— Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Archiving and replaying messages with Amazon SNS FIFO

2023-10-27 Benjamin Smith

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/archiving-and-replaying-messages-with-amazon-sns-fifo/

This post is written by A Mohammed Atiq, Solutions Architect and Mithun Mallick, Principal Solutions Architect, Serverless

Amazon Simple Notification Service (SNS) offers a flexible, fully managed messaging service, allowing applications to send and receive messages. SNS acts as a channel, delivering events from publishers to subscribers.

Today, AWS is announcing a new capability that enables you to archive and replay messages published to SNS FIFO (first-in first-Out) topics. Now, when enabled with an archive policy, SNS FIFO topics automatically:

Archives events, with a no-code, in-place message archive that doesn’t require any external resources. You only need to define an archive policy on your topic, including the required retention period (from 1 to 365 days).
Replays events: subscribers benefit from a managed, no-code message replay functionality, with built-in progress reporting and message filtering capabilities. To initiate a replay, subscribers simply apply a replay policy to their subscription, defining a starting point and an ending point using timestamps.

This feature can be useful in failure recovery and state replication scenarios.

Failure recovery

In failure recovery scenarios, developers can use this to reprocess a subset of messages and recover from a downstream application failure or a dependency issue.

Consider a situation where a search application needs to reprocess messages because the search engine’s index has been erased. To initiate recovery, the search application would update the ReplayPolicy attribute in its existing subscription using the SetSubscriptionAttributes API action, to start receiving messages from a specific point in time, rather than from when the Archive policy was applied to the topic.

State replication

For state replication scenarios, this feature enables new applications to duplicate the state of previously subscribed applications.

Consider an internal data warehouse application that must replicate the state of an external search application to make the data indexed in the search engine available to product managers and other internal staff. The data warehouse application subscribes its newly created endpoint (for example, an Amazon SQS FIFO queue) to the topic using the Subscribe API action and sets the ReplayPolicy subscription attribute.

If it opts to replicate the full state of the search engine, it might set the timestamp in its ReplayPolicy to coincide with the search engine’s subscription’s creation date and time, ensuring all data ever delivered to the search engine is also delivered to the data warehouse tool.

Enabling the archive policy via the SNS console

When creating a new SNS FIFO topic, you see an option for the archive policy. This policy determines how long SNS stores your messages, making them available for potential resending to a subscription if necessary. The Archive policy does not activate by default – you must enable it for each topic manually or automate the operation.

For instance, the retention period for this FIFO topic is set at 30 days. However, you can adjust this duration anywhere from 1 to 365 days. Once you activate the archive policy, messages sent to this topic are archived for the defined period.

To confirm that the archive policy is in effect after creating the topic, check the topic details. Next to the retention policy, and its status is displayed as Active.

By subscribing an SQS FIFO queue to an SNS FIFO topic, you can replay messages, and the Replay status shows Not running. You can subscribe both FIFO and standard SQS queues to their SNS FIFO topics, providing flexibility for various use-case requirements. To initiate a replay, navigate to the SNS console, choose Replay, and then choose Start replay.

When you initiate a replay, a window appears, allowing you to specify the start and end dates, as well as the exact time from which messages are archived. This feature affords the flexibility to replay only messages of interest, instead of every archived message, by allowing you to set on a specific schedule. When you choose Start replay, the service begins sending messages to the subscriber.

You can also define settings for the SNS FIFO archive and replay features with both AWS CloudFormation and the AWS Serverless Application Model (AWS SAM).

Use Cases

Replaying events for error recovery in microservices

In a scenario where an insurance application uses multiple microservices, consider one claims processing microservice encounters an error and drops a claim. Such an oversight can cause the workload to be out of sync.

With the archive and replay feature, you can revisit and replay events from the time the error was detected. This allows the microservice to recognize the missed event and complete the necessary actions, ensuring the system remains updated and accurate.

Messages are published to an SNS FIFO topic from an application.
Messages are delivered to an SQS FIFO queue containing claim details to be processed by a downstream microservice.
The microservice fails to process a series of messages due to an exception and discards all of the messages.
The user then initiates a replay from the SNS FIFO topic, specifies the time frame of messages to replay based on when the failure occurred.
The microservice is now able to successfully process the replayed messages and persists data into a DynamoDB table.

Replicating state across Regions

In situations where an application spans multiple Regions, and a microservice encounters difficulties in its primary Region, you can replicate the infrastructure to another Region using an active/standby setup.

You can reroute traffic to the standby microservice in the secondary Region, maintaining synchronization through event replays. You can set an end time in the SNS replay policy but if this isn’t defined, replaying continues until all the most recent messages are sent.

After, the SNS subscription resumes normal functioning, capturing all new messages. This approach is suitable for many state replication scenarios, such as cross-Region backup strategies, as it helps minimize downtime and prevent message loss.

Messages are published to an SNS FIFO topic from an application.
Messages are delivered to an SQS FIFO queue containing claim details to be processed by a downstream microservice.
The microservice failed to process a series of messages due to an exception and discarded all of the messages.
The user then subscribes a new SQS FIFO queue in another region, initiates a replay from the SNS FIFO topic and specifies the time frame of messages to replay based on when the failure occurred.
The microservice in a different region is able to retrieve the replayed messages from the new SQS FIFO queue, successfully processes the series of messages and persists data into a DynamoDB table.

Configuring SNS FIFO archive and replay for auto insurance processing

Managing auto insurance claims requires timely coordination. This walkthrough shows the combined benefits of SNS FIFO and SQS FIFO to process claims in the correct sequence.

Both SQS FIFO and SQS standard queues can be subscribed to the SNS FIFO topic, offering versatility in handling claims. The archive and replay functionality of SNS FIFO is paramount; disruptions in downstream microservices don’t compromise claim integrity due to the replay capability.

This walkthrough guides you through deploying an auto insurance claims processing example using the AWS CLI. You create an SNS FIFO topic for claim submissions and two SQS FIFO queues. The first queue is for primary processing of the claims, while the second is specifically for message replays to support application state replication across various system instances.

Prerequisites

AWS Command Line Interface (CLI): You can download and set it up using the documentation.
JQ: To install, follow installation instructions.

Step 1 – Creating resources using the AWS CLI and storing variables

Run the following commands in the terminal.

REGION=$(aws configure get region)

# Create an SNS FIFO topic for auto insurance claims
AUTO_INSURANCE_TOPIC_ARN=$(aws sns create-topic --name "AutoInsuranceClaimsTopic.fifo" --attributes "FifoTopic=true,ContentBasedDeduplication=true,DisplayName=Auto Insurance Claims Topic" --region $REGION | jq -r '.TopicArn')

# Create primary and replay SQS FIFO queues
AUTO_INSURANCE_QUEUE_URL=$(aws sqs create-queue --queue-name "AutoInsuranceClaimsQueue.fifo" --attributes "FifoQueue=true" --region $REGION | jq -r '.QueueUrl')
AUTO_INSURANCE_REPLAY_QUEUE_URL=$(aws sqs create-queue --queue-name "AutoInsuranceReplayQueue.fifo" --attributes "FifoQueue=true" --region $REGION | jq -r '.QueueUrl')

# Get ARNs for both SQS queues
AUTO_INSURANCE_QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url $AUTO_INSURANCE_QUEUE_URL --attribute-names QueueArn --output text --query 'Attributes.QueueArn')
AUTO_INSURANCE_REPLAY_QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --attribute-names QueueArn --region $REGION | jq -r '.Attributes.QueueArn')

# Define a policy allowing the topic to publish to both queues
SQS_POLICY_TEMPLATE="{\"Policy\" : \"{ \\\"Version\\\": \\\"2012-10-17\\\", \\\"Statement\\\": [ { \\\"Sid\\\": \\\"1\\\", \\\"Effect\\\": \\\"Allow\\\", \\\"Principal\\\": { \\\"Service\\\": \\\"sns.amazonaws.com\\\" }, \\\"Action\\\": [\\\"sqs:SendMessage\\\"], \\\"Resource\\\": [\\\"$AUTO_INSURANCE_QUEUE_ARN\\\", \\\"$AUTO_INSURANCE_REPLAY_QUEUE_ARN\\\"], \\\"Condition\\\": { \\\"ArnLike\\\": { \\\"aws:SourceArn\\\": [\\\"$AUTO_INSURANCE_TOPIC_ARN\\\"] } } } ]}\"}"

# Apply the access policy to the queues
aws sqs set-queue-attributes --queue-url $AUTO_INSURANCE_QUEUE_URL --attributes file://<(echo $SQS_POLICY_TEMPLATE)
aws sqs set-queue-attributes --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --attributes file://<(echo $SQS_POLICY_TEMPLATE)

# Subscribe the primary queue to the created SNS FIFO topic
aws sns subscribe --topic-arn $AUTO_INSURANCE_TOPIC_ARN --protocol sqs --notification-endpoint $AUTO_INSURANCE_QUEUE_ARN --region $REGION

Step 2 – Setting the archive policy on the SNS FIFO topic

Modify the attributes of the SNS FIFO topic to set a retention period. This determines how long a message is retained in the topic archive. This example uses 30 days.

# Set a 30-day retention period for the SNS FIFO topic

aws sns set-topic-attributes --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --attribute-name ArchivePolicy --attribute-value "{\"MessageRetentionPeriod\":\"30\"}"

Step 3- Publishing auto insurance claim details

Publish a sample claim to the SNS FIFO topic. This step mimics a real-world scenario where an insurance claim must be processed by subscribers of the topic.

# Get the current timestamp and publish a sample insurance claim
TIMESTAMP_START=$(date -u +%FT%T.000Z)
aws sns publish --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --message "{ \"claim_type\": \"collision\", \"registration\": \"AB123CDE\" }" --message-group-id "group1"

Step 4 – Reading auto insurance claim details

Retrieve the insurance claim details from the primary SQS FIFO queue. This simulates a process reading the insurance claim to take action. After reading the message, the claim is deleted from the queue to avoid reprocessing.

# Fetch the claim details from the primary queue, then delete to avoid redundancy
MESSAGE=$(aws sqs receive-message --region $REGION --queue-url $AUTO_INSURANCE_QUEUE_URL --output json)
MESSAGE_TEXT=$(echo "$MESSAGE" | jq -r '.Messages[0].Body')
MESSAGE_RECEIPT=$(echo "$MESSAGE" | jq -r '.Messages[0].ReceiptHandle')
aws sqs delete-message --region $REGION --queue-url $AUTO_INSURANCE_QUEUE_URL --receipt-handle $MESSAGE_RECEIPT
echo "Received claim details: ${MESSAGE_TEXT}"

Step 5 – Subscribing the replay SQS queue to the SNS FIFO topic

To ensure no claims are lost, configure a replay policy for your SQS FIFO queue subscription. This policy sets the schedule from which messages are replayed to the SQS FIFO queue. Here, you subscribe a replay queue with a replay policy and then monitor the status of the replay queue. Once complete, read the replayed claim details from the secondary SQS FIFO queue. If any processing issues occurred initially, there is a second chance to process the claim.

Subscribe replay queue to SNS FIFO topic:

# Subscribe the replay queue to the topic and define its replay policy
NEW_SUBSCRIPTION_ARN=$(aws sns subscribe --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --protocol sqs --return-subscription-arn --notification-endpoint $AUTO_INSURANCE_REPLAY_QUEUE_ARN --attributes "{\"ReplayPolicy\":\"{\\\"PointType\\\":\\\"Timestamp\\\",\\\"StartingPoint\\\":\\\"$TIMESTAMP_START\\\"}\"}" --output json | jq -r '.SubscriptionArn')

To monitor the replay status:

# Wait for the replay to complete
while [[ $(aws sns get-subscription-attributes --region $REGION --subscription-arn $NEW_SUBSCRIPTION_ARN --output text | awk 'END{print $9}') != 'Completed' ]]; do printf "."; sleep 5; done; echo "Replay complete";

To read the replayed message and delete the message from the queue:

# Fetch the replayed message and then remove it from the queue
REPLAYED_MESSAGE=$(aws sqs receive-message --region $REGION --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --output json)
REPLAYED_MESSAGE_TEXT=$(echo "$REPLAYED_MESSAGE" | jq -r '.Messages[0].Body')
REPLAYED_MESSAGE_RECEIPT=$(echo "$REPLAYED_MESSAGE" | jq -r '.Messages[0].ReceiptHandle')
aws sqs delete-message --region $REGION --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --receipt-handle $REPLAYED_MESSAGE_RECEIPT
echo "Received replayed claim details: ${REPLAYED_MESSAGE_TEXT}"

Cleaning up

To avoid incurring unnecessary costs, clean up the resources created in this walkthrough:

# Delete the primary SQS FIFO queue
aws sqs delete-queue --queue-url $AUTO_INSURANCE_QUEUE_URL --region $REGION

# Delete the replay SQS FIFO queue
aws sqs delete-queue --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --region $REGION

# Unset the 'ArchivePolicy' attribute
aws sns set-topic-attributes --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --attribute-name ArchivePolicy --attribute-value "{}"

# Delete the SNS FIFO topic
aws sns delete-topic --topic-arn $AUTO_INSURANCE_TOPIC_ARN --region $REGION

Conclusion

The new SNS FIFO archive and replay feature provides a substantial foundation for event-driven applications, emphasizing failure recovery and application state replication. These features allow developers to efficiently manage and recover from disruptions, and ensure state replication across different application instances or environments.

Get started with this new SNS FIFO capability by using the AWS Management Console, AWS CLI, AWS Software Development Kit (SDK), or AWS CloudFormation. For information on cost, see SNS pricing and SQS pricing.

For more serverless learning resources, visit Serverless Land.

Mask and redact sensitive data published to Amazon SNS using managed and custom data identifiers

2023-10-25 Otavio Ferreira

Post Syndicated from Otavio Ferreira original https://aws.amazon.com/blogs/security/mask-and-redact-sensitive-data-published-to-amazon-sns-using-managed-and-custom-data-identifiers/

Today, we’re announcing a new capability for Amazon Simple Notification Service (Amazon SNS) message data protection. In this post, we show you how you can use this new capability to create custom data identifiers to detect and protect domain-specific sensitive data, such as your company’s employee IDs. Previously, you could only use managed data identifiers to detect and protect common sensitive data, such as names, addresses, and credit card numbers.

Overview

Amazon SNS is a serverless messaging service that provides topics for push-based, many-to-many messaging for decoupling distributed systems, microservices, and event-driven serverless applications. As applications become more complex, it can become challenging for topic owners to manage the data flowing through their topics. These applications might inadvertently start sending sensitive data to topics, increasing regulatory risk. To mitigate the risk, you can use message data protection to protect sensitive application data using built-in, no-code, scalable capabilities.

To discover and protect data flowing through SNS topics with message data protection, you can associate data protection policies to your topics. Within these policies, you can write statements that define which types of sensitive data you want to discover and protect. Within each policy statement, you can then define whether you want to act on data flowing inbound to an SNS topic or outbound to an SNS subscription, the AWS accounts or specific AWS Identity and Access Management (IAM) principals the statement applies to, and the actions you want to take on the sensitive data found.

Now, message data protection provides three actions to help you protect your data. First, the audit operation reports on the amount of sensitive data found. Second, the deny operation helps prevent the publishing or delivery of payloads that contain sensitive data. Third, the de-identify operation can mask or redact the sensitive data detected. These no-code operations can help you adhere to a variety of compliance regulations, such as Health Insurance Portability and Accountability Act (HIPAA), Federal Risk and Authorization Management Program (FedRAMP), General Data Protection Regulation (GDPR), and Payment Card Industry Data Security Standard (PCI DSS).

This message data protection feature coexists with the message data encryption feature in SNS, both contributing to an enhanced security posture of your messaging workloads.

Managed and custom data identifiers

After you add a data protection policy to your SNS topic, message data protection uses pattern matching and machine learning models to scan your messages for sensitive data, then enforces the data protection policy in real time. The types of sensitive data are referred to as data identifiers. These data identifiers can be either managed by Amazon Web Services (AWS) or custom to your domain.

Managed data identifiers (MDI) are organized into five categories:

Personally identifiable information (PII) includes data identifiers such as name, home address, email address, and social security number
Protected health information (PHI) examples include healthcare card number, prescription drug code, and healthcare procedure code
Financial information examples include bank account number, credit card number, and credit card magnetic strip data
Credential information examples include AWS secret access key, SSH private key, and PGP private key
Device information examples include IP addresses

In a data protection policy statement, you refer to a managed data identifier using its Amazon Resource Name (ARN), as follows:

{
    "Name": "__example_data_protection_policy",
    "Description": "This policy protects sensitive data in expense reports",
    "Version": "2021-06-01",
    "Statement": [{
        "DataIdentifier": [
            "arn:aws:dataprotection::aws:data-identifier/CreditCardNumber"
        ],
        "..."
    }]
}

Custom data identifiers (CDI), on the other hand, enable you to define custom regular expressions in the data protection policy itself, then refer to them from policy statements. Using custom data identifiers, you can scan for business-specific sensitive data, which managed data identifiers can’t. For example, you can use a custom data identifier to look for company-specific employee IDs in SNS message payloads. Internally, SNS has guardrails to make sure custom data identifiers are safe and that they add only low single-digit millisecond latency to message processing.

In a data protection policy statement, you refer to a custom data identifier using only the name that you have given it, as follows:

{
    "Name": "__example_data_protection_policy",
    "Description": "This policy protects sensitive data in expense reports",
    "Version": "2021-06-01",
    "Configuration": {
        "CustomDataIdentifier": [{
            "Name": "MyCompanyEmployeeId", "Regex": "EID-\d{9}-US"
        }]
    },
    "Statement": [{
        "DataIdentifier": [
            "arn:aws:dataprotection::aws:data-identifier/CreditCardNumber",
            "MyCompanyEmployeeId"
        ],
        "..."
    }]
}

Note that custom data identifiers can be used in conjunction with managed data identifiers, as part of the same data protection policy statement. In the preceding example, both MyCompanyEmployeeId and CreditCardNumber are in scope.

For more information, see Data Identifiers, in the SNS Developer Guide.

Inbound and outbound data directions

In addition to the DataIdentifier property, each policy statement also sets the DataDirection property (whose value can be either Inbound or Outbound) as well as the Principal property (whose value can be any combination of AWS accounts, IAM users, and IAM roles).

When you use message data protection for data de-identification and set DataDirection to Inbound, instances of DataIdentifier published by the Principal are masked or redacted before the payload is ingested into the SNS topic. This means that every endpoint subscribed to the topic receives the same modified payload.

When you set DataDirection to Outbound, on the other hand, the payload is ingested into the SNS topic as-is. Then, instances of DataIdentifier are either masked, redacted, or kept as-is for each subscribing Principal in isolation. This means that each endpoint subscribed to the SNS topic might receive a different payload from the topic, with different sensitive data de-identified, according to the data access permissions of its Principal.

The following snippet expands the example data protection policy to include the DataDirection and Principal properties.

{
    "Name": "__example_data_protection_policy",
    "Description": "This policy protects sensitive data in expense reports",
    "Version": "2021-06-01",
    "Configuration": {
        "CustomDataIdentifier": [{
            "Name": "MyCompanyEmployeeId", "Regex": "EID-\d{9}-US"
        }]
    },
    "Statement": [{
        "DataIdentifier": [
            "MyCompanyEmployeeId",
            "arn:aws:dataprotection::aws:data-identifier/CreditCardNumber"
        ],
        "DataDirection": "Outbound",
        "Principal": [ "arn:aws:iam::123456789012:role/ReportingApplicationRole" ],
        "..."
    }]
}

In this example, ReportingApplicationRole is the authenticated IAM principal that called the SNS Subscribe API at subscription creation time. For more information, see How do I determine the IAM principals for my data protection policy? in the SNS Developer Guide.

Operations for data de-identification

To complete the policy statement, you need to set the Operation property, which informs the SNS topic of the action that it should take when it finds instances of DataIdentifer in the outbound payload.

The following snippet expands the data protection policy to include the Operation property, in this case using the Deidentify object, which in turn supports masking and redaction.

{
    "Name": "__example_data_protection_policy",
    "Description": "This policy protects sensitive data in expense reports",
    "Version": "2021-06-01",
    "Configuration": {
        "CustomDataIdentifier": [{
            "Name": "MyCompanyEmployeeId", "Regex": "EID-\d{9}-US"
        }]
    },
    "Statement": [{
        "Principal": [
            "arn:aws:iam::123456789012:role/ReportingApplicationRole"
        ],
        "DataDirection": "Outbound",
        "DataIdentifier": [
            "MyCompanyEmployeeId",
            "arn:aws:dataprotection::aws:data-identifier/CreditCardNumber"
        ],
        "Operation": { "Deidentify": { "MaskConfig": { "MaskWithCharacter": "#" } } }
    }]
}

In this example, the MaskConfig object instructs the SNS topic to mask instances of CreditCardNumber in Outbound messages to subscriptions created by ReportingApplicationRole, using the MaskWithCharacter value, which in this case is the hash symbol (#). Alternatively, you could have used the RedactConfig object instead, which would have instructed the SNS topic to simply cut the sensitive data off the payload.

The following snippet shows how the outbound payload is masked, in real time, by the SNS topic.

// original message published to the topic:
My credit card number is 4539894458086459

// masked message delivered to subscriptions created by ReportingApplicationRole:
My credit card number is ################

For more information, see Data Protection Policy Operations, in the SNS Developer Guide.

Applying data de-identification in a use case

Consider a company where managers use an internal expense report management application where expense reports from employees can be reviewed and approved. Initially, this application depended only on an internal payment application, which in turn connected to an external payment gateway. However, this workload eventually became more complex, because the company started also paying expense reports filed by external contractors. At that point, the company built a mobile application that external contractors could use to view their approved expense reports. An important business requirement for this mobile application was that specific financial and PII data needed to be de-identified in the externally displayed expense reports. Specifically, both the credit card number used for the payment and the internal employee ID that approved the payment had to be masked.

Figure 1: Expense report processing application

To distribute the approved expense reports to both the payment application and the reporting application that backed the mobile application, the company used an SNS topic with a data protection policy. The policy has only one statement, which masks credit card numbers and employee IDs found in the payload. This statement applies only to the IAM role that the company used for subscribing the AWS Lambda function of the reporting application to the SNS topic. This access permission configuration enabled the Lambda function from the payment application to continue receiving the raw data from the SNS topic.

The data protection policy from the previous section addresses this use case. Thus, when a message representing an expense report is published to the SNS topic, the Lambda function in the payment application receives the message as-is, whereas the Lambda function in the reporting application receives the message with the financial and PII data masked.

Deploying the resources

You can apply a data protection policy to an SNS topic using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS SDK, or AWS CloudFormation.

To automate the provisioning of the resources and the data protection policy of the example expense management use case, we’re going to use CloudFormation templates. You have two options for deploying the resources:

Run the /templates/message_data_protection_cdi/deploy script in the aws-sns-samples repository in GitHub.
Alternatively, use the following four CloudFormation templates, in order. Allow time for each stack to complete before deploying the next stack.

Deploy using the individual CloudFormation templates in sequence

Prerequisites template: This first template provisions two IAM roles with a managed policy that enables them to create SNS subscriptions and configure the subscriber Lambda functions. You will use these provisioned IAM roles in steps 3 and 4 that follow.
Topic owner template: The second template provisions the SNS topic along with its access policy and data protection policy.
Payment subscriber template: The third template provisions the Lambda function and the corresponding SNS subscription that comprise of the Payment application stack. When prompted, select the PaymentApplicationRole in the Permissions panel before running the template. Moreover, the CloudFormation console will require you to acknowledge that a CloudFormation transform might require access capabilities.
Reporting subscriber template: The final template provisions the Lambda function and the SNS subscription that comprise of the Reporting application stack. When prompted, select the ReportingApplicationRole in the Permissions panel, before running the template. Moreover, the CloudFormation console will require, once again, that you acknowledge that a CloudFormation transform might require access capabilities.

Figure 2: Select IAM role

Now that the application stacks have been deployed, you’re ready to start testing.

Testing the data de-identification operation

Use the following steps to test the example expense management use case.

In the Amazon SNS console, select the ApprovalTopic, then choose to publish a message to it.

In the SNS message body field, enter the following message payload, representing an external contractor expense report, then choose to publish this message:

{
    "expense": {
        "currency": "USD",
        "amount": 175.99,
        "category": "Office Supplies",
        "status": "Approved",
        "created_at": "2023-10-17T20:03:44+0000",
        "updated_at": "2023-10-19T14:21:51+0000"
    },
    "payment": {
        "credit_card_network": "Visa",
        "credit_card_number": "4539894458086459"
    },
    "reviewer": {
        "employee_id": "EID-123456789-US",
        "employee_location": "Seattle, USA"
    },
    "contractor": {
        "employee_id": "CID-000012348-CA",
        "employee_location": "Vancouver, CAN"
    }
}

In the CloudWatch console, select the log group for the PaymentLambdaFunction, then choose to view its latest log stream. Now look for the log stream entry that shows the message payload received by the Lambda function. You will see that no data has been masked in this payload, as the payment application requires raw financial data to process the credit card transaction.

Still in the CloudWatch console, select the log group for the ReportingLambdaFunction, then choose to view its latest log stream. Now look for the log stream entry that shows the message payload received by this Lambda function. You will see that the values for properties credit_card_number and employee_id have been masked, protecting the financial data from leaking into the external reporting application.

{
    "expense": {
        "currency": "USD",
        "amount": 175.99,
        "category": "Office Supplies",
        "status": "Approved",
        "created_at": "2023-10-17T20:03:44+0000",
        "updated_at": "2023-10-19T14:21:51+0000"
    },
    "payment": {
        "credit_card_network": "Visa",
        "credit_card_number": "################"
    },
    "reviewer": {
        "employee_id": "################",
        "employee_location": "Seattle, USA"
    },
    "contractor": {
        "employee_id": "CID-000012348-CA",
        "employee_location": "Vancouver, CAN"
    }
}

As shown, different subscribers received different versions of the message payload, according to their sensitive data access permissions.

Cleaning up the resources

After testing, avoid incurring usage charges by deleting the resources that you created. Open the CloudFormation console and delete the four CloudFormation stacks that you created during the walkthrough.

Conclusion

This post showed how you can use Amazon SNS message data protection to discover and protect sensitive data published to or delivered from your SNS topics. The example use case shows how to create a data protection policy that masks messages delivered to specific subscribers if the payloads contain financial or personally identifiable information.

For more details, see message data protection in the SNS Developer Guide. For information on costs, see SNS pricing.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on AWS re:Post or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How to prevent SMS Pumping when using Amazon Pinpoint or SNS

2023-10-23 Akshada Umesh Lalaye

Post Syndicated from Akshada Umesh Lalaye original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-prevent-sms-pumping-when-using-amazon-pinpoint-or-sns/

SMS fraud is, unfortunately, a common issue that all senders of SMS encounter as they adopt SMS as a communication channel. This post defines the most common types of fraud and provides concrete guidance on how to mitigate or eliminate each of them.

Introduction to SMS Pumping:

SMS Pumping, also known as an SMS Flood attack, or Artificially Inflated Traffic (AIT), occurs when fraudsters exploit a phone number input field to acquire a one-time passcode (OTP), an app download link, or any other content via SMS. In cases where these input forms lack sufficient security measures, attackers can artificially increase the volume of SMS traffic, thereby exploiting vulnerabilities in your application. The perpetrators dispatch SMS messages to a selection of numbers under the jurisdiction of a particular mobile network operator (MNO), ultimately receiving a portion of the resulting revenue. It is essential to understand how to detect these attacks and prevent them.

Common Evidence of SMS Pumping:

Dramatic Decrease in Conversion Rates: A common SMS use case is for identity verification through the use of One Time Passwords (OTP) but this could also be seen in other types of use cases where a clear and consistent conversion rate is seen. A drop in a normally stable conversion rate may be caused by an increase in volume that will never convert and can indicate an issue that requires investigation. Setting up an alert for anomalies in conversion rates is always a good practice.
SMS Requests or Deliveries from Unknown Countries: If your application normally sends SMS to a defined set of countries and you begin to receive requests for a different country, then then this should be investigated.
Spike in Outgoing Messages: A significant and sudden increase in outgoing messages could indicate an issue that requires investigation.
Spike in Messages Sent to a Block of Adjacent Numbers: Fraudsters often deploy bots and programmatically loop through numbers in a sequence. You will probably notice an increase in messages to a group of nearby numbers frequently for example, +11111111110, +11111111111

How to Identify and Prevent SMS Pumping Attacks:

Now that we understand the common signs of SMS pumping, lets discuss how to use AWS Services to identify, confirm the fraud and how to place measures in place to prevent it in the first place.

Identify:

Spike in Outgoing Messages and SMS requests to unknown countries: If you are using Amazon Simple Notification Service (SNS), in the SNS console, you can navigate to Mobile → Text Messaging(SMS) , Delivery statistics and Delivery statistics by country to understand the message requests. You can also check to see if there is any spike.

Delivery Statistics (UTC)

If you are using Amazon Pinpoint, you can use transactional messaging under analytics section to understand the SMS patterns

Transactional Messaging Charts

Spikes in Messages Sent to a Block of Adjacent Numbers: If you are using SNS you can use CloudWatch logs to analyse the destination numbers.

You can use CloudWatch Insights query on below log groups

sns/<region>/<Accountnumber>/DirectPublishToPhoneNumber
sns/<region>/<Accountnumber>/DirectPublishToPhoneNumber/failure

The below query will print all the logs that have the destination number like +11111111111
fields @timestamp, @message, @logStream, @log
| filter delivery.destination like '+11111111111'
| limit 20

If you are using Amazon Pinpoint, you can enable event stream to analyse destination numbers.

If you have deployed Digital User Engagement Events Database Solution You can use the below sample Amazon Athena query which displays entries that have the destination number like +11111111111

SELECT * FROM "due_eventdb"."sms_success" where destination_phone_number like '%11111111111%'

SELECT * FROM "due_eventdb"."sms_failure" where destination_phone_number like '%11111111111%'

How to Prevent SMS Pumping:

Geo Protection
- You can integrate your SMS API with AWS Web Application Firewall (WAF). With WAF you can set up rules to inspect the body and query parameters to allow only requests that your app expects.

- - Example: If you expect only users from India to sign up in your application, you can include rules such as “\+91[0-9]{10}”, which allows only Indian numbers as input.
  - Note: SNS and Pinpoint APIs are not natively integrated with WAF. However, you can connect your application to an Amazon API Gateway with which you can integrate with WAF.
  - How to Create a Regex Pattern Set with WAF – The below Regex Pattern set will allow sending messages to Australia (+61) and India (+91) destination phone numbers

1. 1. 1. 1. Sign in to the AWS Management Console and navigate to AWS WAF console
      2. In the navigation pane, choose Regex pattern sets and then Create regex pattern set.
      3. Enter a name and description for the regex pattern set. You’ll use these to identify it when you want to use the set. For example, Allowed_SMS_Countries
      4. Select the Region where you want to store the regex pattern set
      5. In the Regular expressions text box, enter one regex pattern per line
      6. Review the settings for the regex pattern set, and choose Create regex pattern set

Regex pattern set details

- - Create a Web ACL with above Regex Pattern Set
    1. 1. Sign in to the AWS Management Console and navigate to AWS WAF console
      2. In the navigation pane, choose Web ACLs and then Create web ACL
      3. Enter a Name, Description and CloudWatch metric name for Web ACL details
      4. Select Resource type as Regional resources
      5. Click Next
        
        Web ACL details
      6. Click on Add Rules > Add my own rules and rule groups
      7. Enter Rule name and select Regular rule
        
        Web ACL Rule Builder
      8. Select Inspect > Body, Content type as JSON, JSON match scope as Values, Content to inspect as Full JSON content
      9. Select Match type as Matches pattern from regex pattern set and select the Regex pattern set as “Allowed_SMS_Countries” created above
      10. Select Action as Allow
      11. Click Add Rule
        
        Web ACL Rule builder statement
      12. Select Block for Default web ACL action for requests that don’t match any rules
        
        Web ACL Rules
      13. Set rule priority and Click Next
        
        Web ACL Rule priority
      14. Configure metrics and Click Next
        
        Web ACL metrics
      15. Review and Click Create web ACL

For more information, please refer to WebACL

- - Configure Amazon SNS SMS via API Gateway and Lambda to validate phone number and send SMS from SNS via Amazon API Gateway and AWS Lambda. You can also setup Pinpoint SMS via API Gateway and Lambda
  - Associate the WebACL created above with the API Gateway stage

Rate Limit Requests
- AWS WAF provides an option to rate limit per originating IP. You can define the maximum number of requests allowed in a five-minute period that satisfy the criteria you provide, before limiting the requests using the rule action setting
CAPTCHA
- Implement CAPTCHA in your application request process to protect your application against common bot traffic
Turn off “Shared Routes”
- Review this blog post and the use of the “SharedRoutesEnabled“ parameter within pools in the Pinpoint V2 SMS API
Exponential Delay Verification Retries
- Implement a delay between multiple messages to the same phone number. This doesn’t completely eliminate but will help slow down the attack
Set CloudWatch Alarm
- Configure an Amazon CloudWatch alarm to get notified if the SNS SMSMonthToDateSpentUSD or Pinpoint TextMessageMonthlySpend metric goes beyond a specific threshold
Validate Phone Numbers – You can use the Pinpoint Phone number validate API to check the values for CountryCodeIso2, CountryCodeNumeric, and PhoneType prior to sending SMS and then only send SMS to countries that match your criteria
Sample API Response:

{
"NumberValidateResponse": {
"Carrier": "ExampleCorp Mobile",
"City": "Seattle",
"CleansedPhoneNumberE164": "+12065550142",
"CleansedPhoneNumberNational": "2065550142",
"Country": "United States",
"CountryCodeIso2": "US",
"CountryCodeNumeric": "1",
"OriginalPhoneNumber": "+12065550142",
"PhoneType": "MOBILE",
"PhoneTypeCode": 0,
"Timezone": "America/Los_Angeles",
"ZipCode": "98101"
}
}

Conclusion:

This post covers the basics of SMS pumping attacks, the different mechanisms that can be used to detect them, and some potential ways to solve for or mitigate them using services and features like Pinpoint Validate API and WAF.

Further Reading:

Review the documentation of WAF with API gateway
here

Review the documentation of Phone number validate
here

Review the Web Access Control lists
here

Resources:

Mobile text messaging (SMS) –
https://docs.aws.amazon.com/sns/latest/dg/sns-mobile-phone-number-as-subscriber.html

Amazon Pinpoint –
https://aws.amazon.com/pinpoint/

Amazon SNS –
https://aws.amazon.com/sns/

AWS WAF –
https://aws.amazon.com/waf

Amazon API Gateway –
https://aws.amazon.com/api-gateway/

Amazon Athena –
https://aws.amazon.com/athena/

AWS Lambda –
https://aws.amazon.com/lambda/

Digital user database –
https://aws.amazon.com/solutions/implementations/digital-user-engagement-events-database/

Serverless ICYMI Q3 2023

2023-10-17 Benjamin Smith

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/serverless-icymi-q3-2023/

Welcome to the 23rd edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

AWS announces the general availability of Amazon Bedrock

Amazon Web Services (AWS) unveils five generative artificial intelligence (AI) innovations to democratize generative AI applications. Amazon Bedrock, now generally available, enables experimentation with top foundation models (FMs) and allows customization with proprietary data.

It supports creating managed agents for complex tasks without code and ensures security and privacy. Amazon Titan Embeddings, another FM, is generally available for various language-related use cases. Meta’s Llama 2, coming soon, enhances dialogue scenarios.

The upcoming Amazon CodeWhisperer customization capability enables secure customization using private code bases. Generative BI authoring capabilities in Amazon QuickSight simplify visualization creation for business analysts.

AWS Lambda

AWS Lambda now detects and stops recursive loops in Lambda functions. AWS Lambda now detects and halts functions caught in recursive or infinite loops, guarding against unexpected costs. Lambda identifies recursive behavior, discontinuing requests after 16 invocations. The feature addresses pitfalls stemming from misconfiguration or coding bugs, introducing detailed error messaging, and allowing users to set maximum limits on retry intervals. Notifications about recursive occurrences are relayed through the AWS Health Dashboard, emails, and CloudWatch Alarms for streamlined troubleshooting. Lambda uses AWS X-Ray trace headers for invocation tracking, requiring supported AWS SDK versions.

AWS simplifies writing .NET 6 Lambda functions. The Lambda Annotations Framework for .NET. A new programming model makes the experience of writing Lambda functions in C# feel more natural for .NET developers by using C# source generator technology. This streamlines the development workflow for .NET developers, making it easier to create serverless applications using the latest version of the .NET framework.

AWS Lambda and Amazon EventBridge Pipes now support enhanced filtering. Additional filtering capabilities include the ability to match against characters at the end of a value (suffix filtering), ignore case sensitivity (equals-ignore-case), and have a single rule match if any conditions across multiple separate fields are true (OR matching).

AWS Lambda Functions powered by AWS Graviton2 are now available in 6 additional Regions. Graviton2 processors are known for their performance benefits, and this expansion provides users with more choices for running serverless workloads.

AWS Lambda adds support for Python 3.11 allowing developers to take advantage of the latest features and improvements in the Python programming language for their serverless functions.

AWS Step Functions

AWS Step Functions enhances Workflow Studio, focusing on an Advanced Starter Template and Code Mode for efficient AWS Step Functions workflow creation. Users benefit from streamlined design-to-code transitions, pasting Amazon States Language (ASL) definitions directly into Workflow Studio, speeding up adjustments. Enhanced workflow execution and configuration allow direct execution and setting adjustments within Workflow Studio, improving user experience.

AWS Step Functions launches enhanced error handling This update helps users to identify errors with precision and refine retry strategies. Step Functions now enables detailed error messages in Fail states and precise control over retry intervals. Use the new maximum limits and jitter functionality to ensure efficient and controlled retries, preventing service overload in recovery scenarios.

AWS Step Functions distributed map is now available in the AWS GovCloud (US) Regions. This release highlights the availability of the distributed map feature in Step Functions specifically tailored for the AWS GovCloud (US) Regions. The distributed map feature is a powerful capability for orchestrating parallel and distributed processing in serverless workflows.

AWS SAM

AWS SAM CLI announces local testing and debugging support on Terraform projects.

Developers can now use AWS SAM CLI to locally test and debug AWS Lambda functions and Amazon API Gateway defined in their Terraform projects. AWS SAM CLI reads infrastructure resource information from the Terraform application, allowing users to start Lambda functions and API Gateway endpoints locally in a Docker container.

This update enables faster development cycles for Terraform users, who can use AWS SAM CLI commands like `AWS SAM local start-api`, `sam local start-lambda`, and `sam local invoke`, along with `sam local generate` for generating mock test events.

Amazon EventBridge

Amazon EventBridge Scheduler adds schedule deletion after completion. This feature offers enhanced functionality by supporting the automatic deletion of schedules upon completion of their last invocation. It is applicable to various scheduling types, including one-time, cron, and rate schedules with an end date. Amazon EventBridge Scheduler, a centralized and highly scalable service, enables the creation, execution, and management of schedules.

With the ability to schedule millions of tasks invoking over 270 AWS services and 6,000 API operations. This update streamlines the process of managing completed schedules. The automatic deletion feature reduces the need for manual intervention or custom code, saving time and simplifying scalability for users leveraging EventBridge Scheduler.

Amazon EventBridge Pipes now available in three additional Regions. This update extends the availability of Amazon EventBridge Pipes, a powerful event-routing service, to three additional Regions.

Amazon EventBridge API Destinations is now available in additional Regions. Providing users with more options for building scalable and decoupled applications.

Amazon EventBridge Schema Registry and Schema Discovery now in additional Regions. This expansion allows you to discover and store event structure – or schema – in a shared, central location. You can download code bindings for those schemas for Java, Python, TypeScript, and Golang so it’s easier to use events as objects in your code.

Amazon SNS

To enhance message privacy and security, Amazon Simple Notification Service (SNS) implemented Message Data Protection, allowing users to de-identify outbound messages via redaction or masking. Amazon SNS FIFO topics now support message delivery to Amazon SQS Standard queues. This provides users with increased flexibility in managing message delivery and ordering.

Expanding its monitoring capabilities, Amazon SNS introduced Additional Usage Metrics in Amazon CloudWatch. This enhancement allows users to gain more comprehensive insights into the performance and utilization of their SNS resources. SNS extended its global SMS sending capabilities to Israel (Tel Aviv), providing users in that Region with additional options for SMS notifications. SNS also expanded its reach by supporting Mobile Push Notifications in twelve new AWS Regions. This expansion aligns with the growing demand for mobile notification capabilities, offering a broader coverage for users across diverse Regions.

Amazon SQS

Amazon Simple Queue Service (SQS) introduced a number of updates. Attribute-Based Access Control (ABAC) was implemented for scalable access permissions, while message data protection can now de-identify outbound messages via redaction or masking. SQS FIFO topics now support message delivery to Amazon SQS Standard queues, providing enhanced flexibility. Addressing throughput demands, SQS increased the quota for FIFO High Throughput mode. JSON protocol support was previewed, offering improved message format flexibility. These updates underscore SQS’s commitment to advanced security and flexibility.

Amazon API Gateway

Amazon API Gateway undergoes a console refresh, aligning with Cloudscape Design System guidelines. Notable enhancements include improved usability, sortable tables, enhanced API key management, and direct API deployment from the Resource view. The update introduces dark mode, accessibility improvements, and visual alignment with HTTP APIs and AWS Services.

GOTO EDA day Nashville 2023

Join GOTO EDA Day in Nashville on October 26 for insights on event-driven architectures. Learn from industry leaders at Music City Center with talks, panels, and Hands-On Labs. Limited tickets available.

Serverless blog posts

July 2023

July 5- Implementing AWS Lambda error handling patterns

July 6 – Implementing AWS Lambda error handling patterns

July 7 – Understanding AWS Lambda’s invoke throttling limits

July 10 – Detecting and stopping recursive loops in AWS Lambda functions

July 11 – Implementing patterns that exit early out of a parallel state in AWS Step Functions

July 26 – Migrating AWS Lambda functions from the Go1.x runtime to the custom runtime on Amazon Linux 2

July 27 – Python 3.11 runtime now available in AWS Lambda

August 2023

August 2 – Automatically delete schedules upon completion with Amazon EventBridge Scheduler

August 7 – Using response streaming with AWS Lambda Web Adapter to optimize performance

August 15 – Integrating IBM MQ with Amazon SQS and Amazon SNS using Apache Camel

August 15 – Implementing the transactional outbox pattern with Amazon EventBridge Pipes

August 23 – Protecting an AWS Lambda function URL with Amazon CloudFront and Lambda@Edge

August 29 – Enhancing file sharing using Amazon S3 and AWS Step Functions

August 31 – Enhancing Workflow Studio with new features for streamlined authoring

September 2023

September 5 – AWS SAM support for HashiCorp Terraform now generally available

September 14 – Building a secure webhook forwarder using an AWS Lambda extension and Tailscale

September 18 – Building resilient serverless applications using chaos engineering

September 19 – Implementing idempotent AWS Lambda functions with Powertools for AWS Lambda (TypeScript)

September 19 – Centralizing management of AWS Lambda layers across multiple AWS Accounts

September 26 – Architecting for scale with Amazon API Gateway private integrations

September 26 – Visually design your application with AWS Application Composer

Videos

Serverless Office Hours – Tues 10AM PT

July 2023

July 4 – Benchmarking Lambda cold starts

July 11 – Lambda testing: AWS SAM remote invoke

July 18 – Using DynamoDB global tables

July 25 – Serverless observability with SLIC-watch

August 2023

August 1 – Step Functions versions and aliases

August 8 – Deploying Lambda with EKS and Crossplane / Managing Lambda with Kubernetes

August 15 – Serverless caching with Momento

September 2023

September 5 – Run any web app on Lambda

September 12 – Building an API platform on AWS

September 19 – Idempotency: exactly once processing

September 26 – AWS Amplify Studio + GraphQL

FooBar Serverless YouTube channel

July 2023

July 27 – Generative AI and Serverless to create a new story everyday

August 2023

August 3 – Getting started with Data Streaming

August 10 – Amazon Kinesis Data Streams – Shards? Provisioned? On-demand? What does all this mean?

August 17 – Put and consume events with AWS Lambda, Amazon Kinesis Data Stream and Event Source Mapping

August 24 – Create powerful data pipelines with Amazon Kinesis and EventBridge Pipes

August 31 – New Step Functions versions and alias!

September 2023

September 7 – Amazon Kinesis Data Firehose – What is this service for?

September 14 – Kinesis Data Firehose with AWS CDK – Lambda transformations

September 21 – Advanced Event Source Mapping configuration | AWS Lambda and Amazon Kinesis Data Streams

September 28 – Data Streaming Patterns

Still looking for more?

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Eric Johnson: @edjgeek
James Beswick: @jbesw
Ben Smith: @benjamin_l_s
Julian Wood: @julian_wood
Marcia Villalba: @mavi888uy
David Boyne: @boyney123

How to add notifications and manual approval to an AWS CDK Pipeline

2023-08-17 Jehu Gray

Post Syndicated from Jehu Gray original https://aws.amazon.com/blogs/devops/how-to-add-notifications-and-manual-approval-to-an-aws-cdk-pipeline/

A deployment pipeline typically comprises several stages such as dev, test, and prod, which ensure that changes undergo testing before reaching the production environment. To improve the reliability and stability of release processes, DevOps teams must review Infrastructure as Code (IaC) changes before applying them in production. As a result, implementing a mechanism for notification and manual approval that grants stakeholders improved access to changes in their release pipelines has become a popular practice for DevOps teams.

Notifications keep development teams and stakeholders informed in real-time about updates and changes to deployment status within release pipelines. Manual approvals establish thresholds for transitioning a change from one stage to the next in the pipeline. They also act as a guardrail to mitigate risks arising from errors and rework because of faulty deployments.

Please note that manual approvals, as described in this post, are not a replacement for the use of automation. Instead, they complement automated checks within the release pipeline.

In this blog post, we describe how to set up notifications and add a manual approval stage to AWS Cloud Development Kit (AWS CDK) Pipeline.

Concepts

CDK Pipeline

CDK Pipelines is a construct library for painless continuous delivery of CDK applications. CDK Pipelines can automatically build, test, and deploy changes to CDK resources. CDK Pipelines are self-mutating which means as application stages or stacks are added, the pipeline automatically reconfigures itself to deploy those new stages or stacks. Pipelines need only be manually deployed once, afterwards, the pipeline keeps itself up to date from the source code repository by pulling the changes pushed to the repository.

Notifications

Adding notifications to a pipeline provides visibility to changes made to the environment by utilizing the NotificationRule construct. You can also use this rule to notify pipeline users of important changes, such as when a pipeline starts execution. Notification rules specify both the events and the targets, such as Amazon Simple Notification Service (Amazon SNS) topic or AWS Chatbot clients configured for Slack which represents the nominated recipients of the notifications. An SNS topic is a logical access point that acts as a communication channel while Chatbot is an AWS service that enables DevOps and software development teams to use messaging program chat rooms to monitor and respond to operational events.

Manual Approval

In a CDK pipeline, you can incorporate an approval action at a specific stage, where the pipeline should pause, allowing a team member or designated reviewer to manually approve or reject the action. When an approval action is ready for review, a notification is sent out to alert the relevant parties. This combination of notifications and approvals ensures timely and efficient decision-making regarding crucial actions within the pipeline.

Solution Overview

The solution explains a simple web service that is comprised of an AWS Lambda function that returns a static web page served by Amazon API Gateway. Since Continuous Deployment and Continuous Integration (CI/CD) are important components to most web projects, the team implements a CDK Pipeline for their web project.

There are two important stages in this CDK pipeline; the Pre-production stage for testing and the Production stage, which contains the end product for users.

The flow of the CI/CD process to update the website starts when a developer pushes a change to the repository using their Integrated Development Environment (IDE). An Amazon CloudWatch event triggers the CDK Pipeline. Once the changes reach the pre-production stage for testing, the CI/CD process halts. This is because a manual approval gate is between the pre-production and production stages. So, it becomes a stakeholder’s responsibility to review the changes in the pre-production stage before approving them for production. The pipeline includes an SNS notification that notifies the stakeholder whenever the pipeline requires manual approval.

After approving the changes, the CI/CD process proceeds to the production stage and the updated version of the website becomes available to the end user. If the approver rejects the changes, the process ends at the pre-production stage with no impact to the end user.

The following diagram illustrates the solution architecture.

This diagram shows the CDK pipeline process in the solution and how applications or updates are deployed using AWS Lambda Function to end users.

Figure 1. This image shows the CDK pipeline process in our solution and how applications or updates are deployed using AWS Lambda Function to end users.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
Install Python version 3.6 or later
A basic understanding of CDK and CDK Pipelines. Please go through the Python Workshop on cdkworkshop.com to follow along with the code examples and get hands-on learning about CDK and related concepts.
Install AWS CDK version 2.73.0 or later
Set up a CDK pipeline, and have a basic understanding of how SNS works .
Since the pipeline stack is being modified, there may be a need to run cdk deploy locally again.
NOTE: The CDK Pipeline code structure used in the CDK workshop can be found here: Pipeline stack Code.

Add notification to the pipeline

In this tutorial, perform the following steps:

Add the import statements for AWS CodeStar notifications and SNS to the import section of the pipeline stack py

import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.pipelines as pipelines
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs

Ensure the pipeline is built by calling the ‘build pipeline’ function.

pipeline.build_pipeline()

Create an SNS topic.

topic = sns.Topic(self, "MyTopic1")

Add a subscription to the topic. This specifies where the notifications are sent (Add the stakeholders’ email here).

topic.add_subscription(subs.EmailSubscription("[email protected]"))

Define a rule. This contains the source for notifications, the event trigger, and the target .

rule = notifications.NotificationRule(self, "NotificationRule", )

Assign the source the value pipeline.pipeline The first pipeline is the name of the CDK pipeline(variable) and the .pipeline is to show it is a pipeline(function).

source=pipeline.pipeline,

Define the events to be monitored. Specify notifications for when the pipeline starts, when it fails, when the execution succeeds, and finally when manual approval is needed.

events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],

For the complete list of supported event types for pipelines, see here
Finally, add the target. The target here is the topic created previously.

targets=[topic]

The combination of all the steps becomes:

pipeline.build_pipeline()
topic = sns.Topic(self, "MyTopic1")
topic.add_subscription(subs.EmailSubscription("[email protected]"))
rule = notifications.NotificationRule(self, "NotificationRule",
source=pipeline.pipeline,
events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],
targets=[topic]
)

Adding Manual Approval

Add the ManualApprovalStep import to the aws_cdk.pipelines import statement.

from aws_cdk.pipelines import (
CodePipeline,
CodePipelineSource,
ShellStep,
ManualApprovalStep
)

Add the ManualApprovalStep to the production stage. The code must be added to the add_stage() function.

 prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre = [ManualApprovalStep('PromoteToProduction')])

When a stage is added to a pipeline, you can specify the pre and post steps, which are arbitrary steps that run before or after the contents of the stage. You can use them to add validations like manual or automated gates to the pipeline. It is recommended to put manual approval gates in the set of pre steps, and automated approval gates in the set of post steps. So, the manual approval action is added as a pre step that runs after the pre-production stage and before the production stage .

The final version of the pipeline_stack.py becomes:

from constructs import Construct
import aws_cdk as cdk
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
from aws_cdk import (
    Stack,
    aws_codecommit as codecommit,
    aws_codepipeline as codepipeline,
    pipelines as pipelines,
    aws_codepipeline_actions as cpactions,
    
)
from aws_cdk.pipelines import (
    CodePipeline,
    CodePipelineSource,
    ShellStep,
    ManualApprovalStep
)


class WorkshopPipelineStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        
        # Creates a CodeCommit repository called 'WorkshopRepo'
        repo = codecommit.Repository(
            self, "WorkshopRepo", repository_name="WorkshopRepo",
            
        )
        
        #Create the Cdk pipeline
        pipeline = pipelines.CodePipeline(
            self,
            "Pipeline",
            
            synth=pipelines.ShellStep(
                "Synth",
                input=pipelines.CodePipelineSource.code_commit(repo, "main"),
                commands=[
                    "npm install -g aws-cdk",  # Installs the cdk cli on Codebuild
                    "pip install -r requirements.txt",  # Instructs Codebuild to install required packages
                    "npx cdk synth",
                ]
                
            ),
        )

        
         # Create the Pre-Prod Stage and its API endpoint
        deploy = WorkshopPipelineStage(self, "Pre-Prod")
        deploy_stage = pipeline.add_stage(deploy)
    
        deploy_stage.add_post(
            
            pipelines.ShellStep(
                "TestViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
            )
    
        
        )
        deploy_stage.add_post(
            pipelines.ShellStep(
                "TestAPIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create the Prod Stage with the Manual Approval Step
        prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre = [ManualApprovalStep('PromoteToProduction')])
        
        prod_stage.add_post(
            
            pipelines.ShellStep(
                "ViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
                
            )
            
        )
        prod_stage.add_post(
            pipelines.ShellStep(
                "APIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create The SNS Notification for the Pipeline
        
        pipeline.build_pipeline()
        
        topic = sns.Topic(self, "MyTopic")
        topic.add_subscription(subs.EmailSubscription("[email protected]"))
        rule = notifications.NotificationRule(self, "NotificationRule",
            source = pipeline.pipeline,
            events = ["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed", "codepipeline-pipeline-manual-approval-needed", "codepipeline-pipeline-manual-approval-succeeded"],
            targets=[topic]
            )

When a commit is made with git commit -am "Add manual Approval" and changes are pushed with git push, the pipeline automatically self-mutates to add the new approval stage.

Now when the developer pushes changes to update the build environment or the end user application, the pipeline execution stops at the point where the approval action was added. The pipeline won’t resume unless a manual approval action is taken.

Image showing the CDK pipeline with the added Manual Approval action on the AWS Management Console

Figure 2. This image shows the pipeline with the added Manual Approval action.

Since there is a notification rule that includes the approval action, an email notification is sent with the pipeline information and approval status to the stakeholder(s) subscribed to the SNS topic.

Image showing the SNS email notification sent when the pipeline starts

Figure 3. This image shows the SNS email notification sent when the pipeline starts.

After pushing the updates to the pipeline, the reviewer or stakeholder can use the AWS Management Console to access the pipeline to approve or deny changes based on their assessment of these changes. This process helps eliminate any potential issues or errors and ensures only changes deemed relevant are made.

Image showing the review action on the AWS Management Console that gives the stakeholder the ability to approve or reject any changes.

Figure 4. This image shows the review action that gives the stakeholder the ability to approve or reject any changes.

If a reviewer rejects the action, or if no approval response is received within seven days of the pipeline stopping for the review action, the pipeline status is “Failed.”

Image showing when a stakeholder rejects the action

Figure 5. This image depicts when a stakeholder rejects the action.

If a reviewer approves the changes, the pipeline continues its execution.

Image showing when a stakeholder approves the action

Figure 6. This image depicts when a stakeholder approves the action.

Considerations

It is important to consider any potential drawbacks before integrating a manual approval process into a CDK pipeline. one such consideration is its implementation may delay the delivery of updates to end users. An example of this is business hours limitation. The pipeline process might be constrained by the availability of stakeholders during business hours. This can result in delays if changes are made outside regular working hours and require approval when stakeholders are not immediately accessible.

Clean up

To avoid incurring future charges, delete the resources. Use cdk destroy via the command line to delete the created stack.

Conclusion

Adding notifications and manual approval to CDK Pipelines provides better visibility and control over the changes made to the pipeline environment. These features ideally complement the existing automated checks to ensure that all updates are reviewed before deployment. This reduces the risk of potential issues arising from bugs or errors. The ability to approve or deny changes through the AWS Management Console makes the review process simple and straightforward. Additionally, SNS notifications keep stakeholders updated on the status of the pipeline, ensuring a smooth and seamless deployment process.

Integrating IBM MQ with Amazon SQS and Amazon SNS using Apache Camel

2023-08-15 Pascal Vogel

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/integrating-ibm-mq-with-amazon-sqs-and-amazon-sns-using-apache-camel/

This post is written by Joaquin Rinaudo, Principal Security Consultant and Gezim Musliaj, DevOps Consultant.

IBM MQ is a message-oriented middleware (MOM) product used by many enterprise organizations, including global banks, airlines, and healthcare and insurance companies.

Customers often ask us for guidance on how they can integrate their existing on-premises MOM systems with new applications running in the cloud. They’re looking for a cost-effective, scalable and low-effort solution that enables them to send and receive messages from their cloud applications to these messaging systems.

This blog post shows how to set up a bi-directional bridge from on-premises IBM MQ to Amazon MQ, Amazon Simple Queue Service (Amazon SQS), and Amazon Simple Notification Service (Amazon SNS).

This allows your producer and consumer applications to integrate using fully managed AWS messaging services and Apache Camel. Learn how to deploy such a solution and how to test the running integration using SNS, SQS, and a demo IBM MQ cluster environment running on Amazon Elastic Container Service (ECS) with AWS Fargate.

This solution can also be used as part of a step-by-step migration using the approach described in the blog post Migrating from IBM MQ to Amazon MQ using a phased approach.

Solution overview

The integration consists of an Apache Camel broker cluster that bi-directionally integrates an IBM MQ system and target systems, such as Amazon MQ running ActiveMQ, SNS topics, or SQS queues.

In the following example, AWS services, in this case AWS Lambda and SQS, receive messages published to IBM MQ via an SNS topic:

The cloud message consumers (Lambda and SQS) subscribe to the solution’s target SNS topic.
The Apache Camel broker connects to IBM MQ using secrets stored in AWS Secrets Manager and reads new messages from the queue using IBM MQ’s Java library. Only IBM MQ messages are supported as a source.
The Apache Camel broker publishes these new messages to the target SNS topic. It uses the Amazon SNS Extended Client Library for Java to store any messages larger than 256 KB in an Amazon Simple Storage Service (Amazon S3) bucket.
Apache Camel stores any message that cannot be delivered to SNS after two retries in an S3 dead letter queue bucket.

The next diagram demonstrates how the solution sends messages back from an SQS queue to IBM MQ:

A sample message producer using Lambda sends messages to an SQS queue. It uses the Amazon SQS Extended Client Library for Java to send messages larger than 256 KB.
The Apache Camel broker receives the messages published to SQS, using the SQS Extended Client Library if needed.
The Apache Camel broker sends the message to the IBM MQ target queue.
As before, the broker stores messages that cannot be delivered to IBM MQ in the S3 dead letter queue bucket.

A phased live migration consists of two steps:

Deploy the broker service to allow reading messages from and writing to existing IBM MQ queues.
Once the consumer or producer is migrated, migrate its counterpart to the newly selected service (SNS or SQS).

Next, you will learn how to set up the solution using the AWS Cloud Development Kit (AWS CDK).

Deploying the solution

Prerequisites

AWS CDK
TypeScript
Java
Docker
Git
Yarn

Step 1: Cloning the repository

Clone the repository using git:

git clone https://github.com/aws-samples/aws-ibm-mq-adapter

Step 2: Setting up test IBM MQ credentials

This demo uses IBM MQ’s mutual TLS authentication. To do this, you must generate X.509 certificates and store them in AWS Secrets Manager by running the following commands in the app folder:

Generate X.509 certificates:
```
./deploy.sh generate_secrets
```
Set up the secrets required for the Apache Camel broker (replace <integration-name> with, for example, dev):
```
./deploy.sh create_secrets broker <integration-name>
```
Set up secrets for the mock IBM MQ system:
```
./deploy.sh create_secrets mock
```
Update the cdk.json file with the secrets ARN output from the previous commands:
- IBM_MOCK_PUBLIC_CERT_ARN
- IBM_MOCK_PRIVATE_CERT_ARN
- IBM_MOCK_CLIENT_PUBLIC_CERT_ARN
- IBMMQ_TRUSTSTORE_ARN
- IBMMQ_TRUSTSTORE_PASSWORD_ARN
- IBMMQ_KEYSTORE_ARN
- IBMMQ_KEYSTORE_PASSWORD_ARN

If you are using your own IBM MQ system and already have X.509 certificates available, you can use the script to upload those certificates to AWS Secrets Manager after running the script.

Step 3: Configuring the broker

The solution deploys two brokers, one to read messages from the test IBM MQ system and one to send messages back. A separate Apache Camel cluster is used per integration to support better use of Auto Scaling functionality and to avoid issues across different integration operations (consuming and reading messages).

Update the cdk.json file with the following values:

accountId: AWS account ID to deploy the solution to.
region: name of the AWS Region to deploy the solution to.
defaultVPCId: specify a VPC ID for an existing VPC in the AWS account where the broker and mock are deployed.
allowedPrincipals: add your account ARN (e.g., arn:aws:iam::123456789012:root) to allow this AWS account to send messages to and receive messages from the broker. You can use this parameter to set up cross-account relationships for both SQS and SNS integrations and support multiple consumers and producers.

Step 4: Bootstrapping and deploying the solution

Make sure you have the correct AWS_PROFILE and AWS_REGION environment variables set for your development account.
Run yarn cdk bootstrap –-qualifier mq <aws://<account-id>/<region> to bootstrap CDK.
Run yarn install to install CDK dependencies.
Finally, execute yarn cdk deploy '*-dev' –-qualifier mq --require-approval never to deploy the solution to the dev environment.

Step 5: Testing the integrations

Use AWS System Manager Session Manager and port forwarding to establish tunnels to the test IBM MQ instance to access the web console and send messages manually. For more information on port forwarding, see Amazon EC2 instance port forwarding with AWS System Manager.

In a command line terminal, make sure you have the correct AWS_PROFILE and AWS_REGION environment variables set for your development account.
In addition, set the following environment variables:
- IBM_ENDPOINT: endpoint for IBM MQ. Example: network load balancer for IBM mock mqmoc-mqada-1234567890.elb.eu-west-1.amazonaws.com.
- BASTION_ID: instance ID for the bastion host. You can retrieve this output from Step 4: Bootstrapping and deploying the solution listed after the mqBastionStack deployment.
Use the following command to set the environment variables:
```
export IBM_ENDPOINT=mqmoc-mqada-1234567890.elb.eu-west-1.amazonaws.com
export BASTION_ID=i-0a1b2c3d4e5f67890
```
Run the script test/connect.sh.
Log in to the IBM web console via https://127.0.0.1:9443/admin using the default IBM user (admin) and the password stored in AWS Secrets Manager as mqAdapterIbmMockAdminPassword.

Sending data from IBM MQ and receiving it in SNS:

In the IBM MQ console, access the local queue manager QM1 and DEV.QUEUE.1.
Send a message with the content Hello AWS. This message will be processed by AWS Fargate and published to SNS.
Access the SQS console and choose the snsIntegrationStack-dev-2 prefix queue. This is an SQS queue subscribed to the SNS topic for testing.
Select Send and receive message.
Select Poll for messages to see the Hello AWS message previously sent to IBM MQ.

Sending data back from Amazon SQS to IBM MQ:

Access the SQS console and choose the queue with the prefix sqsPublishIntegrationStack-dev-3-dev.
Select Send and receive messages.
For Message Body, add Hello from AWS.
Choose Send message.
In the IBM MQ console, access the local queue manager QM1 and DEV.QUEUE.2 to find your message listed under this queue.

Step 6: Cleaning up

Run cdk destroy '*-dev' to destroy the resources deployed as part of this walkthrough.

Conclusion

In this blog, you learned how you can exchange messages between IBM MQ and your cloud applications using Amazon SQS and Amazon SNS.

If you’re interested in getting started with your own integration, follow the README file in the GitHub repository. If you’re migrating existing applications using industry-standard APIs and protocols such as JMS, NMS, or AMQP 1.0, consider integrating with Amazon MQ using the steps provided in the repository.

If you’re interested in running Apache Camel in Kubernetes, you can also adapt the architecture to use Apache Camel K instead.

For more serverless learning resources, visit Serverless Land.

Monitor data pipelines in a serverless data lake

2023-08-09 Virendhar Sivaraman

Post Syndicated from Virendhar Sivaraman original https://aws.amazon.com/blogs/big-data/monitor-data-pipelines-in-a-serverless-data-lake/

AWS serverless services, including but not limited to AWS Lambda, AWS Glue, AWS Fargate, Amazon EventBridge, Amazon Athena, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), and Amazon Simple Storage Service (Amazon S3), have become the building blocks for any serverless data lake, providing key mechanisms to ingest and transform data without fixed provisioning and the persistent need to patch the underlying servers. The combination of a data lake in a serverless paradigm brings significant cost and performance benefits. The advent of rapid adoption of serverless data lake architectures—with ever-growing datasets that need to be ingested from a variety of sources, followed by complex data transformation and machine learning (ML) pipelines—can present a challenge. Similarly, in a serverless paradigm, application logs in Amazon CloudWatch are sourced from a variety of participating services, and traversing the lineage across logs can also present challenges. To successfully manage a serverless data lake, you require mechanisms to perform the following actions:

Reinforce data accuracy with every data ingestion
Holistically measure and analyze ETL (extract, transform, and load) performance at the individual processing component level
Proactively capture log messages and notify failures as they occur in near-real time

In this post, we will walk you through a solution to efficiently track and analyze ETL jobs in a serverless data lake environment. By monitoring application logs, you can gain insights into job execution, troubleshoot issues promptly to ensure the overall health and reliability of data pipelines.

Overview of solution

The serverless monitoring solution focuses on achieving the following goals:

Capture state changes across all steps and tasks in the data lake
Measure service reliability across a data lake
Quickly notify operations of failures as they happen

To illustrate the solution, we create a serverless data lake with a monitoring solution. For simplicity, we create a serverless data lake with the following components:

Storage layer – Amazon S3 is the natural choice, in this case with the following buckets:
- Landing – Where raw data is stored
- Processed – Where transformed data is stored
Ingestion layer – For this post, we use Lambda and AWS Glue for data ingestion, with the following resources:
- Lambda functions – Two Lambda functions that run to simulate a success state and failure state, respectively
- AWS Glue crawlers – Two AWS Glue crawlers that run to simulate a success state and failure state, respectively
- AWS Glue jobs – Two AWS Glue jobs that run to simulate a success state and failure state, respectively
Reporting layer – An Athena database to persist the tables created via the AWS Glue crawlers and AWS Glue jobs
Alerting layer – Slack is used to notify stakeholders

The serverless monitoring solution is devised to be loosely coupled as plug-and-play components that complement an existing data lake. The Lambda-based ETL tasks state changes are tracked using AWS Lambda Destinations. We have used an SNS topic for routing both success and failure states for the Lambda-based tasks. In the case of AWS Glue-based tasks, we have configured EventBridge rules to capture state changes. These event changes are also routed to the same SNS topic. For demonstration purposes, this post only provides state monitoring for Lambda and AWS Glue, but you can extend the solution to other AWS services.

The following figure illustrates the architecture of the solution.

The architecture contains the following components:

EventBridge rules – EventBridge rules that capture the state change for the ETL tasks—in this case AWS Glue tasks. This can be extended to other supported services as the data lake grows.
SNS topic – An SNS topic that serves to catch all state events from the data lake.
Lambda function – The Lambda function is the subscriber to the SNS topic. It’s responsible for analyzing the state of the task run to do the following:
- Persist the status of the task run.
- Notify any failures to a Slack channel.
Athena database – The database where the monitoring metrics are persisted for analysis.

Deploy the solution

The source code to implement this solution uses AWS Cloud Development Kit (AWS CDK) and is available on the GitHub repo monitor-serverless-datalake. This AWS CDK stack provisions required network components and the following:

Three S3 buckets (the bucket names are prefixed with the AWS account name and Regions, for example, the landing bucket is <aws-account-number>-<aws-region>-landing):
- Landing
- Processed
- Monitor
Three Lambda functions:
- datalake-monitoring-lambda
- lambda-success
- lambda-fail
Two AWS Glue crawlers:
- glue-crawler-success
- glue-crawler-fail
Two AWS Glue jobs:
- glue-job-success
- glue-job-fail
An SNS topic named datalake-monitor-sns
Three EventBridge rules:
- glue-monitor-rule
- event-rule-lambda-fail
- event-rule-lambda-success
An AWS Secrets Manager secret named datalake-monitoring
Athena artifacts:
- monitor database
- monitor-table table

You can also follow the instructions in the GitHub repo to deploy the serverless monitoring solution. It takes about 10 minutes to deploy this solution.

Connect to a Slack channel

We still need a Slack channel to which the alerts are delivered. Complete the following steps:

Set up a workflow automation to route messages to the Slack channel using webhooks.
Note the webhook URL.

The following screenshot shows the field names to use.

The following is a sample message for the preceding template.

On the Secrets Manager console, navigate to the datalake-monitoring secret.
Add the webhook URL to the slack_webhook secret.

Load sample data

The next step is to load some sample data. Copy the sample data files to the landing bucket using the following command:

aws s3 cp --recursive s3://awsglue-datasets/examples/us-legislators s3://<AWS_ACCCOUNT>-<AWS_REGION>-landing/legislators

In the next sections, we show how Lambda functions, AWS Glue crawlers, and AWS Glue jobs work for data ingestion.

Test the Lambda functions

On the EventBridge console, enable the rules that trigger the lambda-success and lambda-fail functions every 5 minutes:

event-rule-lambda-fail
event-rule-lambda-success

After a few minutes, the failure events are relayed to the Slack channel. The following screenshot shows an example message.

Disable the rules after testing to avoid repeated messages.

Test the AWS Glue crawlers

On the AWS Glue console, navigate to the Crawlers page. Here you can start the following crawlers:

glue-crawler-success
glue-crawler-fail

In a minute, the glue-crawler-fail crawler’s status changes to Failed, which triggers a notification in Slack in near-real time.

Test the AWS Glue jobs

On the AWS Glue console, navigate to the Jobs page, where you can start the following jobs:

glue-job-success
glue-job-fail

In a few minutes, the glue-job-fail job status changes to Failed, which triggers a notification in Slack in near-real time.

Analyze the monitoring data

The monitoring metrics are persisted in Amazon S3 for analysis and can be used of historical analysis.

On the Athena console, navigate to the monitor database and run the following query to find the service that failed the most often:

SELECT service_type, count(*) as "fail_count"
FROM "monitor"."monitor"
WHERE event_type = 'failed'
group by service_type
order by fail_count desc;

Over time with rich observability data – time series based monitoring data analysis will yield interesting findings.

Clean up

The overall cost of the solution is less than one dollar but to avoid future costs, make sure to clean up the resources created as part of this post.

Summary

The post provided an overview of a serverless data lake monitoring solution that you can configure and deploy to integrate with enterprise serverless data lakes in just a few hours. With this solution, you can monitor a serverless data lake, send alerts in near-real time, and analyze performance metrics for all ETL tasks operating in the data lake. The design was intentionally kept simple to demonstrate the idea; you can further extend this solution with Athena and Amazon QuickSight to generate custom visuals and reporting. Check out the GitHub repo for a sample solution and further customize it for your monitoring needs.

About the Authors

Virendhar (Viru) Sivaraman is a strategic Senior Big Data & Analytics Architect with Amazon Web Services. He is passionate about building scalable big data and analytics solutions in the cloud. Besides work, he enjoys spending time with family, hiking & mountain biking.

Vivek Shrivastava is a Principal Data Architect, Data Lake in AWS Professional Services. He is a Bigdata enthusiast and holds 14 AWS Certifications. He is passionate about helping customers build scalable and high-performance data analytics solutions in the cloud. In his spare time, he loves reading and finds areas for home automation.

Best practices for implementing event-driven architectures in your organization

2023-07-24 Emanuele Levi

Post Syndicated from Emanuele Levi original https://aws.amazon.com/blogs/architecture/best-practices-for-implementing-event-driven-architectures-in-your-organization/

Event-driven architectures (EDA) are made up of components that detect business actions and changes in state, and encode this information in event notifications. Event-driven patterns are becoming more widespread in modern architectures because:

they are the main invocation mechanism in serverless patterns.
they are the preferred pattern for decoupling microservices, where asynchronous communications and event persistence are paramount.
they are widely adopted as a loose-coupling mechanism between systems in different business domains, such as third-party or on-premises systems.

Event-driven patterns have the advantage of enabling team independence through the decoupling and decentralization of responsibilities. This decentralization trend in turn, permits companies to move with unprecedented agility, enhancing feature development velocity.

In this blog, we’ll explore the crucial components and architectural decisions you should consider when adopting event-driven patterns, and provide some guidance on organizational structures.

Division of responsibilities

The communications flow in EDA (see What is EDA?) is initiated by the occurrence of an event. Most production-grade event-driven implementations have three main components, as shown in Figure 1: producers, message brokers, and consumers.

Figure 1. Three main components of an event-driven architecture

Producers, message brokers, and consumers typically assume the following roles:

Producers

Producers are responsible for publishing the events as they happen. They are the owners of the event schema (data structure) and semantics (meaning of the fields, such as the meaning of the value of an enum field). As this is the only contract (coupling) between producers and the downstream components of the system, the schema and its semantics are crucial in EDA. Producers are responsible for implementing a change management process, which involves both non-breaking and breaking changes. With introduction of breaking changes, consumers are able to negotiate the migration process with producers.

Producers are “consumer agnostic”, as their boundary of responsibility ends when an event is published.

Message brokers

Message brokers are responsible for the durability of the events, and will keep an event available for consumption until it is successfully processed. Message brokers ensure that producers are able to publish events for consumers to consume, and they regulate access and permissions to publish and consume messages.

Message brokers are largely “events agnostic”, and do not generally access or interpret the event content. However, some systems provide a routing mechanism based on the event payload or metadata.

Consumers

Consumers are responsible for consuming events, and own the semantics of the effect of events. Consumers are usually bounded to one business context. This means the same event will have different effect semantics for different consumers. Crucial architectural choices when implementing a consumer involve the handling of unsuccessful message deliveries or duplicate messages. Depending on the business interpretation of the event, when recovering from failure a consumer might permit duplicate events, such as with an idempotent consumer pattern.

Crucially, consumers are “producer agnostic”, and their boundary of responsibility begins when an event is ready for consumption. This allows new consumers to onboard into the system without changing the producer contracts.

Team independence

In order to enforce the division of responsibilities, companies should organize their technical teams by ownership of producers, message brokers, and consumers. Although the ownership of producers and consumers is straightforward in an EDA implementation, the ownership of the message broker may not be. Different approaches can be taken to identify message broker ownership depending on your organizational structure.

Decentralized ownership

Figure 2. Ownership of the message broker in a decentralized ownership organizational structure

In a decentralized ownership organizational structure (see Figure 2), the teams producing events are responsible for managing their own message brokers and the durability and availability of the events for consumers.

The adoption of topic fanout patterns based on Amazon Simple Queue Service (SQS) and Amazon Simple Notification Service (SNS) (see Figure 3), can help companies implement a decentralized ownership pattern. A bus-based pattern using Amazon EventBridge can also be similarly utilized (see Figure 4).

Figure 3. Topic fanout pattern based on Amazon SQS and Amazon SNS

Figure 4. Events bus pattern based on Amazon EventBridge

The decentralized ownership approach has the advantage of promoting team independence, but it is not a fit for every organization. In order to be implemented effectively, a well-established DevOps culture is necessary. In this scenario, the producing teams are responsible for managing the message broker infrastructure and the non-functional requirements standards.

Centralized ownership

Figure 5. Ownership of the message broker in a centralized ownership organizational structure

In a centralized ownership organizational structure, a central team (we’ll call it the platform team) is responsible for the management of the message broker (see Figure 5). Having a specialized platform team offers the advantage of standardized implementation of non-functional requirements, such as reliability, availability, and security. One disadvantage is that the platform team is a single point of failure in both the development and deployment lifecycle. This could become a bottleneck and put team independence and operational efficiency at risk.

Figure 6. Streaming pattern based on Amazon MSK and Kinesis Data Streams

On top of the implementation patterns mentioned in the previous section, the presence of a dedicated team makes it easier to implement streaming patterns. In this case, a deeper understanding on how the data is partitioned and how the system scales is required. Streaming patterns can be implemented using services such as Amazon Managed Streaming for Apache Kafka (MSK) or Amazon Kinesis Data Streams (see Figure 6).

Best practices for implementing event-driven architectures in your organization

The centralized and decentralized ownership organizational structures enhance team independence or standardization of non-functional requirements respectively. However, they introduce possible limits to the growth of the engineering function in a company. Inspired by the two approaches, you can implement a set of best practices which are aimed at minimizing those limitations.

Figure 7. Best practices for implementing event-driven architectures

Introduce a cloud center of excellence (CCoE). A CCoE standardizes non-functional implementation across engineering teams. In order to promote a strong DevOps culture, the CCoE should not take the form of an external independent team, but rather be a collection of individual members representing the various engineering teams.
Decentralize team ownership. Decentralize ownership and maintenance of the message broker to producing teams. This will maximize team independence and agility. It empowers the team to use the right tool for the right job, as long as they conform to the CCoE guidelines.
Centralize logging standards and observability strategies. Although it is a best practice to decentralize team ownership of the components of an event-driven architecture, logging standards and observability strategies should be centralized and standardized across the engineering function. This centralization provides for end-to-end tracing of requests and events, which are powerful diagnosis tools in case of any failure.

Conclusion

In this post, we have described the main architectural components of an event-driven architecture, and identified the ownership of the message broker as one of the most important architectural choices you can make. We have described a centralized and decentralized organizational approach, presenting the strengths of the two approaches, as well as the limits they impose on the growth of your engineering organization. We have provided some best practices you can implement in your organization to minimize these limitations.

Further reading:
To start your journey building event-driven architectures in AWS, explore the following:

For bus-based and topic-based patterns, see What is an Event-Driven Architecture?
For streaming patterns, see What is Apache Kafka?, and Amazon Kinesis.

Detecting and stopping recursive loops in AWS Lambda functions

2023-07-10 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/detecting-and-stopping-recursive-loops-in-aws-lambda-functions/

This post is written by Pawan Puthran, Principal Serverless Specialist TAM, Aneel Murari, Senior Serverless Specialist Solution Architect, and Shree Shrikhande, Senior AWS Lambda Product Manager.

AWS Lambda is announcing a recursion control to detect and stop Lambda functions running in a recursive or infinite loop.

At launch, this feature is available for Lambda integrations with Amazon Simple Queue Service (Amazon SQS), Amazon SNS, or when invoking functions directly using the Lambda invoke API. Lambda now detects functions that appear to be running in a recursive loop and drops requests after exceeding 16 invocations.

This can help reduce costs from unexpected Lambda function invocations because of recursion. You receive notifications about this action through the AWS Health Dashboard, email, or by configuring Amazon CloudWatch Alarms.

Overview

You can invoke Lambda functions in multiple ways. AWS services generate events that invoke Lambda functions, and Lambda functions can send messages to other AWS services. In most architectures, the service or resource that invokes a Lambda function should be different from the service or resource that the function outputs to. Because of misconfiguration or coding bugs, a function can send a processed event to the same service or resource that invokes the Lambda function, causing a recursive loop.

Lambda now detects the function running in a recursive loop between supported services, after exceeding 16 invocations. It returns a RecursiveInvocationException to the caller. There is no additional charge for this feature. For asynchronous invokes, Lambda sends the event to a dead-letter queue or on-failure destination, if one is configured.

The following is an example of an order processing system.

Order processing system

A new order information message is sent to the source SQS queue.
Lambda consumes the message from the source queue using an ESM.
The Lambda function processes the message and sends the updated orders message to a destination SQS queue using the SQS SendMessage API.
The source queue has a dead-letter queue(DLQ) configured for handling any failed or unprocessed messages.
Because of a misconfiguration, the Lambda function sends the message to the source SQS queue instead of the destination queue. This causes a recursive loop of Lambda function invocations.

To explore sample code for this example, see the GitHub repo.

In the preceding example, after 16 invocations, Lambda throws a RecursiveInvocationException to the ESM. The ESM stops invoking the Lambda function and, once the maxReceiveCount is exceeded, SQS moves the message to the source queues configured DLQ.

You receive an AWS Health Dashboard notification with steps to troubleshoot the function.

AWS Health Dashboard notification

You also receive an email notification to the registered email address on the account.

Email notification

Lambda emits a RecursiveInvocationsDropped CloudWatch metric, which you can view in the CloudWatch console.

RecursiveInvocationsDropped CloudWatch metric

How does Lambda detect recursion?

For Lambda to detect recursive loops, your function must use one of the supported AWS SDK versions or higher.

Lambda uses an AWS X-Ray trace header primitive called “Lineage” to track the number of times a function has been invoked with an event. When your function code sends an event using a supported AWS SDK version, Lambda increments the counter in the lineage header. If your function is then invoked with the same triggering event more than 16 times, Lambda stops the next invocation for that event. You do not need to configure active X-Ray tracing for this feature to work.

An example lineage header looks like:

X-Amzn-Trace-Id:Root=1-645f7998-4b1e232810b0bb733dba2eab;Parent=5be88d12eefc1fc0;Sampled=1;Lineage=43e12f0f:5

43e12f0f is the hash of a resource, in this case a Lambda function. 5 is the number of times this function has been invoked with the same event. The logic of hash generation, encoding, and size of the lineage header may change in the future. You should not design any application functionality based on this.

When using an ESM to consume messages from SQS, after the maxReceiveCount value is exceeded, the message is sent to the source queue’s configured DLQ. When Lambda detects a recursive loop and drops subsequent invocations, it returns a RecursiveInvocationException to the ESM. This increments the maxReceiveCount value. When the ESM auto retries to process events, based on the error handling configuration, these retries are not considered recursive invocations.

When using SQS, you can also batch multiple messages into one Lambda event. Where the message batch size is greater than 1, Lambda uses the maximum lineage value within the batch of messages. It drops the entire batch if the value exceeds 16.

Recursion detection in action

You can deploy a sample application example in the GitHub repo to test Lambda recursive loop detection. The application includes a Lambda function that reads from an SQS queue and writes messages back to the same SQS queue.

As prerequisites, you must install:

AWS Command Line Interface (AWS CLI)
AWS Serverless Application Model (AWS SAM) (version 1.81.1 or above)
Docker

To deploy the application:

1. Set your AWS Region:

export REGION=<your AWS region>

1. Clone the GitHub repository

git clone https://github.com/aws-samples/aws-lambda-recursion-detection-sample.git
cd aws-lambda-recursion-detection-sample

1. Use AWS SAM to build and deploy the resources to your AWS account. Enter a stack name, such as lambda-recursion, when prompted. Accept the remaining default values.

sam build –-use-container
sam deploy --guided --region $REGION

To test the application:

1. Save the name of the SQS queue in a local environment variable:

SOURCE_SQS_URL=$(aws cloudformation describe-stacks \ --region $REGION \ --stack-name lambda-recursion \ --query 'Stacks[0].Outputs[?OutputKey==`SourceSQSqueueURL`].OutputValue' --output text)

Publish a message to the source SQS queue:

aws sqs send-message --queue-url $SOURCE_SQS_URL --message-body '{"orderId":"111","productName":"Bolt","orderStatus":"Submitted"}' --region $REGION

This invokes the Lambda function, which writes the message back to the queue.

To verify that Lambda has detected the recursion:

Navigate to the CloudWatch console. Choose All Metrics under Metrics in the left-hand panel and search for RecursiveInvocationsDropped.

Find RecursiveInvocationsDropped.
Choose Lambda > By Function Name and choose RecursiveInvocationsDropped for the function you created. Under Graphed metrics, change the statistic to sum and Period to 1 minute. You see one record. Refresh if you don’t see the metric after a few seconds.

Metrics sum view

Actions to take when Lambda stops a recursive loop

When you receive a notification regarding recursion in your account, the following steps can help address the issue.

To stop further invoke attempts while you fix the underlying configuration issue, set the function concurrency to 0. This acts as an off switch for the Lambda function. You can choose the “Throttle” button in the Lambda console or use the PutFunctionConcurrency API to set the function concurrency to 0.
You can also disable or delete the event source mapping or trigger for the Lambda function.
Check your Lambda function code and configuration for any code defects that create loops. For example, check your environment variables to ensure you are not using the same SQS queue or SNS topic as source and target.
If an SQS Queue is the event source for your Lambda function, configure a DLQ on the source queue.
If an SNS topic is the event source, configure an On-Failure Destination for the Lambda function.

Disabling recursion detection

You may have valid use-cases where Lambda recursion is intentional as part of your design. In this case, use caution and implement suitable guardrails to prevent unexpected charges to your account. To learn more about best practices for using recursive invocation patterns, see Recursive patterns that cause run-away Lambda functions in the AWS Lambda Operator Guide.

This feature is turned on by default to stop recursive loops. To request turning it off for your account, reach out to AWS Support.

Conclusion

Lambda recursion control for SQS and SNS automatically detects and stops functions running in a recursive or infinite loop. This can be due to misconfiguration or coding errors. Recursion control helps reduce unexpected usage with Lambda and downstream services. The post also explains how Lambda detects and stops recursive loops and notifies you through AWS Health Dashboard to troubleshoot the function.

To learn more about the feature, visit the AWS Lambda Developer Guide.

For more serverless learning resources, visit Serverless Land

Implementing AWS Lambda error handling patterns

2023-07-06 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/implementing-aws-lambda-error-handling-patterns/

This post is written by Jeff Chen, Principal Cloud Application Architect, and Jeff Li, Senior Cloud Application Architect

Event-driven architectures are an architecture style that can help you boost agility and build reliable, scalable applications. Splitting an application into loosely coupled services can help each service scale independently. A distributed, loosely coupled application depends on events to communicate application change states. Each service consumes events from other services and emits events to notify other services of state changes.

Handling errors becomes even more important when designing distributed applications. A service may fail if it cannot handle an invalid payload, dependent resources may be unavailable, or the service may time out. There may be permission errors that can cause failures. AWS services provide many features to handle error conditions, which you can use to improve the resiliency of your applications.

This post explores three use-cases and design patterns for handling failures.

Overview

AWS Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and Amazon EventBridge are core building blocks for building serverless event-driven applications.

The post Understanding the Different Ways to Invoke Lambda Functions lists the three different ways of invoking a Lambda function: synchronous, asynchronous, and poll-based invocation. For a list of services and which invocation method they use, see the documentation.

Lambda’s integration with Amazon API Gateway is an example of a synchronous invocation. A client makes a request to API Gateway, which sends the request to Lambda. API Gateway waits for the function response and returns the response to the client. There are no built-in retries or error handling. If the request fails, the client attempts the request again.

Lambda’s integration with SNS and EventBridge are examples of asynchronous invocations. SNS, for example, sends an event to Lambda for processing. When Lambda receives the event, it places it on an internal event queue and returns an acknowledgment to SNS that it has received the message. Another Lambda process reads events from the internal queue and invokes your Lambda function. If SNS cannot deliver an event to your Lambda function, the service automatically retries the same operation based on a retry policy.

Lambda’s integration with SQS uses poll-based invocations. Lambda runs a fleet of pollers that poll your SQS queue for messages. The pollers read the messages in batches and invoke your Lambda function once per batch.

You can apply this pattern in many scenarios. For example, your operational application can add sales orders to an operational data store. You may then want to load the sales orders to your data warehouse periodically so that the information is available for forecasting and analysis. The operational application can batch completed sales as events and place them on an SQS queue. A Lambda function can then process the events and load the completed sale records into your data warehouse.

If your function processes the batch successfully, the pollers delete the messages from the SQS queue. If the batch is not successfully processed, the pollers do not delete the messages from the queue. Once the visibility timeout expires, the messages are available again to be reprocessed. If the message retention period expires, SQS deletes the message from the queue.

The following table shows the invocation types and retry behavior of the AWS services mentioned.

AWS service example	Invocation type	Retry behavior
Amazon API Gateway	Synchronous	No built-in retry, client attempts retries.
Amazon SNS Amazon EventBridge	Asynchronous	Built-in retries with exponential backoff.
Amazon SQS	Poll-based	Retries after visibility timeout expires until message retention period expires.

There are a number of design patterns to use for poll-based and asynchronous invocation types to retain failed messages for additional processing. These patterns can help you recover from delivery or processing failures.

You can explore the patterns and test the scenarios by deploying the code from this repository which uses the AWS Cloud Development Kit (AWS CDK) using Python.

Lambda poll-based invocation pattern

When using Lambda with SQS, if Lambda isn’t able to process the message and the message retention period expires, SQS drops the message. Failure to process the message can be due to function processing failures, including time-outs or invalid payloads. Processing failures can also occur when the destination function does not exist, or has incorrect permissions.

You can configure a separate dead-letter queue (DLQ) on the source queue for SQS to retain the dropped message. A DLQ preserves the original message and is useful for analyzing root causes, handling error conditions properly, or sending notifications that require manual interventions. In the poll-based invocation scenario, the Lambda function itself does not maintain a DLQ. It relies on the external DLQ configured in SQS. For more information, see Using Lambda with Amazon SQS.

The following shows the design pattern when you configure Lambda to poll events from an SQS queue and invoke a Lambda function.

Lambda synchronously polling catches of messages from SQS

Lambda synchronously polling batches of messages from SQS

To explore this pattern, deploy the code in this repository. Once deployed, you can use this instruction to test the pattern with the happy and unhappy paths.

Lambda asynchronous invocation pattern

With asynchronous invokes, there are two failure aspects to consider when using Lambda. The event source cannot deliver the message to Lambda and the Lambda function errors when processing the event.

Event sources vary in how they handle failures delivering messages to Lambda. If SNS or EventBridge cannot send the event to Lambda after exhausting all their retry attempts, the service drops the event. You can configure a DLQ on an SNS topic or EventBridge event bus to hold the dropped event. This works in the same way as the poll-based invocation pattern with SQS.

Lambda functions may then error due to input payload syntax errors, duration time-outs, or the function throws an exception such as a data resource not available.

For asynchronous invokes, you can configure how long Lambda retains an event in its internal queue, up to 6 hours. You can also configure how many times Lambda retries when the function errors, between 0 and 2. Lambda discards the event when the maximum age passes or all retry attempts fail. To retain a copy of discarded events, you can configure either a DLQ or, preferably, a failed-event destination as part of your Lambda function configuration.

A Lambda destination enables you to specify what to do next if an asynchronous invocation succeeds or fails. You can configure a destination to send invocation records to SQS, SNS, EventBridge, or another Lambda function. Destinations are preferred for failure processing as they support additional targets and include additional information. A DLQ holds the original failed event. With a destination, Lambda also passes details of the function’s response in the invocation record. This includes stack traces, which can be useful for analyzing the root cause.

Using both a DLQ and Lambda destinations

You can apply this pattern in many scenarios. For example, many of your applications may contain customer records. To comply with the California Consumer Privacy Act (CCPA), different organizations may need to delete records for a particular customer. You can set up a consumer delete SNS topic. Each organization creates a Lambda function, which processes the events published by the SNS topic and deletes customer records in its managed applications.

The following shows the design pattern when you configure an SNS topic as the event source for a Lambda function, which uses destination queues for success and failure process.

SNS topic as event source for Lambda

You configure a DLQ on the SNS topic to capture messages that SNS cannot deliver to Lambda. When Lambda invokes the function, it sends details of the successfully processed messages to an on-success SQS destination. You can use this pattern to route an event to multiple services for simpler use cases. For orchestrating multiple services, AWS Step Functions is a better design choice.

Lambda can also send details of unsuccessfully processed messages to an on-failure SQS destination.

A variant of this pattern is to replace an SQS destination with an EventBridge destination so that multiple consumers can process an event based on the destination.

To explore how to use an SQS DLQ and Lambda destinations, deploy the code in this repository. Once deployed, you can use this instruction to test the pattern with the happy and unhappy paths.

Using a DLQ

Although destinations is the preferred method to handle function failures, you can explore using DLQs.

The following shows the design pattern when you configure an SNS topic as the event source for a Lambda function, which uses SQS queues for failure process.

Lambda invoked asynchonously

You configure a DLQ on the SNS topic to capture the messages that SNS cannot deliver to the Lambda function. You also configure a separate DLQ for the Lambda function. Lambda saves an unsuccessful event to this DLQ after Lambda cannot process the event after maximum retry attempts.

To explore how to use a Lambda DLQ, deploy the code in this repository. Once deployed, you can use this instruction to test the pattern with happy and unhappy paths.

Conclusion

This post explains three patterns that you can use to design resilient event-driven serverless applications. Error handling during event processing is an important part of designing serverless cloud applications.

You can deploy the code from the repository to explore how to use poll-based and asynchronous invocations. See how poll-based invocations can send failed messages to a DLQ. See how to use DLQs and Lambda destinations to route and handle unsuccessful events.

Learn more about event-driven architecture on Serverless Land.

Monitor Amazon SNS-based applications end-to-end with AWS X-Ray active tracing

2023-05-11 Pascal Vogel

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/monitor-amazon-sns-based-applications-end-to-end-with-aws-x-ray-active-tracing/

This post is written by Daniel Lorch, Senior Consultant and David Mbonu, Senior Solutions Architect.

Amazon Simple Notification Service (Amazon SNS), a messaging service that provides high-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications, now supports active tracing with AWS X-Ray.

With AWS X-Ray active tracing enabled for SNS, you can identify bottlenecks and monitor the health of event-driven applications by looking at segment details for SNS topics, such as resource metadata, faults, errors, and message delivery latency for each subscriber.

This blog post reviews common use cases where AWS X-Ray active tracing enabled for SNS provides a consistent view of tracing data across AWS services in real-world scenarios. We cover two architectural patterns which allow you to gain accurate visibility of your end-to-end tracing: SNS to Amazon Simple Queue Service (Amazon SQS) queues and SNS topics to Amazon Kinesis Data Firehose streams.

Getting started with the sample serverless application

To demonstrate AWS X-Ray active tracing for SNS, we will use the Wild Rydes serverless application as shown in the following figure. The application uses a microservices architecture which implements asynchronous messaging for integrating independent systems.

Wild Rydes serverless application architecture

This is how the sample serverless application works:

An Amazon API Gateway receives ride requests from users.
An AWS Lambda function processes ride requests.
An Amazon DynamoDB table serves as a store for rides.
An SNS topic serves as a fan-out for ride requests.
Individual SQS queues and Lambda functions are set up for processing requests via various back-office services (customer notification, customer accounting, and others).
An SNS message filter is in place for the subscription of the extraordinary rides service.
A Kinesis Data Firehose delivery stream archives ride requests in an Amazon Simple Storage Service (Amazon S3) bucket.

Deploying the sample serverless application

Prerequisites

Deployment steps using AWS SAM

The sample application is provided as an AWS SAM infrastructure as code template.

This demonstrative application will deploy an API without authorization. Please consider controlling and managing access to your APIs.

Clone the GitHub repository:

git clone https://github.com/aws-samples/sns-xray-active-tracing-blog-source-code
cd sns-xray-active-tracing-blog-source-code

Build the lab artifacts from source:
```
sam build
```

Deploy the sample solution into your AWS account:

export AWS_REGION=$(aws --profile default configure get region)
sam deploy \
--stack-name wild-rydes-async-msg-2 \
--capabilities CAPABILITY_IAM \
--region $AWS_REGION \
--guided

Confirm SubmitRideCompletionFunction may not have authorization defined, Is this okay? [y/N]: with yes.

Wait until the stack reaches status CREATE_COMPLETE.

See the sample application README.md for detailed deployment instructions.

Testing the application

Once the application is successfully deployed, generate messages and validate that the SNS topic is publishing all messages:

Look up the API Gateway endpoint:

export AWS_REGION=$(aws --profile default configure get region)
aws cloudformation describe-stacks \
--stack-name wild-rydes-async-msg-2 \
--query 'Stacks[].Outputs[?OutputKey==`UnicornManagementServiceApiSubmitRideCompletionEndpoint`].OutputValue' \
--output text

Store this API Gateway endpoint in an environment variable:

export ENDPOINT=$(aws cloudformation describe-stacks \
--stack-name wild-rydes-async-msg-2 \
--query 'Stacks[].Outputs[?OutputKey==`UnicornManagementServiceApiSubmitRideCompletionEndpoint`].OutputValue' \
--output text)

Send requests to the submit ride completion endpoint by executing the following command five or more times with varying payloads:

curl -XPOST -i -H "Content-Type\:application/json" -d '{ "from": "Berlin", "to": "Frankfurt", "duration": 420, "distance": 600, "customer": "cmr", "fare": 256.50 }' $ENDPOINT

Validate that messages are being passed in the application using the CloudWatch service map:

See the sample application README.md for detailed testing instructions.

The sample application shows various use-cases, which are described in the following sections.

Amazon SNS to Amazon SQS fanout scenario

A common application integration scenario for SNS is the Fanout scenario. In the Fanout scenario, a message published to an SNS topic is replicated and pushed to multiple endpoints, such as SQS queues. This allows for parallel asynchronous processing and is a common application integration pattern used in event-driven application architectures.

When an SNS topic fans out to SQS queues, the pattern is called topic-queue-chaining. This means that you add a queue, in our case an SQS queue, between the SNS topic and each of the subscriber services. As messages are buffered in a persistent manner in an SQS queue, no message is lost should a subscriber process run into issues for multiple hours or days, or experience exceptions or crashes.

By placing an SQS queue in front of each subscriber service, you can leverage the fact that a queue can act as a buffering load balancer. As every queue message is delivered to one of potentially many consumer processes, subscriber services can be easily scaled out and in, and the message load is distributed over the available consumer processes. In an event where suddenly a large number of messages arrives, the number of consumer processes has to be scaled out to cope with the additional load. This takes time and you need to wait until additional processes become operational. Since messages are buffered in the queue, you do not lose any messages in the process.

To summarize, in the Fanout scenario or the topic-queue-chaining pattern:

SNS replicates and pushes the message to multiple endpoints.
SQS decouples sending and receiving endpoints.

With AWS X-Ray active tracing enabled on the SNS topic, the CloudWatch service map shows us the complete application architecture, as follows.

Prior to the introduction of AWS X-Ray active tracing on the SNS topic, the AWS X-Ray service would not be able to reconstruct the full service map and the SQS nodes would be missing from the diagram.

To see the integration without AWS X-Ray active tracing enabled, open template.yaml and navigate to the resource RideCompletionTopic. Comment out the property TracingConfig: Active, redeploy and test the solution. The service map should then show an incomplete diagram where the SNS topic is linked directly to the consumer Lambda functions, omitting the SQS nodes.

For this use case, given the Fanout scenario, enabling AWS X-Ray active tracing on the SNS topic provides full end-to-end observability of the traces available in the application.

Amazon SNS to Amazon Kinesis Data Firehose delivery streams for message archiving and analytics

SNS is commonly used with Kinesis Data Firehose delivery streams for message archival and analytics use-cases. You can use SNS topics with Kinesis Data Firehose subscriptions to capture, transform, buffer, compress and upload data to Amazon S3, Amazon Redshift, Amazon OpenSearch Service, HTTP endpoints, and third-party service providers.

We will implement this pattern as follows:

An SNS topic to replicate and push the message to its subscribers.
A Kinesis Data Firehose delivery stream to capture and buffer messages.
An S3 bucket to receive uploaded messages for archival.

In order to demonstrate this pattern, an additional consumer has been added to the SNS topic. The same Fanout pattern applies and the Kinesis Data Firehose delivery stream receives messages from the SNS topic alongside the existing consumers.

The Kinesis Data Firehose delivery stream buffers messages and is configured to deliver them to an S3 bucket for archival purposes. Optionally, an SNS message filter could be added to this subscription to select relevant messages for archival.

With AWS X-Ray active tracing enabled on the SNS topic, the Kinesis Data Firehose node will appear on the CloudWatch service map as a separate entity, as can be seen in the following figure. It is worth noting that the S3 bucket does not appear on the CloudWatch service map as Kinesis does not yet support AWS X-Ray active tracing at the time of writing of this blog post.

Prior to the introduction of AWS X-Ray active tracing on the SNS topic, the AWS X-Ray service would not be able to reconstruct the full service map and the Kinesis Data Firehose node would be missing from the diagram. To see the integration without AWS X-Ray active tracing enabled, open template.yaml and navigate to the resource RideCompletionTopic. Comment out the property TracingConfig: Active, redeploy and test the solution. The service map should then show an incomplete diagram where the Kinesis Data Firehose node is missing.

For this use case, given the data archival scenario with Kinesis Delivery Firehose, enabling AWS X-Ray active tracing on the SNS topic provides additional visibility on the Kinesis Data Firehose node in the CloudWatch service map.

Review faults, errors, and message delivery latency on the AWS X-Ray trace details page

The AWS X-Ray trace details page provides a timeline with resource metadata, faults, errors, and message delivery latency for each segment.

With AWS X-Ray active tracing enabled on SNS, additional segments for the SNS topic itself, but also the downstream consumers (AWS::SNS::Topic, AWS::SQS::Queue and AWS::KinesisFirehose) segments are available, providing additional faults, errors, and message delivery latency for these segments. This allows you to analyze latencies in your messages and their backend services. For example, how long a message spends in a topic, and how long it took to deliver the message to each of the topic’s subscriptions.

Enabling AWS X-Ray active tracing for SNS

AWS X-Ray active tracing is not enabled by default on SNS topics and needs to be explicitly enabled.

The example application used in this blog post demonstrates how to enable active tracing using AWS SAM.

You can enable AWS X-Ray active tracing using the SNS SetTopicAttributes API, SNS Management Console, or via AWS CloudFormation. See Active tracing in Amazon SNS in the Amazon SNS Developer Guide for more options.

Cleanup

To clean up the resources provisioned as part of the sample serverless application, follow the instructions as outlined in the sample application README.md.

Conclusion

AWS X-Ray active tracing for SNS enables end-to-end visibility in real-world scenarios involving patterns like SNS to SQS and SNS to Amazon Kinesis.

But it is not only useful for these patterns. With AWS X-Ray active tracing enabled for SNS, you can identify bottlenecks and monitor the health of event-driven applications by looking at segment details for SNS topics and consumers, such as resource metadata, faults, errors, and message delivery latency for each subscriber.

Enable AWS X-Ray active tracing for SNS to gain accurate visibility of your end-to-end tracing.

For more serverless learning resources, visit Serverless Land.

Serverless ICYMI Q1 2023

2023-04-03 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/serverless-icymi-q1-2023/

Welcome to the 21^st edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

Artificial intelligence (AI) technologies, ChatGPT, and DALL-E are creating significant interest in the industry at the moment. Find out how to integrate serverless services with ChatGPT and DALL-E to generate unique bedtime stories for children.

Example notification of a story hosted with Next.js and App Runner

Serverless Land is a website maintained by the Serverless Developer Advocate team to help you build serverless applications and includes workshops, code examples, blogs, and videos. There is now enhanced search functionality so you can search across resources, patterns, and video content.

ServerlessLand search

AWS Lambda

AWS Lambda has improved how concurrency works with Amazon SQS. You can now control the maximum number of concurrent Lambda functions invoked.

The launch blog post explains the scaling behavior of Lambda using this architectural pattern, challenges this feature helps address, and a demo of maximum concurrency in action.

Maximum concurrency is set to 10 for the SQS queue.

AWS Lambda Powertools is an open-source library to help you discover and incorporate serverless best practices more easily. Lambda Powertools for .NET is now generally available and currently focused on three observability features: distributed tracing (Tracer), structured logging (Logger), and asynchronous business and application metrics (Metrics). Powertools is also available for Python, Java, and Typescript/Node.js programming languages.

To learn more:

Watch AWS Lambda Powertools for .NET on Serverless office hours

Lambda announced a new feature, runtime management controls, which provide more visibility and control over when Lambda applies runtime updates to your functions. The runtime controls are optional capabilities for advanced customers that require more control over their runtime changes. You can now specify a runtime management configuration for each function with three settings, Automatic (default), Function update, or manual.

There are three new Amazon CloudWatch metrics for asynchronous Lambda function invocations: AsyncEventsReceived, AsyncEventAge, and AsyncEventsDropped. You can track the asynchronous invocation requests sent to Lambda functions to monitor any delays in processing and take corrective actions if required. The launch blog post explains the new metrics and how to use them to troubleshoot issues.

Lambda now supports Amazon DocumentDB change streams as an event source. You can use Lambda functions to process new documents, track updates to existing documents, or log deleted documents. You can use any programming language that is supported by Lambda to write your functions.

There is a helpful blog post suggesting best practices for developing portable Lambda functions that allow you to port your code to containers if you later choose to.

AWS Step Functions

AWS Step Functions has expanded its AWS SDK integrations with support for 35 additional AWS services including Amazon EMR Serverless, AWS Clean Rooms, AWS IoT FleetWise, AWS IoT RoboRunner and 31 other AWS services. In addition, Step Functions also added support for 1000+ new API actions from new and existing AWS services such as Amazon DynamoDB and Amazon Athena. For the full list of added services, visit AWS SDK service integrations.

Amazon EventBridge

Amazon EventBridge has launched the AWS Controllers for Kubernetes (ACK) for EventBridge and Pipes . This allows you to manage EventBridge resources, such as event buses, rules, and pipes, using the Kubernetes API and resource model (custom resource definitions).

EventBridge event buses now also support enhanced integration with Service Quotas. Your quota increase requests for limits such as PutEvents transactions-per-second, number of rules, and invocations per second among others will be processed within one business day or faster, enabling you to respond quickly to changes in usage.

AWS SAM

The AWS Serverless Application Model (SAM) Command Line Interface (CLI) has added the sam list command. You can now show resources defined in your application, including the endpoints, methods, and stack outputs required to test your deployed application.

AWS SAM has a preview of sam build support for building and packaging serverless applications developed in Rust. You can use cargo-lambda in the AWS SAM CLI build workflow and AWS SAM Accelerate to iterate on your code changes rapidly in the cloud.

You can now use AWS SAM connectors as a source resource parameter. Previously, you could only define AWS SAM connectors as a AWS::Serverless::Connector resource. Now you can add the resource attribute on a connector’s source resource, which makes templates more readable and easier to update over time.

AWS SAM connectors now also support multiple destinations to simplify your permissions. You can now use a single connector between a single source resource and multiple destination resources.

In October 2022, AWS released OpenID Connect (OIDC) support for AWS SAM Pipelines. This improves your security posture by creating integrations that use short-lived credentials from your CI/CD provider. There is a new blog post on how to implement it.

Find out how best to build serverless Java applications with the AWS SAM CLI.

AWS App Runner

AWS App Runner now supports retrieving secrets and configuration data stored in AWS Secrets Manager and AWS Systems Manager (SSM) Parameter Store in an App Runner service as runtime environment variables.

AppRunner also now supports incoming requests based on HTTP 1.0 protocol, and has added service level concurrency, CPU and Memory utilization metrics.

Amazon S3

Amazon S3 now automatically applies default encryption to all new objects added to S3, at no additional cost and with no impact on performance.

You can now use an S3 Object Lambda Access Point alias as an origin for your Amazon CloudFront distribution to tailor or customize data to end users. For example, you can resize an image depending on the device that an end user is visiting from.

S3 has introduced Mountpoint for S3, a high performance open source file client that translates local file system API calls to S3 object API calls like GET and LIST.

S3 Multi-Region Access Points now support datasets that are replicated across multiple AWS accounts. They provide a single global endpoint for your multi-region applications, and dynamically route S3 requests based on policies that you define. This helps you to more easily implement multi-Region resilience, latency-based routing, and active-passive failover, even when data is stored in multiple accounts.

Amazon Kinesis

Amazon Kinesis Data Firehose now supports streaming data delivery to Elastic. This is an easier way to ingest streaming data to Elastic and consume the Elastic Stack (ELK Stack) solutions for enterprise search, observability, and security without having to manage applications or write code.

Amazon DynamoDB

Amazon DynamoDB now supports table deletion protection to protect your tables from accidental deletion when performing regular table management operations. You can set the deletion protection property for each table, which is set to disabled by default.

Amazon SNS

Amazon SNS now supports AWS X-Ray active tracing to visualize, analyze, and debug application performance. You can now view traces that flow through Amazon SNS topics to destination services, such as Amazon Simple Queue Service, Lambda, and Kinesis Data Firehose, in addition to traversing the application topology in Amazon CloudWatch ServiceLens.

SNS also now supports setting content-type request headers for HTTPS notifications so applications can receive their notifications in a more predictable format. Topic subscribers can create a DeliveryPolicy that specifies the content-type value that SNS assigns to their HTTPS notifications, such as application/json, application/xml, or text/plain.

EDA Visuals collection added to Serverless Land

The Serverless Developer Advocate team has extended Serverless Land and introduced EDA visuals. These are small bite sized visuals to help you understand concept and patterns about event-driven architectures. Find out about batch processing vs. event streaming, commands vs. events, message queues vs. event brokers, and point-to-point messaging. Discover bounded contexts, migrations, idempotency, claims, enrichment and more!

EDA Visuals

To learn more:

Watch EDA visually explained on Serverless office hours

Serverless Repos Collection on Serverless Land

There is also a new section on Serverless Land containing helpful code repositories. You can search for code repos to use for examples, learning or building serverless applications. You can also filter by use-case, runtime, and level.

Serverless Repos Collection

Serverless Blog Posts

January

Jan 12 – Introducing maximum concurrency of AWS Lambda functions when using Amazon SQS as an event source

Jan 20 – Processing geospatial IoT data with AWS IoT Core and the Amazon Location Service

Jan 23 – AWS Lambda: Resilience under-the-hood

Jan 24 – Introducing AWS Lambda runtime management controls

Jan 24 – Best practices for working with the Apache Velocity Template Language in Amazon API Gateway

February

Feb 6 – Previewing environments using containerized AWS Lambda functions

Feb 7 – Building ad-hoc consumers for event-driven architectures

Feb 9 – Implementing architectural patterns with Amazon EventBridge Pipes

Feb 9 – Securing CI/CD pipelines with AWS SAM Pipelines and OIDC

Feb 9 – Introducing new asynchronous invocation metrics for AWS Lambda

Feb 14 – Migrating to token-based authentication for iOS applications with Amazon SNS

Feb 15 – Implementing reactive progress tracking for AWS Step Functions

Feb 23 – Developing portable AWS Lambda functions

Feb 23 – Uploading large objects to Amazon S3 using multipart upload and transfer acceleration

Feb 28 – Introducing AWS Lambda Powertools for .NET

March

Mar 9 – Server-side rendering micro-frontends – UI composer and service discovery

Mar 9 – Building serverless Java applications with the AWS SAM CLI

Mar 10 – Managing sessions of anonymous users in WebSocket API-based applications

Mar 14 –
Implementing an event-driven serverless story generation application with ChatGPT and DALL-E

Videos

Serverless Office Hours – Tues 10AM PT

Weekly office hours live stream. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

YouTube: https://youtube.com/serverlessland
Twitch: https://twitch.tv/aws
LinkedIn: https://linkedin.com/company/serverlessland

January

Jan 10 – Building .NET 7 high performance Lambda functions

Jan 17 – Amazon Managed Workflows for Apache Airflow at Scale

Jan 24 – Using Terraform with AWS SAM

Jan 31 – Preparing your serverless architectures for the big day

February

Feb 07- Visually design and build serverless applications

Feb 14 – Multi-tenant serverless SaaS

Feb 21 – Refactoring to Serverless

Feb 28 – EDA visually explained

March

Mar 07 – Lambda cookbook with Python

Mar 14 – Succeeding with serverless

Mar 21 – Lambda Powertools .NET

Mar 28 – Server-side rendering micro-frontends

FooBar Serverless YouTube channel

Marcia Villalba frequently publishes new videos on her popular serverless YouTube channel. You can view all of Marcia’s videos at https://www.youtube.com/c/FooBar_codes.

January

Jan 12 – Serverless Badge – A new certification to validate your Serverless Knowledge

Jan 19 – Step functions Distributed map – Run 10k parallel serverless executions!

Jan 26 – Step Functions Intrinsic Functions – Do simple data processing directly from the state machines!

February

Feb 02 – Unlock the Power of EventBridge Pipes: Integrate Across Platforms with Ease!

Feb 09 – Amazon EventBridge Pipes: Enrichment and filter of events Demo with AWS SAM

Feb 16 – AWS App Runner – Deploy your apps from GitHub to Cloud in Record Time

Feb 23 – AWS App Runner – Demo hosting a Node.js app in the cloud directly from GitHub (AWS CDK)

March

Mar 02 – What is Amazon DynamoDB? What are the most important concepts? What are the indexes?

Mar 09 – Choreography vs Orchestration: Which is Best for Your Distributed Application?

Mar 16 – DynamoDB Single Table Design: Simplify Your Code and Boost Performance with Table Design Strategies

Mar 23 – 8 Reasons You Should Choose DynamoDB for Your Next Project and How to Get Started

Sessions with SAM & Friends

AWS SAM & Friends

Eric Johnson is exploring how developers are building serverless applications. We spend time talking about AWS SAM as well as others like AWS CDK, Terraform, Wing, and AMPT.

Feb 16 – What’s new with AWS SAM

Feb 23 – AWS SAM with AWS CDK

Mar 02 – AWS SAM and Terraform

Mar 10 – Live from ServerlessDays ANZ

Mar 16 – All about AMPT

Mar 23 – All about Wing

Mar 30 – SAM Accelerate deep dive

Still looking for more?

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Eric Johnson: @edjgeek
James Beswick: @jbesw
Ben Smith: @benjamin_l_s
Julian Wood: @julian_wood
Marcia Villalba: @mavi888uy
David Boyne: @boyney123

AWS Week in Review – March 27, 2023

2023-03-27 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-week-in-review-march-27-2023/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

In Finland, where I live, spring has arrived. The snow has melted, and the trees have grown their first buds. But I don’t get my hopes high, as usually around Easter we have what is called takatalvi. Takatalvi is a Finnish world that means that the winter returns unexpectedly in the spring.

Last Week’s Launches
Here are some launches that got my attention during the previous week.

AWS SAM CLI – Now the sam sync command will compare your local Serverless Application Model (AWS SAM) template with your deployed AWS CloudFormation template and skip the deployment if there are no changes. For more information, check the latest version of the AWS SAM CLI.

IAM – AWS Identity and Access Management (IAM) has launched two new global condition context keys. With these new condition keys, you can write service control policies (SCPs) or IAM policies that restrict the VPCs and private IP addresses from which your Amazon Elastic Compute Cloud (Amazon EC2) instance credentials can be used, without hard-coding VPC IDs or IP addresses in the policy. To learn more about this launch and how to get started, see How to use policies to restrict where EC2 instance credentials can be used from.

Amazon SNS – Amazon Simple Notification Service (Amazon SNS) now supports setting context-type request headers for HTTP/S notifications, such as application/json, application/xml, or text/plain. With this new feature, applications can receive their notifications in a more predictable format.

AWS Batch – AWS Batch now allows you to configure ephemeral storage up to 200GiB on AWS Fargate type jobs. With this launch, you no longer need to limit the size of your data sets or the size of the Docker images to run machine learning inference.

Application Load Balancer – Application Load Balancer (ALB) now supports Transport Layer Security (TLS) protocol version 1.3, enabling you to optimize the performance of your application while keeping it secure. TLS 1.3 on ALB works by offloading encryption and decryption of TLS traffic from your application server to the load balancer.

Amazon IVS – Amazon Interactive Video Service (IVS) now supports combining videos from multiple hosts into the source of a live stream. For a demo, refer to Add multiple hosts to live streams with Amazon IVS.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you may have missed:

I read the post Implementing an event-driven serverless story generation application with ChatGPT and DALL-E a few days ago, and since then I have been reading my child a lot of AI-generated stories. In this post, David Boyne, explains step by step how you can create an event-driven serverless story generation application. This application produces a brand-new story every day at bedtime with images, which can be played in audio format.

Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish, and every other week there is a new episode. The podcast is meant for builders, and it shares stories about how customers have implemented and learned AWS services, how to architect applications, and how to use new services. You can listen to all the episodes directly from your favorite podcast app or at AWS Podcasts en español.

AWS open-source news and updates – The open source newsletter is curated by my colleague Ricardo Sueiras to bring you the latest open-source projects, posts, events, and more.

Upcoming AWS Events
Check your calendars and sign up for the AWS Summit closest to your city. AWS Summits are free events that bring the local community together, where you can learn about different AWS services.

Here are the ones coming up in the next months:

March 28 – AWS Public Sector Symposium Brussels
April 4 – AWS Summit Paris
April 4 – AWS Summit Sydney
May 4 – AWS Summit ASEAN, in Singapore
May 4 – AWS Summit Berlin
May 11 – AWS Summit Stockholm
May 23 – AWS Summit Hong Kong
June 1 – AWS Summit Amsterdam
June 7 – AWS Summit London
June 15 – AWS Summit Madrid
June 22 – AWS Summit Milano

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia

Implementing an event-driven serverless story generation application with ChatGPT and DALL-E

2023-03-14 David Boyne

Post Syndicated from David Boyne original https://aws.amazon.com/blogs/compute/implementing-an-event-driven-serverless-story-generation-application-with-chatgpt-and-dall-e/

This post demonstrates how to integrate AWS serverless services with artificial intelligence (AI) technologies, ChatGPT, and DALL-E. This full stack event-driven application showcases a method of generating unique bedtime stories for children by using predetermined characters and scenes as a prompt for ChatGPT.

Every night at bedtime, the serverless scheduler triggers the application, initiating an event-driven workflow to create and store new unique AI-generated stories with AI-generated images and supporting audio.

These datasets are used to showcase the story on a custom website built with Next.js hosted with AWS App Runner. After the story is created, a notification is sent to the user containing a URL to view and read the story to the children.

Example notification of a story hosted with Next.js and App Runner

By integrating AWS services with AI technologies, you can now create new and innovative ideas that were previously unimaginable.

The application mentioned in this blog post demonstrates examples of point-to-point messaging with Amazon EventBridge pipes, publish/subscribe patterns with Amazon EventBridge and reacting to change data capture events with DynamoDB Streams.

Understanding the architecture

The following image shows the serverless architecture used to generate stories:

Architecture diagram for Serverless bed time story generation with ChatGPT and DALL-E

A new children’s story is generated every day at configured time using Amazon EventBridge Scheduler (Step 1). EventBridge Scheduler is a service capable of scaling millions of schedules with over 200 targets and over 6000 API calls. This example application uses EventBridge scheduler to trigger an AWS Lambda function every night at the same time (7:15pm). The Lambda function is triggered to start the generation of the story.

EventBridge scheduler triggers Lambda function every day at 7:15pm (bed time)

The “Scenes” and “Characters” Amazon DynamoDB tables contain the characters involved in the story and a scene that is randomly selected during its creation. As a result, ChatGPT receives a unique prompt each time. An example of the prompt may look like this:

“`
Write a title and a rhyming story on 2 main characters called Parker and Jackson. The story needs to be set within the scene haunted woods and be at least 200 words long

“`

After the story is created, it is then saved in the “Stories” DynamoDB table (Step 2).

Scheduler triggering Lambda function to generate the story and store story into DynamoDB

Once the story is created this initiates a change data capture event using DynamoDB Streams (Step 3). This event flows through point-to-point messaging with EventBridge pipes and directly into EventBridge. Input transforms are then used to convert the DynamoDB Stream event into a custom EventBridge event, which downstream consumers can comprehend. Adopting this pattern is beneficial as it allows us to separate contracts from the DynamoDB event schema and not having downstream consumers conform to this schema structure, this mapping allows us to remain decoupled from implementation details.

EventBridge Pipes connecting DynamoDB streams directly into EventBridge.

Upon triggering the StoryCreated event in EventBridge, three targets are triggered to carry out several processes (Step 4). Firstly, AI Images are processed, followed by the creation of audio for the story. Finally, the end user is notified of the completed story through Amazon SNS and email subscriptions. This fan-out pattern enables these tasks to be run asynchronously and in parallel, allowing for faster processing times.

EventBridge pub/sub pattern used to start async processing of notifications, audio, and images.

An SNS topic is triggered by the `StoryCreated` event to send an email to the end user using email subscriptions (Step 6). The email consists of a URL with the id of the story that has been created. Clicking on the URL takes the user to the frontend application that is hosted with App Runner.

Using SNS to notify the user of a new story

Example email sent to the user

Amazon Polly is used to generate the audio files for the story (Step 6). Upon triggering the `StoryCreated` event, a Lambda function is triggered, and the story description is used and given to Amazon Polly. Amazon Polly then creates an audio file of the story, which is stored in Amazon S3. A presigned URL is generated and saved in DynamoDB against the created story. This allows the frontend application and browser to retrieve the audio file when the user views the page. The presigned URL has a validity of two days, after which it can no longer be accessed or listened to.

Lambda function to generate audio using Amazon Polly, store in S3 and update story with presigned URL

The `StoryCreated` event also triggers another Lambda function, which uses the OpenAI API to generate an AI image using DALL-E based on the generated story (Step 7). Once the image is generated, the image is downloaded and stored in Amazon S3. Similar to the audio file, the system generates a presigned URL for the image and saves it in DynamoDB against the story. The presigned URL is only valid for two days, after which it becomes inaccessible for download or viewing.

Lambda function to generate images, store in S3 and update story with presigned URL.

In the event of a failure in audio or image generation, the frontend application still loads the story, but does not display the missing image or audio at that moment. This ensures that the frontend can continue working and provide value. If you wanted more control and only trigger the user’s notification event once all parallel tasks are complete the aggregator messaging pattern can be considered.

Hosting the frontend Next.js application with AWS App Runner

Next.js is used by the frontend application to render server-side rendered (SSR) pages that can access the stories from the DynamoDB table, which are then hosted with AWS App Runner after being containerized.

Next.js application hosted with App Runner, with permissions into DynamoDB table.

AWS App Runner enables you to deploy containerized web applications and APIs securely, without needing any prior knowledge of containers or infrastructure. With App Runner, developers can concentrate on their application, while the service handles container startup, running, scaling, and load balancing. After deployment, App Runner provides a secure URL for clients to begin making HTTP requests against.

With App Runner, you have two primary options for deploying your container: source code connections or source images. Using source code connections grants App Runner permission to pull the image file directly from your source code, and with Automatic deployment configured, it can redeploy the application when changes are made. Alternatively, source images provide App Runner with the image’s location in an image registry, and this image is deployed by App Runner.

In this example application, CDK deploys the application using the DockerImageAsset construct with the App Runner construct. Once deployed, App Runner builds and uploads the frontend image to Amazon Elastic Container Registry (ECR) and deploys it. Downstream consumers can access the application using the secure URL provided by App Runner. In this example, the URL is used when the SNS notification is sent to the user when the story is ready to be viewed.

Giving the frontend container permission to DynamoDB table

To grant the Next.js application permission to obtain stories from the Stories DynamoDB table, App Runner instance roles are configured. These roles are optional and can provide the necessary permissions for the container to access AWS services required by the compute service.

If you want to learn more about AWS App Runner, you can explore the free workshop.

Design choices and assumptions

The DynamoDB Time to Live (TTL) feature is ideal for the short-lived nature of daily generated stories. DynamoDB handle the deletion of stories after two days by setting the TTL attribute on each story. Once a story is deleted, it becomes inaccessible through the generated story URLs.

Using Amazon S3 presigned URLs is a method to grant temporary access to a file in S3. This application creates presigned URLs for the audio file and generated images that last for 2 days, after which the URLs for the S3 items become invalid.

Input transforms are used between DynamoDB streams and EventBridge events to decouple the schemas and events consumed by downstream targets. Consuming the events as they are is known as the “conformist” pattern, and couples us to implementation details of DynamoDB streams with downstream EventBridge consumers. This allows the application to remain decoupled from implementation details and remain flexible.

Conclusion

The adoption of artificial intelligence (AI) technology has significantly increased in various industries. ChatGPT, a large language model that can understand and generate human-like responses in natural language, and DALL-E, an image generation system that can create realistic images based on textual descriptions, are examples of such technology. These systems have demonstrated the potential for AI to provide innovative solutions and transform the way we interact with technology.

This blog post explores ways in which you can utilize AWS serverless services with ChatGTP and DALL-E to create a story generation application fronted by a Next.js application hosted with App Runner. EventBridge Scheduler is used to trigger the story creation process then react to change data capture events with DynamoDB streams and EventBridge Pipes, and use Amazon EventBridge to fan out compute tasks to process notifications, images, and audio files.

You can find the documentation and the source code for this application in GitHub.

For more serverless learning resources, visit Serverless Land.

Build AI and ML into Email & SMS for customer engagement

2023-02-28 Vinay Ujjini

Post Syndicated from Vinay Ujjini original https://aws.amazon.com/blogs/messaging-and-targeting/build-ai-and-ml-into-email-sms-for-customer-engagement/

Build AI and ML into Email & SMS for customer engagement

Customers engage with businesses through various channels like email, SMS, Push, and in-app. With the availability and ease of usage of mobile phones, businesses can use 2-way Short Service Messages (SMS) to engage with their customers. Text messaging does not need applications and provides immediate interaction with your customers. Amazon Pinpoint enables businesses & organizations to interact in 2-way SMS messages with their customers. Since it is not practical and scalable for organizations to have people responding to millions of their customer’s texts, we can leverage Amazon Lex which helps build the conversational AI into the 2-way SMS. Amazon Lex is a fully managed artificial intelligent (AI) AWS service with advanced natural language models to design, build, test, and deploy conversational interfaces in applications. Machine Learning (ML) is used in digital marketing to help businesses detect patterns in customer bhevaior.

Today, if customers want to know the latest status on their order, they have to send an email, which is hard for businesses to monitor and respond, and time consuming for the customer to call regarding their order status and also expensive for businesses to field the calls.

This blog post shows how you can elevate your customer’s experience using Amazon Pinpoint’s omni-channel capabilities, Amazon Lex’s AI powered chat, and ML-powered personalization using Amazon Personalize.

The solution presented in this blog helps resolve all the above issues. The example I have used to depict this where a customer orders a bike and since the delivery has been delayed, he wants to get timely updates on the progress. He has been given a phone number by the bike company to text them with any questions. This solution elevates the customer’s experience by providing him with timely update by checking the latest from the database and also sending additional product recommendations, predicting what the customer might need.

Architecture

This solution uses Amazon Pinpoint, Amazon Lex, AWS Lambda, Amazon Dynamo DB, Amazon Simple Notification Services, Amazon Personalize.

AWS architecture diagram AI/ML, Email, SMS.

The customer sends a message to the number provided by the store asking about their order status.
Pinpoint 2-way SMS has as SNS topic tied to it.
The SNS topic relays the message to the Lex integration Lambda.
This Lex integration lambda has the integration between Pinpoint & Lex.
When the customer checks on their order status, Lex taps into the fulfillment lambda that is tied to it.
That lambda checks on the order status from the DynamoDB and sends it back to Lex.
Lex sends the order details to Amazon Pinpoint and Amazon Pinpoint delivers the SMS with the order details to the customer’s phone number.
Amazon Lex lets fulfillment Lambda know to send an email to the customer with the order details.
Fulfillment Lambda create an event called ‘Order Status’ for Amazon Pinpoint Journey to consume in its Journey.
Amazon Pinpoint’s message template reaches out to Amazon Personalize to get the 3 recommendations.
Amazon Pinpoint’s Journey triggers an email message to the customer with the order information and recommendations

Prerequisites

To deploy this solution, you must have the following:

An AWS account.
An Amazon Pinpoint project.
An originating identity that supports 2 way SMS in the country you are planning to send SMS to – Supported countries and regions (SMS channel).
A mobile number to send and receive text messages.
An SMS customer segment – Download the example CSV, that contains a sample SMS & email endpoints. Replace the phone number (column C) with yours, and email with your email and import it to Amazon Pinpoint – How to import an Amazon Pinpoint segment.
Add your mobile number in the Amazon Pinpoint SMS sandbox.
Verify your email address that needs to receive messages from this account.
Download the LexIntegration.zip & RE_Order_Validation.zip Lambda files from this Github location.

Preparation:

Download the CloudFormation template.
Go to Amazon S3 console and create a bucket. I created one for this example as ‘pinpointreinventaiml-code’. Under that S3 bucket, create a sub-folder and name it Lambda.
Upload the 2 zip files you downloaded earlier from the Github.
In Amazon Pinpoint > Phone numbers, Check to make sure the phone number you are using is enabled for SMS and its status is active.
Add the machine learning generated product recommendations using Amazon Personalize.

Check if phone number is enabled & active in Pinpoint console

Phone numbers in Pinpoint console

Solution implementation

Create a Lex Chat bot:

Now it’s time to create your bot. To create your bot, sign in to the Lex console at https://console.aws.amazon.com/lex.
For more information about creating bots in Lex, see https://docs.aws.amazon.com/lex/latest/dg/gs-console.html.
Click on Create bot button. Next steps:
1. Select Create a blank bot radio button.
2. Give a Bot name ‘Order Status’ under Bot name Configuration. (Use the same Bot name as mentioned here. If you change the Bot name here, your CloudFormation will fail)
3. Under IAM permissions, select the radio button Create a role with basic Amazon Lex permissions.
4. For COPPA, choose No. Click Next
5. Under Language dropdown, choose the language of usage. I chose Language as English in my example.
6. Click Done, to complete the Bot creation.
You have to create an Intent within the Bot you just created
1. Click on the Bot you just created. Click on Intents and click the dropdown Add intent and select Add empty intent.
2. Give an intent name and click Ok.
Once the intent is created, go to the intent and open the Conversation flow section in the intent and create a flow that that has the following info and looks like below image:
1. Click on Sample utterance and it takes you to Sample Utterance and type in Order status.
2. Click on initial response and type in Okay, I can help with that. What is your order number?
3. Click on the slot value and click on Add a slot. Name: OrderNumber and Slot type is AMAZON.AlphaNumeric. In the prompt, enter Please enter your order number.
4. Click on Save Intent button. The conversation flow should look like the below screenshot:

Amazon Lex intent

6. Go back to the Intent you just created and click on the Build button that is to the right side of the page.

Build intent

7. Once the build is successfully completed, go back to the Bot you created and click on Aliases on the left frame. Click on the Alias that was created earlier, TestBotAlias.

Bot Alias

8. In the Languages section, click on the English language that we created earlier.
9. Open the Lambda function – optional section and point the source to RE_Order_Validation Lambda that we downloaded earlier.
10. For Lambda function version or alias, select $LATEST. Click on Save.

Add Lambda to Alias

11. Go to Intents, choose the intent you just built and click on Build button again. Once build is complete, you can test the intent.

Import and execute CloudFormation:

Navigate to the Amazon CloudFormation console in the AWS region you want to deploy this solution.
Select Create stack and With new resources. Choose Template is ready as Prerequisite – Prepare template and Upload a template file as Specify template. Upload the template downloaded in step 1 under Preparation section of this document. Click Next.
Fill the AWS CloudFormation parameters as shown below:
Stack name: Give a name to this stack.
1. Under Parameters, for BotAlias: The Bot Alias that you created as part of Amazon Lex above.
2. BotId: The Bot ID for the bot that you created as part of Amazon Lex above.
3. CodeS3Bucket: Give the name of the S3 bucket you created in step3 of the Preparation topic above.
4. OriginationNumber: This is the origination identity phone number you created in step4 of the Preparation topic above.
5. PinpointProjectId: Use the ProjectID you have from step2 of the Prerequisites phase above.
After entering all the parameter info, it would look something like this below:
Click Next. Leave the default options on the next page and click Next again.
Check the box I acknowledge that AWS CloudFormation might create IAM resources with custom names. Click Submit.

Set up data in Amazon Dynamo DB

We are using DynamoDB table here as the transactional database that stores order information for the bike store.
Once the solution has been successfully deployed, navigate to the Amazon DynamoDB console and access the OrderStatus DynamoDB table. Each row created in this table represents an order and it’s details. Each row should have a unique Order_Num that holds the order number and it’s related information. You can put additional information about the order like the example below:

{
   "Order_Num":{
      "Value":"ABC123"
   },
   "Delivery_Dt":{
      "Value":"12/01/2022"
   },
   "Order_Dt":{
      "Value":"11/01/2022"
   }
   "Shipping_Dt":{
      "Value":"11/24/2022"
   }
   "UserId":{
      "Value":"example-iser-id-3"
   }
}

Once you enter the data, it should look like the image below. Click on Create item.

Set up Amazon Simple Notification Service (SNS) topic

We need the Amazon Simple Notification Service here, to provide internal message delivery from publishers (customer’s text message) to subscribers (Amazon Lex in this example). This is used for internal notifications in this use case.
As part of the CloudFormation above, check if you have an SNS topic created by the name LexPinpointIntegrationDemo.
Now, we have successfully created an Amazon SNS topic.

Set up Lambda Functions

Go to AWS Lambda console and open the Lambda function LexIntegration. Under the Function overview, click on the Add trigger. Under Trigger configuration dropdown, select SNS and under SNS Topic select LexPinpointIntegrationDemo topic. Click on Add.
Note: In this example, I used Node.js in a Lambda and Python in another, to show how AWS Lambda functions are flexible to use the scripting language of your choice.

Setting up 2-way SMS in Amazon Pinpoint

Go to Amazon Pinpoint console and click on Phone numbers under SMS & Voice in the left frame. If you don’t see any phone numbers, please refer to #3 under prerequisites section above.
This is how your screen should look like
Click on the number.
On the right frame, expand Two-way SMS drop down arrow.
Click on the check box ‘Enable two-way SMS’.
In the ‘Incoming message destination’ select the radio button ‘Choose an existing SNS topic’ and in the drop down below, choose the SNS topic you built above.
The result would look like the screenshot below:
Click on Save.

Import Machine Learning model into Pinpoint

Go to Amazon Pinpoint.
Click on Machine Learning Models. Click on Add recommender model.
Give a recommender model name and description under model details.
Under Model configuration, choose the radio button ‘Automatically create a role’ and give an IAM role name in the textbox below.
Under recommender model, choose the recommender model campaign that you created in Amazon Personalize earlier in the project.
1. If you did not create it, use this Pinpoint workshop to create a recommender model in Amazon Personalize.
2. The data used in this example is for retail industry, please edit the data as needed for your use case and industry.
Under the settings section:
1. Select ‘User Id’ as identifier.
2. Click on the drop down ‘Number of recommendations per message’ and select 3.
For Processing method, choose ‘Use value returned by model’.
Click on Next.
You are presented with attributes section. Give a display name as ‘product_name’ for the attributes and click next.
On the next screen, you can review all the information provided and click on Publish.
The completed model after publishing looks like the screen below:

Create a Message Template in Amazon Pinpoint

Use chapter 6.4 in this workshop Amazon Pinpoint Workshop to create a message template.
Once the template is created, you need to add recommendations to the message template using this Amazon Pinpoint Workshop details. Change the type of data needed for your use case and industry in this workshop. I used sample retail data.
To create a Amazon Pinpoint Journey, navigate to the Amazon Pinpoint console , select Journeys and click on Create journey.
Give a name, click on Set entry condition in the Journey entry block.
Choose the radio button Add participants when they perform an activity.
Click in the ‘Events’ text box and type in OrderStatus.
Click on Add activity and select Send an email.
Click on choose an email template and select the email message template we created earlier in this blog. Click on choose template button.
Select a Sender email address from the drop down list.
Click Save. The final journey should look like this:
Click on Actions > Settings where you will review the journey settings. There you set the start and end date of the journey if applicable as well as other advanced settings. Configure your journey settings to look like the screenshot below and click Save.
To publish your journey click on Review. On the Review your journey click Next > Mark as reviewed > Publish. A 5 minutes countdown will begin after, which your journey will be live.
Once the journey is live, we need to pass the event OrderStatus and the endpoints will go through that journey and will receive an email.

Testing the solution

Use a phone with a valid number (in this example, I took a US phone number) and send a text ‘Order Status’ to the number generated in Amazon Pinpoint above.
You should get a response “Okay, I can help with that. What is your order number?”
You should type in the order number you generated earlier and stored it in Amazon DynamoDB table.
You should get a response “Your order <order number> was shipped on <shipped_dt> and is expected to be delivered to your address on <delivery_dt>. Your order details have been emailed to you.”
Alternatively, you can test this solution from the Lex bot.
In Amazon Lex, go to the intent you created above and click on the Test button. Next steps:
1. In the text box, enter Order Status.
2. Bot should respond with Okay, I can help with that. What is your order number?
3. You can respond with the order number you entered in the DynamoDB table.
4. Bot should respond with Your order <Order_Num> was shipped on <Shipping_Dt> and is expected to be delivered to your address on <Delivery_Dt>. Your order details have been emailed to you.

Conclusion

Using this blog post, you can elevate your customer’s experience by using Amazon Lex’s AI chat capabilities, Amazon Personalize’s ML recommendation models and trigger a Pinpoint Journey. This blog highlights how organizations can interact in a 2-way SMS with their customers and convert that engagement to a triggered email, with product recommendations, if needed.

Next Steps

You can use the above solution and modify it easily to use it across different verticals and applicable use cases. You can also extend this solution to Amazon Connect to an agent via SMS chat, using this blog.

Clean-up

To delete the solution, go to CloudFormation you created as part od this project. Click on the stack and click Delete.
Navigate to Amazon Pinpoint and stop the Journey you ran in this solution. Delete the Journey, Machine learning models, Message templates you created for this solution. Delete the Project you created for this solution.
In Amazon Lex, delete the intent and bot you created for this solution.
Delete the folder and bucket you created in S3 as part of this project.
Amazon Personalize resources like Dataset groups, datasets, etc. are not created via AWS Cloudformation, thus you have to delete them manually. Please follow the instructions in the AWS documentation on how to clean up the created resources.

Additional resources

Retry delivering failed SMS using Pinpoint

How to target customers using ML, based on their interest in a product

About the Authors

Vinay Ujjini

Vinay Ujjini is an Amazon Pinpoint and Amazon Simple Email Service Principal Specialist Solutions Architect at AWS. He has been solving customer’s omni-channel challenges for over 15 years. In his spare time, he enjoys playing tennis & cricket.