Tag Archives: Amazon Simple Notification Service (SNS)

How to add notifications and manual approval to an AWS CDK Pipeline

Post Syndicated from Jehu Gray original https://aws.amazon.com/blogs/devops/how-to-add-notifications-and-manual-approval-to-an-aws-cdk-pipeline/

A deployment pipeline typically comprises several stages such as dev, test, and prod, which ensure that changes undergo testing before reaching the production environment. To improve the reliability and stability of release processes, DevOps teams must review Infrastructure as Code (IaC) changes before applying them in production. As a result, implementing a mechanism for notification and manual approval that grants stakeholders improved access to changes in their release pipelines has become a popular practice for DevOps teams.

Notifications keep development teams and stakeholders informed in real-time about updates and changes to deployment status within release pipelines. Manual approvals establish thresholds for transitioning a change from one stage to the next in the pipeline. They also act as a guardrail to mitigate risks arising from errors and rework because of faulty deployments.

Please note that manual approvals, as described in this post, are not a replacement for the use of automation. Instead, they complement automated checks within the release pipeline.

In this blog post, we describe how to set up notifications and add a manual approval stage to AWS Cloud Development Kit (AWS CDK) Pipeline.

Concepts

CDK Pipeline

CDK Pipelines is a construct library for painless continuous delivery of CDK applications. CDK Pipelines can automatically build, test, and deploy changes to CDK resources. CDK Pipelines are self-mutating which means as application stages or stacks are added, the pipeline automatically reconfigures itself to deploy those new stages or stacks. Pipelines need only be manually deployed once, afterwards, the pipeline keeps itself up to date from the source code repository by pulling the changes pushed to the repository.

Notifications

Adding notifications to a pipeline provides visibility to changes made to the environment by utilizing the NotificationRule construct. You can also use this rule to notify pipeline users of important changes, such as when a pipeline starts execution. Notification rules specify both the events and the targets, such as Amazon Simple Notification Service (Amazon SNS) topic or AWS Chatbot clients configured for Slack which represents the nominated recipients of the notifications. An SNS topic is a logical access point that acts as a communication channel while Chatbot is an AWS service that enables DevOps and software development teams to use messaging program chat rooms to monitor and respond to operational events.

Manual Approval

In a CDK pipeline, you can incorporate an approval action at a specific stage, where the pipeline should pause, allowing a team member or designated reviewer to manually approve or reject the action. When an approval action is ready for review, a notification is sent out to alert the relevant parties. This combination of notifications and approvals ensures timely and efficient decision-making regarding crucial actions within the pipeline.

Solution Overview

The solution explains a simple web service that is comprised of an AWS Lambda function that returns a static web page served by Amazon API Gateway. Since Continuous Deployment and Continuous Integration (CI/CD) are important components to most web projects, the team implements a CDK Pipeline for their web project.

There are two important stages in this CDK pipeline; the Pre-production stage for testing and the Production stage, which contains the end product for users.

The flow of the CI/CD process to update the website starts when a developer pushes a change to the repository using their Integrated Development Environment (IDE). An Amazon CloudWatch event triggers the CDK Pipeline. Once the changes reach the pre-production stage for testing, the CI/CD process halts. This is because a manual approval gate is between the pre-production and production stages. So, it becomes a stakeholder’s responsibility to review the changes in the pre-production stage before approving them for production. The pipeline includes an SNS notification that notifies the stakeholder whenever the pipeline requires manual approval.

After approving the changes, the CI/CD process proceeds to the production stage and the updated version of the website becomes available to the end user. If the approver rejects the changes, the process ends at the pre-production stage with no impact to the end user.

The following diagram illustrates the solution architecture.

 

This diagram shows the CDK pipeline process in the solution and how applications or updates are deployed using AWS Lambda Function to end users.

Figure 1. This image shows the CDK pipeline process in our solution and how applications or updates are deployed using AWS Lambda Function to end users.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Add notification to the pipeline

In this tutorial, perform the following steps:

  • Add the import statements for AWS CodeStar notifications and SNS to the import section of the pipeline stack py
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.pipelines as pipelines
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
  • Ensure the pipeline is built by calling the ‘build pipeline’ function.

pipeline.build_pipeline()

  • Create an SNS topic.

topic = sns.Topic(self, "MyTopic1")

  • Add a subscription to the topic. This specifies where the notifications are sent (Add the stakeholders’ email here).

topic.add_subscription(subs.EmailSubscription("[email protected]"))

  • Define a rule. This contains the source for notifications, the event trigger, and the target .

rule = notifications.NotificationRule(self, "NotificationRule", )

  • Assign the source the value pipeline.pipeline The first pipeline is the name of the CDK pipeline(variable) and the .pipeline is to show it is a pipeline(function).

source=pipeline.pipeline,

  • Define the events to be monitored. Specify notifications for when the pipeline starts, when it fails, when the execution succeeds, and finally when manual approval is needed.
events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],
  • For the complete list of supported event types for pipelines, see here
  • Finally, add the target. The target here is the topic created previously.

targets=[topic]

The combination of all the steps becomes:

pipeline.build_pipeline()
topic = sns.Topic(self, "MyTopic1")
topic.add_subscription(subs.EmailSubscription("[email protected]"))
rule = notifications.NotificationRule(self, "NotificationRule",
source=pipeline.pipeline,
events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],
targets=[topic]
)

Adding Manual Approval

  • Add the ManualApprovalStep import to the aws_cdk.pipelines import statement.
from aws_cdk.pipelines import (
CodePipeline,
CodePipelineSource,
ShellStep,
ManualApprovalStep
)
  • Add the ManualApprovalStep to the production stage. The code must be added to the add_stage() function.
 prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre = [ManualApprovalStep('PromoteToProduction')])

When a stage is added to a pipeline, you can specify the pre and post steps, which are arbitrary steps that run before or after the contents of the stage. You can use them to add validations like manual or automated gates to the pipeline. It is recommended to put manual approval gates in the set of pre steps, and automated approval gates in the set of post steps. So, the manual approval action is added as a pre step that runs after the pre-production stage and before the production stage .

  • The final version of the pipeline_stack.py becomes:
from constructs import Construct
import aws_cdk as cdk
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
from aws_cdk import (
    Stack,
    aws_codecommit as codecommit,
    aws_codepipeline as codepipeline,
    pipelines as pipelines,
    aws_codepipeline_actions as cpactions,
    
)
from aws_cdk.pipelines import (
    CodePipeline,
    CodePipelineSource,
    ShellStep,
    ManualApprovalStep
)


class WorkshopPipelineStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        
        # Creates a CodeCommit repository called 'WorkshopRepo'
        repo = codecommit.Repository(
            self, "WorkshopRepo", repository_name="WorkshopRepo",
            
        )
        
        #Create the Cdk pipeline
        pipeline = pipelines.CodePipeline(
            self,
            "Pipeline",
            
            synth=pipelines.ShellStep(
                "Synth",
                input=pipelines.CodePipelineSource.code_commit(repo, "main"),
                commands=[
                    "npm install -g aws-cdk",  # Installs the cdk cli on Codebuild
                    "pip install -r requirements.txt",  # Instructs Codebuild to install required packages
                    "npx cdk synth",
                ]
                
            ),
        )

        
         # Create the Pre-Prod Stage and its API endpoint
        deploy = WorkshopPipelineStage(self, "Pre-Prod")
        deploy_stage = pipeline.add_stage(deploy)
    
        deploy_stage.add_post(
            
            pipelines.ShellStep(
                "TestViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
            )
    
        
        )
        deploy_stage.add_post(
            pipelines.ShellStep(
                "TestAPIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create the Prod Stage with the Manual Approval Step
        prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre = [ManualApprovalStep('PromoteToProduction')])
        
        prod_stage.add_post(
            
            pipelines.ShellStep(
                "ViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
                
            )
            
        )
        prod_stage.add_post(
            pipelines.ShellStep(
                "APIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create The SNS Notification for the Pipeline
        
        pipeline.build_pipeline()
        
        topic = sns.Topic(self, "MyTopic")
        topic.add_subscription(subs.EmailSubscription("sta[email protected]"))
        rule = notifications.NotificationRule(self, "NotificationRule",
            source = pipeline.pipeline,
            events = ["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed", "codepipeline-pipeline-manual-approval-needed", "codepipeline-pipeline-manual-approval-succeeded"],
            targets=[topic]
            )
  
    

When a commit is made with git commit -am "Add manual Approval" and changes are pushed with git push, the pipeline automatically self-mutates to add the new approval stage.

Now when the developer pushes changes to update the build environment or the end user application, the pipeline execution stops at the point where the approval action was added. The pipeline won’t resume unless a manual approval action is taken.

Image showing the CDK pipeline with the added Manual Approval action on the AWS Management Console

Figure 2. This image shows the pipeline with the added Manual Approval action.

Since there is a notification rule that includes the approval action, an email notification is sent with the pipeline information and approval status to the stakeholder(s) subscribed to the SNS topic.

Image showing the SNS email notification sent when the pipeline starts

Figure 3. This image shows the SNS email notification sent when the pipeline starts.

After pushing the updates to the pipeline, the reviewer or stakeholder can use the AWS Management Console to access the pipeline to approve or deny changes based on their assessment of these changes. This process helps eliminate any potential issues or errors and ensures only changes deemed relevant are made.

Image showing the review action on the AWS Management Console that gives the stakeholder the ability to approve or reject any changes.

Figure 4. This image shows the review action that gives the stakeholder the ability to approve or reject any changes. 

If a reviewer rejects the action, or if no approval response is received within seven days of the pipeline stopping for the review action, the pipeline status is “Failed.”

Image showing when a stakeholder rejects the action

Figure 5. This image depicts when a stakeholder rejects the action.

If a reviewer approves the changes, the pipeline continues its execution.

Image showing when a stakeholder approves the action

Figure 6. This image depicts when a stakeholder approves the action.

Considerations

It is important to consider any potential drawbacks before integrating a manual approval process into a CDK pipeline. one such consideration is its implementation may delay the delivery of updates to end users. An example of this is business hours limitation. The pipeline process might be constrained by the availability of stakeholders during business hours. This can result in delays if changes are made outside regular working hours and require approval when stakeholders are not immediately accessible.

Clean up

To avoid incurring future charges, delete the resources. Use cdk destroy via the command line to delete the created stack.

Conclusion

Adding notifications and manual approval to CDK Pipelines provides better visibility and control over the changes made to the pipeline environment. These features ideally complement the existing automated checks to ensure that all updates are reviewed before deployment. This reduces the risk of potential issues arising from bugs or errors. The ability to approve or deny changes through the AWS Management Console makes the review process simple and straightforward. Additionally, SNS notifications keep stakeholders updated on the status of the pipeline, ensuring a smooth and seamless deployment process.

Jehu Gray

Jehu Gray is an Enterprise Solutions Architect at Amazon Web Services where he helps customers design solutions that fits their needs. He enjoys exploring whats possible with IaC such as CDK.

Abiola Olanrewaju

Abiola Olanrewaju is an Enterprise Solutions Architect at Amazon Web Services where he helps customers design and implement scalable solutions that drive business outcomes. He has a keen interest in Data Analytics, Security and Automation.

Serge Poueme

Serge Poueme is a Solutions Architect on the AWS for Games Team. He started his career as a software development engineer and enjoys building new products. At AWS, Serge focuses on improving Builders Experience for game developers and optimize servers hosting using Containers. When he is not working he enjoys playing Far Cry or Fifa on his XBOX

Integrating IBM MQ with Amazon SQS and Amazon SNS using Apache Camel

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/integrating-ibm-mq-with-amazon-sqs-and-amazon-sns-using-apache-camel/

This post is written by Joaquin Rinaudo, Principal Security Consultant and Gezim Musliaj, DevOps Consultant.

IBM MQ is a message-oriented middleware (MOM) product used by many enterprise organizations, including global banks, airlines, and healthcare and insurance companies.

Customers often ask us for guidance on how they can integrate their existing on-premises MOM systems with new applications running in the cloud. They’re looking for a cost-effective, scalable and low-effort solution that enables them to send and receive messages from their cloud applications to these messaging systems.

This blog post shows how to set up a bi-directional bridge from on-premises IBM MQ to Amazon MQ, Amazon Simple Queue Service (Amazon SQS), and Amazon Simple Notification Service (Amazon SNS).

This allows your producer and consumer applications to integrate using fully managed AWS messaging services and Apache Camel. Learn how to deploy such a solution and how to test the running integration using SNS, SQS, and a demo IBM MQ cluster environment running on Amazon Elastic Container Service (ECS) with AWS Fargate.

This solution can also be used as part of a step-by-step migration using the approach described in the blog post Migrating from IBM MQ to Amazon MQ using a phased approach.

Solution overview

The integration consists of an Apache Camel broker cluster that bi-directionally integrates an IBM MQ system and target systems, such as Amazon MQ running ActiveMQ, SNS topics, or SQS queues.

In the following example, AWS services, in this case AWS Lambda and SQS, receive messages published to IBM MQ via an SNS topic:

Solution architecture overview for sending messages

  1. The cloud message consumers (Lambda and SQS) subscribe to the solution’s target SNS topic.
  2. The Apache Camel broker connects to IBM MQ using secrets stored in AWS Secrets Manager and reads new messages from the queue using IBM MQ’s Java library. Only IBM MQ messages are supported as a source.
  3. The Apache Camel broker publishes these new messages to the target SNS topic. It uses the Amazon SNS Extended Client Library for Java to store any messages larger than 256 KB in an Amazon Simple Storage Service (Amazon S3) bucket.
  4. Apache Camel stores any message that cannot be delivered to SNS after two retries in an S3 dead letter queue bucket.

The next diagram demonstrates how the solution sends messages back from an SQS queue to IBM MQ:

Solution architecture overview for sending messages

  1. A sample message producer using Lambda sends messages to an SQS queue. It uses the Amazon SQS Extended Client Library for Java to send messages larger than 256 KB.
  2. The Apache Camel broker receives the messages published to SQS, using the SQS Extended Client Library if needed.
  3. The Apache Camel broker sends the message to the IBM MQ target queue.
  4. As before, the broker stores messages that cannot be delivered to IBM MQ in the S3 dead letter queue bucket.

A phased live migration consists of two steps:

  1. Deploy the broker service to allow reading messages from and writing to existing IBM MQ queues.
  2. Once the consumer or producer is migrated, migrate its counterpart to the newly selected service (SNS or SQS).

Next, you will learn how to set up the solution using the AWS Cloud Development Kit (AWS CDK).

Deploying the solution

Prerequisites

  • AWS CDK
  • TypeScript
  • Java
  • Docker
  • Git
  • Yarn

Step 1: Cloning the repository

Clone the repository using git:

git clone https://github.com/aws-samples/aws-ibm-mq-adapter

Step 2: Setting up test IBM MQ credentials

This demo uses IBM MQ’s mutual TLS authentication. To do this, you must generate X.509 certificates and store them in AWS Secrets Manager by running the following commands in the app folder:

  1. Generate X.509 certificates:
    ./deploy.sh generate_secrets
  2. Set up the secrets required for the Apache Camel broker (replace <integration-name> with, for example, dev):
    ./deploy.sh create_secrets broker <integration-name>
  3. Set up secrets for the mock IBM MQ system:
    ./deploy.sh create_secrets mock
  4. Update the cdk.json file with the secrets ARN output from the previous commands:
    • IBM_MOCK_PUBLIC_CERT_ARN
    • IBM_MOCK_PRIVATE_CERT_ARN
    • IBM_MOCK_CLIENT_PUBLIC_CERT_ARN
    • IBMMQ_TRUSTSTORE_ARN
    • IBMMQ_TRUSTSTORE_PASSWORD_ARN
    • IBMMQ_KEYSTORE_ARN
    • IBMMQ_KEYSTORE_PASSWORD_ARN

If you are using your own IBM MQ system and already have X.509 certificates available, you can use the script to upload those certificates to AWS Secrets Manager after running the script.

Step 3: Configuring the broker

The solution deploys two brokers, one to read messages from the test IBM MQ system and one to send messages back. A separate Apache Camel cluster is used per integration to support better use of Auto Scaling functionality and to avoid issues across different integration operations (consuming and reading messages).

Update the cdk.json file with the following values:

  • accountId: AWS account ID to deploy the solution to.
  • region: name of the AWS Region to deploy the solution to.
  • defaultVPCId: specify a VPC ID for an existing VPC in the AWS account where the broker and mock are deployed.
  • allowedPrincipals: add your account ARN (e.g., arn:aws:iam::123456789012:root) to allow this AWS account to send messages to and receive messages from the broker. You can use this parameter to set up cross-account relationships for both SQS and SNS integrations and support multiple consumers and producers.

Step 4: Bootstrapping and deploying the solution

  1. Make sure you have the correct AWS_PROFILE and AWS_REGION environment variables set for your development account.
  2. Run yarn cdk bootstrap –-qualifier mq <aws://<account-id>/<region> to bootstrap CDK.
  3. Run yarn install to install CDK dependencies.
  4. Finally, execute yarn cdk deploy '*-dev' –-qualifier mq --require-approval never to deploy the solution to the dev environment.

Step 5: Testing the integrations

Use AWS System Manager Session Manager and port forwarding to establish tunnels to the test IBM MQ instance to access the web console and send messages manually. For more information on port forwarding, see Amazon EC2 instance port forwarding with AWS System Manager.

  1. In a command line terminal, make sure you have the correct AWS_PROFILE and AWS_REGION environment variables set for your development account.
  2. In addition, set the following environment variables:
    • IBM_ENDPOINT: endpoint for IBM MQ. Example: network load balancer for IBM mock mqmoc-mqada-1234567890.elb.eu-west-1.amazonaws.com.
    • BASTION_ID: instance ID for the bastion host. You can retrieve this output from Step 4: Bootstrapping and deploying the solution listed after the mqBastionStack deployment.

    Use the following command to set the environment variables:

    export IBM_ENDPOINT=mqmoc-mqada-1234567890.elb.eu-west-1.amazonaws.com
    export BASTION_ID=i-0a1b2c3d4e5f67890
  3. Run the script test/connect.sh.
  4. Log in to the IBM web console via https://127.0.0.1:9443/admin using the default IBM user (admin) and the password stored in AWS Secrets Manager as mqAdapterIbmMockAdminPassword.

Sending data from IBM MQ and receiving it in SNS:

  1. In the IBM MQ console, access the local queue manager QM1 and DEV.QUEUE.1.
  2. Send a message with the content Hello AWS. This message will be processed by AWS Fargate and published to SNS.
  3. Access the SQS console and choose the snsIntegrationStack-dev-2 prefix queue. This is an SQS queue subscribed to the SNS topic for testing.
  4. Select Send and receive message.
  5. Select Poll for messages to see the Hello AWS message previously sent to IBM MQ.

Sending data back from Amazon SQS to IBM MQ:

  1. Access the SQS console and choose the queue with the prefix sqsPublishIntegrationStack-dev-3-dev.
  2. Select Send and receive messages.
  3. For Message Body, add Hello from AWS.
  4. Choose Send message.
  5. In the IBM MQ console, access the local queue manager QM1 and DEV.QUEUE.2 to find your message listed under this queue.

Step 6: Cleaning up

Run cdk destroy '*-dev' to destroy the resources deployed as part of this walkthrough.

Conclusion

In this blog, you learned how you can exchange messages between IBM MQ and your cloud applications using Amazon SQS and Amazon SNS.

If you’re interested in getting started with your own integration, follow the README file in the GitHub repository. If you’re migrating existing applications using industry-standard APIs and protocols such as JMS, NMS, or AMQP 1.0, consider integrating with Amazon MQ using the steps provided in the repository.

If you’re interested in running Apache Camel in Kubernetes, you can also adapt the architecture to use Apache Camel K instead.

For more serverless learning resources, visit Serverless Land.

Monitor data pipelines in a serverless data lake

Post Syndicated from Virendhar Sivaraman original https://aws.amazon.com/blogs/big-data/monitor-data-pipelines-in-a-serverless-data-lake/

AWS serverless services, including but not limited to AWS Lambda, AWS Glue, AWS Fargate, Amazon EventBridge, Amazon Athena, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), and Amazon Simple Storage Service (Amazon S3), have become the building blocks for any serverless data lake, providing key mechanisms to ingest and transform data without fixed provisioning and the persistent need to patch the underlying servers. The combination of a data lake in a serverless paradigm brings significant cost and performance benefits. The advent of rapid adoption of serverless data lake architectures—with ever-growing datasets that need to be ingested from a variety of sources, followed by complex data transformation and machine learning (ML) pipelines—can present a challenge. Similarly, in a serverless paradigm, application logs in Amazon CloudWatch are sourced from a variety of participating services, and traversing the lineage across logs can also present challenges. To successfully manage a serverless data lake, you require mechanisms to perform the following actions:

  • Reinforce data accuracy with every data ingestion
  • Holistically measure and analyze ETL (extract, transform, and load) performance at the individual processing component level
  • Proactively capture log messages and notify failures as they occur in near-real time

In this post, we will walk you through a solution to efficiently track and analyze ETL jobs in a serverless data lake environment. By monitoring application logs, you can gain insights into job execution, troubleshoot issues promptly to ensure the overall health and reliability of data pipelines.

Overview of solution

The serverless monitoring solution focuses on achieving the following goals:

  • Capture state changes across all steps and tasks in the data lake
  • Measure service reliability across a data lake
  • Quickly notify operations of failures as they happen

To illustrate the solution, we create a serverless data lake with a monitoring solution. For simplicity, we create a serverless data lake with the following components:

  • Storage layer – Amazon S3 is the natural choice, in this case with the following buckets:
    • Landing – Where raw data is stored
    • Processed – Where transformed data is stored
  • Ingestion layer – For this post, we use Lambda and AWS Glue for data ingestion, with the following resources:
    • Lambda functions – Two Lambda functions that run to simulate a success state and failure state, respectively
    • AWS Glue crawlers – Two AWS Glue crawlers that run to simulate a success state and failure state, respectively
    • AWS Glue jobs – Two AWS Glue jobs that run to simulate a success state and failure state, respectively
  • Reporting layer – An Athena database to persist the tables created via the AWS Glue crawlers and AWS Glue jobs
  • Alerting layer – Slack is used to notify stakeholders

The serverless monitoring solution is devised to be loosely coupled as plug-and-play components that complement an existing data lake. The Lambda-based ETL tasks state changes are tracked using AWS Lambda Destinations. We have used an SNS topic for routing both success and failure states for the Lambda-based tasks. In the case of AWS Glue-based tasks, we have configured EventBridge rules to capture state changes. These event changes are also routed to the same SNS topic. For demonstration purposes, this post only provides state monitoring for Lambda and AWS Glue, but you can extend the solution to other AWS services.

The following figure illustrates the architecture of the solution.

The architecture contains the following components:

  • EventBridge rules – EventBridge rules that capture the state change for the ETL tasks—in this case AWS Glue tasks. This can be extended to other supported services as the data lake grows.
  • SNS topic – An SNS topic that serves to catch all state events from the data lake.
  • Lambda function – The Lambda function is the subscriber to the SNS topic. It’s responsible for analyzing the state of the task run to do the following:
    • Persist the status of the task run.
    • Notify any failures to a Slack channel.
  • Athena database – The database where the monitoring metrics are persisted for analysis.

Deploy the solution

The source code to implement this solution uses AWS Cloud Development Kit (AWS CDK) and is available on the GitHub repo monitor-serverless-datalake. This AWS CDK stack provisions required network components and the following:

  • Three S3 buckets (the bucket names are prefixed with the AWS account name and Regions, for example, the landing bucket is <aws-account-number>-<aws-region>-landing):
    • Landing
    • Processed
    • Monitor
  • Three Lambda functions:
    • datalake-monitoring-lambda
    • lambda-success
    • lambda-fail
  • Two AWS Glue crawlers:
    • glue-crawler-success
    • glue-crawler-fail
  • Two AWS Glue jobs:
    • glue-job-success
    • glue-job-fail
  • An SNS topic named datalake-monitor-sns
  • Three EventBridge rules:
    • glue-monitor-rule
    • event-rule-lambda-fail
    • event-rule-lambda-success
  • An AWS Secrets Manager secret named datalake-monitoring
  • Athena artifacts:
    • monitor database
    • monitor-table table

You can also follow the instructions in the GitHub repo to deploy the serverless monitoring solution. It takes about 10 minutes to deploy this solution.

Connect to a Slack channel

We still need a Slack channel to which the alerts are delivered. Complete the following steps:

  1. Set up a workflow automation to route messages to the Slack channel using webhooks.
  2. Note the webhook URL.

The following screenshot shows the field names to use.

The following is a sample message for the preceding template.

  1. On the Secrets Manager console, navigate to the datalake-monitoring secret.
  2. Add the webhook URL to the slack_webhook secret.

Load sample data

The next step is to load some sample data. Copy the sample data files to the landing bucket using the following command:

aws s3 cp --recursive s3://awsglue-datasets/examples/us-legislators s3://<AWS_ACCCOUNT>-<AWS_REGION>-landing/legislators

In the next sections, we show how Lambda functions, AWS Glue crawlers, and AWS Glue jobs work for data ingestion.

Test the Lambda functions

On the EventBridge console, enable the rules that trigger the lambda-success and lambda-fail functions every 5 minutes:

  • event-rule-lambda-fail
  • event-rule-lambda-success

After a few minutes, the failure events are relayed to the Slack channel. The following screenshot shows an example message.

Disable the rules after testing to avoid repeated messages.

Test the AWS Glue crawlers

On the AWS Glue console, navigate to the Crawlers page. Here you can start the following crawlers:

  • glue-crawler-success
  • glue-crawler-fail

In a minute, the glue-crawler-fail crawler’s status changes to Failed, which triggers a notification in Slack in near-real time.

Test the AWS Glue jobs

On the AWS Glue console, navigate to the Jobs page, where you can start the following jobs:

  • glue-job-success
  • glue-job-fail

In a few minutes, the glue-job-fail job status changes to Failed, which triggers a notification in Slack in near-real time.

Analyze the monitoring data

The monitoring metrics are persisted in Amazon S3 for analysis and can be used of historical analysis.

On the Athena console, navigate to the monitor database and run the following query to find the service that failed the most often:

SELECT service_type, count(*) as "fail_count"
FROM "monitor"."monitor"
WHERE event_type = 'failed'
group by service_type
order by fail_count desc;

Over time with rich observability data – time series based monitoring data analysis will yield interesting findings.

Clean up

The overall cost of the solution is less than one dollar but to avoid future costs, make sure to clean up the resources created as part of this post.

Summary

The post provided an overview of a serverless data lake monitoring solution that you can configure and deploy to integrate with enterprise serverless data lakes in just a few hours. With this solution, you can monitor a serverless data lake, send alerts in near-real time, and analyze performance metrics for all ETL tasks operating in the data lake. The design was intentionally kept simple to demonstrate the idea; you can further extend this solution with Athena and Amazon QuickSight to generate custom visuals and reporting. Check out the GitHub repo for a sample solution and further customize it for your monitoring needs.


About the Authors

Virendhar (Viru) Sivaraman is a strategic Senior Big Data & Analytics Architect with Amazon Web Services. He is passionate about building scalable big data and analytics solutions in the cloud. Besides work, he enjoys spending time with family, hiking & mountain biking.

Vivek Shrivastava is a Principal Data Architect, Data Lake in AWS Professional Services. He is a Bigdata enthusiast and holds 14 AWS Certifications. He is passionate about helping customers build scalable and high-performance data analytics solutions in the cloud. In his spare time, he loves reading and finds areas for home automation.

Best practices for implementing event-driven architectures in your organization

Post Syndicated from Emanuele Levi original https://aws.amazon.com/blogs/architecture/best-practices-for-implementing-event-driven-architectures-in-your-organization/

Event-driven architectures (EDA) are made up of components that detect business actions and changes in state, and encode this information in event notifications. Event-driven patterns are becoming more widespread in modern architectures because:

  • they are the main invocation mechanism in serverless patterns.
  • they are the preferred pattern for decoupling microservices, where asynchronous communications and event persistence are paramount.
  • they are widely adopted as a loose-coupling mechanism between systems in different business domains, such as third-party or on-premises systems.

Event-driven patterns have the advantage of enabling team independence through the decoupling and decentralization of responsibilities. This decentralization trend in turn, permits companies to move with unprecedented agility, enhancing feature development velocity.

In this blog, we’ll explore the crucial components and architectural decisions you should consider when adopting event-driven patterns, and provide some guidance on organizational structures.

Division of responsibilities

The communications flow in EDA (see What is EDA?) is initiated by the occurrence of an event. Most production-grade event-driven implementations have three main components, as shown in Figure 1: producers, message brokers, and consumers.

Three main components of an event-driven architecture

Figure 1. Three main components of an event-driven architecture

Producers, message brokers, and consumers typically assume the following roles:

Producers

Producers are responsible for publishing the events as they happen. They are the owners of the event schema (data structure) and semantics (meaning of the fields, such as the meaning of the value of an enum field). As this is the only contract (coupling) between producers and the downstream components of the system, the schema and its semantics are crucial in EDA. Producers are responsible for implementing a change management process, which involves both non-breaking and breaking changes. With introduction of breaking changes, consumers are able to negotiate the migration process with producers.

Producers are “consumer agnostic”, as their boundary of responsibility ends when an event is published.

Message brokers

Message brokers are responsible for the durability of the events, and will keep an event available for consumption until it is successfully processed. Message brokers ensure that producers are able to publish events for consumers to consume, and they regulate access and permissions to publish and consume messages.

Message brokers are largely “events agnostic”, and do not generally access or interpret the event content. However, some systems provide a routing mechanism based on the event payload or metadata.

Consumers

Consumers are responsible for consuming events, and own the semantics of the effect of events. Consumers are usually bounded to one business context. This means the same event will have different effect semantics for different consumers. Crucial architectural choices when implementing a consumer involve the handling of unsuccessful message deliveries or duplicate messages. Depending on the business interpretation of the event, when recovering from failure a consumer might permit duplicate events, such as with an idempotent consumer pattern.

Crucially, consumers are “producer agnostic”, and their boundary of responsibility begins when an event is ready for consumption. This allows new consumers to onboard into the system without changing the producer contracts.

Team independence

In order to enforce the division of responsibilities, companies should organize their technical teams by ownership of producers, message brokers, and consumers. Although the ownership of producers and consumers is straightforward in an EDA implementation, the ownership of the message broker may not be. Different approaches can be taken to identify message broker ownership depending on your organizational structure.

Decentralized ownership

Ownership of the message broker in a decentralized ownership organizational structure

Figure 2. Ownership of the message broker in a decentralized ownership organizational structure

In a decentralized ownership organizational structure (see Figure 2), the teams producing events are responsible for managing their own message brokers and the durability and availability of the events for consumers.

The adoption of topic fanout patterns based on Amazon Simple Queue Service (SQS) and Amazon Simple Notification Service (SNS) (see Figure 3), can help companies implement a decentralized ownership pattern. A bus-based pattern using Amazon EventBridge can also be similarly utilized (see Figure 4).

Topic fanout pattern based on Amazon SQS and Amazon SNS

Figure 3. Topic fanout pattern based on Amazon SQS and Amazon SNS

Events bus pattern based on Amazon EventBridge

Figure 4. Events bus pattern based on Amazon EventBridge

The decentralized ownership approach has the advantage of promoting team independence, but it is not a fit for every organization. In order to be implemented effectively, a well-established DevOps culture is necessary. In this scenario, the producing teams are responsible for managing the message broker infrastructure and the non-functional requirements standards.

Centralized ownership

Ownership of the message broker in a centralized ownership organizational structure

Figure 5. Ownership of the message broker in a centralized ownership organizational structure

In a centralized ownership organizational structure, a central team (we’ll call it the platform team) is responsible for the management of the message broker (see Figure 5). Having a specialized platform team offers the advantage of standardized implementation of non-functional requirements, such as reliability, availability, and security. One disadvantage is that the platform team is a single point of failure in both the development and deployment lifecycle. This could become a bottleneck and put team independence and operational efficiency at risk.

Streaming pattern based on Amazon MSK and Kinesis Data Streams

Figure 6. Streaming pattern based on Amazon MSK and Kinesis Data Streams

On top of the implementation patterns mentioned in the previous section, the presence of a dedicated team makes it easier to implement streaming patterns. In this case, a deeper understanding on how the data is partitioned and how the system scales is required. Streaming patterns can be implemented using services such as Amazon Managed Streaming for Apache Kafka (MSK) or Amazon Kinesis Data Streams (see Figure 6).

Best practices for implementing event-driven architectures in your organization

The centralized and decentralized ownership organizational structures enhance team independence or standardization of non-functional requirements respectively. However, they introduce possible limits to the growth of the engineering function in a company. Inspired by the two approaches, you can implement a set of best practices which are aimed at minimizing those limitations.

Best practices for implementing event-driven architectures

Figure 7. Best practices for implementing event-driven architectures

  1. Introduce a cloud center of excellence (CCoE). A CCoE standardizes non-functional implementation across engineering teams. In order to promote a strong DevOps culture, the CCoE should not take the form of an external independent team, but rather be a collection of individual members representing the various engineering teams.
  2. Decentralize team ownership. Decentralize ownership and maintenance of the message broker to producing teams. This will maximize team independence and agility. It empowers the team to use the right tool for the right job, as long as they conform to the CCoE guidelines.
  3. Centralize logging standards and observability strategies. Although it is a best practice to decentralize team ownership of the components of an event-driven architecture, logging standards and observability strategies should be centralized and standardized across the engineering function. This centralization provides for end-to-end tracing of requests and events, which are powerful diagnosis tools in case of any failure.

Conclusion

In this post, we have described the main architectural components of an event-driven architecture, and identified the ownership of the message broker as one of the most important architectural choices you can make. We have described a centralized and decentralized organizational approach, presenting the strengths of the two approaches, as well as the limits they impose on the growth of your engineering organization. We have provided some best practices you can implement in your organization to minimize these limitations.

Further reading:
To start your journey building event-driven architectures in AWS, explore the following:

Detecting and stopping recursive loops in AWS Lambda functions

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/detecting-and-stopping-recursive-loops-in-aws-lambda-functions/

This post is written by Pawan Puthran, Principal Serverless Specialist TAM, Aneel Murari, Senior Serverless Specialist Solution Architect, and Shree Shrikhande, Senior AWS Lambda Product Manager.

AWS Lambda is announcing a recursion control to detect and stop Lambda functions running in a recursive or infinite loop.

At launch, this feature is available for Lambda integrations with Amazon Simple Queue Service (Amazon SQS), Amazon SNS, or when invoking functions directly using the Lambda invoke API. Lambda now detects functions that appear to be running in a recursive loop and drops requests after exceeding 16 invocations.

This can help reduce costs from unexpected Lambda function invocations because of recursion. You receive notifications about this action through the AWS Health Dashboard, email, or by configuring Amazon CloudWatch Alarms.

Overview

You can invoke Lambda functions in multiple ways. AWS services generate events that invoke Lambda functions, and Lambda functions can send messages to other AWS services. In most architectures, the service or resource that invokes a Lambda function should be different from the service or resource that the function outputs to. Because of misconfiguration or coding bugs, a function can send a processed event to the same service or resource that invokes the Lambda function, causing a recursive loop.

Lambda now detects the function running in a recursive loop between supported services, after exceeding 16 invocations. It returns a RecursiveInvocationException to the caller. There is no additional charge for this feature. For asynchronous invokes, Lambda sends the event to a dead-letter queue or on-failure destination, if one is configured.

The following is an example of an order processing system.

Image processing system

Order processing system

  1. A new order information message is sent to the source SQS queue.
  2. Lambda consumes the message from the source queue using an ESM.
  3. The Lambda function processes the message and sends the updated orders message to a destination SQS queue using the SQS SendMessage API.
  4. The source queue has a dead-letter queue(DLQ) configured for handling any failed or unprocessed messages.
  5. Because of a misconfiguration, the Lambda function sends the message to the source SQS queue instead of the destination queue. This causes a recursive loop of Lambda function invocations.

To explore sample code for this example, see the GitHub repo.

In the preceding example, after 16 invocations, Lambda throws a RecursiveInvocationException to the ESM. The ESM stops invoking the Lambda function and, once the maxReceiveCount is exceeded, SQS moves the message to the source queues configured DLQ.

You receive an AWS Health Dashboard notification with steps to troubleshoot the function.

AWS Health Dashboard notification

AWS Health Dashboard notification

You also receive an email notification to the registered email address on the account.

Email notification

Email notification

Lambda emits a RecursiveInvocationsDropped CloudWatch metric, which you can view in the CloudWatch console.

RecursiveInvocationsDropped CloudWatch metric

RecursiveInvocationsDropped CloudWatch metric

How does Lambda detect recursion?

For Lambda to detect recursive loops, your function must use one of the supported AWS SDK versions or higher.

Lambda uses an AWS X-Ray trace header primitive called “Lineage” to track the number of times a function has been invoked with an event. When your function code sends an event using a supported AWS SDK version, Lambda increments the counter in the lineage header. If your function is then invoked with the same triggering event more than 16 times, Lambda stops the next invocation for that event. You do not need to configure active X-Ray tracing for this feature to work.

An example lineage header looks like:

X-Amzn-Trace-Id:Root=1-645f7998-4b1e232810b0bb733dba2eab;Parent=5be88d12eefc1fc0;Sampled=1;Lineage=43e12f0f:5

43e12f0f is the hash of a resource, in this case a Lambda function. 5 is the number of times this function has been invoked with the same event. The logic of hash generation, encoding, and size of the lineage header may change in the future. You should not design any application functionality based on this.

When using an ESM to consume messages from SQS, after the maxReceiveCount value is exceeded, the message is sent to the source queue’s configured DLQ. When Lambda detects a recursive loop and drops subsequent invocations, it returns a RecursiveInvocationException to the ESM. This increments the maxReceiveCount value. When the ESM auto retries to process events, based on the error handling configuration, these retries are not considered recursive invocations.

When using SQS, you can also batch multiple messages into one Lambda event. Where the message batch size is greater than 1, Lambda uses the maximum lineage value within the batch of messages. It drops the entire batch if the value exceeds 16.

Recursion detection in action

You can deploy a sample application example in the GitHub repo to test Lambda recursive loop detection. The application includes a Lambda function that reads from an SQS queue and writes messages back to the same SQS queue.

As prerequisites, you must install:

To deploy the application:

    1. Set your AWS Region:
export REGION=<your AWS region>
    1. Clone the GitHub repository
git clone https://github.com/aws-samples/aws-lambda-recursion-detection-sample.git
cd aws-lambda-recursion-detection-sample
    1. Use AWS SAM to build and deploy the resources to your AWS account. Enter a stack name, such as lambda-recursion, when prompted. Accept the remaining default values.
sam build –-use-container
sam deploy --guided --region $REGION

To test the application:

    1. Save the name of the SQS queue in a local environment variable:
SOURCE_SQS_URL=$(aws cloudformation describe-stacks \ --region $REGION \ --stack-name lambda-recursion \ --query 'Stacks[0].Outputs[?OutputKey==`SourceSQSqueueURL`].OutputValue' --output text)
  1. Publish a message to the source SQS queue:
aws sqs send-message --queue-url $SOURCE_SQS_URL --message-body '{"orderId":"111","productName":"Bolt","orderStatus":"Submitted"}' --region $REGION

This invokes the Lambda function, which writes the message back to the queue.

To verify that Lambda has detected the recursion:

  1. Navigate to the CloudWatch console. Choose All Metrics under Metrics in the left-hand panel and search for RecursiveInvocationsDropped.

    Find RecursiveInvocationsDropped.

    Find RecursiveInvocationsDropped.

  2. Choose Lambda > By Function Name and choose RecursiveInvocationsDropped for the function you created. Under Graphed metrics, change the statistic to sum and Period to 1 minute. You see one record. Refresh if you don’t see the metric after a few seconds.
Metrics sum view

Metrics sum view

Actions to take when Lambda stops a recursive loop

When you receive a notification regarding recursion in your account, the following steps can help address the issue.

  • To stop further invoke attempts while you fix the underlying configuration issue, set the function concurrency to 0. This acts as an off switch for the Lambda function. You can choose the “Throttle” button in the Lambda console or use the PutFunctionConcurrency API to set the function concurrency to 0.
  • You can also disable or delete the event source mapping or trigger for the Lambda function.
  • Check your Lambda function code and configuration for any code defects that create loops. For example, check your environment variables to ensure you are not using the same SQS queue or SNS topic as source and target.
  • If an SQS Queue is the event source for your Lambda function, configure a DLQ on the source queue.
  • If an SNS topic is the event source, configure an On-Failure Destination for the Lambda function.

Disabling recursion detection

You may have valid use-cases where Lambda recursion is intentional as part of your design. In this case, use caution and implement suitable guardrails to prevent unexpected charges to your account. To learn more about best practices for using recursive invocation patterns, see Recursive patterns that cause run-away Lambda functions in the AWS Lambda Operator Guide.

This feature is turned on by default to stop recursive loops. To request turning it off for your account, reach out to AWS Support.

Conclusion

Lambda recursion control for SQS and SNS automatically detects and stops functions running in a recursive or infinite loop. This can be due to misconfiguration or coding errors. Recursion control helps reduce unexpected usage with Lambda and downstream services. The post also explains how Lambda detects and stops recursive loops and notifies you through AWS Health Dashboard to troubleshoot the function.

To learn more about the feature, visit the AWS Lambda Developer Guide.

For more serverless learning resources, visit Serverless Land

Implementing AWS Lambda error handling patterns

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/implementing-aws-lambda-error-handling-patterns/

This post is written by Jeff Chen, Principal Cloud Application Architect, and Jeff Li, Senior Cloud Application Architect

Event-driven architectures are an architecture style that can help you boost agility and build reliable, scalable applications. Splitting an application into loosely coupled services can help each service scale independently. A distributed, loosely coupled application depends on events to communicate application change states. Each service consumes events from other services and emits events to notify other services of state changes.

Handling errors becomes even more important when designing distributed applications. A service may fail if it cannot handle an invalid payload, dependent resources may be unavailable, or the service may time out. There may be permission errors that can cause failures. AWS services provide many features to handle error conditions, which you can use to improve the resiliency of your applications.

This post explores three use-cases and design patterns for handling failures.

Overview

AWS Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and Amazon EventBridge are core building blocks for building serverless event-driven applications.

The post Understanding the Different Ways to Invoke Lambda Functions lists the three different ways of invoking a Lambda function: synchronous, asynchronous, and poll-based invocation. For a list of services and which invocation method they use, see the documentation.

Lambda’s integration with Amazon API Gateway is an example of a synchronous invocation. A client makes a request to API Gateway, which sends the request to Lambda. API Gateway waits for the function response and returns the response to the client. There are no built-in retries or error handling. If the request fails, the client attempts the request again.

Lambda’s integration with SNS and EventBridge are examples of asynchronous invocations. SNS, for example, sends an event to Lambda for processing. When Lambda receives the event, it places it on an internal event queue and returns an acknowledgment to SNS that it has received the message. Another Lambda process reads events from the internal queue and invokes your Lambda function. If SNS cannot deliver an event to your Lambda function, the service automatically retries the same operation based on a retry policy.

Lambda’s integration with SQS uses poll-based invocations. Lambda runs a fleet of pollers that poll your SQS queue for messages. The pollers read the messages in batches and invoke your Lambda function once per batch.

You can apply this pattern in many scenarios. For example, your operational application can add sales orders to an operational data store. You may then want to load the sales orders to your data warehouse periodically so that the information is available for forecasting and analysis. The operational application can batch completed sales as events and place them on an SQS queue. A Lambda function can then process the events and load the completed sale records into your data warehouse.

If your function processes the batch successfully, the pollers delete the messages from the SQS queue. If the batch is not successfully processed, the pollers do not delete the messages from the queue. Once the visibility timeout expires, the messages are available again to be reprocessed. If the message retention period expires, SQS deletes the message from the queue.

The following table shows the invocation types and retry behavior of the AWS services mentioned.

AWS service example Invocation type Retry behavior
Amazon API Gateway Synchronous No built-in retry, client attempts retries.

Amazon SNS

Amazon EventBridge

Asynchronous Built-in retries with exponential backoff.
Amazon SQS Poll-based Retries after visibility timeout expires until message retention period expires.

There are a number of design patterns to use for poll-based and asynchronous invocation types to retain failed messages for additional processing. These patterns can help you recover from delivery or processing failures.

You can explore the patterns and test the scenarios by deploying the code from this repository which uses the AWS Cloud Development Kit (AWS CDK) using Python.

Lambda poll-based invocation pattern

When using Lambda with SQS, if Lambda isn’t able to process the message and the message retention period expires, SQS drops the message. Failure to process the message can be due to function processing failures, including time-outs or invalid payloads. Processing failures can also occur when the destination function does not exist, or has incorrect permissions.

You can configure a separate dead-letter queue (DLQ) on the source queue for SQS to retain the dropped message. A DLQ preserves the original message and is useful for analyzing root causes, handling error conditions properly, or sending notifications that require manual interventions. In the poll-based invocation scenario, the Lambda function itself does not maintain a DLQ. It relies on the external DLQ configured in SQS. For more information, see Using Lambda with Amazon SQS.

The following shows the design pattern when you configure Lambda to poll events from an SQS queue and invoke a Lambda function.

Lambda synchronously polling catches of messages from SQS

Lambda synchronously polling batches of messages from SQS

To explore this pattern, deploy the code in this repository. Once deployed, you can use this instruction to test the pattern with the happy and unhappy paths.

Lambda asynchronous invocation pattern

With asynchronous invokes, there are two failure aspects to consider when using Lambda. The event source cannot deliver the message to Lambda and the Lambda function errors when processing the event.

Event sources vary in how they handle failures delivering messages to Lambda. If SNS or EventBridge cannot send the event to Lambda after exhausting all their retry attempts, the service drops the event. You can configure a DLQ on an SNS topic or EventBridge event bus to hold the dropped event. This works in the same way as the poll-based invocation pattern with SQS.

Lambda functions may then error due to input payload syntax errors, duration time-outs, or the function throws an exception such as a data resource not available.

For asynchronous invokes, you can configure how long Lambda retains an event in its internal queue, up to 6 hours. You can also configure how many times Lambda retries when the function errors, between 0 and 2. Lambda discards the event when the maximum age passes or all retry attempts fail. To retain a copy of discarded events, you can configure either a DLQ or, preferably, a failed-event destination as part of your Lambda function configuration.

A Lambda destination enables you to specify what to do next if an asynchronous invocation succeeds or fails. You can configure a destination to send invocation records to SQS, SNS, EventBridge, or another Lambda function. Destinations are preferred for failure processing as they support additional targets and include additional information. A DLQ holds the original failed event. With a destination, Lambda also passes details of the function’s response in the invocation record. This includes stack traces, which can be useful for analyzing the root cause.

Using both a DLQ and Lambda destinations

You can apply this pattern in many scenarios. For example, many of your applications may contain customer records. To comply with the California Consumer Privacy Act (CCPA), different organizations may need to delete records for a particular customer. You can set up a consumer delete SNS topic. Each organization creates a Lambda function, which processes the events published by the SNS topic and deletes customer records in its managed applications.

The following shows the design pattern when you configure an SNS topic as the event source for a Lambda function, which uses destination queues for success and failure process.

SNS topic as event source for Lambda

SNS topic as event source for Lambda

You configure a DLQ on the SNS topic to capture messages that SNS cannot deliver to Lambda. When Lambda invokes the function, it sends details of the successfully processed messages to an on-success SQS destination. You can use this pattern to route an event to multiple services for simpler use cases. For orchestrating multiple services, AWS Step Functions is a better design choice.

Lambda can also send details of unsuccessfully processed messages to an on-failure SQS destination.

A variant of this pattern is to replace an SQS destination with an EventBridge destination so that multiple consumers can process an event based on the destination.

To explore how to use an SQS DLQ and Lambda destinations, deploy the code in this repository. Once deployed, you can use this instruction to test the pattern with the happy and unhappy paths.

Using a DLQ

Although destinations is the preferred method to handle function failures, you can explore using DLQs.

The following shows the design pattern when you configure an SNS topic as the event source for a Lambda function, which uses SQS queues for failure process.

Lambda invoked asynchonously

Lambda invoked asynchonously

You configure a DLQ on the SNS topic to capture the messages that SNS cannot deliver to the Lambda function. You also configure a separate DLQ for the Lambda function. Lambda saves an unsuccessful event to this DLQ after Lambda cannot process the event after maximum retry attempts.

To explore how to use a Lambda DLQ, deploy the code in this repository. Once deployed, you can use this instruction to test the pattern with happy and unhappy paths.

Conclusion

This post explains three patterns that you can use to design resilient event-driven serverless applications. Error handling during event processing is an important part of designing serverless cloud applications.

You can deploy the code from the repository to explore how to use poll-based and asynchronous invocations. See how poll-based invocations can send failed messages to a DLQ. See how to use DLQs and Lambda destinations to route and handle unsuccessful events.

Learn more about event-driven architecture on Serverless Land.

Monitor Amazon SNS-based applications end-to-end with AWS X-Ray active tracing

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/monitor-amazon-sns-based-applications-end-to-end-with-aws-x-ray-active-tracing/

This post is written by Daniel Lorch, Senior Consultant and David Mbonu, Senior Solutions Architect.

Amazon Simple Notification Service (Amazon SNS), a messaging service that provides high-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications, now supports active tracing with AWS X-Ray.

With AWS X-Ray active tracing enabled for SNS, you can identify bottlenecks and monitor the health of event-driven applications by looking at segment details for SNS topics, such as resource metadata, faults, errors, and message delivery latency for each subscriber.

This blog post reviews common use cases where AWS X-Ray active tracing enabled for SNS provides a consistent view of tracing data across AWS services in real-world scenarios. We cover two architectural patterns which allow you to gain accurate visibility of your end-to-end tracing: SNS to Amazon Simple Queue Service (Amazon SQS) queues and SNS topics to Amazon Kinesis Data Firehose streams.

Getting started with the sample serverless application

To demonstrate AWS X-Ray active tracing for SNS, we will use the Wild Rydes serverless application as shown in the following figure. The application uses a microservices architecture which implements asynchronous messaging for integrating independent systems.

Wild Rydes serverless application architecture

This is how the sample serverless application works:

  1. An Amazon API Gateway receives ride requests from users.
  2. An AWS Lambda function processes ride requests.
  3. An Amazon DynamoDB table serves as a store for rides.
  4. An SNS topic serves as a fan-out for ride requests.
  5. Individual SQS queues and Lambda functions are set up for processing requests via various back-office services (customer notification, customer accounting, and others).
  6. An SNS message filter is in place for the subscription of the extraordinary rides service.
  7. A Kinesis Data Firehose delivery stream archives ride requests in an Amazon Simple Storage Service (Amazon S3) bucket.

Deploying the sample serverless application

Prerequisites

Deployment steps using AWS SAM

The sample application is provided as an AWS SAM infrastructure as code template.

This demonstrative application will deploy an API without authorization. Please consider controlling and managing access to your APIs.

  1. Clone the GitHub repository:
    git clone https://github.com/aws-samples/sns-xray-active-tracing-blog-source-code
    cd sns-xray-active-tracing-blog-source-code
  2. Build the lab artifacts from source:
    sam build
  3. Deploy the sample solution into your AWS account:
    export AWS_REGION=$(aws --profile default configure get region)
    sam deploy \
    --stack-name wild-rydes-async-msg-2 \
    --capabilities CAPABILITY_IAM \
    --region $AWS_REGION \
    --guided

    Confirm SubmitRideCompletionFunction may not have authorization defined, Is this okay? [y/N]: with yes.

  4. Wait until the stack reaches status CREATE_COMPLETE.

See the sample application README.md for detailed deployment instructions.

Testing the application

Once the application is successfully deployed, generate messages and validate that the SNS topic is publishing all messages:

  1. Look up the API Gateway endpoint:
    export AWS_REGION=$(aws --profile default configure get region)
    aws cloudformation describe-stacks \
    --stack-name wild-rydes-async-msg-2 \
    --query 'Stacks[].Outputs[?OutputKey==`UnicornManagementServiceApiSubmitRideCompletionEndpoint`].OutputValue' \
    --output text
  2. Store this API Gateway endpoint in an environment variable:
    export ENDPOINT=$(aws cloudformation describe-stacks \
    --stack-name wild-rydes-async-msg-2 \
    --query 'Stacks[].Outputs[?OutputKey==`UnicornManagementServiceApiSubmitRideCompletionEndpoint`].OutputValue' \
    --output text)
  3. Send requests to the submit ride completion endpoint by executing the following command five or more times with varying payloads:
    curl -XPOST -i -H "Content-Type\:application/json" -d '{ "from": "Berlin", "to": "Frankfurt", "duration": 420, "distance": 600, "customer": "cmr", "fare": 256.50 }' $ENDPOINT
  4. Validate that messages are being passed in the application using the CloudWatch service map:
    Messages being passed on the CloudWatch service map

See the sample application README.md for detailed testing instructions.

The sample application shows various use-cases, which are described in the following sections.

Amazon SNS to Amazon SQS fanout scenario

A common application integration scenario for SNS is the Fanout scenario. In the Fanout scenario, a message published to an SNS topic is replicated and pushed to multiple endpoints, such as SQS queues. This allows for parallel asynchronous processing and is a common application integration pattern used in event-driven application architectures.

When an SNS topic fans out to SQS queues, the pattern is called topic-queue-chaining. This means that you add a queue, in our case an SQS queue, between the SNS topic and each of the subscriber services. As messages are buffered in a persistent manner in an SQS queue, no message is lost should a subscriber process run into issues for multiple hours or days, or experience exceptions or crashes.

By placing an SQS queue in front of each subscriber service, you can leverage the fact that a queue can act as a buffering load balancer. As every queue message is delivered to one of potentially many consumer processes, subscriber services can be easily scaled out and in, and the message load is distributed over the available consumer processes. In an event where suddenly a large number of messages arrives, the number of consumer processes has to be scaled out to cope with the additional load. This takes time and you need to wait until additional processes become operational. Since messages are buffered in the queue, you do not lose any messages in the process.

To summarize, in the Fanout scenario or the topic-queue-chaining pattern:

  • SNS replicates and pushes the message to multiple endpoints.
  • SQS decouples sending and receiving endpoints.

The fanout scenario is a common application integration scenario for SNS

With AWS X-Ray active tracing enabled on the SNS topic, the CloudWatch service map shows us the complete application architecture, as follows.

Fanout scenario with an SNS topic that fans out to SQS queues in the CloudWatch service map

Prior to the introduction of AWS X-Ray active tracing on the SNS topic, the AWS X-Ray service would not be able to reconstruct the full service map and the SQS nodes would be missing from the diagram.

To see the integration without AWS X-Ray active tracing enabled, open template.yaml and navigate to the resource RideCompletionTopic. Comment out the property TracingConfig: Active, redeploy and test the solution. The service map should then show an incomplete diagram where the SNS topic is linked directly to the consumer Lambda functions, omitting the SQS nodes.

For this use case, given the Fanout scenario, enabling AWS X-Ray active tracing on the SNS topic provides full end-to-end observability of the traces available in the application.

Amazon SNS to Amazon Kinesis Data Firehose delivery streams for message archiving and analytics

SNS is commonly used with Kinesis Data Firehose delivery streams for message archival and analytics use-cases. You can use SNS topics with Kinesis Data Firehose subscriptions to capture, transform, buffer, compress and upload data to Amazon S3, Amazon Redshift, Amazon OpenSearch Service, HTTP endpoints, and third-party service providers.

We will implement this pattern as follows:

  • An SNS topic to replicate and push the message to its subscribers.
  • A Kinesis Data Firehose delivery stream to capture and buffer messages.
  • An S3 bucket to receive uploaded messages for archival.

Message archiving and analytics using Kinesis Data Firehose delivery streams consumer to the SNS topic

In order to demonstrate this pattern, an additional consumer has been added to the SNS topic. The same Fanout pattern applies and the Kinesis Data Firehose delivery stream receives messages from the SNS topic alongside the existing consumers.

The Kinesis Data Firehose delivery stream buffers messages and is configured to deliver them to an S3 bucket for archival purposes. Optionally, an SNS message filter could be added to this subscription to select relevant messages for archival.

With AWS X-Ray active tracing enabled on the SNS topic, the Kinesis Data Firehose node will appear on the CloudWatch service map as a separate entity, as can be seen in the following figure. It is worth noting that the S3 bucket does not appear on the CloudWatch service map as Kinesis does not yet support AWS X-Ray active tracing at the time of writing of this blog post.

Kinesis Data Firehose delivery streams consumer to the SNS topic in the CloudWatch Service Map

Prior to the introduction of AWS X-Ray active tracing on the SNS topic, the AWS X-Ray service would not be able to reconstruct the full service map and the Kinesis Data Firehose node would be missing from the diagram. To see the integration without AWS X-Ray active tracing enabled, open template.yaml and navigate to the resource RideCompletionTopic. Comment out the property TracingConfig: Active, redeploy and test the solution. The service map should then show an incomplete diagram where the Kinesis Data Firehose node is missing.

For this use case, given the data archival scenario with Kinesis Delivery Firehose, enabling AWS X-Ray active tracing on the SNS topic provides additional visibility on the Kinesis Data Firehose node in the CloudWatch service map.

Review faults, errors, and message delivery latency on the AWS X-Ray trace details page

The AWS X-Ray trace details page provides a timeline with resource metadata, faults, errors, and message delivery latency for each segment.

With AWS X-Ray active tracing enabled on SNS, additional segments for the SNS topic itself, but also the downstream consumers (AWS::SNS::Topic, AWS::SQS::Queue and AWS::KinesisFirehose) segments are available, providing additional faults, errors, and message delivery latency for these segments. This allows you to analyze latencies in your messages and their backend services. For example, how long a message spends in a topic, and how long it took to deliver the message to each of the topic’s subscriptions.

Additional faults, errors, and message delivery latency information on AWS X-Ray trace details page

Enabling AWS X-Ray active tracing for SNS

AWS X-Ray active tracing is not enabled by default on SNS topics and needs to be explicitly enabled.

The example application used in this blog post demonstrates how to enable active tracing using AWS SAM.

You can enable AWS X-Ray active tracing using the SNS SetTopicAttributes API, SNS Management Console, or via AWS CloudFormation. See Active tracing in Amazon SNS in the Amazon SNS Developer Guide for more options.

Cleanup

To clean up the resources provisioned as part of the sample serverless application, follow the instructions as outlined in the sample application README.md.

Conclusion

AWS X-Ray active tracing for SNS enables end-to-end visibility in real-world scenarios involving patterns like SNS to SQS and SNS to Amazon Kinesis.

But it is not only useful for these patterns. With AWS X-Ray active tracing enabled for SNS, you can identify bottlenecks and monitor the health of event-driven applications by looking at segment details for SNS topics and consumers, such as resource metadata, faults, errors, and message delivery latency for each subscriber.

Enable AWS X-Ray active tracing for SNS to gain accurate visibility of your end-to-end tracing.

For more serverless learning resources, visit Serverless Land.

Serverless ICYMI Q1 2023

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/serverless-icymi-q1-2023/

Welcome to the 21st edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

ICYMI2023Q1

In case you missed our last ICYMI, check out what happened last quarter here.

Artificial intelligence (AI) technologies, ChatGPT, and DALL-E are creating significant interest in the industry at the moment. Find out how to integrate serverless services with ChatGPT and DALL-E to generate unique bedtime stories for children.

Example notification of a story hosted with Next.js and App Runner

Example notification of a story hosted with Next.js and App Runner

Serverless Land is a website maintained by the Serverless Developer Advocate team to help you build serverless applications and includes workshops, code examples, blogs, and videos. There is now enhanced search functionality so you can search across resources, patterns, and video content.

SLand-search

ServerlessLand search

AWS Lambda

AWS Lambda has improved how concurrency works with Amazon SQS. You can now control the maximum number of concurrent Lambda functions invoked.

The launch blog post explains the scaling behavior of Lambda using this architectural pattern, challenges this feature helps address, and a demo of maximum concurrency in action.

Maximum concurrency is set to 10 for the SQS queue.

Maximum concurrency is set to 10 for the SQS queue.

AWS Lambda Powertools is an open-source library to help you discover and incorporate serverless best practices more easily. Lambda Powertools for .NET is now generally available and currently focused on three observability features: distributed tracing (Tracer), structured logging (Logger), and asynchronous business and application metrics (Metrics). Powertools is also available for Python, Java, and Typescript/Node.js programming languages.

To learn more:

Lambda announced a new feature, runtime management controls, which provide more visibility and control over when Lambda applies runtime updates to your functions. The runtime controls are optional capabilities for advanced customers that require more control over their runtime changes. You can now specify a runtime management configuration for each function with three settings, Automatic (default), Function update, or manual.

There are three new Amazon CloudWatch metrics for asynchronous Lambda function invocations: AsyncEventsReceived, AsyncEventAge, and AsyncEventsDropped. You can track the asynchronous invocation requests sent to Lambda functions to monitor any delays in processing and take corrective actions if required. The launch blog post explains the new metrics and how to use them to troubleshoot issues.

Lambda now supports Amazon DocumentDB change streams as an event source. You can use Lambda functions to process new documents, track updates to existing documents, or log deleted documents. You can use any programming language that is supported by Lambda to write your functions.

There is a helpful blog post suggesting best practices for developing portable Lambda functions that allow you to port your code to containers if you later choose to.

AWS Step Functions

AWS Step Functions has expanded its AWS SDK integrations with support for 35 additional AWS services including Amazon EMR Serverless, AWS Clean Rooms, AWS IoT FleetWise, AWS IoT RoboRunner and 31 other AWS services. In addition, Step Functions also added support for 1000+ new API actions from new and existing AWS services such as Amazon DynamoDB and Amazon Athena. For the full list of added services, visit AWS SDK service integrations.

Amazon EventBridge

Amazon EventBridge has launched the AWS Controllers for Kubernetes (ACK) for EventBridge and Pipes . This allows you to manage EventBridge resources, such as event buses, rules, and pipes, using the Kubernetes API and resource model (custom resource definitions).

EventBridge event buses now also support enhanced integration with Service Quotas. Your quota increase requests for limits such as PutEvents transactions-per-second, number of rules, and invocations per second among others will be processed within one business day or faster, enabling you to respond quickly to changes in usage.

AWS SAM

The AWS Serverless Application Model (SAM) Command Line Interface (CLI) has added the sam list command. You can now show resources defined in your application, including the endpoints, methods, and stack outputs required to test your deployed application.

AWS SAM has a preview of sam build support for building and packaging serverless applications developed in Rust. You can use cargo-lambda in the AWS SAM CLI build workflow and AWS SAM Accelerate to iterate on your code changes rapidly in the cloud.

You can now use AWS SAM connectors as a source resource parameter. Previously, you could only define AWS SAM connectors as a AWS::Serverless::Connector resource. Now you can add the resource attribute on a connector’s source resource, which makes templates more readable and easier to update over time.

AWS SAM connectors now also support multiple destinations to simplify your permissions. You can now use a single connector between a single source resource and multiple destination resources.

In October 2022, AWS released OpenID Connect (OIDC) support for AWS SAM Pipelines. This improves your security posture by creating integrations that use short-lived credentials from your CI/CD provider. There is a new blog post on how to implement it.

Find out how best to build serverless Java applications with the AWS SAM CLI.

AWS App Runner

AWS App Runner now supports retrieving secrets and configuration data stored in AWS Secrets Manager and AWS Systems Manager (SSM) Parameter Store in an App Runner service as runtime environment variables.

AppRunner also now supports incoming requests based on HTTP 1.0 protocol, and has added service level concurrency, CPU and Memory utilization metrics.

Amazon S3

Amazon S3 now automatically applies default encryption to all new objects added to S3, at no additional cost and with no impact on performance.

You can now use an S3 Object Lambda Access Point alias as an origin for your Amazon CloudFront distribution to tailor or customize data to end users. For example, you can resize an image depending on the device that an end user is visiting from.

S3 has introduced Mountpoint for S3, a high performance open source file client that translates local file system API calls to S3 object API calls like GET and LIST.

S3 Multi-Region Access Points now support datasets that are replicated across multiple AWS accounts. They provide a single global endpoint for your multi-region applications, and dynamically route S3 requests based on policies that you define. This helps you to more easily implement multi-Region resilience, latency-based routing, and active-passive failover, even when data is stored in multiple accounts.

Amazon Kinesis

Amazon Kinesis Data Firehose now supports streaming data delivery to Elastic. This is an easier way to ingest streaming data to Elastic and consume the Elastic Stack (ELK Stack) solutions for enterprise search, observability, and security without having to manage applications or write code.

Amazon DynamoDB

Amazon DynamoDB now supports table deletion protection to protect your tables from accidental deletion when performing regular table management operations. You can set the deletion protection property for each table, which is set to disabled by default.

Amazon SNS

Amazon SNS now supports AWS X-Ray active tracing to visualize, analyze, and debug application performance. You can now view traces that flow through Amazon SNS topics to destination services, such as Amazon Simple Queue Service, Lambda, and Kinesis Data Firehose, in addition to traversing the application topology in Amazon CloudWatch ServiceLens.

SNS also now supports setting content-type request headers for HTTPS notifications so applications can receive their notifications in a more predictable format. Topic subscribers can create a DeliveryPolicy that specifies the content-type value that SNS assigns to their HTTPS notifications, such as application/json, application/xml, or text/plain.

EDA Visuals collection added to Serverless Land

The Serverless Developer Advocate team has extended Serverless Land and introduced EDA visuals. These are small bite sized visuals to help you understand concept and patterns about event-driven architectures. Find out about batch processing vs. event streaming, commands vs. events, message queues vs. event brokers, and point-to-point messaging. Discover bounded contexts, migrations, idempotency, claims, enrichment and more!

EDA-visuals

EDA Visuals

To learn more:

Serverless Repos Collection on Serverless Land

There is also a new section on Serverless Land containing helpful code repositories. You can search for code repos to use for examples, learning or building serverless applications. You can also filter by use-case, runtime, and level.

Serverless Repos Collection

Serverless Repos Collection

Serverless Blog Posts

January

Jan 12 – Introducing maximum concurrency of AWS Lambda functions when using Amazon SQS as an event source

Jan 20 – Processing geospatial IoT data with AWS IoT Core and the Amazon Location Service

Jan 23 – AWS Lambda: Resilience under-the-hood

Jan 24 – Introducing AWS Lambda runtime management controls

Jan 24 – Best practices for working with the Apache Velocity Template Language in Amazon API Gateway

February

Feb 6 – Previewing environments using containerized AWS Lambda functions

Feb 7 – Building ad-hoc consumers for event-driven architectures

Feb 9 – Implementing architectural patterns with Amazon EventBridge Pipes

Feb 9 – Securing CI/CD pipelines with AWS SAM Pipelines and OIDC

Feb 9 – Introducing new asynchronous invocation metrics for AWS Lambda

Feb 14 – Migrating to token-based authentication for iOS applications with Amazon SNS

Feb 15 – Implementing reactive progress tracking for AWS Step Functions

Feb 23 – Developing portable AWS Lambda functions

Feb 23 – Uploading large objects to Amazon S3 using multipart upload and transfer acceleration

Feb 28 – Introducing AWS Lambda Powertools for .NET

March

Mar 9 – Server-side rendering micro-frontends – UI composer and service discovery

Mar 9 – Building serverless Java applications with the AWS SAM CLI

Mar 10 – Managing sessions of anonymous users in WebSocket API-based applications

Mar 14 –
Implementing an event-driven serverless story generation application with ChatGPT and DALL-E

Videos

Serverless Office Hours – Tues 10AM PT

Weekly office hours live stream. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

January

Jan 10 – Building .NET 7 high performance Lambda functions

Jan 17 – Amazon Managed Workflows for Apache Airflow at Scale

Jan 24 – Using Terraform with AWS SAM

Jan 31 – Preparing your serverless architectures for the big day

February

Feb 07- Visually design and build serverless applications

Feb 14 – Multi-tenant serverless SaaS

Feb 21 – Refactoring to Serverless

Feb 28 – EDA visually explained

March

Mar 07 – Lambda cookbook with Python

Mar 14 – Succeeding with serverless

Mar 21 – Lambda Powertools .NET

Mar 28 – Server-side rendering micro-frontends

FooBar Serverless YouTube channel

Marcia Villalba frequently publishes new videos on her popular serverless YouTube channel. You can view all of Marcia’s videos at https://www.youtube.com/c/FooBar_codes.

January

Jan 12 – Serverless Badge – A new certification to validate your Serverless Knowledge

Jan 19 – Step functions Distributed map – Run 10k parallel serverless executions!

Jan 26 – Step Functions Intrinsic Functions – Do simple data processing directly from the state machines!

February

Feb 02 – Unlock the Power of EventBridge Pipes: Integrate Across Platforms with Ease!

Feb 09 – Amazon EventBridge Pipes: Enrichment and filter of events Demo with AWS SAM

Feb 16 – AWS App Runner – Deploy your apps from GitHub to Cloud in Record Time

Feb 23 – AWS App Runner – Demo hosting a Node.js app in the cloud directly from GitHub (AWS CDK)

March

Mar 02 – What is Amazon DynamoDB? What are the most important concepts? What are the indexes?

Mar 09 – Choreography vs Orchestration: Which is Best for Your Distributed Application?

Mar 16 – DynamoDB Single Table Design: Simplify Your Code and Boost Performance with Table Design Strategies

Mar 23 – 8 Reasons You Should Choose DynamoDB for Your Next Project and How to Get Started

Sessions with SAM & Friends

SAMFiends

AWS SAM & Friends

Eric Johnson is exploring how developers are building serverless applications. We spend time talking about AWS SAM as well as others like AWS CDK, Terraform, Wing, and AMPT.

Feb 16 – What’s new with AWS SAM

Feb 23 – AWS SAM with AWS CDK

Mar 02 – AWS SAM and Terraform

Mar 10 – Live from ServerlessDays ANZ

Mar 16 – All about AMPT

Mar 23 – All about Wing

Mar 30 – SAM Accelerate deep dive

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

AWS Week in Review – March 27, 2023

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-week-in-review-march-27-2023/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

In Finland, where I live, spring has arrived. The snow has melted, and the trees have grown their first buds. But I don’t get my hopes high, as usually around Easter we have what is called takatalvi. Takatalvi is a Finnish world that means that the winter returns unexpectedly in the spring.

Last Week’s Launches
Here are some launches that got my attention during the previous week.

AWS SAM CLI – Now the sam sync command will compare your local Serverless Application Model (AWS SAM) template with your deployed AWS CloudFormation template and skip the deployment if there are no changes. For more information, check the latest version of the AWS SAM CLI.

IAM – AWS Identity and Access Management (IAM) has launched two new global condition context keys. With these new condition keys, you can write service control policies (SCPs) or IAM policies that restrict the VPCs and private IP addresses from which your Amazon Elastic Compute Cloud (Amazon EC2) instance credentials can be used, without hard-coding VPC IDs or IP addresses in the policy. To learn more about this launch and how to get started, see How to use policies to restrict where EC2 instance credentials can be used from.

Amazon SNS – Amazon Simple Notification Service (Amazon SNS) now supports setting context-type request headers for HTTP/S notifications, such as application/json, application/xml, or text/plain. With this new feature, applications can receive their notifications in a more predictable format.

AWS Batch – AWS Batch now allows you to configure ephemeral storage up to 200GiB on AWS Fargate type jobs. With this launch, you no longer need to limit the size of your data sets or the size of the Docker images to run machine learning inference.

Application Load Balancer – Application Load Balancer (ALB) now supports Transport Layer Security (TLS) protocol version 1.3, enabling you to optimize the performance of your application while keeping it secure. TLS 1.3 on ALB works by offloading encryption and decryption of TLS traffic from your application server to the load balancer.

Amazon IVS – Amazon Interactive Video Service (IVS) now supports combining videos from multiple hosts into the source of a live stream. For a demo, refer to Add multiple hosts to live streams with Amazon IVS.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you may have missed:

I read the post Implementing an event-driven serverless story generation application with ChatGPT and DALL-E a few days ago, and since then I have been reading my child a lot of  AI-generated stories. In this post, David Boyne, explains step by step how you can create an event-driven serverless story generation application. This application produces a brand-new story every day at bedtime with images, which can be played in audio format.

Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish, and every other week there is a new episode. The podcast is meant for builders, and it shares stories about how customers have implemented and learned AWS services, how to architect applications, and how to use new services. You can listen to all the episodes directly from your favorite podcast app or at AWS Podcasts en español.

AWS open-source news and updates – The open source newsletter is curated by my colleague Ricardo Sueiras to bring you the latest open-source projects, posts, events, and more.

Upcoming AWS Events
Check your calendars and sign up for the AWS Summit closest to your city. AWS Summits are free events that bring the local community together, where you can learn about different AWS services.

Here are the ones coming up in the next months:

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia

Implementing an event-driven serverless story generation application with ChatGPT and DALL-E

Post Syndicated from David Boyne original https://aws.amazon.com/blogs/compute/implementing-an-event-driven-serverless-story-generation-application-with-chatgpt-and-dall-e/

This post demonstrates how to integrate AWS serverless services with artificial intelligence (AI) technologies, ChatGPT, and DALL-E. This full stack event-driven application showcases a method of generating unique bedtime stories for children by using predetermined characters and scenes as a prompt for ChatGPT.

Every night at bedtime, the serverless scheduler triggers the application, initiating an event-driven workflow to create and store new unique AI-generated stories with AI-generated images and supporting audio.

These datasets are used to showcase the story on a custom website built with Next.js hosted with AWS App Runner. After the story is created, a notification is sent to the user containing a URL to view and read the story to the children.

Example notification of a story hosted with Next.js and App Runner

Example notification of a story hosted with Next.js and App Runner

By integrating AWS services with AI technologies, you can now create new and innovative ideas that were previously unimaginable.

The application mentioned in this blog post demonstrates examples of point-to-point messaging with Amazon EventBridge pipes, publish/subscribe patterns with Amazon EventBridge and reacting to change data capture events with DynamoDB Streams.

Understanding the architecture

The following image shows the serverless architecture used to generate stories:

Architecture diagram for Serverless bed time story generation with ChatGPT and DALL-E

Architecture diagram for Serverless bed time story generation with ChatGPT and DALL-E

A new children’s story is generated every day at configured time using Amazon EventBridge Scheduler (Step 1). EventBridge Scheduler is a service capable of scaling millions of schedules with over 200 targets and over 6000 API calls. This example application uses EventBridge scheduler to trigger an AWS Lambda function every night at the same time (7:15pm). The Lambda function is triggered to start the generation of the story.

EventBridge scheduler triggers Lambda function every day at 7:15pm (bed time)

EventBridge scheduler triggers Lambda function every day at 7:15pm (bed time)

The “Scenes” and “Characters” Amazon DynamoDB tables contain the characters involved in the story and a scene that is randomly selected during its creation. As a result, ChatGPT receives a unique prompt each time. An example of the prompt may look like this:

“`
Write a title and a rhyming story on 2 main characters called Parker and Jackson. The story needs to be set within the scene haunted woods and be at least 200 words long

“`

After the story is created, it is then saved in the “Stories” DynamoDB table (Step 2).

Scheduler triggering Lambda function to generate the story and store story into DynamoDB

Scheduler triggering Lambda function to generate the story and store story into DynamoDB

Once the story is created this initiates a change data capture event using DynamoDB Streams (Step 3). This event flows through point-to-point messaging with EventBridge pipes and directly into EventBridge. Input transforms are then used to convert the DynamoDB Stream event into a custom EventBridge event, which downstream consumers can comprehend. Adopting this pattern is beneficial as it allows us to separate contracts from the DynamoDB event schema and not having downstream consumers conform to this schema structure, this mapping allows us to remain decoupled from implementation details.

EventBridge Pipes connecting DynamoDB streams directly into EventBridge.

EventBridge Pipes connecting DynamoDB streams directly into EventBridge.

Upon triggering the StoryCreated event in EventBridge, three targets are triggered to carry out several processes (Step 4). Firstly, AI Images are processed, followed by the creation of audio for the story. Finally, the end user is notified of the completed story through Amazon SNS and email subscriptions. This fan-out pattern enables these tasks to be run asynchronously and in parallel, allowing for faster processing times.

EventBridge pub/sub pattern used to start async processing of notifications, audio, and images.

EventBridge pub/sub pattern used to start async processing of notifications, audio, and images.

An SNS topic is triggered by the `StoryCreated` event to send an email to the end user using email subscriptions (Step 6). The email consists of a URL with the id of the story that has been created. Clicking on the URL takes the user to the frontend application that is hosted with App Runner.

Using SNS to notify the user of a new story

Using SNS to notify the user of a new story

Example email sent to the user

Example email sent to the user

Amazon Polly is used to generate the audio files for the story (Step 6). Upon triggering the `StoryCreated` event, a Lambda function is triggered, and the story description is used and given to Amazon Polly. Amazon Polly then creates an audio file of the story, which is stored in Amazon S3. A presigned URL is generated and saved in DynamoDB against the created story. This allows the frontend application and browser to retrieve the audio file when the user views the page. The presigned URL has a validity of two days, after which it can no longer be accessed or listened to.

Lambda function to generate audio using Amazon Polly, store in S3 and update story with presigned URL

Lambda function to generate audio using Amazon Polly, store in S3 and update story with presigned URL

The `StoryCreated` event also triggers another Lambda function, which uses the OpenAI API to generate an AI image using DALL-E based on the generated story (Step 7). Once the image is generated, the image is downloaded and stored in Amazon S3. Similar to the audio file, the system generates a presigned URL for the image and saves it in DynamoDB against the story. The presigned URL is only valid for two days, after which it becomes inaccessible for download or viewing.

Lambda function to generate images, store in S3 and update story with presigned URL.

Lambda function to generate images, store in S3 and update story with presigned URL.

In the event of a failure in audio or image generation, the frontend application still loads the story, but does not display the missing image or audio at that moment. This ensures that the frontend can continue working and provide value. If you wanted more control and only trigger the user’s notification event once all parallel tasks are complete the aggregator messaging pattern can be considered.

Hosting the frontend Next.js application with AWS App Runner

Next.js is used by the frontend application to render server-side rendered (SSR) pages that can access the stories from the DynamoDB table, which are then hosted with AWS App Runner after being containerized.

Next.js application hosted with App Runner, with permissions into DynamoDB table.

Next.js application hosted with App Runner, with permissions into DynamoDB table.

AWS App Runner enables you to deploy containerized web applications and APIs securely, without needing any prior knowledge of containers or infrastructure. With App Runner, developers can concentrate on their application, while the service handles container startup, running, scaling, and load balancing. After deployment, App Runner provides a secure URL for clients to begin making HTTP requests against.

With App Runner, you have two primary options for deploying your container: source code connections or source images. Using source code connections grants App Runner permission to pull the image file directly from your source code, and with Automatic deployment configured, it can redeploy the application when changes are made. Alternatively, source images provide App Runner with the image’s location in an image registry, and this image is deployed by App Runner.

In this example application, CDK deploys the application using the DockerImageAsset construct with the App Runner construct. Once deployed, App Runner builds and uploads the frontend image to Amazon Elastic Container Registry (ECR) and deploys it. Downstream consumers can access the application using the secure URL provided by App Runner. In this example, the URL is used when the SNS notification is sent to the user when the story is ready to be viewed.

Giving the frontend container permission to DynamoDB table

To grant the Next.js application permission to obtain stories from the Stories DynamoDB table, App Runner instance roles are configured. These roles are optional and can provide the necessary permissions for the container to access AWS services required by the compute service.

If you want to learn more about AWS App Runner, you can explore the free workshop.

Design choices and assumptions

The DynamoDB Time to Live (TTL) feature is ideal for the short-lived nature of daily generated stories. DynamoDB handle the deletion of stories after two days by setting the TTL attribute on each story. Once a story is deleted, it becomes inaccessible through the generated story URLs.

Using Amazon S3 presigned URLs is a method to grant temporary access to a file in S3. This application creates presigned URLs for the audio file and generated images that last for 2 days, after which the URLs for the S3 items become invalid.

Input transforms are used between DynamoDB streams and EventBridge events to decouple the schemas and events consumed by downstream targets. Consuming the events as they are is known as the “conformist” pattern, and couples us to implementation details of DynamoDB streams with downstream EventBridge consumers. This allows the application to remain decoupled from implementation details and remain flexible.

Conclusion

The adoption of artificial intelligence (AI) technology has significantly increased in various industries. ChatGPT, a large language model that can understand and generate human-like responses in natural language, and DALL-E, an image generation system that can create realistic images based on textual descriptions, are examples of such technology. These systems have demonstrated the potential for AI to provide innovative solutions and transform the way we interact with technology.

This blog post explores ways in which you can utilize AWS serverless services with ChatGTP and DALL-E to create a story generation application fronted by a Next.js application hosted with App Runner. EventBridge Scheduler is used to trigger the story creation process then react to change data capture events with DynamoDB streams and EventBridge Pipes, and use Amazon EventBridge to fan out compute tasks to process notifications, images, and audio files.

You can find the documentation and the source code for this application in GitHub.

For more serverless learning resources, visit Serverless Land.

Build AI and ML into Email & SMS for customer engagement

Post Syndicated from Vinay Ujjini original https://aws.amazon.com/blogs/messaging-and-targeting/build-ai-and-ml-into-email-sms-for-customer-engagement/

Build AI and ML into Email & SMS for customer engagement

Customers engage with businesses through various channels like email, SMS, Push, and in-app. With the availability and ease of usage of mobile phones, businesses can use 2-way Short Service Messages (SMS) to engage with their customers. Text messaging does not need applications and provides immediate interaction with your customers. Amazon Pinpoint enables businesses & organizations to interact in 2-way SMS messages with their customers. Since it is not practical and scalable for organizations to have people responding to millions of their customer’s texts, we can leverage Amazon Lex which helps build the conversational AI into the 2-way SMS. Amazon Lex is a fully managed artificial intelligent (AI) AWS service with advanced natural language models to design, build, test, and deploy conversational interfaces in applications. Machine Learning (ML) is used in digital marketing to help businesses detect patterns in customer bhevaior.

Today, if customers want to know the latest status on their order, they have to send an email, which is hard for businesses to monitor and respond, and time consuming for the customer to call regarding their order status and also expensive for businesses to field the calls.

This blog post shows how you can elevate your customer’s experience using Amazon Pinpoint’s omni-channel capabilities, Amazon Lex’s AI powered chat, and ML-powered personalization using Amazon Personalize.

The solution presented in this blog helps resolve all the above issues. The example I have used to depict this where a customer orders a bike and since the delivery has been delayed, he wants to get timely updates on the progress. He has been given a phone number by the bike company to text them with any questions. This solution elevates the customer’s experience by providing him with timely update by checking the latest from the database and also sending additional product recommendations, predicting what the customer might need.

Architecture

This solution uses Amazon Pinpoint, Amazon Lex, AWS Lambda, Amazon Dynamo DB, Amazon Simple Notification Services, Amazon Personalize.

AWS architecture diagram AI/ML, Email, SMS.

  1. The customer sends a message to the number provided by the store asking about their order status.
  2. Pinpoint 2-way SMS has as SNS topic tied to it.
  3. The SNS topic relays the message to the Lex integration Lambda.
  4. This Lex integration lambda has the integration between Pinpoint & Lex.
  5. When the customer checks on their order status, Lex taps into the fulfillment lambda that is tied to it.
  6. That lambda checks on the order status from the DynamoDB and sends it back to Lex.
  7. Lex sends the order details to Amazon Pinpoint and Amazon Pinpoint delivers the SMS with the order details to the customer’s phone number.
  8. Amazon Lex lets fulfillment Lambda know to send an email to the customer with the order details.
  9. Fulfillment Lambda create an event called ‘Order Status’ for Amazon Pinpoint Journey to consume in its Journey.
  10. Amazon Pinpoint’s message template reaches out to Amazon Personalize to get the 3 recommendations.
  11. Amazon Pinpoint’s Journey triggers an email message to the customer with the order information and recommendations

Prerequisites

To deploy this solution, you must have the following:

  1. An AWS account.
  2. An Amazon Pinpoint project.
  3. An originating identity that supports 2 way SMS in the country you are planning to send SMS to – Supported countries and regions (SMS channel).
  4. A mobile number to send and receive text messages.
  5. An SMS customer segment – Download the example CSV, that contains a sample SMS & email endpoints. Replace the phone number (column C) with yours, and email with your email and import it to Amazon Pinpoint – How to import an Amazon Pinpoint segment.
  6. Add your mobile number in the Amazon Pinpoint SMS sandbox.
  7. Verify your email address that needs to receive messages from this account.
  8. Download the LexIntegration.zip & RE_Order_Validation.zip Lambda files from this Github location.

Preparation:

  1. Download the CloudFormation template.
  2. Go to Amazon S3 console and create a bucket. I created one for this example as ‘pinpointreinventaiml-code’. Under that S3 bucket, create a sub-folder and name it Lambda.
  3. Upload the 2 zip files you downloaded earlier from the Github.
  4. In Amazon Pinpoint > Phone numbers, Check to make sure the phone number you are using is enabled for SMS and its status is active.
  5. Add the machine learning generated product recommendations using Amazon Personalize.
Check if phone number is enabled & active in Pinpoint console

Phone numbers in Pinpoint console

Solution implementation

Create a Lex Chat bot:

  1. Now it’s time to create your bot. To create your bot, sign in to the Lex console at https://console.aws.amazon.com/lex.
  2. For more information about creating bots in Lex, see https://docs.aws.amazon.com/lex/latest/dg/gs-console.html.
  3. Click on Create bot button. Next steps:
    1. Select Create a blank bot radio button.
    2. Give a Bot name ‘Order Status’ under Bot name Configuration. (Use the same Bot name as mentioned here. If you change the Bot name here, your CloudFormation will fail)
    3. Under IAM permissions, select the radio button Create a role with basic Amazon Lex permissions.
    4. For COPPA, choose No. Click Next
    5. Under Language dropdown, choose the language of usage. I chose Language as English in my example.
    6. Click Done, to complete the Bot creation.
  4. You have to create an Intent within the Bot you just created
    1. Click on the Bot you just created. Click on Intents and click the dropdown Add intent and select Add empty intent.
    2. Give an intent name and click Ok.
  5. Once the intent is created, go to the intent and open the Conversation flow section in the intent and create a flow that that has the following info and looks like below image:
    1. Click on Sample utterance and it takes you to Sample Utterance and type in Order status.
    2. Click on initial response and type in Okay, I can help with that. What is your order number?
    3. Click on the slot value and click on Add a slot. Name: OrderNumber and Slot type is AMAZON.AlphaNumeric. In the prompt, enter Please enter your order number.
    4. Click on Save Intent button. The conversation flow should look like the below screenshot:

Amazon Lex intent

6. Go back to the Intent you just created and click on the Build button that is to the right side of the page.

Build intent

7. Once the build is successfully completed, go back to the Bot you created and click on Aliases on the left frame. Click on the Alias that was created earlier, TestBotAlias.

Bot Alias

8. In the Languages section, click on the English language that we created earlier.
9. Open the Lambda function – optional section and point the source to RE_Order_Validation Lambda that we downloaded earlier.
10. For Lambda function version or alias, select $LATEST. Click on Save.

Add Lambda to Alias

11. Go to Intents, choose the intent you just built and click on Build button again. Once build is complete, you can test the intent.

Import and execute CloudFormation:

  1. Navigate to the Amazon CloudFormation console in the AWS region you want to deploy this solution.
  2. Select Create stack and With new resources. Choose Template is ready as Prerequisite – Prepare template and Upload a template file as Specify template. Upload the template downloaded in step 1 under Preparation section of this document. Click Next.
  3. Fill the AWS CloudFormation parameters as shown below:
  4. Stack name: Give a name to this stack.
    1. Under Parameters, for BotAlias: The Bot Alias that you created as part of Amazon Lex above.
    2. BotId: The Bot ID for the bot that you created as part of Amazon Lex above.
    3. CodeS3Bucket: Give the name of the S3 bucket you created in step3 of the Preparation topic above.
    4. OriginationNumber: This is the origination identity phone number you created in step4 of the Preparation topic above.
    5. PinpointProjectId: Use the ProjectID you have from step2 of the Prerequisites phase above.
  5. After entering all the parameter info, it would look something like this below:
  6. CloudFormation parameters
  7. Click Next. Leave the default options on the next page and click Next again.
  8. Check the box I acknowledge that AWS CloudFormation might create IAM resources with custom names. Click Submit.

Set up data in Amazon Dynamo DB

  1. We are using DynamoDB table here as the transactional database that stores order information for the bike store.
  2. Once the solution has been successfully deployed, navigate to the Amazon DynamoDB console and access the OrderStatus DynamoDB table. Each row created in this table represents an order and it’s details. Each row should have a unique Order_Num that holds the order number and it’s related information. You can put additional information about the order like the example below:
  3. {
       "Order_Num":{
          "Value":"ABC123"
       },
       "Delivery_Dt":{
          "Value":"12/01/2022"
       },
       "Order_Dt":{
          "Value":"11/01/2022"
       }
       "Shipping_Dt":{
          "Value":"11/24/2022"
       }
       "UserId":{
          "Value":"example-iser-id-3"
       }
    }
  4. Once you enter the data, it should look like the image below. Click on Create item.
  5. Dynamo DB values

Set up Amazon Simple Notification Service (SNS) topic

  1. We need the Amazon Simple Notification Service here, to provide internal message delivery from publishers (customer’s text message) to subscribers (Amazon Lex in this example). This is used for internal notifications in this use case.
  2. As part of the CloudFormation above, check if you have an SNS topic created by the name LexPinpointIntegrationDemo.
  3. Now, we have successfully created an Amazon SNS topic.

Set up Lambda Functions

  1. Go to AWS Lambda console and open the Lambda function LexIntegration. Under the Function overview, click on the Add trigger. Under Trigger configuration dropdown, select SNS and under SNS Topic select LexPinpointIntegrationDemo topic. Click on Add.
  2. Note: In this example, I used Node.js in a Lambda and Python in another, to show how AWS Lambda functions are flexible to use the scripting language of your choice.

Setting up 2-way SMS in Amazon Pinpoint

  1. Go to Amazon Pinpoint console and click on Phone numbers under SMS & Voice in the left frame. If you don’t see any phone numbers, please refer to #3 under prerequisites section above.
  2. This is how your screen should look like
  3. Phone numbers in Pinpoint
  4. Click on the number.
  5. On the right frame, expand Two-way SMS drop down arrow.
  6. Click on the check box ‘Enable two-way SMS’.
  7. In the ‘Incoming message destination’ select the radio button ‘Choose an existing SNS topic’ and in the drop down below, choose the SNS topic you built above.
  8. The result would look like the screenshot below:
  9. 2-way SMS
  10. Click on Save.

Import Machine Learning model into Pinpoint

  1. Go to Amazon Pinpoint.
  2. Click on Machine Learning Models. Click on Add recommender model.
  3. Give a recommender model name and description under model details.
  4. Under Model configuration, choose the radio button ‘Automatically create a role’ and give an IAM role name in the textbox below.
  5. Under recommender model, choose the recommender model campaign that you created in Amazon Personalize earlier in the project.
    1. If you did not create it, use this Pinpoint workshop to create a recommender model in Amazon Personalize.
    2. The data used in this example is for retail industry, please edit the data as needed for your use case and industry.
  6. Under the settings section:
    1. Select ‘User Id’ as identifier.
    2. Click on the drop down ‘Number of recommendations per message’ and select 3.
  7. For Processing method, choose ‘Use value returned by model’.
  8. Click on Next.
  9. You are presented with attributes section. Give a display name as ‘product_name’ for the attributes and click next.
  10. On the next screen, you can review all the information provided and click on Publish.
  11. The completed model after publishing looks like the screen below:
  12. Personalize model in Pinpoint

Create a Message Template in Amazon Pinpoint

  1. Use chapter 6.4 in this workshop Amazon Pinpoint Workshop to create a message template.
  2. Once the template is created, you need to add recommendations to the message template using this Amazon Pinpoint Workshop details. Change the type of data needed for your use case and industry in this workshop. I used sample retail data.
  3. To create a Amazon Pinpoint Journey, navigate to the Amazon Pinpoint console , select Journeys and click on Create journey.
  4. Give a name, click on Set entry condition in the Journey entry block.
  5. Choose the radio button Add participants when they perform an activity.
  6. Click in the ‘Events’ text box and type in OrderStatus.
  7. Pinpoint Journey entry
  8. Click on Add activity and select Send an email.
  9. Click on choose an email template and select the email message template we created earlier in this blog. Click on choose template button.
  10. Select a Sender email address from the drop down list.
  11. Choose sender email here
  12. Click Save. The final journey should look like this:
  13. This is the final journey
  14. Click on Actions > Settings where you will review the journey settings. There you set the start and end date of the journey if applicable as well as other advanced settings. Configure your journey settings to look like the screenshot below and click Save.
  15. Journey settings
  16. To publish your journey click on Review. On the Review your journey click Next > Mark as reviewed > Publish. A 5 minutes countdown will begin after, which your journey will be live.
  17. Once the journey is live, we need to pass the event OrderStatus and the endpoints will go through that journey and will receive an email.

Testing the solution

  1. Use a phone with a valid number (in this example, I took a US phone number) and send a text ‘Order Status’ to the number generated in Amazon Pinpoint above.
  2. You should get a response “Okay, I can help with that. What is your order number?”
  3. You should type in the order number you generated earlier and stored it in Amazon DynamoDB table.
  4. You should get a response “Your order <order number> was shipped on <shipped_dt> and is expected to be delivered to your address on <delivery_dt>. Your order details have been emailed to you.”
  5. Text message flow
  6. Alternatively, you can test this solution from the Lex bot.
  7. In Amazon Lex, go to the intent you created above and click on the Test button. Next steps:
    1. In the text box, enter Order Status.
    2. Bot should respond with Okay, I can help with that. What is your order number?
    3. You can respond with the order number you entered in the DynamoDB table.
    4. Bot should respond with Your order <Order_Num> was shipped on <Shipping_Dt> and is expected to be delivered to your address on <Delivery_Dt>. Your order details have been emailed to you.
    5. Testing the 2 way messaging in Lex console

Conclusion

Using this blog post, you can elevate your customer’s experience by using Amazon Lex’s AI chat capabilities, Amazon Personalize’s ML recommendation models and trigger a Pinpoint Journey. This blog highlights how organizations can interact in a 2-way SMS with their customers and convert that engagement to a triggered email, with product recommendations, if needed.

Next Steps

You can use the above solution and modify it easily to use it across different verticals and applicable use cases. You can also extend this solution to Amazon Connect to an agent via SMS chat, using this blog.

Clean-up

  1. To delete the solution, go to CloudFormation you created as part od this project. Click on the stack and click Delete.
  2. Navigate to Amazon Pinpoint and stop the Journey you ran in this solution. Delete the Journey, Machine learning models, Message templates you created for this solution. Delete the Project you created for this solution.
  3. In Amazon Lex, delete the intent and bot you created for this solution.
  4. Delete the folder and bucket you created in S3 as part of this project.
  5. Amazon Personalize resources like Dataset groups, datasets, etc. are not created via AWS Cloudformation, thus you have to delete them manually. Please follow the instructions in the AWS documentation on how to clean up the created resources.

Additional resources

Retry delivering failed SMS using Pinpoint

How to target customers using ML, based on their interest in a product

 About the Authors

Vinay Ujjini

Vinay Ujjini is an Amazon Pinpoint and Amazon Simple Email Service Principal Specialist Solutions Architect at AWS. He has been solving customer’s omni-channel challenges for over 15 years. In his spare time, he enjoys playing tennis & cricket.

Automating adverse events reporting for pharma with Amazon Connect and Amazon Lex

Post Syndicated from Siva Thangavel original https://aws.amazon.com/blogs/architecture/automating-adverse-events-reporting-for-pharma-with-amazon-connect-and-amazon-lex/

Every pharmaceutical company manufacturing medicine must provide customers nationwide with a method to report adverse events following medicine usage as well as emergency assistance as needed. To comply with regulatory policy and enable an Adverse Events Reporting System (AERS), pharma companies must provide dedicated, toll-free phone numbers and contact center agents to handle inbound calls.

But they must also be prepared for sudden spikes in call volume, which can increase contact center agents’ workloads and lead to long wait times for customers. With these limitations comes the possibility that customers may not be able to report adverse events.

Further still, as medicine status keeps changing, all agents must be retrained to handle calls and extend support. Pharma companies incur significant costs for training and onboarding additional agents, as well as the physical infrastructure to support their work.

To overcome these challenges, we designed a self-service Interactive Voice Response (IVR) solution with Amazon Connect. The IVR solution handles customer calls without agent involvement. It captures customer information and records data into an enterprise AERS. It also provides an option to receive a link to an Adverse Events (AE) portal using Short Message Service (SMS), or to be routed to a live agent queue.

In this blog post, we introduce a reference architecture for this use case. This framework can help other pharma companies solve similar problems.

Solution overview

Let’s explore how the IVR solution architecture routes customer calls step by step, as shown in Figure 1.

Adverse events reporting architecture diagram

Figure 1. Adverse events reporting architecture diagram

  1.  Callers who dial in to report a medicine-related AE are routed to the Amazon Lex chatbot through IVR in Amazon Connect.
  2. Callers can proceed to IVR self-service functions, such as understanding the intent of a customer call and the AE.
  3. AEs are analyzed with Amazon SageMaker for a decision on whether to complete the call on IVR or forward it to an agent.
  4. If the caller remains on the self-service option, the bot captures information from 15 to 20 essential questions.
  5. The bot follows a hybrid workflow that allows for guided responses where appropriate and free-text conversations using AWS Lambda. It confirms captured AE information with the caller before closing the call and submitting information to the AERS system.
  6. The bot provides the option to route the call to a human agent contextually.
  7. The bot provides the option to share an AE reporting link over SMS using Amazon Simple Notification System (SNS), so the caller can access it through a mobile device to continue AE reporting outside of the call.
  8. The bot records customer interactions in AERS using Amazon DynamoDB, leveraging the current validated process used by the AE portal team
  9. The bot makes call recordings available for auditing, monitoring, and training purposes. These recordings are not be provided to live agents.
  10. Standard analytics are available to help the business continuously train the bot and measure its performance.

Leveraging IVR as an extended solution

Recorded customer calls can be used for further analytics with Amazon Transcribe. Actionable insights can be derived from the text using a machine learning (ML) model such as AE detection at scale. A (Named Entity Recognition model (NER) model can also identify medicines and caller types.

Further, all recorded calls may be stored in a secure AWS ecosystem and archived for longer durations for compliance purposes. Storage costs can be optimized by setting up a policy to move old calls to Amazon Simple Storage Service Glacier (Amazon S3 Glacier) storage classes and calls over two years old to the Amazon S3 Deep Glacier storage class. This results in significant cost saving and helps companies archive at scale.

Finally, the Amazon Lex bot can be enhanced and continuously trained with additional intents and utterances to handle complex AE reporting for various drugs. This provides significant cost saving and operational efficiencies as bots can be trained faster than human agents, as well as at scale.

Conclusion: Using IVR to better manage AE reporting

This IVR solution was deployed for a pharma company and helped handle unusually high call volumes for AE reporting with its current agent population. It resulted in cost savings in contact center operations and significantly improved the customer experience by reducing wait times.

The IVR solution can also be used with any existing contact center platform to first forward the calls to Amazon Connect for initial triage, and then handover to existing platforms for agent involvement. This adds intelligence to existing contact centers.

This blog post demonstrates how pharma companies can leverage the self-service option for AERS to handle any AE reporting call. With solution enhancements using Amazon SageMaker models, it can quickly be transformed to handle calls for any medicine. They can also:

  • Incorporate related information into the model, such as the age, gender, or existing AEs to further improve the ML prediction performance
  • Leverage audio data augmentation plus handcrafted features to help yield better predictions
  • Use the audio-based diagnostic prediction in an Amazon Connect contact flow to triage the targeted group of incoming calls and escalate to a doctor for follow up if necessary
  • Allow call center agents to use the intelligence provided by the acoustic classification in conjunction with Contact Lens for Amazon Connect, which provides a turn-by-turn transcript; real-time alerts; automated call categorization based on keywords and phrases; sentiment analysis, and sensitive data redaction—truly making it a real-time intelligent solution.

The IVR solution can also be used for other industry use cases where a series of data is collected from customers. This solution improves the customer experience and can be implemented without increasing call center agent counts.

Migrating to token-based authentication for iOS applications with Amazon SNS

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/migrating-to-token-based-authentication-for-ios-applications-with-amazon-sns/

This post is written by Yashlin Naidoo, Cloud Support Engineer.

Amazon Simple Notification Service (Amazon SNS) enables you to send notifications directly to a mobile push endpoint. For iOS apps, Amazon SNS dispatches the notification on your application’s behalf to the Apple Push Notification service (APNs).

To send mobile push notifications via Amazon SNS, you must provide a set of credentials to connect to the APNs (see Prerequisites for Amazon SNS user notifications).

Amazon SNS supports two methods for authenticating with iOS mobile push endpoints when sending a mobile push notification via the APNs:

  • Certificate-based authentication
  • Token-based authentication

To use certificate-based authentication, you must configure Amazon SNS with a provider certificate. Amazon SNS will use this certificate on your behalf to establish a secure connection with the APNs to dispatch your mobile push notifications. For each application that you support, you will need to provide unique certificates.

As the number of applications you manage grows, you will also need to create and manage an increasing number of certificates. Furthermore, certificates expire yearly, and you must renew them to ensure that Amazon SNS can continue to send mobile push notifications on your behalf. To learn more about how to use certificate-based authentication, see Certificate-based authentication for iOS applications with Amazon SNS on the AWS Compute Blog.

For new and existing iOS applications, we recommend that you use token-based authentication. To learn more about how to use token-based authentication, see Token-Based authentication for iOS applications with Amazon SNS on the AWS Compute Blog.

There are several benefits in using token-based authentication:

  • You can use a single token that is shared among all of your applications.
  • You can remove the need for yearly certificate renewal for certificate-based authentication.
  • You can improve the security of your application by using token-based requests. For these requests, your credentials are never transferred from Amazon SNS to your mobile push notification provider, making the communication less likely to be compromised.

Token-based authentication is the latest authentication method provided by the APNs that improves security for your applications, requires less management effort, and is more efficient. We recommend migrating as soon as possible to ensure the security and ease of operations of your applications.

This blog post provides step-by-step instructions for migrating your iOS application from certificate-based authentication to token-based authentication with Amazon SNS. You will learn how to create a new token using your Apple developer account. Next, you will migrate your platform application to token-based authentication. Finally, you will test your application by sending a test push notification via Amazon SNS to a device to confirm the successful migration.

Prerequisites

  • XCode IDE
  • iOS application with a valid p.12 certificate

Before proceeding with this migration, we recommend to stop sending push notifications to your applications until the migration is complete to avoid any disruptions in your message delivery workloads.

Walkthrough

You can also create a test platform application with token-based authentication to ensure that the Amazon SNS platform application is created successfully. Finally, you can create a device token and send a test push notification to it. Once confirmed that the application works correctly, you can migrate your main platform application to token-based authentication.

Creating a .p8 token to upload to Amazon SNS

  1. Log in to your Apple Developer account.
  2. Choose Certificates, Identifiers & Profiles.
  3. In the Keys section, choose the Add button (+).
  4. Under Register a New Key, for Key Name, type the token key name and tick the box for Apple Push Notifications service (APNs) for the key services.
  5. Select Continue.
  6. In the Register a New Key section, check that all values were entered correctly.
  7. Select Register to register the new token key.
  8. Download your token key. Store it in a safe location, as you can’t download the token key again.

Migrating your platform application from certificate-based authentication to token-based authentication

  1. Navigate to the Amazon SNS console. Expand the Mobile menu and choose Push Notification.
  2. Choose your platform application.
  3. Choose Edit. Under Apple credentials section choose Token:
    1. Under Token, select Choose file to upload the .p8 token key file.
    2. Provide values for signing key ID, team ID and bundle ID. These values can be found in your Apple Developer account. Ensure that your bundle ID is identical to the ID used for this application with certificate-based authentication.
  4. Event notifications – optional: refer to the following guide for enabling event notifications: Mobile app events
  5. Delivery status logging – optional: refer to the following guide for enabling delivery status logging: How do I access Amazon SNS topic delivery logs for push notifications? Find more information on these steps can in the Mobile push notifications best practices.
    Apple credential settings
  6. Choose Save changes. This changes your platform application to token-based authentication.

Testing push notification delivery to your device

In this section, you will test sending a push notification to your device using the Amazon SNS console and the AWS Command Line Interface (AWS CLI).

Amazon SNS console

  1. From the Amazon SNS console, navigate to your platform endpoint and choose Publish message.
  2. For message body, select Custom payload for each delivery protocol to send to the endpoint. This example uses a custom payload that allows you to provide additional APNs headers:
    Custom payload for each delivery model configuration
  3. Choose Publish message.
  4. The push notification is delivered to your device:
    iOS sample notification message

AWS CLI

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent AWS CLI version.
Run the following command. For target-arn, specify your platform application endpoint ARN:

aws sns publish \
    --target-arn arn:aws:sns:us-west-2:191418023309:endpoint/APNS_SANDBOX/computeblogdemo/ba7a35f8-c73d-364f-9edd-5c438add0533 \
    --message '{"APNS_SANDBOX": "{\"aps\":{\"alert\":\"Sample message for iOS development endpoints\"}}"}' \
    --message-attributes '{"AWS.SNS.MOBILE.APNS.PUSH_TYPE":{"DataType":"String","StringValue":"alert"}}' \
    --message-structure json
  1. An output containing a MessageId is shown in case of successful delivery:
    {
        "MessageId": "83ecb3a1-c728-5b7c-96e5-e8417d5cd4f4"
    }
  2. The push notification is delivered to your device:
    iOS sample notification message

Troubleshooting

You might encounter various errors when migrating to token-based authentication. This section explains how to troubleshoot these errors.

If a message is not delivered after publishing it to your platform application endpoint, refer to the Amazon CloudWatch failed logs of your platform application. These logs are named sns/your-aws-region/your-accountID/app/platform_name/application_name/Failure.

Once you have navigated to your platform application’s CloudWatch failed log group, click on one of the log streams based on the time that you published the message. Focus on the following attributes:

  • statusCode: error messages are grouped according to the status code.
  • status : shows whether a message was delivered successfully to the provider or if it failed to deliver.
  • providerResponse: provides the response message from the provider and is only shown in case a message failed to deliver.

We will look through messages that failed to deliver because of the following errors:

InvalidProviderToken

"providerResponse": "{\"reason\":\"InvalidProviderToken\"}",
"statusCode": 403,
"status": "FAILURE"

The cause of this error can be an incorrect token key ID, team ID or if the token is invalid.

To resolve this issue, go to your Apple Developer account and ensure that you are providing the correct token ID, team ID and that your token key exists.

TopicDisallowed

"providerResponse": "{\"reason\":\"TopicDisallowed\"}",
"statusCode": 400,
"status": "FAILURE"

The cause of this error can be an incorrect bundle ID or a device token that was created with the wrong bundle ID.

To resolve this issue, go to your Apple Developer account and navigate to your existing certificate used when migrating to token-based authentication. Confirm the bundle ID assigned to this certificate and ensure you are using the same ID for your platform application and also for your device tokens.

Conclusion

Developers can send mobile push notifications for the APNs using token-based authentication by using a .p8 key to authenticate an Apple device endpoint. This is the recommended authentication method due to improved security and lower management effort by removing the need for annual certificate renewal and by being able to share tokens among multiple applications.

To learn more about APNs token-based authentication with Amazon SNS, visit the Amazon SNS Developer Guide.

For more serverless learning resources, visit Serverless Land.

Week in Review – February 13, 2023

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/week-in-review-february-13-2023/

AWS announced 32 capabilities since we published the last Week in Review blog post a week ago. I also read a couple of other news and blog posts.

Here is my summary.

The VPC section of the AWS Management Console now allows you to visualize your VPC resources, such as the relationships between a VPC and its subnets, routing tables, and gateways. This visualization was available at VPC creation time only, and now you can go back to it using the Resource Map tab in the console. You can read the details in Channy’s blog post.

CloudTrail Lake now gives you the ability to ingest activity events from non-AWS sources. This lets you immutably store and then process activity events without regard to their origin–AWS, on-premises servers, and so forth. All of this power is available to you with a single API call: PutAuditEvents. We launched AWS CloudTrail Lake about a year ago. It is a managed organization-scale data lake that aggregates, immutably stores, and allows querying of events recorded by CloudTrail. You can use it for auditing, security investigation, and troubleshooting. Again, my colleague Channy wrote a post with the details.

There are three new Amazon CloudWatch metrics for asynchronous AWS Lambda function invocations: AsyncEventsReceived, AsyncEventAge, and AsyncEventsDropped. These metrics provide visibility for asynchronous Lambda function invocations. They help you to identify the root cause of processing issues such as throttling, concurrency limit, function errors, processing latency because of retries, or missing events. You can learn more and have access to a sample application in this blog post.

Amazon Simple Notification Service (Amazon SNS) now supports AWS X-Ray to visualize, analyze, and debug applications. Developers can now trace messages going through Amazon SNS, making it easier to understand or debug microservices or serverless applications.

Amazon EC2 Mac instances now support replacing root volumes for quick instance restoration. Stopping and starting EC2 Mac instances trigger a scrubbing workflow that can take up to one hour to complete. Now you can swap the root volume of the instance with an EBS snapshot or an AMI. It helps to reset your instance to a previous known state in 10–15 minutes only. This significantly speeds up your CI and CD pipelines.

Amazon Polly launches two new Japanese NTTS voices. Neural Text To Speech (NTTS) produces the most natural and human-like text-to-speech voices possible. You can try these voices in the Polly section of the AWS Management Console. With this addition, according to my count, you can now choose among 52 NTTS voices in 28 languages or language variants (French from France or from Quebec, for example).

The AWS SDK for Java now includes the AWS CRT HTTP Client. The HTTP client is the center-piece powering our SDKs. Every single AWS API call triggers a network call to our API endpoints. It is therefore important to use a low-footprint and low-latency HTTP client library in our SDKs. AWS created a common HTTP client for all SDKs using the C programming language. We also offer 11 wrappers for 11 programming languages, from C++ to Swift. When you develop in Java, you now have the option to use this common HTTP client. It provides up to 76 percent cold start time reduction on AWS Lambda functions and up to 14 percent less memory usage compared to the Netty-based HTTP client provided by default. My colleague Zoe has more details in her blog post.

X in Y Jeff started this section a while ago to list the expansion of new services and capabilities to additional Regions. I noticed 10 Regional expansions this week:

Other AWS News
This week, I also noticed these AWS news items:

My colleague Mai-Lan shared some impressive customer stories and metrics related to the use and scale of Amazon S3 Glacier. Check it out to learn how to put your cold data to work.

Space is the final (edge) frontier. I read this blog post published on avionweek.com. It explains how AWS helps to deploy AIML models on observation satellites to analyze image quality before sending them to earth, saving up to 40 percent satellite bandwidth. Interestingly, the main cause for unusable satellite images is…clouds.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS re:Invent recaps in your area. During the re:Invent week, we had lots of new announcements, and in the next weeks, you can find in your area a recap of all these launches. All the events are posted on this site, so check it regularly to find an event nearby.

AWS re:Invent keynotes, leadership sessions, and breakout sessions are available on demand. I recommend that you check the playlists and find the talks about your favorite topics in one collection.

AWS Summits season will restart in Q2 2023. The dates and locations will be announced here. Paris and Sidney are kicking off the season on April 4th. You can register today to attend these in-person, free events (Paris, Sidney).

Stay Informed
That was my selection for this week! To better keep up with all of this news, do not forget to check out the following resources:

— seb
This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Decreasing incident response time for OutSystems with AWS serverless technology

Post Syndicated from Ivo Pinto original https://aws.amazon.com/blogs/architecture/decreasing-incident-response-time-for-outsystems-with-aws-serverless-technology/

Leading modern application platform space OutSystems is a low-code platform that provides tools for companies to develop, deploy, and manage omnichannel enterprise applications.

Security is a top priority at OutSystems. Their Security Operations Center (SOC) deals with thousands of incidents a year, each with a set of response actions that need to be executed as quickly as possible. Providing security at such large scale is a challenge, even for the most well-prepared organizations. Manual and repetitive tasks account for the majority of the response time involved in this process, and decreasing this key metric requires orchestration and automation.

Security orchestration, automation, and response (SOAR) systems are designed to translate security analysts’ manual procedures into automated actions, making them faster and more scalable.

In this blog post, we’ll explore how OutSystems lowered their incident response time by 99 percent by designing and deploying a custom SOAR using Serverless services on AWS.

Solution architecture

Security incidents happen with unknown frequency, making serverless services a natural fit to boost security at OutSystems because of their increased agility and capability to scale to zero.

There are two ways to trigger SOAR actions in this architecture:

  1. Automatically through Security Information and Event Management (SIEM) security incident findings
  2. On-demand through chat application

Using the first method, when a security incident is detected by the SIEM, an event is published to Amazon Simple Notification Service (Amazon SNS). This triggers an AWS Lambda function that creates a ticket in an internal ticketing system. Then the Lambda Playbooks function triggers to decide which playbook to run depending on the incident details.

Each playbook is a set of actions that are executed in response to a trigger. Playbooks are the key component behind automated tasks. OutSystems uses AWS Step Functions to orchestrate the actions and Lambda functions to execute them.

But this solution does not exist in isolation. Depending on the playbook, Step Functions interacts with other components such as AWS Secrets Manager or external APIs.

Using the second method, the on-demand trigger for OutSystems SOAR relies on a chat application. This application calls a Lambda function URL that interacts with the playbooks we just discussed.

Figure 1 represents the high-level architecture of OutSystems’ custom SOAR.

SOAR architecture for AWS

Figure 1. SOAR architecture for AWS

This architecture was deployed with Infrastructure as Code (IaC) using AWS CloudFormation and AWS CodePipeline.

This same IaC architecture is used when new playbooks or updates to existing ones are made. Code changes that are committed to a source control repository trigger the CodePipeline which uses AWS CodeBuild and CloudFormation change sets to deploy the updates to the affected resources.

Use cases

The use cases that OutSystems has deployed playbooks for to date include:

  • SQL injection
  • Unauthorized access to credentials
  • Issuance of new certificates
  • Login brute forces
  • Impossible travel

Let’s explore the Impossible travel use case. Impossible travel happens when a user logs in from one location, and then later logs in from a different location that would be impossible to travel between within the elapsed time.

When the SIEM identifies this behavior, it triggers an alert and the following actions are performed:

  1. A ticket is created
  2. An IP address check is performed in reputation databases, such as AbuseIPDB or VirusTotal
  3. An IP address check is performed in the internal database, and the IP address is added if it is not found
  4. A search is performed for past events with the same IP address
  5. A WHOIS is performed on the IP address
  6. Recent logins of the user are identified in the SIEM, along with all related information
  7. All of this information is automatically added to the ticket. Every step listed here was previously performed manually; a task that took an average of 15 minutes. Now, the process takes just 8 seconds—a 99.1% incident response time improvement.

The following remediation actions can also be automated, along with many others:

Some of these remediation actions are already in place, while others are in development.

Conclusion

At OutSystems, much like at AWS, security is considered “job zero.” It is not only important to be proactive in preventing security incidents, but when they happen, the response must be quick, effective, and as immune to human error as possible.

With the implementation of this custom SOAR, OutSystems reduced the average response time to security incidents by 99%. Tasks that previously took 76 hours of analysts’ time are now accomplished automatically within 31 minutes.

During the evaluation period, SOAR addressed hundreds of real-world incidents with some threat intel use cases being executed thousands of times.

An architecture composed of serverless services ensures OutSystems does not pay for systems that are standing by waiting for work, and at the same time, not compromising on performance.

If you are interested in this topic—how to respond to security incidents using AWS serverless services—be sure you also read the Orchestrating a security incident response with AWS Step Functions and How to get started with security response automation on AWS blog posts.

Building a healthcare data pipeline on AWS with IBM Cloud Pak for Data

Post Syndicated from Eduardo Monich Fronza original https://aws.amazon.com/blogs/architecture/building-a-healthcare-data-pipeline-on-aws-with-ibm-cloud-pak-for-data/

Healthcare data is being generated at an increased rate with the proliferation of connected medical devices and clinical systems. Some examples of these data are time-sensitive patient information, including results of laboratory tests, pathology reports, X-rays, digital imaging, and medical devices to monitor a patient’s vital signs, such as blood pressure, heart rate, and temperature.

These different types of data can be difficult to work with, but when combined they can be used to build data pipelines and machine learning (ML) models to address various challenges in the healthcare industry, like the prediction of patient outcome, readmission rate, or disease progression.

In this post, we demonstrate how to bring data from different sources, like Snowflake and connected health devices, to form a healthcare data lake on Amazon Web Services (AWS). We also explore how to use this data with IBM Watson to build, train, and deploy ML models. You can learn how to integrate model endpoints with clinical health applications to generate predictions for patient health conditions.

Solution overview

The main parts of the architecture we discuss are (Figure 1):

  1. Using patient data to improve health outcomes
  2. Healthcare data lake formation to store patient health information
  3. Analyzing clinical data to improve medical research
  4. Gaining operational insights from healthcare provider data
  5. Providing data governance to maintain the data privacy
  6. Building, training, and deploying an ML model
  7. Integration with the healthcare system
Data pipeline for the healthcare industry using IBM CP4D on AWS

Figure 1. Data pipeline for the healthcare industry using IBM CP4D on AWS

IBM Cloud Pak for Data (CP4D) is deployed on Red Hat OpenShift Service on AWS (ROSA). It provides the components IBM DataStage, IBM Watson Knowledge Catalogue, IBM Watson Studio, IBM Watson Machine Learning, plus a wide variety of connections with data sources available in a public cloud or on-premises.

Connected health devices, on the edge, use sensors and wireless connectivity to gather patient health data, such as biometrics, and send it to the AWS Cloud through Amazon Kinesis Data Firehose. AWS Lambda transforms the data that is persisted to Amazon Simple Storage Service (Amazon S3), making that information available to healthcare providers.

Amazon Simple Notification Service (Amazon SNS) is used to send notifications whenever there is an issue with the real-time data ingestion from the connected health devices. In case of failures, messages are sent via Amazon SNS topics for rectifying and reprocessing of failure messages.

DataStage performs ETL operations and move patient historical information from Snowflake into Amazon S3. This data, combined with the data from the connected health devices, form a healthcare data lake, which is used in IBM CP4D to build and train ML models.

The pipeline described in architecture uses Watson Knowledge Catalogue, which provides data governance framework and artifacts to enrich our data assets. It protects sensitive patient information from unauthorized access, like individually identifiable information, medical history, test results, or insurance information.

Data protection rules define how to control access to data, mask sensitive values, or filter rows from data assets. The rules are automatically evaluated and enforced each time a user attempts to access a data asset in any governed catalog of the platform.

After this, the datasets are published to Watson Studio projects, where they are used to train ML models. You can develop models using Jupyter Notebook, IBM AutoAI (low-code), or IBM SPSS modeler (no-code).

For the purpose of this use case, we used logistic regression algorithm for classifying and predicting the probability of an event, such as disease risk management to assist doctors in making critical medical decisions. You can also build ML models using algorithms like Classification, Random Forest, and K-Nearest Neighbor. These are widely used to predict disease risk.

Once the models are trained, they are exposed as endpoints with Watson Machine Learning and integrated with the healthcare application to generate predictions by analyzing patient symptoms.

The healthcare applications are a type of clinical software that offer crucial physiological insights and predict the effects of illnesses and possible treatments. It provides built-in dashboards that display patient information together with the patient’s overall metrics for outcomes and treatments. This can help healthcare practitioners gain insights into patient conditions. It also can help medical institutions prioritize patients with more risk factors and curate clinical and behavioral health plans.

Finally, we are using IBM Security QRadar XDR SIEM to collect, process, and aggregate Amazon Virtual Private Cloud (Amazon VPC) flow logs, AWS CloudTrail logs, and IBM CP4D logs. QRadar XDR uses this information to manage security by providing real-time monitoring, alerts, and responses to threats.

Healthcare data lake

A healthcare data lake can help health organizations turn data into insights. It is centralized, curated, and securely stores data on Amazon S3. It also enables you to break down data silos and combine different types of analytics to gain insights. We are using the DataStage, Kinesis Data Firehose, and Amazon S3 services to build the healthcare data lake.

Data governance

Watson Knowledge Catalogue provides an ML catalogue for data discovery, cataloging, quality, and governance. We define policies in Watson Knowledge Catalogue to enable data privacy and overall access to and utilization of this data. This includes sensitive data and personal information that needs to be handled through data protection, quality, and automation rules. To learn more about IBM data governance, please refer to Running a data quality analysis (Watson Knowledge Catalogue).

Build, train, and deploy the ML model

Watson Studio empowers data scientists, developers, and analysts to build, run, and manage AI models on IBM CP4D.

In this solution, we are building models using Watson Studio by:

  1. Promoting the governed data from Watson Knowledge Catalogue to Watson Studio for insights
  2. Using ETL features, such as built-in search, automatic metadata propagation, and simultaneous highlighting, to process and transform large amounts of data
  3. Training the model, including model technique selection and application, hyperparameter setting and adjustment, validation, ensemble model development and testing; algorithm selection; and model optimization
  4. Evaluating the model based on metric evaluation, confusion matrix calculations, KPIs, model performance metrics, model quality measurements for accuracy and precision
  5. Deploying the model on Watson Machine Learning using online deployments, which create an endpoint to generate a score or prediction in real time
  6. Integrating the endpoint with applications like health applications, as demonstrated in Figure 1

Conclusion

In this blog, we demonstrated how to use patient data to improve health outcomes by creating a healthcare data lake and analyzing clinical data. This can help patients and healthcare practitioners make better, faster decisions and prioritize cases. We also discussed how to build an ML model using IBM Watson and integrate it with healthcare applications for health analysis.

Additional resources

Introducing payload-based message filtering for Amazon SNS

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-payload-based-message-filtering-for-amazon-sns/

This post is written by Prachi Sharma (Software Development Manager, Amazon SNS), Mithun Mallick (Principal Solutions Architect, AWS Integration Services), and Otavio Ferreira (Sr. Software Development Manager, Amazon SNS).

Amazon Simple Notification Service (SNS) is a messaging service for Application-to-Application (A2A) and Application-to-Person (A2P) communication. The A2A functionality provides high-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications. These applications include Amazon Simple Queue Service (SQS), Amazon Kinesis Data Firehose, AWS Lambda, and HTTP/S endpoints. The A2P functionality enables you to communicate with your customers via mobile text messages (SMS), mobile push notifications, and email notifications.

Today, we’re introducing the payload-based message filtering option of SNS, which augments the existing attribute-based option, enabling you to offload additional filtering logic to SNS and further reduce your application integration costs. For more information, see Amazon SNS Message Filtering.

Overview

You use SNS topics to fan out messages from publisher systems to subscriber systems, addressing your application integration needs in a loosely-coupled way. Without message filtering, subscribers receive every message published to the topic, and require custom logic to determine whether an incoming message needs to be processed or filtered out. This results in undifferentiating code, as well as unnecessary infrastructure costs. With message filtering, subscribers set a filter policy to their SNS subscription, describing the characteristics of the messages in which they are interested. Thus, when a message is published to the topic, SNS can verify the incoming message against the subscription filter policy, and only deliver the message to the subscriber upon a match. For more information, see Amazon SNS Subscription Filter Policies.

However, up until now, the message characteristics that subscribers could express in subscription filter policies were limited to metadata in message attributes. As a result, subscribers could not benefit from message filtering when the messages were published without attributes. Examples of such messages include AWS events published to SNS from 60+ other AWS services, like Amazon Simple Storage Service (S3), Amazon CloudWatch, and Amazon CloudFront. For more information, see Amazon SNS Event Sources.

The new payload-based message filtering option in SNS empowers subscribers to express their SNS subscription filter policies in terms of the contents of the message. This new capability further enables you to use SNS message filtering for your event-driven architectures (EDA) and cross-account workloads, specifically where subscribers may not be able to influence a given publisher to have its events sent with attributes. With payload-based message filtering, you have a simple, no-code option to further prevent unwanted data from being delivered to and processed by subscriber systems, thereby simplifying the subscribers’ code as well as reducing costs associated with downstream compute infrastructure. This new message filtering option is available across SNS Standard and SNS FIFO topics, for JSON message payloads.

Applying payload-based filtering in a use case

Consider an insurance company moving their lead generation platform to a serverless architecture based on microservices, adopting enterprise integration patterns to help them develop and scale these microservices independently. The company offers a variety of insurance types to its customers, including auto and home insurance. The lead generation and processing workflow for each insurance type is different, and entails notifying different backend microservices, each designed to handle a specific type of insurance request.

Payload filtering example

Payload filtering example

The company uses multiple frontend apps to interact with customers and receive leads from them, including a web app, a mobile app, and a call center app. These apps submit the customer-generated leads to an internal lead storage microservice, which then uploads the leads as XML documents to an S3 bucket. Next, the S3 bucket publishes events to an SNS topic to notify that lead documents have been created. Based on the contents of each lead document, the SNS topic forks the workflow by delivering the auto insurance leads to an SQS queue and the home insurance leads to another SQS queue. These SQS queues are respectively polled by the auto insurance and the home insurance lead processing microservices. Each processing microservice applies its business logic to validate the incoming leads.

The following S3 event, in JSON format, refers to a lead document uploaded with key auto-insurance-2314.xml to the S3 bucket. S3 automatically publishes this event to SNS, which in turn matches the S3 event payload against the filter policy of each subscription in the SNS topic. If the event matches the subscription filter policy, SNS delivers the event to the subscribed SQS queue. Otherwise, SNS filters the event out.

{
  "Records": [{
    "eventVersion": "2.1",
    "eventSource": "aws:s3",
    "awsRegion": "sa-east-1",
    "eventTime": "2022-11-21T03:41:29.743Z",
    "eventName": "ObjectCreated:Put",
    "userIdentity": {
      "principalId": "AWS:AROAJ7PQSU42LKEHOQNIC:demo-user"
    },
    "requestParameters": {
      "sourceIPAddress": "177.72.241.11"
    },
    "responseElements": {
      "x-amz-request-id": "SQCC55WT60XABW8CF",
      "x-amz-id-2": "FRaO+XDBrXtx0VGU1eb5QaIXH26tlpynsgaoJrtGYAWYRhfVMtq/...dKZ4"
    },
    "s3": {
      "s3SchemaVersion": "1.0",
      "configurationId": "insurance-lead-created",
      "bucket": {
        "name": "insurance-bucket-demo",
        "ownerIdentity": {
          "principalId": "A1ATLOAF34GO2I"
        },
        "arn": "arn:aws:s3:::insurance-bucket-demo"
      },
      "object": {
        "key": "auto-insurance-2314.xml",
        "size": 17,
        "eTag": "1530accf30cab891d759fa3bb8322211",
        "sequencer": "00737AF379B2683D6C"
      }
    }
  }]
}

To express its interest in auto insurance leads only, the SNS subscription for the auto insurance lead processing microservice sets the following filter policy. Note that, unlike attribute-based policies, payload-based policies support property nesting.

{
  "Records": {
    "s3": {
      "object": {
        "key": [{
          "prefix": "auto-"
        }]
      }
    },
    "eventName": [{
      "prefix": "ObjectCreated:"
    }]
  }
}

Likewise, to express its interest in home insurance leads only, the SNS subscription for the home insurance lead processing microservice sets the following filter policy.

{
  "Records": {
    "s3": {
      "object": {
        "key": [{
          "prefix": "home-"
        }]
      }
    },
    "eventName": [{
      "prefix": "ObjectCreated:"
    }]
  }
}

Note that each filter policy uses the string prefix matching capability of SNS message filtering. In this use case, this matching capability enables the filter policy to match only the S3 objects whose key property value starts with the insurance type it’s interested in (either auto- or home-). Note as well that each filter policy matches only the S3 events whose eventName property value starts with ObjectCreated, as opposed to ObjectRemoved. For more information, see Amazon S3 Event Notifications.

Deploying the resources and filter policies

To deploy the AWS resources for this use case, you need an AWS account with permissions to use SNS, SQS, and S3. On your development machine, install the AWS Serverless Application Model (SAM) Command Line Interface (CLI). You can find the complete SAM template for this use case in the aws-sns-samples repository in GitHub.

The SAM template has a set of resource definitions, as presented below. The first resource definition creates the SNS topic that receives events from S3.

InsuranceEventsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: insurance-events-topic

The next resource definition creates the S3 bucket where the insurance lead documents are stored. This S3 bucket publishes an event to the SNS topic whenever a new lead document is created.

InsuranceEventsBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    DependsOn: InsuranceEventsTopicPolicy
    Properties:
      BucketName: insurance-doc-events
      NotificationConfiguration:
        TopicConfigurations:
          - Topic: !Ref InsuranceEventsTopic
            Event: 's3:ObjectCreated:*'

The next resource definitions create the SQS queues to be subscribed to the SNS topic. As presented in the architecture diagram, there’s one queue for auto insurance leads, and another queue for home insurance leads.

AutoInsuranceEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: auto-insurance-events-queue
      
HomeInsuranceEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: home-insurance-events-queue

The next resource definitions create the SNS subscriptions and their respective filter policies. Note that, in addition to setting the FilterPolicy property, you need to set the FilterPolicyScope property to MessageBody in order to enable the new payload-based message filtering option for each subscription. The default value for the FilterPolicyScope property is MessageAttributes.

AutoInsuranceEventsSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: sqs
      Endpoint: !GetAtt AutoInsuranceEventsQueue.Arn
      TopicArn: !Ref InsuranceEventsTopic
      FilterPolicyScope: MessageBody
      FilterPolicy:
        '{"Records":{"s3":{"object":{"key":[{"prefix":"auto-"}]}}
        ,"eventName":[{"prefix":"ObjectCreated:"}]}}'
  
HomeInsuranceEventsSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: sqs
      Endpoint: !GetAtt HomeInsuranceEventsQueue.Arn
      TopicArn: !Ref InsuranceEventsTopic
      FilterPolicyScope: MessageBody
      FilterPolicy:
        '{"Records":{"s3":{"object":{"key":[{"prefix":"home-"}]}}
        ,"eventName":[{"prefix":"ObjectCreated:"}]}}'

Once you download the full SAM template from GitHub to your local development machine, run the following command in your terminal to build the deployment artifacts.

sam build –t SNS-Payload-Based-Filtering-SAM.template

Once SAM has finished building the deployment artifacts, run the following command to deploy the AWS resources and the SNS filter policies. The command guides you through the process of setting deployment preferences, which you can answer based on your requirements. For more information, refer to the SAM Developer Guide.

sam deploy --guided

Once SAM has finished deploying the resources, you can start testing the solution in the AWS Management Console.

Testing the filter policies

Go the AWS CloudFormation console, choose the stack created by the SAM template, then choose the Outputs tab. Note the name of the S3 bucket created.

S3 bucket name

S3 bucket name

Now switch to the S3 console, and choose the bucket with the corresponding name. Once on the bucket details page, upload a test file whose name starts with the auto- prefix. For example, you can name your test file auto-insurance-7156.xml. The upload triggers an S3 event, typed as ObjectCreated, which is then routed through the SNS topic to the SQS queue that stores auto insurance leads.

Insurance bucket contents

Insurance bucket contents

Now switch to the SQS console, and choose to receive messages for the SQS queue storing an auto insurance lead. Note that the SQS queue for home insurance leads is empty.

SQS home insurance queue empty

SQS home insurance queue empty

If you want to check the filter policy configured, you may switch to the SNS console, choose the SNS topic created by the SAM template, and choose the SNS subscription for auto insurance leads. Once on the subscription details page, you can view the filter policy, in JSON format, alongside the filter policy scope set to “Message body”.

SNS filter policy

SNS filter policy

You may repeat the testing steps above, now with another file whose name starts with the home- prefix, and see how the S3 event is routed through the SNS topic to the SQS queue that stores home insurance leads.

Monitoring the filtering activity

CloudWatch provides visibility into your SNS message filtering activity, with dedicated metrics, which also enables you to create alarms. You can use the NumberOfNotifcationsFilteredOut-MessageBody metric to monitor the number of messages filtered out due to payload-based filtering, as opposed to attribute-based filtering. For more information, see Monitoring Amazon SNS topics using CloudWatch.

Moreover, you can use the NumberOfNotificationsFilteredOut-InvalidMessageBody metric to monitor the number of messages filtered out due to having malformed JSON payloads. You can have these messages with malformed JSON payloads moved to a dead-letter queue (DLQ) for troubleshooting purposes. For more information, see Designing Durable Serverless Applications with DLQ for Amazon SNS.

Cleaning up

To delete all the AWS resources that you created as part of this use case, run the following command from the project root directory.

sam delete

Conclusion

In this blog post, we introduce the use of payload-based message filtering for SNS, which provides event routing for JSON-formatted messages. This enables you to write filter policies based on the contents of the messages published to SNS. This also removes the message parsing overhead from your subscriber systems, as well as any custom logic from your publisher systems to move message properties from the payload to the set of attributes. Lastly, payload-based filtering can facilitate your event-driven architectures (EDA) by enabling you to filter events published to SNS from 60+ other AWS event sources.

For more information, see Amazon SNS Message Filtering, Amazon SNS Event Sources, and Amazon SNS Pricing. For more serverless learning resources, visit Serverless Land.

AWS Week in Review – November 7, 2022

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-november-7-2022/

With three weeks to go until AWS re:Invent opens in Las Vegas, the AWS News Blog Team is hard at work creating blog posts to share the latest launches and previews with you. As usual, we have a strong mix of new services, new features, and a surprise or two.

Last Week’s Launches
Here are some launches that caught my eye last week:

Amazon SNS Data Protection and Masking – After a quick public preview, this cool feature is now generally available. It uses pattern matching, machine learning models, and content policies to help protect data at scale. You can find many different kinds of personally identifiable information (PII) and protected health information (PHI) in message bodies and either block message delivery or mask (de-identify) the sensitive data, all in real-time and on a per-topic basis. To learn more, read the blog post or the message data protection documentation.

Amazon Textract Updates – This service extracts text, handwriting, and data from any document or image. This past week we updated the AnalyzeID function so that it can now extract the machine readable zone (MRZ) on passports issued by the United States, and we added the entire OCR output to the API response. We also updated the machine learning models that power the AnalyzeDocument function, with a focus on single-character boxed forms commonly found on tax and immigration documents. Finally, we updated the AnalyzeExpense function with support for new fields and higher accuracy for existing fields, bringing the total field count to more than 40.

Another Amazon Braket Processor – Our quantum computing service now supports Aquila, a new 256-qubit quantum computer from QuEra that is based on a programmable array of neutral Rubidium atoms. According to the What’s New, Aquila supports the Analog Hamiltonian Simulation (AHS) paradigm, allowing it to solve for the static and dynamic properties of quantum systems composed of many interacting particles.

Amazon S3 on Outposts – This service now lets you use additional S3 Lifecycle rules to optimize capacity management. You can expire objects as they age or are replaced with newer versions, with control at the bucket level, or for subsets defined by prefixes, object tags, or object sizes. There’s more info in the What’s New and in the S3 documentation.

AWS CloudFormation – There were two big updates last week: support for Amazon RDS Multi-AZ deployments with two readable standbys, and better access to detailed information on failed stack instances for operations on CloudFormation StackSets.

Amazon MemoryDB for Redis – You can now use data tiering as a lower cost way to to scale your clusters up to hundreds of terabytes of capacity. This new option uses a combination of instance memory and SSD storage in each cluster node, with all data stored durably in a multi-AZ transaction log. There’s more information in the What’s New and the blog post.

Amazon EC2 – You can now remove launch permissions for Amazon Machine Images (AMIs) that are directly shared with your AWS account.

X in Y – We launched existing AWS services and instance types in additional Regions:

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some additional news items that you may find interesting:

AWS Open Source News and Updates – My colleague Ricardo Sueiras highlights new open source projects, tools, and demos from the AWS Community. Read Installment 134 to see what’s going on!

New Case Study – A new AWS case study describes how Taggle (a company focused on smart water solutions in Australia) created an IoT platform that runs on AWS and uses Amazon Kinesis Data Streams to store & ingest data in real time. Using AWS allowed them to scale to accommodate 80,000 additional sensors that will roll out in 2022.

Upcoming AWS Events
re:Invent 2022AWS re:Invent is just three weeks away! Join us live from November 28th to December 2nd for keynotes, training and certification opportunities, and over 1,500 technical sessions. If you cannot make it to Las Vegas you can also join us online to watch the keynotes and leadership sessions live. Be sure to check out the re:Invent 2022 Attendee Guides, each curated by an AWS Hero, AWS industry team, or AWS partner.

PeerTalk – If you will be attending re:Invent in person and are interested in meeting with me or any of our featured experts, be sure to check out PeerTalk, our new onsite networking program.

That’s all for this week!

Jeff;

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS.

How USAA built an Amazon S3 malware scanning solution

Post Syndicated from Jonathan Nguyen original https://aws.amazon.com/blogs/architecture/how-usaa-built-an-amazon-s3-malware-scanning-solution/

United Services Automobile Association (USAA) is a San Antonio-based insurance, financial services, banking, and FinTech company supporting millions of military members and their families. USAA has partnered with Amazon Web Services (AWS) to digitally transform and build multiple USAA solutions that help keep members safe and save members money and time.

Why build a S3 malware scanning solution?

As complex companies’ businesses continue to grow, there may be an increased need for collaboration and interactions with outside vendors. Prior to developing an Amazon Simple Storage Solution (Amazon S3) scanning solution, a security review and approval process for application teams to ingest data into an AWS Organization from external vendors’ AWS accounts may be warranted, to ensure additional threats are not being introduced. This could result in a lengthy review and exception process, and subsequently, could hinder the velocity of application teams’ collaboration with external vendors.

USAA security standards, like those of most companies, require all data from external vendors to be treated as untrusted, and therefore must be scanned by an antivirus or antimalware solution prior to being ingested by downstream processes within the AWS environment. Companies looking to automate the scanning process may want to consider a solution where all incoming external data flow through a demilitarized drop zone to be scanned, and subsequently released to downstream processes if malware and viruses are not detected.

S3 malware scanning solution overview

Dedicated AWS accounts should be provisioned for specific data classifications and used as a demilitarized zone (DMZ) for an untrusted staging area. The solution discussed in this blog uses a dedicated staging AWS account that controls the release of Amazon S3 objects to other AWS accounts within an AWS Organization. AWS accounts within an AWS Organization should follow security best practices in terms of infrastructure, networking, logging, and security. External vendors should explicitly be given limited permissions to appropriate resources in their respective staging S3 bucket.

A staging S3 bucket should have specific resource policies restricting which applications and identity and access management (IAM) principals can interact with S3 objects using object attributes, such as object tags, to determine whether an object has been scanned, and what the results of that scan are. Additional guardrails are implemented using Service Control Policies (SCP) to restrict authorized IAM principals to create or modify S3 object attributes (Figure 1).

Amazon S3 antivirus and antimalware scanning architecture workflow

Figure 1. Amazon S3 antivirus and antimalware scanning architecture workflow

  1. The external vendor copies an object to the staging S3 bucket.
  2. The staging S3 bucket has event notifications configured and generates an event.
  3. The S3 PutObject event is sent to an Object Created Amazon Simple Queue Service (Amazon SQS) queue topic.
  4. An Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling group is configured to scale based on messages in the Object Created SQS queue.
  5. An antivirus and antimalware scanning service application on the Amazon EC2 instances takes the following actions on objects within the Object Created Amazon SQS queue:
    a. Tag the S3 object with an “In Progress” status.
    b. Get the object from the Staging S3 bucket and stores it in a local ephemeral file system.
    c. Scan the copied object using antivirus or antimalware tool.
    d. Based on the antivirus or antimalware scan results, tag the S3 object with the scan results (for example, No_Malware_Detected vs. Malware_Detected).
    e. Create and publish a payload to the Object Scanned Amazon Simple Notification Service (Amazon SNS) topic, allowing application team filtering.
    f. Delete the message from the Object Created SQS queue.
  6. Application teams are subscribed to the Object Scanned SNS topic with a filter for their application.
  7. For any objects where a virus or malware is detected, a company can use its cyber threat response team to conduct a thorough analysis and take appropriate actions.

USAA built a custom anti-virus and anti-malware scanning application using EC2 instances, using a private, hardened Amazon Machine Image (AMI). For cost-efficacy purposes, the EC2 automatic scaling event can be configured based on Object Created SQS queue depth and Service Level Objective (SLO). A serverless version of an anti-virus and anti-malware solution can be used instead of an EC2 application, depending on your specific use-case and other factors. Some important factors include antivirus and antimalware tool serverless support, resource tuning and configuration requirements, and additional AWS services to manage that could possibly result in a bottleneck. If your enterprise is going with a serverless approach, you can use open-source tools such as ClamAV using Lambda functions.

In the event of an infected object, proper guardrails and response mechanisms need to be in place. USAA teams have developed playbooks to monitor the health and performance of S3 scanning solution, as well as responding to detected virus or malware.

This cloud native, event-driven solution has benefited multiple USAA application teams who have previously requested the ability to ingest data into AWS workloads from teams outside of USAA’s AWS Organization, and allowed additional capabilities and functionality to better serve their members. To enhance this solution even further, USAA’s security team plans to incorporate additional mechanisms to find specific objects that either failed or required additional processing, without having to scan all objects in the buckets. This can be accomplished by including an additional AWS Lambda function and Amazon DynamoDB table to track object metadata as objects get added to the Object Created SQS queue for processing. The metadata could possibly include information such as S3 bucket origin, S3 object key, version ID, scan status, and the original S3 event payload to replay the event into the Object Created SQS queue. The Lambda function primarily ensures the DynamoDB table is kept up to date as objects are processed, as well as handling issues for objects that may need to be reprocessed. The DynamoDB table also has time-to-live (TTL) configured to clear records as they expire from the Staging S3 bucket.

Conclusion

In this post, we reviewed how USAA’s Public Cloud Security team facilitated collaboration and interactions with external vendors and AWS workloads securely by creating a scalable solution to scan S3 objects for virus and malware prior to releasing objects downstream. The solution uses native AWS services and can be utilized for any use-cases requiring antivirus or antimalware capabilities. Because the S3 object scanning solution uses EC2 instances, you can use your existing antivirus or antimalware enterprise tool.

Adding approval notifications to EC2 Image Builder before sharing AMIs

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/adding-approval-notifications-to-ec2-image-builder-before-sharing-amis-2/

­­­­­This blog post is written by, Glenn Chia Jin Wee, Associate Cloud Architect, and Randall Han, Professional Services.

You may be required to manually validate the Amazon Machine Image (AMI) built from an Amazon Elastic Compute Cloud (Amazon EC2) Image Builder pipeline before sharing this AMI to other AWS accounts or to an AWS organization. Currently, Image Builder provides an end-to-end pipeline that automatically shares AMIs after they’ve been built.

In this post, we will walk through the steps to enable approval notifications before AMIs are shared with other AWS accounts. Image Builder supports automated image testing using test components. The recommended best practice is to automate test steps, however situations can arise where test steps become either challenging to automate or internal compliance policies mandate manual checks be conducted prior to distributing images. In such situations, having a manual approval step is useful if you would like to verify the AMI configuration before it is shared to other AWS accounts or an AWS Organization. A manual approval step reduces the potential for sharing an incorrectly configured AMI with other teams which can lead to downstream issues. This solution sends an email with a link to approve or reject the AMI. Users approve the AMI after they’ve verified that it is built according to specifications. Upon approving the AMI, the solution automatically shares it with the specified AWS accounts.

OverviewArchitecture Diagram

  1. In this solution, an Image Builder Pipeline is run that builds a Golden AMI in Account A. After the AMI is built, Image Builder publishes data about the AMI to an Amazon Simple Notification Service (Amazon SNS)
  2. The SNS Topic passes the data to an AWS Lambda function that subscribes to it.
  3. The Lambda function that subscribes to this topic retrieves the data, formats it, and then starts an SSM Automation, passing it the AMI Name and ID.
  4. The first step of the SSM Automation is a manual approval step. The SSM Automation first publishes to an SNS Topic that has an email subscription with the Approver’s email. The approver will receive the email with a URL that they can click to approve the step.
  5. The approval step defines a specific AWS Identity and Access Management (IAM) Role as an approver. This role has the minimum required permissions to approve the manual approval step. After performing manual tests on the Golden AMI, the Approver principal will assume this role.
  6. After assuming this role, the approver will click on the approval link that was sent via email. After approving the step, an AWS Lambda Function is triggered.
  7. This Lambda Function shares the Golden AMI with Account B and sends an email notifying the Target Account Recipients that the AMI has been shared.

Prerequisites

For this walkthrough, you will need the following:

  • Two AWS accounts – one to host the solution resources, and the second which receives the shared Golden AMI.
    • In the account that hosts the solution, prepare an AWS Identity and Access Management (IAM) principal with the sts:AssumeRole permission. This principal must assume the IAM Role that is listed as an approver in the Systems Manager approval step. The ARN of this IAM principal is used in the AWS CloudFormation Approver parameter, This ARN is added to the trust policy of approval IAM Role.
    • In addition, in the account hosting the solution, ensure that the IAM principal deploying the CloudFormation template has the required permissions to create the resources in the stack.
  • A new Amazon Virtual Private Cloud (Amazon VPC) will be created from the stack. Make sure that you have fewer than five VPCs in the selected Region.

Walkthrough

In this section, we will guide you through the steps required to deploy the Image Builder solution. The solution is deployed with CloudFormation.

In this scenario, we deploy the solution within the approver’s account. The approval email will be sent to a predefined email address for manual approval, before the newly created AMI is shared to target accounts.

The approver first assumes the approval IAM Role and then selects the approval link. This leads to the Systems Manager approval page. Upon approval, an email notification will be sent to the predefined target account email address, notifying the relevant stakeholders that the AMI has been successfully shared.

The high-level steps we will follow are:

  1. In Account A, deploy the provided AWS CloudFormation template. This includes an example Image Builder Pipeline, Amazon SNS topics, Lambda functions, and an SSM Automation Document.
  2. Approve the SNS subscription from your supplied email address.
  3. Run the pipeline from the Amazon EC2 Image Builder Console.
  4. [Optional] To conduct manual tests, launch an Amazon EC2 instance from the built AMI after the pipeline runs.
  5. An email will be sent to you with options to approve or reject the step. Ensure that you have assumed the IAM Role that is the approver before clicking the approval link that leads to the SSM console approval page.
  6. Upon approving the step, an AWS Lambda function shares the AMI to the Account B and also sends an email to the target account email recipients notifying them that the AMI has been shared.
  7. Log in to Account B and verify that the AMI has been shared.

Step 1: Deploy the AWS CloudFormation template

1. The CloudFormation template, template.yaml that deploys the solution can also found at this GitHub repository. Follow the instructions at the repository to deploy the stack.

Step 2: Verify your email address

  1. After running the deployment, you will receive an email prompting you to confirm the Subscription at the approver email address. Choose Confirm subscription.

SNS Topic Subscription confirmation email

  1. This leads to the following screen, which shows that your subscription is confirmed.

subscription-confirmation

  1. Repeat the previous 2 steps for the target email address.

Step 3: Run the pipeline from the Image Builder console

  1. In the Image Builder console, under Image pipelines, select the checkbox next to the Pipeline created, choose Actions, and select Run pipeline.

run-image-builder-pipeline

Note: The pipeline takes approximately 20 – 30 minutes to complete.

Step 4: [Optional] Launch an Amazon EC2 instance from the built AMI

If you have a requirement to manually validate the AMI before sharing it with other accounts or to the AWS organization an approver will launch an Amazon EC2 instance from the built AMI and conduct manual tests on the EC2 instance to make sure it is functional.

  1. In the Amazon EC2 console, under Images, choose AMIs. Validate that the AMI is created.

ami-in-account-a

  1. Follow AWS docs: Launching an EC2 instances from a custom AMI for steps on how to launch an Amazon EC2 instance from the AMI.

Step 5: Select the approval URL in the email sent

  1. When the pipeline is run successfully, you will receive another email with a URL to approve the AMI.

approval-email

  1. Before clicking on the Approve link, you must assume the IAM Role that is set as an approver for the Systems Manager step.
  2. In the CloudFormation console, choose the stack that was deployed.

cloudformation-stack

4. Choose Outputs and copy the IAM Role name.

ssm-approval-role-output

5. While logged in as the IAM Principal that has permissions to assume the approval IAM Role, follow the instructions at AWS IAM documentation for switching a role using the console to assume the approval role.
In the Switch Role page, in Role paste the name of the IAM Role that you copied in the previous step.

Note: This IAM Role was deployed with minimum permissions. Hence, seeing warning messages in the console is expected after assuming this role.

switch-role

6. Now in the approval email, select the Approve URL. This leads to the Systems Manager console. Choose Submit.

approve-console

7. After approving the manual step, the second step is executed, which shares the AMI to the target account.

automation-step-success

Step 6: Verify that the AMI is shared to Account B

  1. Log in to Account B.
  2. In the Amazon EC2 console, under Images, choose AMIs. Then, in the dropdown, choose Private images. Validate that the AMI is shared.

verify-ami-in-account-b

  1. Verify that a success email notification was sent to the target account email address provided.

target-email

Clean up

This section provides the necessary information for deleting various resources created as part of this post.

  1. Deregister the AMIs that were created and shared.
    1. Log in to Account A and follow the steps at AWS documentation: Deregister your Linux AMI.
  2. Delete the CloudFormation stack. For instructions, refer to Deleting a stack on the AWS CloudFormation console.

Conclusion

In this post, we explained how to enable approval notifications for an Image Builder pipeline before AMIs are shared to other accounts. This solution can be extended to share to more than one AWS account or even to an AWS organization. With this solution, you will be notified when new golden images are created, allowing you to verify the accuracy of their configuration before sharing them to for wider use. This reduces the possibility of sharing AMIs with misconfigurations that the written tests may not have identified.

We invite you to experiment with different AMIs created using Image Builder, and with different Image Builder components. Check out this GitHub repository for various examples that use Image Builder. Also check out this blog on Image builder integrations with EC2 Auto Scaling Instance Refresh. Let us know your questions and findings in the comments, and have fun!