Tag Archives: Amazon Simple Notification Service (SNS)

Sending mobile push notifications and managing device tokens with serverless applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/sending-mobile-push-notifications-and-managing-device-tokens-with-serverless-application/

This post is written by Rafa Xu, Cloud Architect, Serverless and Joely Huang, Cloud Architect, Serverless.

Amazon Simple Notification Service (SNS) is a fast, flexible, fully managed push messaging service in the cloud. SNS can send mobile push notifications directly to applications on mobile devices such as message alerts and badge updates. SNS sends push notifications to a mobile endpoint created by supplying a mobile token and platform application.

When publishing mobile push notifications, a device token is used to generate an endpoint. This identifies where the push notification is sent (target destination). To push notifications successfully, the token must be up to date and the endpoint must be validated and enabled.

A common challenge when pushing notifications is keeping the token up to date. Tokens can automatically change due to reasons such as mobile operating system (OS) updates and application store updates.

This post provides a serverless solution to this challenge. It also provides a way to publish push notifications to specific end users by maintaining a mapping between users, endpoints, and tokens.

Overview

To publish mobile push notifications using SNS, generate an SNS endpoint to use as a destination target for the push notification. To create the endpoint, you must supply:

  1. A mobile application token: The mobile operating system (OS) issues the token to the application. It is a unique identifier for the application and mobile device pair.
  2. Platform Application Amazon Resource Name (ARN): SNS provides this ARN when you create a platform application object. The platform application object requires a valid set of credentials issued by the mobile platform, which you provide to SNS.

Once the endpoint is generated, you can store and reuse it again. This prevents the application from creating endpoints indefinitely, which could exhaust the SNS endpoint limit.

To reuse the endpoints and successfully push notifications, there are a number of challenges:

  • Mobile application tokens can change due to a number of reasons, such as application updates. As a result, the publisher must update the platform endpoint to ensure it uses an up-to-date token.
  • Mobile application tokens can become invalid. When this happens, messages won’t be published, and SNS disables the endpoint with the invalid token. To resolve this, publishers must retrieve a valid token and re-enable the platform endpoint
  • Mobile applications can have many users, each user could have multiple devices, or one device could have multiple users. To send a push notification to a specific user, a mapping between the user, device, and platform endpoints should be maintained.

For more information on best practices for managing mobile tokens, refer to this post.

Follow along the blog post to learn how to implement a serverless workflow for managing and maintaining valid endpoints and user mappings.

Solution overview

The solution uses the following AWS services:

  • Amazon API Gateway: Provides a token registration endpoint URL used by the mobile application. Once called, it invokes an AWS Lambda function via the Lambda integration.
  • Amazon SNS: Generates and maintains the target endpoint and manages platform application objects.
  • Amazon DynamoDB: Serverless database for storing endpoints that also maintains a mapping between the user, endpoint, and mobile operating system.
  • AWS Lambda: Retrieves endpoints from DynamoDB, validates and generates endpoints, and publishes notifications by making requests to SNS.

The following diagram represents a simplified interaction flow between the AWS services:

Solution architecture

To register the token, the mobile app invokes the registration token endpoint URL generated by Amazon API Gateway. The token registration happens every time a user logs in or opens the application. This ensures that the token and endpoints are always valid during the application usage.

The mobile application passes the token, user, and mobileOS as parameters to API Gateway, which forwards the request to the Lambda function.

The Lambda function validates the token and endpoint for the user by making API calls to DynamoDB and SNS:

  1. The Lambda function checks DynamoDB to see if the endpoint has been previously created.
    1. If the endpoint does not exist, it creates a platform endpoint via SNS.
  2. Obtain the endpoint attributes from SNS:
    1. Check the “enabled” endpoint attribute and set to “true” to enable the platform endpoint, if necessary.
    2. Validate the “token” endpoint attribute with the token provided in the API Gateway request. If it does not match, update the “token” attribute.
    3. Send a request to SNS to update the endpoint attributes.
  3. If a new endpoint is created, update DynamoDB with the new endpoint.
  4. Return a successful response to API Gateway.

Deploying the AWS Serverless Application Model (AWS SAM) template

Use the AWS SAM template to deploy the infrastructure for this workflow. Before deploying the template, first create a platform application in SNS.

  1. Navigate to the SNS console. Select Push Notifications on the left-hand menu to create a platform application:
    Mobile push notifications
  2. This shows the creation of a platform application for iOS applications:
    Create platform application
  3. To install AWS SAM, visit the installation page.
  4. To deploy the AWS SAM template, navigate to the directory where the template is located. Run the commands in the terminal:
    git clone https://github.com/aws-samples/serverless-mobile-push-notification
    cd serverless-mobile-push-notification
    sam build
    sam deploy --guided

Lambda function code snippets

The following section explains code from the Lambda function for the workflow.

Create the platform endpoint

If the endpoint exists, store it as a variable in the code. If the platform endpoint does not exist in the DynamoDB database, create a new endpoint:

        need_update_ddb = False
        response = table.get_item(Key={'username': username, 'appos': appos})
        if 'Item' not in response:
            # create endpoint
            response = snsClient.create_platform_endpoint(
                PlatformApplicationArn=SUPPORTED_PLATFORM[appos],
                Token=token,
            )
            devicePushEndpoint = response['EndpointArn']
            need_update_ddb = True
        else:
            # update the endpoint
            devicePushEndpoint = response['Item']['endpoint']

Check and update endpoint attributes

Check that the token attribute for the platform endpoint matches the token received from the mobile application through the request. This also checks for the endpoint “enabled” attribute and re-enables the endpoint if necessary:

response = snsClient.get_endpoint_attributes(
                EndpointArn=devicePushEndpoint
            )
            endpointAttributes = response['Attributes']

            previousToken = endpointAttributes['Token']
            previousStatus = endpointAttributes['Enabled']
            if previousStatus.lower() != 'true' or previousToken != token:
                snsClient.set_endpoint_attributes(
                    EndpointArn=devicePushEndpoint,
                    Attributes={
                        'Token': token,
                        'Enabled': 'true'
                    }
                )

Update the DynamoDB table with the newly generated endpoint

If a platform endpoint is newly created, meaning there is no item in the DynamoDB table, create a new item in the table:

        if need_update_ddb:
            table.update_item(
                Key={
                    'username': username,
                    'appos': appos
                },
                UpdateExpression="set endpoint=:e",
                ExpressionAttributeValues={
                    ':e': devicePushEndpoint
                },
                ReturnValues="UPDATED_NEW"
            )

As best practice, the code cleans up the table, in case there are multiple entries for the same endpoint mapped to different users. This can happen when the mobile application is used by multiple users on the same device. When one user logs out and a different user logs in, this creates a new entry in the DynamoDB table to map the endpoint with the new user.

As a result, you must remove the entry that maps the same endpoint to the previously logged in user. This way, you only keep the endpoint that matches the user provided by the mobile application through the request.

result = table.query(
    # Add the name of the index you want to use in your query.
        IndexName="endpoint-index",
        KeyConditionExpression=Key('endpoint').eq(devicePushEndpoint),
    )
    for item in result['Items']:
        if item['username'] != username and item['appos'] == appos:
            print(f"deleting orphan item: username {username}, os {appos}".format(username=item['username'], appos=appos))
            table.delete_item(
                Key={
                    'username': item['username'],
                    'appos': appos
                },
            )

Conclusion

This blog shows how to deploy a serverless solution for validating and managing SNS platform endpoints and tokens. To publish push notifications successfully, use SNS to check the endpoint attribute and ensure it is mapped to the correct token and the endpoint is enabled.

This approach uses DynamoDB to store the device token and platform endpoints for each user. This allows you to send push notifications to specific users, retrieve, and reuse previously created endpoints. You create a Lambda function to facilitate the workflow, including validating the DynamoDB item for storing an enabled and up-to-date token.

Visit this link to learn more about Amazon SNS mobile push notifications: http://docs.aws.amazon.com/sns/latest/dg/SNSMobilePush.html

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Building in resiliency – part 2

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-building-in-resiliency-part-2/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Reliability question REL2: How do you build resiliency into your serverless application?

This post continues part 1 of this reliability question. Previously, I cover managing failures using retries, exponential backoff, and jitter. I explain how DLQs can isolate failed messages. I show how to use state machines to orchestrate long running transactions rather than handling these in application code.

Required practice: Manage duplicate and unwanted events

Duplicate events can occur when a request is retried or multiple consumers process the same message from a queue or stream. A duplicate can also happen when a request is sent twice at different time intervals with the same parameters. Design your applications to process multiple identical requests to have the same effect as making a single request.

Idempotency refers to the capacity of an application or component to identify repeated events and prevent duplicated, inconsistent, or lost data. This means that receiving the same event multiple times does not change the result beyond the first time the event was received. An idempotent application can, for example, handle multiple identical refund operations. The first refund operation is processed. Any further refund requests to the same customer with the same payment reference should not be processes again.

When using AWS Lambda, you can make your function idempotent. The function’s code must properly validate input events and identify if the events were processed before. For more information, see “How do I make my Lambda function idempotent?

When processing streaming data, your application must anticipate and appropriately handle processing individual records multiple times. There are two primary reasons why records may be delivered more than once to your Amazon Kinesis Data Streams application: producer retries and consumer retries. For more information, see “Handling Duplicate Records”.

Generate unique attributes to manage duplicate events at the beginning of the transaction

Create, or use an existing unique identifier at the beginning of a transaction to ensure idempotency. These identifiers are also known as idempotency tokens. A number of Lambda triggers include a unique identifier as part of the event:

You can also create your own identifiers. These can be business-specific, such as transaction ID, payment ID, or booking ID. You can use an opaque random alphanumeric string, unique correlation identifiers, or the hash of the content.

A Lambda function, for example can use these identifiers to check whether the event has been previously processed.

Depending on the final destination, duplicate events might write to the same record with the same content instead of generating a duplicate entry. This may therefore not require additional safeguards.

Use an external system to store unique transaction attributes and verify for duplicates

Lambda functions can use Amazon DynamoDB to store and track transactions and idempotency tokens to determine if the transaction has been handled previously. DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. This helps to limit the storage space used. Base the TTL on the event source. For example, the message retention period for SQS.

Using DynamoDB to store idempotent tokens

Using DynamoDB to store idempotent tokens

You can also use DynamoDB conditional writes to ensure a write operation only succeeds if an item attribute meets one of more expected conditions. For example, you can use this to fail a refund operation if a payment reference has already been refunded. This signals to the application that it is a duplicate transaction. The application can then catch this exception and return the same result to the customer as if the refund was processed successfully.

Third-party APIs can also support idempotency directly. For example, Stripe allows you to add an Idempotency-Key: <key> header to the request. Stripe saves the resulting status code and body of the first request made for any given idempotency key, regardless of whether it succeeded or failed. Subsequent requests with the same key return the same result.

Validate events using a pre-defined and agreed upon schema

Implicitly trusting data from clients, external sources, or machines could lead to malformed data being processed. Use a schema to validate your event conforms to what you are expecting. Process the event using the schema within your application code or at the event source when applicable. Events not adhering to your schema should be discarded.

For API Gateway, I cover validating incoming HTTP requests against a schema in “Implementing application workload security – part 1”.

Amazon EventBridge rules match event patterns. EventBridge provides schemas for all events that are generated by AWS services. You can create or upload custom schemas or infer schemas directly from events on an event bus. You can also generate code bindings for event schemas.

SNS supports message filtering. This allows a subscriber to receive a subset of the messages sent to the topic using a filter policy. For more information, see the documentation.

JSON Schema is a tool for validating the structure of JSON documents. There are a number of implementations available.

Best practice: Consider scaling patterns at burst rates

Load testing your serverless application allows you to monitor the performance of an application before it is deployed to production. Serverless applications can be simpler to load test, thanks to the automatic scaling built into many of the services. For more information, see “How to design Serverless Applications for massive scale”.

In addition to your baseline performance, consider evaluating how your workload handles initial burst rates. This ensures that your workload can sustain burst rates while scaling to meet possibly unexpected demand.

Perform load tests using a burst strategy with random intervals of idleness

Perform load tests using a burst of requests for a short period of time. Also introduce burst delays to allow your components to recover from unexpected load. This allows you to future-proof the workload for key events when you do not know peak traffic levels.

There are a number of AWS Marketplace and AWS Partner Network (APN) solutions available for performance testing, including Gatling FrontLine, BlazeMeter, and Apica.

In regulating inbound request rates – part 1, I cover running a performance test suite using Gatling, an open source tool.

Gatling performance results

Gatling performance results

Amazon does have a network stress testing policy that defines which high volume network tests are allowed. Tests that purposefully attempt to overwhelm the target and/or infrastructure are considered distributed denial of service (DDoS) tests and are prohibited. For more information, see “Amazon EC2 Testing Policy”.

Review service account limits with combined utilization across resources

AWS accounts have default quotas, also referred to as limits, for each AWS service. These are generally Region-specific. You can request increases for some limits while other limits cannot be increased. Service Quotas is an AWS service that helps you manage your limits for many AWS services. Along with looking up the values, you can also request a limit increase from the Service Quotas console.

Service Quotas dashboard

Service Quotas dashboard

As these limits are shared within an account, review the combined utilization across resources including the following:

  • Amazon API Gateway: number of requests per second across all APIs. (link)
  • AWS AppSync: throttle rate limits. (link)
  • AWS Lambda: function concurrency reservations and pool capacity to allow other functions to scale. (link)
  • Amazon CloudFront: requests per second per distribution. (link)
  • AWS IoT Core message broker: concurrent requests per second. (link)
  • Amazon EventBridge: API requests and target invocations limit. (link)
  • Amazon Cognito: API limits. (link)
  • Amazon DynamoDB: throughput, indexes, and request rates limits. (link)

Evaluate key metrics to understand how workloads recover from bursts

There are a number of key Amazon CloudWatch metrics to evaluate and alert on to understand whether your workload recovers from bursts.

  • AWS Lambda: Duration, Errors, Throttling, ConcurrentExecutions, UnreservedConcurrentExecutions. (link)
  • Amazon API Gateway: Latency, IntegrationLatency, 5xxError, 4xxError. (link)
  • Application Load Balancer: HTTPCode_ELB_5XX_Count, RejectedConnectionCount, HTTPCode_Target_5XX_Count, UnHealthyHostCount, LambdaInternalError, LambdaUserError. (link)
  • AWS AppSync: 5XX, Latency. (link)
  • Amazon SQS: ApproximateAgeOfOldestMessage. (link)
  • Amazon Kinesis Data Streams: ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded, GetRecords.IteratorAgeMilliseconds, PutRecord.Success, PutRecords.Success (if using Kinesis Producer Library), GetRecords.Success. (link)
  • Amazon SNS: NumberOfNotificationsFailed, NumberOfNotificationsFilteredOut-InvalidAttributes. (link)
  • Amazon Simple Email Service (SES): Rejects, Bounces, Complaints, Rendering Failures. (link)
  • AWS Step Functions: ExecutionThrottled, ExecutionsFailed, ExecutionsTimedOut. (link)
  • Amazon EventBridge: FailedInvocations, ThrottledRules. (link)
  • Amazon S3: 5xxErrors, TotalRequestLatency. (link)
  • Amazon DynamoDB: ReadThrottleEvents, WriteThrottleEvents, SystemErrors, ThrottledRequests, UserErrors. (link)

Conclusion

This post continues from part 1 and looks at managing duplicate and unwanted events with idempotency and an event schema. I cover how to consider scaling patterns at burst rates by managing account limits and show relevant metrics to evaluate

Build resiliency into your workloads. Ensure that applications can withstand partial and intermittent failures across components that may only surface in production. In the next post in the series, I cover the performance efficiency pillar from the Well-Architected Serverless Lens.

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Building in resiliency – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-building-in-resiliency-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Reliability question REL2: How do you build resiliency into your serverless application?

Evaluate scaling mechanisms for serverless and non-serverless resources to meet customer demand. Build resiliency into your workload to make your serverless application resilient to withstand partial and intermittent failures across components that may only surface in production.

Required practice: Manage transaction, partial, and intermittent failures

Whenever one service or system calls another, there is a chance that failures can happen. Services or systems often don’t fail as a single unit, but rather suffer partial or transient failures. Applications should be designed to handle component failures as part of the architecture. The system should be designed to detect failure and, ideally, automatically heal itself.

Transaction failures can occur when a component is unavailable or under high load. Partial failures can occur when a percentage of requests succeeds, including during batch processing. Intermittent failures might occur when a request fails for a short period of time due to network or other transient issues.

AWS serverless services, including AWS Lambda, are fault-tolerant and designed to handle failures. If a service invokes a Lambda function and there is a service disruption, Lambda invokes the function in a different Availability Zone.

When you invoke a function directly, you determine the strategy for handling errors. You can retry, send the event to a destination or queue for debugging, or ignore the error. Clients such as the AWS Command Line Interface (CLI) and the AWS SDK retry on client timeouts, throttling errors (429), and other errors that are not caused by a bad request.

When you invoke a function indirectly, you must be aware of the retry behavior of the invoker and any service that the request encounters along the way. For more information, see “Error handling and automatic retries in AWS Lambda”. You can configure Maximum Retry Attempts and Maximum Event Age for asynchronous invocations.

When reading from Amazon Kinesis Data Streams and Amazon DynamoDB Streams, Lambda retries the entire batch of items. Retries continue until the records expire or exceed the maximum age that you configure on the event source mapping. You can also configure the event source mapping to split a failed batch into two batches. Retrying with smaller batches isolates bad records and works around timeout issues.

Partial failures can occur in non-atomic operations. PutRecords for Kinesis and BatchWriteItem for DynamoDB return a successful response if at least one record is ingested successfully. Always inspect the response when using such operations and programmatically deal with partial failures.

Use exponential backoff with jitter

The simplest technique for dealing with failures in a networked environment is to retry calls until they succeed. This technique increases the reliability of the application and reduces operational costs for the developer.

However, it is not always safe to retry. A retry can further increase the load on the system being called if the system is already failing due to an overload. To avoid this problem, use backoff. Instead of retrying immediately and aggressively, the client waits some amount of time between tries. The most common pattern is an exponential backoff, which uses exponentially longer wait times between retries. This is typically capped to a maximum delay and number of retries.

If all backoff retries are still happening at the same time, this can still overload a system or cause contention. To avoid this problem, use jitter. Jitter adds some amount of randomness to the backoff to spread the retries around in time. This can help prevent large bursts by spreading out the rate when clients connect. For more information see the Amazon Builders’ Library article “Timeouts, retries, and backoff with jitter” and AWS Architecture blog post “Exponential Backoff And Jitter”.

Exponential backoff and jitter

Exponential backoff and jitter

When your application responds to callers in fail-fast scenarios and when performance is degraded, inform the caller via headers or metadata when they can retry.

Each AWS SDK implements automatic retry logic including exponential backoff. For downstream calls, you can adjust AWS and third-party SDK retries, backoffs, TCP, and HTTP timeouts. This helps you decide when to stop retrying. For more information, see the documentation and troubleshooting steps for Lambda and the AWS SDK.

Use a dead-letter queue mechanism to retain, investigate and retry failed transactions

There are a number of ways to handle message failures including destinations and dead-letter queues.

You can configure Lambda to send records of asynchronous invocations to another destination service. These include Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), Lambda, and Amazon EventBridge. You can configure separate destinations for events that fail processing and events that are successfully processed. The invocation record contains details about the event, the response, and the reason that the record was sent.

The following example shows a function that sends a record of a successful invocation to an EventBridge event bus. When an event fails all processing attempts, Lambda sends an invocation record to an SQS queue. It includes the function’s response in the invocation record.

AWS Lambda destinations for asynchronous invocation

AWS Lambda destinations for asynchronous invocation

SNS, SQS, Lambda, and EventBridge support dead-letter queues (DLQs). DLQs make your applications more resilient and durable by storing messages or events that can’t be processed correctly into a dedicated SQS queue. This helps you debug your application by isolating the problematic messages to determine why their processing failed. One you have resolved the issue, re-process the failed message. For more information, see “When should I use a dead-letter queue?” There is an example serverless application to redrive the messages from an SQS DLQ back to its source SQS queue.

For Lambda, DLQs provide an alternative to a failure destination. Lambda destinations is preferable for asynchronous invocations.

Good practice: Orchestrate long-running transactions

Long-running transactions can be processed by one or multiple components. Consider implementing the saga pattern using state machines for these types of transactions.

The saga pattern coordinates transactions between multiple microservices as part of a state machine. Each service that performs a transaction publishes an event to trigger the next transaction in the saga. This continues until the transaction chain is complete. If a transaction fails, saga orchestrates a series of compensating transactions that undo the changes that were made by the preceding transactions.

This is preferable to handling complex or long-running transactions within application code. State machines prevent cascading failures and avoid tightly coupling components with orchestrating logic and business logic.

Use a state machine to visualize distributed transactions, and to separate business logic from orchestration logic.

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows via state machines. Within Step Functions, you can set separate retries, backoff rates, max attempts, intervals, and timeouts. These are set for every step of your state machine using a declarative language.

In the serverless airline example used in this series, Step Functions is used to orchestrate the Booking microservice. The ProcessBooking state machine handles all the necessary steps to create bookings, including payment.

Booking service Step Functions state machine

Booking service Step Functions state machine

The state machine uses a combination of service integrations using DynamoDB, SQS, and Lambda functions to coordinate transactions and handle failures.

For example, the Reserve Booking task invokes a Lambda function. The task has retry and error handling configured as part of the task definition.

"Reserve Booking": {
	"Type": "Task",
	"Resource": "${ReserveBooking.Arn}",
	"TimeoutSeconds": 5,
	"Retry": [
		{
			"ErrorEquals": [
				"BookingReservationException"
			],
			"IntervalSeconds": 1,
			"BackoffRate": 2,
			"MaxAttempts": 2
		}
	],
	"Catch": [
		{
			"ErrorEquals": [
				"States.ALL"
			],
			"ResultPath": "$.bookingError",
			"Next": "Cancel Booking"
		}
	],
	"ResultPath": "$.bookingId",
	"Next": "Collect Payment"
},

Step Functions supports direct service integrations, including DynamoDB. The Reserve Flight task directly updates the flightTable without requiring a Lambda function.

"Reserve Flight": {
	"Type": "Task",
	"Resource": "arn:aws:states:::dynamodb:updateItem",
	"Parameters": {
		"TableName.$": "$.flightTable",
		"Key": {
			"id": {
				"S.$": "$.outboundFlightId"
			}
		},
		"UpdateExpression": "SET seatCapacity = seatCapacity - :dec",
		"ExpressionAttributeValues": {
			":dec": {
				"N": "1"
			},
			":noSeat": {
				"N": "0"
			}
		},
		"ConditionExpression": "seatCapacity > :noSeat"
	},

By default, when a state reports an error, Step Functions causes the execution to fail entirely.

Utilize dead-letter queues in response to failed state machine executions

Any state within the Step Functions workflow can encounter runtime errors. These include state machine definition issues, task failures such as Lambda function exceptions, or transient issues such as network connectivity issues. For more information, see “Error handling in Step Functions”.

Use the Step Functions service integration with SQS to send failed transactions to a DLQ as the final step. This adds a higher level of durability within your state machines.

For example, the airline Notify Failed Booking final task catches failed states from four previous steps. It sends the results to the Booking DLQ.

Booking service Step Functions DLQ

Booking service Step Functions DLQ

The message includes the output of the previous failed states for further troubleshooting.

"Booking DLQ": {
	"Type": "Task",
	"Resource": "arn:aws:states:::sqs:sendMessage",
	"Parameters": {
		"QueueUrl": "${BookingsDLQ}",
		"MessageBody.$": "$"
	},
	"ResultPath": "$.deadLetterQueue",
	"Next": "Booking Failed"
},

The Step Functions documentation has more information on calling SQS.

Conclusion

Build resiliency into your workloads. This makes sure that your application can withstand partial and intermittent failures across components that may only surface in production.

In this post, I cover managing failures using retries, exponential backoff, and jitter. I explain how DLQs can isolate failed messages. I show how to use state machines to orchestrate long running transactions rather than handling these in application code.

This well-architected question continues in part 2 where I look at managing duplicate and unwanted events with idempotency and an event schema. I cover how to consider scaling patterns at burst rates by managing account limits and show relevant metrics to evaluate.

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Regulating inbound request rates – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-regulating-inbound-request-rates-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Reliability question REL1: How do you regulate inbound request rates?

Defining, analyzing, and enforcing inbound request rates helps achieve better throughput. Regulation helps you adapt different scaling mechanisms based on customer demand. By regulating inbound request rates, you can achieve better throughput, and adapt client request submissions to a request rate that your workload can support.

Required practice: Control inbound request rates using throttling

Throttle inbound request rates using steady-rate and burst rate requests

Throttling requests limits the number of requests a client can make during a certain period of time. Throttling allows you to control your API traffic. This helps your backend services maintain their performance and availability levels by limiting the number of requests to actual system throughput.

To prevent your API from being overwhelmed by too many requests, Amazon API Gateway throttles requests to your API. These limits are applied across all clients using the token bucket algorithm. API Gateway sets a limit on a steady-state rate and a burst of request submissions. The algorithm is based on an analogy of filling and emptying a bucket of tokens representing the number of available requests that can be processed.

Each API request removes a token from the bucket. The throttle rate then determines how many requests are allowed per second. The throttle burst determines how many concurrent requests are allowed. I explain the token bucket algorithm in more detail in “Building well-architected serverless applications: Controlling serverless API access – part 2

Token bucket algorithm

Token bucket algorithm

API Gateway limits the steady-state rate and burst requests per second. These are shared across all APIs per Region in an account. For further information on account-level throttling per Region, see the documentation. You can request account-level rate limit increases using the AWS Support Center. For more information, see Amazon API Gateway quotas and important notes.

You can configure your own throttling levels, within the account and Region limits to improve overall performance across all APIs in your account. This restricts the overall request submissions so that they don’t exceed the account-level throttling limits.

You can also configure per-client throttling limits. Usage plans restrict client request submissions to within specified request rates and quotas. These are applied to clients using API keys that are associated with your usage policy as a client identifier. You can add throttling levels per API route, stage, or method that are applied in a specific order.

For more information on API Gateway throttling, see the AWS re:Invent presentation “I didn’t know Amazon API Gateway could do that”.

API Gateway throttling

API Gateway throttling

You can also throttle requests by introducing a buffering layer using Amazon Kinesis Data Stream or Amazon SQS. Kinesis can limit the number of requests at the shard level while SQS can limit at the consumer level. For more information on using SQS as a buffer with Amazon Simple Notification Service (SNS), read “How To: Use SNS and SQS to Distribute and Throttle Events”.

Identify steady-rate and burst rate requests that your workload can sustain at any point in time before performance degraded

Load testing your serverless application allows you to monitor the performance of an application before it is deployed to production. Serverless applications can be simpler to load test, thanks to the automatic scaling built into many of the services. During a load test, you can identify quotas that may act as a limiting factor for the traffic you expect and take action.

Perform load testing for a sustained period of time. Gradually increase the traffic to your API to determine your steady-state rate of requests. Also use a burst strategy with no ramp up to determine the burst rates that your workload can serve without errors or performance degradation. There are a number of AWS Marketplace and AWS Partner Network (APN) solutions available for performance testing, Gatling Frontline, BlazeMeter, and Apica.

In the serverless airline example used in this series, you can run a performance test suite using Gatling, an open source tool.

To deploy the test suite, follow the instructions in the GitHub repository perf-tests directory. Uncomment the deploy.perftest line in the repository Makefile.

Perf-test makefile

Perf-test makefile

Once the file is pushed to GitHub, AWS Amplify Console rebuilds the application, and deploys an AWS CloudFormation stack. You can run the load tests locally, or use an AWS Step Functions state machine to run the setup and Gatling load test simulation.

Performance test using Step Functions

Performance test using Step Functions

The Gatling simulation script uses constantUsersPerSec and rampUsersPerSec to add users for a number of test scenarios. You can use the test to simulate load on the application. Once the tests run, it generates a downloadable report.

Gatling performance results

Gatling performance results

Artillery Community Edition is another open-source tool for testing serverless APIs. You configure the number of requests per second and overall test duration, and it uses a headless Chromium browser to run its test flows. For Artillery, the maximum number of concurrent tests is constrained by your local computing resources and network. To achieve higher throughput, you can use Serverless Artillery, which runs the Artillery package on Lambda functions. As a result, this tool can scale up to a significantly higher number of tests.

For more information on how to use Artillery, see “Load testing a web application’s serverless backend”. This runs tests against APIs in a demo application. For example, one of the tests fetches 50,000 questions per hour. This calls an API Gateway endpoint and tests whether the AWS Lambda function, which queries an Amazon DynamoDB table, can handle the load.

Artillery performance test

Artillery performance test

This is a synchronous API so the performance directly impacts the user’s experience of the application. This test shows that the median response time is 165 ms with a p95 time of 201 ms.

Performance test API results

Performance test API results

Another consideration for API load testing is whether the authentication and authorization service can handle the load. For more information on load testing Amazon Cognito and API Gateway using Step Functions, see “Using serverless to load test Amazon API Gateway with authorization”.

API load testing with authentication and authorization

API load testing with authentication and authorization

Conclusion

Regulating inbound requests helps you adapt different scaling mechanisms based on customer demand. You can achieve better throughput for your workloads and make them more reliable by controlling requests to a rate that your workload can support.

In this post, I cover controlling inbound request rates using throttling. I show how to use throttling to control steady-rate and burst rate requests. I show some solutions for performance testing to identify the request rates that your workload can sustain before performance degradation.

This well-architected question will be continued where I look at using, analyzing, and enforcing API quotas. I cover mechanisms to protect non-scalable resources.

For more serverless learning resources, visit Serverless Land.

Using Amazon Macie to Validate S3 Bucket Data Classification

Post Syndicated from Bill Magee original https://aws.amazon.com/blogs/architecture/using-amazon-macie-to-validate-s3-bucket-data-classification/

Securing sensitive information is a high priority for organizations for many reasons. At the same time, organizations are looking for ways to empower development teams to stay agile and innovative. Centralized security teams strive to create systems that align to the needs of the development teams, rather than mandating how those teams must operate.

Security teams who create automation for the discovery of sensitive data have some issues to consider. If development teams are able to self-provision data storage, how does the security team protect that data? If teams have a business need to store sensitive data, they must consider how, where, and with what safeguards that data is stored.

Let’s look at how we can set up Amazon Macie to validate data classifications provided by decentralized software development teams. Macie is a fully managed service that uses machine learning (ML) to discover sensitive data in AWS. If you are not familiar with Macie, read New – Enhanced Amazon Macie Now Available with Substantially Reduced Pricing.

Data classification is part of the security pillar of a Well-Architected application. Following the guidelines provided in the AWS Well-Architected Framework, we can develop a resource-tagging scheme that fits our needs.

Overview of decentralized data validation system

In our example, we have multiple levels of data classification that represent different levels of risk associated with each classification. When a software development team creates a new Amazon Simple Storage Service (S3) bucket, they are responsible for labeling that bucket with a tag. This tag represents the classification of data stored in that bucket. The security team must maintain a system to validate that the data in those buckets meets the classification specified by the development teams.

This separation of roles and responsibilities for development and security teams who work independently requires a validation system that’s decoupled from S3 bucket creation. It should automatically detect new buckets or data in the existing buckets, and validate the data against the assigned classification tags. It should also notify the appropriate development teams of misclassified or unclassified buckets in a timely manner. These notifications can be through standard notification channels, such as email or Slack channel notifications.

Validation and alerts with AWS services

Figure 1. Validation system for Data Classification

Figure 1. Validation system for data classification

We assume that teams are permitted to create S3 buckets and we will use AWS Config to enforce the following required tags: DataClassification and SupportSNSTopic. The DataClassification tag indicates what type of data is allowed in the bucket. The SupportSNSTopic tag indicates an Amazon Simple Notification Service (SNS) topic. If there are issues found with the data in the bucket, a message is published to the topic, and Amazon SNS will deliver an alert. For example, if there is personally identifiable information (PII) data in a bucket that is classified as non-sensitive, the system will alert the owners of the bucket.

Macie is configured to scan all S3 buckets on a scheduled basis. This configuration ensures that any new bucket and data placed in the buckets is analyzed the next time the Macie job runs.

Macie provides several managed data identifiers for discovering and classifying the data. These include bank account numbers, credit card information, authentication credentials, PII, and more. You can also create custom identifiers (or rules) to gather information not covered by the managed identifiers.

Macie integrates with Amazon EventBridge to allow us to capture data classification events and route them to one or more destinations for reporting and alerting needs. In our configuration, the event initiates an AWS Lambda. The Lambda function is used to validate the data classification inferred by Macie against the classification specified in the DataClassification tag using custom business logic. If a data classification violation is found, the Lambda then sends a message to the Amazon SNS topic specified in the SupportSNSTopic tag.

The Lambda function also creates custom metrics and sends those to Amazon CloudWatch. The metrics are organized by engineering team and severity. This allows the security team to create a dashboard of metrics based on the Macie findings. The findings can also be filtered per engineering team and severity to determine which teams need to be contacted to ensure remediation.

Conclusion

This solution provides a centralized security team with the tools it needs. The team can validate the data classification of an Amazon S3 bucket that is self-provisioned by a development team. New Amazon S3 buckets are automatically included in the Macie jobs and alerts. These are only sent out if the data in the bucket does not conform to the classification specified by the development team. The data auditing process is loosely coupled with the Amazon S3 Bucket creation process, enabling self-service capabilities for development teams, while ensuring proper data classification. Your teams can stay agile and innovative, while maintaining a strong security posture.

Learn more about Amazon Macie and Data Classification.

Should I Run my Containers on AWS Fargate, AWS Lambda, or Both?

Post Syndicated from Rob Solomon original https://aws.amazon.com/blogs/architecture/should-i-run-my-containers-on-aws-fargate-aws-lambda-or-both/

Containers have transformed how companies build and operate software. Bundling both application code and dependencies into a single container image improves agility and reduces deployment failures. But what compute platform should you choose to be most efficient, and what factors should you consider in this decision?

With the release of container image support for AWS Lambda functions (December 2020), customers now have an additional option for building serverless applications using their existing container-oriented tooling and DevOps best practices. In addition, a single container image can be configured to run on both of these compute platforms: AWS Lambda (using serverless functions) or AWS Fargate (using containers).

Three key factors can influence the decision of what platform you use to deploy your container: startup time, task runtime, and cost. That decision may vary each time a task is initiated, as shown in the three scenarios following.

Design considerations for deploying a container

Total task duration consists of startup time and runtime. The startup time of a containerized task is the time required to provision the container compute resource and deploy the container. Task runtime is the time it takes for the application code to complete.

Startup time: Some tasks must complete quickly. For example, when a user waits for a web response, or when a series of tasks is completed in sequential order. In those situations, the total duration time must be minimal. While the application code may be optimized to run faster, startup time depends on the chosen compute platform as well. AWS Fargate container startup time typically takes from 60 to 90 seconds. AWS Lambda initial cold start can take up to 5 seconds. Following that first startup, the same containerized function has negligible startup time.

Task runtime: The amount of time it takes for a task to complete is influenced by the compute resources allocated (vCPU and memory) and application code. AWS Fargate lets you select vCPU and memory size. With AWS Lambda, you define the amount of allocated memory. Lambda then provisions a proportional quantity of vCPU. In both AWS Fargate and AWS Lambda uses, increasing the amount of compute resources may result in faster completion time. However, this will depend on the application. While the additional compute resources incur greater cost, the total duration may be shorter, so the overall cost may also be lower.

AWS Lambda has a maximum limit of 15 minutes of runtime. Lambda shouldn’t be used for these tasks to avoid the likelihood of timeout errors.

Figure 1 illustrates the proportion of startup time to total duration. The initial steepness of each line shows a rapid decrease in startup overhead. This is followed by a flattening out, showing a diminishing rate of efficiency. Startup time delay becomes less impactful as the total job duration increases. Other factors (such as cost) become more significant.

Figure 1. Ratio of startup time as a function to overall job duration for each service

Figure 1. Ratio of startup time as a function to overall job duration for each service

Cost: When making the choice between Fargate and Lambda, it is important to understand the different pricing models. This way, you can make the appropriate selection for your needs.

Figure 2 shows a cost analysis of Lambda vs Fargate. This is for the entire range of configurations for a runtime task. For most of the range of configurable memory, AWS Lambda is more expensive per second than even the most expensive configuration of Fargate.

Figure 2. Total cost for both AWS Lambda and AWS Fargate based on task duration

Figure 2. Total cost for both AWS Lambda and AWS Fargate based on task duration

From a cost perspective, AWS Fargate is more cost-effective for tasks running for several seconds or longer. If cost is the only factor at play, then Fargate would be the better choice. But the savings gained by using Fargate may be offset by the business value gained from the shorter Lambda function startup time.

Dynamically choose your compute platform

In the following scenarios, we show how a single container image can serve multiple use cases. The decision to run a given containerized application on either AWS Lambda or AWS Fargate can be determined at runtime. This decision depends on whether cost, speed, or duration are the priority.

In Figure 3, an image-processing AWS Batch job runs on a nightly schedule, processing tens of thousands of images to extract location information. When run as a batch job, image processing may take 1–2 hours. The job pulls images stored in Amazon Simple Storage Service (S3) and writes the location metadata to Amazon DynamoDB. In this case, AWS Fargate provides a good combination of compute and cost efficiency. An added benefit is that it also supports tasks that exceed 15 minutes. If a single image is submitted for real-time processing, response time is critical. In that case, the same image-processing code can be run on AWS Lambda, using the same container image. Rather than waiting for the next batch process to run, the image is processed immediately.

Figure 3. One-off invocation of a typically long-running batch job

Figure 3. One-off invocation of a typically long-running batch job

In Figure 4, a SaaS application uses an AWS Lambda function to allow customers to submit complex text search queries for files stored in an Amazon Elastic File System (EFS) volume. The task should return results quickly, which is an ideal condition for AWS Lambda. However, a small percentage of jobs run much longer than the average, exceeding the maximum duration of 15 minutes.

A straightforward approach to avoid job failure is to initiate an Amazon CloudWatch alarm when the Lambda function times out. CloudWatch alarms can automatically retry the job using Fargate. An alternate approach is to capture historical data and use it to create a machine learning model in Amazon SageMaker. When a new job is initiated, the SageMaker model can predict the time it will take the job to complete. Lambda can use that prediction to route the job to either AWS Lambda or AWS Fargate.

Figure 4. Short duration tasks with occasional outliers running longer than 15 minutes

Figure 4. Short duration tasks with occasional outliers running longer than 15 minutes

In Figure 5, a customer runs a containerized legacy application that encompasses many different kinds of functions, all related to a recurring data processing workflow. Each function performs a task of varying complexity and duration. These can range from processing data files, updating a database, or submitting machine learning jobs.

Using a container image, one code base can be configured to contain all of the individual functions. Longer running functions, such as data preparation and big data analytics, are routed to Fargate. Shorter duration functions like simple queries can be configured to run using the container image in AWS Lambda. By using AWS Step Functions as an orchestrator, the process can be automated. In this way, a monolithic application can be broken up into a set of “Units of Work” that operate independently.

Figure 5. Heterogeneous function orchestration

Figure 5. Heterogeneous function orchestration

Conclusion

If your job lasts milliseconds and requires a fast response to provide a good customer experience, use AWS Lambda. If your function is not time-sensitive and runs on the scale of minutes, use AWS Fargate. For tasks that have a total duration of under 15 minutes, customers must decide based on impacts to both business and cost. Select the service that is the most effective serverless compute environment to meet your requirements. The choice can be made manually when a job is scheduled or by using retry logic to switch to the other compute platform if the first option fails. The decision can also be based on a machine learning model trained on historical data.

ICYMI: Serverless Q2 2021

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q2-2021/

Welcome to the 14th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

Q2 calendar

In case you missed our last ICYMI, check out what happened last quarter here.

AWS Step Functions

Step Functions launched Workflow Studio, a new visual tool that provides a drag-and-drop user interface to build Step Functions workflows. This exposes all the capabilities of Step Functions that are available in Amazon States Language (ASL). This makes it easier to build and change workflows and build definitions in near-real time.

For more:

Workflow Studio

The new data flow simulator in the Step Functions console helps you evaluate the inputs and outputs passed through your state machine. It allows you to simulate each of the fields used to process data and updates in real time. It can help accelerate development with workflows and help visualize JSONPath processing.

For more:

Data flow simulator

Also, Amazon API Gateway can now invoke synchronous Express Workflows using REST APIs.

Amazon EventBridge

EventBridge now supports cross-Region event routing from any commercial AWS Region to a list of supported Regions. This feature allows you to centralize global events for auditing and monitoring or replicate events across Regions.

EventBridge cross-Region routing

The service now also supports bus-to-bus event routing in the same Region and in the same AWS account. This can be useful for centralizing events related to a single project, application, or team within your organization.

EventBridge bus-to-bus

You can now use EventBridge as a resource within Step Functions workflows. This provides a direct service integration for both standard and Express Workflows. You can publish events directly to a specified event bus using either a request-response or wait-for-callback pattern.

EventBridge added a new target for rules – Amazon SageMaker Pipelines. This allows you to use a rule to trigger a continuous integration and continuous deployment (CI/CD) service for your machine learning workloads.

AWS Lambda

Lambda Extensions

AWS Lambda extensions are now generally available including some performance and functionality improvements. Lambda extensions provide a new way to integrate your chosen monitoring, observability, security, and governance tools with AWS Lambda. These use the Lambda Runtime Extensions API to integrate with the execution environment and provide hooks into the Lambda lifecycle.

To help build your own extensions, there is an updated GitHub repository with example code.

To learn more:

  • Watch a Tech Talk with Julian Wood.
  • Watch the 8-episode Learning Path series covering all aspects of extensions.

Extensions available today

Amazon CloudWatch Lambda Insights support for Lambda container images is now generally available.

Amazon SNS

Amazon SNS has expanded the set of filter operators available to include IP address matching, existence of an attribute key, and “anything-but” matching.

The service has also introduced an SMS sandbox to help developers testing workloads that send text messages.

To learn more:

Amazon DynamoDB

DynamoDB announced CloudFormation support for several features. First, it now supports configuring Kinesis Data Streams using CloudFormation. This allows you to use infrastructure as code to set up Kinesis Data Streams instead of DynamoDB streams.

The service also announced that NoSQL Workbench now supports CloudFormation, so you can build data models and configure table capacity settings directly from the tool. Finally, you can now create and manage global tables with CloudFormation.

Learn how to use the recently launched Serverless Patterns Collection to configure DynamoDB as an event source for Lambda.

AWS Amplify

Amplify Hosting announced support for server-side rendered (SSR) apps built with the Next.js framework. This provides a zero configuration option for developers to deploy and host their Next.js-based applications.

The Amplify GLI now allows developers to make multiple DynamoDB GSI updates in a single deployment. This can help accelerate data model iterations. Additionally, the data management experience in the Amplify Admin UI launched at AWS re:Invent 2020 is now generally available.

AWS Serverless Application Model (AWS SAM)

AWS SAM has a public preview of support for local development and testing of AWS Cloud Development Kit (AWS CDK) projects.

To learn more:

Serverless blog posts

Operating Lambda

The “Operating Lambda” blog series includes the following posts in this quarter:

Streaming data

The “Building serverless applications with streaming data” blog series shows how to use Lambda with Kinesis.

Getting started with serverless for developers

Learn how to build serverless applications from your local integrated development environment (IDE).

April

May

June

Tech Talks & Events

We hold AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page. We also regularly deliver talks at conferences and events around the world, speak on podcasts, and record videos you can find to learn in bite-sized chunks.

Here are some from Q2:

Serverless Live was a day of talks held on May 19, featuring the serverless developer advocacy team, along with Adrian Cockroft and Jeff Barr. You can watch a replay of all the talks on the AWS Twitch channel.

Videos

YouTube ServerlessLand channel

Serverless Office Hours – Tues 10 AM PT / 1PM EST

Weekly live virtual office hours. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

YouTube: youtube.com/serverlessland
Twitch: twitch.tv/aws

April

May

June

DynamoDB Office Hours

Are you an Amazon DynamoDB customer with a technical question you need answered? If so, join us for weekly Office Hours on the AWS Twitch channel led by Rick Houlihan, AWS principal technologist and Amazon DynamoDB expert. See upcoming and previous shows

Learning Path – AWS Lambda Extensions: The deep dive

Are you looking for a way to more easily integrate AWS Lambda with your favorite monitoring, observability, security, governance, and other tools? Welcome to AWS Lambda extensions: The deep dive, a learning path video series that shows you everything about augmenting Lambda functions using Lambda extensions.

There are also other helpful videos covering serverless available on the Serverless Land YouTube channel.

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Building well-architected serverless applications: Managing application security boundaries – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-managing-application-security-boundaries-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Security question SEC2: How do you manage your serverless application’s security boundaries?

Defining and securing your serverless application’s boundaries ensures isolation for, within, and between components.

Required practice: Evaluate and define resource policies

Resource policies are AWS Identity and Access Management (IAM) statements. They are attached to resources such as an Amazon S3 bucket, or an Amazon API Gateway REST API resource or method. The policies define what identities have fine-grained access to the resource. To see which services support resource-based policies, see “AWS Services That Work with IAM”. For more information on how resource policies and identity policies are evaluated, see “Identity-Based Policies and Resource-Based Policies”.

Understand and determine which resource policies are necessary

Resource policies can protect a component by restricting inbound access to managed services. Use resource policies to restrict access to your component based on a number of identities, such as the source IP address/range, function event source, version, alias, or queues. Resource policies are evaluated and enforced at IAM level before each AWS service applies it’s own authorization mechanisms, when available. For example, IAM resource policies for API Gateway REST APIs can deny access to an API before an AWS Lambda authorizer is called.

If you use multiple AWS accounts, you can use AWS Organizations to manage and govern individual member accounts centrally. Certain resource policies can be applied at the organizations level, providing guardrail for what actions AWS accounts within the organization root or OU can do. For more information see, “Understanding how AWS Organization Service Control Policies work”.

Review your existing policies and how they’re configured, paying close attention to how permissive individual policies are. Your resource policies should only permit necessary callers.

Implement resource policies to prevent unauthorized access

For Lambda, use resource-based policies to provide fine-grained access to what AWS IAM identities and event sources can invoke a specific version or alias of your function. Resource-based policies can also be used to control access to Lambda layers. You can combine resource policies with Lambda event sources. For example, if API Gateway invokes Lambda, you can restrict the policy to the API Gateway ID, HTTP method, and path of the request.

In the serverless airline example used in this series, the IngestLoyalty service uses a Lambda function that subscribes to an Amazon Simple Notification Service (Amazon SNS) topic. The Lambda function resource policy allows SNS to invoke the Lambda function.

Lambda resource policy document

Lambda resource policy document

API Gateway resource-based policies can restrict API access to specific Amazon Virtual Private Cloud (VPC), VPC endpoint, source IP address/range, AWS account, or AWS IAM users.

Amazon Simple Queue Service (SQS) resource-based policies provide fine-grained access to certain AWS services and AWS IAM identities (users, roles, accounts). Amazon SNS resource-based policies restrict authenticated and non-authenticated actions to topics.

Amazon DynamoDB resource-based policies provide fine-grained access to tables and indexes. Amazon EventBridge resource-based policies restrict AWS identities to send and receive events including to specific event buses.

For Amazon S3, use bucket policies to grant permission to your Amazon S3 resources.

The AWS re:Invent session Best practices for growing a serverless application includes further suggestions on enforcing security best practices.

Best practices for growing a serverless application

Best practices for growing a serverless application

Good practice: Control network traffic at all layers

Apply controls for controlling both inbound and outbound traffic, including data loss prevention. Define requirements that help you protect your networks and protect against exfiltration.

Use networking controls to enforce access patterns

API Gateway and AWS AppSync have support for AWS Web Application Firewall (AWS WAF) which helps protect web applications and APIs from attacks. AWS WAF enables you to configure a set of rules called a web access control list (web ACL). These allow you to block, or count web requests based on customizable web security rules and conditions that you define. These can include specified IP address ranges, CIDR blocks, specific countries, or Regions. You can also block requests that contain malicious SQL code, or requests that contain malicious script. For more information, see How AWS WAF Works.

private API endpoint is an API Gateway interface VPC endpoint that can only be accessed from your Amazon Virtual Private Cloud (Amazon VPC). This is an elastic network interface that you create in a VPC. Traffic to your private API uses secure connections and does not leave the Amazon network, it is isolated from the public internet. For more information, see “Creating a private API in Amazon API Gateway”.

To restrict access to your private API to specific VPCs and VPC endpoints, you must add conditions to your API’s resource policy. For example policies, see the documentation.

By default, Lambda runs your functions in a secure Lambda-owned VPC that is not connected to your account’s default VPC. Functions can access anything available on the public internet. This includes other AWS services, HTTPS endpoints for APIs, or services and endpoints outside AWS. The function cannot directly connect to your private resources inside of your VPC.

You can configure a Lambda function to connect to private subnets in a VPC in your account. When a Lambda function is configured to use a VPC, the Lambda function still runs inside the Lambda service VPC. The function then sends all network traffic through your VPC and abides by your VPC’s network controls. Functions deployed to virtual private networks must consider network access to restrict resource access.

AWS Lambda service VPC with VPC-to-VPT NAT to customer VPC

AWS Lambda service VPC with VPC-to-VPT NAT to customer VPC

When you connect a function to a VPC in your account, the function cannot access the internet, unless the VPC provides access. To give your function access to the internet, route outbound traffic to a NAT gateway in a public subnet. The NAT gateway has a public IP address and can connect to the internet through the VPC’s internet gateway. For more information, see “How do I give internet access to my Lambda function in a VPC?”. Connecting a function to a public subnet doesn’t give it internet access or a public IP address.

You can control the VPC settings for your Lambda functions using AWS IAM condition keys. For example, you can require that all functions in your organization are connected to a VPC. You can also specify the subnets and security groups that the function’s users can and can’t use.

Unsolicited inbound traffic to a Lambda function isn’t permitted by default. There is no direct network access to the execution environment where your functions run. When connected to a VPC, function outbound traffic comes from your own network address space.

You can use security groups, which act as a virtual firewall to control outbound traffic for functions connected to a VPC. Use security groups to permit your Lambda function to communicate with other AWS resources. For example, a security group can allow the function to connect to an Amazon ElastiCache cluster.

To filter or block access to certain locations, use VPC routing tables to configure routing to different networking appliances. Use network ACLs to block access to CIDR IP ranges or ports, if necessary. For more information about the differences between security groups and network ACLs, see “Compare security groups and network ACLs.”

In addition to API Gateway private endpoints, several AWS services offer VPC endpoints, including Lambda. You can use VPC endpoints to connect to AWS services from within a VPC without an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.

Using tools to audit your traffic

When you configure a Lambda function to use a VPC, or use private API endpoints, you can use VPC Flow Logs to audit your traffic. VPC Flow Logs allow you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data can be published to Amazon CloudWatch Logs or S3 to see where traffic is being sent to at a granular level. Here are some flow log record examples. For more information, see “Learn from your VPC Flow Logs”.

Block network access when required

In addition to security groups and network ACLs, third-party tools allow you to disable outgoing VPC internet traffic. These can also be configured to allow traffic to AWS services or allow-listed services.

Conclusion

Managing your serverless application’s security boundaries ensures isolation for, within, and between components. In this post, I cover how to evaluate and define resource policies, showing what policies are available for various serverless services. I show some of the features of AWS WAF to protect APIs. Then I review how to control network traffic at all layers. I explain how Lambda functions connect to VPCs, and how to use private APIs and VPC endpoints. I walk through how to audit your traffic.

This well-architected question will be continued where I look at using temporary credentials between resources and components. I cover why smaller, single purpose functions are better from a security perspective, and how to audit permissions. I show how to use AWS Serverless Application Model (AWS SAM) to create per-function IAM roles.

For more serverless learning resources, visit https://serverlessland.com.

Create a serverless feedback collector application using Amazon Pinpoint’s two-way SMS functionality

Post Syndicated from Murat Balkan original https://aws.amazon.com/blogs/messaging-and-targeting/create-a-serverless-feedback-collector-application-by-using-amazon-pinpoints-two-way-sms-functionality/

Introduction

Two-way SMS communication is used by many companies to create interactive engagements with their customers. Traditional SMS notifications are one-way. While this is valid for many different use cases like one-time passwords (OTP) notifications and security notifications or reminders, some other use-cases may benefit from collecting information from the same channel. Two-way SMS allows customers to create this feedback mechanism and enhance business interactions and overall customer experience.

SMS is chosen for its simplicity and availability across different sets of devices. By combining the two-way SMS mechanism with the vast breadth of services Amazon Web Services (AWS) offers, companies can create effective architectures to better interact and serve their customers.

This blog post shows you how a serverless online appointment application can use Amazon Pinpoint’s two-way SMS functionality to collect customer feedback for completed appointments. You will learn how Amazon Pinpoint interacts with other AWS serverless services with its out-of-the-box integrations to create a scalable messaging application.

Architecture

By completing the steps in this post, you can create a system that uses the architecture illustrated in the following image:

The architecture of a feedback collector application that is composed of serverless AWS services

The flow of events starts when a Amazon DynamoDB table item, representing an online appointment, changes its status to COMPLETED. An AWS Lambda function which is subscribed to these changes over DynamoDB Streams detects this change and sends an SMS to the customer by using Amazon Pinpoint API’s sendMessages operation.

Amazon Pinpoint delivers the SMS to the recipient and generates a unique message ID to the AWS Lambda function. The Lambda function then adds this message ID to a DynamoDB table called “message-lookup”. This table is used for tracking different feedback requests sent during a multi-step conversation and associate them with the appointment ids. At this stage, the Lambda function also populates another table “feedbacks” which will hold the feedback responses that will be sent as SMS reply messages.

Each time a recipient replies to an SMS, Amazon Pinpoint publishes this reply event to an Amazon SNS topic which is subscribed by an Amazon SQS queue. Amazon Pinpoint will also add a messageId to this event which allows you to bind it to a sendMessages operation call.

A second AWS Lambda function polls these reply events from the Amazon SQS queue. It checks whether the reply is in the correct format (i.e. a number) and also associated with a previous request. If all conditions are met, the AWS Lambda function checks the ConversationStage attribute’s value from its message-lookup table. According to the current stage and the SMS answer received, AWS Lambda function will determine the next step.

For example, if the feedback score received is less than 5, a follow-up SMS is sent to the user asking if they’ll be happy to receive a call from the customer support team.

All SMS replies from the users are reflected to “feedbacks” table for further analysis.

Deploying the Sample Application

  1. Clone this GitHub repository to your local machine and install and configure AWS SAM with a test AWS IAM user.

You will use AWS SAM to deploy the remaining parts of this serverless architecture.

The AWS SAM template creates the following resources:

    • An Amazon DynamoDB table (appointments) that contains information about appointments, customers and their appointment status.
    • An Amazon DynamoDB table (feedbacks) that holds the received feedbacks from customers.
    • An Amazon DynamoDB table (message-lookup) that holds the Amazon Pinpoint message ids and associate them to appointments to track a multi-step conversation.
    • Two AWS Lambda functions (FeedbackSender and FeedbackReceiver)
    • An Amazon SNS topic that collects state change events from Amazon Pinpoint.
    • An Amazon SQS queue that queues the incoming messages.
    • An Amazon Pinpoint Application with an associated SMS channel.

This architecture consists of two Lambda functions, which are represented as two different apps in the AWS SAM template. These functions are named FeedbackSender and FeedbackReceiver. The FeedbackSender function listens the Amazon DynamoDB Stream associated with the appointments table and sends the SMS message requesting a feedback. Second Lambda function, FeedbackReceiver, polls the Amazon SQS queue and updates the feedbacks table in Amazon DynamoDB. (pinpoint-two-way-sms)

          Note: You’ll incur some costs by deploying this stack into your account.

  1. To start the SAM deployment, navigate to the root directory of the repository you downloaded and where the template.yaml AWS SAM template resides. AWS SAM also requires you to specify an Amazon Simple Storage Service (Amazon S3) bucket to hold the deployment artifacts. If you haven’t already created a bucket for this purpose, create one now. The bucket should have read and write access by an AWS Identity and Access Management (IAM) user.

At the command line, enter the following command to package the application:

sam package --template template.yaml --output-template-file output_template.yaml --s3-bucket BUCKET_NAME_HERE

In the preceding command, replace BUCKET_NAME_HERE with the name of the Amazon S3 bucket that should hold the deployment artifacts.

AWS SAM packages the application and copies it into this Amazon S3 bucket.

When the AWS SAM package command finishes running, enter the following command to deploy the package:

sam deploy --template-file output_template.yaml --stack-name BlogStackPinpoint --capabilities CAPABILITY_IAM

When you run this command, AWS SAM shows the progress of the deployment. When the deployment finishes, navigate to the Amazon Pinpoint console and choose the project named “BlogApplication”. This example uses “BlogStackPinpoint” as the stack name, you can change this to any other name you want.

  1. From the left navigation, choose Settings, SMS and voice. On the SMS and voice settings page, choose the Request phone number button under Number settings

Screenshot of request phone number screen

  1. Choose a target country. Set the Default message type as Transactional, and click on the Request long codes button to buy a long code.

Note: In United States, you can also request a Toll Free Number(TFN)

Screenshot showing long code additio

A long code will be added to the Number settings list.

  1. Choose the newly added number to reach the SMS Settings page and enable the option Enable two-way-SMS. At the Incoming messages destination, select Choose an existing SNS topic, and from the drop down select the Amazon SNS topic that was created by the BlogStackPinpoint stack.

Choose Save to save your SMS settings.

 

Testing the Sample Application

Now that the application is deployed and configured, test it by creating sample records in the Amazon DynamoDB table. Navigate to Amazon DynamoDB console and reach the tables view. Inspect the tables that were created by the AWS SAM application.

Here, appointments table is the table where the appointments and their statuses are kept. It tracks the appointment lifecycle events with items identified by unique ids. In this sample scenario, we are assuming that an appointment application creates a record with ‘CREATED’ status when a new appointment is planned. After the appointment is finished, same application updates the status to ‘COMPLETED’ which will trigger the feedback collection process. Feedback results are collected in the feedbacks table. Amazon Pinpoint message id’s, conversation stage and appointment id’s are kept in the message-lookup table.

  1. To start testing the end-to-end flow, choose the appointments table to open table overview page.
  2. Next, select the Items tab and choose the Create item From the dropdown, select Text. Add the following and choose Save to create your first appointment object. While adding the following object, replace CustomerPhone attribute’s value with a phone number you own. The feedback request messages will be delivered to that number. Note: This number should match the country number for the long code you provisioned.

{

"CustomerName": "Customer A",

"CustomerPhone": "+12345678900",

"AppointmentStatus":"CREATED",

"id": "1"

}

  1. To trigger sending the feedback SMS, you need to set an existing item’s status to “COMPLETED” To do this, select the item and click Edit from the Actions menu.

Replace the item’s current JSON with the following.

{

"AppointmentStatus": "COMPLETED",

"CustomerName": "Customer A",

"CustomerPhone": "+12345678900",

"id": "1"

}

  1. Before choosing the Save button, double check that you have set CustomerPhone attribute’s value to a valid phone number.

After the change, you should receive an SMS message asking for a feedback. Provide a numeric reply of that is less than five to this message. This will trigger a follow up question asking for a consent to receive an in-person callback.

 

During your SMS conversation with the application, inspect the feedbacks table. The feedback you have given over this two-way SMS channel should have been reflected into the table.

If you want to repeat the process, make sure to increment the AppointmentId field for any additional appointment records.

Cleanup

To clean up the resources you used in your account, simply navigate to AWS Cloudformation console and delete the stack named “BlogStackPinpoint”.

After the stack is deleted, you also need to delete the Long code from the Pinpoint Console by choosing the number and pressing Remove phone number button. You can also delete the Amazon S3 bucket you used for packaging and deploying the AWS SAM application.

Conclusion

This architecture shows how Amazon Pinpoint can be used to make two-way SMS communication with your customers. You can implement Two-way SMS functionality in other use cases such as appointment reminders, polls, Q&A services, and more.

To learn more about Pinpoint and it’s two-way SMS mechanism, you can visit the Pinpoint documentation.

 

Building Multi-partner integration on AWS using Event-Driven Architecture

Post Syndicated from Vivek Kant original https://aws.amazon.com/blogs/architecture/building-multi-partner-integration-on-aws-using-event-driven-architecture/

Summary

Finserv MARKETS enables customers to buy financial services products such as credit cards, loans, insurance, and investments from various partners. Finserv integrates with a large number of partners in real time to provide services to customers.

Each partner has their own semantic APIs which can pose a challenge. There are also issues of latency and failures. We’ll show how our solution based on Event Driven Architecture (EDA) on AWS with Reactive design offers a solution to address these challenges.

Challenges with multi-partner integration

  • Latency – Making API calls to multiple partners increases latency of a service even in the scenario in which the calls are made in parallel.
  • Timeouts – If a partner’s APIs time out, this could impact overall service performance and availability.
  • Failures – A partner API failure could lead to the failure or major performance degradation of the overall service.
  • Customer Experience – Preserving the customer experience when depending on partner integration is a challenge, considering these technical issues.

Conceptual solution

In order to build this multi-partner platform, we discussed using traditional command-driven synchronous architecture. We also considered adopting an Event-Driven Architecture (EDA). Building our solution on EDA enabled us to deliver consistent customer experience while addressing failures and performance issues from partner APIs.

The diagram following shows a conceptual view of the solution:

Event Driven Conceptual view

Figure 1 – Conceptual View of Our Solution

The key components of the solution are:

1. User Interface: It is the single page application running on the user’s browser. For example, the UI makes an API call to business services to calculate insurance premiums with required parameters and receives a unique identifier. This enables the UI to be reactive and display the responses from the partner as they arrive.

2. Business Service: A microservice which provides APIs for:

• The user interface to submit and generate events for request for partner offer

• A callback to submit the response from the partner integration service in a reactive manner

3. Event Bus: The event bus infrastructure enables transportation, routing, and delivery of events to the right services. The business service raises a set of events to the event bus for a quote request. There is one event for each partner that is listened to by the reactive services.

4. Reactive Services: Services that consume events and call partner integration services for the calling partner API. On receiving the partner response, it calls the callback API on the business service. These services are organized by product domain (for example Motor Insurance).

5. Partner Integration Service: This service is responsible for integration with partners. The service translates the canonical request to partner-specific API calls. The service implements partner-specific security and error handling. There is one partner integration service per partner.

Realizing this solution on AWS

Amazon Web Services (AWS) offers cloud-native services enabling us to realize this solution.

Event Driven Architecture

Figure 2 – Realizing our solution on AWS

We used the following services to implement this architecture:

User Interface
We built the load balancing interface in Angular and host it on Apache Web Servers running Amazon Elastic Container Service (Amazon ECS). Elastic Load Balancing (ELB) distributes traffic evenly across the containers. Running a container gives us the required flexibility and scalability needed.

Business Service
The business service is a microservice built using Spring Boot and running on Amazon ECS containers. It uses an ELB used for service load balancing. The choice of Spring Boot and Amazon ECS gives flexibility and scalability through cluster auto scaling.

Data Store
We use polyglot architecture for data storage. For different product journeys and depending on data lifecycle, we choose either Amazon Aurora Postgres or Amazon ElastiCache for Redis. This gives us the right mix of performance and required durability for each business use case.

Event Bus
We evaluated Amazon Kinesis and Amazon Simple Notification Service (Amazon SNS) and came to the conclusion that for our volume and use cases, Amazon SNS offers the right capability. We implemented this by defining topics, for example, a four-wheeler insurance quote. This topic is subscribed by reactive services for each partner integration, where the partner service is called and a quote is generated. For downstream functionality where the API is to be called of the selected partner (for example, insurance policy issuance or credit card application submission), we chose Amazon Simple Queue Service (SQS). Amazon SQS provides simple queuing for asynchronous processing.

Reactive Services
These services are built using two different technologies. Spring Boot microservice running in an Amazon ECS container and AWS Lambda functions. Spring Boot is chosen due to our team’s familiarity of this technology; however, our plan is to move completely towards usage of Lambda functions for all asynchronous reactive services.

Partner Integration Service
Partner integration services provides the abstraction layer between partner API and canonical API, and is called by reactive services synchronously. In some cases, the error is passed back to reactive service to decide on retry. In other cases, for example, a policy issuance API retry is built into this service using exponential backoff strategy.

Partner API tracking
Partner API tracking gives us the right way of tracking the partner request and proactively address failures. We use Amazon Elasticsearch Service and Kibana for tracking. We can implement a circuit breaker pattern to shut down any partner for a period of time should failures reach a given threshold.

How this solution addresses multi-partner integration

This solution is built on the foundation of Event Driven Architecture with reactive design and is able to address the following challenges:

Latency
Since the user interface asynchronously polls for the response, it receives the partner response as soon as it arrives. This makes the response latency on the fastest partner API rather than slowest.

Timeout
Timeouts are set in the UI for polling, so if a partner API doesn’t respond, we time it out without any degradation. These timeouts are set based on the established user experience benchmarks. For example, how long do we let a user wait to see all insurance quotes before degrading their experience?

Failures
In case of an API failure, the UI will time out. Our API tracking can also enable a circuit breaker pattern to take a partner offline in the event of persistent failures.

Customer Experience
The solution gives a consistent experience to the user. Users can make their choice from the partner quotes/offers in a reactive way as received, rather than waiting for all offers to be shown. This design meets our customer requirements, derived from our Voice of Customer (VoC) study sessions.

Conclusion

In the digital space, APIs are the most common mechanism for system integration. Building a solution that is scalable, resilient, and provides the best customer experience is challenging. Event Driven Architecture with Reactive design offers a solution to address these issues. We process over 5000+ requests every day in Insurance and Credit domains. We’ve been able to achieve required availability of over 99.9%, while maintaining a positive customer experience on this platform.

Introducing message archiving and analytics for Amazon SNS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-message-archiving-and-analytics-for-amazon-sns/

This blog post is courtesy of Sebastian Caceres (AWS Consultant, DevOps), Otavio Ferreira (Sr. Manager, Amazon SNS), Prachi Sharma and Mary Gao (Software Engineers, Amazon SNS).

Today, we are announcing the release of a message delivery protocol for Amazon SNS based on Amazon Kinesis Data Firehose. This is a new way to integrate SNS with storage and analytics services, without writing custom code.

SNS provides topics for push-based, many-to-many pub/sub messaging to help you decouple distributed systems, microservices, and event-driven serverless applications. As applications grow, so does the need to archive messages to meet compliance goals. These archives can also provide important operational and business insights.

Previously, custom code was required to create data pipelines, using general-purpose SNS subscription endpoints, such as Amazon SQS queues or AWS Lambda functions. You had to manage data transformation, data buffering, data compression, and the upload to data stores.

Overview

With the new native integration between SNS and Kinesis Data Firehose, you can send messages to storage and analytics services, using a purpose-built SNS subscription type.

Once you configure a subscription, messages published to the SNS topic are sent to the subscribed Kinesis Data Firehose delivery stream. The messages are then delivered to the destination endpoint configured in the delivery stream, which can be an Amazon S3 bucket, an Amazon Redshift table, or an Amazon Elasticsearch Service index.

You can also use a third-party service provider as the destination of a delivery stream, including Datadog, New Relic, MongoDB, and Splunk. No custom code is required to bridge the services. For more information, see Fanout to Kinesis Data Firehose streams, in the SNS Developer Guide.

Amazon SNS subscriber types with Amazon Kinesis Data Firehose.

The new Kinesis Data Firehose subscription type and its destinations are part of the application-to-application (A2A) messaging offering of SNS. The addition of this subscription type expands the SNS A2A offering to include the following use cases:

  • Run analytics on SNS messages, using Amazon Kinesis Data Analytics, Amazon Elasticsearch Service, or Amazon Redshift as a delivery stream destination. You can use this option to gain insights and detect anomalies in workloads.
  • Index and search SNS messages, using Amazon Elasticsearch Service as a delivery stream destination. From there, you can create dashboards using Kibana, a data visualization and exploration tool.
  • Store SNS messages for backup and auditing purposes, using S3 as a destination of choice. You can then use Amazon Athena to query the S3 bucket for analytics purposes.
  • Apply transformation to SNS messages. For example, you may obfuscate personally identifiable information (PII) or protected health information (PHI) using a Lambda function invoked by the delivery stream.
  • Feed SNS messages into cloud-based application monitoring and observability tools, using Datadog, New Relic, or Splunk as a destination. You can choose this option to enrich DevOps or marketing workflows.

As with all supported message delivery protocols, you can filter, monitor, and encrypt messages.

To simplify architecture and further avoid custom code, you can use an SNS subscription filter policy. This enables you to route only the relevant subset of SNS messages to the Kinesis Data Firehose delivery stream. For more information, see SNS message filtering.

To monitor the throughput, you can check the NumberOfMessagesPublished and the NumberOfNotificationsDelivered metrics for SNS, and the IncomingBytes, IncomingRecords, DeliveryToS3.Records and DeliveryToS3.Success metrics for Kinesis Data Firehose. For additional information, see Monitoring SNS topics using CloudWatch and Monitoring Kinesis Data Firehose using CloudWatch.

For security purposes, you can choose to have data encrypted at rest, using server-side encryption (SSE), in addition to encrypted in transit using HTTPS. For more information, see SNS SSE, Kinesis Data Firehose SSE, and S3 SSE.

Applying SNS message archiving and analytics in a use case

For example, consider an airline ticketing platform that operates in a regulated environment. The compliance framework requires that the company archives all ticket sales for at least 5 years.

Example architecture of a flight ticket selling platform.

The platform is based on an event-driven serverless architecture. It has a ticket seller Lambda function that publishes an event to an SNS topic for every ticket sold. The SNS topic fans out the event to subscribed systems that are interested in processing this type of event. In the preceding diagram, two systems are interested: one focused on payment processing, and another on fraud control. Each subscribed system is invoked by an SQS queue and an event processing Lambda function.

To meet the compliance goal on data retention, the airline company subscribes a Kinesis Data Firehose delivery stream to their existing SNS topic. They use an S3 bucket as the stream destination. After this, all events published to the SNS topic are archived in the S3 bucket.

The company can then use Athena to query the S3 bucket with standard SQL to run analytics and gain insights on ticket sales. For example, they can query for the most popular flight destinations or the most frequent flyers.

Subscribing a Kinesis Data Firehose stream to an SNS topic

You can set up a Kinesis Data Firehose subscription to an SNS topic using the AWS Management Console, the AWS CLI, or the AWS SDKs. You can also use AWS CloudFormation to automate the provisioning of these resources.

We use CloudFormation for this example. The provided CloudFormation template creates the following resources:

  • An SNS topic
  • An S3 bucket
  • A Kinesis Data Firehose delivery stream
  • A Kinesis Data Firehose subscription in SNS
  • Two SQS subscriptions in SNS
  • Two IAM roles with access to deliver messages:
    • From SNS to Kinesis Data Firehose
    • From Kinesis Data Firehose to S3

To provision the infrastructure, use the following template:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Template for creating an SNS archiving use case
Resources:
  ticketUploadStream:
    DependsOn:
    - ticketUploadStreamRolePolicy
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      S3DestinationConfiguration:
        BucketARN: !Sub 'arn:${AWS::Partition}:s3:::${ticketArchiveBucket}'
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 1
        CompressionFormat: UNCOMPRESSED
        RoleARN: !GetAtt ticketUploadStreamRole.Arn
  ticketArchiveBucket:
    Type: AWS::S3::Bucket
  ticketTopic:
    Type: AWS::SNS::Topic
  ticketPaymentQueue:
    Type: AWS::SQS::Queue
  ticketFraudQueue:
    Type: AWS::SQS::Queue
  ticketQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Statement:
          Effect: Allow
          Principal:
            Service: sns.amazonaws.com
          Action:
            - sqs:SendMessage
          Resource: '*'
          Condition:
            ArnEquals:
              aws:SourceArn: !Ref ticketTopic
      Queues:
        - !Ref ticketPaymentQueue
        - !Ref ticketFraudQueue
  ticketUploadStreamSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketUploadStream.Arn
      Protocol: firehose
      SubscriptionRoleArn: !GetAtt ticketUploadStreamSubscriptionRole.Arn
  ticketPaymentQueueSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketPaymentQueue.Arn
      Protocol: sqs
  ticketFraudQueueSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketFraudQueue.Arn
      Protocol: sqs
  ticketUploadStreamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service: firehose.amazonaws.com
          Action: sts:AssumeRole
  ticketUploadStreamRolePolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: FirehoseticketUploadStreamRolePolicy
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Action:
          - s3:AbortMultipartUpload
          - s3:GetBucketLocation
          - s3:GetObject
          - s3:ListBucket
          - s3:ListBucketMultipartUploads
          - s3:PutObject
          Resource:
          - !Sub 'arn:aws:s3:::${ticketArchiveBucket}'
          - !Sub 'arn:aws:s3:::${ticketArchiveBucket}/*'
      Roles:
      - !Ref ticketUploadStreamRole
  ticketUploadStreamSubscriptionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - sns.amazonaws.com
          Action:
          - sts:AssumeRole
      Policies:
      - PolicyName: SNSKinesisFirehoseAccessPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Action:
            - firehose:DescribeDeliveryStream
            - firehose:ListDeliveryStreams
            - firehose:ListTagsForDeliveryStream
            - firehose:PutRecord
            - firehose:PutRecordBatch
            Effect: Allow
            Resource:
            - !GetAtt ticketUploadStream.Arn

To test, publish a message to the SNS topic. After the delivery stream buffer interval of 60 seconds, the message appears in the destination S3 bucket. For information on message formats, see Amazon SNS message formats in Amazon Kinesis Data Firehose destinations.

Cleaning up

After testing, avoid incurring usage charges by deleting the resources you created during the walkthrough. If you used the CloudFormation template, delete all the objects from the S3 bucket before deleting the stack.

Conclusion

In this post, we show how SNS delivery to Kinesis Data Firehose enables you to integrate SNS with storage and analytics services. The example shows how to create an SNS subscription to use a Kinesis Data Firehose delivery stream to store SNS messages in an S3 bucket.

You can adapt this configuration for your needs for storage, encryption, data transformation, and data pipeline architecture. For more information, see Fanout to Kinesis Data Firehose streams in the SNS Developer Guide.

For details on pricing, see SNS pricing and Kinesis Data Firehose pricing. For more serverless learning resources, visit Serverless Land.

Discovering sensitive data in AWS CodeCommit with AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/discovering-sensitive-data-in-aws-codecommit-with-aws-lambda-2/

This post is courtesy of Markus Ziller, Solutions Architect.

Today, git is a de facto standard for version control in modern software engineering. The workflows enabled by git’s branching capabilities are a major reason for this. However, with git’s distributed nature, it can be difficult to reliably remove changes that have been committed from all copies of the repository. This is problematic when secrets such as API keys have been accidentally committed into version control. The longer it takes to identify and remove secrets from git, the more likely that the secret has been checked out by another user.

This post shows a solution that automatically identifies credentials pushed to AWS CodeCommit in near-real-time. I also show three remediation measures that you can use to reduce the impact of secrets pushed into CodeCommit:

  • Notify users about the leaked credentials.
  • Lock the repository for non-admins.
  • Hard reset the CodeCommit repository to a healthy state.

I use the AWS Cloud Development Kit (CDK). This is an open source software development framework to model and provision cloud application resources. Using the CDK can reduce the complexity and amount of code needed to automate the deployment of resources.

Overview of solution

The services in this solution are AWS Lambda, AWS CodeCommit, Amazon EventBridge, and Amazon SNS. These services are part of the AWS serverless platform. They help reduce undifferentiated work around managing servers, infrastructure, and the parts of the application that add less value to your customers. With serverless, the solution scales automatically, has built-in high availability, and you only pay for the resources you use.

Solution architecture

This diagram outlines the workflow implemented in this blog:

  1. After a developer pushes changes to CodeCommit, it emits an event to an event bus.
  2. A rule defined on the event bus routes this event to a Lambda function.
  3. The Lambda function uses the AWS SDK for JavaScript to get the changes introduced by commits pushed to the repository.
  4. It analyzes the changes for secrets. If secrets are found, it publishes another event to the event bus.
  5. Rules associated with this event type then trigger invocations of three Lambda functions A, B, and C with information about the problematic changes.
  6. Each of the Lambda functions runs a remediation measure:
    • Function A sends out a notification to an SNS topic that informs users about the situation (A1).
    • Function B locks the repository by setting a tag with the AWS SDK (B2). It sends out a notification about this action (B2).
    • Function C runs git commands that remove the problematic commit from the CodeCommit repository (C2). It also sends out a notification (C1).

Walkthrough

The following walkthrough explains the required components, their interactions and how the provisioning can be automated via CDK.

For this walkthrough, you need:

Checkout and deploy the sample stack:

  1. After completing the prerequisites, clone the associated GitHub repository by running the following command in a local directory:
    git clone [email protected]:aws-samples/discover-sensitive-data-in-aws-codecommit-with-aws-lambda.git
  2. Open the repository in a local editor and review the contents of cdk/lib/resources.ts, src/handlers/commits.ts, and src/handlers/remediations.ts.
  3. Follow the instructions in the README.md to deploy the stack.

The CDK will deploy resources for the following services in your account.

Using CodeCommit to manage your git repositories

The CDK creates a new empty repository called TestRepository and adds a tag RepoState with an initial value of ok. You later use this tag in the LockRepo remediation strategy to restrict access.

It also creates two IAM groups with one user in each. Members of the CodeCommitSuperUsers group are always able to access the repository, while members of the CodeCommitUsers group can only access the repository when the value of the tag RepoState is not locked.

I also import the CodeCommitSystemUser into the CDK. Since the user requires git credentials in a downloaded CSV file, it cannot be created by the CDK. Instead it must be created as described in the README file.

The following CDK code sets up all the described resources:

const TAG_NAME = "RepoState";

const superUsers = new Group(this, "CodeCommitSuperUsers", { groupName: "CodeCommitSuperUsers" });
superUsers.addUser(new User(this, "CodeCommitSuperUserA", {
    password: new Secret(this, "CodeCommitSuperUserPassword").secretValue,
    userName: "CodeCommitSuperUserA"
}));

const users = new Group(this, "CodeCommitUsers", { groupName: "CodeCommitUsers" });
users.addUser(new User(this, "User", {
    password: new Secret(this, "CodeCommitUserPassword").secretValue,
    userName: "CodeCommitUserA"
}));

const systemUser = User.fromUserName(this, "CodeCommitSystemUser", props.codeCommitSystemUserName);

const repo = new Repository(this, "Repository", {
    repositoryName: "TestRepository",
    description: "The repository to test this project out",
});
Tags.of(repo).add(TAG_NAME, "ok");

users.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn],
    conditions: {
        StringNotEquals: {
            [`aws:ResourceTag/${TAG_NAME}`]: "locked"
        }
    }
}));

superUsers.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn]
}));

Using EventBridge to pass events between components

I use EventBridge, a serverless event bus, to connect the Lambda functions together. Many AWS services like CodeCommit are natively integrated into EventBridge and publish events about changes in their environment.

repo.onCommit is a higher-level CDK construct. It creates the required resources to invoke a Lambda function for every commit to a given repository. The created events rule looks like this:

EventBridge rule definition

Note that this event rule only matches commit events in TestRepository. To send commits of all repositories in that account to the inspecting Lambda function, remove the resources filter in the event pattern.

CodeCommit Repository State Change is a default event that is published by CodeCommit if changes are made to a repository. In addition, I define CodeCommit Security Event, a custom event, which Lambda publishes to the same event bus if secrets are discovered in the inspected code.

The sample below shows how you can set up Lambda functions as targets for both type of events.

const DETAIL_TYPE = "CodeCommit Security Event";
const eventBus = new EventBus(this, "CodeCommitEventBus", {
    eventBusName: "CodeCommitSecurityEvents"
});

repo.onCommit("AnyCommitEvent", {
    ruleName: "CallLambdaOnAnyCodeCommitEvent",
    target: new targets.LambdaFunction(commitInspectLambda)
});


new Rule(this, "CodeCommitSecurityEvent", {
    eventBus,
    enabled: true,
    ruleName: "CodeCommitSecurityEventRule",
    eventPattern: {
        detailType: [DETAIL_TYPE]
    },
    targets: [
        new targets.LambdaFunction(lockRepositoryLambda),
        new targets.LambdaFunction(raiseAlertLambda),
        new targets.LambdaFunction(forcefulRevertLambda)
    ]
});

Using Lambda functions to run remediation measures

AWS Lambda functions allow you to run code in response to events. The example defines four Lambda functions.

By comparing the delta to its predecessor, the commitInspectLambda function analyzes if secrets are introduced by a commit. With the CDK, you can create a Lambda function with:

const myLambdaInCDK = new Function(this, "UniqueIdentifierRequiredByCDK", {
    runtime: Runtime.NODEJS_12_X,
    handler: "<handlerfile>.<function name>",
    code: Code.fromAsset(path.join(__dirname, "..", "..", "src", "handlers")),
    // See git repository for complete code
});

The code for this Lambda function uses the AWS SDK for JavaScript to fetch the details of the commit, the differences introduced, and the new content.

The code checks each modified file line by line with a regular expression that matches typical secret formats. In src/handlers/regex.json, I provide a few regular expressions that match common secrets. You can extend this with your own patterns.

If a secret is discovered, a CodeCommit Security Event is published to the event bus. EventBridge then invokes all Lambda functions that are registered as targets with this event. This demo triggers three remediation measures.

The raiseAlertLambda function uses the AWS SDK for JavaScript to send out a notification to all subscribers (that is, CodeCommit administrators) on an SNS topic. It takes no further action.

SNS.publish({
    TopicArn: <TOPIC_ARN>,
    Subject: `[ACTION REQUIRED] Secrets discovered in <repo>`
    Message: `<Your message>
}

Notification about secrets discovered in a commit in TestRepository

The lockRepositoryLambda function uses the AWS SDK for JavaScript to change the RepoState tag from ok to locked. This restricts access to members of the CodeCommitSuperUsers IAM group.

CodeCommit.tagResource({
    resourceArn: event.detail.repositoryArn,
    tags: {
        RepoState: "locked"
    }
})

In addition, the Lambda function uses SNS to send out a notification. The forcefulRevertLambda function runs the following git commands:

git clone <repository>
git checkout <branch>
git reset –hard <previousCommitId>
git push origin <branch> --force

These commands reset the repository to the last accepted commit, by forcefully removing the respective commit from the git history of your CodeCommit repo. I advise you to handle this with care and only activate it on a real project if you fully understand the consequences of rewriting git history.

The Node.js v12 runtime for Lambda does not have a git runtime installed by default. You can add one by using the git-lambda2 Lambda layer. This allows you to run git commands from within the Lambda function.

Logs for the remediation measure Hard Reset

Finally, this Lambda function also sends out a notification. The complete code is available in the GitHub repo.

Using SNS to notify users

To notify users about secrets discovered and actions taken, you create an SNS topic and subscribe to it via email.

const topic = new Topic(this, "CodeCommitSecurityEventNotification", {
    displayName: "CodeCommitSecurityEventNotification",
});

topic.addSubscription(new subs.EmailSubscription(/* your email address */));

Testing the solution

You can test the deployed solution by running these two sets of commands. First, add a file with no credentials:

echo "Clean file - no credentials here" > clean_file.txt
git add clean_file.txt
git commit clean_file.txt -m "Adds clean_file.txt"
git push

Then add a file containing credentials:

SECRET_LIKE_STRING=$(cat /dev/urandom | env LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
echo "secret=$SECRET_LIKE_STRING" > problematic_file.txt
git add problematic_file.txt
git commit problematic_file.txt -m "Adds secret-like string to problematic_file.txt"
git push

This first command creates, commits and pushes an unproblematic file clean_file.txt that will pass the checks of commitInspectLambda. The second command creates, commits, and pushes problematic_file.txt, which matches the regular expressions and triggers the remediation measures.

If you check your email, you soon receive notifications about actions taken by the Lambda functions.

Cleaning up

To avoid incurring charges, delete the resources by running cdk destroy and confirming the deletion.

Conclusion

This post demonstrates how you can implement a solution to discover secrets in commits to AWS CodeCommit repositories. It also defines different strategies to remediate this.

The CDK code to set up all components is minimal and can be extended for remediation measures. The template is portable between Regions and uses serverless technologies to minimize cost and complexity.

For more serverless learning resources, visit Serverless Land.

ICYMI: Serverless pre:Invent 2020

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-preinvent-2020/

During the last few weeks, the AWS serverless team has been releasing a wave of new features in the build-up to AWS re:Invent 2020. This post recaps some of the most important releases for serverless developers.

re:Invent is virtual and free to all attendees in 2020 – register here. See the complete list of serverless sessions planned and join the serverless DA team live on Twitch. Also, follow your DAs on Twitter for live recaps and Q&A during the event.

AWS re:Invent 2020

AWS Lambda

We launched Lambda Extensions in preview, enabling you to more easily integrate monitoring, security, and governance tools into Lambda functions. You can also build your own extensions that run code during Lambda lifecycle events, and there is an example extensions repo for starting development.

You can now send logs from Lambda functions to custom destinations by using Lambda Extensions and the new Lambda Logs API. Previously, you could only forward logs after they were written to Amazon CloudWatch Logs. Now, logging tools can receive log streams directly from the Lambda execution environment. This makes it easier to use your preferred tools for log management and analysis, including Datadog, Lumigo, New Relic, Coralogix, Honeycomb, or Sumo Logic.

Lambda Extensions API

Lambda launched support for Amazon MQ as an event source. Amazon MQ is a managed broker service for Apache ActiveMQ that simplifies deploying and scaling queues. This integration increases the range of messaging services that customers can use to build serverless applications. The event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service manages an internal poller to invoke the target Lambda function.

We also released a new layer to make it simpler to integrate Amazon CodeGuru Profiler. This service helps identify the most expensive lines of code in a function and provides recommendations to help reduce cost. With this update, you can enable the profiler by adding the new layer and setting environment variables. There are no changes needed to the custom code in the Lambda function.

Lambda announced support for AWS PrivateLink. This allows you to invoke Lambda functions from a VPC without traversing the public internet. It provides private connectivity between your VPCs and AWS services. By using VPC endpoints to access the Lambda API from your VPC, this can replace the need for an Internet Gateway or NAT Gateway.

For developers building machine learning inferencing, media processing, high performance computing (HPC), scientific simulations, and financial modeling in Lambda, you can now use AVX2 support to help reduce duration and lower cost. By using packages compiled for AVX2 or compiling libraries with the appropriate flags, your code can then benefit from using AVX2 instructions to accelerate computation. In the blog post’s example, enabling AVX2 for an image-processing function increased performance by 32-43%.

Lambda now supports batch windows of up to 5 minutes when using SQS as an event source. This is useful for workloads that are not time-sensitive, allowing developers to reduce the number of Lambda invocations from queues. Additionally, the batch size has been increased from 10 to 10,000. This is now the same as the batch size for Kinesis as an event source, helping Lambda-based applications process more data per invocation.

Code signing is now available for Lambda, using AWS Signer. This allows account administrators to ensure that Lambda functions only accept signed code for deployment. Using signing profiles for functions, this provides granular control over code execution within the Lambda service. You can learn more about using this new feature in the developer documentation.

Amazon EventBridge

You can now use event replay to archive and replay events with Amazon EventBridge. After configuring an archive, EventBridge automatically stores all events or filtered events, based upon event pattern matching logic. You can configure a retention policy for archives to delete events automatically after a specified number of days. Event replay can help with testing new features or changes in your code, or hydrating development or test environments.

EventBridge archived events

EventBridge also launched resource policies that simplify managing access to events across multiple AWS accounts. This expands the use of a policy associated with event buses to authorize API calls. Resource policies provide a powerful mechanism for modeling event buses across multiple account and providing fine-grained access control to EventBridge API actions.

EventBridge resource policies

EventBridge announced support for Server-Side Encryption (SSE). Events are encrypted using AES-256 at no additional cost for customers. EventBridge also increased PutEvent quotas to 10,000 transactions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). This helps support workloads with high throughput.

AWS Step Functions

Synchronous Express Workflows have been launched for AWS Step Functions, providing a new way to run high-throughput Express Workflows. This feature allows developers to receive workflow responses without needing to poll services or build custom solutions. This is useful for high-volume microservice orchestration and fast compute tasks communicating via HTTPS.

The Step Functions service recently added support for other AWS services in workflows. You can now integrate API Gateway REST and HTTP APIs. This enables you to call API Gateway directly from a state machine as an asynchronous service integration.

Step Functions now also supports Amazon EKS service integration. This allows you to build workflows with steps that synchronously launch tasks in EKS and wait for a response. In October, the service also announced support for Amazon Athena, so workflows can now query data in your S3 data lakes.

These new integrations help minimize custom code and provide built-in error handling, parameter passing, and applying recommended security settings.

AWS SAM CLI

The AWS Serverless Application Model (AWS SAM) is an AWS CloudFormation extension that makes it easier to build, manage, and maintains serverless applications. On November 10, the AWS SAM CLI tool released version 1.9.0 with support for cached and parallel builds.

By using sam build --cached, AWS SAM no longer rebuilds functions and layers that have not changed since the last build. Additionally, you can use sam build --parallel to build functions in parallel, instead of sequentially. Both of these new features can substantially reduce the build time of larger applications defined with AWS SAM.

Amazon SNS

Amazon SNS announced support for First-In-First-Out (FIFO) topics. These are used with SQS FIFO queues for applications that require strict message ordering with exactly once processing and message deduplication. This is designed for workloads that perform tasks like bank transaction logging or inventory management. You can also use message filtering in FIFO topics to publish updates selectively.

SNS FIFO

AWS X-Ray

X-Ray now integrates with Amazon S3 to trace upstream requests. If a Lambda function uses the X-Ray SDK, S3 sends tracing headers to downstream event subscribers. With this, you can use the X-Ray service map to view connections between S3 and other services used to process an application request.

AWS CloudFormation

AWS CloudFormation announced support for nested stacks in change sets. This allows you to preview changes in your application and infrastructure across the entire nested stack hierarchy. You can then review those changes before confirming a deployment. This is available in all Regions supporting CloudFormation at no extra charge.

The new CloudFormation modules feature was released on November 24. This helps you develop building blocks with embedded best practices and common patterns that you can reuse in CloudFormation templates. Modules are available in the CloudFormation registry and can be used in the same way as any native resource.

Amazon DynamoDB

For customers using DynamoDB global tables, you can now use your own encryption keys. While all data in DynamoDB is encrypted by default, this feature enables you to use customer managed keys (CMKs). DynamoDB also announced support for global tables in the Europe (Milan) and Europe (Stockholm) Regions. This feature enables you to scale global applications for local access in workloads running in different Regions and replicate tables for higher availability and disaster recovery (DR).

The DynamoDB service announced the ability to export table data to data lakes in Amazon S3. This enables you to use services like Amazon Athena and AWS Lake Formation to analyze DynamoDB data with no custom code required. This feature does not consume table capacity and does not impact performance and availability. To learn how to use this feature, see this documentation.

AWS Amplify and AWS AppSync

You can now use existing Amazon Cognito user pools and identity pools for Amplify projects, making it easier to build new applications for an existing user base. AWS Amplify Console, which provides a fully managed static web hosting service, is now available in the Europe (Milan), Middle East (Bahrain), and Asia Pacific (Hong Kong) Regions. This service makes it simpler to bring automation to deploying and hosting single-page applications and static sites.

AWS AppSync enabled AWS WAF integration, making it easier to protect GraphQL APIs against common web exploits. You can also implement rate-based rules to help slow down brute force attacks. Using AWS Managed Rules for AWS WAF provides a faster way to configure application protection without creating the rules directly. AWS AppSync also recently expanded service availability to the Asia Pacific (Hong Kong), Middle East (Bahrain), and China (Ningxia) Regions, making the service now available in 21 Regions globally.

Still looking for more?

Join the AWS Serverless Developer Advocates on Twitch throughout re:Invent for live Q&A, session recaps, and more! See this page for the full schedule.

For more serverless learning resources, visit Serverless Land.

Tracking the latest server images in Amazon EC2 Image Builder pipelines

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/tracking-the-latest-server-images-in-amazon-ec2-image-builder-pipelines/

This post courtesy of Anoop Rachamadugu, Cloud Architect at AWS

The Amazon EC2 Image Builder service helps users to build and maintain server images. The images created by EC2 Image Builder can be used with Amazon Elastic Compute Cloud (EC2) and on-premises. Image Builder reduces the effort of keeping images up-to-date and secure by providing a graphical interface, built-in automation, and AWS-provided security settings. Customers have told us that they manage multiple server images and are looking for ways to track the latest server images created by the pipelines.

In this blog post, I walk through a solution that uses AWS Lambda and AWS Systems Manager (SSM) Parameter Store. It tracks and updates the latest Amazon Machine Image (AMI) IDs every time an Image Builder pipeline is run. With Lambda, you pay only for what you use. You are charged based on the number of requests for your functions and the time it takes for your code to run. In this case, the Lambda function is invoked upon the completion of the image builder pipeline. Standard SSM parameters are available at no additional charge.

Users can reference the SSM parameters in automation scripts and AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure. Consider the use case of updating Amazon Machine Image (AMI) IDs for the EC2 instances in your CloudFormation templates. Normally, you might map AMI IDs to specific instance types and Regions. Then to update these, you would manually change them in each of your templates. With the SSM parameter integration, your code remains untouched and a CloudFormation stack update operation automatically fetches the latest Parameter Store value.

Overview

This solution uses a Lambda function written in Python that subscribes to an Amazon Simple Notification Service (SNS) topic. The Lambda function and the SNS topic are deployed using AWS SAM CLI. Once deployed, the SNS topic must be configured in an existing Image Builder pipeline. This results in the Lambda function being invoked at the completion of the Image Builder pipeline.

When a Lambda function subscribes to an SNS topic, it is invoked with the payload of the published messages. The Lambda function receives the message payload as an input parameter. The Lambda function first checks the message payload to see if the image status is available. If the image state is available, it retrieves the AMI ID from the message payload and updates the SSM parameter.

EC2 Image builder architecture diagram

EC2 Image builder architecture diagram

Prerequisites

To get started with this solution, the following is required:

Deploying the solution

The solution consists of two files, which can be downloaded from the amazon-ec2-image-builder GitHub repository.

  1. The Python file image-builder-lambda-update-ssm.py contains the code for the Lambda function. It first checks the SNS message payload to determine if the image is available. If it’s available, it extracts the AMI ID from the SNS message payload and updates the SSM parameter specified.The ‘ssm_parameter_name’ variable specifies the SSM parameter path where the AMI ID should be stored and updated. The Lambda function finishes by adding tags to the SSM parameter.
  2. The template.yaml file is an AWS SAM template. It deploys the Lambda function, SNS topic, and IAM role required for the Lambda function. I use Python 3.7 as the runtime and assign a memory of 256 MB for the Lambda function. The IAM policy gives the Lambda function permissions to retrieve and update SSM parameters. Deploy this application using the AWS SAM CLI guided deploy:
    sam deploy --guided

After deploying the application, note the ARN of the created SNS topic. Next, update the infrastructure settings of an existing Image Builder pipeline with this newly created SNS topic. This results in the Lambda function being invoked upon the completion of the image builder pipeline.

Configuration details

Configuration details

Verifying the solution

After the completion of the image builder pipeline, use the AWS CLI or check the AWS Management Console to verify the updated SSM parameter. To verify via AWS CLI, run the following commands to retrieve and list the tags attached to the SSM parameter:

aws ssm get-parameter --name ‘/ec2-imagebuilder/latest’
aws ssm list-tags-for-resource --resource-type "Parameter" --resource-id ‘/ec2-imagebuilder/latest’

To verify via the AWS Management Console, navigate to the Parameter Store under AWS Systems Manager. Search for the parameter /ec2-imagebuilder/latest:

AWS Systems Manager: Parameter Store

AWS Systems Manager: Parameter Store

Select the Tags tab to view the tags attached to the SSM parameter:

Image builder tags list

Image builder tags list

Referencing the SSM Parameter in CloudFormation templates

Users can reference the SSM parameters in automation scripts and AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure. This sample code shows how to reference the SSM parameter in a CloudFormation template.

Parameters :
  LatestAmiId :
    Type : 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
    Default: ‘/ec2-imagebuilder/latest’

Resources :
  Instance :
    Type : 'AWS::EC2::Instance'
    Properties :
      ImageId : !Ref LatestAmiId

Conclusion

In this blog post, I demonstrate a solution that allows users to track and update the latest AMI ID created by the Image Builder pipelines. The Lambda function retrieves the AMI ID of the image created by a pipeline and update an AWS Systems Manager parameter. This Lambda function is triggered via an SNS topic configured in an Image Builder pipeline.

The solution is deployed using AWS SAM CLI. I also note how users can reference Systems Manager parameters in AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure.

The amazon-ec2-image-builder-samples GitHub repository provides a number of examples for getting started with EC2 Image Builder. Image Builder can make it easier for you to build virtual machine (VM) images.

Getting started with RPA using AWS Step Functions and Amazon Textract

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/getting-started-with-rpa-using-aws-step-functions-and-amazon-textract/

This post is courtesy of Joe Tringali, Solutions Architect.

Many organizations are using robotic process automation (RPA) to automate workflow, back-office processes that are labor-intensive. RPA, as software bots, can often handle many of these activities. Often RPA workflows contain repetitive manual tasks that must be done by humans, such as viewing invoices to find payment details.

AWS Step Functions is a serverless function orchestrator and workflow automation tool. Amazon Textract is a fully managed machine learning service that automatically extracts text and data from scanned documents. Combining these services, you can create an RPA bot to automate the workflow and enable employees to handle more complex tasks.

In this post, I show how you can use Step Functions and Amazon Textract to build a workflow that enables the processing of invoices. Download the code for this solution from https://github.com/aws-samples/aws-step-functions-rpa.

Overview

The following serverless architecture can process scanned invoices in PDF or image formats for submitting payment information to a database.

Example architecture
To implement this architecture, I use single-purpose Lambda functions and Step Functions to build the workflow:

  1. Invoices are scanned and loaded into an Amazon Simple Storage Service (S3) bucket.
  2. The loading of an invoice into Amazon S3 triggers an AWS Lambda function to be invoked.
  3. The Lambda function starts an asynchronous Amazon Textract job to analyze the text and data of the scanned invoice.
  4. The Amazon Textract job publishes a completion notification message with a status of “SUCCEEDED” or “FAILED” to an Amazon Simple Notification Service (SNS) topic.
  5. SNS sends the message to an Amazon Simple Queue Service (SQS) queue that is subscribed to the SNS topic.
  6. The message in the SQS queue triggers another Lambda function.
  7. The Lambda function initiates a Step Functions state machine to process the results of the Amazon Textract job.
  8. For an Amazon Textract job that completes successfully, a Lambda function saves the document analysis into an Amazon S3 bucket.
  9. The loading of the document analysis to Amazon S3 triggers another Lambda function.
  10. The Lambda function retrieves the text and data of the scanned invoice to find the payment information. It writes an item to an Amazon DynamoDB table with a status indicating if the invoice can be processed.
  11. If the DynamoDB item contains the payment information, another Lambda function is invoked.
  12. The Lambda function archives the processed invoice into another S3 bucket.
  13. If the DynamoDB item does not contain the payment information, a message is published to an Amazon SNS topic requesting that the invoice be reviewed.

Amazon Textract can extract information from the various invoice images and associate labels with the data. You must then handle the various labels that different invoices may associate with the payee name, due date, and payment amount.

Determining payee name, due date and payment amount

After the document analysis has been saved to S3, a Lambda function retrieves the text and data of the scanned invoice to find the information needed for payment. However, invoices can use a variety of labels for the same piece of data, such a payment’s due date.

In the example invoices included with this blog, the payment’s due date is associated with the labels “Pay On or Before”, “Payment Due Date” and “Payment Due”. Payment amounts can also have different labels, such as “Total Due”, “New Balance Total”, “Total Current Charges”, and “Please Pay”. To address this, I use a series of helper functions in the app.py file in the process_document_analysis folder of the GitHub repo.

In app.py, there is the following get_ky_map helper function:

def get_kv_map(blocks):
    key_map = {}
    value_map = {}
    block_map = {}
    for block in blocks:
        block_id = block['Id']
        block_map[block_id] = block
        if block['BlockType'] == "KEY_VALUE_SET":
            if 'KEY' in block['EntityTypes']:
                key_map[block_id] = block
            else:
                value_map[block_id] = block
    return key_map, value_map, block_map

The get_kv_map function is invoked by the Lambda function handler. It iterates over the “Blocks” element of the document analysis produced by Amazon Textract to create dictionaries of keys (labels) and values (data) associated with each block identified by Amazon Textract. It then invokes the following get_kv_relationship helper function:

def get_kv_relationship(key_map, value_map, block_map):
    kvs = {}
    for block_id, key_block in key_map.items():
        value_block = find_value_block(key_block, value_map)
        key = get_text(key_block, block_map)
        val = get_text(value_block, block_map)
        kvs[key] = val
    return kvs

The get_kv_relationship function merges the key and value dictionaries produced by the get_kv_map function to create a single Python key value dictionary where labels are the keys to the dictionary and the invoice’s data are the values. The handler then invokes the following get_line_list helper function:

def get_line_list(blocks):
    line_list = []
    for block in blocks:
        if block['BlockType'] == "LINE":
            if 'Text' in block: 
                line_list.append(block["Text"])
    return line_list

Extracting payee names is more complex because the data may not be labeled. The payee may often differ from the entity sending the invoice. With the Amazon Textract analysis in a format more easily consumable by Python, I use the following get_payee_name helper function to parse and extract the payee:

def get_payee_name(lines):
    payee_name = ""
    payable_to = "payable to"
    payee_lines = [line for line in lines if payable_to in line.lower()]
    if len(payee_lines) > 0:
        payee_line = payee_lines[0]
        payee_line = payee_line.strip()
        pos = payee_line.lower().find(payable_to)
        if pos > -1:
            payee_line = payee_line[pos + len(payable_to):]
            if payee_line[0:1] == ':':
                payee_line = payee_line[1:]
            payee_name = payee_line.strip()
    return payee_name

The get_amount helper function searches the key value dictionary produced by the get_kv_relationship function to retrieve the payment amount:

def get_amount(kvs, lines):
    amount = None
    amounts = [search_value(kvs, amount_tag) for amount_tag in amount_tags if search_value(kvs, amount_tag) is not None]
    if len(amounts) > 0:
        amount = amounts[0]
    else:
        for idx, line in enumerate(lines):
            if line.lower() in amount_tags:
                amount = lines[idx + 1]
                break
    if amount is not None:
        amount = amount.strip()
        if amount[0:1] == '$':
            amount = amount[1:]
    return amount

The amount_tags variable contains a list of possible labels associated with the payment amount:

amount_tags = ["total due", "new balance total", "total current charges", "please pay"]

Similarly, the get_due_date helper function searches the key value dictionary produced by the get_kv_relationship function to retrieve the payment due date:

def get_due_date(kvs):
    due_date = None
    due_dates = [search_value(kvs, due_date_tag) for due_date_tag in due_date_tags if search_value(kvs, due_date_tag) is not None]
    if len(due_dates) > 0:
        due_date = due_dates[0]
    if due_date is not None:
        date_parts = due_date.split('/')
        if len(date_parts) == 3:
            due_date = datetime(int(date_parts[2]), int(date_parts[0]), int(date_parts[1])).isoformat()
        else:
            date_parts = [date_part for date_part in re.split("\s+|,", due_date) if len(date_part) > 0]
            if len(date_parts) == 3:
                datetime_object = datetime.strptime(date_parts[0], "%b")
                month_number = datetime_object.month
                due_date = datetime(int(date_parts[2]), int(month_number), int(date_parts[1])).isoformat()
    else:
        due_date = datetime.now().isoformat()
    return due_date

The due_date_tag contains a list of possible labels associated with the payment due:

due_date_tags = ["pay on or before", "payment due date", "payment due"]

If all required elements needed to issue a payment are found, it adds an item to the DynamoDB table with a status attribute of “Approved for Payment”. If the Lambda function cannot determine the value of one or more required elements, it adds an item to the DynamoDB table with a status attribute of “Pending Review”.

Payment Processing

If the item in the DynamoDB table is marked “Approved for Payment”, the processed invoice is archived. If the item’s status attribute is marked “Pending Review”, an SNS message is published to an SNS Pending Review topic. You can subscribe to this topic so that you can add additional labels to the Python code for determining payment due dates and payment amounts.

Note that the Lambda functions are single-purpose functions, and all workflow logic is contained in the Step Functions state machine. This diagram shows the various tasks (states) of a successful workflow.

State machine workflow

For more information about this solution, download the code from the GitHub repo (https://github.com/aws-samples/aws-step-functions-rpa).

Prerequisites

Before deploying the solution, you must install the following prerequisites:

  1. Python.
  2. AWS Command Line Interface (AWS CLI) – for instructions, see Installing the AWS CLI.
  3. AWS Serverless Application Model Command Line Interface (AWS SAM CLI) – for instructions, see Installing the AWS SAM CLI.

Deploying the solution

The solution creates the following S3 buckets with names suffixed by your AWS account ID to prevent a global namespace collision of your S3 bucket names:

  • scanned-invoices-<YOUR AWS ACCOUNT ID>
  • invoice-analyses-<YOUR AWS ACCOUNT ID>
  • processed-invoices-<YOUR AWS ACCOUNT ID>

The following steps deploy the example solution in your AWS account. The solution deploys several components including a Step Functions state machine, Lambda functions, S3 buckets, a DynamoDB table for payment information, and SNS topics.

AWS CloudFormation requires an S3 bucket and stack name for deploying the solution. To deploy:

  1. Download code from GitHub repo (https://github.com/aws-samples/aws-step-functions-rpa).
  2. Run the following command to build the artifacts locally on your workstation:sam build
  3. Run the following command to create a CloudFormation stack and deploy your resources:sam deploy --guided --capabilities CAPABILITY_NAMED_IAM

Monitor the progress and wait for the completion of the stack creation process from the AWS CloudFormation console before proceeding.

Testing the solution

To test the solution, upload the PDF test invoices to the S3 bucket named scanned-invoices-<YOUR AWS ACCOUNT ID>.

A Step Functions state machine with the name <YOUR STACK NAME>-ProcessedScannedInvoiceWorkflow runs the workflow. Amazon Textract document analyses are stored in the S3 bucket named invoice-analyses-<YOUR AWS ACCOUNT ID>, and processed invoices are stored in the S3 bucket named processed-invoices-<YOUR AWS ACCOUNT ID>. Processed payments are found in the DynamoDB table named <YOUR STACK NAME>-invoices.

You can monitor the status of the workflows from the Step Functions console. Upon completion of the workflow executions, review the items added to DynamoDB from the Amazon DynamoDB console.

Cleanup

To avoid ongoing charges for any resources you created in this blog post, delete the stack:

  1. Empty the three S3 buckets created during deployment using the S3 console:
    – scanned-invoices-<YOUR AWS ACCOUNT ID>
    – invoice-analyses-<YOUR AWS ACCOUNT ID>
    – processed-invoices-<YOUR AWS ACCOUNT ID>
  2. Delete the CloudFormation stack created during deployment using the CloudFormation console.

Conclusion

In this post, I showed you how to use a Step Functions state machine and Amazon Textract to automatically extract data from a scanned invoice. This eliminates the need for a person to perform the manual step of reviewing an invoice to find payment information to be fed into a backend system. By replacing the manual steps of a workflow with automation, an organization can free up their human workforce to handle more value-added tasks.

To learn more, visit AWS Step Functions and Amazon Textract for more information. For more serverless learning resources, visit https://serverlessland.com.

 

Application integration patterns for microservices: Running distributed RFQs

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/application-integration-patterns-running-distributed-rfqs/

This post is courtesy of Dirk Fröhner, Principal Solutions Architect.

The first blog in this series introduces asynchronous messaging for building loosely coupled systems that can scale, operate, and evolve individually. It considers messaging as a communications model for microservices architectures. Part 2 dives into fan-out strategies and applies the respective patterns to a concrete use case.

In this post, I look at how to apply messaging patterns to help coordinate distributed requests and responses. Specifically, I focus on a composite pattern called scatter-gather, as presented in the book “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions” (Hohpe and Woolf, 2004).

I also show how a client can communicate with a backend via synchronous REST API operations while asynchronous messaging is applied internally for processing.

Overview

The use case is for Wild Rydes, a fictional application that replaces traditional taxis with unicorns. It’s used in several hands-on AWS workshops that illustrate serverless development concepts.

Wild Rydes wants to allow customers to initiate requests for quotation (RFQs) for their rides. This allows unicorns to make special offers to potential customers within a defined schedule. A customer can send their ride details and ask for quotations from all unicorns that are within a certain vicinity. The customer can then choose the best offer.

Wild Rydes

The scatter-gather pattern

The scatter-gather pattern can be used to implement this use-case on the server side. This pattern is ideal for requesting responses from multiple parties, then aggregating and processing that data.

As presented by Hohpe and Woolf, the scatter-gather pattern is a composite pattern that illustrates how to “broadcast a message to multiple recipients and re-aggregate the responses back into a single message”. The pattern is illustrated in the following diagram.

Scatter-gather architecture

The flow starts with the Requester to initiate the broadcast to all potential Responders. This can be architected in a loosely coupled manner using pub-sub messaging with Amazon SNS or Amazon MQ, as shown in this blog post.

All responders must send their answers somewhere for aggregation and processing. This can also be architected in a loosely coupled manner using a message queue with Amazon SQS or Amazon MQ, as described in this blog post.

The Aggregator component consumes the individual responses from the response queue. It forwards the aggregate to the Processor component for final processing. Both Aggregator and Processor can be part of the same application or process. If separated, they can be decoupled through messaging. The Requester can also be part of the same application or process as Aggregator and Processor.

Explaining the architecture and API

In this section, I walk through the use-case and explain how it can be architected and implemented. I show how the scatter-gather pattern works in the backend, and the client-to-backend communication.

Submit instant ride RFQ

To initiate such an RFQ, the customer app communicates with the ride booking service on the backend. The ride booking service exposes a REST API. By default, an RFQ runs for five minutes, but Wild Rydes is working on a feature to let a customer individually set that value.

A request to submit an instant-ride RFQ contains start and destination locations for the ride and the customer ID:

POST /<submit-instant-ride-rfq-resource-path> HTTP/1.1
...

{
    "from": "...",
    "to": "...",
    "customer": "..."
}

The RFQ is a lengthy process so the client app should not expect an immediate response. Instead, the API accepts the RFQ, creates an RFQ task resource, and returns to the client. The response contains a URL to request an update for the status. It also provides an estimated time for the end of the RFQ:

HTTP/1.1 202 Accepted
...

{
    "links": {
        "self": "http://.../<rfq-task-resource-path>",
        "...": "..."
    },
    "status": "running",
    "eta": "..."
}

The following architecture shows this interaction, excluding the process after a new RFQ is submitted.

Client app interaction

Processing the RFQ

The backend uses the scatter-gather pattern to publish the RFQ to unicorns and collect responses for aggregation and processing.

Backend architecture

1. The ride booking service acts as the requester in the scatter-gather pattern. Following a new RFQ from the client app, it publishes the details into an SNS topic. This topic is related to the location of the ride’s starting point since customers need quotes from unicorns within the vicinity. These messages are the green request messages.

2. The unicorn management service maintains instances of unicorn management resources and subscribes them to RFQ topics related to their current location. These resources receive the RFQ request messages and handle the interaction with the Wild Rydes unicorn app.

3. The unicorns in the vicinity are notified through the Wild Rydes unicorn app about the new RFQ and can react if they are available. Notification options between the unicorn management service and the Wild Rydes unicorn app include push notifications and web sockets.

4. Every addressed unicorn can now submit their quote. All quotes go back through the unicorn management resources and the unicorn management service into the RFQ response queue. They act as the responders in the sense of the scatter-gather pattern.

5. The ride booking service also acts as aggregator and processor in the sense of the scatter-gather pattern. It uses SQS to consume messages from an RFQ response queue that eventually contains the RFQ responses from the involved unicorns. It starts doing so immediately after it publishes the details of a new RFQ into the RFQ topic. The messages from the RFQ response queue relate to the blue response messages.

The ride booking service consumes all incoming responses from that queue. This continues until the deadline or all participating unicorns have answered, whatever occurs first. The aggregator responsibility can be as simple as persisting the details of each incoming RFQ response into an Amazon DynamoDB table.

To match incoming responses to the right RFQ, it uses a fundamental integration pattern, correlation ID. In this pattern, a requester adds a unique ID to an outgoing message and each responder is asked to forward this ID in their response.

Also, responders must know where to send their responses to. To keep this dynamic, there is another fundamental integration pattern: return address. It suggests that a requester adds meta information into outgoing messages that indicate the address for their responses. In this architecture, this is the ARN of the SQS queue that acts as the RFQ response queue. This supports an option to simplify the response management: the RFQ response queue is a dedicated queue per customer.

Lastly, the processor responsibility in the ride booking service reads the RFQ responses from the DynamoDB table. It converts the data to JSON for the Wild Rydes customer app.

Check RFQ status

During the RFQ processing, a customer may want to know how many responses have already arrived, or if the results are already available. After submitting an instant ride RFQ, the client receives a representation of the running task. It can use the self-link to request an update:

GET /<rfq-task-resource-path> HTTP/1.1

While the task is running, a response from the ride booking service comes back with the respective status value and the count of responses that have already arrived:

HTTP/1.1 200 OK
...

{
    "links": {
        "self": "http://.../<rfq-task-resource-path>",
        "...": "..."
    },
    "status": "running",
    "responses-received": 2,
    "eta": "..."
}

After the RFQ is completed

An RFQ is completed if either the time is up or all unicorns have answered. The result of the RFQ is then available to the customer. If the client requests an update to the task representation, the response indicates this by redirecting to the RFQ result:

HTTP/1.1 303 See Other
Location: <url-of-rfq-result-resource>

Requesting a representation of the results resource, the client receives the quotes of all the participating unicorns. The frontend customer app can visualize these accordingly:

HTTP/1.1 200 OK
...

{
    "links": { ... },
    "from": "...",
    "to": "...",
    "customer": "...",
    "quotes": [ ... ]
}

The ride booking service can also use means of active notifications to make the customer app aware once the RFQ result is ready, including the link to the RFQ result. Examples for this include push notifications and web sockets.

Conclusion

In this blog, I present the scatter-gather pattern, which is a composite pattern based on pub-sub and point-to-point messaging channels. It also employs correlation ID and return address. I show how this is implemented in the Wild Rydes example application. You can use this integration pattern for communication in your microservices.

I cover how synchronous API communication between end user client and backend can work along with asynchronous messaging for request processing internally.

To learn more:

For more serverless learning resources, visit https://serverlessland.com.

Introducing Amazon SNS FIFO – First-In-First-Out Pub/Sub Messaging

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-sns-fifo-first-in-first-out-pub-sub-messaging/

When designing a distributed software architecture, it is important to define how services exchange information. For example, the use of asynchronous communication decouples components and simplifies scaling, reducing the impact of changes and making it easier to release new features.

The two most common forms of asynchronous service-to-service communication are message queues and publish/subscribe messaging:

  • With message queues, messages are stored on the queue until they are processed and deleted by a consumer. On AWS, Amazon Simple Queue Service (SQS) provides a fully managed message queuing service with no administrative overhead.
  • With pub/sub messaging, a message published to a topic is delivered to all subscribers to the topic. On AWS, Amazon Simple Notification Service (SNS) is a fully managed pub/sub messaging service that enables message delivery to a large number of subscribers. Each subscriber can also set a filter policy to receive only the messages that it cares about.

You can use topics when you want to fan out messages to multiple applications, and queues when you want to send messages to one application. Using topics and queues together, you can decouple microservices, distributed systems, and serverless applications.

With SQS, you can use FIFO (First-In-First-Out) queues to preserve the order in which messages are sent and received, and to avoid that a message is processed more than once.

Introducing SNS FIFO Topics
Today, we are adding similar capabilities for pub/sub messaging with the introduction of SNS FIFO topics, providing strict message ordering and deduplicated message delivery to one or more subscribers.

FIFO topics manage ordering and deduplication similar to FIFO queues:

Ordering – You configure a message group by including a message group ID when publishing a message to a FIFO topic. For each message group ID, all messages are sent and delivered in order of their arrival. For example, to ensure the delivery of messages related to the same customer in order, you can publish these messages to the topic using the customer’s account number as the message group ID. There is no limit in the number of message groups with FIFO topics and queues. You don’t need to declare in advance the message group ID, any value will work. If you don’t have a logical distinction between messages, you can simply use the same message group ID for all and have a single group of ordered messages. The message group ID is passed to any subscribed FIFO queue.

Deduplication – Distributed systems (like SNS) and client applications sometimes generate duplicate messages. You can avoid duplicated message deliveries from the topic in two ways: either by enabling content-based deduplication on the topic, or by adding a deduplication ID to the messages that you publish. With message content-based deduplication, SNS uses a SHA-256 hash to generate the message deduplication ID using the body of the message. After a message with a specific deduplication ID is published successfully, there is a 5-minute interval during which any message with the same deduplication ID is accepted but not delivered. If you subscribe a FIFO queue to a FIFO topic, the deduplication ID is passed to the queue and it is used by SQS to avoid duplicate messages being received.

You can use FIFO topics and queues together to simplify the implementation of applications where the order of operations and events is critical, or when you cannot tolerate duplicates. For example, to process financial operations and inventory updates, or to asynchronously apply commands that you receive from a client device. FIFO queues can use message filtering in FIFO topics to selectively receive only a subset of messages rather than every message published to the topic.

How to Use SNS FIFO Topics
A common scenario where FIFO topics can help is when you receive updates that need to be processed in order. For example, I can use a FIFO topic to receive updates from an application where my customers edit their account profiles. Then, I subscribe an SQS FIFO queue to the FIFO topic, and use the queue as trigger for a Lambda function that applies the account updates to an Amazon DynamoDB table used by my Customer management system that needs to be kept in sync.

The decoupling introduced by the FIFO topic makes it easier to add new functionality with minimal impact to existing applications. For example, to reward my loyal customers with additional promotions, I add a new Loyalty application that is storing information in a relational database managed by Amazon Aurora. To keep the customer’s information stored in the Loyalty database in sync with my other applications, I can subscribe a new FIFO queue to the same FIFO topic, and add a new Lambda function that receives customer updates in the same order as they are generated, and applies them to the Loyalty database. In this way, I don’t need to change code and configuration of other applications to integrate the new Loyalty app.

First, I create two FIFO queues in the SQS console, leaving all options to their defaults:

  • The customer.fifo queue to process updates in my Customer management system.
  • The loyalty.fifo queue to help me collect and store customer updates for the Loyalty application.

In the SNS console, I create the updates.fifo topic. I select FIFO as type, and enable Content-based message deduplication.

Then,  I subscribe the customer.fifo and loyalty.fifo queues to the topic.

To be able to receive messages, I add a statement to the access policy of both queues granting the updates.fifo topic permissions to send messages to the queues. For example, for the customer.fifo queue the statement is:

{
  "Effect": "Allow",
  "Principal": {
    "Service": "sns.amazonaws.com"
  },
  "Action": "SQS:SendMessage",
  "Resource": "arn:aws:sqs:us-east-2:123412341234:customer.fifo",
  "Condition": {
    "ArnLike": {
      "aws:SourceArn": "arn:aws:sns:us-east-2:123412341234:updates.fifo"
    }
  }
}

Now, I use the SNS console to publish 4 messages in sequence. For all messages, I use the same message group ID. In this way, they are all in the same message group. The only part that is different is the message body, where I use in order:

  • Update One
  • Update Two
  • Update Three
  • Update One

In the SQS console, I see that only 3 messages have been delivered to the FIFO queues:

Why is that? When I created the FIFO topics, I enabled content-based deduplication. The 4 messages were sent within the 5-minute deduplication window. The last message has been recognized as a duplicate of the first one and has not been delivered to the subscribed queues.

Let’s see the actual messages in the queues. I use the AWS Command Line Interface (CLI) to receive the messages from SQS, and the jq command-line JSON processor to format the output and get only the Message in the Body.

Here are the messages in the customer.fifo queue:

$ aws sqs receive-message --queue-url https://sqs.us-east-2.amazonaws.com/123412341234/customer.fifo --max-number-of-messages 10 | jq '.Messages[].Body | fromjson | .Message'

"Update One"
"Update Two"
"Update Three"

And these are the messages in the loyalty.fifo queue:

$ aws sqs receive-message --queue-url https://sqs.us-east-2.amazonaws.com/123412341234/loyalty.fifo --max-number-of-messages 10 | jq '.Messages[].Body | fromjson | .Message'

"Update One"
"Update Two"
"Update Three"

As expected, the 3 messages with unique content have been delivered to both queues in the same order as they were sent.

Available Now
You can use SNS FIFO topics in all commercial regions. You can process up to 300 transactions per second (TPS) per FIFO topic or FIFO queue. With SNS, you pay only for what you use, you can find more information in the pricing page.

To learn more, please see the documentation.

Danilo

Building event-driven architectures with Amazon SNS FIFO

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-event-driven-architectures-with-amazon-sns-fifo/

This post is courtesy of Christian Mueller, Principal Solutions Architect.

Developers increasingly adopt event-driven architectures to decouple their distributed applications. Often, these events must be propagated in a strictly ordered manner to all subscribed applications. Using Amazon SNS FIFO topics and Amazon SQS FIFO queues, you can address use cases that require end-to-end message ordering, deduplication, filtering, and encryption.

In this blog post, I introduce a sample event-driven architecture. I walk through an implementation based on Amazon SNS FIFO topics and Amazon SQS FIFO queues.

Common requirements in event-driven-architectures

In event-driven architectures, data consistency is a common business requirement. This is often translated into technical requirements such as zero message loss and strict message ordering. For example, if you update your domain object rapidly, you want to be sure that all events are received by each subscriber in exactly the order they occurred. This way, the current domain object state is what each subscriber received as the latest update event. Similarly, all update events should be received after the initial create event.

Before Amazon SNS FIFO, architects had to design applications to check if messages are received out of order before processing.

Comparing SNS and SNS FIFO

Another common challenge is preventing message duplicates when sending events to the messaging service. If an event publisher receives an error, such as a network timeout, the publisher does not know if the messaging service could receive and successfully process the message or not.

The client may retry, as this is the default behavior for some HTTP response codes in AWS SDKs. This can cause duplicate messages.

Before Amazon SNS FIFO, developers had to design receivers to be idempotent. In some cases, where the event cannot be idempotent, this requires the receiver to be implemented in an idempotent way. Often, this is done by adding a key-value store like Amazon DynamoDB or Amazon ElastiCache for Redis to the service. Using this approach, the receiver can track if the event has been seen before.

Exactly once processing and message deduplication

Exploring the recruiting agency example

This sample application models a recruitment agency with a job listings website. The application is composed of multiple services. I explain 3 of them in more detail.

Sample application architecture

A custom service, the anti-corruption service, receives a change data capture (CDC) event stream of changes from a relational database. This service translates the low-level technical database events into meaningful business events for the domain services for easy consumption. These business events are sent to the SNS FIFO “JobEvents.fifo“ topic. Here, interested services subscribe to these events and process them asynchronously.

In this domain, the analytics service is interested in all events. It has an SQS FIFO “AnalyticsJobEvents.fifo” queue subscribed to the SNS FIFO “JobEvents.fifo“ topic. It uses SQS FIFO as event source for AWS Lambda, which processes and stores these events in Amazon S3. S3 is object storage service with high scalability, data availability, durability, security, and performance. This allows you to use services like Amazon EMR, AWS Glue or Amazon Athena to get insights into your data to extract value.

The inventory service owns an SQS FIFO “InventoryJobEvents.fifo” queue, which is subscribed to the SNS FIFO “JobEvents.fifo“ topic. It is only interested in “JobCreated” and “JobDeleted” events, as it only tracks which jobs are currently available and stores this information in a DynamoDB table. Therefore, it uses an SNS filter policy to only receive these events, instead of receiving all events.

This sample application focuses on the SNS FIFO capabilities, so I do not explore other services subscribed to the SNS FIFO topic. This sample follows the SQS best practices and SNS redrive policy recommendations and configures dead-letter queues (DLQ). This is useful in case SNS cannot deliver an event to the subscribed SQS queue. It also helps if the function fails to process an event from the corresponding SQS FIFO queue multiple times. As a requirement in both cases, the attached SQS DLQ must be an SQS FIFO queue.

Deploying the application

To deploy the application using infrastructure as code, it uses the AWS Serverless Application Model (SAM). SAM provides shorthand syntax to express functions, APIs, databases, and event source mappings. It is expanded into AWS CloudFormation syntax during deployment.

To get started, clone the “event-driven-architecture-with-sns-fifo” repository, from here. Alternatively, download the repository as a ZIP file from here and extract it to a directory of your choice.

As a prerequisite, you must have SAM CLI, Python 3, and PIP installed. You must also have the AWS CLI configured properly.

Navigate to the root directory of this project and build the application with SAM. SAM downloads required dependencies and stores them locally. Execute the following commands in your terminal:

git clone https://github.com/aws-samples/event-driven-architecture-with-amazon-sns-fifo.git
cd event-driven-architecture-with-amazon-sns-fifo
sam build

You see the following output:

Deployment output

Now, deploy the application:

sam deploy --guided

Provide arguments for the deployments, such as the stack name and preferred AWS Region:

SAM guided deployment

After a successful deployment, you see the following output:

Successful deployment message

Learning more about the implementation

I explore the three services forming this sample application, and how they use the features of SNS FIFO.

Anti-corruption service

The anti-corruption service owns the SNS FIFO “JobEvents.fifo” topic, where it publishes business events related to job postings. It uses an SNS FIFO topic, as end-to-end ordering per job ID is required. SNS FIFO is configured not to perform content-based deduplication, as I require a unique message deduplication ID for each event for deduplication. The corresponding definition in the SAM template looks like this:

  JobEventsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: JobEvents.fifo
      FifoTopic: true
      ContentBasedDeduplication: false

For simplicity, the anti-corruption function in the sample application doesn’t consume an external database CDC stream. It uses Amazon CloudWatch Events as an event source to trigger the function every minute.

I provide the SNS FIFO topic Amazon Resource Name (ARN) as an environment variable in the function. This makes this function more portable to deploy in different environments and stages. The function’s AWS Identity and Access Management (IAM) policy grants permissions to publish messages to only this SNS topic:

  AntiCorruptionFunction:
    Type: AWS::Serverless::
    Properties:
      CodeUri: anti-corruption-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          TOPIC_ARN: !Ref JobEventsTopic
      Policies:
        - SNSPublishMessagePolicy
            TopicName: !GetAtt JobEventsTopic.TopicName
      Events:
        Trigger:
          Type: 
          Properties:
            Schedule: 'rate(1 minute)'

The anti-corruption function uses features in the SNS publish API, which allows you to define a “MessageDeduplicationId” and a “MessageGroupId”. The “MessageDeduplicationId” is used to filter out duplicate messages, which are sent to SNS FIFO within in 5-minute deduplication interval. The “MessageGroupId” is required, as SNS FIFO processes all job events for the same message group in a strictly ordered manner, isolated from other message groups processed through the same topic.

Another important aspect in this implementation is the use of “MessageAttributes”. We define a message attribute with the name “eventType” and values like “JobCreated”, “JobSalaryUpdated”, and “JobDeleted”. This allows subscribers to define SNS filter policies to only receive certain events they are interested in:

import boto3
from datetime import datetime
import json
import os
import random
import uuid

TOPIC_ARN = os.environ['TOPIC_ARN']

sns = boto3.client('sns')

def lambda_handler(event, context):
    jobId = str(random.randrange(0, 1000))

    send_job_created_event(jobId)
    send_job_updated_event(jobId)
    send_job_deleted_event(jobId)
    return

def send_job_created_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f'Job {jobId} created',
        MessageDeduplicationId=messageId,
        MessageGroupId=f'JOB-{jobId}',
        Message={...},
        MessageAttributes = {
            'eventType': {
                'DataType': 'String',
                'StringValue': 'JobCreated'
            }
        }
    )
    print('sent message and received response: {}'.format(response))
    return

def send_job_updated_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(...)
    print('sent message and received response: {}'.format(response))
    return

def send_job_deleted_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(...)
    print('sent message and received response: {}'.format(response))
    return

Analytics service

The analytics service owns an SQS FIFO “AnalyticsJobEvents.fifo” queue which is subscribed to the SNS FIFO “JobEvents.fifo” topic. Following best practices, I define redrive policies for the SQS FIFO queue and the SNS FIFO subscription in the template:

  AnalyticsJobEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: AnalyticsJobEvents.fifo
      FifoQueue: true
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt AnalyticsJobEventsQueueDLQ.Arn
        maxReceiveCount: 3

  AnalyticsJobEventsQueueToJobEventsTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt AnalyticsJobEventsQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref JobEventsTopic
      RedrivePolicy: !Sub '{"deadLetterTargetArn": "${AnalyticsJobEventsSubscriptionDLQ.Arn}"}'

The analytics function uses SQS FIFO as an event source for Lambda. The S3 bucket name is an environment variable for the function, which increases the code portability across environments and stages. The IAM policy for this function only grants permissions write objects to this S3 bucket:

  AnalyticsFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: analytics-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          BUCKET_NAME: !Ref AnalyticsBucket
      Policies:
        - S3WritePolicy:
            BucketName: !Ref AnalyticsBucket
      Events:
        Trigger:
          Type: SQS
          Properties:
            Queue: !GetAtt AnalyticsJobEventsQueue.Arn
            BatchSize: 10

View the function implementation at the GitHub repo.

Inventory service

The inventory service also owns an SQS FIFO “InventoryJobEvents.fifo” queue which is subscribed to the SNS FIFO “JobEvents.fifo” topic. It uses redrive policies for the SQS FIFO queue and the SNS FIFO subscription as well. This service is only interested in certain events, so uses an SNS filter policy to specify these events:

  InventoryJobEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: InventoryJobEvents.fifo
      FifoQueue: true
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt InventoryJobEventsQueueDLQ.Arn
        maxReceiveCount: 3

  InventoryJobEventsQueueToJobEventsTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt InventoryJobEventsQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref JobEventsTopic
      FilterPolicy: '{"eventType":["JobCreated", "JobDeleted"]}'
      RedrivePolicy: !Sub '{"deadLetterTargetArn": "${InventoryJobEventsQueueSubscriptionDLQ.Arn}"}'

The inventory function also uses SQS FIFO as event source for Lambda. The DynamoDB table name is set as an environment variable, so the function can look up the name during initialization. The IAM policy grants read/write permissions for only this table:

  InventoryFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: inventory-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          TABLE_NAME: !Ref InventoryTable
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref InventoryTable
      Events:
        Trigger:
          Type: SQS
          Properties:
            Queue: !GetAtt InventoryJobEventsQueue.Arn
            BatchSize: 10

View the function implementation at the GitHub repo.

Conclusion

Amazon SNS FIFO topics can simplify the design of event-driven architectures and reduce custom code in building such applications.

By using the native integration with Amazon SQS FIFO queues, you can also build architectures that fan out to thousands of subscribers. This pattern helps achieve data consistency, deduplication, filtering, and encryption in near real time, using managed services.

For information on regional availability and service quotas, see SNS endpoints and quotas and SQS endpoints and quotas. For more information on the FIFO functionality, see SNS FIFO and SQS FIFO in their Developer Guides.

Optimizing the cost of serverless web applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-the-cost-of-serverless-web-applications/

Web application backends are one of the most frequent types of serverless use-case for customers. The pay-for-value model can make it cost-efficient to build web applications using serverless tools.

While serverless cost is generally correlated with level of usage, there are architectural decisions that impact cost efficiency. The impact of these choices is more significant as your traffic grows, so it’s important to consider the cost-effectiveness of different designs and patterns.

This blog post reviews some common areas in web applications where you may be able to optimize cost. It uses the Happy Path web application as a reference example, which you can read about in the introductory blog post.

Serverless web applications generally use a combination of the services in the following diagram. I cover each of these areas to highlight common areas for cost optimization.

Serverless architecture by AWS service

The API management layer: Selecting the right API type

Most serverless web applications use an API between the frontend client and the backend architecture. Amazon API Gateway is a common choice since it is a fully managed service that scales automatically. There are three types of API offered by the service – REST APIs, WebSocket APIs, and the more recent HTTP APIs.

HTTP APIs offer many of the features in the REST APIs service, but the cost is often around 70% less. It supports Lambda service integration, JWT authorization, CORS, and custom domain names. It also has a simpler deployment model than REST APIs. This feature set tends to work well for web applications, many of which mainly use these capabilities. Additionally, HTTP APIs will gain feature parity with REST APIs over time.

The Happy Path application is designed for 100,000 monthly active users. It uses HTTP APIs, and you can inspect the backend/template.yaml to see how to define these in the AWS Serverless Application Model (AWS SAM). If you have existing AWS SAM templates that are using REST APIs, in many cases you can change these easily:

REST to HTTP API

Content distribution layer: Optimizing assets

Amazon CloudFront is a content delivery network (CDN). It enables you to distribute content globally across 216 Points of Presence without deploying or managing any infrastructure. It reduces latency for users who are geographically dispersed and can also reduce load on other parts of your service.

A typical web application uses CDNs in a couple of different ways. First, there is the distribution of the application itself. For single-page application frameworks like React or Vue.js, the build processes create static assets that are ideal for serving over a CDN.

However, these builds may not be optimized and can be larger than necessary. Many frameworks offer optimization plugins, and the JavaScript community frequently uses Webpack to bundle modules and shrink deployment packages. Similarly, any media assets used in the application build should be optimized. You can use tools like Lighthouse to analyze your web apps to find images that can be resized or compressed.

Optimizing images

The second common CDN use-case for web apps is for user-generated content (UGC). Many apps allow users to upload images, which are then shared with other users. A typical photo from a 12-megapixel smartphone is 3–9 MB in size. This high resolution is not necessary when photos are rendered within web apps. Displaying the high-resolution asset results in slower download performance and higher data transfer costs.

The Happy Path application uses a Resizer Lambda function to optimize these uploaded assets. This process creates two different optimized images depending upon which component loads the asset.

Image sizes in front-end applications

The upload S3 bucket shows the original size of the upload from the smartphone:

The distribution S3 bucket contains the two optimized images at different sizes:

Optimized images in the distribution S3 bucket

The distribution file sizes are 98–99% smaller. For a busy web application, using optimized image assets can make a significant difference to data transfer and CloudFront costs.

Additionally, you can convert to highly optimized file formats such as WebP to reduce file size even further. Not all browsers support this format, but you can use CSS on the frontend to fall back to other types if needed:

<img src="myImage.webp" onerror="this.onerror=null; this.src='myImage.jpg'">

The data layer

AWS offers many different database and storage options that can be useful for web applications. Billing models vary by service and Region. By understanding the data access and storage requirements of your app, you can make informed decisions about the right service to use.

Generally, it’s more cost-effective to store binary data in S3 than a database. First, when the data is uploaded, you can upload directly to S3 with presigned URLs instead of proxying data via API Gateway or another service.

If you are using Amazon DynamoDB, it’s best practice to store larger items in S3 and include a reference token in a table item. Part of DynamoDB pricing is based on read capacity units (RCUs). For binary items such as images, it is usually more cost-efficient to use S3 for storage.

Many web developers who are new to serverless are familiar with using a relational database, so choose Amazon RDS for their database needs. Depending upon your use-case and data access patterns, it may be more cost effective to use DynamoDB instead. RDS is not a serverless service so there are monthly charges for the underlying compute instance. DynamoDB pricing is based upon usage and storage, so for many web apps may be a lower-cost choice.

Integration layer

This layer includes services like Amazon SQS, Amazon SNS, and Amazon EventBridge, which are essential for decoupling serverless applications. Each of these have a request-based pricing component, where 64 KB of a payload is billed as one request. For example, a single SQS message with a 256 KB payload is billed as four requests. There are two optimization methods common for web applications.

1. Combine messages

Many messages sent to these services are much smaller than 64 KB. In some applications, the publishing service can combine multiple messages to reduce the total number of publish actions to SNS. Additionally, by either eliminating unused attributes in the message or compressing the message, you can store more data in a single request.

For example, a publishing service may be able to combine multiple messages together in a single publish action to an SNS topic:

  • Before optimization, a publishing service sends 100,000,000 1KB-messages to an SNS topic. This is charged as 100 million messages for a total cost of $50.00.
  • After optimization, the publishing service combines messages to send 1,562,500 64KB-messages to an SNS topic. This is charged as 1,562,500 messages for a total cost of $0.78.

2. Filter messages

In many applications, not every message is useful for a consuming service. For example, an SNS topic may publish to a Lambda function, which checks the content and discards the message based on some criteria. In this case, it’s more cost effective to use the native filtering capabilities of SNS. The service can filter messages and only invoke the Lambda function if the criteria is met. This lowers the compute cost by only invoking Lambda when necessary.

For example, an SNS topic receives messages about customer orders and forwards these to a Lambda function subscriber. The function is only interested in canceled orders and discards all other messages:

  • Before optimization, the SNS topic sends all messages to a Lambda function. It evaluates the message for the presence of an order canceled attribute. On average, only 25% of the messages are processed further. While SNS does not charge for delivery to Lambda functions, you are charged each time the Lambda service is invoked, for 100% of the messages.
  • After optimization, using an SNS subscription filter policy, the SNS subscription filters for canceled orders and only forwards matching messages. Since the Lambda function is only invoked for 25% of the messages, this may reduce the total compute cost by up to 75%.

3. Choose a different messaging service

For complex filtering options based upon matching patterns, you can use EventBridge. The service can filter messages based upon prefix matching, numeric matching, and other patterns, combining several rules into a single filter. You can create branching logic within the EventBridge rule to invoke downstream targets.

EventBridge offers a broader range of targets than SNS destinations. In cases where you publish from an SNS topic to a Lambda function to invoke an EventBridge target, you could use EventBridge instead and eliminate the Lambda invocation. For example, instead of routing from SNS to Lambda to AWS Step Functions, instead create an EventBridge rule that routes events directly to a state machine.

Business logic layer

Step Functions allows you to orchestrate complex workflows in serverless applications while eliminating common boilerplate code. The Standard Workflow service charges per state transition. Express Workflows were introduced in December 2019, with pricing based on requests and duration, instead of transitions.

For workloads that are processing large numbers of events in shorter durations, Express Workflows can be more cost-effective. This is designed for high-volume event workloads, such as streaming data processing or IoT data ingestion. For these cases, compare the cost of the two workflow types to see if you can reduce cost by switching across.

Lambda is the on-demand compute layer in serverless applications, which is billed by requests and GB-seconds. GB-seconds is calculated by multiplying duration in seconds by memory allocated to the function. For a function with a 1-second duration, invoked 1 million times, here is how memory allocation affects the total cost in the US East (N. Virginia) Region:

Memory (MB) GB/S Compute cost Total cost
128 125,000 $ 2.08 $ 2.28
512 500,000 $ 8.34 $ 8.54
1024 1,000,000 $ 16.67 $ 16.87
1536 1,500,000 $ 25.01 $ 25.21
2048 2,000,000 $ 33.34 $ 33.54
3008 2,937,500 $ 48.97 $ 49.17

There are many ways to optimize Lambda functions, but one of the most important choices is memory allocation. You can choose between 128 MB and 3008 MB, but this also impacts the amount of virtual CPU as memory increases. Since total cost is a combination of memory and duration, choosing more memory can often reduce duration and lower overall cost.

Instead of manually setting the memory for a Lambda function and running executions to compare duration, you can use the AWS Lambda Power Tuning tool. This uses Step Functions to run your function against varying memory configurations. It can produce a visualization to find the optimal memory setting, based upon cost or execution time.

Optimizing costs with the AWS Lambda Power Tuning tool

Conclusion

Web application backends are one of the most popular workload types for serverless applications. The pay-per-value model works well for this type of workload. As traffic grows, it’s important to consider the design choices and service configurations used to optimize your cost.

Serverless web applications generally use a common range of services, which you can logically split into different layers. This post examines each layer and suggests common cost optimizations helpful for web app developers.

To learn more about building web apps with serverless, see the Happy Path series. For more serverless learning resources, visit https://serverlessland.com.

Building resilient serverless patterns by combining messaging services

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-resilient-no-code-serverless-patterns-by-combining-messaging-services/

In “Choosing between messaging services for serverless applications”, I explain the features and differences between the core AWS messaging services. Amazon SQS, Amazon SNS, and Amazon EventBridge provide queues, publish/subscribe, and event bus functionality for your applications. Individually, these are robust, scalable services that are fundamental building blocks of serverless architectures.

However, you can also combine these services to solve specific challenges in distributed architectures. By doing this, you can use specific features of each service to build sophisticated patterns with little code. These combinations can make your applications more resilient and scalable, and reduce the amount of custom logic and architecture in your workload.

In this blog post, I highlight several important patterns for serverless developers. I also show how you use and deploy these integrations with the AWS Serverless Application Model (AWS SAM).

Examples in this post refer to code that can be downloaded from this GitHub repo. The README.md file explains how to deploy and run each example.

SNS to SQS: Adding resilience and throttling to message throughput

SNS has a robust retry policy that results in up to 100,010 delivery attempts over 23 days. If a downstream service is unavailable, it may be overwhelmed by retries when it comes back online. You can solve this issue by adding an SQS queue.

Adding an SQS queue between the SNS topic and its subscriber has two benefits. First, it adds resilience to message delivery, since the messages are durably stored in a queue. Second, it throttles the rate of messages to the consumer, helping smooth out traffic bursts caused by the service catching up with missed messages.

To build this in an AWS SAM template, you first define the two resources, and the SNS subscription:

  MySqsQueue:
    Type: AWS::SQS::Queue

  MySnsTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: sqs
          Endpoint: !GetAtt MySqsQueue.Arn

Finally, you provide permission to the SNS topic to publish to the queue, using the AWS::SQS::QueuePolicy resource:

  SnsToSqsPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: "Allow SNS publish to SQS"
            Effect: Allow
            Principal: "*"
            Resource: !GetAtt MySqsQueue.Arn
            Action: SQS:SendMessage
            Condition:
              ArnEquals:
                aws:SourceArn: !Ref MySnsTopic
      Queues:
        - Ref: MySqsQueue

To test this, you can publish a message to the SNS topic and then inspect the SQS queue length using the AWS CLI:

aws sns publish --topic-arn "arn:aws:sns:us-east-1:123456789012:sns-sqs-MySnsTopic-ABC123ABC" --message "Test message"
aws sqs get-queue-attributes --queue-url "https://sqs.us-east-1.amazonaws.com/123456789012/sns-sqs-MySqsQueue- ABC123ABC " --attribute-names ApproximateNumberOfMessages

This results in the following output:

CLI output

Another usage of this pattern is when you want to filter messages in architectures using an SQS queue. By placing the SNS topic in front of the queue, you can use the message filtering capabilities of SNS. This ensures that only the messages you need are published to the queue. To use message filtering in AWS SAM, use the AWS:SNS:Subcription resource:

  QueueSubcription:
    Type: 'AWS::SNS::Subscription'
    Properties:
      TopicArn: !Ref MySnsTopic
      Endpoint: !GetAtt MySqsQueue.Arn
      Protocol: sqs
      FilterPolicy:
        type:
        - orders
        - payments 
      RawMessageDelivery: 'true'

EventBridge to SNS: combining features of both services

Both SNS and EventBridge have different characteristics in terms of targets, and integration with broader features. This table compares the major differences between the two services:

Amazon SNS Amazon EventBridge
Number of targets 10 million (soft) 5
Limits 100,000 topics. 12,500,000 subscriptions per topic. 100 event buses. 300 rules per event bus.
Input transformation No Yes – see details.
Message filtering Yes – see details. Yes, including IP address matching – see details.
Format Raw or JSON JSON
Receive events from AWS CloudTrail No Yes
Targets HTTP(S), SMS, SNS Mobile Push, Email/Email-JSON, SQS, Lambda functions 15 targets including AWS LambdaAmazon SQSAmazon SNSAWS Step FunctionsAmazon Kinesis Data StreamsAmazon Kinesis Data Firehose.
SaaS integration No Yes – see integration partners.
Schema Registry integration No Yes – see details.
Dead-letter queues supported Yes No
Public visibility Can create public topics Cannot create public buses
Cross-Region You can subscribe your AWS Lambda functions to an Amazon SNS topic in any Region. Targets must be same Region. You can publish across Region to another event bus.

In this pattern, you configure an SNS topic as a target of an EventBridge rule:

SNS topic as a target for an EventBridge rule

In the AWS SAM template, you declare the resources in the preceding diagram as follows:

Resources:
  MySnsTopic:
    Type: AWS::SNS::Topic

  EventRule: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "EventRule"
      EventPattern: 
        account: 
          - !Sub '${AWS::AccountId}'
        source:
          - "demo.cli"
      Targets: 
        - Arn: !Ref MySnsTopic
          Id: "SNStopic"

The default bus already exists in every AWS account, so there is no need to declare it. For the event bus to publish matching events to the SNS topic, you define permissions using the AWS::SNS::TopicPolicy resource:

  EventBridgeToToSnsPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties: 
      PolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: sns:Publish
          Resource: !Ref MySnsTopic
      Topics: 
        - !Ref MySnsTopic       

EventBridge has a limit of five targets per rule. In cases where you must send events to hundreds or thousands of targets, publishing to SNS first and then subscribing those targets to the topic works around this limit. Both services have different targets, and this pattern allows you to deliver EventBridge events to SMS, HTTP(s), email and SNS mobile push.

You can transform and filter the message using these services, often without needing an AWS Lambda function. SNS does not support input transformation but you can do this in an EventBridge rule. Message filtering is possible in both services but EventBridge provides richer content filtering capabilities.

AWS CloudTrail can log and monitor activity across services in your AWS account. It can be a useful source for events, allowing you to respond dynamically to objects in Amazon S3 or react to changes in your environment, for example. This natively integrates with EventBridge, allowing you to ingest events at scale from dozens of services.

Using EventBridge enables you to source events from outside your AWS account, offering integrations with a list of software as a service (SaaS) providers. This capability allows you to receive events from your accounts with SaaS providers like Zendesk, PagerDuty, and Auth0. These events are delivered to a partner event bus in your account, and can then be filtered and routed to an SNS topic.

Additionally, this pattern allows you to deliver events to Lambda functions in other AWS accounts and in other AWS Regions. You can invoke Lambda from SNS topics in other Regions and accounts. It’s also possible to make SNS topics publicly read-only, making them extensible endpoints that other third parties can consume from. SNS has comprehensive access control, which you can incorporate into this pattern.

Cross-account publishing

EventBridge to SQS: Building fault-tolerant microservices

EventBridge can route events to targets such as microservices. In the case of downstream failures, the service retries events for up to 24 hours. For workloads where you need a longer period of time to store and retry messages, you can deliver the events to an SQS queue in each microservice. This durably stores those events until the downstream service recovers. Additionally, this pattern protects the microservice from large bursts of traffic by throttling the delivery of messages.

Fault-tolerant microservices architecture

The resources declared in the AWS SAM template are similar to the previous examples, but it uses the AWS::SQS::QueuePolicy resource to grant the appropriate permission to EventBridge:

  EventBridgeToToSqsPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: SQS:SendMessage
          Resource:  !GetAtt MySqsQueue.Arn
      Queues:
        - Ref: MySqsQueue

Conclusion

You can combine these services in your architectures to implement patterns that solve complex challenges, often with little code required. This blog post shows three examples that implement message throttling and queueing, integrating SNS and EventBridge, and building fault tolerant microservices.

To learn more building decoupled architectures, see this Learning Path series on EventBridge. For more serverless learning resources, visit https://serverlessland.com.