Tag Archives: AWS Lambda

Capturing client events using Amazon API Gateway and Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/capturing-client-events-using-amazon-api-gateway-and-amazon-eventbridge/

This post is written by Tim Bruce, Senior Solutions Architect, DevAx.

Event producers are one of the three main components in an event-driven architecture. Event producers create and publish events to event routers, which send them to event consumers. Any portion of a system, including a mobile or web client, can be an event producer.

To extend the event model to your mobile and web clients, you must implement standards for security, messaging formats, and event storage.

This post shows how to build a client-enabled event-handling solution. It uses Amazon EventBridge, Amazon API Gateway, AWS Lambda, and Amazon Cognito. This architecture supports routing client events to internal and external destinations. It provides a blueprint that you can use to simplify the integration.

Overview

This example creates a RESTful API using API Gateway. It sends events directly to EventBridge without the need for compute services. In production, you have more requirements than only receiving and forwarding events. Additional requirements include security, user identification, validation, enrichment, transformation, event forwarding, and storing.

In this example, API Gateway provides security and user identification by invoking a Lambda authorizer. The authorizer generates a policy and returns client identification to API Gateway. API Gateway then performs request validation and message enrichment before forwarding the events to EventBridge.

EventBridge evaluates the events against rules and forwards the events to targets. The rules apply transformation to the events and forward an event to up to five targets. Targets include AWS services, such as Amazon Kinesis Data Firehose, and many third-party solutions, such as Zendesk, with HTTPS endpoints.

Lastly, Kinesis Data Firehose provides a cost-effective solution to store events into an Amazon S3 bucket. Before storing the events, Kinesis Data Firehose transforms records via Lambda transformers. It also partitions records using data in the record or calculated data via a Lambda function. Kinesis Data Firehose uses this partitioning data to create keys in the bucket and store matching records within the keys.

Example architecture


The example consists of the following resources defined in the AWS SAM template:

Data flow


  1. Application clients collect or generate the events.
  2. The client sends the events to API Gateway as URL-encoded JSON. The client includes the user’s JWT in an authorization header with the request for validation.
  3. The Lambda authorizer validates the JWT with Amazon Cognito and returns the user’s unique clientId value to API Gateway.
  4. API Gateway transforms the request into events, appending clientId, the bus name, and environment.
  5. API Gateway sends the events to EventBridge.
  6. EventBridge rules match the events and:
    1. Forward all client events to Kinesis Data Firehose.
    2. Forward client events with a detail.eventType of “loyaltypurchase” to Zendesk.
  7. Kinesis Data Firehose receives the records.
  8. The Kinesis Data Firehose data transformation processes each record, moving the client ID to the detail object.
  9. Kinesis Data Firehose partitions the records and stores them in an S3 bucket.

Overall design

The following sections discuss details of the solution, starting from the event in a web or mobile client. This solution requires the client to create an HTTPS request, including the user’s JWT as an authorization header.

{"entries": [{"entry": "{\"eventType\": \"searching\", \"schemaVersion\":1, \"data\": {\"searchTerm\":\"games\"}}"}]}

The preceding JSON shows a sample request body for this solution. The top-level item “entries” is an array of “entry” items. API Gateway will translate each “entry” to the event-detail field in EventBridge events. The client must escape the data for “entry” to prevent translation errors.
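As a minimal sketch of the escaping described above, a Python client could serialize each event to a JSON string first and then serialize the wrapper, so that each “entry” value is an escaped string rather than a nested object. The helper name build_request_body is hypothetical and is not part of the sample repository.

```python
import json


def build_request_body(events):
    """Wrap each event as an escaped JSON string under "entry" (hypothetical helper)."""
    # First json.dumps: turn the event into a string. Second json.dumps: escape
    # that string inside the outer document.
    entries = [{"entry": json.dumps(event)} for event in events]
    return json.dumps({"entries": entries})


body = build_request_body(
    [{"eventType": "searching", "schemaVersion": 1, "data": {"searchTerm": "games"}}]
)
print(body)
```

Serializing twice avoids hand-written backslashes, which are an easy source of the translation errors mentioned above.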

API Gateway and Lambda authorizer

API Gateway receives the request and validates the JWT by invoking the Lambda authorizer. The authorizer generates a policy allowing the request for valid tokens. It adds the Amazon Cognito “custom:clientId” custom attribute to the response context before returning the response to API Gateway. The “custom:clientId” attribute is a unique client identifier in the form of a UUID that downstream systems can use to retrieve data about the customer.
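The authorizer response described above can be sketched as follows. This is an illustration, not the code from the sample repository: JWT verification is assumed to have happened already, and the function names and the context key name clientId are assumptions made for this example.

```python
def build_authorizer_response(principal_id, method_arn, client_id, allow=True):
    """Return an API Gateway Lambda authorizer response with the client ID in context."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow" if allow else "Deny",
                    "Resource": method_arn,
                }
            ],
        },
        # Context values are returned to API Gateway along with the policy.
        # The key name "clientId" is an assumption for this sketch.
        "context": {"clientId": client_id},
    }


def handler_sketch(claims, method_arn):
    """claims: JWT claims that have already been verified (out of scope here)."""
    return build_authorizer_response(
        principal_id=claims["sub"],
        method_arn=method_arn,
        client_id=claims["custom:clientId"],
    )
```

API Gateway can then read the context values when it builds the request for the downstream integration.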

API Gateway validates the request by matching the request body against a model. Models represent what a request should look like. A mapping template then transforms valid requests to the format required by EventBridge. Mapping templates use velocity templating language (VTL) to do this.

VTL template
This mapping template uses a #foreach loop to process the array “entries” from the request body. The process enriches each event with the user’s “custom:clientId” and stage variables for bus name and environment from API Gateway.

Integration request

The preceding API Gateway AWS integration enables API Gateway to send the events to EventBridge without using compute services, such as Lambda or Amazon EC2. The integration and IAM execution role enable API Gateway to call the EventBridge PutEvents API to do this.

EventBridge rules and transformations

EventBridge rules match events against criteria, transform the events, and forward the events to targets. There are two rules in this example. One processes events for Zendesk tickets and the other forwards data to Kinesis Data Firehose to store events for triage and analytics.

This example creates service tickets in the Zendesk ticketing system. The tickets trigger agents to contact customers who are expecting a call to complete their purchases. By sending the event directly, the software client reduces time-to-action for back-office processes and helps improve customer satisfaction.

Matching EventBridge rule

This rule matches client event messages for loyalty purchases and forwards details to the Zendesk API. The rule includes a transformation, which selects a portion of the event before sending the information to the target.
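To illustrate how such a rule selects events, the following sketch pairs an event pattern on detail.eventType with a simplified matcher. The matcher is a teaching aid that handles only exact-value matching, not the EventBridge implementation, and the pattern shown is an assumption about how the rule in the sample could be written.

```python
def matches(pattern, event):
    """Simplified EventBridge-style match: every pattern key must match the event.

    Leaf values in the pattern are lists of allowed values; nested dicts recurse.
    Real EventBridge patterns support more operators (prefix, numeric, and others).
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Assumed pattern for the loyalty purchase rule.
loyalty_pattern = {"detail": {"eventType": ["loyaltypurchase"]}}

print(matches(loyalty_pattern, {"detail": {"eventType": "loyaltypurchase"}}))
print(matches(loyalty_pattern, {"detail": {"eventType": "searching"}}))
```

Fields that the pattern does not mention are ignored, which is why one event can match both the loyalty purchase rule and the rule for all client events.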

EventBridge uses an API destination to store details about the HTTP endpoint and usage policies. Additionally, an EventBridge connection and an AWS Secrets Manager secret store details. These include the authentication policy and authentication credentials to connect to the API destination.

Zendesk dashboard

Successfully processed events open tickets in Zendesk using the API destination. Agents now have a list of customers to contact.

Enterprises often require storing the events for troubleshooting or analytics. EventBridge does not include a newline between records when forwarding events to Kinesis Data Firehose. Because of this, it may be more challenging to discern each record when analyzing the data.

Rule to transform events
A rule for all client events changes this behavior. This AWS CloudFormation snippet defines the rule that will transform each event, adding a new line after each. The “\n” character in the InputTemplate field adds the separator between records before forwarding the data to Kinesis Data Firehose.

Afterward, Kinesis Data Firehose receives each record separated by a new line, enabling both triage and analytics without extra overhead.
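With the separator in place, a stored object can be read as newline-delimited JSON. The sketch below shows one way to parse such a blob in Python; the sample records are invented for illustration.

```python
import json


def parse_records(blob):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in blob.split("\n") if line.strip()]


# Two events, each followed by the "\n" separator added by the rule.
delimited = '{"eventType": "searching"}\n{"eventType": "loyaltypurchase"}\n'
records = parse_records(delimited)
print([record["eventType"] for record in records])
```

Without the separator, the same two events would arrive as one continuous string, and a line-based reader could not tell where one record ends and the next begins.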

Kinesis Data Firehose to S3

Kinesis Data Firehose is a cost-effective way to batch and write records to S3. It offers optional transformation capabilities by invoking a Lambda function. This example uses a Lambda function that moves the “clientId” field to the detail section of the event record.
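A transformation function of this kind could look like the following sketch. It follows the Kinesis Data Firehose data transformation contract (base64-encoded data in, and a recordId, result, and data out for every record). Where the client ID sits in the incoming record is not shown in this post, so the sketch assumes a top-level clientId key; the function in the sample repository may differ.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose data transformation sketch: move the client ID into detail."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Assumption: clientId is a top-level key and detail is a JSON object.
            payload["detail"]["clientId"] = payload.pop("clientId")
            # Re-append the newline so records stay separated in S3.
            data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data.decode("utf-8")}
            )
        except (KeyError, TypeError, ValueError):
            # Return the original data and let Firehose handle the failed record.
            output.append(
                {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}
            )
    return {"records": output}
```

Every incoming recordId must appear in the response, which is why the failure path still returns an entry for the record.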


Kinesis Data Firehose also supports dynamic partitioning of records when writing to S3. It selects data from the records or data calculated by a Lambda function. In this example, it selects data from the records to store data in separate folders in S3.

Event durability considerations

You can extend this example using an EventBridge archive and Amazon Kinesis Data Streams. Archiving allows you to create an encrypted archive of matching events. You can define the data retention in days, from one through indefinite. You can replay events from your archive when you must re-process data.

Kinesis Data Streams is a serverless data streaming solution. The EventBridge rule for all records can forward data to Kinesis Data Streams instead of Kinesis Data Firehose. Multiple applications can consume the Kinesis Data Streams. Kinesis Data Firehose would consume this stream of data and store it in S3.

Prerequisites

You need the following prerequisites to deploy the example solution:

Implementation

The full source of the solution is in the GitHub repository and is deployed with AWS SAM.

  1. Create a Secrets Manager secret using the AWS CLI:
    aws secretsmanager create-secret --name proto/Zendesk --secret-string '{"username":"<YOUR EMAIL>","apiKey":"<YOUR APIKEY>"}'
  2. Clone the solution repository using git:
    git clone https://github.com/aws-samples/client-event-sample
  3. Build the AWS SAM project:
    sam build --use-container
  4. Deploy the project using AWS SAM:
    sam deploy --guided --capabilities CAPABILITY_NAMED_IAM
  5. From the outputs from the deployment, set the following shell variables:
    APPCLIENTID=<output APPCLIENTID>
    APIID=<output APIID>
    REGION=<region you deployed to>
  6. Create a user in Amazon Cognito using the AWS CLI:
    aws cognito-idp sign-up --client-id $APPCLIENTID --username <YOUR USER ID> --password <YOUR PASSWORD> --user-attributes Name=email,Value=<YOUR EMAIL>
  7. After you receive the confirmation code, confirm the user using the AWS CLI:
    aws cognito-idp confirm-sign-up --client-id $APPCLIENTID --username <userid> --confirmation-code <confirmation code>
  8. Test the user login with the AWS CLI:
    aws cognito-idp initiate-auth --auth-flow USER_PASSWORD_AUTH --client-id $APPCLIENTID --auth-parameters USERNAME=<YOUR USER ID>,PASSWORD=<YOUR PASSWORD>

If successful, this returns a JSON web token (JWT).

Testing the client event solution

  1. The sample repository includes an event generator in the util directory. The generator uses your credentials and simulates events from a user’s software client. From the utils directory, run the generator:
    python3 generator.py --minutes <minutes to run generator> --batch <batch size from 1-10> --errors <True|False> --userid <YOUR USER ID> --password <YOUR PASSWORD> --region $REGION --appclientid $APPCLIENTID --apiid $APIID
  2. Log in to your Zendesk console and view the created tickets.
  3. After five minutes, review the “clientevents” bucket to view the event records.

Cleaning up

To remove the example:

  1. Delete the data stored in the clientevents buckets created from the template.
  2. Delete the stack using the command:
    sam delete --stack-name clientevents
  3. Delete the secret using the command:
    aws secretsmanager delete-secret --secret-id <arn of secret>

Conclusion

This post shows how to send client events to an API and EventBridge to enable new customer experiences. The example gives software clients a way to send events with minimal custom code. This blueprint shows how you can include client events in your solution, featuring validation, enrichment, transformation, and storage.

You can modify the example code provided here for use in your organization. This enables your client software to register events without modifying backend code.

For more serverless learning resources, visit Serverless Land.

Automating Anomaly Detection in Ecommerce Traffic Patterns

Post Syndicated from Aditya Pendyala original https://aws.amazon.com/blogs/architecture/automating-anomaly-detection-in-ecommerce-traffic-patterns/

Many organizations with large ecommerce presences have procedures to detect major anomalies in their user traffic. Often, these processes use static alerts or manual monitoring. However, detecting minor anomalies in traffic patterns in near real time can be challenging. Early detection of these minor anomalies in ecommerce traffic (such as website page visits and order completions) helps organizations take corrective actions to address issues. This decreases negative impacts to business key performance indicators (KPIs).

In this blog post, we will demonstrate an artificial intelligence/machine learning (AI/ML) solution using AWS services. We’ll show how Amazon Kinesis and Amazon Lookout for Metrics can be used to detect major and minor anomalies in near real time, based on historical and current traffic trends.

The inconsistency of ecommerce traffic

Ecommerce traffic (and the number of orders placed) varies based on season, month, date, and time of day. For example, ecommerce websites experience higher traffic during weekday evening hours than during morning hours. Similarly, there is a spike in web traffic on weekends, compared to weekdays. However, ecommerce traffic on holiday events (for example, Black Friday and Cyber Monday) does not follow this trend. Due to such dynamic and varying patterns, detecting minor anomalies in user traffic in near real time becomes difficult.

We need a smart solution that can detect the smallest deviation in user traffic based on historical data (date and time). As you can imagine, programming these trends based on static rules is time-intensive. In the next section, we discuss a solution that can help organizations automate and detect minor (and major) anomalies while still accounting for varying traffic trends.

The components of our anomaly detection solution

The architecture consists of three functional components:

  • The ecommerce application that customers use for interaction
  • The data ingesting, transforming, and storage platform
  • Anomaly detection and notification

This solution automates data ingestion and anomaly detection, and provides a graphical user interface to interact, tweak, and filter anomalies based on severity.

Figure 1 illustrates the architecture of this solution:

Figure 1. Architecture diagram of an anomaly detection solution for ecommerce traffic


Let’s look at the individual components of this architecture before reviewing the overall solution.

The ecommerce application that customers use for interaction 

A customer’s journey of purchasing a product online involves user actions that include:

  • Searching for and viewing the product on the “Product Display Page” (PDP)
  • Adding to the “cart”
  • Completing the purchase on the “checkout” page

The traffic on these pages is broken down into chunks based on time intervals. These serve as the data points that we can use to understand traffic patterns.
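As a rough illustration of breaking traffic into time intervals, the sketch below counts page events per fixed-width interval. The five-minute interval, the page names, and the function name are assumptions for this example, not values taken from the solution.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_counts(events, interval_seconds=300):
    """Count events per (page, interval start) for timestamped page events."""
    counts = Counter()
    for page, ts in events:
        epoch = int(ts.timestamp())
        # Round down to the start of the interval that contains the event.
        start = epoch - (epoch % interval_seconds)
        counts[(page, datetime.fromtimestamp(start, tz=timezone.utc))] += 1
    return counts


def at(hour, minute):
    """Hypothetical helper: a UTC timestamp on an example day."""
    return datetime(2021, 11, 26, hour, minute, tzinfo=timezone.utc)


counts = bucket_counts(
    [("pdp", at(12, 1)), ("pdp", at(12, 4)), ("pdp", at(12, 6)), ("checkout", at(12, 6))]
)
for (page, start), count in sorted(counts.items()):
    print(page, start.isoformat(), count)
```

Each (page, interval) count is one data point; a series of these points over time is the kind of metric an anomaly detector can learn from.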

The data ingesting, transforming, and storage platform

Ecommerce applications generate data in multiple formats and in different volumes. This data must be fed into a streaming platform that can ingest and collect data continuously. Typically, the data must be transformed and stored for analysis and machine learning purposes. To satisfy these requirements, we will use Amazon Kinesis Data Streams as a streaming platform for data ingestion. Amazon Kinesis Data Firehose with AWS Lambda can transform the data. And we’ll store the data in Amazon Simple Storage Service (S3).

Anomaly detection and notification in near real time

Once our data is ready, we must analyze it in near real time to identify anomalies. We must notify the concerned team about any anomaly so that they can take corrective action, if needed. We will use Lookout for Metrics and Amazon Simple Notification Service (SNS) to satisfy these requirements.

Lookout for Metrics can detect and diagnose anomalies in traffic patterns using ML. Amazon Lookout for Metrics accepts feedback on detected anomalies and tunes the results to improve accuracy over time. Lookout for Metrics is also capable of integrating with Amazon SNS, which can send notifications via SMS, mobile push, and emails.

Monitoring ecommerce traffic with Lookout for Metrics

As shown in Figure 1, data from user traffic and user interactions with the ecommerce application is captured as a function of time, and ingested into Kinesis Data Streams. Using Kinesis Data Firehose and Lambda, data is transformed and stored in an S3 bucket. We then create a detector in Lookout for Metrics and use the S3 bucket as the data source. Because of the integration between S3 and Lookout for Metrics, data from the S3 bucket is automatically ingested into the detector we created.

Once the detector is activated, Lookout for Metrics starts monitoring the data and identifying anomalies in near real time. Lookout for Metrics also provides a mechanism to adjust the severity threshold on a scale of 0-100, which helps decrease false positives as much as desired. In addition, it integrates with SNS, and can publish notifications to an SNS topic. An email, SMS, or mobile push subscription can be created on this topic to notify users about any current anomalies.

Conclusion

In this post, we discussed why minor anomalies in an organization’s ecommerce traffic are hard to detect in near real time. We also discussed the services that can be used to monitor these anomalies, such as Lookout for Metrics. Use this architecture to help you monitor traffic, detect anomalies in near real time, and reduce any negative impact to your business KPIs.


Automate Amazon Connect Data Streaming using AWS CDK

Post Syndicated from Tarik Makota original https://aws.amazon.com/blogs/architecture/automate-amazon-connect-data-streaming-using-aws-cdk/

Many customers want to provision Amazon Web Services (AWS) cloud resources quickly and consistently with lifecycle management, by treating infrastructure as code (IaC). Commonly used services are AWS CloudFormation and HashiCorp Terraform. Currently, customers set up Amazon Connect data streaming manually, as the service is not available under CloudFormation resource types. Customers may want to extend it to retrieve real-time contact and agent data. Integration is done manually and can result in issues with IaC.

Amazon Connect contact trace records (CTRs) capture the events associated with a contact in the contact center. Amazon Connect agent event streams are Amazon Kinesis Data Streams that provide near real-time reporting of agent activity within the Amazon Connect instance. The events published to the stream include these contact control panel (CCP) events:

  • Agent login
  • Agent logout
  • Agent connects with a contact
  • Agent status change, such as to available to handle contacts, on break, or in training

In this blog post, we will show you how to automate Amazon Connect data streaming using AWS Cloud Development Kit (AWS CDK). AWS CDK is an open source software development framework to define your cloud application resources using familiar programming languages. We will create a custom CDK resource, which in turn uses the Amazon Connect API. This can be used as a template to automate other parts of Amazon Connect, or for other AWS services that don’t expose their full functionality through CloudFormation.

Overview of Amazon Connect automation solution

Amazon Connect is an omnichannel cloud contact center that helps you provide superior customer service. We will stream Amazon Connect agent activity and contact trace records to Amazon Kinesis. We will assume that data will then be used by other services or third-party integrations for processing. Here are the high-level steps and AWS services that we are going to use (see Figure 1):

  1. Amazon Connect: We will create an instance and enable data streaming
  2. AWS Cloud Development Kit (AWS CDK): We will create a custom resource and orchestrate the automation
  3. Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose: To stream data out of Connect
  4. AWS Identity and Access Management (IAM): To govern access and permissible actions across all AWS services
  5. Third-party tool or Amazon S3: Used as a destination for Connect data via Amazon Kinesis

Figure 1. Connect data streaming automation workflow

Walkthrough and deployment tasks

Sample code for this solution is provided in this GitHub repo. The code is packaged as a CDK application, so the solution can be deployed in minutes. The deployment tasks are as follows:

  • Deploy the CDK app
  • Update Amazon Connect instance settings
  • Import the demo flow and data

Custom resources enable you to write custom logic in your CloudFormation deployment. You implement the creation, update, and deletion logic to define the custom resource deployment.

CDK implements AwsCustomResource, which is an AWS Lambda-backed custom resource that uses the AWS SDK to provision your resources. This means that the CDK stack deploys a provisioning Lambda function. Upon deployment, it calls the AWS SDK API operations that you defined for the resource lifecycle (create, update, and delete).
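This post does not show which SDK operation the custom resource calls. Assuming it uses the Amazon Connect AssociateInstanceStorageConfig operation, the parameters for the create step could be assembled as in this sketch. The helper name and the example ARN are hypothetical; the sample repository is the reference for the actual parameters.

```python
def storage_config_params(instance_id, resource_type, stream_arn, stream_type="KINESIS_STREAM"):
    """Build parameters for the Amazon Connect AssociateInstanceStorageConfig operation."""
    if stream_type == "KINESIS_STREAM":
        storage = {"StorageType": stream_type, "KinesisStreamConfig": {"StreamArn": stream_arn}}
    elif stream_type == "KINESIS_FIREHOSE":
        storage = {"StorageType": stream_type, "KinesisFirehoseConfig": {"FirehoseArn": stream_arn}}
    else:
        raise ValueError(f"unsupported stream type: {stream_type}")
    return {"InstanceId": instance_id, "ResourceType": resource_type, "StorageConfig": storage}


# Hypothetical values for illustration only.
ctr_params = storage_config_params(
    "YOUR-AMAZON-CONNECT-INSTANCE-ID",
    "CONTACT_TRACE_RECORDS",
    "arn:aws:kinesis:us-east-1:123456789012:stream/CTRStream",
)
print(ctr_params)
```

A dictionary like this would be passed as the parameters of the custom resource's create call, with a matching call on delete to remove the association.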

Prerequisites

For this walkthrough, you need the following prerequisites:

Deploy and verify

1. Deploy the CDK application.

The resources required for this demo are packaged as a CDK app. Before proceeding, confirm you have command line interface (CLI) access to the AWS account where you would like to deploy your solution.

  • Open a terminal window and clone the GitHub repository in a directory of your choice:
    git clone git@github.com:aws-samples/connect-cdk-blog
  • Navigate to the cdk-app directory and follow the deployment instructions. The default Region is usually us-east-1. If you would like to deploy in another Region, you can run:
    export AWS_DEFAULT_REGION=eu-central-1

2. Create the CloudFormation stack by initiating the following commands.

source .env/bin/activate
pip install -r requirements.txt
cdk synth
cdk bootstrap
cdk deploy --parameters instanceId={YOUR-AMAZON-CONNECT-INSTANCE-ID} \
  --parameters ctrStreamName={CTRStream} \
  --parameters agentStreamName={AgentStream}

Note: By default, the stack will create contact trace records stream [ctrStreamName] as a Kinesis Data Stream. If you want to use an Amazon Kinesis Data Firehose delivery stream instead, you can modify this behavior by going to cdk.json and adding “ctr_stream_type”: “KINESIS_FIREHOSE” as a parameter under “context.”

Once the status of CloudFormation stack is updated to CREATE_COMPLETE, the following resources are created:

  • Kinesis Data Stream
  • IAM roles
  • Lambda

3. Verify the integration.

  • Kinesis Data Streams are added to the Amazon Connect instance

Figure 2. Screenshot of Amazon Connect with Data Streaming enabled

Cleaning up

You can remove all resources provisioned for the CDK app by running the following command from the connect-app directory:

cdk destroy

This will not remove your Amazon Connect instance. You can remove it by navigating to the AWS Management Console -> Services -> Amazon Connect. Find your Connect instance and click Delete.

Conclusion

In this blog post, we demonstrated how to maintain Amazon Connect as infrastructure as code (IaC). Using an AWS CDK custom resource, we have shown how to automate adding Amazon Kinesis Data Streams to the data streaming settings in Amazon Connect. The same approach can be extended to automate setting other Amazon Connect properties such as Amazon Lex, AWS Lambda, Amazon Polly, and Customer Profiles. This approach will help you integrate Amazon Connect with your Workflow Management Application faster and more consistently, and reduce manual configuration.

For more information, refer to Enable Data Streaming for your instance.

Codacy Measures Developer Productivity using AWS Serverless

Post Syndicated from Catarina Gralha original https://aws.amazon.com/blogs/architecture/codacy-measures-developer-productivity-using-aws-serverless/

Codacy is a DevOps insights company based in Lisbon, Portugal. Since its launch in 2012, Codacy has helped software development and engineering teams reduce defects, keep technical debt in check, and ship better code, faster.

Codacy’s latest product, Pulse, is a service that helps understand and improve the performance of software engineering teams. This includes measuring metrics such as deployment frequency, lead time for changes, or mean time to recover. Codacy’s main platform is built on top of AWS products like Amazon Elastic Kubernetes Service (EKS), but they have taken Pulse one step further with AWS serverless.

In this post, we will explore Pulse’s requirements, architecture, and the services it is built on, including AWS Lambda, Amazon API Gateway, and Amazon DynamoDB.

Pulse prototype requirements

Codacy had three clear requirements for their initial Pulse prototype.

  1. The solution must enable the development team to iterate quickly and have minimal time-to-market (TTM) to validate the idea.
  2. The solution must be easily scalable and match the demands of both startups and large enterprises alike. This was of special importance, as Codacy wanted to onboard Pulse with some of their existing customers. At the time, these customers already had massive amounts of information.
  3. The solution must be cost-effective, particularly during the early stages of the product development.

Enter AWS serverless

Codacy could have built Pulse on top of Amazon EC2 instances. However, this brings the undifferentiated heavy lifting of having to provision, secure, and maintain the instances themselves.

AWS serverless technologies are fully managed services that abstract the complexity of infrastructure maintenance away from developers and operators, so they can focus on building products.

Serverless applications also scale elastically and automatically behind the scenes, so customers don’t need to worry about capacity provisioning. Furthermore, these services are highly available by design and span multiple Availability Zones (AZs) within the Region in which they are deployed. This gives customers higher confidence that their systems will continue running even if one Availability Zone is impaired.

AWS serverless technologies are cost-effective too, as they are billed per unit of value, as opposed to billing per provisioned capacity. For example, billing is calculated by the amount of time a function takes to complete or the number of messages published to a queue, rather than how long an EC2 instance runs. Customers only pay when they are getting value out of the services, for example when serving an actual customer request.

Overview of Pulse’s solution architecture

An event is generated when a developer performs a specific action as part of their day-to-day tasks, such as committing code or merging a pull request. These events are the foundational data that Pulse uses to generate insights and are thus processed by multiple Pulse components called modules.

Let’s take a detailed look at a few of them.

Ingestion module

Figure 1. Pulse ingestion module architecture


Figure 1 shows the ingestion module, which is the entry point of events into the Pulse platform and is built on AWS serverless applications as follows:

  • The ingestion API is exposed to customers using Amazon API Gateway. This defines REST, HTTP, and WebSocket APIs with sophisticated functionality such as request validation, rate limiting, and more.
  • The actual business logic of the API is implemented as AWS Lambda functions. Lambda can run custom code in a fully managed way. You only pay for the time that the function takes to run, in 1-millisecond increments. Lambda natively supports multiple languages, but customers can also bring their own runtimes or container images as needed.
  • API requests are authorized with keys, which are stored in Amazon DynamoDB, a key-value NoSQL database that delivers single-digit millisecond latency at any scale. API Gateway invokes a Lambda function that validates the key against those stored in DynamoDB. This function is called a Lambda authorizer.
  • While API Gateway provides a default domain name for each API, Codacy customizes it with Amazon Route 53, a service that registers domain names and configures DNS records. Route 53 offers a service level agreement (SLA) of 100% availability.
  • Events are stored in raw format in Pulse’s data lake, which is built on top of AWS’ object storage service, Amazon Simple Storage Service (S3). With Amazon S3, you can store massive amounts of information at low cost using simple HTTP requests. The data is highly available and durable.
  • Whenever a new event is ingested by the API, a message is published in Pulse’s message bus. (More information later in this post.)

Events module

Figure 2. Pulse events module architecture


The events module handles the aggregation and storage of events for actual consumption by customers (see Figure 2):

  • Events are consumed from the message bus and processed with a Lambda function, which stores them in Amazon Redshift.
  • Amazon Redshift is AWS’ managed data warehouse, and enables Pulse’s users to get insights and metrics by running analytical (OLAP) queries with the highest performance.
  • These metrics are exposed to customers via another API (the public API), which is also built on API Gateway.
  • The business logic for this API is implemented using Lambda functions, like the Ingestion module.

Message bus

Figure 3. Message bus architecture


We mentioned earlier that Pulse’s modules communicate messages with each other via the “message bus.” When something occurs at a specific component, a message (event) is published to the bus. At the same time, developers create subscriptions for each module that should receive these messages. This is known as the publisher/subscriber pattern (pub/sub for short), and is a fundamental piece of event-driven architectures.

With the message bus, you can decouple all modules from each other. In this way, a publisher does not need to worry about how many or who their subscribers are, or what to do if a new one arrives. This is all handled by the message bus.

Pulse’s message bus is built as follows (see Figure 3):

  • Events are published via Amazon Simple Notification Service (SNS), using a construct called a topic. Topics are the basic unit of message publication and consumption. Components are subscribed to this topic, and you can filter out unwanted messages.
  • Developers configure Amazon SNS subscriptions to have the events sent to a queue, which provides a buffering layer from which workers can process messages. At the same time, queues also ensure that messages are not lost if there is an error. In Pulse’s case, these queues are implemented with Amazon Simple Queue Service (SQS).
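The topic, subscription, and queue roles above can be sketched with an in-memory stand-in. This is a conceptual model only: it does not call Amazon SNS or Amazon SQS, and the class, queue names, and message shapes are invented for illustration.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a topic that fans out to subscribed queues."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, queue, accepts=lambda message: True):
        """Attach a queue; accepts filters out unwanted messages."""
        self._subscriptions.append((queue, accepts))

    def publish(self, message):
        """Deliver the message to every queue whose filter accepts it."""
        for queue, accepts in self._subscriptions:
            if accepts(message):
                queue.append(message)


events_queue = deque()   # hypothetical subscriber that wants every event
merges_queue = deque()   # hypothetical subscriber that only wants merges

topic = Topic()
topic.subscribe(events_queue)
topic.subscribe(merges_queue, accepts=lambda m: m["type"] == "pull_request_merged")

topic.publish({"type": "commit", "id": 1})
topic.publish({"type": "pull_request_merged", "id": 2})
print(len(events_queue), len(merges_queue))
```

The publisher only knows about the topic, so adding a module means adding a subscription rather than changing the publisher.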

Other modules

There are other parts of Pulse architecture that also use AWS serverless. For example, user authentication and sign-up are handled by Amazon Cognito, and Pulse’s frontend application is hosted on Amazon S3. This app is served to customers worldwide with low latency using Amazon CloudFront, a content delivery network.

Summary and next steps

By using AWS serverless, Codacy has been able to reduce the time required to bring Pulse to market by staying focused on developing business logic, rather than managing servers. Furthermore, Codacy is confident they can handle Pulse’s growth, as this serverless architecture will scale automatically according to demand.

Migrating AWS Lambda functions to Arm-based AWS Graviton2 processors

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-to-arm-based-aws-graviton2-processors/

AWS Lambda now allows you to configure new and existing functions to run on Arm-based AWS Graviton2 processors in addition to x86-based functions. Using this processor architecture option allows you to get up to 34% better price performance. This blog post highlights some considerations when moving from x86 to arm64 as the migration process is code and workload dependent.

Functions using the Arm architecture benefit from the performance and security built into the Graviton2 processor, which is designed to deliver up to 19% better performance for compute-intensive workloads. Workloads using multithreading and multiprocessing, or performing many I/O operations, can experience lower invocation time, which reduces costs.

Duration charges, billed with millisecond granularity, are 20 percent lower when compared to current x86 pricing. This also applies to duration charges when using Provisioned Concurrency. Compute Savings Plans supports Lambda functions powered by Graviton2.
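As a worked example of the duration pricing difference, the following sketch computes monthly duration charges for both architectures. The per-GB-second prices are assumptions for illustration (check current pricing for your Region), and the workload numbers are hypothetical. Request charges and the free tier are ignored.

```python
# Assumed per-GB-second duration prices (USD); verify against current pricing.
X86_PRICE_PER_GB_SECOND = 0.0000166667
ARM_PRICE_PER_GB_SECOND = 0.0000133334


def duration_cost(invocations, avg_duration_ms, memory_mb, price_per_gb_second):
    """Return the duration charge in USD for a month of invocations."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second


# Hypothetical workload: 10M invocations, 120 ms average, 512 MB memory.
x86_cost = duration_cost(10_000_000, 120, 512, X86_PRICE_PER_GB_SECOND)
arm_cost = duration_cost(10_000_000, 120, 512, ARM_PRICE_PER_GB_SECOND)
savings = 1 - arm_cost / x86_cost

print(f"x86: ${x86_cost:.2f}  arm64: ${arm_cost:.2f}  savings: {savings:.0%}")
```

This holds duration constant across architectures. If a function also runs faster on arm64, the shorter duration compounds with the lower price.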

The architecture change does not affect the way your functions are invoked or how they communicate their responses back. Integrations with APIs, services, applications, or tools are not affected by the new architecture and continue to work as before.

The following runtimes, which use Amazon Linux 2, are supported on Arm:

  • Node.js 12 and 14
  • Python 3.8 and 3.9
  • Java 8 (java8.al2) and 11
  • .NET Core 3.1
  • Ruby 2.7
  • Custom runtime (provided.al2)

Lambda@Edge does not support Arm as an architecture option.

You can create and manage Lambda functions powered by Graviton2 processor using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS CloudFormation, AWS Serverless Application Model (AWS SAM), and AWS Cloud Development Kit (AWS CDK). Support is also available through many AWS Lambda Partners.

Understanding Graviton2 processors

AWS Graviton processors are custom built by AWS. Generally, you don’t need to know about the specific Graviton processor architecture, unless your applications can benefit from specific features.

The Graviton2 processor uses the Neoverse-N1 core and supports Arm V8.2 (including CRC and crypto extensions) plus several other architectural extensions. In particular, Graviton2 supports the Large System Extensions (LSE), which improve locking and synchronization performance across large systems.

Migrating x86 Lambda functions to arm64

Many Lambda functions may only need a configuration change to take advantage of the price/performance of Graviton2. Other functions may require repackaging the Lambda function using Arm-specific dependencies, or rebuilding the function binary or container image.

You do not need an Arm processor on your development machine to create Arm-based functions. You can build, test, package, compile, and deploy Arm Lambda functions on x86 machines using AWS SAM and Docker Desktop. If you have an Arm-based system, such as an Apple M1 Mac, you can natively compile binaries.

Functions without architecture-specific dependencies or binaries

If your functions don’t use architecture-specific dependencies or binaries, you can switch from one architecture to the other with a single configuration change. Many functions using interpreted languages such as Node.js and Python, or functions compiled to Java bytecode, can switch without any changes. Ensure you check binaries in dependencies, Lambda layers, and Lambda extensions.

To switch functions from x86 to arm64, you can change the Architecture within the function runtime settings using the Lambda console.

Edit AWS Lambda function Architecture

If you want to display or log the processor architecture from within a Lambda function, you can use language-specific calls, for example, Node.js process.arch or Python platform.machine().
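A minimal Python handler that logs the architecture might look like the following sketch. The handler name and the label mapping are illustrative; platform.machine() typically reports "x86_64" or "aarch64" on Linux.

```python
import platform


def lambda_handler(event, context):
    # platform.machine() reports the hardware name, typically "x86_64" or
    # "aarch64" on Linux; the label mapping below is illustrative.
    machine = platform.machine()
    labels = {"x86_64": "x86_64", "aarch64": "arm64", "arm64": "arm64"}
    architecture = labels.get(machine, machine)
    print(f"Running on {architecture} ({machine})")
    return {"architecture": architecture, "machine": machine}
```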

When using the AWS CLI to create a Lambda function, specify the --architectures option. If you do not specify the architecture, the default value is x86_64. For example, to create an arm64 function, specify --architectures arm64.

aws lambda create-function \
    --function-name MyArmFunction \
    --runtime nodejs14.x \
    --architectures arm64 \
    --memory-size 512 \
    --zip-file fileb://MyArmFunction.zip \
    --handler lambda.handler \
    --role arn:aws:iam::123456789012:role/service-role/MyArmFunction-role

When using AWS SAM or CloudFormation, add or amend the Architectures property within the function configuration.

MyArmFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: nodejs14.x
    Code: src/
    Architectures:
      - arm64
    Handler: lambda.handler
    MemorySize: 512

When initiating an AWS SAM application, you can specify:

sam init --architecture arm64

When building Lambda layers, you can specify CompatibleArchitectures.

MyArmLayer:
  Type: AWS::Serverless::LayerVersion
  Properties:
    ContentUri: layersrc/
    CompatibleArchitectures:
      - arm64

Building function code for Graviton2

If you have dependencies or binaries in your function packages, you must rebuild the function code for the architecture you want to use. Many packages and dependencies have arm64 equivalent versions. Test your own workloads against arm64 packages to see if your workloads are good migration candidates. Not all workloads show improved performance due to the different processor architecture features.

For compiled languages like Rust and Go, you can use the provided.al2 custom runtime, which supports Arm. You provide a binary that communicates with the Lambda Runtime API.

When compiling for Go, set GOARCH to arm64.

GOOS=linux GOARCH=arm64 go build

When compiling for Rust, set the target and the target CPU.

RUSTFLAGS="-C target-cpu=neoverse-n1" cargo build --release --target aarch64-unknown-linux-gnu

The default installation of Python pip on some Linux distributions is out of date (<19.3). To install binary wheel packages released for Graviton, upgrade the pip installation using:

sudo python3 -m pip install --upgrade pip

The Arm software ecosystem is continually improving. As a general rule, use later versions of compilers and language runtimes whenever possible. The AWS Graviton Getting Started GitHub repository includes known recent changes to popular packages that improve performance, including ffmpeg, PHP, .NET, PyTorch, and zlib.

You can use https://pkgs.org/ as a package repository search tool.

Sometimes code includes architecture specific optimizations. These can include code optimized in assembly using specific instructions for CRC, or enabling a feature that works well on particular architectures. One way to see if any optimizations are missing for arm64 is to search the code for __x86_64__ ifdefs and see if there is corresponding arm64 code included. If not, consider alternative solutions.
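The search described above can be automated with a short script. The following is a rough heuristic sketch, not a complete audit: the function name, the file suffix list, and the guard macros it looks for are assumptions, and it only checks whether a file mentions an x86-64 guard without any arm64 guard anywhere in the same file.

```python
import re
import tempfile
from pathlib import Path

X86_GUARD = re.compile(r"__x86_64__|__amd64__")
ARM_GUARD = re.compile(r"__aarch64__|__arm64__")
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".h", ".hpp"}


def files_missing_arm64_paths(root):
    """Return source files that mention an x86-64 guard but no arm64 guard."""
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(errors="ignore")
        if X86_GUARD.search(text) and not ARM_GUARD.search(text):
            flagged.append(path)
    return flagged


# Demonstration on two throwaway files: one x86-only, one with both code paths.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "crc_x86_only.c").write_text(
        "#ifdef __x86_64__\n/* SSE4.2 CRC */\n#endif\n"
    )
    Path(tmp, "crc_portable.c").write_text(
        "#if defined(__x86_64__)\n/* SSE4.2 */\n"
        "#elif defined(__aarch64__)\n/* ARMv8 CRC */\n#endif\n"
    )
    flagged_names = [p.name for p in files_missing_arm64_paths(tmp)]

print(flagged_names)
```

A flagged file is only a candidate for review. It may fall back to portable code that works correctly on arm64, just without the optimized path.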

For additional language-specific considerations, see the links within the GitHub repository.

The Graviton performance runbook is a performance profiling reference from the Graviton team to help you benchmark, debug, and optimize application code.

Building function packages as container images

Functions packaged as container images must be built for the architecture (x86 or arm64) they are going to use. There are arm64 architecture versions of the AWS provided base images for Lambda. To specify a container image for arm64, use the arm64 specific image tag, for example, for Node.js 14:

  • public.ecr.aws/lambda/nodejs:14-arm64
  • public.ecr.aws/lambda/nodejs:latest-arm64
  • public.ecr.aws/lambda/nodejs:14.2021.10.01.16-arm64

Arm64 images are also available from Docker Hub.

You can also use arbitrary Linux base images in addition to the AWS provided Amazon Linux 2 images. Images that support arm64 include Alpine Linux 3.12.7 or later, Debian 10 and 11, Ubuntu 18.04 and 20.04. For more information and details of other supported Linux versions, see Operating systems available for Graviton based instances.

Migrating a function

Here is an example of how to migrate a Lambda function from x86 to arm64 and take advantage of newer software versions to improve price and performance. You can follow a similar approach to test your own code.

I have an existing Lambda function as part of an AWS SAM template configured without an Architectures property, which defaults to x86_64.

  Imagex86Function:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9

The Lambda function code performs some compute intensive image manipulation. The code uses a dependency configured with the following version:

{
  "dependencies": {
    "imagechange": "^1.1.1"
  }
}

I duplicate the Lambda function within the AWS SAM template using the same source code and specify arm64 as the Architectures.

  ImageArm64Function:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9
      Architectures:
        - arm64

I use AWS SAM to build both Lambda functions. I specify the --use-container flag to build each function within its architecture-specific build container.

sam build --use-container

I can use sam local invoke to test the arm64 function locally even on an x86 system.

AWS SAM local invoke

I then use sam deploy to deploy the functions to the AWS Cloud.

The AWS Lambda Power Tuning open-source project runs your functions using different settings to suggest a configuration to minimize costs and maximize performance. The tool allows you to compare two results on the same chart and incorporate arm64-based pricing. This is useful to compare two versions of the same function, one using x86 and the other arm64.

I compare the performance of the x86 and arm64 Lambda functions and see that the arm64 Lambda function is 12% cheaper to run:

Compare x86 and arm64 with dependency version 1.1.1

I then upgrade the package dependency to use version 1.2.1, which has been optimized for arm64 processors.

{
  "dependencies": {
    "imagechange": "^1.2.1"
  }
}

I use sam build and sam deploy to redeploy the updated Lambda functions with the updated dependencies.

I compare the original x86 function with the updated arm64 function. Using arm64 with a newer dependency code version increases the performance by 30% and reduces the cost by 43%.

Compare x86 and arm64 with dependency version 1.2.1

You can use Amazon CloudWatch to view performance metrics such as duration, using statistics. You can then compare average and p99 duration between the two architectures. Due to the Graviton2 architecture, functions may be able to use less memory. This could allow you to right-size function memory configuration, which also reduces costs.
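To make the average and p99 comparison concrete, here is a small sketch that computes both statistics from raw duration samples. The sample values are hypothetical, not measurements from this post, and the nearest-rank percentile used here is a simple approximation that may differ slightly from how CloudWatch calculates percentiles.

```python
from statistics import mean


def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(samples):
    return {"avg_ms": mean(samples), "p99_ms": p99(samples)}


# Hypothetical duration samples in milliseconds.
x86_durations = [100] * 98 + [250, 400]
arm_durations = [90] * 98 + [200, 300]

x86_summary = summarize(x86_durations)
arm_summary = summarize(arm_durations)
print("x86:", x86_summary)
print("arm64:", arm_summary)
```

Looking at p99 alongside the average shows whether the architecture change affects tail latency as well as typical invocations.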

Deploying arm64 functions in production

Once you have confirmed your Lambda function performs successfully on arm64, you can migrate your workloads. You can use function versions and aliases with weighted aliases to control the rollout. Traffic gradually shifts to the arm64 version or rolls back automatically if any specified CloudWatch alarms trigger.
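The weighted alias approach can be sketched as a sequence of alias updates. The helper below only builds the keyword arguments for each step; each dictionary is shaped for the Lambda UpdateAlias operation (for example, the boto3 Lambda client's update_alias method). The function name, alias name, version numbers, and weights are hypothetical, and the alarm checks between steps are omitted.

```python
def rollout_steps(function_name, alias, stable_version, new_version, weights):
    """Yield update_alias keyword arguments that shift traffic step by step.

    The alias keeps pointing at stable_version while new_version receives the
    listed fraction of traffic; the final step promotes new_version fully.
    """
    for weight in weights:
        if not 0 < weight < 1:
            raise ValueError("intermediate weights must be between 0 and 1")
        yield {
            "FunctionName": function_name,
            "Name": alias,
            "FunctionVersion": stable_version,
            "RoutingConfig": {"AdditionalVersionWeights": {new_version: weight}},
        }
    # Promote: point the alias at the new version and clear the extra weights.
    yield {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": new_version,
        "RoutingConfig": {"AdditionalVersionWeights": {}},
    }


# Hypothetical names: version "1" is x86_64, version "2" is the arm64 build.
steps = list(rollout_steps("MyFunction", "live", "1", "2", [0.1, 0.5]))
for step in steps:
    print(step["FunctionVersion"], step["RoutingConfig"])
```

Rolling back at any intermediate step means clearing the additional version weights while leaving the alias on the stable version.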

AWS SAM supports gradual Lambda deployments with a feature called Safe Lambda deployments using AWS CodeDeploy. You can compile package binaries for arm64 using a number of CI/CD systems. AWS CodeBuild supports building Arm based applications natively. CircleCI also has Arm compute resource classes for deployment. GitHub Actions allows you to use self-hosted runners. You can also use AWS SAM within GitHub Actions and other CI/CD pipelines to create arm64 artifacts.

Conclusion

Lambda functions using the Arm/Graviton2 architecture provide up to 34 percent price performance improvement. This blog discusses a number of considerations to help you migrate functions to arm64.

Many functions can migrate seamlessly with a configuration change; others need to be rebuilt to use arm64 packages. I show how to migrate a function and how updating software to newer versions may improve your function performance on arm64. You can test your own functions using the AWS Lambda Power Tuning tool.

Start migrating your Lambda functions to Arm/Graviton2 today.

For more serverless learning resources, visit Serverless Land.

How to enrich AWS Security Hub findings with account metadata

Post Syndicated from Siva Rajamani original https://aws.amazon.com/blogs/security/how-to-enrich-aws-security-hub-findings-with-account-metadata/

In this blog post, we’ll walk you through how to deploy a solution to enrich AWS Security Hub findings with additional account-related metadata, such as the account name, the organizational unit (OU) associated with the account, security contact information, and account tags. Account metadata can help you search findings, create insights, and better respond to and remediate findings.

AWS Security Hub ingests findings from multiple AWS services, including Amazon GuardDuty, Amazon Inspector, Amazon Macie, AWS Firewall Manager, AWS Identity and Access Management (IAM) Access Analyzer, and AWS Systems Manager Patch Manager. Findings from each service are normalized into the AWS Security Finding Format (ASFF), so you can review findings in a standardized format and take action quickly. You can use AWS Security Hub to provide a single view of all security-related findings, and to set up alerts, automate remediation, and export specific findings to third‑party incident management systems.

The Security or DevOps teams responsible for investigating, responding to, and remediating Security Hub findings may need additional account metadata beyond the account ID, to determine what to do about the finding or where to route it. For example, determining whether the finding originated from a development or production account can be key to determining the priority of the finding and the type of remediation action needed. Having this metadata information in the finding allows customers to create custom insights in Security Hub to track which OUs or applications (based on account tags) have the most open security issues. This blog post demonstrates a solution to enrich your findings with account metadata to help your Security and DevOps teams better understand and improve their security posture.

Solution Overview

In this solution, you will use a combination of AWS Security Hub, Amazon EventBridge and AWS Lambda to ingest the findings and automatically enrich them with account related metadata by querying AWS Organizations and Account management service APIs. The solution architecture is shown in Figure 1 below:

Figure 1: Solution Architecture and workflow for metadata enrichment

The solution workflow includes the following steps:

  1. New findings and updates to existing Security Hub findings from all the member accounts flow into the Security Hub administrator account. Security Hub generates Amazon EventBridge events for the findings.
  2. An EventBridge rule created as part of the solution in the Security Hub administrator account will trigger a Lambda function configured as a target every time an EventBridge notification for a new or updated finding imported into Security Hub matches the EventBridge rule shown below:
    {
      "detail-type": ["Security Hub Findings - Imported"],
      "source": ["aws.securityhub"],
      "detail": {
        "findings": {
          "RecordState": ["ACTIVE"],
          "UserDefinedFields": {
            "findingEnriched": [{
              "exists": false
            }]
          }
        }
      }
    }

  3. The Lambda function uses the account ID from the event payload to retrieve both the account information and the alternate contact information from the AWS Organizations and Account Management service APIs. The following code in helper.py constructs the account_details object representing the account information used to enrich the finding:
    def get_account_details(account_id, role_name):
        account_details ={}
        organizations_client = AwsHelper().get_client('organizations')
        response = organizations_client.describe_account(AccountId=account_id)
        account_details["Name"] = response["Account"]["Name"]
        response = organizations_client.list_parents(ChildId=account_id)
        ou_id = response["Parents"][0]["Id"]
        if ou_id and response["Parents"][0]["Type"] == "ORGANIZATIONAL_UNIT":
            response = organizations_client.describe_organizational_unit(OrganizationalUnitId=ou_id)
            account_details["OUName"] = response["OrganizationalUnit"]["Name"]
        elif ou_id:
            account_details["OUName"] = "ROOT"
        if role_name:
            account_client = AwsHelper().get_session_for_role(role_name).client("account")
        else:
            account_client = AwsHelper().get_client('account')
        try:
            response = account_client.get_alternate_contact(
                AccountId=account_id,
                AlternateContactType='SECURITY'
            )
            if response['AlternateContact']:
                print("contact :{}".format(str(response["AlternateContact"])))
                account_details["AlternateContact"] = response["AlternateContact"]
        except account_client.exceptions.AccessDeniedException as error:
            #Potentially due to calling alternate contact on Org Management account
            print(error.response['Error']['Message'])
        
        response = organizations_client.list_tags_for_resource(ResourceId=account_id)
        results = response["Tags"]
        while "NextToken" in response:
            response = organizations_client.list_tags_for_resource(ResourceId=account_id, NextToken=response["NextToken"])
            results.extend(response["Tags"])
        
        account_details["tags"] = results
        AccountHelper.logger.info("account_details: %s" , str(account_details))
        return account_details

  4. The Lambda function updates the finding using the Security Hub BatchUpdateFindings API to add the account-related data into the Note and UserDefinedFields attributes of the Security Hub finding:
    #lookup and build the finding note and user defined fields  based on account Id
    enrichment_text, tags_dict = enrich_finding(account_id, assume_role_name)
    logger.debug("Text to post: %s" , enrichment_text)
    logger.debug("User defined Fields %s" , json.dumps(tags_dict))
    #add the Note to the finding and add a userDefinedField to use in the event bridge rule and prevent repeat lookups
    response = secHubClient.batch_update_findings(
        FindingIdentifiers=[
            {
                'Id': enrichment_finding_id,
                'ProductArn': enrichment_finding_arn
            }
        ],
        Note={
            'Text': enrichment_text,
            'UpdatedBy': enrichment_author
        },
        UserDefinedFields=tags_dict
    )

    Note: All state change events published by AWS services through Amazon EventBridge are free of cost. The AWS Lambda free tier includes 1M free requests per month, and 400,000 GB-seconds of compute time per month at the time of publication of this post. If you process 2M requests per month, the estimated cost for this solution would be approximately $7.20 USD per month.
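The rule in step 2 and the update in step 4 work together: once a finding carries the findingEnriched field, the rule no longer matches it, so the Lambda function is not triggered again for its own update. The following sketch illustrates this with a deliberately simplified matcher. It is not the EventBridge matching engine: the matches function is hypothetical, supports only nested objects, value lists, and the exists operator, and is applied to a single finding rather than a full event. The field value "true" is also an assumption.

```python
def matches(pattern, data):
    """Simplified matcher: nested objects, value lists, and {"exists": bool}."""
    for key, expected in pattern.items():
        if isinstance(expected, dict):
            # Nested pattern: descend; a missing object is treated as empty.
            child = data.get(key, {})
            if not isinstance(child, dict) or not matches(expected, child):
                return False
            continue
        ok = False
        for candidate in expected:
            if isinstance(candidate, dict) and "exists" in candidate:
                ok = ok or ((key in data) == candidate["exists"])
            else:
                ok = ok or (data.get(key) == candidate)
        if not ok:
            return False
    return True


# The "findings" portion of the rule shown in step 2.
finding_pattern = {
    "RecordState": ["ACTIVE"],
    "UserDefinedFields": {"findingEnriched": [{"exists": False}]},
}

new_finding = {"RecordState": "ACTIVE"}
enriched_finding = {
    "RecordState": "ACTIVE",
    "UserDefinedFields": {"findingEnriched": "true"},
}

first_pass = matches(finding_pattern, new_finding)
second_pass = matches(finding_pattern, enriched_finding)
print(first_pass, second_pass)
```

Without the exists check, every BatchUpdateFindings call would produce a new event that matches the rule and trigger another lookup.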

    Prerequisites

    1. Your AWS organization must have all features enabled.
    2. This solution requires that you have AWS Security Hub enabled in an AWS multi-account environment which is integrated with AWS Organizations. The AWS Organizations management account must designate a Security Hub administrator account, which can view data from and manage configuration for its member accounts. Follow these steps to designate a Security Hub administrator account for your AWS organization.
    3. All member accounts must be tagged per your organization’s tagging strategy, and their security alternate contact must be filled in. If the tags or alternate contacts are not available, the enrichment will be limited to the Account Name and the Organizational Unit name.
    4. Trusted access must be enabled with AWS Organizations for AWS Account Management service. This will enable the AWS Organizations management account to call the AWS Account Management API operations (such as GetAlternateContact) for other member accounts in the organization. Trusted access can be enabled either by using AWS Management Console or by using AWS CLI and SDKs.

      The following AWS CLI example enables trusted access for AWS Account Management in the calling account’s organization.

      aws organizations enable-aws-service-access --service-principal account.amazonaws.com

    5. An IAM role with read-only access to look up alternate contact details (GetAlternateContact) must be created in the Organizations management account, with a trust policy that allows the Security Hub administrator account to assume the role.

    Solution Deployment

    This solution consists of two parts:

    1. Create an IAM role in your Organizations management account, giving it necessary permissions as described in the Create the IAM role procedure below.
    2. Deploy the Lambda function and the other associated resources to your Security Hub administrator account.

    Create the IAM role

    Using the console, AWS CLI, or AWS API

    Follow the Creating a role to delegate permissions to an IAM user instructions to create an IAM role named account-contact-readonly in the AWS Organizations management account, using the console, AWS CLI, or AWS API. Base the role on the trust and permission policy templates provided below. You will need the account ID of your Security Hub administrator account.

    The IAM trust policy allows the Security Hub administrator account to assume the role in your Organization management account.

    IAM Role trust policy

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<SH administrator Account ID>:root"
          },
          "Action": "sts:AssumeRole",
          "Condition": {}
        }
      ]
    }

    Note: Replace <SH administrator Account ID> with the account ID of your Security Hub administrator account. Once the solution is deployed, you should update the principal in the trust policy shown above to use the new IAM role created for the solution.
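The two forms of the trust policy (account root before deployment, the solution's role afterward) can be generated with a small helper. This is an illustrative sketch: the build_trust_policy function, the account ID, and the role name are placeholders, not values from the solution.

```python
import json


def build_trust_policy(admin_account_id, role_name=None):
    """Return the trust policy document as a dict.

    With role_name=None the principal is the account root (initial setup);
    otherwise it is narrowed to the role created by the solution stack.
    """
    if role_name is None:
        principal = f"arn:aws:iam::{admin_account_id}:root"
    else:
        principal = f"arn:aws:iam::{admin_account_id}:role/{role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": "sts:AssumeRole",
                "Condition": {},
            }
        ],
    }


# Placeholder account ID and a hypothetical role name, for illustration only.
policy = build_trust_policy("111122223333", "enrichment-lambda-role")
print(json.dumps(policy, indent=2))
```

Narrowing the principal from the account root to a single role limits who in the Security Hub administrator account can assume the role.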

    IAM Permission Policy

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "account:GetAlternateContact"
                ],
                "Resource": "arn:aws:account::<Org. Management Account id>:account/o-*/*"
            }
        ]
    }

    The IAM permission policy allows the Security Hub administrator account to look up the alternate contact information for the member accounts.

    Make a note of the Role ARN for the IAM role similar to this format:

    arn:aws:iam::<Org. Management Account id>:role/account-contact-readonly

    You will need this when deploying the solution in the next procedure.

    Using AWS CloudFormation

    Alternatively, you can use the provided CloudFormation template to create the role in the management account. The IAM role ARN is available in the Outputs section of the created CloudFormation stack.

    Deploy the Solution to your Security Hub administrator account

    You can deploy the solution using either the AWS Management Console, or from the GitHub repository using the AWS SAM CLI.

    Note: If you have designated an aggregation Region within the Security Hub administrator account, you only need to deploy this solution in the aggregation Region. Otherwise, you need to deploy this solution separately in each Region of the Security Hub administrator account where Security Hub is enabled.

    To deploy the solution using the AWS Management Console

    1. In your Security Hub administrator account, launch the template by choosing the Launch Stack button below, which creates the stack in the us-east-1 Region.

      Note: If your Security Hub aggregation Region is not us-east-1, or you want to deploy the solution in a different AWS Region, you can deploy the solution from the GitHub repository as described in the next section.

      Select this image to open a link that starts building the CloudFormation stack

    2. On the Quick create stack page, for Stack name, enter a unique stack name for this account; for example, aws-security-hub-findings-enrichment-stack, as shown in Figure 2 below.
      Figure 2: Quick Create CloudFormation stack for the Solution

    3. For ManagementAccount, enter the AWS Organizations management account ID.
    4. For OrgManagementAccountContactRole, enter the role ARN of the role you created previously in the Create the IAM role procedure.
    5. Choose Create stack.
    6. Once the stack is created, go to the Resources tab and take note of the name of the IAM Role which was created.
    7. Update the principal element of the IAM role trust policy which you previously created in the Organization management account in the Create the IAM role procedure above, replacing it with the role name you noted down, as shown below.
      Figure 3: Update Management Account Role’s Trust

    To deploy the solution from the GitHub Repository and AWS SAM CLI

    1. Install the AWS SAM CLI
    2. Download or clone the GitHub repository using the following commands:
      $ git clone https://github.com/aws-samples/aws-security-hub-findings-account-data-enrichment.git
      $ cd aws-security-hub-findings-account-data-enrichment

    3. Update the content of the profile.txt file with the profile name you want to use for the deployment.
    4. To create a new bucket for deployment artifacts, run create-bucket.sh, specifying the Region as an argument:
      $ ./create-bucket.sh us-east-1

    5. Deploy the solution to the account by running the deploy.sh script, specifying the Region as an argument:
      $ ./deploy.sh us-east-1

    6. Once the stack is created, go to the Resources tab and take note of the name of the IAM Role which was created.
    7. Update the principal element of the IAM role trust policy which you previously created in the Organization management account in the Create the IAM role procedure above, replacing it with the role name you noted down, as shown below.
      "AWS": "arn:aws:iam::<SH administrator Account ID>:role/<Role Name>"

    Using the enriched attributes

    To test that the solution is working as expected, you can create a standalone security group with an ingress rule that allows traffic from the internet. This will trigger a finding in Security Hub, which will be populated with the enriched attributes. You can then use these enriched attributes to filter and create custom insights, or take specific response or remediation actions.

    To generate a sample Security Hub finding using AWS CLI

    1. Create a Security Group using following AWS CLI command:
      aws ec2 create-security-group --group-name TestSecHubEnrichmentSG --description "Test Security Hub enrichment function"

    2. Make a note of the security group ID from the output, and use it in Step 3 below.
    3. Add an ingress rule to the security group which allows unrestricted traffic on port 100:
      aws ec2 authorize-security-group-ingress --group-id <Replace Security group ID> --protocol tcp --port 100 --cidr 0.0.0.0/0

    Within a few minutes, a new finding will be generated in Security Hub, warning about the unrestricted ingress rule in the TestSecHubEnrichmentSG security group. For any new or updated findings which do not have the UserDefinedFields attribute findingEnriched set to true, the solution will enrich the finding with account-related fields in both the Note and UserDefinedFields sections in the Security Hub finding.

    To see and filter the enriched finding

    1. Go to Security Hub and click on Findings on the left-hand navigation.
    2. Click in the filter field at the top to add additional filters. Choose a filter field of AWS Account ID, a filter match type of is, and a value of the AWS Account ID where you created the TestSecHubEnrichmentSG security group.
    3. Add one more filter. Choose a filter field of Resource type, a filter match type of is, and the value of AwsEc2SecurityGroup.
    4. Identify the finding for security group TestSecHubEnrichmentSG with updates to Note and UserDefinedFields, as shown in Figures 4 and 5 below:
      Figure 4: Account metadata enrichment in Security Hub finding’s Note field

      Figure 5: Account metadata enrichment in Security Hub finding’s UserDefinedFields field

      Note: The actual attributes you will see as part of the UserDefinedFields may be different from the above screenshot. Attributes shown will depend on your tagging configuration and the alternate contact configuration. At a minimum, you will see the AccountName and OU fields.

    5. Once you confirm that the solution is working as expected, delete the stand-alone security group TestSecHubEnrichmentSG, which was created for testing purposes.

    Create custom insights using the enriched attributes

    You can use the attributes available in the UserDefinedFields in the Security Hub finding to filter the findings. This lets you generate custom Security Hub insights and reports tailored to suit your organization’s needs. The example shown in Figure 6 below creates a custom Security Hub insight for findings grouped by severity for a specific owner, using the Owner attribute within the UserDefinedFields object of the Security Hub finding.

    Figure 6: Custom Insight with Account metadata filters

    EventBridge rule for response or remediation action using enriched attributes

    You can also use the attributes in the UserDefinedFields object of the Security Hub finding within the EventBridge rule to take specific response or remediation actions based on values in the attributes. In the example below, you can see how the Environment attribute can be used within the EventBridge rule configuration to trigger specific actions only when the value matches PROD.

    {
      "detail-type": ["Security Hub Findings - Imported"],
      "source": ["aws.securityhub"],
      "detail": {
        "findings": {
          "RecordState": ["ACTIVE"],
          "UserDefinedFields": {
            "Environment": ["PROD"]
          }
        }
      }
    }

    Conclusion

    This blog post walks you through a solution to enrich AWS Security Hub findings with AWS account related metadata using Amazon EventBridge notifications and AWS Lambda. By enriching the Security Hub findings with account related information, your security teams have better visibility, additional insights and improved ability to create targeted reports for specific account or business teams, helping them prioritize and improve overall security response. To learn more, see:

 
If you have feedback about this post, submit comments in the Comments section below. If you have any questions about this post, start a thread on the AWS Security Hub forum.


Siva Rajamani

Siva Rajamani is a Boston-based Enterprise Solutions Architect at AWS. Siva enjoys working closely with customers to accelerate their AWS cloud adoption and improve their overall security posture.

Prashob Krishnan

Prashob Krishnan is a Denver-based Technical Account Manager at AWS. Prashob is passionate about security. He enjoys working with customers to solve their technical challenges and help build a secure scalable architecture on the AWS Cloud.

Introducing AWS Lambda batching controls for message broker services

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-aws-lambda-batching-controls-for-message-broker-services/

This post is written by Mithun Mallick, Senior Specialist Solutions Architect.

AWS Lambda now supports configuring a maximum batch window for instance-based message broker services to fine tune when Lambda invocations occur. This feature gives you an additional control on batching behavior when processing data. It applies to Amazon Managed Streaming for Apache Kafka (Amazon MSK), self-hosted Apache Kafka, and Amazon MQ for Apache ActiveMQ and RabbitMQ.

Apache Kafka is an open source event streaming platform used to support workloads such as data pipelines and streaming analytics. It is conceptually similar to Amazon Kinesis. Amazon MSK is a fully managed, highly available service that simplifies the setup, scaling, and management of clusters running Kafka.

Amazon MQ is a managed, highly available message broker service for Apache ActiveMQ and RabbitMQ that makes it easier to set up and operate message brokers on AWS. Amazon MQ reduces your operational responsibilities by managing the provisioning, setup, and maintenance of message brokers for you.

Amazon MSK, self-hosted Apache Kafka and Amazon MQ for ActiveMQ and RabbitMQ are all available as event sources for AWS Lambda. You configure an event source mapping to use Lambda to process items from a stream or queue. This allows you to use these message broker services to store messages and asynchronously integrate them with downstream serverless workflows.

In this blog, I explain how message batching works. I show how to use the new maximum batching window control for the managed message broker services and self-managed Apache Kafka.

Understanding batching

For event source mappings, the Lambda service internally polls for new records or messages from the event source, and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides these to your function as an event payload. Batching allows higher throughput message processing, up to 10,000 messages in a batch. The payload limit of a single invocation is 6 MB.

Previously, you could only use batch size to configure the maximum number of messages Lambda would poll for. Once a defined batch size is reached, the poller invokes the function with the entire set of messages. This feature is ideal when handling a low volume of messages or batches of data that take time to build up.

Batching window

The new Batch Window control allows you to set the maximum amount of time, in seconds, that Lambda spends gathering records before invoking the function. This brings batching functionality similar to what AWS supports with Amazon SQS to Amazon MQ, Amazon MSK, and self-managed Apache Kafka. The Lambda event source mapping batching functionality can be described as follows.

Batching controls with Lambda event source mapping

Using MaximumBatchingWindowInSeconds, you can set your function to wait up to 300 seconds for a batch to build before processing it. This allows you to create bigger batches if there are enough messages. You can manage the average number of records processed by the function with each invocation. This increases the efficiency of each invocation and reduces the invocation frequency.

Setting MaximumBatchingWindowInSeconds to 0 invokes the target Lambda function as soon as the Lambda event source receives a message from the broker.

Message broker batching behavior

For ActiveMQ, the Lambda event source mapping uses the Java Message Service (JMS) API to receive messages. For RabbitMQ, Lambda uses a RabbitMQ client library to get messages from the queue.

The Lambda event source mappings act as a consumer when polling the queue. The batching pattern for all instance-based message broker services is the same. As soon as a message is received, the batching window timer starts. If there are more messages, the consumer makes additional calls to the broker and adds them to a buffer. It keeps a count of the number of messages and the total size of the payload.

The batch is considered complete if the addition of a new message makes the total payload size of the batch equal to or greater than 6 MB, or the batch window timeout is reached. If the payload would be greater than 6 MB, the last message is returned to the broker.

Lambda then invokes the target Lambda function synchronously and passes on the batch of messages to the function. The Lambda event source continues to poll for more messages and as soon as it retrieves the next message, the batching window starts again. Polling and invocation of the target Lambda function occur in separate processes.
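The batching rules described above can be sketched as a small simulation. This is a simplified model for reasoning about the behavior, not Lambda's actual poller. The function name buildBatch and the fields arrivalSeconds and sizeBytes are hypothetical.

```javascript
const MAX_PAYLOAD_BYTES = 6 * 1024 * 1024; // 6 MB invocation payload limit

// Returns the messages that would form the next batch and why the batch closed.
// messages must be ordered by arrivalSeconds (hypothetical field).
function buildBatch(messages, { batchSize, windowSeconds }) {
  const batch = [];
  let totalBytes = 0;
  if (messages.length === 0) return { batch, reason: "empty" };

  // The window timer starts when the first message is received.
  const deadline = messages[0].arrivalSeconds + windowSeconds;

  for (const message of messages) {
    if (message.arrivalSeconds > deadline) return { batch, reason: "window" };
    if (totalBytes + message.sizeBytes > MAX_PAYLOAD_BYTES) {
      // The message does not fit, so it goes back to the broker.
      return { batch, reason: "payload" };
    }
    batch.push(message);
    totalBytes += message.sizeBytes;
    if (totalBytes === MAX_PAYLOAD_BYTES) return { batch, reason: "payload" };
    if (batch.length >= batchSize) return { batch, reason: "batchSize" };
  }
  // No more messages arrived, so the batch closes when the window expires.
  return { batch, reason: "window" };
}

const incoming = [
  { arrivalSeconds: 0, sizeBytes: 100 },
  { arrivalSeconds: 1, sizeBytes: 100 },
  { arrivalSeconds: 2, sizeBytes: 100 },
  { arrivalSeconds: 400, sizeBytes: 100 },
];
console.log(buildBatch(incoming, { batchSize: 10, windowSeconds: 300 }).batch.length);
```

With a window of 0 seconds, the same model closes the batch after the first message, which matches the behavior described for MaximumBatchingWindowInSeconds set to 0.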

Kafka uses a distributed append log architecture to store messages. This works differently from ActiveMQ and RabbitMQ as messages are not removed from the broker once they have been consumed. Instead, consumers must maintain an offset to the last record or message that was consumed from the broker. Kafka provides several options in the consumer API to simplify the tracking of offsets.

Amazon MSK and Apache Kafka store data in multiple partitions to provide higher scalability. Lambda reads the messages sequentially for each partition and a batch may contain messages from different partitions.  Lambda then commits the offsets once the target Lambda function is invoked successfully.
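Because a batch may contain messages from different partitions, function code usually iterates over every partition in the event. The following sketch assumes the Kafka event groups base64-encoded records under keys of the form topic-partition in a records object. The helper name decodeKafkaRecords and the sample values are hypothetical.

```javascript
// Flattens a Kafka event into a list of decoded records.
// Assumes event.records maps "topic-partition" keys to arrays of records
// whose value field is base64 encoded.
function decodeKafkaRecords(event) {
  const decoded = [];
  for (const partitionRecords of Object.values(event.records || {})) {
    for (const record of partitionRecords) {
      decoded.push({
        topic: record.topic,
        partition: record.partition,
        offset: record.offset,
        value: Buffer.from(record.value, "base64").toString("utf8"),
      });
    }
  }
  return decoded;
}

// Sample event with made-up topic names and offsets, for illustration only.
const sampleKafkaEvent = {
  eventSource: "aws:kafka",
  records: {
    "mytopic-0": [{ topic: "mytopic", partition: 0, offset: 15, value: "aGVsbG8=" }],
    "mytopic-1": [{ topic: "mytopic", partition: 1, offset: 7, value: "d29ybGQ=" }],
  },
};

console.log(decodeKafkaRecords(sampleKafkaEvent));
```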

Configuring the maximum batching window

To reduce Lambda function invocations for existing or new functions, set the MaximumBatchingWindowInSeconds value close to 300 seconds. A longer batching window can introduce additional latency. For latency-sensitive workloads, set MaximumBatchingWindowInSeconds to a lower value that meets your latency requirements.

To configure the maximum batching window on a function in the AWS Management Console, navigate to the function in the Lambda console. Create a new trigger, or edit an existing one. Along with the Batch size, you can configure a Batch window. The Trigger Configuration page is similar across the broker services.

Max batching trigger window

You can also use the AWS CLI to configure the --maximum-batching-window-in-seconds parameter.

For example, with Amazon MQ:

aws lambda create-event-source-mapping --function-name my-function \
--maximum-batching-window-in-seconds 300 --batch-size 100 --starting-position AT_TIMESTAMP \
--event-source-arn arn:aws:mq:us-east-1:123456789012:broker:ExampleMQBroker:b-24cacbb4-b295-49b7-8543-7ce7ce9dfb98

You can use AWS CloudFormation to configure the parameter. The following example configures the MaximumBatchingWindowInSeconds as part of the AWS::Lambda::EventSourceMapping resource for Amazon MQ:

  LambdaFunctionEventSourceMapping:
    Type: AWS::Lambda::EventSourceMapping
    Properties:
      BatchSize: 10
      MaximumBatchingWindowInSeconds: 300
      Enabled: true
      Queues:
        - "MyQueue"
      EventSourceArn: !GetAtt MyBroker.Arn
      FunctionName: !GetAtt LambdaFunction.Arn
      SourceAccessConfigurations:
        - Type: BASIC_AUTH
          URI: !Ref secretARNParameter

You can also use AWS Serverless Application Model (AWS SAM) to configure the parameter as part of the Lambda function event source.

  MQReceiverFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: MQReceiverFunction
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9
      Events:
        MQEvent:
          Type: MQ
          Properties:
            Broker: !Ref brokerARNParameter
            BatchSize: 10
            MaximumBatchingWindowInSeconds: 300
            Queues:
              - "workshop.queueC"
            SourceAccessConfigurations:
              - Type: BASIC_AUTH
                URI: !Ref secretARNParameter

Error handling

If your function times out or returns an error for any of the messages in a batch, Lambda retries the whole batch until processing succeeds or the messages expire.

When a function encounters an unrecoverable error, the event source mapping is paused and the consumer stops processing records. Any other consumers can continue processing, provided that they do not encounter the same error.  If your Lambda event records exceed the allowed size limit of 6 MB, they can go unprocessed.

For Amazon MQ, you can redeliver messages when there’s a function error. You can configure dead-letter queues (DLQs) for both Apache ActiveMQ and RabbitMQ. For RabbitMQ, you can set a per-message TTL to move failed messages to a DLQ.

Since the same event may be received more than once, functions should be designed to be idempotent. This means that receiving the same event multiple times does not change the result beyond the first time the event was received.
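One way to approach idempotency is to record the identifier of each message that has been applied and skip it on redelivery. The sketch below assumes an ActiveMQ-style event with a messages array whose entries carry a messageID and base64-encoded data. The names processBatch and applyMessage are hypothetical, and the in-memory Set stands in for a durable store, which a real function would need because execution environments are not shared or permanent.

```javascript
// In-memory record of processed IDs (assumption: illustration only).
// It survives only while this execution environment stays warm.
const processedIds = new Set();

// Applies each new message once and returns how many were applied.
function processBatch(event, applyMessage) {
  let applied = 0;
  for (const message of event.messages || []) {
    if (processedIds.has(message.messageID)) continue; // duplicate delivery, skip
    const body = Buffer.from(message.data, "base64").toString("utf8");
    // If applyMessage throws, the invocation fails and the whole batch is retried.
    applyMessage(body);
    processedIds.add(message.messageID);
    applied += 1;
  }
  return applied;
}

// Sample event with made-up IDs and payloads.
const sampleMqEvent = {
  eventSource: "aws:amq",
  messages: [
    { messageID: "ID:example-1", data: Buffer.from("order-1").toString("base64") },
    { messageID: "ID:example-2", data: Buffer.from("order-2").toString("base64") },
  ],
};

const seenBodies = [];
const firstRun = processBatch(sampleMqEvent, (body) => seenBodies.push(body));
const secondRun = processBatch(sampleMqEvent, (body) => seenBodies.push(body));
console.log(firstRun, secondRun, seenBodies);
```

Messages applied before a failure stay recorded, so a retry of the same batch only reprocesses the messages that did not succeed.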

Conclusion

Lambda supports a number of event sources including message broker services like Amazon MQ and Amazon MSK. This post explains how batching works with the event sources and how messages are sent to the Lambda function.

Previously, you could only control the batch size. The new Batch Window control allows you to set the maximum amount of time, in seconds, that Lambda spends gathering records before invoking the function. This can increase the overall throughput of message processing and reduce Lambda invocations, which may improve cost.

For more serverless learning resources, visit Serverless Land.

How Ribbon Communications Built a Scalable, Resilient Robocall Mitigation Platform

Post Syndicated from Siva Rajamani original https://aws.amazon.com/blogs/architecture/how-ribbon-communications-built-a-scalable-resilient-robocall-mitigation-platform/

Ribbon Communications provides communications software, and IP and optical networking end-to-end solutions that deliver innovation, unparalleled scale, performance, and agility to service providers and enterprise.

Ribbon Communications is helping customers modernize their networks. In today’s data-hungry, 24/7 world, this equates to improved competitive positioning and business outcomes. Companies are migrating from on-premises equipment for telephony services and looking for equivalent as a service (aaS) offerings. But these solutions must still meet the stringent resiliency, availability, performance, and regulatory requirements of a telephony service.

The telephony world is inundated with robocalls. In the United States alone, there were an estimated 50.5 billion robocalls in 2021! In this blog post, we describe the Ribbon Identity Hub, a holistic solution for robocall mitigation. The Ribbon Identity Hub enables services that sign and verify caller identity in compliance with the ATIS standards under the STIR/SHAKEN framework. It also evaluates and scores calls for the probability of nuisance and fraud.

Ribbon Identity Hub is implemented in Amazon Web Services (AWS). It is a fully managed service for telephony service providers and enterprises. The solution is secure, multi-tenant, automatic scaling, and multi-Region, and enables Ribbon to offer managed services to a wide range of telephony customers. Ribbon ensures resiliency and performance with efficient use of resources in the telephony environment, where load ratios between busy and idle time can exceed 10:1.

Ribbon Identity Hub

The Ribbon Identity Hub services are separated into a data (call-transaction) plane, and a control plane.

Data plane (call-transaction)

The call-transaction processing is typically invoked on a per-call-setup basis where availability, resilience, and performance predictability are paramount. Additionally, due to high variability in load, automatic scaling is a prerequisite.

Figure 1. Data plane architecture

Several AWS services come together in a solution that meets all these important objectives:

  1. Amazon Elastic Container Service (ECS): The ECS services are set up for automatic scaling and span two Availability Zones. This provides the horizontal scaling capability, the self-healing capacity, and the resiliency across Availability Zones.
  2. Elastic Load Balancing – Application Load Balancer (ALB): This provides the ability to distribute incoming traffic to ECS services as the target. In addition, it also offers:
    • Seamless integration with the ECS Auto Scaling group. As the group grows, traffic is directed to the new instances only when they are ready. As traffic drops, traffic is drained from the target instances for graceful scale down.
    • Full support for canary and linear upgrades with zero downtime. Maintains full-service availability without any changes or even perception for the client devices.
  3. Amazon Simple Storage Service (S3): Transaction detail records associated with call-related requests must be securely and reliably maintained for over a year due to billing and other contractual obligations. Amazon S3 simplifies this task with high durability, lifecycle rules, and varied controls for retention.
  4. Amazon DynamoDB: Building resilient services is significantly easier when the compute processing can be stateless. Amazon DynamoDB facilitates such stateless architectures without compromise. Coupled with the availability of the Amazon DynamoDB Accelerator (DAX) caching layer, the solution can meet the extreme low latency operation requirements.
  5. AWS Key Management Service (KMS): Certain tenant configuration is highly confidential and requires elevated protection. Furthermore, the data is part of the state that must be recovered across Regions in disaster recovery scenarios. To meet the security requirements, AWS KMS is used for envelope encryption using per-tenant keys. Multi-Region KMS keys facilitate the secure availability of this state across Regions without the need for application-level intervention when replicating encrypted data.
  6. Amazon Route 53: For telephony services, any non-transient service failure is unacceptable. In addition to providing a high degree of resiliency through its Multi-AZ architecture, Identity Hub also provides Regional-level high availability through its multi-Region active-active architecture. Route 53 with health checks provides for dynamic rerouting of requests within minutes to alternate Regions.

Control plane

The Identity Hub control plane is used for customer configuration, status, and monitoring. The API is REST-based. Since this is not used on a call-by-call basis, the requirements around latency and performance are less stringent, though the requirements around high resiliency and dynamic scaling still apply. In this area, ease of implementation and maintainability are key.

Figure 2. Control plane architecture

The following AWS services implement our control plane:

  1. Amazon API Gateway: Coupled with a custom authenticator, the API Gateway handles all the REST API credential verification and routing. Implementation of an API is transformed into implementing handlers for each resource, which is the application core of the API.
  2. AWS Lambda: All the REST API handlers are written as Lambda functions. By using Lambda’s serverless and concurrency features, the application automatically gains self-healing and auto-scaling capabilities. There is also a significant cost advantage as billing is per millisecond of actual compute time used. This is significant for a control plane where usage is typically sparse and unpredictable.
  3. Amazon DynamoDB: In a stateless architecture with Lambda and API Gateway, all persistent state must be stored in an external database. The database must match the resilience and auto-scaling characteristics of the rest of the control plane. DynamoDB easily fits the requirements here.

The customer portal, in addition to providing the user interface for control plane REST APIs, also delivers a rich set of user-customizable dashboards and reporting capability. Here again, the availability of various AWS services simplifies the implementation, and remains non-intrusive to the central call-transaction processing.

Services used here include:

  1. AWS Glue: Enables extraction and transformation of raw transaction data into a format useful for reporting and dashboarding. AWS Glue is particularly useful here as the data available is regularly expanding, and the use cases for the reporting and dashboarding increase.
  2. Amazon QuickSight: Provides all the business intelligence (BI) functionality, including the ability for Ribbon to offer separate author and reader access to their users, and implements tenant-based access separation.

Conclusion

Ribbon has successfully deployed Identity Hub to enable cloud hosted telephony services to mitigate robocalls. Telephony requirements around resiliency, performance, and capacity were not compromised. Identity Hub offers the benefits of a 24/7 fully managed service requiring no additional customer on-premises equipment.

Choosing AWS services for Identity Hub gives Ribbon the ability to scale and meet future growth. The ability to dynamically scale the service in and out also brings significant cost advantages in telephony applications where busy hour traffic is significantly higher than idle time traffic. In addition, the availability of global AWS services facilitates the deployment of services in customer-local geographic locations to meet performance requirements or local regulatory compliance.

How Experian uses Amazon SageMaker to Deliver Affordability Verification 

Post Syndicated from Haresh Nandwani original https://aws.amazon.com/blogs/architecture/how-experian-uses-amazon-sagemaker-to-deliver-affordability-verification/

Financial Service (FS) providers must identify patterns and signals in a customer’s financial behavior to provide deeper, up-to-the-minute, insight into their affordability and credit risk. FS providers use these insights to improve decision making and customer management capabilities. Machine learning (ML) models and algorithms play a significant role in automating, categorising, and deriving insights from bank transaction data.

Experian publishes Categorisation-as-a-Service (CaaS) ML models that automate analysis of bank and credit card transactions, to be deployed in Amazon SageMaker. Driven by a suite of Experian proprietary algorithms, these models categorise a customer’s bank or credit card transactions into one of over 180 different income and expenditure categories. The service turns these categorised transactions into a set of summarised insights that can help a business better understand their customer and make more informed decisions. These insights provide a detailed picture of a customer’s financial circumstances and resilience by looking at verified income, expenditure, and credit behavior.

This blog demonstrates how financial service providers can introduce affordability verification and categorisation into their digital journeys by deploying Experian CaaS ML models on SageMaker. You don’t need significant ML knowledge to start using Amazon SageMaker and Experian CaaS.

Affordability verification and data categorisation in digital journeys

Product onboarding journeys are increasingly digital. Most financial service providers expect most of these journeys to initiate and complete online. An example journey would be consumers looking to apply for credit with their existing FS provider. These journeys typically involve FS providers performing affordability verification to ensure consumers are offered products they can afford. FS providers can now use Experian CaaS ML models available via AWS Marketplace to generate real-time financial insights and affordability verification for their customers.

Figure 1 depicts a typical digital journey for consumers applying for credit.

Figure 1. Customer journey for consumers applying for credit

  1. Data categorisation for transactional data. Existing transactional data for current consumers is typically sourced from on-premises data sources into a data lake in the cloud. It is then prepared and transformed for processing and analytics. This analysis is done based on the FS provider’s existing consent in compliance with relevant data protection laws. Additional transaction information for other accounts not held by the lender can be sourced from Open Banking and categorised separately.
  2. Store categorised transactions. Background processes run a SageMaker batch transform job using the Experian CaaS Data Categorisation model to categorise this transactional data.
  3. Consumer applies for credit. Consumers use the FS providers’ existing front-end web, mobile, or any other digital channel to apply for credit.
  4. FS provider retrieves up-to-date insights. Insights are generated in real time using the Experian CaaS insights model deployed as endpoints in SageMaker and returned to the consumer-facing digital channel.
  5. FS provider makes credit decision. The channel app consolidates these insights to decide on product eligibility and drive customer journeys.

Deploying and publishing Experian CaaS ML models to Amazon SageMaker

Figure 2 demonstrates the technical solution for the customer journey described in the preceding section.

Figure 2. Credit application – technical solution using Amazon SageMaker and Experian CaaS ML models

  1. Financial Service providers can use AWS Database Migration Service (AWS DMS) to replicate transactional data from their on-premises systems, such as their core banking systems, to Amazon S3. Customers can source this transactional data into a highly available and scalable data lake solution on AWS. Refer to AWS DMS documentation for technical details on supported database sources.
  2. FS providers can use AWS Glue, a serverless data integration service, to cleanse, prepare, and transform the transactional data into formats supported by the Experian CaaS ML models.
  3. FS providers can subscribe and download CaaS ML models built for SageMaker from the AWS Marketplace.
  4. These models can be deployed to SageMaker hosting services as a SageMaker endpoint for real-time inference. Endpoints are fully managed by AWS, and can be set up to scale on demand and deployed in a Multi-AZ model for resilience. FS providers can use Amazon API Gateway and AWS Lambda to make these endpoints available to their consumer-facing applications.
  5. SageMaker also supports a batch transform mode for ML models, which in this scenario will be used to precategorise transactional data. This mode is also useful for use cases that require nearly continuous and regular analysis such as a regular anti-fraud assessment.
  6. Consumers request a financial product, such as a credit card, on an FS provider’s digital channels.
  7. These requests invoke SageMaker endpoints, which use Experian CaaS models to derive real-time insights.
  8. These insights are used to further drive the customer’s product journey. CaaS models are pre-trained and can return insights within the latency requirements of most real-time digital journeys.

Security and compliance using CaaS

AWS Marketplace models are scanned by AWS for common vulnerabilities and exposures (CVE). CVE is a list of publicly known information about security vulnerability and exposure. For details on infrastructure security applied by SageMaker, see Infrastructure Security in Amazon SageMaker.

Data security is a key concern for FS providers and sharing of data externally is challenging from a security and compliance perspective. The CaaS deployment model described here helps address these challenges as data owned by the FS provider remains within their control domain and AWS account. There is no requirement for this data to be shared with Experian. This means the customer’s personal financial information is retained by the FS provider. FS providers cannot access the model code as it is running in a locked SageMaker environment.

AWS Marketplace models such as the Experian CaaS ML models are deployed in a network isolation mode. This ensures that the models cannot make any outbound network calls, even to other AWS services such as Amazon S3. SageMaker still performs download and upload operations against Amazon S3 in isolation from the model.

Implementing upgrades to CaaS ML models

ML model upgrades can be performed in place in Amazon SageMaker as vendors release newer versions of their models in AWS Marketplace. Endpoints can be set up in a blue/green deployment pattern to ensure that upgrades do not impact consumers and can be safely rolled back with no business interruptions.

Conclusion

Automated categorisation of bank transaction data is now being used by FS providers as they start to realise the benefits it can bring to their business. This is being driven in part by the advent of Open Banking. Many FS providers have increased confidence in the accuracy and performance of automated categorisation engines. Suppliers such as Experian are providing transparency around their methodologies used to categorise data, which is also encouraging adoption.

In this blog, we covered how FS providers can introduce automated categorisation of data and affordability identification capabilities into their digital journeys. This can be done quickly and without significant in-house ML skills, using Amazon SageMaker and Experian CaaS ML models. SageMaker endpoints and batch transform capabilities enable the deployment of a highly scalable, secure, and extensible ML infrastructure with minimal development and operational effort.

Experian’s CaaS is available for use via the AWS Marketplace.

Using Node.js ES modules and top-level await in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-node-js-es-modules-and-top-level-await-in-aws-lambda/

This post is written by Dan Fox, Principal Specialist Solutions Architect, Serverless.

AWS Lambda now enables the use of ECMAScript (ES) modules in Node.js 14 runtimes. This feature allows Lambda customers to use dependency libraries that are configured as ES modules, or to designate their own function code as an ES module. It provides customers the benefits of ES module features like import/export operators, language-level support for modules, strict mode by default, and improved static analysis and tree shaking. ES modules also enable top-level await, a feature that can lower cold start latency when used with Provisioned Concurrency.

This blog post shows how to use ES modules in a Lambda function. It also provides guidance on how to use top-level await with Provisioned Concurrency to improve cold start performance for latency sensitive workloads.

Designating a function handler as an ES module

You may designate function code as an ES module in one of two ways. The first way is to specify the “type” in the function’s package.json file. By setting the type to “module”, you designate all “.js” files in the package to be treated as ES modules. Set the “type” as “commonjs” to specify the package contents explicitly as CommonJS modules:

// package.json
{
  "name": "es-module-example",
  "type": "module",
  "description": "This package will be treated as an ES module.",
  "version": "1.0",
  "main": "index.js",
  "author": "Dan Fox",
  "license": "ISC"
}

// index.js – this file will inherit the type from 
// package.json and be treated as an ES module.

import { double } from './lib.mjs';

export const handler = async () => {
    let result = double(6); // 12
    return result;
};

// lib.mjs

export function double(x) {
    return x + x;
}

The second way to designate a function as either an ES module or a CommonJS module is by using the file name extension. File name extensions override the package type directive.

File names ending in .cjs are always treated as CommonJS modules. File names ending in .mjs are always treated as ES modules. File names ending in .js inherit their type from the package. You may mix ES modules and CommonJS modules within the same package. Packages are designated as CommonJS by default:

// this file is named index.mjs – it will always be treated as an ES module
import { square } from './lib.mjs';

export async function handler() {
    let result = square(6); // 36
    return result;
};

// lib.mjs
export function square(x) {
    return x * x;
}
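The resolution rules described above can be summarized in a few lines. The function below is an illustrative sketch only: resolveModuleType is a hypothetical name, and it models just the extension and package "type" rules stated in this section, not the full Node.js module resolution algorithm.

```javascript
// Returns "module" or "commonjs" for a file, following the rules above:
// the extension wins, and .js files fall back to the package "type",
// which defaults to "commonjs" when it is not set.
function resolveModuleType(fileName, packageType = "commonjs") {
  if (fileName.endsWith(".mjs")) return "module";
  if (fileName.endsWith(".cjs")) return "commonjs";
  return packageType === "module" ? "module" : "commonjs";
}

console.log(resolveModuleType("index.mjs"));           // module
console.log(resolveModuleType("index.js"));            // commonjs
console.log(resolveModuleType("index.js", "module"));  // module
console.log(resolveModuleType("legacy.cjs", "module")); // commonjs
```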

Understanding Provisioned Concurrency

When a Lambda function scales out, the process of allocating and initializing new runtime environments may increase latency for end users. Provisioned Concurrency gives customers more control over cold start performance by enabling them to create runtime environments in advance.

In addition to creating execution environments, Provisioned Concurrency also performs initialization tasks defined by customers. Customer initialization code performs a variety of tasks including importing libraries and dependencies, retrieving secrets and configurations, and initializing connections to other services. According to an AWS analysis of Lambda service usage, customer initialization code is the largest contributor to cold start latency.

Provisioned Concurrency runs both environment setup and customer initialization code. This enables runtime environments to be ready to respond to invocations with low latency and reduces the impact of cold starts for end users.

Reviewing the Node.js event loop

Node.js has an event loop that causes it to behave differently than other runtimes. Specifically, it uses a non-blocking input/output model that supports asynchronous operations. This model enables it to perform efficiently in most cases.

For example, if a Node.js function makes a network call, that request may be designated as an asynchronous operation and placed into a callback queue. The function may continue to process other operations within the main call stack without getting blocked by waiting for the network call to return. Once the network call returns, the callback is run and then removed from the callback queue.

This non-blocking model affects the Lambda execution environment lifecycle. Asynchronous functions written in the initialization block of a Node.js Lambda function may not complete before handler invocation. In fact, it is possible for function handlers to be invoked with open items remaining in the callback queue.

Typically, JavaScript developers use the await keyword to instruct a function to block and force it to complete before moving on to the next step. However, await is not permitted in the initialization block of a CommonJS JavaScript function. This behavior limits the amount of asynchronous initialization code that can be run by Provisioned Concurrency before the invocation cycle.
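The effect can be seen outside Lambda with a few lines of plain Node.js. This is a simplified illustration rather than a model of the Lambda runtime: simulatedInit is a hypothetical stand-in for an asynchronous initialization call such as a network request.

```javascript
let initDone = false;

// Stand-in for an asynchronous initialization call (assumption: a 10 ms timer
// plays the role of a network request).
function simulatedInit() {
  return new Promise((resolve) => setTimeout(resolve, 10)).then(() => {
    initDone = true;
  });
}

const initPromise = simulatedInit(); // started, but not awaited

// Synchronous module code keeps running; the callback has not fired yet.
const doneWhenModuleCodeFinished = initDone;
console.log("init finished before module code ended?", doneWhenModuleCodeFinished);

// The work completes later, once the event loop runs the queued callback.
initPromise.then(() => console.log("init finished later:", initDone));
```

Without await at the top level, the remaining module code runs while the initialization is still pending, which is why a handler has to await the stored promise itself.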

Improving cold start performance with top-level await

With ES modules, developers may use top-level await within their functions. This allows developers to use the await keyword in the top level of the file. With this feature, Node.js functions may now complete asynchronous initialization code before handler invocations. This maximizes the effectiveness of Provisioned Concurrency as a mechanism for limiting cold start latency.

Consider a Lambda function that retrieves a parameter from the AWS Systems Manager Parameter Store. Previously, using CommonJS syntax, you place the await operator in the body of the handler function:

// method1 – CommonJS

// CommonJS require syntax
const { SSMClient, GetParameterCommand } = require("@aws-sdk/client-ssm"); 

const ssmClient = new SSMClient();
const input = { "Name": "/configItem" };
const command = new GetParameterCommand(input);
const init_promise = ssmClient.send(command);

exports.handler = async () => {
    const parameter = await init_promise; // await inside handler
    console.log(parameter);

    const response = {
        "statusCode": 200,
        "body": parameter.Parameter.Value
    };
    return response;
};

When you designate code as an ES module, you can use the await keyword at the top level of the code. As a result, the code that makes a request to the AWS Systems Manager Parameter Store now completes before the first invocation:

// method2 – ES module

// ES module import syntax
import { SSMClient, GetParameterCommand } from "@aws-sdk/client-ssm"; 

const ssmClient = new SSMClient();
const input = { "Name": "/configItem" };
const command = new GetParameterCommand(input);
const parameter = await ssmClient.send(command); // top-level await

export async function handler() {
    const response = {
        "statusCode": 200,
        "body": parameter.Parameter.Value
    };
    return response;
}

With on-demand concurrency, an end user is unlikely to see much difference between these two methods. But when you run these functions using Provisioned Concurrency, you may see performance improvements. Using top-level await, Provisioned Concurrency fetches the parameter during its startup period instead of during the handler invocation. This reduces the duration of the handler execution and improves end user response latency for cold invokes.

Performing benchmark testing

You can perform benchmark tests to measure the impact of top-level await. I have created a project that contains two Lambda functions, one that contains an ES module and one that contains a CommonJS module.

Both functions are configured to respond to a single API Gateway endpoint. Both functions retrieve a parameter from AWS Systems Manager Parameter Store and are configured to use Provisioned Concurrency. The ES module uses top-level await to retrieve the parameter. The CommonJS function awaits the parameter retrieval in the handler.

Example architecture

Before deploying the solution, you need:

To deploy:

  1. From a terminal window, clone the git repo:
    git clone https://github.com/aws-samples/aws-lambda-es-module-performance-benchmark
  2. Change directory:
    cd ./aws-lambda-es-module-performance-benchmark
  3. Build the application:
    sam build
  4. Deploy the application to your AWS account:
    sam deploy --guided
  5. Take note of the API Gateway URL in the Outputs section.
    Deployment outputs

This post uses a popular open source tool Artillery to provide load testing. To perform load tests:

  1. Open the config.yaml document in the /load_test directory and replace the target string with the URL of the API Gateway:
    target: "Put API Gateway url string here"
  2. From a terminal window, navigate to the /load_test directory:
    cd load_test
  3. Download and install dependencies:
    npm install
  4. Begin load test for the CommonJS function.
    ./test_commonjs.sh
  5. Begin load test for ES module function.
    ./test_esmodule.sh

Reviewing the results

Test results

Here is a side-by-side comparison of the results of two load tests of 600 requests each. The left shows the results for the CommonJS module and the right shows the results for the ES module. The p99 response time reflects the cold start durations when the Lambda service scales up the function due to load. The p99 for the CommonJS module is 603 ms while the p99 for the ES module is 340.5 ms, a performance improvement of 43.5% (262.5 ms) for the p99 of this comparison load test.
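The improvement figure can be reproduced from the two p99 values quoted above. This is a minimal sketch of the arithmetic only; the function name is illustrative and is not part of the benchmark project:

```python
def p99_improvement(baseline_ms: float, improved_ms: float) -> tuple[float, float]:
    """Return (absolute saving in ms, relative saving in percent)."""
    saved_ms = baseline_ms - improved_ms
    saved_pct = saved_ms / baseline_ms * 100
    return saved_ms, saved_pct


# p99 values reported for the CommonJS and ES module load tests
saved_ms, saved_pct = p99_improvement(603.0, 340.5)
print(f"{saved_ms:.1f} ms faster ({saved_pct:.1f}%)")
```

Your own numbers are likely to differ from run to run, so compare several test runs before drawing conclusions.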

Cleaning up

To delete the sample application, use the latest version of the AWS SAM CLI and run:

sam delete

Conclusion

Lambda functions now support ES modules in Node.js 14.x runtimes. ES modules support await at the top-level of function code. Using top-level await maximizes the effectiveness of Provisioned Concurrency and can reduce the latency experienced by end users during cold starts.

This post demonstrates a sample application that can be used to perform benchmark tests that measure the impact of top-level await.

For more serverless content, visit Serverless Land.

Validating addresses with AWS Lambda and the Amazon Location Service

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/validating-addresses-with-aws-lambda-and-the-amazon-location-service/

This post is written by Matthew Nightingale, Associate Solutions Architect.

Traditional methods of performing address validation on geospatial datasets can be expensive and time consuming. Using Amazon Location Service with AWS Lambda in a serverless data processing pipeline, you may achieve significant performance improvements and cost savings on address validation jobs that use geospatial data.

This blog contains a deployable AWS Serverless Application Model (AWS SAM) template. It also uses sample data sourced from publicly available datasets that you can deploy and use to test the application. This blog offers a starting point to build out a serverless address validation pipeline in your own AWS account.

Overview

This application implements a serverless scatter/gather architecture using Lambda and Amazon S3, performing address validation with the Amazon Location Service. An S3 PUT event triggers each Lambda function to run data processing jobs along each step of the pipeline.

To test the application, a user uploads a .CSV file to S3. This dataset is labeled with fields that are recognized by the 2waygeocoder Lambda function. The application returns a processed dataset to S3 appended with location information from the Amazon Location Places API.

Solution overview

  1. The Scatter Lambda function takes a dataset from the S3 bucket labeled input and splits it into equally sized shards.
  2. The Process Lambda function takes each shard from the pre-processed bucket. It performs address validation in parallel with a 2waygeocoder function calling the Amazon Location Service Places API.
  3. The Gather Lambda function takes each shard from the post-processed bucket. It appends the data into a complete dataset with additional address information.
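The scatter step can be sketched as follows. This is an illustration of the idea only, not the code from the sample repository: the function name `split_into_shards` and the choice to repeat the header row in every shard are assumptions.

```python
import csv
import io
import math


def split_into_shards(csv_text: str, shard_count: int) -> list[str]:
    """Split CSV text into at most shard_count shards, repeating the header in each."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    if not records:
        return []
    # Round up so that every record lands in a shard.
    shard_size = math.ceil(len(records) / shard_count)
    shards = []
    for start in range(0, len(records), shard_size):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(records[start:start + shard_size])
        shards.append(buffer.getvalue())
    return shards


sample = (
    "Address,City,State\n"
    "1 Main St,Hartford,CT\n"
    "2 Elm St,Hartford,CT\n"
    "3 Oak St,Hartford,CT\n"
)
shards = split_into_shards(sample, shard_count=2)
print(len(shards))
```

In a pipeline like the one described, each shard would be written to S3 as its own object so that the resulting PUT event invokes the processing function once per shard.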

Amazon Location Service

Amazon Location Service sources high-quality geospatial data from HERE and ESRI to support searches by using a place index resource.

With the Amazon Location Places API, you can convert addresses and other textual queries into geographic coordinates (also known as geocoding). You can also convert geographic positions into addresses and place descriptions (known as reverse geocoding).

The example application includes a 2waygeocoder capable of both geocoding and reverse geocoding. The next section shows examples of the call and response from the Amazon Location Places API for both geocoding and reverse geocoding.

Geocoding with Amazon Location Service

Here is an example of calling the Amazon Location Service Places API using the AWS SDK for Python (Boto3). This uses the search_place_index_for_text method:

import boto3

location = boto3.client("location")

response = location.search_place_index_for_text(
    IndexName="explore.place",  # index is created using Amazon Location Service
    Text="Boston, MA",
)
location_response = response["Results"]
print(location_response)

Example response:

Response

Example reverse-geocoding with Amazon Location Service

Here is another example of calling the Amazon Location Service Places API using the AWS SDK for Python (Boto3). This uses the search_place_index_for_position method:

# location is the Boto3 client created with boto3.client("location")
response = location.search_place_index_for_position(
    IndexName="explore.place",  # index is created using Amazon Location Service
    Position=[-71.056739, 42.358660],  # [longitude, latitude]
)
location_response = response["Results"]
print(location_response)

Example response:

Response

Design considerations

Processing data with Lambda in parallel using a serverless scatter/gather pipeline helps provide performance efficiency at lower cost. To provide even greater performance, you can optimize your Lambda configuration for higher throughput. There are several strategies you can implement to do this and a few key topics to keep in mind.

Increase the allocated memory for your Lambda function

The simplest way to increase throughput is to increase the allocated memory of the Lambda function.

Faster Lambda functions can process more data and increase throughput. This works even if a Lambda function’s memory utilization is low. This is because increasing memory also increases vCPUs in proportion to the amount configured. Each function supports up to 10 GB of memory and you can access up to six vCPUs per function.

To see the average cost and execution speed for each memory configuration, the Lambda Power Tuning tool helps to visualize the tradeoffs.

Optimize shard size

Another method for increasing performance in a serverless scatter/gather architecture is to optimize the total number of shards created by the scatter function. Increasing the total number of shards consequently reduces the size of any single shard, allowing Lambda to process each shard faster.

When scaling with Lambda, one instance of a function handles one request at a time. When the number of requests increases, Lambda creates more instances of the function to process traffic. Because S3 invokes Lambda asynchronously, there is an internal queue buffering requests between the event source and the Lambda service.

In a serverless scatter/gather architecture, having more shards results in more concurrent invocations of the process Lambda function. For more information about scaling and concurrency with Lambda, see this blog post. Increasing concurrency with Lambda can lead to API request throttling.

Consider API request throttling with your concurrent Lambda functions

In a serverless scatter/gather architecture, the rate at which your code calls APIs increases by a factor equal to the number of concurrent Lambda functions. This means API request limits can quickly be exceeded. You must consider Service Quotas and API request limits when trying to increase the performance of your serverless scatter/gather architecture.

For example, the Amazon Location Places APIs called in the processing function of this application have a default limit of 50 API requests per second. The 2waygeocoder makes about 12 API calls per second on average. Splitting the dataset into more than four shards may therefore cause API throttling exceptions. You can request Service Quotas increases through your AWS account.
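The shard limit follows directly from the two rates quoted above. A minimal sketch of that calculation, with an illustrative function name, assuming each concurrent shard calls the API at the same average rate:

```python
import math


def max_shards_without_throttling(quota_rps: float, calls_per_shard_rps: float) -> int:
    """Largest number of concurrent shards whose combined call rate stays within the quota."""
    return math.floor(quota_rps / calls_per_shard_rps)


# 50 requests per second quota, about 12 calls per second per concurrent function
print(max_shards_without_throttling(50, 12))
```

Because the per-function rate is an average, real traffic can burst above it, so treat the result as an upper bound and not as a guarantee.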

Deploying the solution

You need the following prerequisites to deploy the example application:

Deploy the example application:

  1. Clone the repository and download the sample source code to your environment where AWS SAM is installed:
    git clone https://github.com/aws-samples/amazon-location-service-serverless-address-validation
  2. Change into the project directory containing the template.yaml file:
    cd ~/environment/amazon-location-service-serverless-address-validation
  3. Build the application using AWS SAM:
    sam build
    Terminal output
  4. Deploy the application to your account using AWS SAM. Be sure to follow proper S3 naming conventions providing globally unique names for S3 buckets:
    sam deploy --guided
    Deployment output

Testing the application

Testing geocoding

To test the application, download the dataset that is linked in the Testing the Application section of the GitHub repository. These tests demonstrate both the geocoding and reverse-geocoding capabilities of the application.

First, test the geocoding capabilities. You perform address validation on the City of Hartford Business Listing dataset linked in the GitHub repository. The dataset contains a listing of all the active businesses registered in the city of Hartford, CT, and each business address. The GitHub repo links to an external website where you can download the dataset.

  1. Download the .csv version of the City of Hartford Business Listing dataset. The link is found in the Testing the Application section of the README file on GitHub.
  2. Open the file locally to explore its contents.
  3. Ensure that the .csv file contains columns labeled as “Address”, “City”, and “State”. The 2waygeocoder deployed as part of the AWS SAM template recognizes these columns to perform geocoding.
  4. Before testing the application’s geocoding capabilities, explore the pricing of Amazon Location Service. In order to save money, you can trim the length of the dataset for testing by removing rows. Once the dataset is trimmed to a desired length, navigate to S3 in the AWS Management Console.
  5. Upload the dataset to the S3 bucket labeled “input”. This triggers the scatter function.
  6. Navigate to the S3 bucket labeled “raw” to view the shards of your dataset created by the scatter function.
  7. Navigate to Lambda and select the 2waygeocoder function to view the CloudWatch Logs to see any information that is returned by the function code in near-real-time.
  8. Once the data is processed, navigate to the S3 bucket labeled “destination” to view the complete processed dataset that is created by the gather function. It may take several minutes for your dataset to finish processing.

Congratulations! You have successfully geocoded a dataset using Amazon Location Service with a serverless address validation pipeline.

Testing reverse-geocoding

Next, test the reverse-geocoding capabilities of the application. You perform address validation on the Miami Housing Dataset linked in the GitHub repository. This dataset contains information on 13,932 single-family homes sold in Miami. The repo links to an external website where you can download the dataset.

Before testing, explore the pricing of Amazon Location Service. To start the test:

  1. Download the zip file containing the .csv version of the dataset. The link is found in the Testing the Application section of the README file on GitHub.
  2. Open the file locally to explore its contents.
  3. Ensure the .csv file contains columns A and B labeled “Latitude” and “Longitude”. You must edit these column headers to match the correct format that is recognized by the 2waygeocoder to perform reverse-geocoding. Only the “L” should be capitalized.
  4. To minimize cost, trim the length of the dataset for testing by removing rows. At the full size of ~13,933 rows, the dataset takes approx. 5 minutes to process.
  5. Once the dataset is trimmed to a desired length and columns A and B are labeled “Latitude” and “Longitude” respectively, navigate to S3 in the AWS Management Console, and upload the dataset to your S3 bucket labeled “input”.
  6. Navigate to the S3 bucket labeled “raw” to view the shards of your dataset.
  7. Navigate to Lambda and select the 2waygeocoder function to view the CloudWatch Logs to see any information that is returned by the function code in near-real-time.
  8. Navigate to the S3 bucket labeled “destination” to view the complete processed dataset that is created by the gather function. It may take several minutes for your dataset to finish processing.

Congratulations! You have successfully reverse-geocoded a dataset with Amazon Location Service using a serverless scatter/gather pipeline. You can move on to the conclusion, or continue to test the geocoding capabilities of the application with additional datasets.

Next steps

To get started testing your own datasets, use the AWS SAM template from GitHub deployed as part of this blog. Ensure that the column labels in your dataset match the ones used in this blog post. The 2waygeocoder recognizes columns labeled “Latitude” and “Longitude” to perform reverse-geocoding, and “Address”, “City”, and “State” to perform geocoding.
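Before uploading your own dataset, you can check locally which mode its header row would select. The following is a hypothetical helper and not the 2waygeocoder's actual logic; in particular, preferring reverse-geocoding when both sets of columns are present is an assumption:

```python
def detect_mode(header: list[str]) -> str:
    """Decide which operation a CSV header supports, based on its column labels."""
    columns = set(header)
    # Labels are case-sensitive: only the first letter is capitalized.
    if {"Latitude", "Longitude"} <= columns:
        return "reverse-geocode"
    if {"Address", "City", "State"} <= columns:
        return "geocode"
    raise ValueError(
        "Expected 'Latitude' and 'Longitude', or 'Address', 'City', and 'State' columns"
    )


print(detect_mode(["Latitude", "Longitude", "SalePrice"]))
print(detect_mode(["DBA", "Address", "City", "State"]))
```

A header such as `["latitude", "longitude"]` raises an error here, which mirrors the capitalization requirement described in the testing steps.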

Now that the data has been geocoded by Amazon Location Service and is in S3, you can use Amazon QuickSight geospatial charts to quickly and easily create interactive charts. For information on how to create a Dataset in QuickSight using Amazon S3 Files, check out the QuickSight User Guide.

Below is an example using QuickSight Geospatial charts to map the Miami housing dataset. The map shows average sale price by zipcode:

QuickSight map

This example uses QuickSight geospatial charts to map the City of Hartford Business dataset. The map shows DBA (doing business as) by latitude and longitude:

Dataset visualization

Conclusion

This blog post performs address validation with the Amazon Location Service, demonstrating both geocoding and reverse geocoding capabilities.

Using a serverless architecture with S3 and Lambda, you can achieve both cost optimization and performance improvement compared with traditional methods of address validation. Using this application, your organization can better understand and harness geospatial data.

For more serverless learning resources, visit Serverless Land.

ICYMI: Serverless Q4 2021

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2021/

Welcome to the 15th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

Q4 calendar

In case you missed our last ICYMI, check out what happened last quarter here.

AWS Lambda

For developers using Amazon MSK as an event source, Lambda has expanded authentication options to include IAM, in addition to SASL/SCRAM. Lambda also now supports mutual TLS authentication for Amazon MSK and self-managed Kafka as an event source.

Lambda also launched features to make it easier to operate across AWS accounts. You can now invoke Lambda functions from Amazon SQS queues in different accounts. You must grant permission to the Lambda function’s execution role and have SQS grant cross-account permissions. For developers using container packaging for Lambda functions, Lambda also now supports pulling images from Amazon ECR in other AWS accounts. To learn about the permissions required, see this documentation.

The service now supports a partial batch response when using SQS as an event source for both standard and FIFO queues. When messages fail to process, Lambda marks the failed messages and allows reprocessing of only those messages. This helps to improve processing performance and may reduce compute costs.
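A handler that reports partial batch failures returns the IDs of the failed messages in a `batchItemFailures` list. The sketch below assumes the event source mapping has `ReportBatchItemFailures` enabled; the `process` function and the `order_id` field are hypothetical stand-ins for your own logic:

```python
import json


def process(body: dict) -> None:
    """Hypothetical business logic that rejects messages without an order_id."""
    if "order_id" not in body:
        raise ValueError("missing order_id")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only the failed message IDs are returned, so only those are retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


# Local invocation with a hand-built event containing one good and one bad message
event = {
    "Records": [
        {"messageId": "1", "body": json.dumps({"order_id": 42})},
        {"messageId": "2", "body": json.dumps({"note": "no id"})},
    ]
}
result = handler(event, None)
print(result)
```

An empty `batchItemFailures` list tells Lambda that the whole batch succeeded.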

Lambda launched content filtering options for functions using SQS, DynamoDB, and Kinesis as an event source. You can specify up to five filter criteria that are combined using OR logic. This uses the same content filtering language that’s used in Amazon EventBridge, and can dramatically reduce the number of downstream Lambda invocations.
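As an illustration, filter criteria are expressed as a list of JSON patterns. The sketch below builds one pattern for an SQS event source whose message bodies are JSON; the `status` field and its value are made-up examples. The resulting dictionary has the shape accepted by the `FilterCriteria` parameter of the Boto3 `create_event_source_mapping` call:

```python
import json

# Match only SQS messages whose JSON body has "status" set to "failed" (hypothetical field).
pattern = {"body": {"status": ["failed"]}}

# Each filter carries its pattern as a JSON string.
filter_criteria = {"Filters": [{"Pattern": json.dumps(pattern)}]}

print(json.dumps(filter_criteria, indent=2))
```

Messages that do not match any of the filters are not passed to the function.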

Amazon EventBridge

Previously, you could consume Amazon S3 events in EventBridge via CloudTrail. Now, EventBridge receives events from the S3 service directly, making it easier to build serverless workflows triggered by activity in S3. You can use content filtering in rules to identify relevant events and forward these to 18 service targets, including AWS Lambda. You can also use event archive and replay, making it possible to reprocess events in testing, or in the event of an error.

AWS Step Functions

The AWS Batch console has added support for visualizing Step Functions workflows. This makes it easier to combine these services to orchestrate complex workflows over business-critical batch operations, such as data analysis or overnight processes.

Additionally, Amazon Athena has also added console support for visualizing Step Functions workflows. This can help when building distributed data processing pipelines, allowing Step Functions to orchestrate services such as AWS Glue, Amazon S3, or Amazon Kinesis Data Firehose.

Synchronous Express Workflows now supports AWS PrivateLink. This enables you to start these workflows privately from within your virtual private clouds (VPCs) without traversing the internet. To learn more about this feature, read the What’s New post.

Amazon SNS

Amazon SNS announced support for token-based authentication when sending push notifications to Apple devices. This creates a secure, stateless communication between SNS and the Apple Push Notification (APN) service.

SNS also launched the new PublishBatch API which enables developers to send up to 10 messages to SNS in a single request. This can reduce cost by up to 90%, since you need fewer API calls to publish the same number of messages to the service.
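To use the batch API, messages first need to be grouped into sets of at most 10. A minimal sketch of that grouping, using a hypothetical helper name; each resulting list has the shape of the `PublishBatchRequestEntries` parameter of the Boto3 `publish_batch` call:

```python
def build_batches(messages: list[str], batch_size: int = 10) -> list[list[dict]]:
    """Group messages into PublishBatch entry lists of at most batch_size items."""
    batches = []
    for start in range(0, len(messages), batch_size):
        chunk = messages[start:start + batch_size]
        # Each entry needs an Id that is unique within its batch.
        batches.append(
            [{"Id": str(start + offset), "Message": message}
             for offset, message in enumerate(chunk)]
        )
    return batches


messages = [f"event-{n}" for n in range(25)]
batches = build_batches(messages)
print(len(batches))

# Each batch could then be sent with, for example:
# sns.publish_batch(TopicArn=topic_arn, PublishBatchRequestEntries=batch)
```

Here 25 messages need 3 requests instead of 25. With 100 messages the count drops from 100 requests to 10, a 90% reduction in API calls.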

Amazon SQS

Amazon SQS released an enhanced DLQ management experience for standard queues. This allows you to redrive messages from a DLQ back to the source queue. This can be configured in the AWS Management Console, as shown here.

Amazon DynamoDB

The NoSQL Workbench for DynamoDB is a tool to simplify designing, visualizing, and querying DynamoDB tables. The tool now supports importing sample data from CSV files and exporting the results of queries.

DynamoDB announced the new Standard-Infrequent Access table class. Use this for tables that store infrequently accessed data to reduce your costs by up to 60%. You can switch to the new table class without an impact on performance or availability and without changing application code.

AWS Amplify

AWS Amplify now allows developers to override Amplify-generated IAM, Amazon Cognito, and S3 configurations. This makes it easier to customize the generated resources to best meet your application’s requirements. To learn more about the “amplify override auth” command, visit the feature’s documentation.

Similarly, you can also add custom AWS resources using the AWS Cloud Development Kit (CDK) or AWS CloudFormation. In another new feature, developers can then export Amplify backends as CDK stacks and incorporate them into their deployment pipelines.

AWS Amplify UI has launched a new Authenticator component for React, Angular, and Vue.js. Aside from the visual refresh, this provides the easiest way to incorporate social sign-in in your frontend applications with zero-configuration setup. It also includes more customization options and form capabilities.

AWS launched AWS Amplify Studio, which automatically translates designs made in Figma to React UI component code. This enables you to connect UI components visually to backend data, providing a unified interface that can accelerate development.

AWS AppSync

You can now use custom domain names for AWS AppSync GraphQL endpoints. This enables you to specify a custom domain for both GraphQL API and Realtime API, and have AWS Certificate Manager provide and manage the certificate.

To learn more, read the feature’s documentation page.

News from other services

Serverless blog posts

October

November

December

AWS re:Invent breakouts

AWS re:Invent was held in Las Vegas from November 29 to December 3, 2021. The Serverless DA team presented numerous breakouts, workshops and chalk talks. Rewatch all our breakout content:

Serverlesspresso

We also launched an interactive serverless application at re:Invent to help customers get caffeinated!

Serverlesspresso is a contactless, serverless order management system for a physical coffee bar. The architecture comprises several serverless apps that support an ordering process from a customer’s smartphone to a real espresso bar. The customer can check the virtual line, place an order, and receive a notification when their drink is ready for pickup.

Serverlesspresso booth

You can learn more about the architecture and download the code repo at https://serverlessland.com/reinvent2021/serverlesspresso. You can also see a video of the exhibit.

Videos

Serverless Land videos

Serverless Office Hours – Tues 10 AM PT

Weekly live virtual office hours. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

YouTube: youtube.com/serverlessland
Twitch: twitch.tv/aws

October

November

December

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Building a serverless multi-player game that scales: Part 3

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-multi-player-game-that-scales-part-3/

This post is written by Tim Bruce, Sr. Solutions Architect, DevAx, Chelsie Delecki, Solutions Architect, DNB, and Brian Krygsman, Solutions Architect, Enterprise.

This blog series discusses building a serverless game that scales, using Simple Trivia Service:

  • Part 1 describes the overall architecture, how to deploy to your AWS account, and the different communication methods.
  • Part 2 describes adding automation to the game to help your teams scale.

This post discusses how the game scales to support concurrent users (CCU) under a load test. While this post focuses on Simple Trivia Service, you can apply the concepts to any serverless workload.

To set up the example, see the instructions in the Simple Trivia Service GitHub repo and the README.md file. This example uses services beyond AWS Free Tier and incurs charges. To remove the example from your account, see the README.md file.

Overview

Simple Trivia Service is launching at a new trivia conference. There are 200,000 registered attendees who are invited to play the game during the conference. The developers are following AWS Well-Architected best practices and load testing the game before the launch.

Load testing is the practice of simulating user load to validate the system’s ability to scale. The focus of the load test is the game’s microservices, built using AWS Serverless services, including:

  • Amazon API Gateway and AWS IoT, which provide serverless endpoints, allowing users to interact with the Simple Trivia Service microservices.
  • AWS Lambda, which provides serverless compute services for the microservices.
  • Amazon DynamoDB, which provides a serverless NoSQL database for storing game data.

Preparing for load testing

Defining success criteria is one of the first steps in preparing for a load test. Success criteria determine how well the game meets its requirements, and include concurrent users, error rates, and response time. These three criteria help to ensure that your users have a good experience when playing your game.

Excluding one can lead to invalid assumptions about the scale of users that the game can support. If you exclude error rate goals, for example, users may encounter more errors, impacting their experience.

The success criteria used for Simple Trivia Service are:

  • 200,000 concurrent users split across game types.
  • Error rates below 0.05%.
  • 95th percentile synchronous responses under 1 second.

With these identified, you can develop dashboards to report on the targets. Dashboards allow you to monitor system metrics over the course of load tests. You can develop dashboards using Amazon CloudWatch dashboards, using custom widgets that organize and display metrics.

Common metrics to monitor include:

  • Error rates – total errors / total invocations.
  • Throttles – invocations resulting in 429 errors.
  • Percentage of quota usage – usage against your game’s Service Quotas.
  • Concurrent execution counts – maximum concurrent Lambda invocations.
  • Provisioned concurrency invocation rate – provisioned concurrency spillover invocation count / provisioned concurrency invocation count.
  • Latency – percentile-based response time, such as 90th and 95th percentiles.

Documentation and other services are also helpful during load testing. Centralized logging via Amazon CloudWatch Logs and AWS CloudTrail provides your team with operational data for the game. This data can help triage issues during testing.

System architecture documents provide key details to help your team focus their work during triage. Amazon DevOps Guru can also provide your team with potential solutions for issues. This uses machine learning to identify operational deviations and deployments and provides recommendations for resolving issues.

A load testing tool simplifies your testing, allowing you to model users playing the game. Popular load testing tools include Apache JMeter, Artillery.io Artillery, and Locust.io Locust. The load testing tool you select can act as your application client and access your endpoints directly.

This example uses Locust to load test Simple Trivia Service based on language and technical requirements. It allows you to accurately model usage and not only generate transactions. In production applications, select a tool that aligns to your team’s skills and meets your technical requirements.

You can place automation around your load testing tool to reduce the manual effort of running tests. Automation can include allocating environments, deploying and running test scripts, and collecting results. You can include this as part of your continuous integration/continuous delivery (CI/CD) pipeline. You can use the Distributed Load Testing on AWS solution to support Taurus-compatible load testing.

Also, document a plan, working backwards from your goals to help measure your progress. Plans typically use incremental growth of CCU, which can help you to identify constraints in your game. Use your plan while you are in development, once portions of your game are feature complete.

Testing plan for STS

This shows an example plan for load testing Simple Trivia Service:

  1. Start with individual game testing to validate tests and game modes separately.
  2. Add in testing of the three game modes together, mirroring expected real world activity.

Finally, evaluate your load test and architecture against your AWS Customer Agreement, AWS Acceptable Use Policy, Amazon EC2 Testing Policy, and the AWS Customer Support Policy for Penetration Testing. These policies are put in place to help you to be successful in your load testing efforts. AWS Support requires you to notify them at least two weeks prior to your load test using the Simulated Events Submission Form within the AWS Management Console. This form can also be used if you have questions before your load test.

Additional help for your load test may be available on the AWS Forums, AWS re:Post, or via your account team.

Testing

After triggering a test, automation scales up your infrastructure and initializes the test users. Depending on the number of users you need and their startup behavior, this ramp-up phase can take several minutes. Similarly, when the test run is complete, your test users should ramp down. Unless you have modeled the ramp-up and ramp-down phases to match real-world behavior, exclude these phases from your measurements. If you include them, you may optimize for unrealistic user behavior.
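One way to exclude the ramp phases is to drop samples that fall inside the ramp windows before computing any statistics. The sketch below assumes samples are `(timestamp in seconds, latency in ms)` pairs and that you know the ramp durations; the function name and the data are illustrative only:

```python
def steady_state(samples: list[tuple[float, float]],
                 ramp_up_s: float,
                 ramp_down_s: float) -> list[tuple[float, float]]:
    """Keep only the samples recorded between the end of ramp-up and the start of ramp-down."""
    if not samples:
        return []
    start = min(t for t, _ in samples) + ramp_up_s
    end = max(t for t, _ in samples) - ramp_down_s
    return [(t, value) for t, value in samples if start <= t <= end]


# Invented samples: latency is high while users ramp up and down
samples = [(0, 900), (30, 700), (60, 200), (90, 210), (120, 190), (150, 650), (180, 800)]
kept = steady_state(samples, ramp_up_s=60, ramp_down_s=60)
print(kept)
```

Statistics computed on `kept` then reflect steady-state behavior and not the artificial startup pattern of the test users.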

While tests are running, let metrics normalize before drawing conclusions. Services may report data at different rates. Investigate when you find metrics that cross your acceptable thresholds. You may need to make adjustments like adding Lambda Provisioned Concurrency or changing application code to resolve constraints. You may even need to re-evaluate your requirements based on how the system performs. When you make changes, re-test to verify any changes had the impact you expected before continuing with your plan.

Finally, keep an organized record of the inputs and outputs of tests, including dashboard exports and your own observations. This record is valuable when sharing test outcomes and comparing test runs. Mark your progress against the plan to stay on track.

Analyzing and improving Simple Trivia Service performance

Running the test plan while using observability tools to measure performance reveals opportunities to tune performance bottlenecks.

In this example, during single player individual tests, the dashboards show acceptable latency values. As the test size grows, increasing read capacity for retrieving leaderboards indicates a tuning opportunity:

Dashboard reads 1

Dashboard reads 2

  1. The CloudWatch dashboard reveals that the LeaderboardGet function is leading to high consumed read capacity for the Players DynamoDB table. A process within the function queries scores and player records with every call to load avatar URLs.
  2. Standardizing the player avatar URL process within the code reduces reads from the table, which lowers consumed read capacity in DynamoDB.

Moving into the full test phase of the plan with combined game types identifies additional areas for performance optimization. In one case, dashboards highlight unexpected error rates for a Lambda function. Function logs and DevOps Guru, consulted to triage the behavior, show a downstream issue with an Amazon Kinesis data stream:

Identifying error rates in dashboards

  1. Within an insight, DevOps Guru highlights the Kinesis:WriteProvisionedThroughputExceeded metric during our test window.
  2. DevOps Guru also correlates that metric with the Kinesis:GetRecords.Latency metric.

DevOps Guru also links to a recommendation for Kinesis Data Streams to troubleshoot and resolve the incident with the data stream. Following this advice helps to resolve the Lambda error rates during the next test.

Load testing results

By following the plan and making incremental changes as optimizations become apparent, you can reach the goals.

Table of results

The preceding table is a summary of data from Amazon CloudWatch Lambda Insights and statistics captured from Locust:

  1. The test exceeded the goal of 200k CCU with a combined total of 236,820 CCU.
  2. The error rate stayed below the 0.05% goal, with a combined average of 0.010%.
  3. The performance goals were achieved without needing Provisioned Concurrency in Lambda.

Function latency

Function concurrency and throttles

  1. The function latency goal of < 1 second is met, based on data from CloudWatch Lambda Insights.
  2. Function concurrency is below Service Quotas for Lambda during the test, based on data from our custom CloudWatch dashboard.

Conclusion

This post discusses how to perform a load test on a serverless workload. The process was used to validate the scalability of Simple Trivia Service, a single- and multi-player game built using a serverless-first architecture on AWS. The results show that the service scales to over 220,000 CCUs while maintaining a response time under 1 second and an error rate under 0.05%.

For more serverless learning resources, visit Serverless Land.

Use AWS Step Functions to Monitor Services Choreography

Post Syndicated from Vito De Giosa original https://aws.amazon.com/blogs/architecture/use-aws-step-functions-to-monitor-services-choreography/

Organizations frequently need access to quick visual insight on the status of complex workflows. This involves collaboration across different systems. If your customer requires assistance on an order, you need an overview of the fulfillment process, including payment, inventory, dispatching, packaging, and delivery. If your products are expensive assets such as cars, you must track each item’s journey instantly.

Modern applications use event-driven architectures to manage the complexity of system integration at scale. These often use choreography for service collaboration. Instead of directly invoking systems to perform tasks, services interact by exchanging events through a centralized broker. Complex workflows are the result of actions each service initiates in response to events produced by other services. Services do not directly depend on each other. This increases flexibility, development speed, and resilience.

However, choreography can introduce two main challenges for the visibility of your workflow.

  1. It obfuscates the workflow definition. The sequence of events emitted by individual services implicitly defines the workflow. There is no formal statement that describes steps, permitted transitions, and possible failures.
  2. It might be harder to understand the status of workflow executions. Services act independently, based on events. You can implement distributed tracing to collect information related to a single execution across services. However, getting visual insights from traces may require custom applications. This increases time to market (TTM) and cost.

To address these challenges, we will show you how to use AWS Step Functions to model choreographies as state machines. The solution enables stakeholders to gain visual insights on workflow executions, identify failures, and troubleshoot directly from the AWS Management Console.

This GitHub repository provides a Quick Start and examples on how to model choreographies.

Modeling choreographies with Step Functions

Monitoring a choreography requires a formal representation of the distributed system behavior, such as state machines. State machines are mathematical models representing the behavior of systems through states and transitions. States model situations in which the system can operate. Transitions define which input causes a change from the current state to the next. They occur when a new event happens. Figure 1 shows a state machine modeling an order workflow.

Figure 1. Order workflow
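
To make the idea of states and transitions concrete, here is a minimal sketch of a state machine as a transition table. The state and event names are hypothetical stand-ins, since the actual workflow is defined by Figure 1; only "Order Placed" as the starting point comes from the post.

```python
# Hypothetical order workflow: (current state, event) -> next state.
TRANSITIONS = {
    ("Placed", "Payment Succeeded"): "Paid",
    ("Placed", "Payment Failed"): "Cancelled",
    ("Paid", "Order Shipped"): "Shipped",
    ("Shipped", "Order Delivered"): "Delivered",
}


def next_state(current, event):
    """Return the state reached from `current` on `event`.

    Any event that is not a permitted transition leads to "Failed".
    """
    return TRANSITIONS.get((current, event), "Failed")


def run(events, start="Placed"):
    """Replay a sequence of events and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
        if state == "Failed":
            break
    return state
```

For example, `run(["Payment Succeeded", "Order Shipped"])` ends in the `Shipped` state, while an out-of-order event such as `run(["Order Shipped"])` ends in `Failed`.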

The solution in this post uses Amazon States Language to describe a choreography as a Step Functions state machine. The state machine pauses, using Task states combined with a callback integration pattern. It then waits for the next event to be published on the broker. Choice states control transitions to the next state by inspecting event payloads. Figure 2 shows how the workflow in Figure 1 translates to a Step Functions state machine.

Figure 2. Order workflow translated into Step Functions state machine
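
The following sketch shows what one pause-and-branch step might look like, written as a Python dictionary that serializes to an Amazon States Language definition. It is an illustration under assumptions, not the definition from the GitHub repository: the state names, the `ChoreographyTokens` table, its attribute names, and the `$.event_type` payload field are all hypothetical.

```python
import json

definition = {
    "StartAt": "WaitForNextEvent",
    "States": {
        "WaitForNextEvent": {
            "Type": "Task",
            # AWS SDK service integration using the callback pattern:
            # the execution pauses until the task token is returned.
            "Resource": "arn:aws:states:::aws-sdk:dynamodb:putItem.waitForTaskToken",
            "Parameters": {
                "TableName": "ChoreographyTokens",  # hypothetical table
                "Item": {
                    # Context object values: execution name and task token.
                    "execution_name": {"S.$": "$$.Execution.Name"},
                    "task_token": {"S.$": "$$.Task.Token"},
                },
            },
            "Next": "InspectEvent",
        },
        "InspectEvent": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.event_type",  # hypothetical payload field
                    "StringEquals": "Payment Succeeded",
                    "Next": "Paid",
                }
            ],
            # Any event the choreography does not expect fails the execution.
            "Default": "UnexpectedEvent",
        },
        "Paid": {"Type": "Succeed"},
        "UnexpectedEvent": {"Type": "Fail", "Error": "UnexpectedEvent"},
    },
}

asl_json = json.dumps(definition, indent=2)
```

A full choreography would chain several such wait-and-inspect pairs, one for each point where the workflow expects another event.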

Figure 3 shows the architecture for monitoring choreographies with Step Functions.

Figure 3. Choreography monitoring with AWS Step Functions

  1. Services involved in the choreography publish events to Amazon EventBridge. There are two configured rules. The first rule matches the first event of the choreography sequence, Order Placed in the example. The second rule matches any other event of the sequence. Event payloads contain a correlation id (order_id) to group them by workflow instance.
  2. The first rule invokes an AWS Lambda function, which starts a new execution of the choreography state machine. The correlation id is passed in the name parameter, so you can quickly identify an execution in the AWS Management Console.
  3. The state machine uses Task states with AWS SDK service integrations, to directly call Amazon DynamoDB. Tasks are configured with a callback pattern. They issue a token, which is stored in DynamoDB with the execution name. Then, the workflow pauses.
  4. A service publishes another event on the event bus.
  5. The second rule invokes another Lambda function with the event payload.
  6. The function uses the correlation id to retrieve the task token from DynamoDB.
  7. The function invokes the Step Functions SendTaskSuccess API, with the token and the event payload as parameters.
  8. The state machine resumes the execution and uses Choice states to transition to the next state. If the choreography definition expects the received event payload, it selects the next state and the process restarts from step 3. The state machine transitions to a Fail state when it receives an unexpected event.
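
The two Lambda functions in steps 2 and 5 to 7 can be sketched as follows. This is a minimal illustration, not the code from the GitHub repository. The table name, attribute names, and the shape of the output passed back to the state machine are assumptions. The `dynamodb` and `sfn` arguments are expected to behave like the boto3 `dynamodb` and `stepfunctions` clients; they are passed in so the logic can be exercised without AWS access.

```python
import json

TABLE = "ChoreographyTokens"  # hypothetical table holding task tokens


def start_workflow(event, sfn, state_machine_arn):
    """Step 2: start an execution named after the correlation id."""
    order_id = event["detail"]["order_id"]
    return sfn.start_execution(
        stateMachineArn=state_machine_arn,
        name=order_id,  # must satisfy Step Functions execution name rules
        input=json.dumps(event["detail"]),
    )


def resume_workflow(event, dynamodb, sfn):
    """Steps 5-7: look up the task token and resume the paused execution."""
    order_id = event["detail"]["order_id"]
    response = dynamodb.get_item(
        TableName=TABLE,
        Key={"execution_name": {"S": order_id}},
    )
    item = response.get("Item")
    if item is None:
        raise LookupError(f"no task token stored for execution {order_id}")
    # Assumed output shape: the event name plus the original payload,
    # so that Choice states can inspect it.
    output = {"event_type": event["detail-type"], "payload": event["detail"]}
    sfn.send_task_success(
        taskToken=item["task_token"]["S"],
        output=json.dumps(output),
    )
```

In a deployed function, the clients would typically be created once with `boto3.client("dynamodb")` and `boto3.client("stepfunctions")` and reused across invocations.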

Increased visibility with Step Functions console

Modeling service choreographies as Step Functions Standard Workflows increases visibility with out-of-the-box features.

1. You can centrally track events produced by distributed components. Step Functions records full execution history for 90 days after the execution completes. You’ll be able to capture detailed information about the input and output of each state, including event payloads. Additionally, state machines integrate with Amazon CloudWatch to publish execution logs and metrics.

2. You can monitor choreographies visually. The Step Functions console displays a list of executions with information such as execution id, status, and start date (see Figure 4).

Figure 4. Step Functions workflow dashboard

After you select an execution, a graph inspector is displayed (see Figure 5). It shows states and transitions, and marks individual states with colors. This lets you identify, at a glance, successful tasks, failures, and tasks that are still in progress.

Figure 5. Step Functions graph inspector

3. You can implement event-driven automation. Step Functions enables you to capture execution status changes emitting events directly to EventBridge (see Figure 6). Additionally, AWS gives you the ability to emit events by setting alarms on top of metrics. Step Functions publishes these to CloudWatch. You can respond to events by initiating corrective actions, sending notifications, or integrating with third-party solutions, such as issue tracking systems.

Figure 6. Automation with Step Functions, EventBridge, and CloudWatch alarms
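
As an illustration of the event-driven automation described above, the following sketch defines an EventBridge event pattern that selects unsuccessful Step Functions executions. The `matches` helper is a simplified local stand-in for exact-value matching only, added here so the pattern can be tried out; it does not reproduce the full EventBridge matching rules.

```python
# Event pattern for execution status change events emitted by Step Functions.
FAILED_EXECUTION_PATTERN = {
    "source": ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    "detail": {"status": ["FAILED", "TIMED_OUT", "ABORTED"]},
}


def matches(pattern, event):
    """Simplified check: every pattern key must exist in the event, and the
    event value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict):
                return False
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical event, trimmed to the fields the pattern inspects.
failed_event = {
    "source": "aws.states",
    "detail-type": "Step Functions Execution Status Change",
    "detail": {"status": "FAILED", "name": "order-1234"},
}
is_failure = matches(FAILED_EXECUTION_PATTERN, failed_event)
```

A rule using this pattern could target a notification topic or an issue tracking integration, so that a failed choreography raises an alert without anyone watching the console.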

Enabling access to AWS Step Functions console

Stakeholders need secure access to the Step Functions console. This requires mechanisms to authenticate users and authorize read-only access to specific Step Functions workflows.

AWS Single Sign-On authenticates users by directly managing identities or through federation. SSO supports federation with Active Directory and SAML 2.0 compliant external identity providers (IdP). You grant users access to Step Functions state machines by assigning a permission set, which is a collection of AWS Identity and Access Management (IAM) policies. Additionally, with permission sets, you can configure a relay state, which is a URL to redirect the user after successful authentication. You can authenticate the user through the selected identity provider and immediately show the AWS Step Functions console with the workflow state machine already displayed. Figure 7 shows this process.

Figure 7. Access to Step Functions state machine with AWS SSO

  1. The user logs in through the selected identity provider.
  2. The SSO user portal uses the SSO endpoint to send the response from the previous step. SSO uses AWS Security Token Service (STS) to get temporary security credentials on behalf of the user. It then creates a console sign-in URL using those credentials and the relay state. Finally, it sends the URL back as a redirect.
  3. The browser redirects the user to the Step Functions console.

When the identity provider does not support SAML 2.0, SSO is not a viable solution. In this case, you can create a URL with a sign-in token for users to securely access the AWS Management Console. This approach uses STS AssumeRole to get temporary security credentials. Then, it uses credentials to obtain a sign-in token from the AWS federation endpoint. Finally, it constructs a URL for the AWS Management Console, which includes the token. It then distributes this to users to grant access. This is similar to the SSO process. However, it requires custom development.
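
The custom sign-in flow can be sketched as two URL builders for the AWS federation endpoint. This is a minimal illustration under assumptions: the function names and the issuer value are hypothetical, the HTTP request that exchanges the first URL for a sign-in token is left out, and the temporary credentials are expected to come from an earlier STS AssumeRole call.

```python
import json
from urllib.parse import urlencode

FEDERATION_ENDPOINT = "https://signin.aws.amazon.com/federation"


def signin_token_url(access_key_id, secret_access_key, session_token):
    """Build the URL that exchanges temporary credentials for a sign-in token."""
    session = json.dumps(
        {
            "sessionId": access_key_id,
            "sessionKey": secret_access_key,
            "sessionToken": session_token,
        }
    )
    query = urlencode({"Action": "getSigninToken", "Session": session})
    return f"{FEDERATION_ENDPOINT}?{query}"


def console_login_url(signin_token, destination, issuer):
    """Build the console URL that signs the user in and redirects to `destination`."""
    query = urlencode(
        {
            "Action": "login",
            "Issuer": issuer,  # hypothetical: URL of your own sign-in page
            "Destination": destination,
            "SigninToken": signin_token,
        }
    )
    return f"{FEDERATION_ENDPOINT}?{query}"
```

The response to the first URL is a JSON document containing a `SigninToken` value, which is passed to `console_login_url` together with the Step Functions console URL of the state machine as the destination.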

Conclusion

This post shows how you can increase visibility on choreographed business processes using AWS Step Functions. The solution provides detailed visual insights directly from the AWS Management Console, without requiring custom UI development. This reduces TTM and cost.

To learn more:

Find Public IPs of Resources – Use AWS Config for Vulnerability Assessment

Post Syndicated from Gurkamal Deep Singh Rakhra original https://aws.amazon.com/blogs/architecture/find-public-ips-of-resources-use-aws-config-for-vulnerability-assessment/

Systems vulnerability management is a key component of your enterprise security program. Its goal is to remediate OS, software, and applications vulnerabilities. Scanning tools can help identify and classify these vulnerabilities to keep the environment secure and compliant.

Typically, vulnerability scanning tools operate from internal or external networks to discover and report vulnerabilities. For internal scanning, the tools use the private IPs of target systems in scope. For external scans, the target systems' public IP addresses are used. It is important that security teams always maintain an accurate inventory of the IP addresses of all deployed resources. This ensures a comprehensive, consistent, and effective vulnerability assessment.

This blog discusses a scalable, serverless, and automated approach to discover public IP addresses assigned to resources in a single or multi-account environment in AWS, using AWS Config.

Single account is when you have all your resources in a single AWS account. A multi-account environment refers to many accounts under the same AWS Organization.

Understanding scope of solution

You may have good visibility into the private IPs assigned to your resources: Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Kubernetes Service (EKS) clusters, Elastic Load Balancing (ELB), and Amazon Elastic Container Service (Amazon ECS). But it may require some effort to establish a complete view of the existing public IPs. And these IPs can change over time, as new systems join and exit the environment.

An elastic network interface is a logical networking component in a Virtual Private Cloud (VPC) that represents a virtual network card. The elastic network interface routes traffic to other destinations/resources. Usually, you have to make Describe* API calls for the specific resource with an elastic network interface to get information about its configuration and IP address. This may throttle the resource-specific API calls, and result in higher costs. Additionally, if there are tens or hundreds of accounts, it becomes exponentially more difficult to get the information into a single inventory.

AWS Config enables you to assess, audit, and evaluate the configurations of your AWS resources. The advanced query feature provides a single query endpoint and language to get current resource state metadata for a single account and Region, or multiple accounts and Regions. You can use configuration aggregators to run the same queries from a central account across multiple accounts and AWS Regions.

AWS Config supports a subset of structured query language (SQL) SELECT syntax, which enables you to perform property-based queries and aggregations on the current configuration item (CI) data. Advanced query is available at no additional cost to AWS Config customers in all AWS Regions (except China Regions) and AWS GovCloud (US).

AWS Organizations helps you centrally govern your environment. Its integration with other AWS services lets you define central configurations, security mechanisms, audit requirements, and resource sharing across accounts in your organization.

Choosing scope of advanced queries in AWS Config

When running advanced queries in AWS Config, you must choose the scope of the query. The scope defines the accounts you want to run the query against and is configured when you create an aggregator.

Following are the three possible scopes when running advanced queries:

  1. Single account and single Region
  2. Multiple accounts and multiple Regions
  3. AWS Organization accounts

Single account and single Region

Figure 1. AWS Config workflow for single account and single Region

The use case shown in Figure 1 addresses the need of customers operating within a single account and single Region. With AWS Config enabled for the individual account, you use the AWS Config advanced query feature to run SQL queries. These give you resource metadata about associated public IPs. You do not require an aggregator for a single account and single Region.

In Figure 1.1, the advanced query returned results from a single account and all Availability Zones within the Region in which the query was run.

Figure 1.1 Advanced query returning results for a single account and single Region

Query for reference

SELECT
  resourceId,
  resourceName,
  resourceType,
  configuration.association.publicIp,
  availabilityZone,
  awsRegion
WHERE
  resourceType = 'AWS::EC2::NetworkInterface'
  AND configuration.association.publicIp > '0.0.0.0'

This query fetches the properties of all elastic network interfaces. The WHERE condition uses the resourceType property to list the elastic network interfaces and keeps only those with a public IP greater than 0.0.0.0. This filter is needed because an elastic network interface can exist with only a private IP, in which case no public IP is assigned to it. For a list of supported resourceType values, refer to supported resource types for AWS Config.
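
As a minimal sketch of running this query programmatically, the following function sends the expression to AWS Config and extracts the public IPs from the results. The function name is an assumption, and `config_client` is expected to behave like the boto3 `config` client. Only the first page of results is read here.

```python
import json

PUBLIC_IP_QUERY = """
SELECT
  resourceId,
  resourceName,
  resourceType,
  configuration.association.publicIp,
  availabilityZone,
  awsRegion
WHERE
  resourceType = 'AWS::EC2::NetworkInterface'
  AND configuration.association.publicIp > '0.0.0.0'
"""


def find_public_ips(config_client):
    """Run the advanced query and return the public IPs it reports."""
    response = config_client.select_resource_config(Expression=PUBLIC_IP_QUERY)
    ips = []
    for raw in response["Results"]:  # each result is a JSON-encoded string
        row = json.loads(raw)
        association = row.get("configuration", {}).get("association", {})
        ip = association.get("publicIp")
        if ip:
            ips.append(ip)
    return ips
```

With real credentials, this could be called as `find_public_ips(boto3.client("config"))` in the account and Region where AWS Config is enabled.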

Multiple accounts and multiple Regions

Figure 2. AWS Config monitoring workflow for multiple accounts and multiple Regions. The figure shows EC2, EKS, and Amazon ECS, but it can be any AWS resource having a public elastic network interface.

AWS Config enables you to monitor configuration changes against multiple accounts and multiple Regions via an aggregator (see Figure 2). An aggregator is an AWS Config resource type that collects AWS Config data from multiple accounts and Regions. You can choose the aggregator scope when running advanced queries in AWS Config. Remember to authorize the aggregator accounts to collect AWS Config configuration and compliance data.

Figure 2.1 Advanced query returning results from multiple Regions (awsRegion column) as highlighted in the diagram

This use case applies when you have AWS resources in multiple accounts (including accounts that span multiple organizations) and multiple Regions. Figure 2.1 shows the query results being returned from multiple AWS Regions.

Accounts in AWS Organization

Figure 3. The workflow of accounts in an AWS Organization being monitored by AWS Config. This figure shows EC2, EKS, and Amazon ECS but it can be any AWS resource having a public elastic network interface.

An aggregator also enables you to monitor all the accounts in your AWS Organization (see Figure 3). When this option is chosen, AWS Config enables you to run advanced queries against the current configuration data of all the accounts in your AWS Organization. Remember that an aggregator will only aggregate data from the accounts and Regions that are specified when the aggregator is created.

Figure 3.1 Advanced query returning results from all accounts (accountId column) under an AWS Organization

In Figure 3.1, the query is run against all accounts in an AWS Organization. The aggregator provides this AWS Organization scope by automatically accumulating data from all accounts under a specific AWS Organization.
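
Queries scoped to an aggregator go through a different API call than single-account queries. The following sketch pages through all results for an aggregator. The function name and the aggregator name in the usage note are assumptions, and `config_client` is expected to behave like the boto3 `config` client.

```python
import json


def query_aggregator(config_client, aggregator_name, expression):
    """Run an advanced query against an aggregator and return all rows."""
    rows = []
    token = None
    while True:
        kwargs = {
            "Expression": expression,
            "ConfigurationAggregatorName": aggregator_name,
        }
        if token:
            kwargs["NextToken"] = token
        page = config_client.select_aggregate_resource_config(**kwargs)
        # Each result is a JSON-encoded string describing one resource.
        rows.extend(json.loads(raw) for raw in page.get("Results", []))
        token = page.get("NextToken")
        if not token:
            return rows
```

For example, `query_aggregator(client, "org-aggregator", expression)` with `accountId` added to the SELECT list returns rows that can be grouped by account, matching the accountId column in Figure 3.1.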

Common architecture workflow for discovering public IPs

Figure 4. High-level architecture pattern for discovering public IPs

The workflow shown in Figure 4 starts with Amazon EventBridge triggering an AWS Lambda function. You can configure an Amazon EventBridge schedule via rate or cron expressions, which define the frequency. This AWS Lambda function will host the code to make an API call to AWS Config that will run an advanced query. The advanced query will check for all elastic network interfaces in your account(s). This is because any public resource launched in your account will be assigned an elastic network interface.

When the results are returned, they can be stored on Amazon S3. These result files can be timestamped (via naming or S3 versioning) in order to keep a history of public IPs used in your account. The result set can then be fed into or accessed by the vulnerability scanning tool of your choice.
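
Storing each result set under a timestamped key can be sketched as follows. The key layout, the prefix, and the function names are assumptions made for illustration. `s3_client` is expected to behave like the boto3 `s3` client.

```python
import json
from datetime import datetime, timezone


def result_key(now, prefix="public-ips"):
    """Build a date-partitioned object key such as public-ips/2022/01/05/093000Z.json."""
    return f"{prefix}/{now:%Y/%m/%d}/{now:%H%M%S}Z.json"


def store_results(s3_client, bucket, ips, now=None):
    """Write the list of public IPs to S3 and return the object key."""
    now = now or datetime.now(timezone.utc)
    key = result_key(now)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(sorted(ips)).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

The EventBridge schedule that triggers the function could use an expression such as `rate(1 day)` or `cron(0 6 * * ? *)`, depending on how often the scanner needs a fresh inventory.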

Note: AWS Config advanced queries can also be used to query IPv6 addresses. You can use the “configuration.ipv6Addresses” AWS Config property to get IPv6 addresses. When querying IPv6 addresses, remove “configuration.association.publicIp > ‘0.0.0.0’” condition from the preceding sample queries. For more information on available AWS Config properties and data types, refer to GitHub.

Conclusion

In this blog, we demonstrated how to extract public IP information from resources deployed in your account(s) using AWS Config and AWS Config advanced query. We discussed how you can support your vulnerability scanning process by identifying public IPs in your account(s) that can be fed into your scanning tool. This solution is serverless, automated, and scalable, which removes the undifferentiated heavy lifting required to manage your resources.

Learn more about AWS Config best practices:

Modernized Database Queuing using Amazon SQS and AWS Services

Post Syndicated from Scott Wainner original https://aws.amazon.com/blogs/architecture/modernized-database-queuing-using-amazon-sqs-and-aws-services/

A queuing system is composed of producers and consumers. A producer enqueues messages (writes messages to a database) and a consumer dequeues messages (reads messages from the database). Business applications requiring asynchronous communications often use the relational database management system (RDBMS) as the default message storage mechanism. But increased message volume, complexity, and size compete with the inherent functionality of the database. The RDBMS becomes a bottleneck for message delivery, while also impacting other traditional enterprise uses of the database.

In this blog, we will show how you can mitigate the RDBMS performance constraints by using Amazon Simple Queue Service (Amazon SQS), while retaining the intrinsic value of the stored relational data.

Problems with legacy queuing methods

Commercial databases such as Oracle offer Advanced Queuing (AQ) mechanisms, while SQL Server supports Service Broker for queuing. The database acts as a message queue system when incoming messages are captured along with metadata. A message stored in a database is often processed multiple times using a sequence of message extraction, transformation, and loading (ETL). The message is then routed for distribution to a set of recipients based on logic that is often also stored in the database.

The repetitive manipulation of messages and iterative attempts at distributing pending messages may create a backlog that interferes with the primary function of the database. This backpressure can propagate to other systems that are trying to store and retrieve data from the database and cause a performance issue (see Figure 1).

Figure 1. A relational database serving as a message queue.

There are several scenarios where the database can become a bottleneck for message processing:

Message metadata. Messages consist of the payload (the content of the message) and metadata that describes the attributes of the message. The metadata often includes routing instructions, message disposition, message state, and payload attributes.

  • The message metadata may require iterative transformation during the message processing. This creates an inefficient sequence of read, transform, and write processes. This is especially inefficient if the message attributes undergo multiple transformations that must be reflected in the metadata. The iterative read/write process of metadata consumes the database IOPS, and forces the database to scale vertically (add more CPU and more memory).
  • A new paradigm emerges when message management processes exist outside of the database. Here, the metadata is manipulated without interacting with the database, except to write the final message disposition. Application logic can be applied through functions such as AWS Lambda to transform the message metadata.

Message large object (LOB). A message may contain a large binary object that must be stored in the payload.

  • Storing large binary objects in the RDBMS is expensive. Manipulating them consumes the throughput of the database with iterative read/write operations. If the LOB must be transformed, then it becomes wasteful to store the object in the database.
  • An alternative approach offers a more efficient message processing sequence. The large object is stored external to the database in universally addressable object storage, such as Amazon Simple Storage Service (Amazon S3). There is only a pointer to the object that is stored in the database. Smaller elements of the message can be read from or written to the database, while large objects can be manipulated more efficiently in object storage resources.
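
The pointer approach for large objects can be sketched as follows. Here the pointer travels in the queue message, which is an assumption made for illustration; the same reference could equally be written to a database row. The key layout and function name are hypothetical, and the client arguments are expected to behave like the boto3 `s3` and `sqs` clients.

```python
import json
import uuid


def enqueue_with_pointer(s3_client, sqs_client, bucket, queue_url, payload, metadata):
    """Store a large payload in S3 and enqueue only a small reference to it.

    payload: bytes of the large object.
    metadata: small dict of message attributes kept with the pointer.
    """
    key = f"payloads/{uuid.uuid4()}"
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload)
    body = json.dumps(
        {
            "metadata": metadata,
            "payload_ref": {"bucket": bucket, "key": key},
        }
    )
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
    return key
```

Consumers read the small message first and fetch the object from S3 only when they actually need the payload.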

Message fan-out. A message can be loaded into the database and analyzed for routing, where the same message must be distributed to multiple recipients.

  • Messages that require multiple recipients may require a copy of the message replicated for each recipient. The replication creates multiple writes and reads from the database, which is inefficient.
  • A new method captures only the routing logic and target recipients in the database. The message replication then occurs outside of the database in distributed messaging systems, such as Amazon Simple Notification Service (Amazon SNS).

Message queuing. Messages are often kept in the database until they are successfully processed for delivery. If a message is read from the database and determined to be undeliverable, then the message is kept there until a later attempt is successful.

  • An inoperable message delivery process can create backpressure on the database where iterative message reads are processed for the same message with unsuccessful delivery. This creates a feedback loop causing even more unsuccessful work for the database.
  • Try a message queuing system such as Amazon MQ or Amazon SQS, which offloads the message queuing from the database. These services offer efficient message retry mechanisms, and reduce iterative reads from the database.

Sequenced message delivery. Messages may require ordered delivery where the delivery sequence is crucial for maintaining application integrity.

  • The application may capture the message order within database tables, but the sorting function still consumes processing capabilities. The order sequence must be sorted and maintained for each attempted message delivery.
  • Message order can be maintained outside of the database using a queue system, such as Amazon SQS, with first-in/first-out (FIFO) delivery.
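
Ordered delivery with an Amazon SQS FIFO queue can be sketched as follows. The function name and the deduplication id scheme are assumptions, and `sqs_client` is expected to behave like the boto3 `sqs` client.

```python
def send_in_order(sqs_client, queue_url, group_id, messages):
    """Send messages to a FIFO queue so they are delivered in list order.

    Messages that share a MessageGroupId are delivered in the order sent.
    """
    for seq, body in enumerate(messages):
        sqs_client.send_message(
            QueueUrl=queue_url,  # must point to a FIFO queue (name ends in .fifo)
            MessageBody=body,
            MessageGroupId=group_id,
            # Hypothetical scheme: group id plus position in the batch.
            MessageDeduplicationId=f"{group_id}-{seq}",
        )
```

Using one message group per business entity, such as an order or an account, keeps ordering where it matters while still letting unrelated groups be processed in parallel.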

Message scheduling. Messages may also be queued with a scheduled delivery attribute. These messages require an event-driven architecture that initiates delivery at the scheduled time.

  • The database often uses trigger mechanisms to initiate message delivery. Message delivery may require a synchronized point in time for delivery (many messages at once), which can cause a spike in work at the scheduled interval. This impacts the database performance with artificially induced peak load intervals.
  • Event signals can be generated in systems such as Amazon EventBridge, which can coordinate the transmission of messages.

Message disposition. Each message maintains a message disposition state that describes the delivery state.

  • The database is often used as a logging system for message transmission status. The message metadata is updated with the disposition of the message, while the message remains in the database as an artifact.
  • An optimized technique is available using Amazon CloudWatch as a record of message disposition.

Modernized queuing architecture

Decoupling message queuing from the database improves database availability and enables greater message queue scalability. It also provides a more cost-effective use of the database, and mitigates backpressure created when database performance is constrained by message management.

The modernized architecture uses loosely coupled services, such as Amazon S3, AWS Lambda, Amazon MQ, Amazon SQS, Amazon SNS, Amazon EventBridge, and Amazon CloudWatch. This loosely coupled architecture lets each of the functional components scale vertically and horizontally, independent of the other functions required for message queue management.

Figure 2 depicts a message queuing architecture that uses Amazon SQS for message queuing and AWS Lambda for message routing, transformation, and disposition management. An RDBMS is still leveraged to retain metadata profiles, routing logic, and message disposition. The ETL processes are handled by AWS Lambda, while large objects are stored in Amazon S3. Finally, message fan-out distribution is handled by Amazon SNS, and the queue state is monitored and managed by Amazon CloudWatch and Amazon EventBridge.

Figure 2. Modernized queuing architecture using Amazon SQS

Conclusion

In this blog, we show how queuing functionality can be migrated from the RDBMS while minimizing changes to the business application. The RDBMS continues to play a central role in sourcing the message metadata, running routing logic, and storing message disposition. However, AWS services such as Amazon SQS offload queue management tasks related to the messages. AWS Lambda performs message transformation, queues the message, and transmits the message with massive scale, fault-tolerance, and efficient message distribution.

Read more about the diverse capabilities of AWS messaging services:

By using AWS services, the RDBMS is no longer a performance bottleneck in your business applications. This improves scalability, and provides resilient, fault-tolerant, and efficient message delivery.

Read our blog on modernization of common database functions:

Unify log aggregation and analytics across compute platforms

Post Syndicated from Hari Ohm Prasath original https://aws.amazon.com/blogs/big-data/unify-log-aggregation-and-analytics-across-compute-platforms/

Our customers want to make sure their users have the best experience running their application on AWS. To make this happen, you need to monitor and fix software problems as quickly as possible. Doing this gets challenging with the growing volume of data needing to be quickly detected, analyzed, and stored. In this post, we walk you through an automated process to aggregate and monitor logging-application data in near-real time, so you can remediate application issues faster.

This post shows how to unify and centralize logs across different computing platforms. With this solution, you can unify logs from Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Kinesis Data Firehose, and AWS Lambda using agents, log routers, and extensions. We use Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) with OpenSearch Dashboards to visualize and analyze the logs, collected across different computing platforms to get application insights. You can deploy the solution using the AWS Cloud Development Kit (AWS CDK) scripts provided as part of the solution.

Customer benefits

A unified aggregated log system provides the following benefits:

  • A single point of access to all the logs across different computing platforms
  • Help defining and standardizing the transformations of logs before they get delivered to downstream systems like Amazon Simple Storage Service (Amazon S3), Amazon OpenSearch Service, Amazon Redshift, and other services
  • The ability to use Amazon OpenSearch Service to quickly index logs from routers, applications, and other devices, and OpenSearch Dashboards to search and visualize them

Solution overview

In this post, we use the following services to demonstrate log aggregation across different compute platforms:

  • Amazon EC2 – A web service that provides secure, resizable compute capacity in the cloud. It’s designed to make web-scale cloud computing easier for developers.
  • Amazon ECS – A web service that makes it easy to run, scale, and manage Docker containers on AWS, designed to make the Docker experience easier for developers.
  • Amazon EKS – A managed service that makes it easy to run Kubernetes on AWS without needing to install and operate your own Kubernetes control plane.
  • Kinesis Data Firehose – A fully managed service that makes it easy to stream data to Amazon S3, Amazon Redshift, or Amazon OpenSearch Service.
  • Lambda – A compute service that lets you run code without provisioning or managing servers. It’s designed to make web-scale cloud computing easier for developers.
  • Amazon OpenSearch Service – A fully managed service that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more.

The following diagram shows the architecture of our solution.

The architecture uses various log aggregation tools such as log agents, log routers, and Lambda extensions to collect logs from multiple compute platforms and deliver them to Kinesis Data Firehose. Kinesis Data Firehose streams the logs to Amazon OpenSearch Service. Log records that fail to persist in Amazon OpenSearch Service are written to Amazon S3. To scale this architecture, each compute platform streams its logs to a separate Firehose delivery stream. Each stream writes to its own index, which is rotated every 24 hours.

The following sections demonstrate how the solution is implemented on each of these computing platforms.

Amazon EC2

The Kinesis agent collects and streams logs from the applications running on EC2 instances to Kinesis Data Firehose. The agent is a standalone Java software application that offers an easy way to collect and send data to Kinesis Data Firehose. The agent continuously monitors files and sends logs to the Firehose delivery stream.

The AWS CDK script provided as part of this solution deploys a simple PHP application that generates logs under the /etc/httpd/logs directory on the EC2 instance. The Kinesis agent is configured via /etc/aws-kinesis/agent.json to collect data from access_logs and error_logs, and stream them periodically to Kinesis Data Firehose (ec2-logs-delivery-stream).

Because Amazon OpenSearch Service expects data in JSON format, you can add a call to a Lambda function to transform the log data to JSON format within Kinesis Data Firehose before streaming to Amazon OpenSearch Service. The following is a sample input for the data transformer:

46.99.153.40 - - [29/Jul/2021:15:32:33 +0000] "GET / HTTP/1.1" 200 173 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"

The following is our output:

{
    "logs" : "46.99.153.40 - - [29/Jul/2021:15:32:33 +0000] \"GET / HTTP/1.1\" 200 173 \"-\" \"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36\""
}

We can enhance the Lambda function to extract the timestamp, HTTP, and browser information from the log data, and store them as separate attributes in the JSON document.
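
As a rough illustration of that enhancement, the following Python sketch shows what such a transformer could look like. It is not the function shipped with the solution. It assumes the standard Kinesis Data Firehose data-transformation contract (base64-encoded data in, recordId/result/data out) and the Apache combined log format shown in the sample input. The field names client_ip, timestamp, method, path, protocol, status, bytes, referrer, and user_agent are our own choices for this example.

```python
import base64
import json
import re

# Pattern for the Apache "combined" access-log format used in the sample input.
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def transform_line(line):
    """Wrap a raw log line in JSON, adding parsed fields when the line matches."""
    document = {"logs": line}
    match = LOG_PATTERN.match(line)
    if match:
        document.update(match.groupdict())
        document["status"] = int(document["status"])
    return document


def handler(event, context):
    """Firehose transformation entry point: one output record per input record."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        payload = json.dumps(transform_line(line))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Lines that do not match the pattern still pass through with only the logs attribute, so error logs in a different format are not lost.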

Amazon ECS

In the case of Amazon ECS, we use FireLens to send logs directly to Kinesis Data Firehose. FireLens is a container log router for Amazon ECS and AWS Fargate that gives you the extensibility to use the breadth of services at AWS or partner solutions for log analytics and storage.

The architecture hosts FireLens as a sidecar, which collects logs from the main container running an httpd application and sends them to Kinesis Data Firehose, which streams them to Amazon OpenSearch Service. The AWS CDK script provided as part of this solution deploys an httpd container hosted behind an Application Load Balancer. The httpd logs are pushed to Kinesis Data Firehose (ecs-logs-delivery-stream) through the FireLens log router.

Amazon EKS

With the recent announcement of Fluent Bit support for Amazon EKS, you no longer need to run a sidecar to route container logs from Amazon EKS pods running on Fargate. With the new built-in logging support, you can select a destination of your choice to send the records to. Amazon EKS on Fargate uses a version of Fluent Bit for AWS, an upstream conformant distribution of Fluent Bit managed by AWS.

The AWS CDK script provided as part of this solution deploys an NGINX container hosted behind an internal Application Load Balancer. The NGINX container logs are pushed to Kinesis Data Firehose (eks-logs-delivery-stream) through the Fluent Bit plugin.

Lambda

For Lambda functions, you can send logs directly to Kinesis Data Firehose using the Lambda extension. You can also prevent the records from being written to Amazon CloudWatch.

After deployment, the workflow is as follows:

  1. On startup, the extension subscribes to receive logs for the platform and function events. A local HTTP server is started inside the external extension, which receives the logs.
  2. The extension buffers the log events in a synchronized queue and writes them to Kinesis Data Firehose via PUT records.
  3. The logs are sent to downstream systems.
  4. The logs are sent to Amazon OpenSearch Service.

The Firehose delivery stream name gets specified as an environment variable (AWS_KINESIS_STREAM_NAME).
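
The buffering and sending in step 2 can be sketched in Python as follows. This is a simplified stand-in, not the extension's actual code. It assumes the extension sends records in batches (the Firehose PutRecordBatch call accepts at most 500 records per request), and the helper names drain_batches, to_firehose_records, and flush, as well as the fallback stream name example-stream, are hypothetical. The sending function is passed in as a parameter so the sketch runs without AWS credentials.

```python
import os
import queue

MAX_BATCH_RECORDS = 500  # PutRecordBatch accepts at most 500 records per call


def drain_batches(log_queue, max_records=MAX_BATCH_RECORDS):
    """Yield lists of buffered log events, each small enough for one batch call."""
    batch = []
    while True:
        try:
            batch.append(log_queue.get_nowait())
        except queue.Empty:
            break
        if len(batch) == max_records:
            yield batch
            batch = []
    if batch:
        yield batch


def to_firehose_records(batch):
    """Convert log events (strings) into the Records structure Firehose expects."""
    return [{"Data": (event + "\n").encode("utf-8")} for event in batch]


def flush(log_queue, put_batch):
    """Send everything currently in the queue and return the number of events sent."""
    # The stream name comes from the environment variable described above.
    stream = os.environ.get("AWS_KINESIS_STREAM_NAME", "example-stream")
    sent = 0
    for batch in drain_batches(log_queue):
        put_batch(DeliveryStreamName=stream, Records=to_firehose_records(batch))
        sent += len(batch)
    return sent
```

With boto3, you could pass boto3.client("firehose").put_record_batch as put_batch. A production version would also inspect the FailedPutCount field in the response and retry failed records.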

For this solution, because we’re only focusing on collecting the run logs of the Lambda function, the data transformer of the Kinesis Data Firehose delivery stream keeps only the records of type function ("type":"function") and drops the rest before sending them to Amazon OpenSearch Service.

The following is a sample input for the data transformer:

[
   {
      "time":"2021-07-29T19:54:08.949Z",
      "type":"platform.start",
      "record":{
         "requestId":"024ae572-72c7-44e0-90f5-3f002a1df3f2",
         "version":"$LATEST"
      }
   },
   {
      "time":"2021-07-29T19:54:09.094Z",
      "type":"platform.logsSubscription",
      "record":{
         "name":"kinesisfirehose-logs-extension-demo",
         "state":"Subscribed",
         "types":[
            "platform",
            "function"
         ]
      }
   },
   {
      "time":"2021-07-29T19:54:09.096Z",
      "type":"function",
      "record":"2021-07-29T19:54:09.094Z\tundefined\tINFO\tLoading function\n"
   },
   {
      "time":"2021-07-29T19:54:09.096Z",
      "type":"platform.extension",
      "record":{
         "name":"kinesisfirehose-logs-extension-demo",
         "state":"Ready",
         "events":[
            "INVOKE",
            "SHUTDOWN"
         ]
      }
   },
   {
      "time":"2021-07-29T19:54:09.097Z",
      "type":"function",
      "record":"2021-07-29T19:54:09.097Z\t024ae572-72c7-44e0-90f5-3f002a1df3f2\tINFO\tvalue1 = value1\n"
   },
   {
      "time":"2021-07-29T19:54:09.098Z",
      "type":"platform.runtimeDone",
      "record":{
         "requestId":"024ae572-72c7-44e0-90f5-3f002a1df3f2",
         "status":"success"
      }
   }
]
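
One way to express that filter is the following Python sketch of a Firehose data transformer. It passes through records whose type is function and marks all others as Dropped, on the assumption that the function's own run logs are what should reach Amazon OpenSearch Service. It also assumes each Firehose record carries a single log event object like the entries in the sample above. The solution's real transformer may be structured differently, and the name filter_handler is our own.

```python
import base64
import json


def filter_handler(event, context):
    """Keep log events of type "function" and drop platform events."""
    output = []
    for record in event["records"]:
        log_event = json.loads(base64.b64decode(record["data"]))
        # "Ok" delivers the record; "Dropped" tells Firehose to discard it silently.
        result = "Ok" if log_event.get("type") == "function" else "Dropped"
        output.append({
            "recordId": record["recordId"],
            "result": result,
            "data": record["data"],
        })
    return {"records": output}
```

For the sample input, only the two entries with "type":"function" would be delivered.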

Prerequisites

To implement this solution, you need the following prerequisites:

Build the code

Check out the AWS CDK code by running the following command:

mkdir unified-logs && cd unified-logs
git clone https://github.com/aws-samples/unified-log-aggregation-and-analytics .

Build the Lambda extension by running the following command:

cd lib/computes/lambda/extensions
chmod +x extension.sh
./extension.sh
cd ../../../../

Make sure to replace the default AWS Region in the value of the firehose.endpoint attribute inside lib/computes/ec2/ec2-startup.sh.

Build the code by running the following command:

yarn install && npm run build

Deploy the code

If you’re running AWS CDK for the first time, run the following command to bootstrap the AWS CDK environment (provide your AWS account ID and AWS Region):

cdk bootstrap \
    --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess \
    aws://<AWS Account Id>/<AWS_REGION>

You only need to bootstrap the AWS CDK one time (skip this step if you have already done this).

Run the following command to deploy the code:

cdk deploy --require-approval broadening

You get the following output:

 ✅  CdkUnifiedLogStack

Outputs:
CdkUnifiedLogStack.ec2ipaddress = xx.xx.xx.xx
CdkUnifiedLogStack.ecsloadbalancerurl = CdkUn-ecsse-PY4D8DVQLK5H-xxxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.ecsserviceLoadBalancerDNS570CB744 = CdkUn-ecsse-PY4D8DVQLK5H-xxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.ecsserviceServiceURL88A7B1EE = http://CdkUn-ecsse-PY4D8DVQLK5H-xxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.eksclusterClusterNameCE21A0DB = ekscluster92983EFB-d29892f99efc4419bc08534a3d253160
CdkUnifiedLogStack.eksclusterConfigCommand515C0544 = aws eks update-kubeconfig --name ekscluster92983EFB-d29892f99efc4419bc08534a3d253160 --region us-east-1 --role-arn arn:aws:iam::xxx:role/CdkUnifiedLogStack-clustermasterroleCD184EDB-12U2TZHS28DW4
CdkUnifiedLogStack.eksclusterGetTokenCommand3C33A2A5 = aws eks get-token --cluster-name ekscluster92983EFB-d29892f99efc4419bc08534a3d253160 --region us-east-1 --role-arn arn:aws:iam::xxx:role/CdkUnifiedLogStack-clustermasterroleCD184EDB-12U2TZHS28DW4
CdkUnifiedLogStack.elasticdomainarn = arn:aws:es:us-east-1:xxx:domain/cdkunif-elasti-rkiuv6bc52rp
CdkUnifiedLogStack.s3bucketname = cdkunifiedlogstack-logsfailederrcapturebucket0bcc-xxxxx
CdkUnifiedLogStack.samplelambdafunction = CdkUnifiedLogStack-LambdatransformerfunctionFA3659-c8u392491FrW

Stack ARN:
arn:aws:cloudformation:us-east-1:xxxx:stack/CdkUnifiedLogStack/6d53ef40-efd2-11eb-9a9d-1230a5204572

AWS CDK takes care of building the required infrastructure, deploying the sample application, and collecting logs from different sources to Amazon OpenSearch Service.

The following is some of the key information about the stack:

  • ec2ipaddress – The public IP address of the EC2 instance, deployed with the sample PHP application
  • ecsloadbalancerurl – The URL of the Amazon ECS Load Balancer, deployed with the httpd application
  • eksclusterClusterNameCE21A0DB – The Amazon EKS cluster name, deployed with the NGINX application
  • samplelambdafunction – The sample Lambda function using the Lambda extension to send logs to Kinesis Data Firehose
  • elasticdomainarn – The ARN of the Amazon OpenSearch Service domain

Generate logs

To visualize the logs, you first need to generate some sample logs.

  1. To generate Lambda logs, invoke the function using the following AWS CLI command (run it a few times):
aws lambda invoke \
--function-name "<<samplelambdafunction>>" \
--payload '{"payload": "hello"}' /tmp/invoke-result \
--cli-binary-format raw-in-base64-out \
--log-type Tail

Make sure to replace samplelambdafunction with the actual Lambda function name. The file path needs to be updated based on the underlying operating system.

The function should return "StatusCode": 200, with the following output:

{
    "StatusCode": 200,
    "LogResult": "<<Encoded>>",
    "ExecutedVersion": "$LATEST"
}
  2. Run the following command a couple of times to generate Amazon EC2 logs:
curl http://ec2ipaddress:80

Make sure to replace ec2ipaddress with the public IP address of the EC2 instance.

  3. Run the following command a couple of times to generate Amazon ECS logs:
curl http://ecsloadbalancerurl:80

Make sure to replace ecsloadbalancerurl with the public DNS name of the Application Load Balancer.

We deployed the NGINX application with an internal load balancer, so the load balancer hits the health check endpoint of the application, which is sufficient to generate the Amazon EKS access logs.
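
The LogResult value returned by the Lambda invocation earlier in this section is base64-encoded because the command requests the log tail (--log-type Tail). If you want to confirm that the function actually wrote log lines, a small Python helper like the following can decode it. The function name decode_log_result is our own, and the sketch assumes you pass it the parsed JSON response printed by the AWS CLI.

```python
import base64


def decode_log_result(invoke_response):
    """Return the tail of the execution log as a list of text lines."""
    encoded = invoke_response.get("LogResult", "")
    return base64.b64decode(encoded).decode("utf-8").splitlines()
```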

Visualize the logs

To visualize the logs, complete the following steps:

  1. On the Amazon OpenSearch Service console, choose the hyperlink provided for the OpenSearch Dashboards URL.
  2. Configure access to OpenSearch Dashboards.
  3. Under OpenSearch Dashboards, on the Discover menu, start creating a new index pattern for each compute log.

We can see separate indexes for each compute log partitioned by date, as in the following screenshot.

BDB-1742-create-index

The following screenshot shows the process to create index patterns for Amazon EC2 logs.

BDB-1742-ec2

After you create the index pattern, you can start analyzing the logs using the Discover menu under OpenSearch Dashboards in the navigation pane. This tool provides a single, searchable, unified interface for all the records from the various compute platforms. You can switch between different logs using the Change index pattern submenu.

Clean up

Run the following command from the root directory to delete the stack:

cdk destroy

Conclusion

In this post, we showed how to unify and centralize logs across different compute platforms using Kinesis Data Firehose and Amazon OpenSearch Service. This approach allows you to quickly analyze logs and find the root cause of failures, using a single platform rather than different platforms for different services.

If you have feedback about this post, submit your comments in the comments section.

Resources

For more information, see the following resources:


About the author

Hari Ohm Prasath is a Senior Modernization Architect at AWS, helping customers with their modernization journey to become cloud native. Hari loves to code and actively contributes to open source initiatives. You can find him on Medium, GitHub, and Twitter @hariohmprasath.

Ballu Singh is a Principal Solutions Architect at AWS. He lives in the San Francisco Bay area and helps customers architect and optimize applications on AWS. In his spare time, he enjoys reading and spending time with his family.

Modernize your Penetration Testing Architecture on AWS Fargate

Post Syndicated from Conor Walsh original https://aws.amazon.com/blogs/architecture/modernize-your-penetration-testing-architecture-on-aws-fargate/

Organizations in all industries are innovating their application stack through modernization. Developers have found that modular architecture patterns, serverless operational models, and agile development processes provide great benefits. They offer faster innovation, reduced risk, and reduction in total cost of ownership.

Security organizations must evolve and innovate as well. But security practitioners often find themselves stuck between using powerful yet inflexible open-source tools with little support, and monolithic software with expensive and restrictive licenses.

This post describes how you can use modern cloud technologies to build a scalable penetration testing platform, with no infrastructure to manage.

The penetration testing monolith

AWS operates under the shared responsibility model, where AWS is responsible for the security of the cloud, and the customer is responsible for securing workloads in the cloud. This includes validating the security of your internal and external attack surface. Following the AWS penetration testing policy, customers can run tests against their AWS accounts, except for denial of service (DoS).

A legacy model commonly involves a central server for running a scanning application among the team. The server must be powerful enough for peak load and likely runs 24/7. Common licensing for scanner software is capped on the number of targets you can scan. This model does not scale, and incurs cost when no assessments are being performed.

Penetration testers must constantly reinvent their toolkit. Many one-off tools or scripts are built during engagements when encountering a unique problem. These tools and their environments are often customized, making standardization between machines and software difficult. Building, maintaining, and testing UI/UX and platform compatibility can be expensive and difficult to scale. This often leads to these tools being discarded and the value lost when the analyst moves on to the next engagement. Later, other analysts may run into the same scenario and need to rebuild the tool all over again, resulting in duplicated effort.

Network security scanning using modern cloud infrastructure

By using modern cloud container technologies, we can redesign this monolithic architecture to one that scales to meet increased demand, yet incurs no cost when idle. Containerization provides flexibility and secure isolation.

Figure 1. Overview of the serverless security scanning architecture

Scanning task flow

This workflow is based on the architecture shown in Figure 1:

  1. User authenticates to Amazon Cognito with their organization’s SSO.
  2. User makes authorized request to Amazon API Gateway.
  3. Request is forwarded to an AWS Lambda function that pulls configuration from Amazon Simple Storage Service (S3).
  4. Lambda function validates parameters, incorporates them into the task definition, and calls Amazon Elastic Container Service (ECS).
  5. ECS orchestrates worker nodes using AWS Fargate compute engine and initiates task.
  6. ECS asynchronously returns the task configuration to Lambda, which sanitizes sensitive data and sends response through API Gateway.
  7. The ECS task launches one or more containers, which run the tool.
  8. Scan results are stored in the ephemeral storage provided by Fargate.
  9. Final container in the ECS task copies the scan report to S3.
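
Step 4 of the flow above can be sketched in Python as follows. This is an illustration under several assumptions, not the code behind Figure 1. It restricts targets to IP addresses and CIDR blocks, it passes the scan arguments as a container command override on the ECS RunTask call rather than editing the task definition itself, and the scan presets, the config keys (cluster, task_definition, subnets, security_groups, container_name), and the function names are all hypothetical.

```python
import ipaddress

# Hypothetical presets mapping a friendly scan type to Nmap arguments.
ALLOWED_SCAN_TYPES = {"quick": ["-T4", "-F"], "full": ["-p-"]}


def validate_target(target):
    """Accept only an IP address or CIDR block; raise ValueError otherwise."""
    return str(ipaddress.ip_network(target, strict=False))


def build_run_task_request(params, config):
    """Build keyword arguments for the ECS RunTask call from validated input."""
    target = validate_target(params["target"])
    scan_type = params.get("scan_type", "quick")
    if scan_type not in ALLOWED_SCAN_TYPES:
        raise ValueError(f"unsupported scan type: {scan_type}")
    command = ALLOWED_SCAN_TYPES[scan_type] + [target]
    return {
        "cluster": config["cluster"],
        "taskDefinition": config["task_definition"],
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": config["subnets"],
                "securityGroups": config["security_groups"],
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {"name": config["container_name"], "command": command}
            ]
        },
    }
```

In a Lambda function, the resulting dictionary could be passed to boto3.client("ecs").run_task(**request). Validating the target before it reaches the container command keeps arbitrary strings out of the scanner's argument list. Supporting hostnames would need a separate validation rule.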

Now we’ll describe the different components of the architecture shown in Figure 1. Start by packaging your favorite tool into a container, and publish it to Amazon Elastic Container Registry (ECR). ECR provides your containers additional layers of security assurance with built-in dependency vulnerability scans.

AWS Fargate is a serverless compute engine powering Amazon ECS to orchestrate container tasks. Fargate scales up capacity to support the current load, and scales down once complete to reduce cost. By default, Fargate offers 20 GB of ephemeral storage to each ECS task for shared storage between containers as volume mounts.

Task input and output can be processed with custom code running on the serverless computing service AWS Lambda. For multi-stage Lambda functionality, you can use AWS Step Functions.

Amazon API Gateway can forward incoming requests to these Lambda functions. API Gateway provides serverless REST endpoints to handle requests processed by Lambda functions. Amazon Cognito authorizes users through API Gateway or your organization’s single sign-on (SSO) provider.

The final step of the ECS task can upload any resulting files to an Amazon S3 bucket. Amazon S3 offers industry-leading scalability, data availability, security, and performance with integration into other AWS services. This means that your scan results can be consumed by other AWS services for processing, analytics, machine learning, and security controls.

Amazon CloudWatch Events are used to build an event-based workflow. The S3 upload initiates a CloudWatch Event, which can then invoke a Lambda function to process the file, or launch another ECS task.

This solution is completely serverless. It will scale on demand, yet cost nothing when not in use. This architecture can support anything that can be run in a container, regardless of tool function.

Network Mapper workflow

Figure 2. Network Mapper scanner task workflow

The example in Figure 2 was based on using a tool called Network Mapper, or Nmap. However, a variety of tools can be used, including nslookup/dig, Selenium, Nikto, recon-ng, SpiderFoot, Greenbone Vulnerability Manager (GVM), or OWASP ZAP. You can use anything that runs in a container! With some additional work, findings could be fed into AWS security services like AWS Security Hub, or Amazon GuardDuty. You can also use AWS Partner Network services like Splunk and Datadog, or open source frameworks like Metasploit and DefectDojo. The flexibility to add additional applications that integrate with AWS services means that this architecture can be easily deployed into a variety of AWS environments.

Remember, installation and use of software not included in an AWS-supported Amazon Machine Image (AMI) or container falls on the customer side of the shared responsibility model. Make sure to do your due diligence in securing any software you decide to use in this or any workload. To reduce blast radius, run this in an isolated account and only provide least privilege access to targets.

Conclusion

In this blog post, we showed how to run a penetration testing workload on a modern platform, powered by serverless and container-based services. Amazon API Gateway is the entry point for your architecture, which calls on AWS Lambda. Lambda builds a task definition to launch a fully orchestrated, on-demand container workload using AWS Fargate and Amazon ECS. The final stage of the ECS task copies the results of the scan to Amazon S3. This can be accessed by security analysts or other downstream containers, tools, or services.

We encourage you to go build this architecture in your own environment, and begin conducting your own tests! Construct your Nmap container and store it in Amazon ECR or use securecodebox/nmap, a Docker container built for the Open Web Application Security Project® (OWASP) SecureCodeBox project. Make sure to spend time securing this workload, especially when using open-source software you’re not familiar with. Now go get scanning!

Migrating a Database Workflow to Modernized AWS Workflow Services

Post Syndicated from Scott Wainner original https://aws.amazon.com/blogs/architecture/migrating-a-database-workflow-to-modernized-aws-workflow-services/

The relational database is a critical resource in application architecture. Enterprise organizations often use relational database management systems (RDBMS) to provide embedded workflow state management. But this can present problems, such as inefficient use of data storage and compute resources, performance issues, and decreased agility. Add to this the responsibility of managing workflow states through custom triggers and job-based algorithms, which further exacerbates the performance constraints of the database. The complexity of modern workflows, frequency of runtime, and external dependencies encourage us to seek alternatives to using these database mechanisms.

This blog describes how to use modernized workflow methods that will mitigate database scalability constraints. We’ll show how transitioning your workflow state management from a legacy database workflow to AWS services enables new capabilities with scale.

A workflow system is composed of an ordered set of tasks. Jobs are submitted to the workflow where tasks are initiated in the proper sequence to achieve consistent results. Each task is defined with a task input criterion, task action, task output, and task disposition (see Figure 1).
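
To make the four parts of a task concrete, here is a minimal in-memory Python sketch. It is a teaching model only, not how a database or an AWS service implements workflows. The class name Task, the function run_workflow, and the disposition values PENDING, SKIPPED, SUCCEEDED, and FAILED are our own choices.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Task:
    name: str
    input_criterion: Callable[[dict], bool]  # should this task run for the job?
    action: Callable[[dict], Any]            # the work itself
    output: Any = None                       # result of the action
    disposition: str = "PENDING"             # final state of the task


def run_workflow(tasks, job):
    """Run tasks in order; stop at the first failure and report dispositions."""
    for task in tasks:
        if not task.input_criterion(job):
            task.disposition = "SKIPPED"
            continue
        try:
            task.output = task.action(job)
            task.disposition = "SUCCEEDED"
        except Exception as error:
            task.output = str(error)
            task.disposition = "FAILED"
            break
    return [(task.name, task.disposition) for task in tasks]
```

Tasks after a failed task keep the PENDING disposition, which is the workflow state a state manager would need to persist in order to resume or retry the job.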

Figure 1. Task with input criteria, an action, task output, and task disposition

Embedded Workflow

Figure 2 depicts the database serving as the workflow state manager where an external entity submits a job for execution into the database workflow. This can be challenging, as the embedded workflow definition requires the use of well-defined database primitives. In addition, any external tasks require tight coupling with database primitives that constrains workflow agility.

Figure 2. Embedded database workflow mechanisms with internal and external task entities

Externalized workflow

A paradigm change is made with use of a modernized workflow management system, where the workflow state exists external to the relational database. A workflow management system is essentially a modernized database specifically designed to manage the workflow state (depicted in Figure 3).

Figure 3. External task manager extracting workflow state, job data, performing the task, and re-inserting the job data back into the database

AWS offers two workflow state management services: Amazon Simple Workflow Service (Amazon SWF) and AWS Step Functions. The workflow definition and workflow state are no longer stored in a relational database; these workflow attributes are incorporated into the AWS service. The AWS services are highly scalable, enable flexible workflow definition, and integrate tasks from many other systems, including relational databases. These capabilities vastly expand the types of tasks available in a workflow. Migrating the workflow management to an AWS service reduces demand placed upon the database. In this way, the database’s primary value of representing structured and relational data is preserved. AWS Step Functions offers a well-defined set of task primitives for the workflow designer. The designer can still incorporate tasks that leverage the inherent relational database capabilities.

Pull and push workflow models

First, we must differentiate between Amazon SWF and AWS Step Functions to determine which service is optimal for your workflow. Amazon SWF uses an HTTPS API pull model where external Workers and Deciders execute Tasks and assert the Next-Step, respectively. The workflow state is captured in the Amazon SWF history table. This table tracks the state of jobs and tasks so a common reference exists for all the candidate Workers and Deciders.

Amazon SWF does require development of external entities that make the appropriate API calls into Amazon SWF. It inherently supports external tasks that require human intervention. This workflow can tolerate long lead times for task execution. The Amazon SWF pull model is represented in Figure 4.

Figure 4. ‘Pull model’ for workflow definition when using Amazon SWF

In contrast, AWS Step Functions uses a push model, shown in Figure 5, that initiates workflow tasks and integrates seamlessly with other AWS services. AWS Step Functions may also incorporate mechanisms that enable long-running tasks that require human intervention. AWS Step Functions provides the workflow state management, requires minimal coding, and provides traceability of all transactions.

Figure 5. ‘Push model’ for workflow definition when using AWS Step Functions

Workflow optimizations

An external workflow manager such as AWS Step Functions or Amazon SWF can effectively handle long-running tasks, computationally complex processes, or large media files. AWS workflow managers support asynchronous call-back mechanisms to track task completion. The state of the workflow is intrinsically captured in the service, and the logging of state transitions is automatically captured. Computationally expensive tasks are addressed by invoking high-performance computational resources.

Finally, the AWS workflow manager also improves the handling of large data objects. Previously, jobs would transfer large data objects (images, videos, or audio) into a database’s embedded workflow manager. But this impacts the throughput capacity and consumes database storage.

In the new paradigm, large data objects are no longer transferred to the workflow as jobs, but as job pointers. These are transferred to the workflow whenever tasks must reference external object storage systems. The sequence of state transitions can be traced through CloudWatch Events. This verifies workflow completion, diagnostics of task execution (start, duration, and stop) and metrics on the number of jobs entering the various workflows.
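
The difference between passing a job and passing a job pointer can be sketched in a few lines of Python. The field names jobId, object, bucket, and key are illustrative rather than a required schema, and the 256 KiB figure reflects our understanding of the Step Functions limit on state input and output size, so check the current service quotas before relying on it.

```python
import json

# Assumed Step Functions quota for state input/output, in bytes.
STATE_PAYLOAD_LIMIT = 256 * 1024


def make_job_pointer(bucket, key, job_id):
    """Describe where the large object lives instead of embedding it."""
    return {"jobId": job_id, "object": {"bucket": bucket, "key": key}}


def fits_in_workflow_state(job):
    """Check whether the serialized job stays under the assumed payload limit."""
    return len(json.dumps(job).encode("utf-8")) <= STATE_PAYLOAD_LIMIT
```

A pointer to a multi-gigabyte video serializes to well under a kilobyte, while even a modest inline payload can exceed the limit. Each task then reads the object directly from storage using the bucket and key.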

Large data objects are best captured in more cost-effective object storage solutions such as Amazon Simple Storage Service (Amazon S3). Data records may be conveyed via a variety of NoSQL storage mechanisms including:

The workflow manager stores pointer references so tasks can directly access these data objects and perform transformation on the data. It provides pointers to the results without transferring the data objects to the workflow. Transferring pointers in the workflow as opposed to transferring large data objects significantly improves the performance, reduces costs, and dramatically improves scalability. You may continue to use the RDBMS for the storage of structured data and use its SQL capabilities with structured tables, joins, and stored procedures. AWS Step Functions enable indirect integration with relational databases using tools such as the following:

  • AWS Lambda: Short-lived execution of custom code to handle tasks
  • AWS Glue: Data integration enabling combination and preparation of data including SQL

AWS Step Functions can be coupled with AWS Lambda, a serverless compute capability. Lambda code can manipulate the job data and incorporate many other AWS services. AWS Lambda can also interact with any relational database including Amazon Relational Database Service (RDS) or Amazon Aurora as the executor of a task.

The modernized architecture shown in Figure 6 offers more flexibility in creating new workflows that can evolve with your business requirements.

Figure 6. Using Step Functions as workflow state manager

Summary

Several key advantages are highlighted with this modernized architecture using either Amazon SWF or AWS Step Functions:

  • You can manage multiple versions of a workflow. Backwards compatibility is maintained as capability expands. Previous business requirements using metadata interpretation on job submission are preserved.
  • Tasks leverage loose coupling of external systems. This provides far more data processing and data manipulation capabilities in a workflow.
  • Upgrades can happen independently. A loosely coupled system enables independent upgrade capabilities of the workflow or the external system executing the task.
  • Automatic scaling. Serverless architecture scales automatically with the growth in job submissions.
  • Managed services. AWS provides highly resilient and fault tolerant managed services.
  • Recovery. Instance recovery mechanisms can manage workflow state machines.

The modernized workflow using Amazon SWF or AWS Step Functions offers many key advantages. It enables application agility to adapt to changing business requirements. By using a managed service, the enterprise architect can focus on the workflow requirements and task actions, rather than building out a workflow management system. Finally, critical intellectual property developed in the RDBMS system can be preserved as tasks in the modernized workflow using AWS services.

Further reading:

Optimize your IoT Services for Scale with IoT Device Simulator

Post Syndicated from Ajay Swamy original https://aws.amazon.com/blogs/architecture/optimize-your-iot-services-for-scale-with-iot-device-simulator/

The IoT (Internet of Things) has accelerated digital transformation for many industries. Companies can now offer smarter home devices, remote patient monitoring, connected and autonomous vehicles, smart consumer devices, and many more products. The enormous volume of data emitted from IoT devices can be used to improve performance and efficiency, and to develop new service and business models. This can help you build better relationships with your end consumers. But you’ll need an efficient and affordable way to test your IoT backend services without incurring significant capex by deploying test devices to generate this data.

IoT Device Simulator (IDS) is an AWS Solution that manufacturing companies can use to simulate data, test device integration, and improve the performance of their IoT backend services. The solution enables you to create hundreds of IoT devices with unique attributes and properties. You can simulate data without configuring and managing physical devices.

An intuitive UI to create and manage devices and simulations

IoT Device Simulator comes with an intuitive user interface that enables you to create and manage device types for data simulation. The solution also provides you with a pre-built autonomous car device type to simulate a fleet of connected vehicles. Once you create devices, you can create simulations and generate data (see Figure 1).

Figure 1. The landing page UI enables you to create devices and simulations

Create devices and simulate data

With IDS, you can create multiple device types with varying properties and data attributes (see Figure 2). Each device type has a topic where simulation data is sent. The supported data types are object, array, sinusoidal, location, Boolean, integer, float, and more. Refer to this full list of data types. Additionally, you can import device types via a specific JSON format or use the existing automotive demo to pre-populate connected vehicles.
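As a rough illustration of how typed attributes can drive generated data, the following Python sketch produces one payload from a device type definition. The field names (`type`, `min`, `max`, `period`, `fields`, `items`, `length`) and the `thermostat` device type are assumptions made for this example; they are not the solution's actual JSON format.

```python
import math
import random


def generate_value(spec, tick, rng):
    """Return one simulated value for a hypothetical attribute spec."""
    kind = spec["type"]
    if kind == "integer":
        return rng.randint(spec["min"], spec["max"])
    if kind == "float":
        return round(rng.uniform(spec["min"], spec["max"]), spec.get("precision", 2))
    if kind == "boolean":
        return rng.random() < 0.5
    if kind == "sinusoidal":
        # Oscillate between min and max, one full cycle every `period` ticks.
        mid = (spec["min"] + spec["max"]) / 2
        amplitude = (spec["max"] - spec["min"]) / 2
        return mid + amplitude * math.sin(2 * math.pi * tick / spec["period"])
    if kind == "object":
        return {name: generate_value(sub, tick, rng) for name, sub in spec["fields"].items()}
    if kind == "array":
        return [generate_value(spec["items"], tick, rng) for _ in range(spec["length"])]
    raise ValueError(f"unsupported attribute type: {kind}")


# Hypothetical device type; the schema is illustrative only.
device_type = {
    "name": "thermostat",
    "topic": "sim/thermostat",
    "attributes": {
        "temperature": {"type": "sinusoidal", "min": 18.0, "max": 24.0, "period": 60},
        "humidity": {"type": "integer", "min": 30, "max": 60},
        "heating": {"type": "boolean"},
    },
}


def build_payload(device_type, tick, rng):
    """Generate one message payload for the given simulation tick."""
    return {
        name: generate_value(spec, tick, rng)
        for name, spec in device_type["attributes"].items()
    }
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which can help when comparing backend behavior across test runs.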

Figure 2. Create multiple device types and their data attributes

Create and manage simulations

With IDS, you can create simulations with one or multiple device types (see Figure 3). In addition, you can specify the number of devices to simulate for each device type and how often data is generated and sent.

Figure 3. Create simulations for multiple devices

You can then run multiple simulations (see Figure 4) and use the data generated to test your IoT backend services and infrastructure. In addition, you have the flexibility to stop and restart the simulation as needed.

Figure 4. Run and stop multiple simulations

You can view the simulation in real time and observe the data messages flowing through. This way you can ensure that the simulation is working as expected (see Figure 5). You can stop the simulation or add a new simulation to the mix at any time.

Figure 5. Observe your simulation in real time

IoT Device Simulator architecture

Figure 6. IoT Device Simulator architecture

The AWS CloudFormation template for this solution deploys the following architecture, shown in Figure 6:

  1. Amazon CloudFront serves the web interface content from an Amazon Simple Storage Service (Amazon S3) bucket.
  2. The Amazon S3 bucket hosts the web interface.
  3. Amazon Cognito user pool authenticates the API requests.
  4. An Amazon API Gateway API provides the solution’s API layer.
  5. AWS Lambda serves as the solution’s microservices and routes API requests.
  6. Amazon DynamoDB stores simulation and device type information.
  7. AWS Step Functions includes an AWS Lambda simulator function to simulate devices and send messages.
  8. An Amazon S3 bucket stores pre-defined routes that are used for the automotive demo (which is a pre-built example in the solution).
  9. AWS IoT Core serves as the endpoint to which messages are sent.
  10. Amazon Location Service provides the map display showing the location of automotive devices for the automotive demo.

The IoT Device Simulator console is hosted on an Amazon S3 bucket, which is accessed via Amazon CloudFront. It uses Amazon Cognito to manage access. API calls, such as retrieving or manipulating information from the databases or running simulations, are routed through API Gateway. API Gateway calls the microservices, which call the relevant service.

For example, when creating a new device type, the request is sent to API Gateway, which then routes the request to the microservices Lambda function. Based on the request, the microservices Lambda function recognizes that it is a request to create a device type and saves the device type to DynamoDB.
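The routing step described above can be sketched as a small dispatch table. This is a minimal sketch, not the solution's code: the route paths and function names are hypothetical, and an in-memory dictionary stands in for the DynamoDB table so that the example runs without AWS access. The `httpMethod`, `resource`, and `body` fields follow the API Gateway Lambda proxy event shape.

```python
import json
import uuid

# In-memory stand-in for the DynamoDB table (assumption for this sketch).
DEVICE_TYPES = {}


def create_device_type(body):
    """Save a new device type under a generated ID."""
    type_id = str(uuid.uuid4())
    DEVICE_TYPES[type_id] = {"typeId": type_id, **body}
    return 201, DEVICE_TYPES[type_id]


def list_device_types(_body):
    """Return every stored device type."""
    return 200, list(DEVICE_TYPES.values())


# Hypothetical routes; the real API paths may differ.
ROUTES = {
    ("POST", "/devicetypes"): create_device_type,
    ("GET", "/devicetypes"): list_device_types,
}


def handler(event, context=None):
    """Dispatch a proxy-style event to the matching operation."""
    route = ROUTES.get((event["httpMethod"], event["resource"]))
    if route is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown route"})}
    body = json.loads(event.get("body") or "{}")
    status, result = route(body)
    return {"statusCode": status, "body": json.dumps(result)}
```

Keeping the routes in one table means a single Lambda function can serve several API operations, which matches the single "microservices" function described in the text.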

Running a simulation

When running a simulation, the microservices Lambda function starts a Step Functions workflow. The request contains information about the simulation to be run, including the unique device type ID. Using the unique device type ID, Step Functions retrieves all the necessary information about each device type to run the simulation. Once all the information has been retrieved, the simulator Lambda function runs. The simulator Lambda function uses the device type information, including the message payload template, to build the message sent to the IoT topic specified for the device type.
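To make the flow above concrete, here is a hedged Python sketch of one simulator pass: it looks up each device type by ID, fills the payload template for every simulated device, and hands the result to a publish callback. The request shape, the template fields, and the `dt-1` device type are assumptions for illustration. In a deployed function, the callback could wrap the boto3 `iot-data` client's `publish` call; that wiring is omitted so the sketch stays self-contained.

```python
import json
import random

# Hypothetical stored device type, keyed by its unique ID.
DEVICE_TYPES = {
    "dt-1": {
        "topic": "sim/pump",
        "payload": [
            {"name": "pressure", "min": 1, "max": 10},
            {"name": "rpm", "min": 500, "max": 3000},
        ],
    },
}


def build_messages(request, device_types, rng):
    """Return (topic, json_payload) pairs for every simulated device."""
    messages = []
    for entry in request["devices"]:
        device_type = device_types[entry["typeId"]]
        for index in range(entry["amount"]):
            # Fill each attribute in the template with a random value.
            payload = {
                attr["name"]: rng.randint(attr["min"], attr["max"])
                for attr in device_type["payload"]
            }
            payload["deviceId"] = f'{entry["typeId"]}-{index}'
            messages.append((device_type["topic"], json.dumps(payload)))
    return messages


def run_once(request, device_types, publish, rng=None):
    """Build one round of messages and send each through `publish`."""
    if rng is None:
        rng = random.Random()
    for topic, payload in build_messages(request, device_types, rng):
        publish(topic, payload)
```

For a dry run, `publish` can simply append to a list, which makes it easy to inspect the generated messages before pointing the simulation at a real topic.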

When running a custom device type, the simulator generates random information based on the values provided for each attribute. The automotive simulation works differently: it runs a series of calculations to simulate an automobile moving along a series of pre-defined routes. The pre-defined routes are created and stored in an S3 bucket when the solution is launched. The simulation retrieves the routes at random each time the Lambda function runs. Automotive demo simulations also show a map generated from Amazon Location Service and display the device locations as they move.

The simulator exits once the Lambda function has completed or has reached the fifteen-minute execution limit. It then passes all the necessary information back to Step Functions. Step Functions then enters a choice state and restarts the Lambda function if the simulation has not yet surpassed its specified duration, passing all the pertinent information back so that the function can resume where it left off. The simulator Lambda function also checks DynamoDB every thirty seconds to see whether the user has manually stopped the simulation. If so, it ends the simulation early. Once the simulation is complete, Step Functions updates the DynamoDB table.
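The restart loop described above can be modeled in a few lines of Python. This is a sketch under stated assumptions: a simulated clock replaces real waiting, a plain `while` loop stands in for the Step Functions choice state, and the `is_stopped` callback stands in for the DynamoDB check. The state field names are hypothetical.

```python
LAMBDA_BUDGET_S = 15 * 60   # execution limit named in the text
STOP_CHECK_EVERY_S = 30     # stop-flag polling interval named in the text


def simulator(state, interval_s, is_stopped, budget_s=LAMBDA_BUDGET_S):
    """Model one Lambda invocation and return the state to hand back."""
    if interval_s <= 0 or interval_s > budget_s:
        raise ValueError("interval must be positive and fit within the budget")
    elapsed_here = 0
    since_check = 0
    while (state["elapsed_s"] < state["duration_s"]
           and elapsed_here + interval_s <= budget_s):
        state["messages_sent"] += 1       # stand-in for publishing a message
        state["elapsed_s"] += interval_s  # simulated clock, no real sleeping
        elapsed_here += interval_s
        since_check += interval_s
        if since_check >= STOP_CHECK_EVERY_S:
            since_check = 0
            if is_stopped():              # stand-in for the DynamoDB lookup
                state["stopped"] = True
                break
    return state


def run_workflow(duration_s, interval_s, is_stopped=lambda: False):
    """Stand-in for the choice state: re-invoke until done or stopped."""
    state = {"duration_s": duration_s, "elapsed_s": 0,
             "messages_sent": 0, "stopped": False, "invocations": 0}
    while state["elapsed_s"] < state["duration_s"] and not state["stopped"]:
        state["invocations"] += 1
        state = simulator(state, interval_s, is_stopped)
    return state
```

Because all progress lives in the returned `state`, each invocation can pick up where the previous one stopped, which is the property that lets a simulation outlast a single function execution.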

The solution enables you to launch hundreds of devices to test backend infrastructure in an IoT workflow. It also contains an Import/Export feature to share device types. Exporting a device type generates a JSON file that represents the device type. The JSON file can then be imported to create the same device type automatically. You can view up to 100 messages while the simulation is running. You can also filter the messages by topic and device and see what data each device emits.
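The export/import round trip and the bounded message view can be sketched as follows. The required field names (`name`, `topic`, `payload`) and the `MessageView` class are assumptions made for this example, not the solution's actual file format or UI code.

```python
import json
from collections import deque


def export_device_type(device_type):
    """Serialize a device type to a JSON string suitable for sharing."""
    return json.dumps(device_type, indent=2, sort_keys=True)


def import_device_type(text):
    """Parse an exported device type and check the fields this sketch expects."""
    data = json.loads(text)
    missing = {"name", "topic", "payload"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


class MessageView:
    """Keep only the most recent messages, with topic and device filters."""

    def __init__(self, limit=100):
        # A deque with maxlen silently drops the oldest entry when full.
        self._messages = deque(maxlen=limit)

    def add(self, topic, device_id, payload):
        self._messages.append(
            {"topic": topic, "deviceId": device_id, "payload": payload}
        )

    def filter(self, topic=None, device_id=None):
        return [
            m for m in self._messages
            if (topic is None or m["topic"] == topic)
            and (device_id is None or m["deviceId"] == device_id)
        ]
```

A bounded buffer keeps memory use constant no matter how long a simulation runs, at the cost of showing only the latest messages.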

Conclusion

IoT Device Simulator is designed to help customers test device integration and IoT backend services more efficiently without incurring capex for physical devices. This solution provides an intuitive web-based graphical user interface (GUI) that enables customers to create and simulate hundreds of connected devices. It is not necessary to configure and manage physical devices or develop time-consuming scripts. Although we’ve illustrated an automotive application in this post, this simulator can be used for many different industries, such as consumer electronics, healthcare equipment, utilities, manufacturing, and more.

Get started with IoT Device Simulator today.