Tag Archives: Amazon EventBridge

How to automate incident response to security events with AWS Systems Manager Incident Manager

2021-09-17 Sumit Patel

Post Syndicated from Sumit Patel original https://aws.amazon.com/blogs/security/how-to-automate-incident-response-to-security-events-with-aws-systems-manager-incident-manager/

Incident response is a core security capability for organizations to develop, and a core element in the AWS Cloud Adoption Framework (AWS CAF). Responding to security incidents quickly is important to minimize their impacts. Automating incident response helps you scale your capabilities, rapidly reduce the scope of compromised resources, and reduce repetitive work by your security team.

In this post, I show you how to use Incident Manager, a capability of AWS Systems Manager, to build an effective automated incident management and response solution to security events.

You’ll walk through three common security-related events and how you can use Incident Manager to automate your response.

AWS account root user activity: An Amazon Web Services (AWS) account root user has full access to all your resources for all AWS services, including billing information. It’s therefore elemental to adhere to the best practice of using the root user only to create your first IAM user and securely lock away the root user credentials and use them to perform only a few account and service management tasks. And it is critical to be aware when root user activity occurs in your AWS account.
Amazon GuardDuty high severity findings: Amazon GuardDuty is a threat detection service that continuously monitors for malicious or unauthorized behavior to help protect your AWS accounts and workloads. In this blog post, you’ll learn how to initiate an incident response plan whenever a high severity finding is discovered.
AWS Config rule change and S3 bucket allowing public access: AWS Config enables continuous monitoring of your AWS resources, making it simple to assess, audit, and record resource configurations and changes. You will use AWS Config to monitor your Amazon Simple Storage Service (S3) bucket ACLs and policies for settings that allow public read or public write access.

Prerequisites

If this is your first time using Incident Manager, follow the initial onboarding steps in Getting prepared with Incident Manager.

Incident Manager can start managing incidents automatically using Amazon CloudWatch or Amazon EventBridge. For the solution in this blog post, you will use EventBridge to capture events and start an incident.

To complete the steps in this walkthrough, you need the following:

An AWS account and AWS Identity Access and Management (IAM) permissions to access Systems Manager, GuardDuty, Config, S3, and EventBridge. Your IAM user or role should also have iam:CreateServiceLinkedRole permissions. Incident Manager uses this permission to create the service-linked role AWSServiceRoleforIncidentManager in your account. For more information, see Using service-linked roles for Incident Manager.
To enable Systems Manager, follow the steps in the Manage instances using AWS Systems Manager Quick Setup blog post. If you want to customize Systems Manager beyond the quick setup, see Setting up AWS Systems Manager.
To enable GuardDuty in your account, follow the steps in Getting started with GuardDuty.
To enable AWS Config in your account, follow the steps in Getting started with AWS Config.

Create an Incident Manager response plan

A response plan ties together the contacts, escalation plan, and runbook. When an incident occurs, a response plan defines who to engage, how to engage, which runbook to initiate, and which metrics to monitor. By creating a well-defined response plan, you can save your security team time down the road.

Add contacts

Your contacts should include everyone who might be involved in the incident. Follow these steps to add a contact.

To add contacts

Open the AWS Management Console, and then go to Systems Manager within the console, expand Operations Management, and then expand Incident Manager.
Choose Contacts, and then choose Create contact.

Figure 1: Adding contact details
On Contact information, enter names and define contact channels for your contacts.
Under Contact channel, you can select Email, SMS, or Voice. You can also add multiple contact channels.
In Engagement plan, specify how fast to engage your responders. In the example illustrated below, the incident responder will be engaged through email immediately (0 minutes) when an incident is detected and then through SMS 10 minutes into an incident. Complete the fields and then choose Create.

Figure 2: Engagement plan

Create a response plan

Once you’ve created your contacts, you can create a response plan to define how to respond to incidents. Refer to the Best Practices for Response Plans.

Note: (Optional) You can also create an escalation plan that lets you further define the escalation path for your contacts. You can learn more in Create an escalation plan.

To create a response plan

Open the Incident Manager console, and choose Response plans in the left navigation pane.
Choose Create response plan.
Enter a unique and identifiable name for your response plan.
Enter an incident title. The incident title helps to identify an incident on the incidents home page.
Select an appropriate Impact based on the potential scope of the incident.

Figure 3: Selecting your impact level
(Optional) Choose a chat channel for the incident responders to interact in during an incident. For more information about chat channels, see Chat channels.
(Optional) For Engagement, you can choose any number of contacts and escalation plans. For this solution, select the security team responder that you created earlier as one of your contacts.

Figure 4: Adding engagements
(Optional) You can also create a runbook that can drive the incident mitigation and response. For further information, refer to Runbooks and automation.
Under Execution permissions, choose Create an IAM role using a template. Under Role name, select the IAM role you created in the prerequisites that allows Incident Manager to run SSM automation documents, and then choose Create response plan.

Monitor AWS account root activity

When you first create an AWS account, you begin with a single sign-in identity that has complete access to all AWS services and resources in the account. This identity is called the root user and is accessed by signing in with the email address and password that you used to create the account.

An AWS account root user has full access to all your resources for all AWS services, including billing information. It is critical to prevent root user access from unauthorized use and to be aware whenever root user activity occurs in your AWS account. For more information about AWS recommendations, see Security best practices in IAM.

To be certain that all root user activity is authorized and expected, it’s important to monitor root API calls to a given AWS account and to be notified when root user activity is detected.

Create an EventBridge rule

Create and validate an EventBridge rule to capture AWS account root activity.

To create an EventBridge rule

Open the EventBridge console.
In the navigation pane, choose Rules, and then choose Create rule.
Enter a name and description for the rule.
For Define pattern, choose Event pattern.
Choose Custom pattern.

Enter the following event pattern:

{
  "detail-type": [
    "AWS API Call via CloudTrail",
    "AWS Console Sign In via CloudTrail"
  ],
  "detail": {
    "userIdentity": {
      "type": [
        "Root"
      ]
    }
  }
}

For Select targets, choose Incident Manager response plan.
For Response plan, choose SecurityEventResponsePlan, which you created when you set up Incident Manager.
To create an IAM role automatically, choose Create a new role for this specific resource. To use an existing IAM role, choose Use existing role.
(Optional) Enter one or more tags for the rule.
Choose Create.

To validate the rule

Sign in using root credentials.
This console login activity by a root user should invoke the Incident Manager response plan and show an open incident as illustrated below. The respective contact channels that you defined earlier in your Engagement Plan, will be engaged.

Figure 5: Incident Manager open incidents

Watch for GuardDuty high severity findings

GuardDuty is a monitoring service that analyzes AWS CloudTrail management and Amazon S3 data events, Amazon Virtual Private Cloud (Amazon VPC) flow logs, and Amazon Route 53 DNS logs to generate security findings for your account. Once GuardDuty is enabled, it immediately starts monitoring your environment.

GuardDuty integrates with EventBridge, which can be used to send findings data to other applications and services for processing. With EventBridge, you can use GuardDuty findings to invoke automatic responses to your findings by connecting finding events to targets such as Incident Manager response plan.

Create an EventBridge rule

You’ll use an EventBridge rule to capture GuardDuty high severity findings.

To create an EventBridge rule

Open the EventBridge console.
In the navigation pane, select Rules, and then choose Create rule.
Enter a name and description for the rule.
For Define pattern, choose Event pattern.
Choose Custom pattern

Enter the following event pattern which will filter on GuardDuty high severity findings

{
  "source": ["aws.guardduty"],
  "detail-type": ["GuardDuty Finding"],
  "detail": {
    "severity": [
      7.0,
      7.1,
      7.2,
      7.3,
      7.4,
      7.5,
      7.6,
      7.7,
      7.8,
      7.9,
      8,
      8.0,
      8.1,
      8.2,
      8.3,
      8.4,
      8.5,
      8.6,
      8.7,
      8.8,
      8.9
    ]
  }
}

For Select targets, choose Incident Manager response plan.
For Response plan, select SecurityEventResponsePlan, which you created when you set up Incident Manager.
To create an IAM role automatically, choose Create a new role for this specific resource. To use an IAM role that you created before, choose Use existing role.
(Optional) Enter one or more tags for the rule.
Choose Create.

To validate the rule

To test and validate whether the above rule is now functional, you can generate sample findings within the GuardDuty console.

Open the GuardDuty console.
In the navigation pane, choose Settings.
On the Settings page, under Sample findings, choose Generate sample findings.
In the navigation pane, choose Findings. The sample findings are displayed on the Current findings page with the prefix [SAMPLE].

Once you have generated sample findings, your Incident Manager response plan will be invoked almost immediately and the engagement plan with your contacts will begin.

You can select an open incident in the Incident Manager console to see additional details from the GuardDuty finding. Figure 6 shows a high severity finding.

Figure 6: Incident Manager open incident for GuardDuty high severity finding

Monitor S3 bucket settings for public access

AWS Config enables continuous monitoring of your AWS resources, making it easier to assess, audit, and record resource configurations and changes. AWS Config does this through rules that define the desired configuration state of your AWS resources. AWS Config provides a number of AWS managed rules that address a wide range of security concerns such as checking that your Amazon Elastic Block Store (Amazon EBS) volumes are encrypted, your resources are tagged appropriately, and multi-factor authentication (MFA) is enabled for root accounts.

Set up AWS Config and EventBridge

You will use AWS Config to monitor your S3 bucket ACLs and policies for violations which could allow public read or public write access. If AWS Config finds a policy violation, it will initiate an AWS EventBridge rule to invoke your Incident Manager response plan.

To create the AWS Config rule to capture S3 bucket public access

Sign in to the AWS Config console.
If this is your first time in the AWS Config console, refer to the Getting Started guide for more information.
Select Rules from the menu and choose Add Rule.
On the AWS Config rules page, enter S3 in the search box and select the s3-bucket-public-read-prohibited and s3-bucket-public-write-prohibited rules, and then choose Next.

Figure 7: AWS Config rules
Leave the Configure rules page as default and select Next.
On the Review page, select Add Rule. AWS Config is now analyzing your S3 buckets, capturing their current configurations, and evaluating the configurations against the rules you selected.

To create the EventBridge rule

Open the Amazon EventBridge console
In the navigation pane, choose Rules, and then choose Create rule.
Enter a name and description for the rule.
For Define pattern, choose Event pattern.
Choose Custom pattern

Enter the following event pattern, which will filter on AWS Config rule s3-bucket-public-write-prohibited being non-compliant.

{
  "source": ["aws.config"],
  "detail-type": ["Config Rules Compliance Change"],
  "detail": {
    "messageType": ["ComplianceChangeNotification"],
    "configRuleName": ["s3-bucket-public-write-prohibited", ""],
    "newEvaluationResult": {
      "complianceType": [
        "NON_COMPLIANT"
      ]
    }
  }
}

For Select targets, choose Incident Manager response plan.
For Response plan, choose SecurityEventResponsePlan, which you created earlier when setting up Incident Manager.
To create an IAM role automatically, choose Create a new role for this specific resource. To use an existing IAM role, choose Use existing role.
(Optional) Enter one or more tags for the rule.
Choose Create.

To validate the rule

Create a compliant test S3 bucket with no public read or write access through either an ACL or a policy.
Change the ACL of the bucket to allow public listing of objects so that the bucket is non-compliant.

Figure 8: Amazon S3 console
After a few minutes, you should see the AWS Config rule initiated which invokes the EventBridge rule and therefore your Incident Manager response plan.

Summary

In this post, I showed you how to use Incident Manager to monitor for security events and invoke a response plan via Amazon CloudWatch or Amazon EventBridge. AWS CloudTrail API activity (for a root account login), Amazon GuardDuty (for high severity findings), and AWS Config (to enforce policies like preventing public write access to an S3 bucket). I demonstrated how you can create an incident management and response plan to ensure you have used the power of cloud to create automations that respond to and mitigate security incidents in a timely manner. To learn more about Incident Manager, see What Is AWS Systems Manager Incident Manager in the AWS documentation.

If you have feedback about this post, submit comments in the comments section below. If you have questions about this post, start a new thread on the Systems Manager forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Building a serverless GIF generator with AWS Lambda: Part 2

2021-09-13 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-gif-generator-with-aws-lambda-part-2/

In part 1 of this blog post, I explain how a GIF generation service can support a front-end application for video streaming. I compare the performance of a server-based and serverless approach and show how parallelization can significantly improve processing time. I introduce an example application and I walk through the solution architecture.

In this post, I explain the scaling behavior of the example application and consider alternative approaches. I also look at how to manage memory, temporary space, and files in this type of workload. Finally, I discuss the cost of this approach and how to determine if a workload can use parallelization.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file. The example application uses the AWS Serverless Application Model (AWS SAM), enabling you to deploy the application more easily in your own AWS account. This walkthrough creates some resources covered in the AWS Free Tier but others incur cost.

Scaling up the AWS Lambda workers with Amazon EventBridge

There are two AWS Lambda functions in the example application. The first detects the length of the source video and then generates batches of events containing start and end times. These events are put onto the Amazon EventBridge default event bus.

An EventBridge rule matches the events and invokes the second Lambda function. This second function receives the events, which have the following structure:

{
    "version": "0",
    "id": "06a1596a-1234-1234-1234-abc1234567",
    "detail-type": "newVideoCreated",
    "source": "custom.gifGenerator",
    "account": "123456789012",
    "time": "2021-0-17T11:36:38Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "key": "long.mp4",
        "start": 2250,
        "end": 2279,
        "length": 3294.024,
        "tsCreated": 1623929798333
    }
}

The detail attribute contains the unique start and end time for the slice of work. Each Lambda invocation receives a different start and end time and works on a 30-second snippet of the whole video. The function then uses FFMPEG to download the original video from the source Amazon S3 bucket and perform the processing for its allocated time slice.

The EventBridge rule matches events and invokes the target Lambda function asynchronously. The Lambda service scales up the number of execution environments in response to the number of events:

The first function produces batches of events almost simultaneously but the worker function takes several seconds to process a single request. If there is no existing environment available to handle the request, the Lambda scales up to process the work. As a result, you often see a high level of concurrency when running this application, which is how parallelization is achieved:

Lambda continues to scale up until it reaches the initial burst concurrency quotas in the current AWS Region. These quotas are between 500 and 3000 execution environments per minute initially. After the initial burst, concurrency scales by an additional 500 instances per minute.

If the number of events is higher, Lambda responds to EventBridge with a throttling error. The EventBridge service retries the events with exponential backoff for 24 hours. Once Lambda is scaled sufficiently or existing execution environments become available, the events are then processed.

This means that under exceptional levels of heavy load, this retry pattern adds latency to the overall GIF generation task. To manage this, you can use Provisioned Concurrency to ensure that more execution environments are available during periods of very high load.

Alternative ways to scale the Lambda workers

The asynchronous invocation mode for Lambda allows you to scale up worker Lambda functions quickly. This is the mode used by EventBridge when Lambda functions are defined as targets in rules. The other benefit of using EventBridge to decouple the two functions in this example is extensibility. Currently, the events have only a single consumer. However, you can add new capabilities to this application by building new event consumers, without changing the producer logic. Note that using EventBridge in this architecture costs $1 per million events put onto the bus (this cost varies by Region). Delivery to targets in EventBridge is free.

This design could similarly use Amazon SNS, which also invokes consuming Lambda functions asynchronously. This costs $0.50 per million messages and delivery to Lambda functions is free (this cost varies by Region). Depending on if you use EventBridge capabilities, SNS may be a better choice for decoupling the two Lambda functions.

Alternatively, the first Lambda function could invoke the second function by using the invoke method of the Lambda API. By using the AWS SDK for JavaScript, one Lambda function can invoke another directly from the handler code. When the InvocationType is set to ‘Event’, this invocation occurs asynchronously. That means that the calling function does not wait for the target function to finish before continuing.

This direct integration between two Lambda services is the lowest latency alternative. However, this limits the extensibility of the solution in the future without modifying code.

Managing memory, temp space, and files

You can configure the memory for a Lambda function up to 10,240 MB. However, the temporary storage available in /tmp is always 512 MB, regardless of memory. Increasing the memory allocation proportionally increases the amount of virtual CPU and network bandwidth available to the function. To learn more about how this works in detail, watch Optimizing Lambda performance for your serverless applications.

The original video files used in this workload may be several gigabytes in size. Since these may be larger than the /tmp space available, the code is designed to keep the movie file in memory. As a result, this solution works for any length of movie that can fit into the 10 GB memory limit.

The FFMPEG application expects to work with local file systems and is not designed to work with object stores like Amazon S3. It can also read video files from HTTP endpoints, so the example application loads the S3 object over HTTPS instead of downloading the file and using the /tmp space. To achieve this, the code uses the getSignedUrl method of the S3 class in the SDK:

 	// Configure S3
 	const AWS = require('aws-sdk')
 	AWS.config.update({ region: process.env.AWS_REGION })
 	const s3 = new AWS.S3({ apiVersion: '2006-03-01' }) 

 	// Get signed URL for source object
	const params = {
		Bucket: record.s3.bucket.name, 
		Key: record.s3.object.key, 
		Expires: 300
	}
	const url = s3.getSignedUrl('getObject', params)

The resulting URL contains credentials to download the S3 object over HTTPs. The Expires attributes in the parameters determines how long the credentials are valid for. The Lambda function calling this method must have appropriate IAM permissions for the target S3 bucket.

The GIF generation Lambda function stores the output GIF and JPG in the /tmp storage space. Since the function can be reused by subsequent invocations, it’s important to delete these temporary files before each invocation ends. This prevents the function from using all of the /tmp space available. This is handled by the tmpCleanup function:

const fs = require('fs')
const path = require('path')
const directory = '/tmp/'

// Deletes all files in a directory
const tmpCleanup = async () => {
    console.log('Starting tmpCleanup')
    fs.readdir(directory, (err, files) => {
        return new Promise((resolve, reject) => {
            if (err) reject(err)

            console.log('Deleting: ', files)                
            for (const file of files) {
                const fullPath = path.join(directory, file)
                fs.unlink(fullPath, err => {
                    if (err) reject (err)
                })
            }
            resolve()
        })
    })
}

When the GenerateFrames parameter is set to true in the AWS SAM template, the worker function generates one frame per second of video. For longer videos, this results in a significant number of files. Since one of the dimensions of S3 pricing is the number of PUTs, this function increases the cost of the workload when using S3.

For applications that are handling large numbers of small files, it can be more cost effective to use Amazon EFS and mount the file system to the Lambda function. EFS charges based upon data storage and throughput, instead of number of files. To learn more about using EFS with Lambda, read this Compute Blog post.

Calculating the cost of the worker Lambda function

While parallelizing Lambda functions significantly reduces the overall processing time in this case, it’s also important to calculate the cost. To process the 3-hour video example in part 1, the function uses 345 invocations with 4096 MB of memory. Each invocation has an average duration of 4,311 ms.

Using the AWS Pricing Calculator, and ignoring the AWS Free Tier allowance, the costs to process this video is approximately $0.10.

There are additional charges for other services used in the example application, such as EventBridge and S3. However, in terms of compute cost, this may compare favorably with server-based alternatives that you may have to scale manually depending on traffic. The exact cost depends upon your implementation and latency needs.

Deciding if a workload can be parallelized

The GIF generation workload is a good candidate for parallelization. This is because each 30-second block of work is independent and there is no strict ordering requirement. The end result is not impacted by the order that the GIFs are generated in. Each GIF also takes several seconds to generate, which is why the time saving comparison with the sequential, server-based approach is so significant.

Not all workloads can be parallelized and in many cases the work duration may be much shorter. This workload interacts with S3, which can scale to any level of read or write traffic created by the worker functions. You may use other downstream services that cannot scale this way, which may limit the amount of parallel processing you can use.

To learn more about designing and operating Lambda-based applications, read the Lambda Operator Guide.

Conclusion

Part 2 of this blog post expands on some of the advanced topics around scaling Lambda in parallelized workloads. It explains how the asynchronous invocation mode of Lambda scales and different ways to scale the worker Lambda function.

I cover how the example application manages memory, files, and temporary storage space. I also explain how to calculate the compute cost of using this approach, and considering if you can use parallelization in a workload.

For more serverless learning resources, visit Serverless Land.

Building well-architected serverless applications: Optimizing application costs

2021-09-07 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-optimizing-application-costs/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

COST 1. How do you optimize your serverless application costs?

Design, implement, and optimize your application to maximize value. Asynchronous design patterns and performance practices ensure efficient resource use and directly impact the value per business transaction. By optimizing your serverless application performance and its code patterns, you can directly impact the value it provides, while making more efficient use of resources.

Serverless architectures are easier to manage in terms of correct resource allocation compared to traditional architectures. Due to its pay-per-value pricing model and scale based on demand, a serverless approach effectively reduces the capacity planning effort. As covered in the operational excellence and performance pillars, optimizing your serverless application has a direct impact on the value it produces and its cost. For general serverless optimization guidance, see the AWS re:Invent talks, “Optimizing your Serverless applications” Part 1 and Part 2, and “Serverless architectural patterns and best practices”.

Required practice: Minimize external calls and function code initialization

AWS Lambda functions may call other managed services and third-party APIs. Functions may also use application dependencies that may not be suitable for ephemeral environments. Understanding and controlling what your function accesses while it runs can have a direct impact on value provided per invocation.

Review code initialization

I explain the Lambda initialization process with cold and warm starts in “Optimizing application performance – part 1”. Lambda reports the time it takes to initialize application code in Amazon CloudWatch Logs. As Lambda functions are billed by request and duration, you can use this to track costs and performance. Consider reviewing your application code and its dependencies to improve the overall execution time to maximize value.

You can take advantage of Lambda execution environment reuse to make external calls to resources and use the results for subsequent invocations. Use TTL mechanisms inside your function handler code. This ensures that you can prevent additional external calls that incur additional execution time, while preemptively fetching data that isn’t stale.

Review third-party application deployments and permissions

When using Lambda layers or applications provisioned by AWS Serverless Application Repository, be sure to understand any associated charges that these may incur. When deploying functions packaged as container images, understand the charges for storing images in Amazon Elastic Container Registry (ECR).

Ensure that your Lambda function only has access to what its application code needs. Regularly review that your function has a predicted usage pattern so you can factor in the cost of other services, such as Amazon S3 and Amazon DynamoDB.

Required practice: Optimize logging output and its retention

Considering reviewing your application logging level. Ensure that logging output and log retention are appropriately set to your operational needs to prevent unnecessary logging and data retention. This helps you have the minimum of log retention to investigate operational and performance inquiries when necessary.

Emit and capture only what is necessary to understand and operate your component as intended.

With Lambda, any standard output statements are sent to CloudWatch Logs. Capture and emit business and operational events that are necessary to help you understand your function, its integration, and its interactions. Use a logging framework and environment variables to dynamically set a logging level. When applicable, sample debugging logs for a percentage of invocations.

In the serverless airline example used in this series, the booking service Lambda functions use Lambda Powertools as a logging framework with output structured as JSON.

Lambda Powertools is added to the Lambda functions as a shared Lambda layer in the AWS Serverless Application Model (AWS SAM) template. The layer ARN is stored in Systems Manager Parameter Store.

Parameters:
  SharedLibsLayer:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Project shared libraries Lambda Layer ARN
Resources:
    ConfirmBooking:
        Type: AWS::Serverless::Function
        Properties:
            FunctionName: !Sub ServerlessAirline-ConfirmBooking-${Stage}
            Handler: confirm.lambda_handler
            CodeUri: src/confirm-booking
            Layers:
                - !Ref SharedLibsLayer
            Runtime: python3.7
…

The LOG_LEVEL and other Powertools settings are configured in the Globals section as Lambda environment variable for all functions.

Globals:
    Function:
        Environment:
            Variables:
                POWERTOOLS_SERVICE_NAME: booking
                POWERTOOLS_METRICS_NAMESPACE: ServerlessAirline
                LOG_LEVEL: INFO

For Amazon API Gateway, there are two types of logging in CloudWatch: execution logging and access logging. Execution logs contain information that you can use to identify and troubleshoot API errors. API Gateway manages the CloudWatch Logs, creating the log groups and log streams. Access logs contain details about who accessed your API and how they accessed it. You can create your own log group or choose an existing log group that could be managed by API Gateway.

Enable access logs, and selectively review the output format and request fields that might be necessary. For more information, see “Setting up CloudWatch logging for a REST API in API Gateway”.

API Gateway logging

Enable AWS AppSync logging which uses CloudWatch to monitor and debug requests. You can configure two types of logging: request-level and field-level. For more information, see “Monitoring and Logging”.

AWS AppSync logging

Define and set a log retention strategy

Define a log retention strategy to satisfy your operational and business needs. Set log expiration for each CloudWatch log group as they are kept indefinitely by default.

For example, in the booking service AWS SAM template, log groups are explicitly created for each Lambda function with a parameter specifying the retention period.

Parameters:
    LogRetentionInDays:
        Type: Number
        Default: 14
        Description: CloudWatch Logs retention period
Resources:
    ConfirmBookingLogGroup:
        Type: AWS::Logs::LogGroup
        Properties:
            LogGroupName: !Sub "/aws/lambda/${ConfirmBooking}"
            RetentionInDays: !Ref LogRetentionInDays

The Serverless Application Repository application, auto-set-log-group-retention can update the retention policy for new and existing CloudWatch log groups to the specified number of days.

For log archival, you can export CloudWatch Logs to S3 and store them in Amazon S3 Glacier for more cost-effective retention. You can use CloudWatch Log subscriptions for custom processing, analysis, or loading to other systems. Lambda extensions allows you to process, filter, and route logs directly from Lambda to a destination of your choice.

Good practice: Optimize function configuration to reduce cost

Benchmark your function using a different set of memory size

For Lambda functions, memory is the capacity unit for controlling the performance and cost of a function. You can configure the amount of memory allocated to a Lambda function, between 128 MB and 10,240 MB. The amount of memory also determines the amount of virtual CPU available to a function. Benchmark your AWS Lambda functions with differing amounts of memory allocated. Adding more memory and proportional CPU may lower the duration and reduce the cost of each invocation.

In “Optimizing application performance – part 2”, I cover using AWS Lambda Power Tuning to automate the memory testing process to balances performance and cost.

Best practice: Use cost-aware usage patterns in code

Reduce the time your function runs by reducing job-polling or task coordination. This avoids overpaying for unnecessary compute time.

Decide whether your application can fit an asynchronous pattern

Avoid scenarios where your Lambda functions wait for external activities to complete. I explain the difference between synchronous and asynchronous processing in “Optimizing application performance – part 1”. You can use asynchronous processing to aggregate queues, streams, or events for more efficient processing time per invocation. This reduces wait times and latency from requesting apps and functions.

Long polling or waiting increases the costs of Lambda functions and also reduces overall account concurrency. This can impact the ability of other functions to run.

Consider using other services such as AWS Step Functions to help reduce code and coordinate asynchronous workloads. You can build workflows using state machines with long-polling, and failure handling. Step Functions also supports direct service integrations, such as DynamoDB, without having to use Lambda functions.

In the serverless airline example used in this series, Step Functions is used to orchestrate the Booking microservice. The ProcessBooking state machine handles all the necessary steps to create bookings, including payment.

Booking service state machine

To reduce costs and improves performance with CloudWatch, create custom metrics asynchronously. You can use the Embedded Metrics Format to write logs, rather than the PutMetricsData API call. I cover using the embedded metrics format in “Understanding application health” – part 1 and part 2.

For example, once a booking is made, the logs are visible in the CloudWatch console. You can select a log stream and find the custom metric as part of the structured log entry.

Custom metric structured log entry

CloudWatch automatically creates metrics from these structured logs. You can create graphs and alarms based on them. For example, here is a graph based on a BookingSuccessful custom metric.

CloudWatch metrics custom graph

Consider asynchronous invocations and review run away functions where applicable

Take advantage of Lambda’s event-based model. Lambda functions can be triggered based on events ingested into Amazon Simple Queue Service (SQS) queues, S3 buckets, and Amazon Kinesis Data Streams. AWS manages the polling infrastructure on your behalf with no additional cost. Avoid code that polls for third-party software as a service (SaaS) providers. Rather use Amazon EventBridge to integrate with SaaS instead when possible.

Carefully consider and review recursion, and establish timeouts to prevent run away functions.

Conclusion

Design, implement, and optimize your application to maximize value. Asynchronous design patterns and performance practices ensure efficient resource use and directly impact the value per business transaction. By optimizing your serverless application performance and its code patterns, you can reduce costs while making more efficient use of resources.

In this post, I cover minimizing external calls and function code initialization. I show how to optimize logging output with the embedded metrics format, and log retention. I recap optimizing function configuration to reduce cost and highlight the benefits of asynchronous event-driven patterns.

This post wraps up the series, building well-architected serverless applications, where I cover the AWS Well-Architected Tool with the Serverless Lens . See the introduction post for links to all the blog posts.

For more serverless learning resources, visit Serverless Land.

Field Notes: Creating Custom Analytics Dashboards with FireEye Helix and Amazon QuickSight

2021-08-16 Karish Chowdhury

Post Syndicated from Karish Chowdhury original https://aws.amazon.com/blogs/architecture/field-notes-creating-custom-analytics-dashboards-with-fireeye-helix-and-amazon-quicksight/

FireEye Helix is a security operations platform that allows organizations to take control of any incident from detection to response. FireEye Helix detects security incidents by correlating logs and configuration settings from sources like VPC Flow Logs, AWS CloudTrail, and Security groups.

In this blog post, we will discuss an architecture that allows you to create custom analytics dashboards with Amazon QuickSight. These dashboards are based on the threat detection logs collected by FireEye Helix. We automate this process so that data can be pulled and ingested based on a provided schedule. This approach uses AWS Lambda, and Amazon Simple Storage Service (Amazon S3) in addition to QuickSight.

Architecture Overview

The solution outlines how to ingest the security log data from FireEye Helix to Amazon S3 and create QuickSight visualizations from the log data. With this approach, you need Amazon EventBridge to invoke a Lambda function to connect to the FireEye Helix API. There are two steps to this process:

Download the log data from FireEye Helix and store it in Amazon S3.
Create a visualization Dashboard in QuickSight.

The architecture shown in Figure 1 represents the process we will walk through in this blog post. To implement this solution, you will need the following AWS services and features involved:

Figure 1: Solution architecture

Prerequisites to implement the solution:

The following items are required to get your environment set up for this walkthrough.

AWS account.
FireEye Helix search alerts API endpoint. This is available under the API documentation in the FireEye Helix console.
FireEye Helix API key. This FireEye community page explains how to generate an API key with appropriate permissions (always follow least privilege principles). This key is used by the Lambda function to periodically fetch alerts.
AWS Secrets Manager secret (to store the FireEye Helix API key). To set it up, follow the steps outlined in the Creating a secret.

Extract the data from FireEye Helix and load it into Amazon S3

You will use the following high-level steps to retrieve the necessary security log data from FireEye Helix and store it on Amazon S3 to make it available for QuickSight.

Establish an AWS Identity and Access Management (IAM) role for the Lambda function. It must have permissions to access Secrets Manager and Amazon S3 so they can retrieve the FireEye Helix API key and store the extracted data, respectively.
Create an Amazon S3 bucket to store the FireEye Helix security log data.
Create a Lambda function that uses the API key from Secrets Manager, calls the FireEye Helix search alerts API to extract FireEye Helix’s threat detection logs, and stores the data in the S3 bucket.
Establish a CloudWatch EventBridge rule to invoke the Lambda function on an automated schedule.

To simplify this deployment, we have developed a CloudFormation template to automate creating the preceding requirements. Follow the below steps to deploy the template:

Download the source code from the GitHub repository
Navigate to CloudFormation console and select Create Stack
Select “Upload a template file” radio button, click on “Choose file” button, and select “helix-dashboard.yaml” file in the downloaded Github repository. Click “Next” to proceed.
On “Specify stack details” screen enter the parameters shown in Figure 2.

Figure 2: CloudFormation stack creation with initial parameters

The parameters in Figure 2 include:

HelixAPISecretName – Enter the Secrets Manager secret name where the FireEye Helix API key is stored.
HelixEndpointUrl – Enter the Helix search API endpoint URL.
Amazon S3 bucket – Enter the bucket prefix (a random suffix will be added to make it unique).
Schedule – Choose the default option that pulls logs once a day, or enter the CloudWatch event schedule expression.

Select the check box next to “I acknowledge that AWS CloudFormation might create IAM resources.” and then press the Create Stack button. After the CloudFormation stack completes, you will have a fully functional process that will retrieve the FireEye Helix security log data and store it on the S3 bucket.

You can also select the Lambda function from the CloudFormation stack outputs to navigate to the Lambda console. Review the following default code, and add any additional transformation logic according to your needs after fetching the results (line 32).

import boto3
from datetime import datetime
import requests
import os

region_name = os.environ['AWS_REGION']
secret_name = os.environ['APIKEY_SECRET_NAME']
bucket_name = os.environ['S3_BUCKET_NAME']
helix_api_url = os.environ['HELIX_ENDPOINT_URL']

def lambda_handler(event, context):

    now = datetime.now()
    # Create a Secrets Manager client to fetch API Key from Secrets Manager
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )
    apikey = client.get_secret_value(
            SecretId=secret_name
        )['SecretString']
    
    datestr = now.strftime("%B %d, %Y")
    apiheader = {'x-fireeye-api-key': apikey}
    
    try:
        # Call Helix Rest API to fetch the Alerts
        helixalerts = requests.get(
            f'{helix_api_url}?format=csv&query=start:"{datestr}" end:"{datestr}" class=alerts', headers=apiheader)
    
        # Optionally transform the content according to your needs..
        
        # Create a S3 client to upload the CSV file
        s3 = boto3.client('s3')
        path = now.strftime("%Y/%m/%d")+'/alerts-'+now.strftime("%H-%M-%S")+'.csv'
        response = s3.put_object(
            Body=helixalerts.content,
            Bucket=bucket_name,
            Key=path,
        )
        print('S3 upload response:', response)
    except Exception as e:
        print('error while fetching the alerts', e)
        raise e
        
    return {
        'statusCode': 200,
        'body': f'Successfully fetched alerts from Helix and uploaded to {path}'
    }

Creating a visualization in QuickSight

Once the FireEye data is ingested into an S3 bucket, you can start creating custom reports and dashboards using QuickSight. Following is a walkthrough on how to create a visualization in QuickSight based on the data that was ingested from FireEye Helix.

Step 1 – When placing the FireEye data into Amazon S3, ensure you have a clean directory structure so you can partition your data. By partitioning your data, you can restrict the amount of data scanned by each query, thus improving performance and reducing cost. The following is a sample directory structure you could use.

ssw-fireeye-logs

2021

04

05

The following is an example of what the data will look like after it is ingested from FireEye Helix into your Amazon S3 bucket. In this blog post, we will use the Alert_Desc column to report on the types of common attacks.

Step 2 – Next, you must create a manifest file that will instruct QuickSight how to read the FireEye log files on Amazon S3. The preceding example is a manifest file that instructs QuickSight to recursively search for files in the ssw-fireeye-logs bucket, and can be seen in the URIPrefixes section. The GlobalUploadSettings section informs QuickSight the type and format of files it will read.

"fileLocations": [
	{
		"URIPrefixes": [
			"s3://ssw-fireeye-logs/"
		]
	},
	],
	"globalUploadSettings": {
		"format": "CSV",
		"delimiter": ",",
		"textqalifier": "'",
		"containsHeader": "true"
	}
}

Step 3 – Open Amazon QuickSight. Use the AWS Management Console and search for QuickSight.

Step 4 – Below the QuickSight logo, find and select Datasets.

Step 5 – Push the blue New dataset button.

Step 6 – Now you are on Create a Dataset page which enables you to select a data source you would like QuickSight to ingest. Because we have stored the FireEye Helix data on S3, you should choose the S3 data source.

Step 7 – A pop-up box will appear called New S3 data source. Type a data source name, and upload the manifest file you created. Next, push the Connect button.

Step 8 – You are now directed to the Visualize screen. For this exercise let’s choose a Pie chart, you can find this in the Visual types section by hovering over each icon and reading each tool tip that comes up. Look for the tool tip that says Pie chart. After selecting the Pie Chart visual type, two entries in the Field wells section at the top of the screen will show up called Group/Color and Value. Click the drop down in Group/Color and select the Alert_Desc column. Now click the drop down in Value and also select Alter_Desc column but choose count as an aggregate. This will create an informative visualization of the most common attacks based on the sample data shown previously in Step 1.

Figure 3: Visualization screen in QuickSight

Clean up

If you created any resources in AWS for this solution, consider removing them and any example resources you deployed. This includes the FireEye Helix API keys, S3 buckets, Lambda functions, and QuickSight visualizations. This helps ensure that there are no unwanted recurring charges. For more information, review how to delete the CloudFormation stack.

Conclusion

This blog post showed you how to ingest security log data from FireEye Helix and store that data in Amazon S3. We also showed you how to create your own visualizations in QuickSight to better understand the overall security posture of your AWS environment. With this solution, you can perform deep analysis and share these insights among different audiences in your organization (such as, operations, security, and executive management).

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Building well-architected serverless applications: Building in resiliency – part 2

2021-08-10 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-building-in-resiliency-part-2/

Reliability question REL2: How do you build resiliency into your serverless application?

This post continues part 1 of this reliability question. Previously, I cover managing failures using retries, exponential backoff, and jitter. I explain how DLQs can isolate failed messages. I show how to use state machines to orchestrate long running transactions rather than handling these in application code.

Required practice: Manage duplicate and unwanted events

Duplicate events can occur when a request is retried or multiple consumers process the same message from a queue or stream. A duplicate can also happen when a request is sent twice at different time intervals with the same parameters. Design your applications to process multiple identical requests to have the same effect as making a single request.

Idempotency refers to the capacity of an application or component to identify repeated events and prevent duplicated, inconsistent, or lost data. This means that receiving the same event multiple times does not change the result beyond the first time the event was received. An idempotent application can, for example, handle multiple identical refund operations. The first refund operation is processed. Any further refund requests to the same customer with the same payment reference should not be processes again.

When using AWS Lambda, you can make your function idempotent. The function’s code must properly validate input events and identify if the events were processed before. For more information, see “How do I make my Lambda function idempotent?”

When processing streaming data, your application must anticipate and appropriately handle processing individual records multiple times. There are two primary reasons why records may be delivered more than once to your Amazon Kinesis Data Streams application: producer retries and consumer retries. For more information, see “Handling Duplicate Records”.

Generate unique attributes to manage duplicate events at the beginning of the transaction

Create, or use an existing unique identifier at the beginning of a transaction to ensure idempotency. These identifiers are also known as idempotency tokens. A number of Lambda triggers include a unique identifier as part of the event:

Amazon API Gateway: requestContext.requestId
Kinesis: Records[].eventID
Amazon CloudWatch Events: id
Amazon Simple Notification Service (SNS): Records[].Sns.MessageId
Amazon Simple Queue Service (SQS): Records[].messageId

You can also create your own identifiers. These can be business-specific, such as transaction ID, payment ID, or booking ID. You can use an opaque random alphanumeric string, unique correlation identifiers, or the hash of the content.

A Lambda function, for example can use these identifiers to check whether the event has been previously processed.

Depending on the final destination, duplicate events might write to the same record with the same content instead of generating a duplicate entry. This may therefore not require additional safeguards.

Use an external system to store unique transaction attributes and verify for duplicates

Lambda functions can use Amazon DynamoDB to store and track transactions and idempotency tokens to determine if the transaction has been handled previously. DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. This helps to limit the storage space used. Base the TTL on the event source. For example, the message retention period for SQS.

Using DynamoDB to store idempotent tokens

You can also use DynamoDB conditional writes to ensure a write operation only succeeds if an item attribute meets one of more expected conditions. For example, you can use this to fail a refund operation if a payment reference has already been refunded. This signals to the application that it is a duplicate transaction. The application can then catch this exception and return the same result to the customer as if the refund was processed successfully.

Third-party APIs can also support idempotency directly. For example, Stripe allows you to add an Idempotency-Key: <key> header to the request. Stripe saves the resulting status code and body of the first request made for any given idempotency key, regardless of whether it succeeded or failed. Subsequent requests with the same key return the same result.

Validate events using a pre-defined and agreed upon schema

Implicitly trusting data from clients, external sources, or machines could lead to malformed data being processed. Use a schema to validate your event conforms to what you are expecting. Process the event using the schema within your application code or at the event source when applicable. Events not adhering to your schema should be discarded.

For API Gateway, I cover validating incoming HTTP requests against a schema in “Implementing application workload security – part 1”.

Amazon EventBridge rules match event patterns. EventBridge provides schemas for all events that are generated by AWS services. You can create or upload custom schemas or infer schemas directly from events on an event bus. You can also generate code bindings for event schemas.

SNS supports message filtering. This allows a subscriber to receive a subset of the messages sent to the topic using a filter policy. For more information, see the documentation.

JSON Schema is a tool for validating the structure of JSON documents. There are a number of implementations available.

Best practice: Consider scaling patterns at burst rates

Load testing your serverless application allows you to monitor the performance of an application before it is deployed to production. Serverless applications can be simpler to load test, thanks to the automatic scaling built into many of the services. For more information, see “How to design Serverless Applications for massive scale”.

In addition to your baseline performance, consider evaluating how your workload handles initial burst rates. This ensures that your workload can sustain burst rates while scaling to meet possibly unexpected demand.

Perform load tests using a burst strategy with random intervals of idleness

Perform load tests using a burst of requests for a short period of time. Also introduce burst delays to allow your components to recover from unexpected load. This allows you to future-proof the workload for key events when you do not know peak traffic levels.

There are a number of AWS Marketplace and AWS Partner Network (APN) solutions available for performance testing, including Gatling FrontLine, BlazeMeter, and Apica.

In regulating inbound request rates – part 1, I cover running a performance test suite using Gatling, an open source tool.

Gatling performance results

Amazon does have a network stress testing policy that defines which high volume network tests are allowed. Tests that purposefully attempt to overwhelm the target and/or infrastructure are considered distributed denial of service (DDoS) tests and are prohibited. For more information, see “Amazon EC2 Testing Policy”.

Review service account limits with combined utilization across resources

AWS accounts have default quotas, also referred to as limits, for each AWS service. These are generally Region-specific. You can request increases for some limits while other limits cannot be increased. Service Quotas is an AWS service that helps you manage your limits for many AWS services. Along with looking up the values, you can also request a limit increase from the Service Quotas console.

Service Quotas dashboard

As these limits are shared within an account, review the combined utilization across resources including the following:

Amazon API Gateway: number of requests per second across all APIs. (link)
AWS AppSync: throttle rate limits. (link)
AWS Lambda: function concurrency reservations and pool capacity to allow other functions to scale. (link)
Amazon CloudFront: requests per second per distribution. (link)
AWS IoT Core message broker: concurrent requests per second. (link)
Amazon EventBridge: API requests and target invocations limit. (link)
Amazon Cognito: API limits. (link)
Amazon DynamoDB: throughput, indexes, and request rates limits. (link)

Evaluate key metrics to understand how workloads recover from bursts

There are a number of key Amazon CloudWatch metrics to evaluate and alert on to understand whether your workload recovers from bursts.

AWS Lambda: Duration, Errors, Throttling, ConcurrentExecutions, UnreservedConcurrentExecutions. (link)
Amazon API Gateway: Latency, IntegrationLatency, 5xxError, 4xxError. (link)
Application Load Balancer: HTTPCode_ELB_5XX_Count, RejectedConnectionCount, HTTPCode_Target_5XX_Count, UnHealthyHostCount, LambdaInternalError, LambdaUserError. (link)
AWS AppSync: 5XX, Latency. (link)
Amazon SQS: ApproximateAgeOfOldestMessage. (link)
Amazon Kinesis Data Streams: ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded, GetRecords.IteratorAgeMilliseconds, PutRecord.Success, PutRecords.Success (if using Kinesis Producer Library), GetRecords.Success. (link)
Amazon SNS: NumberOfNotificationsFailed, NumberOfNotificationsFilteredOut-InvalidAttributes. (link)
Amazon Simple Email Service (SES): Rejects, Bounces, Complaints, Rendering Failures. (link)
AWS Step Functions: ExecutionThrottled, ExecutionsFailed, ExecutionsTimedOut. (link)
Amazon EventBridge: FailedInvocations, ThrottledRules. (link)
Amazon S3: 5xxErrors, TotalRequestLatency. (link)
Amazon DynamoDB: ReadThrottleEvents, WriteThrottleEvents, SystemErrors, ThrottledRequests, UserErrors. (link)

Conclusion

This post continues from part 1 and looks at managing duplicate and unwanted events with idempotency and an event schema. I cover how to consider scaling patterns at burst rates by managing account limits and show relevant metrics to evaluate

Build resiliency into your workloads. Ensure that applications can withstand partial and intermittent failures across components that may only surface in production. In the next post in the series, I cover the performance efficiency pillar from the Well-Architected Serverless Lens.

For more serverless learning resources, visit Serverless Land.

Classifying Millions of Amazon items with Machine Learning, Part I: Event Driven Architecture

2021-08-09 Mahmoud Abid

Post Syndicated from Mahmoud Abid original https://aws.amazon.com/blogs/architecture/classifying-millions-of-amazon-items-with-machine-learning-part-i-event-driven-architecture/

As part of AWS Professional Services, we work with customers across different industries to understand their needs and supplement their teams with specialized skills and experience.

Some of our customers are internal teams from the Amazon retail organization who request our help with their initiatives. One of these teams, the Global Environmental Affairs team, identifies the number of electronic products sold. Then they classify these products according to local laws and accurately report this data to regulators. This process covers the products’ end-of-life costs and ensures a high quality of recycling.

These electronic products have classification codes that differ from country to country, and these codes change according to each country’s latest regulations. This poses a complex technical problem. How do we automate our compliance teams’ work to efficiently and accurately classify over three million product classifications every month, in more than 38 countries, while also complying with evolving classification regulations?

To solve this problem, we used Amazon Machine Learning (Amazon ML) capabilities to build a resilient architecture. It ingests and processes data, trains ML models, and predicts (also known as inference workflow) monthly sales data for all countries concurrently.

In this post, we outline how we used AWS Lambda, Amazon EventBridge, and AWS Step Functions to build a scalable and cost-effective solution. We’ll also show you how to keep the data secure while processing it in Amazon ML flows.

Solution overview

Our solution consists of three main parts, which are summarized here and detailed in the following sections:

Training the ML models
Evaluating their performance
Using them to run an inference workflow (in other words, label) the sold items with the correct classification codes

Training the Amazon ML model

For training our Amazon ML model, we use the architecture in Figure 1. It starts with a periodic query against the Amazon.com data warehouse in Amazon Redshift.

Figure 1. Training workflow

A labeled dataset containing pre-recorded classification codes is extracted from Amazon Redshift. This dataset is stored in an Amazon Simple Storage Service (Amazon S3) bucket and split up by country. The data is encrypted at rest with server-side encryption using an AWS Key Management Service (AWS KMS) key. This is also known as server-side encryption with AWS KMS (SSE-KMS). The extraction query uses the AWS KMS key to encrypt the data when storing it in the S3 bucket.
Each time a country’s dataset is uploaded to the S3 bucket, a message is sent to an Amazon Simple Queue Service (Amazon SQS) queue. This prompts a Lambda function. We use Amazon SQS to ensure resiliency. If the Lambda function fails, the message will be tried again automatically. Overall, the message is either processed successfully, or ends up in a dead letter queue that we monitor (not displayed in Figure 1).
If the message is processed successfully, the Lambda function generates necessary input parameters. Then it starts a Step Functions workflow execution for the training process.
The training process involves orchestrating Amazon SageMaker Processing jobs to prepare the data. Once the data is prepared, a hyperparameter optimization job invokes multiple training jobs. These run in parallel with different values from a range of hyperparameters. The model that performs the best is chosen to move forward.
After the model is trained successfully, an EventBridge event is prompted, which will be used to invoke the performance comparison process.

Comparing performance of Amazon ML models

Because Amazon ML models are automatically trained periodically, we want to assess their performance automatically too. Newly created models should perform better than their predecessors. To measure this, we use the flow in Figure 2.

Figure 2. Model performance comparison workflow

The flow is activated by the EventBridge event at the end of the training flow.
A Lambda function gathers the necessary input parameters and uses them to start an inference workflow, implemented as a Step Function.
The inference workflow use SageMaker Processing jobs to prepare a new test dataset. It performs predictions using SageMaker Batch Transform jobs with the new model. The test dataset is a labeled subset that was not used in model training. Its prediction gives an unbiased estimation of the model’s performance, proving that the model can generalize.
After the inference workflow is completed and the results are stored on Amazon S3, an EventBridge event is performed, which prompts another Lambda function. This function runs the performance comparison Step Function.
The performance comparison workflow uses a SageMaker Processing job to analyze the inference results and calculate its performance score based on ground truth. For each country, the job compares the performance of the new model with the performance of the last used model to determine which one was best, otherwise known as the “winner model.” The metadata of the winner model is saved in an Amazon DynamoDB table so it can be queried and used in the next production inference job.
At the end of the performance comparison flow, an informational notification is sent to an Amazon Simple Notification Service (Amazon SNS) topic, which will be received by the MLOps team.

Running inference

The inference flow starts with a periodic query against the Amazon.com data warehouse in Amazon Redshift, as shown in Figure 3.

Figure 3. Inference workflow

As with training, the dataset is extracted from Amazon Redshift, split up by country, and stored in an S3 bucket and encrypted at rest using the AWS KMS key.
Every country dataset upload prompts a message to an SQS queue, which invokes a Lambda function.
The Lambda function gathers necessary input parameters and starts a workflow execution for the inference process. This is the same Step Function we used in the performance comparison. Now it runs against the real dataset instead of the test set.
The inference Step Function orchestrates the data preparation and prediction using the winner model for each country, as stored in the model performance DynamoDB table. The predictions are uploaded back to the S3 bucket to be further consumed for reporting.
Lastly, an Amazon SNS message is sent to signal completion of the inference flow, which will be received by different stakeholders.

Data encryption

One of the key requirements of this solution was to provide least privilege access to all data. To achieve this, we use AWS KMS to encrypt all data as follows:

We configured SSE-KMS as the default behavior on the S3 bucket.
We restricted the AWS KMS key policy to allow only a set of predefined AWS Identity and Access Management (IAM) roles to use it so no other role or user could use it to decrypt the data.
We set the S3 bucket policy to deny PutObject requests that do not explicitly specify the key in the request header, as shown in Figure 4.

Figure 4. Restriction of data decryption permissions

Conclusion

In this post, we outline how we used a serverless architecture to handle the end-to-end flow of data extraction, processing, and storage. We also talk about how we use this data for model training and inference.

With this solution, our customer team onboarded 38 countries and brought 60 Amazon ML models to production to classify 3.3 million items on a monthly basis.

In the next post, we show you how we use AWS Developer Tools to build a comprehensive continuous integration/continuous delivery (CI/CD) pipeline that safeguards the code behind this solution.

Building well-architected serverless applications: Building in resiliency – part 1

2021-08-03 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-building-in-resiliency-part-1/

Reliability question REL2: How do you build resiliency into your serverless application?

Evaluate scaling mechanisms for serverless and non-serverless resources to meet customer demand. Build resiliency into your workload to make your serverless application resilient to withstand partial and intermittent failures across components that may only surface in production.

Required practice: Manage transaction, partial, and intermittent failures

Whenever one service or system calls another, there is a chance that failures can happen. Services or systems often don’t fail as a single unit, but rather suffer partial or transient failures. Applications should be designed to handle component failures as part of the architecture. The system should be designed to detect failure and, ideally, automatically heal itself.

Transaction failures can occur when a component is unavailable or under high load. Partial failures can occur when a percentage of requests succeeds, including during batch processing. Intermittent failures might occur when a request fails for a short period of time due to network or other transient issues.

AWS serverless services, including AWS Lambda, are fault-tolerant and designed to handle failures. If a service invokes a Lambda function and there is a service disruption, Lambda invokes the function in a different Availability Zone.

When you invoke a function directly, you determine the strategy for handling errors. You can retry, send the event to a destination or queue for debugging, or ignore the error. Clients such as the AWS Command Line Interface (CLI) and the AWS SDK retry on client timeouts, throttling errors (429), and other errors that are not caused by a bad request.

When you invoke a function indirectly, you must be aware of the retry behavior of the invoker and any service that the request encounters along the way. For more information, see “Error handling and automatic retries in AWS Lambda”. You can configure Maximum Retry Attempts and Maximum Event Age for asynchronous invocations.

When reading from Amazon Kinesis Data Streams and Amazon DynamoDB Streams, Lambda retries the entire batch of items. Retries continue until the records expire or exceed the maximum age that you configure on the event source mapping. You can also configure the event source mapping to split a failed batch into two batches. Retrying with smaller batches isolates bad records and works around timeout issues.

Partial failures can occur in non-atomic operations. PutRecords for Kinesis and BatchWriteItem for DynamoDB return a successful response if at least one record is ingested successfully. Always inspect the response when using such operations and programmatically deal with partial failures.

Use exponential backoff with jitter

The simplest technique for dealing with failures in a networked environment is to retry calls until they succeed. This technique increases the reliability of the application and reduces operational costs for the developer.

However, it is not always safe to retry. A retry can further increase the load on the system being called if the system is already failing due to an overload. To avoid this problem, use backoff. Instead of retrying immediately and aggressively, the client waits some amount of time between tries. The most common pattern is an exponential backoff, which uses exponentially longer wait times between retries. This is typically capped to a maximum delay and number of retries.

If all backoff retries are still happening at the same time, this can still overload a system or cause contention. To avoid this problem, use jitter. Jitter adds some amount of randomness to the backoff to spread the retries around in time. This can help prevent large bursts by spreading out the rate when clients connect. For more information see the Amazon Builders’ Library article “Timeouts, retries, and backoff with jitter” and AWS Architecture blog post “Exponential Backoff And Jitter”.

Exponential backoff and jitter

When your application responds to callers in fail-fast scenarios and when performance is degraded, inform the caller via headers or metadata when they can retry.

Each AWS SDK implements automatic retry logic including exponential backoff. For downstream calls, you can adjust AWS and third-party SDK retries, backoffs, TCP, and HTTP timeouts. This helps you decide when to stop retrying. For more information, see the documentation and troubleshooting steps for Lambda and the AWS SDK.

Use a dead-letter queue mechanism to retain, investigate and retry failed transactions

There are a number of ways to handle message failures including destinations and dead-letter queues.

You can configure Lambda to send records of asynchronous invocations to another destination service. These include Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), Lambda, and Amazon EventBridge. You can configure separate destinations for events that fail processing and events that are successfully processed. The invocation record contains details about the event, the response, and the reason that the record was sent.

The following example shows a function that sends a record of a successful invocation to an EventBridge event bus. When an event fails all processing attempts, Lambda sends an invocation record to an SQS queue. It includes the function’s response in the invocation record.

AWS Lambda destinations for asynchronous invocation

SNS, SQS, Lambda, and EventBridge support dead-letter queues (DLQs). DLQs make your applications more resilient and durable by storing messages or events that can’t be processed correctly into a dedicated SQS queue. This helps you debug your application by isolating the problematic messages to determine why their processing failed. One you have resolved the issue, re-process the failed message. For more information, see “When should I use a dead-letter queue?” There is an example serverless application to redrive the messages from an SQS DLQ back to its source SQS queue.

For Lambda, DLQs provide an alternative to a failure destination. Lambda destinations is preferable for asynchronous invocations.

Good practice: Orchestrate long-running transactions

Long-running transactions can be processed by one or multiple components. Consider implementing the saga pattern using state machines for these types of transactions.

The saga pattern coordinates transactions between multiple microservices as part of a state machine. Each service that performs a transaction publishes an event to trigger the next transaction in the saga. This continues until the transaction chain is complete. If a transaction fails, saga orchestrates a series of compensating transactions that undo the changes that were made by the preceding transactions.

This is preferable to handling complex or long-running transactions within application code. State machines prevent cascading failures and avoid tightly coupling components with orchestrating logic and business logic.

Use a state machine to visualize distributed transactions, and to separate business logic from orchestration logic.

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows via state machines. Within Step Functions, you can set separate retries, backoff rates, max attempts, intervals, and timeouts. These are set for every step of your state machine using a declarative language.

Booking service Step Functions state machine

The state machine uses a combination of service integrations using DynamoDB, SQS, and Lambda functions to coordinate transactions and handle failures.

For example, the Reserve Booking task invokes a Lambda function. The task has retry and error handling configured as part of the task definition.

"Reserve Booking": {
	"Type": "Task",
	"Resource": "${ReserveBooking.Arn}",
	"TimeoutSeconds": 5,
	"Retry": [
		{
			"ErrorEquals": [
				"BookingReservationException"
			],
			"IntervalSeconds": 1,
			"BackoffRate": 2,
			"MaxAttempts": 2
		}
	],
	"Catch": [
		{
			"ErrorEquals": [
				"States.ALL"
			],
			"ResultPath": "$.bookingError",
			"Next": "Cancel Booking"
		}
	],
	"ResultPath": "$.bookingId",
	"Next": "Collect Payment"
},

Step Functions supports direct service integrations, including DynamoDB. The Reserve Flight task directly updates the flightTable without requiring a Lambda function.

"Reserve Flight": {
	"Type": "Task",
	"Resource": "arn:aws:states:::dynamodb:updateItem",
	"Parameters": {
		"TableName.$": "$.flightTable",
		"Key": {
			"id": {
				"S.$": "$.outboundFlightId"
			}
		},
		"UpdateExpression": "SET seatCapacity = seatCapacity - :dec",
		"ExpressionAttributeValues": {
			":dec": {
				"N": "1"
			},
			":noSeat": {
				"N": "0"
			}
		},
		"ConditionExpression": "seatCapacity > :noSeat"
	},

By default, when a state reports an error, Step Functions causes the execution to fail entirely.

Utilize dead-letter queues in response to failed state machine executions

Any state within the Step Functions workflow can encounter runtime errors. These include state machine definition issues, task failures such as Lambda function exceptions, or transient issues such as network connectivity issues. For more information, see “Error handling in Step Functions”.

Use the Step Functions service integration with SQS to send failed transactions to a DLQ as the final step. This adds a higher level of durability within your state machines.

For example, the airline Notify Failed Booking final task catches failed states from four previous steps. It sends the results to the Booking DLQ.

Booking service Step Functions DLQ

The message includes the output of the previous failed states for further troubleshooting.

"Booking DLQ": {
	"Type": "Task",
	"Resource": "arn:aws:states:::sqs:sendMessage",
	"Parameters": {
		"QueueUrl": "${BookingsDLQ}",
		"MessageBody.$": "$"
	},
	"ResultPath": "$.deadLetterQueue",
	"Next": "Booking Failed"
},

The Step Functions documentation has more information on calling SQS.

Conclusion

Build resiliency into your workloads. This makes sure that your application can withstand partial and intermittent failures across components that may only surface in production.

In this post, I cover managing failures using retries, exponential backoff, and jitter. I explain how DLQs can isolate failed messages. I show how to use state machines to orchestrate long running transactions rather than handling these in application code.

This well-architected question continues in part 2 where I look at managing duplicate and unwanted events with idempotency and an event schema. I cover how to consider scaling patterns at burst rates by managing account limits and show relevant metrics to evaluate.

For more serverless learning resources, visit Serverless Land.

Building a serverless multiplayer game that scales: Part 2

2021-07-29 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-multiplayer-game-that-scales-part-2/

This post is written by Vito De Giosa, Sr. Solutions Architect and Tim Bruce, Sr. Solutions Architect, Developer Acceleration.

This series discusses solutions for scaling serverless games, using the Simple Trivia Service, a game that relies on user-generated content. Part 1 describes the overall architecture, how to deploy to your AWS account, and different communications methods.

This post discusses how to scale via automation and asynchronous processes. You can use automation to minimize the need to scale personnel to review player-generated content for acceptability. It also introduces asynchronous processing, which allows you to run non-critical processes in the background and batch data together. This helps to improve resource usage and game performance. Both scaling techniques can also reduce overall spend.

To set up the example, see the instructions in the GitHub repo and the README.md file. This example uses services beyond the AWS Free Tier and incurs charges. Instructions to remove the example application from your account are also in the README.md file.

Technical implementation

Games require a mechanism to support auto-moderated avatars. Specifically, this is an upload process to allow the player to send the content to the game. There is a content moderation process to remove unacceptable content and a messaging process to provide players with a status regarding their content.

Here is the architecture for this feature in Simple Trivia Service, which is combined within the avatar workflow:

This architecture processes images uploaded to Amazon S3 and notifies the user of the processing result via HTTP WebPush. This solution uses AWS Serverless services and the Amazon Rekognition moderation API.

Uploading avatars

Players start the process by uploading avatars via the game client. Using presigned URLs, the client allows players to upload images directly to S3 without sharing AWS credentials or exposing the bucket publicly.

The URL embeds all the parameters of the S3 request. It includes a SignatureV4 generated with AWS credentials from the backend allowing S3 to authorize the request.

The front end retrieves the presigned URL invoking an AWS Lambda function through an Amazon API Gateway HTTP API endpoint.
The front end uses the URL to send a PUT request to S3 with the image.

Processing avatars

After the upload completes, the backend performs a set of activities. These include content moderation, generating the thumbnail variant, and saving the image URL to the player profile. AWS Step Functions orchestrates the workflow by coordinating tasks and integrating with AWS services, such as Lambda and Amazon DynamoDB. Step Functions enables creating workflows without writing code and handles errors, retries, and state management. This enables traffic control to avoid overloading single components when traffic surges.

The avatar processing workflow runs asynchronously. This allows players to play the game without being blocked and enables you to batch the requests. The Step Functions workflow is triggered from an Amazon EventBridge event. When the user uploads an image to S3, an event is published to EventBridge. The event is routed to the avatar processing Step Functions workflow.

The single avatar feature runs in seconds and uses Step Functions Express Workflows, which are ideal for high-volume event-processing use cases. Step Functions can also support longer running processes and manual steps, depending on your requirements.

To keep performance at scale, the solution adopts four strategies. First, it moderates content automatically, requiring no human intervention. This is done via Amazon Rekognition moderation API, which can discover inappropriate content in uploaded avatars. Developers do not need machine learning expertise to use this API. If it identifies unacceptable content, the Step Functions workflow deletes the uploaded picture.

Second, it uses avatar thumbnails on the top navigation bar and on leaderboards. This speeds up page loading and uses less network bandwidth. Image-editing software runs in a Lambda function to modify the uploaded file and store the result in S3 with the original.

Third, it uses Amazon CloudFront as a content delivery network (CDN) with the S3 bucket hosting images. This improves performance by implementing caching and serving static content from locations closer to the player. Additionally, using CloudFront allows you to keep the bucket private and provide greater security for the content stored within S3.

Finally, it stores profile picture URLs in DynamoDB and replicates the thumbnail URL in an Amazon Cognito user attribute named picture. This allows the game to retrieve the avatar URL as part of the login process, saving an HTTP GET request for the player profile.

The last step of the workflow publishes the result via an event to EventBridge for downstream systems to consume. The service routes the event to the notification component to inform the player about the moderation status.

Notifying users of the processing result

The result of the avatar workflow to the player is important but not urgent. Players want to know the result but not impact their gameplay experience. A solution for this challenge is to use HTTP web push. It uses the HTTP protocol and does not require a constant communication channel between backend and front end. This allows players to play games without being blocked or by introducing latency to the game communications channel.

Applications requiring low latency fully bidirectional communication, such as highly interactive multi-player games, typically use WebSockets. This creates a persistent two-way channel for front end and backend to exchange information. The web push mechanism can provide non-urgent data and messages to the player without interrupting the WebSockets channel.

The web push protocol describes how to use a consolidated push service as a broker between the web-client and the backend. It accepts subscriptions from the client and receives push message delivery requests from the backend. Each browser vendor provides a push service implementation that is compliant with the W3C Push API specification and is external to both client and backend.

The web client is typically a browser where a JavaScript application interacts with the push service to subscribe and listen for incoming notifications. The backend is the application that notifies the front end. Here is an overview of the protocol with all the parties involved.

A component on the client subscribes to the configured push service by sending an HTTP POST request. The client keeps a background connection waiting for messages.
The push service returns a URL identifying a push resource that the client distributes to backend applications that are allowed to send notifications.
Backend applications request a message delivery by sending an HTTP POST request to the previously distributed URL.
The push service forwards the information to the client.

This approach has four advantages. First, it reduces the effort to manage the reliability of the delivery process by off-loading it to an external and standardized component. Second, it minimizes cost and resource consumption. This is because it doesn’t require the backend to keep a persistent communication channel or compute resources to be constantly available. Third, it keeps complexity to a minimum because it relies on HTTP only without requiring additional technologies. Finally, HTTP web push addresses concepts such as message urgency and time-to-live (TTL) by using a standard.

Serverless HTTP web push

The implementation of the web push protocol requires the following components, per the Push API specification. First, the front end is required to create a push subscription. This is implemented through a service worker, a script running in the origin of the application. The service worker exposes operations to access the push service either creating subscriptions or listening for push events.

The client uses the service worker to subscribe to the push service via the Push API.
The push service responds with a payload including a URL, which is the client’s push endpoint. The URL is used to create notification delivery requests.
The browser enriches the subscription with public cryptographic keys, which are used to encrypt messages ensuring confidentiality.
The backend must receive and store the subscription for when a delivery request is made to the push service. This is provided by API Gateway, Lambda, and DynamoDB. API Gateway exposes an HTTP API endpoint that accepts POST requests with the push service subscription as payload. The payload is stored in DynamoDB alongside the player identifier.

This front end code implements the process:

//Once service worker is ready
navigator.serviceWorker.ready
  .then(function (registration) {
    //Retrieve existing subscription or subscribe
    return registration.pushManager.getSubscription()
      .then(async function (subscription) {
        if (subscription) {
          console.log('got subscription!', subscription)
          return subscription;
        }
        /*
         * Using Public key of our backend to make sure only our
         * application backend can send notifications to the returned
         * endpoint
         */
        const convertedVapidKey = self.vapidKey;
        return registration.pushManager.subscribe({
          userVisibleOnly: true,
          applicationServerKey: convertedVapidKey
        });
      });
  }).then(function (subscription) {
    //Distributing the subscription to the application backend
    console.log('register!', subscription);
    const body = JSON.stringify(subscription);
    const parms = {jwt: jwt, playerName: playerName, subscription: body};
    //Call to the API endpoint to save the subscription
    const res = DataService.postPlayerSubscription(parms);
    console.log(res);
  });

Next, the backend reacts to the avatar workflow completed custom event to create a delivery request. This is accomplished with EventBridge and Lambda.

EventBridge routes the event to a Lambda function.
The function retrieves the player’s agent subscriptions, including push endpoint and encryption keys, from DynamoDB.
The function sends an HTTP POST to the push endpoint with the encrypted message as payload.
When the push service delivers the message, the browser activates the service worker updating local state and displaying the notification.

The push service allows creating delivery requests based on the knowledge of the endpoint and the front end allows the backend to deliver messages by distributing the endpoint. HTTPS provides encryption for data in transit while DynamoDB encrypts all your data at rest to provide confidentiality and security for the endpoint.

Security of WebPush can be further improved by using Voluntary Application Server Identification (VAPID). With WebPush, the clients authenticate messages at delivery time. VAPID allows the push service to perform message authentication on behalf of the web client avoiding denial-of-service risk. Without the additional security of VAPID, any application knowing the push service endpoint might successfully create delivery requests with an invalid payload. This can cause the player’s agent to accept messages from unauthorized services and, possibly, cause a denial-of-service to the client by overloading its capabilities.

VAPID requires backend applications to own a key pair. In Simple Trivia Service, a Lambda function, which is an AWS CloudFormation custom resource, generates the key pair when deploying the stack. It securely saves values in AWS System Manager (SSM) Parameter Store.

Here is a representation of VAPID in action:

The front end specifies which backend the push service can accept messages from. It does this by including the public key from VAPID in the subscription request.
When requesting a message delivery, the backend self-identifies by including the public key and a token signed with the private key in the HTTP Authorization header. If the keys match and the client uses the public key at subscription, the message is sent. If not, the message is blocked by the push service.

The Lambda function that sends delivery requests to the push service reads the key values from SSM. It uses them to generate the Authorization header to include in the request, allowing for successful delivery to the client endpoint.

Conclusion

This post shows how you can add scaling support for a game via automation. The example uses Amazon Rekognition to check images for unacceptable content and uses asynchronous architecture patterns with Step Functions and HTTP WebPush. These scaling approaches can help you to maximize your technical and personnel investments.

For more serverless learning resources, visit Serverless Land.

Using Amazon Macie to Validate S3 Bucket Data Classification

2021-07-16 Bill Magee

Post Syndicated from Bill Magee original https://aws.amazon.com/blogs/architecture/using-amazon-macie-to-validate-s3-bucket-data-classification/

Securing sensitive information is a high priority for organizations for many reasons. At the same time, organizations are looking for ways to empower development teams to stay agile and innovative. Centralized security teams strive to create systems that align to the needs of the development teams, rather than mandating how those teams must operate.

Security teams who create automation for the discovery of sensitive data have some issues to consider. If development teams are able to self-provision data storage, how does the security team protect that data? If teams have a business need to store sensitive data, they must consider how, where, and with what safeguards that data is stored.

Let’s look at how we can set up Amazon Macie to validate data classifications provided by decentralized software development teams. Macie is a fully managed service that uses machine learning (ML) to discover sensitive data in AWS. If you are not familiar with Macie, read New – Enhanced Amazon Macie Now Available with Substantially Reduced Pricing.

Data classification is part of the security pillar of a Well-Architected application. Following the guidelines provided in the AWS Well-Architected Framework, we can develop a resource-tagging scheme that fits our needs.

Overview of decentralized data validation system

In our example, we have multiple levels of data classification that represent different levels of risk associated with each classification. When a software development team creates a new Amazon Simple Storage Service (S3) bucket, they are responsible for labeling that bucket with a tag. This tag represents the classification of data stored in that bucket. The security team must maintain a system to validate that the data in those buckets meets the classification specified by the development teams.

This separation of roles and responsibilities for development and security teams who work independently requires a validation system that’s decoupled from S3 bucket creation. It should automatically detect new buckets or data in the existing buckets, and validate the data against the assigned classification tags. It should also notify the appropriate development teams of misclassified or unclassified buckets in a timely manner. These notifications can be through standard notification channels, such as email or Slack channel notifications.

Validation and alerts with AWS services

Figure 1. Validation system for data classification

We assume that teams are permitted to create S3 buckets and we will use AWS Config to enforce the following required tags: DataClassification and SupportSNSTopic. The DataClassification tag indicates what type of data is allowed in the bucket. The SupportSNSTopic tag indicates an Amazon Simple Notification Service (SNS) topic. If there are issues found with the data in the bucket, a message is published to the topic, and Amazon SNS will deliver an alert. For example, if there is personally identifiable information (PII) data in a bucket that is classified as non-sensitive, the system will alert the owners of the bucket.

Macie is configured to scan all S3 buckets on a scheduled basis. This configuration ensures that any new bucket and data placed in the buckets is analyzed the next time the Macie job runs.

Macie provides several managed data identifiers for discovering and classifying the data. These include bank account numbers, credit card information, authentication credentials, PII, and more. You can also create custom identifiers (or rules) to gather information not covered by the managed identifiers.

Macie integrates with Amazon EventBridge to allow us to capture data classification events and route them to one or more destinations for reporting and alerting needs. In our configuration, the event initiates an AWS Lambda. The Lambda function is used to validate the data classification inferred by Macie against the classification specified in the DataClassification tag using custom business logic. If a data classification violation is found, the Lambda then sends a message to the Amazon SNS topic specified in the SupportSNSTopic tag.

The Lambda function also creates custom metrics and sends those to Amazon CloudWatch. The metrics are organized by engineering team and severity. This allows the security team to create a dashboard of metrics based on the Macie findings. The findings can also be filtered per engineering team and severity to determine which teams need to be contacted to ensure remediation.

Conclusion

This solution provides a centralized security team with the tools it needs. The team can validate the data classification of an Amazon S3 bucket that is self-provisioned by a development team. New Amazon S3 buckets are automatically included in the Macie jobs and alerts. These are only sent out if the data in the bucket does not conform to the classification specified by the development team. The data auditing process is loosely coupled with the Amazon S3 Bucket creation process, enabling self-service capabilities for development teams, while ensuring proper data classification. Your teams can stay agile and innovative, while maintaining a strong security posture.

Learn more about Amazon Macie and Data Classification.

Build a serverless event-driven workflow with AWS Glue and Amazon EventBridge

2021-07-15 Noritaka Sekiyama

Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/build-a-serverless-event-driven-workflow-with-aws-glue-and-amazon-eventbridge/

Customers are adopting event-driven-architectures to improve the agility and resiliency of their applications. As a result, data engineers are increasingly looking for simple-to-use yet powerful and feature-rich data processing tools to build pipelines that enrich data, move data in and out of their data lake and data warehouse, and analyze data. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.

Data integration jobs have varying degrees of priority and time sensitivity. For example, you can use batch processing to process weekly sales data but in some cases, data needs to be processed immediately. Fraud detection applications, for example, require near-real-time processing of security logs. Or if a partner uploads product information to your Amazon Simple Storage Service (Amazon S3) bucket, it needs to be processed right away to ensure that your website has the latest product information.

This post discusses how to configure AWS Glue workflows to run based on real-time events. You no longer need to set schedules or build complex solutions to trigger jobs based on events; AWS Glue event-driven workflows manage it all for you.

Get started with AWS Glue event-driven workflows

As a business requirement, most companies need to hydrate their data lake and data warehouse with data in near-real time. They run their pipelines on a schedule (hourly, daily, or even weekly) or trigger the pipeline through an external system. It’s difficult to predict the frequency at which upstream systems generate data, which makes it difficult to plan and schedule ETL pipelines to run efficiently. Scheduling ETL pipelines to run too frequently can be expensive, whereas scheduling pipelines to run infrequently can lead to making decisions based on stale data. Similarly, triggering pipelines from an external process can increase complexity, cost, and job startup time.

AWS Glue now supports event-driven workflows, a capability that lets developers start AWS Glue workflows based on events delivered by Amazon EventBridge. With this new feature, you can trigger a data integration workflow from any events from AWS services, software as a service (SaaS) providers, and any custom applications. For example, you can react to an S3 event generated when new buckets are created and when new files are uploaded to a specific S3 location. In addition, if your environment generates many events, AWS Glue allows you to batch them either by time duration or by the number of events. Event-driven workflows make it easy to start an AWS Glue workflow based on real-time events.

To get started, you simply create a new AWS Glue trigger of type EVENT and place it as the first trigger in your workflow. You can optionally specify a batching condition. Without event batching, the AWS Glue workflow is triggered every time an EventBridge rule matches which may result in multiple concurrent workflow runs. In some environments, starting many concurrent workflow runs could lead to throttling, reaching service quota limits, and potential cost overruns. This can also result in workflow execution failures in case the concurrency limit specified on the workflow and the jobs within the workflow do not match. Event batching allows you to configure the number of events to buffer or the maximum elapsed time before firing the particular trigger. Once the batching condition is met, a workflow run is started. For example, you can trigger your workflow when 100 files are uploaded in S3 or 5 minutes after the first upload. We recommend configuring event batching to avoid too many concurrent workflow runs, and optimize resource usage and cost.

Overview of the solution

In this post, we walk through a solution to set up an AWS Glue workflow that listens to S3 PutObject data events captured by AWS CloudTrail. This workflow is configured to run when five new files are added or the batching window time of 900 seconds expires after first file is added. The following diagram illustrates the architecture.

The steps in this solution are as follows:

Create an AWS Glue workflow with a starting trigger of EVENT type and configure the batch size on the trigger to be five and batch window to be 900 seconds.
Configure Amazon S3 to log data events, such as PutObject API calls to CloudTrail.
Create a rule in EventBridge to forward the PutObject API events to AWS Glue when they are emitted by CloudTrail.
Add an AWS Glue event-driven workflow as a target to the EventBridge rule.
To start the workflow, upload files to the S3 bucket. Remember you need to have at least five files before the workflow is triggered.

Deploy the solution with AWS CloudFormation

For a quick start of this solution, you can deploy the provided AWS CloudFormation stack. This creates all the required resources in your account.

The CloudFormation template generates the following resources:

S3 bucket – This is used to store data, CloudTrail logs, job scripts, and any temporary files generated during the AWS Glue ETL job run.
CloudTrail trail with S3 data events enabled – This enables EventBridge to receive PutObject API call data on specific bucket.
AWS Glue workflow – A data processing pipeline that is comprised of a crawler, jobs, and triggers. This workflow converts uploaded data files into Apache Parquet format.
AWS Glue database – The AWS Glue Data Catalog database that is used to hold the tables created in this walkthrough.
AWS Glue table – The Data Catalog table representing the Parquet files being converted by the workflow.
AWS Lambda function – This is used as an AWS CloudFormation custom resource to copy job scripts from an AWS Glue-managed GitHub repository and an AWS Big Data blog S3 bucket to your S3 bucket.
IAM roles and policies – We use the following AWS Identity and Access Management (IAM) roles:
- LambdaExecutionRole – Runs the Lambda function that has permission to upload the job scripts to the S3 bucket.
- GlueServiceRole – Runs the AWS Glue job that has permission to download the script, read data from the source, and write data to the destination after conversion.
- EventBridgeGlueExecutionRole – Has permissions to invoke the NotifyEvent API for an AWS Glue workflow.

To launch the CloudFormation stack, complete the following steps:

Sign in to the AWS CloudFormation console.
Choose Launch Stack:

Choose Next.
For S3BucketName, enter the unique name of your new S3 bucket.
For WorkflowName, DatabaseName, and TableName, leave as the default.
Choose Next.

On the next page, choose Next.
Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
Choose Create.

It takes a few minutes for the stack creation to complete; you can follow the progress on the Events tab.

By default, the workflow runs whenever a single file is uploaded to the S3 bucket, resulting in a PutObject API call. In the next section, we configure the event batching to change this behavior.

Review the AWS Glue trigger and add event batching conditions

The CloudFormation template provisioned an AWS Glue workflow including a crawler, jobs, and triggers. The first trigger in the workflow is configured as an event-based trigger. Next, we update this trigger to batch five events or wait for 900 seconds after the first event before it starts the workflow.

Before we make any changes, let’s review the trigger on the AWS Glue console:

On the AWS Glue console, under ETL, choose Triggers.
Choose <Workflow-name>_pre_job_trigger.
Choose Edit.

We can see the trigger’s type is set to EventBridge event, which means it’s an event-based trigger. Let’s change the event batching condition to run the workflow after five files are uploaded to Amazon S3.

For Number of events, enter 5.
For Time delay (sec), enter 900.
Choose Next.

On the next screen, under Choose jobs to trigger, leave as the default and choose Next.
Choose Finish.

Review the EventBridge rule

The CloudFormation template created an EventBridge rule to forward S3 PutObject API events to AWS Glue. Let’s review the configuration of the EventBridge rule:

On the EventBridge console, under Events, choose Rules.
Choose s3_file_upload_trigger_rule-<CloudFormation-stack-name>.
Review the information in the Event pattern section.

The event pattern shows that this rule is triggered when an S3 object is uploaded to s3://<bucket_name>/data/products_raw/. CloudTrail captures the PutObject API calls made and relays them as events to EventBridge.

In the Targets section, you can verify that this EventBridge rule is configured with an AWS Glue workflow as a target.

Trigger the AWS Glue workflow by uploading files to Amazon S3

To test your workflow, we upload files to Amazon S3 using the AWS Command Line Interface (AWS CLI). If you don’t have the AWS CLI, see Installing, updating, and uninstalling the AWS CLI.

Let’s upload some small files to your S3 bucket.

Run the following command to upload the first file to your S3 bucket:

$ echo '{"product_id": "00001", "product_name": "Television", "created_at": "2021-06-01"}' > product_00001.json
$ aws s3 cp product_00001.json s3://<bucket-name>/data/products_raw/

Run the following command to upload the second file:

$ echo '{"product_id": "00002", "product_name": "USB charger", "created_at": "2021-06-02"}' > product_00002.json
$ aws s3 cp product_00002.json s3://<bucket-name>/data/products_raw/

Run the following command to upload the third file:

$ echo '{"product_id": "00003", "product_name": "USB charger", "created_at": "2021-06-03"}' &gt; product_00003.json<br />
$ aws s3 cp product_00003.json s3://<bucket-name>/data/products_raw/

Run the following command to upload the fourth file:

$ echo '{"product_id": "00004", "product_name": "USB charger", "created_at": "2021-06-04"}' &gt; product_00004.json<br />
$ aws s3 cp product_00004.json s3://<bucket-name>/data/products_raw/

These events didn’t trigger the workflow because it didn’t meet the batch condition of five events.

Run the following command to upload the fifth file:

$ echo '{"product_id": "00005", "product_name": "USB charger", "created_at": "2021-06-05"}' > product_00005.json
$ aws s3 cp product_00005.json s3://<bucket-name>/data/products_raw/

Now the five JSON files have been uploaded to Amazon S3.

Verify the AWS Glue workflow is triggered successfully

Now the workflow should be triggered. Open the AWS Glue console to validate that your workflow is in the RUNNING state.

To view the run details, complete the following steps:

On the History tab of the workflow, choose the current or most recent workflow run.
Choose View run details.

When the workflow run status changes to Completed, let’s see the converted files in your S3 bucket.

Switch to the Amazon S3 console, and navigate to your bucket.

You can see the Parquet files under s3://<bucket-name>/data/products/.

Congratulations! Your workflow ran successfully based on S3 events triggered by uploading files to your bucket. You can verify everything works as expected by running a query against the generated table using Amazon Athena.

Verify the metrics for the EventBridge rule

Optionally, you can use Amazon CloudWatch metrics to validate the events were sent to the AWS Glue workflow.

On the EventBridge console, in the navigation pane, choose Rules.
Select your EventBridge rule s3_file_upload_trigger_rule-<Workflow-name> and choose Metrics for the rule.

When the target workflow is invoked by the rule, the metrics Invocations and TriggeredRules are published.

The metric FailedInvocations is published if the EventBridge rule is unable to trigger the AWS Glue workflow. In that case, we recommend you check the following configurations:

Verify the IAM role provided to the EventBridge rule allows the glue:NotifyEvent permission on the AWS Glue workflow.
Verify the trust relationship on the IAM role provides the events.amazonaws.com service principal the ability to assume the role.
Verify the starting trigger on your target AWS Glue workflow is an event-based trigger.

Clean up

Now to the final step, cleaning up the resources. Delete the CloudFormation stack to remove any resources you created as part of this walkthrough.

Conclusion

AWS Glue event-driven workflows enable data engineers to easily build event driven ETL pipelines that respond in near-real time, delivering fresh data to business users. In this post, we demonstrated how to configure a rule in EventBridge to forward events to AWS Glue. We also saw how to create an event-based trigger that either immediately, or after a set number of events or period of time, starts a Glue ETL workflow. Migrating your existing AWS Glue workflows to make them event-driven is easy. This can be simply done by replacing the first trigger in the workflow to be of type EVENT and adding this workflow as a target to an EventBridge rule that captures events of your interest.

For more information about event-driven AWS Glue workflows, see Starting an AWS Glue Workflow with an Amazon EventBridge Event.

About the Authors

Noritaka Sekiyama is a Senior Big Data Architect on the AWS Glue and AWS Lake Formation team. In his spare time, he enjoys playing with his children. They are addicted to grabbing crayfish and worms in the park, and putting them in the same jar to observe what happens.

Karan Vishwanathan is a Software Development Engineer on the AWS Glue team. He enjoys working on distributed systems problems and playing golf.

Keerthi Chadalavada is a Software Development Engineer on the AWS Glue team. She is passionate about building fault tolerant and reliable distributed systems at scale.

Building well-architected serverless applications: Implementing application workload security – part 2

2021-07-13 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-implementing-application-workload-security-part-2/

Security question SEC3: How do you implement application security in your workload?

This post continues part 1 of this security question. Previously, I cover reviewing security awareness documentation such as the Common Vulnerabilities and Exposures (CVE) database. I show how to use GitHub security features to inspect and manage code dependencies. I then show how to validate inbound events using Amazon API Gateway request validation.

Required practice: Store secrets that are used in your code securely

Store secrets such as database passwords or API keys in a secrets manager. Using a secrets manager allows for auditing access, easier rotation, and prevents exposing secrets in application source code. There are a number of AWS and third-party solutions to store and manage secrets.

AWS Partner Network (APN) member Hashicorp provides Vault to keep secrets and application data secure. Vault has a centralized workflow for tightly controlling access to secrets across applications, systems, and infrastructure. You can store secrets in Vault and access them from an AWS Lambda function to, for example, access a database. You can use the Vault Agent for AWS to authenticate with Vault, receive the database credentials, and then perform the necessary queries. You can also use the Vault AWS Lambda extension to manage the connectivity to Vault.

AWS Systems Manager Parameter Store allows you to store configuration data securely, including secrets, as parameter values.

AWS Secrets Manager enables you to replace hardcoded credentials in your code with an API call to Secrets Manager to retrieve the secret programmatically. You can protect, rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. You can also generate secure secrets. By default, Secrets Manager does not write or cache the secret to persistent storage.

Parameter Store integrates with Secrets Manager. For more information, see “Referencing AWS Secrets Manager secrets from Parameter Store parameters.”

To show how Secrets Manager works, deploy the solution detailed in “How to securely provide database credentials to Lambda functions by using AWS Secrets Manager”.

The AWS CloudFormation stack deploys an Amazon RDS MySQL database with a randomly generated password. This is stored in Secrets Manager using a secret resource. A Lambda function behind an API Gateway endpoint returns the record count in a table from the database, using the required credentials. Lambda function environment variables store the database connection details and which secret to return for the database password. The password is not stored as an environment variable, nor in the Lambda function application code.

Lambda environment variables for Secrets Manager

The application flow is as follows:

Clients call the API Gateway endpoint
API Gateway invokes the Lambda function
The Lambda function retrieves the database secrets using the Secrets Manager API
The Lambda function connects to the RDS database using the credentials from Secrets Manager and returns the query results

View the password secret value in the Secrets Manager console, which is randomly generated as part of the stack deployment.

Example password stored in Secrets Manager

The Lambda function includes the following code to retrieve the secret from Secrets Manager. The function then uses it to connect to the database securely.

secret_name = os.environ['SECRET_NAME']
rds_host = os.environ['RDS_HOST']
name = os.environ['RDS_USERNAME']
db_name = os.environ['RDS_DB_NAME']

session = boto3.session.Session()
client = session.client(
	service_name='secretsmanager',
	region_name=region_name
)
get_secret_value_response = client.get_secret_value(
	SecretId=secret_name
)
...
secret = get_secret_value_response['SecretString']
j = json.loads(secret)
password = j['password']
...
conn = pymysql.connect(
	rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)

Browsing to the endpoint URL specified in the CloudFormation output displays the number of records. This confirms that the Lambda function has successfully retrieved the secure database credentials and queried the table for the record count.

Lambda function retrieving database credentials

Audit secrets access through a secrets manager

Monitor how your secrets are used to confirm that the usage is expected, and log any changes to them. This helps to ensure that any unexpected usage or change can be investigated, and unwanted changes can be rolled back.

Hashicorp Vault uses Audit devices that keep a detailed log of all requests and responses to Vault. Audit devices can append logs to a file, write to syslog, or write to a socket.

Secrets Manager supports logging API calls with AWS CloudTrail. CloudTrail captures all API calls for Secrets Manager as events. This includes calls from the Secrets Manager console and from code calling the Secrets Manager APIs.

Viewing the CloudTrail event history shows the requests to secretsmanager.amazonaws.com. This shows the requests from the console in addition to the Lambda function.

CloudTrail showing access to Secrets Manager

Secrets Manager also works with Amazon EventBridge so you can trigger alerts when administrator-specified operations occur. You can configure EventBridge rules to alert on deleted secrets or secret rotation. You can also create an alert if anyone tries to use a secret version while it is pending deletion. This can identify and alert when there is an attempt to use an out-of-date secret.

Enforce least privilege access to secrets

Access to secrets must be tightly controlled because the secrets contain sensitive information. Create AWS Identity and Access Management (IAM) policies that enable minimal access to secrets to prevent credentials being accidentally used or compromised. Secrets that have policies that are too permissive could be misused by other environments or developers. This can lead to accidental data loss or compromised systems. For more information, see “Authentication and access control for AWS Secrets Manager”.

Rotate secrets frequently.

Rotating your workload secrets is important. This prevents misuse of your secrets since they become invalid within a configured time period.

Secrets Manager allows you to rotate secrets on a schedule or on demand. This enables you to replace long-term secrets with short-term ones, significantly reducing the risk of compromise. Secrets Manager creates a CloudFormation stack with a Lambda function to manage the rotation process for you. Secrets Manager has native integrations with Amazon RDS, Amazon Redshift, and Amazon DocumentDB. It populates the function with the Amazon Resource Name (ARN) of the secret. You specify the permissions to rotate the credentials, and how often you want to rotate the secret.

The CloudFormation stack creates a MySecretRotationSchedule resource with a MyRotationLambda function to rotate the secret every 30 days.

MySecretRotationSchedule:
    Type: AWS::SecretsManager::RotationSchedule
    DependsOn: SecretRDSInstanceAttachment
    Properties:
    SecretId: !Ref MyRDSInstanceRotationSecret
    RotationLambdaARN: !GetAtt MyRotationLambda.Arn
    RotationRules:
        AutomaticallyAfterDays: 30
MyRotationLambda:
    Type: AWS::Serverless::Function
    Properties:
    Runtime: python3.7
    Role: !GetAtt MyLambdaExecutionRole.Arn
    Handler: mysql_secret_rotation.lambda_handler
    Description: 'This is a lambda to rotate MySql user passwd'
    FunctionName: 'cfn-rotation-lambda'
    CodeUri: 's3://devsecopsblog/code.zip'      
    Environment:
        Variables:
        SECRETS_MANAGER_ENDPOINT: !Sub 'https://secretsmanager.${AWS::Region}.amazonaws.com'

View and edit the rotation settings in the Secrets Manager console.

Secrets Manager rotation settings

Manually rotate the secret by selecting Rotate secret immediately. This invokes the Lambda function, which updates the database password and updates the secret in Secrets Manager.

View the updated secret in Secrets Manager, where the password has changed.

Secrets Manager password change

Browse to the endpoint URL to confirm you can still access the database with the updated credentials.

Access endpoint with updated Secret Manager password

You can provide your own code to customize a Lambda rotation function for other databases or services. The code includes the commands required to interact with your secured service to update or add credentials.

Conclusion

Implementing application security in your workload involves reviewing and automating security practices at the application code level. By implementing code security, you can protect against emerging security threats. You can improve the security posture by checking for malicious code, including third-party dependencies.

In this post, I continue from part 1, looking at securely storing, auditing, and rotating secrets that are used in your application code.

In the next post in the series, I start to cover the reliability pillar from the Well-Architected Serverless Lens with regulating inbound request rates.

For more serverless learning resources, visit Serverless Land.

Vertical Integration Strategy Powered by Amazon EventBridge

2021-07-03 Tiago Oliveira

Post Syndicated from Tiago Oliveira original https://aws.amazon.com/blogs/architecture/vertical-integration-strategy-powered-by-amazon-eventbridge/

Over the past few years, midsize and large enterprises have adopted vertical integration as part of their strategy to optimize operations and profitability. Vertical integration consists of separating different stages of the production line from other related departments, such as marketing and logistics. Enterprises implement such strategy to gain full control of their value chain: from the raw material production to the assembly lines and end consumer.

To achieve operational efficiency, enterprises must keep a level of independence between departments. However, this can lead to unstandardized operations and communication issues. Moreover, with this kind of autonomy for independent and dynamic verticals, the enterprise may lose some measure of visibility and control. As a result, it becomes challenging to generate a basic report from multiple departments. This blog post provides a high-level solution to integrate your different business verticals, using an event-driven architecture on top of Amazon EventBridge.

Event-driven architecture

Event-driven architecture is an architectural pattern to model communication between services while decoupling applications from each other. Applications scale and fail independently, and a central event bus facilitates the communication between the services in the enterprise. Instead of a particular application sending a request directly to another, it produces an event. The central event router captures it and forwards the message to the proper destinations.

For instance, when a customer places a new order on the retail website, the application sends the event to the event bus. Following, the event bus sends the message to the ERP system and the fulfillment center for dispatch. In this scenario, we call the application sending the event, an event publisher, and the applications receiving the event, event consumers.

Because all messages are going through the central event bus, there is clear independence between the applications within the enterprise. Here are some benefits:

Application independence occurs even if they belong to the same business workflow
You can plug in more event consumers to receive the same event type
You can add a data lake to receive all new order events from the retail website
You can receive all the events from the payment system and the customer relations department

This ensures you can integrate independent departments, increase overall visibility, and make sense of specific processes happening in the organization using the right tools.

Implementing event-driven architecture with Amazon EventBridge

Each vertical organically generates lifecycle events. Enterprises can use the event-driven architecture paradigm to make the information flow between the departments by asynchronously exchanging events through the event bus. This way, each department can react to events generated by other departments and initiate processes or actions depending on its business needs.

Such an approach creates a dynamic and flexible choreography between the different participants, which is unique to the enterprise. Such choreography can be followed and monitored using analytics and fine-grained event data collected on the data lake. Read Using AWS X-Ray tracing with Amazon EventBridge to learn how to debug and analyze this kind of distributed application.

Figure 1. Architecture diagram depicting enterprise vertical integration with Amazon EventBridge

In Figure 1, Amazon EventBridge works as the central event bus, the core component of this event-driven architecture. Through Amazon EventBridge, each event publisher sends or receives lifecycle events to and from all the other participants. Amazon EventBridge has an advanced routing mechanism using the concept of rules. Each rule defines up to five targets for the event arriving on the bus. Events are selected based on the event pattern. You can set up routing rules to determine where to send your data to build application architectures. These will react in real time to your data sources, with event publisher and consumer decoupled.

In addition to initiating the heavy routing and distribution of events, Amazon EventBridge can also give real-time insights into how the business runs. Using metrics automatically sent to Amazon CloudWatch, it is possible to see which kinds of events are arriving, and at which rate. You can also see how those events are distributed across the registered targets, and any failures that occur during this distribution. Every event can also be archived using the Amazon EventBridge events archiving feature.

Amazon Simple Storage Service (S3) is the backend storage, or data lake, for all the events that have ever transited via the event bus. With Amazon S3, customers have a cost-efficient storage service at any scale, with 11 9’s of durability. To help customers manage and secure their data, S3 provides features such as Amazon S3 Lifecycle to optimize costs. S3 Object Lock allows the write-once-read-many (WORM) model. You can expand this data and transform it into information using S3. Using services like Amazon Athena, Amazon Redshift, and Amazon EMR, those events can be transformed, correlated, and aggregated to generate insights on the business. The Amazon S3 data lake can also be the input to a data warehouse, machine learning models, and real-time analytics. Learn more about how to use Amazon S3 as the data lake storage.

A critical feature of this solution is the initiation of complex queries on top of the data lake. Amazon API Gateway provides one single flexible and elastic API entry point to retrieve data from the data lake. It also can publish events directly to the event bus. For complex queries, Amazon API Gateway can be integrated with an AWS Lambda. It will coordinate the execution of standard SQL queries using Amazon Athena as the query engine. You can read about a fully functional example of such an API called athena-express.

After collecting data from multiple departments, third-party entities, and shop floors, you can use the data to derive business value using cross-organization dashboards. In this way, you can increase visibility over the different entities and make sense of the data from the distributed systems. Even though this design allows you to use your favorite BI tool, we are using Amazon QuickSight for this solution. For example, with QuickSight, you can author your interactive dashboards, which include machine learning-powered insights. Those dashboards can then connect the marketing campaigns data with the sales data. You can measure how effective those campaigns were and forecast the demand on the production lines.

Conclusion

In this blog post, we showed you how to use Amazon EventBridge as an event bus to allow event-driven architectures. This architecture pattern streamlines the adoption of vertical integration. Enterprises can decouple IT systems from each other while retaining visibility into the data they generate. Integrating those systems can happen asynchronously using a choreography approach instead of having an orchestrator as a central component. There are technical challenges to implement this kind of solution, such as maintaining consistency in distributed applications and transactions spanning multiple microservices. Refer to the saga pattern for microservices-based architecture, and how to implement it using AWS Step Functions.

With a data lake in place to collect all the data produced by IT systems, you can create BI dashboards that provide a holistic view of multiple departments. Moreover, it allows organizations to get better insights into their valuable data and explore other use cases, such as machine learning. To support the data lake creation and management, refer to AWS Lake Formation and a series of other blog posts.

To learn more about Amazon EventBridge from a hands-on perspective, refer to this EventBridge workshop.

ICYMI: Serverless Q2 2021

2021-07-01 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q2-2021/

Welcome to the 14th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

AWS Step Functions

Step Functions launched Workflow Studio, a new visual tool that provides a drag-and-drop user interface to build Step Functions workflows. This exposes all the capabilities of Step Functions that are available in Amazon States Language (ASL). This makes it easier to build and change workflows and build definitions in near-real time.

For more:

Watch this 3-minute video to learn how to build with Workflow Studio.
See the Serverless Office Hours episode, AWS Step Functions – New Workflow Studio.

The new data flow simulator in the Step Functions console helps you evaluate the inputs and outputs passed through your state machine. It allows you to simulate each of the fields used to process data and updates in real time. It can help accelerate development with workflows and help visualize JSONPath processing.

For more:

Read the developer guide.
See Serverless Office Hours: AWS Step Functions – Orchestrating data in Step Functions.

Also, Amazon API Gateway can now invoke synchronous Express Workflows using REST APIs.

Amazon EventBridge

EventBridge now supports cross-Region event routing from any commercial AWS Region to a list of supported Regions. This feature allows you to centralize global events for auditing and monitoring or replicate events across Regions.

The service now also supports bus-to-bus event routing in the same Region and in the same AWS account. This can be useful for centralizing events related to a single project, application, or team within your organization.

You can now use EventBridge as a resource within Step Functions workflows. This provides a direct service integration for both standard and Express Workflows. You can publish events directly to a specified event bus using either a request-response or wait-for-callback pattern.

EventBridge added a new target for rules – Amazon SageMaker Pipelines. This allows you to use a rule to trigger a continuous integration and continuous deployment (CI/CD) service for your machine learning workloads.

AWS Lambda

Lambda Extensions

AWS Lambda extensions are now generally available including some performance and functionality improvements. Lambda extensions provide a new way to integrate your chosen monitoring, observability, security, and governance tools with AWS Lambda. These use the Lambda Runtime Extensions API to integrate with the execution environment and provide hooks into the Lambda lifecycle.

To help build your own extensions, there is an updated GitHub repository with example code.

To learn more:

Watch a Tech Talk with Julian Wood.
Watch the 8-episode Learning Path series covering all aspects of extensions.

Amazon CloudWatch Lambda Insights support for Lambda container images is now generally available.

Amazon SNS

Amazon SNS has expanded the set of filter operators available to include IP address matching, existence of an attribute key, and “anything-but” matching.

The service has also introduced an SMS sandbox to help developers testing workloads that send text messages.

To learn more:

Find out how to migrate to 10DLC origination numbers when using SNS to send text messages to US destinations.
Serverless Office Hours covered the new messaging and filtering options for Amazon SNS.

Amazon DynamoDB

DynamoDB announced CloudFormation support for several features. First, it now supports configuring Kinesis Data Streams using CloudFormation. This allows you to use infrastructure as code to set up Kinesis Data Streams instead of DynamoDB streams.

The service also announced that NoSQL Workbench now supports CloudFormation, so you can build data models and configure table capacity settings directly from the tool. Finally, you can now create and manage global tables with CloudFormation.

Learn how to use the recently launched Serverless Patterns Collection to configure DynamoDB as an event source for Lambda.

AWS Amplify

Amplify Hosting announced support for server-side rendered (SSR) apps built with the Next.js framework. This provides a zero configuration option for developers to deploy and host their Next.js-based applications.

The Amplify GLI now allows developers to make multiple DynamoDB GSI updates in a single deployment. This can help accelerate data model iterations. Additionally, the data management experience in the Amplify Admin UI launched at AWS re:Invent 2020 is now generally available.

AWS Serverless Application Model (AWS SAM)

AWS SAM has a public preview of support for local development and testing of AWS Cloud Development Kit (AWS CDK) projects.

To learn more:

Optimize serverless development with samconfig.
Deploy machine learning models with serverless templates.

Serverless blog posts

Operating Lambda

The “Operating Lambda” blog series includes the following posts in this quarter:

Apr 5 – Operating Lambda: Logging and custom metrics
Apr 12 – Operating Lambda: Isolating and resolving issues
Apr 26 – Operating Lambda: Performance optimization – Part 1
May 4 – Operating Lambda: Performance optimization – Part 2

Streaming data

The “Building serverless applications with streaming data” blog series shows how to use Lambda with Kinesis.

Jun 1 – Building serverless applications with streaming data: Part 1
Jun 7 – Building serverless applications with streaming data: Part 2
Jun 14 – Building serverless applications with streaming data: Part 3
Jun 21 – Building leaderboard functionality with serverless data analytics
Jun 28 – Monitoring and troubleshooting serverless data analytics applications

Getting started with serverless for developers

Learn how to build serverless applications from your local integrated development environment (IDE).

April

Apr 7 – Processing satellite imagery with serverless architecture
Apr 8 – Evaluating access control methods to secure Amazon API Gateway APIs
Apr 14 – Controlling concurrency in distributed systems using AWS Step Functions
Apr 15 – Introducing cross-Region event routing with Amazon EventBridge
Apr 19 – Optimizing serverless development with samconfig
Apr 20 – Modeling workflow input and output path processing with data flow simulator
Apr 29 – Better together: AWS SAM and AWS CDK

May

June

Jun 1 – Introducing the SMS sandbox for Amazon SNS
Jun 1 – Provisioning and using 10DLC origination numbers with Amazon SNS
Jun 2 – Setting up AWS Lambda with an Apache Kafka cluster within a VPC
Jun 8 – Announcing migration of the Java 8 runtime in AWS Lambda to Amazon Corretto
Jun 16 – Exploring serverless patterns for Amazon DynamoDB
Jun 21 – Building leaderboard functionality with serverless data analytics
Jun 21 – Prototyping at speed with AWS Step Functions new Workflow Studio
Jun 22 – Deploying machine learning models with serverless templates
Jun 22 – Building well-architected serverless applications: Managing application security boundaries – part 1

Tech Talks & Events

We hold AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page. We also regularly deliver talks at conferences and events around the world, speak on podcasts, and record videos you can find to learn in bite-sized chunks.

Here are some from Q2:

Processing Streaming Data with AWS Lambda with James Beswick.
Transferring SaaS Application Data Securely with Amazon AppFlow with Eric Johnson.
Integrating AWS Lambda with Your Favorite Tools with Julian Wood.
Advanced Serverless Messaging Patterns for Your Applications with Talia Nassi.

Serverless Live was a day of talks held on May 19, featuring the serverless developer advocacy team, along with Adrian Cockroft and Jeff Barr. You can watch a replay of all the talks on the AWS Twitch channel.

Videos

Serverless Office Hours – Tues 10 AM PT / 1PM EST

Weekly live virtual office hours. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

YouTube: youtube.com/serverlessland
Twitch: twitch.tv/aws

April

Apr 6 – AWS Messaging and Events – API Destinations for Amazon EventBridge
Apr 13 – Serverless Surprise! AWS Cloud9 & paired programming with serverless!
Apr 20, – AWS Lambda – The Lambda Operator’s Guide
Apr 27 – AWS Step Functions – Orchestrating data in Step Functions

May

May 4 – Amazon API Gateway – AWS IAM condition keys
May 11 – AWS Messaging and Events – Kinesis all the Streams
May 18 – Serverless Surprise – DynamoDB 101 from the 400 level guru
May 25 – AWS Lambda – Getting started with Serverless

June

Jun 1 – AWS Step Functions – Service integration for Amazon EventBridge
June 8 – Amazon API Gateway – Multi level base path mapping
June 15 – Message filtering and archiving with Amazon SNS
June 22 – AWS Step Functions – New Workflow Studio
June 29 – AWS Lambda – Lambda Extensions

DynamoDB Office Hours

Are you an Amazon DynamoDB customer with a technical question you need answered? If so, join us for weekly Office Hours on the AWS Twitch channel led by Rick Houlihan, AWS principal technologist and Amazon DynamoDB expert. See upcoming and previous shows

Apr 14 – Optimizing AWS AppSync API’s with Single Table
May 5 – Modeling an online banking service
May 12 – Deep Dive on the Dropbox Alki Metadata Service
June 2 – Modeling an Asset Management Service
June 9 – Modeling a Mobile Payments Service

Learning Path – AWS Lambda Extensions: The deep dive

Are you looking for a way to more easily integrate AWS Lambda with your favorite monitoring, observability, security, governance, and other tools? Welcome to AWS Lambda extensions: The deep dive, a learning path video series that shows you everything about augmenting Lambda functions using Lambda extensions.

There are also other helpful videos covering serverless available on the Serverless Land YouTube channel.

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Ben Smith: @benjamin_l_s
Julian Wood: @julian_wood
Talia Nassi: @talia_nassi

Keeping up with your dependencies: building a feedback loop for shared libraries

2021-06-26 Joerg Woehrle

Post Syndicated from Joerg Woehrle original https://aws.amazon.com/blogs/devops/keeping-up-with-your-dependencies-building-a-feedback-loop-for-shared-libraries/

In a microservices world, it’s common to share as little as possible between services. This enables teams to work independently of each other, helps to reduce wait times and decreases coupling between services.

However, it’s also a common scenario that libraries for cross-cutting-concerns (such as security or logging) are developed one time and offered to other teams for consumption. Although it’s vital to offer an opt-out of those libraries (namely, use your own code to address the cross-cutting-concern, such as when there is no version for a given language), shared libraries also provide the benefit of better governance and time savings.

To avoid these pitfalls when sharing artifacts, two points are important:

For consumers of shared libraries, it’s important to stay up to date with new releases in order to benefit from security, performance, and feature improvements.
For producers of shared libraries, it’s important to get quick feedback in case of an involuntarily added breaking change.

Based on those two factors, we’re looking for the following solution:

A frictionless and automated way to update consumer’s code to the latest release version of a given library
Immediate feedback to the library producer in case of a breaking change (the new version of the library breaks the build of a downstream system)

In this blog post I develop a solution that takes care of both those problems. I use Amazon EventBridge to be notified on new releases of a library in AWS CodeArtifact. I use an AWS Lambda function along with an AWS Fargate task to automatically create a pull request (PR) with the new release version on AWS CodeCommit. Finally, I use AWS CodeBuild to kick off a build of the PR and notify the library producer via EventBridge and Amazon Simple Notification Service (Amazon SNS) in case of a failure.

Overview of solution

Let’s start with a short introduction on the services I use for this solution:

CodeArtifact – A fully managed artifact repository service that makes it easy for organizations of any size to securely store, publish, and share software packages used in their software development process. CodeArtifact works with commonly used package managers and build tools like Maven, Gradle, npm, yarn, twine, and pip.
CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy.
CodeCommit – A fully-managed source control service that hosts secure Git-based repositories.
EventBridge – A serverless event bus that makes it easy to connect applications together using data from your own applications, integrated software as a service (SaaS) applications, and AWS services. EventBridge makes it easy to build event-driven applications because it takes care of event ingestion and delivery, security, authorization, and error handling.
Fargate – A serverless compute engine for containers that works with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS). Fargate removes the need to provision and manage servers, lets you specify and pay for resources per application, and improves security through application isolation by design.
Lambda – Lets you run code without provisioning or managing servers. You pay only for the compute time you consume.
Amazon SNS – A fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication.

The resulting flow through the system looks like the following diagram.

Architecture Diagram

In my example, I look at two independent teams working in two different AWS accounts. Team A is the provider of the shared library, and Team B is the consumer.

Let’s do a high-level walkthrough of the involved steps and components:

A new library version is released by Team A and pushed to CodeArtifact.
CodeArtifact creates an event when the new version is published.
I send this event to the default event bus in Team B’s AWS account.
An EventBridge rule in Team B’s account triggers a Lambda function for further processing.
The function filters SNAPSHOT releases (in Maven a SNAPSHOT represents an artifact still under development that doesn’t have a final release yet) and runs an Amazon ECS Fargate task for non-SNAPSHOT versions.
The Fargate task checks out the source that uses the shared library, updates the library’s version in the pom.xml, and creates a pull request to integrate the change into the mainline of the code repository.
The pull request creation results in an event being published.
An EventBridge rule triggers the CodeBuild project of the downstream artifact.
The result of the build is published as an event.
If the build fails, this failure is propagated back to the event bus of Team A.
The failure is forwarded to an SNS topic that notifies the subscribers of the failure.

Amazon EventBridge

A central component of the solution is Amazon EventBridge. I use EventBridge to receive and react on events emitted by the various AWS services in the solution (e.g., whenever a new version of an artifact gets uploaded to CodeArtifact, when a PR is created within CodeCommit or when a build fails in CodeBuild). Let’s have a high-level look on some of the central concepts of EventBridge:

Event Bus – An event bus is a pipeline that receives events. There is a default event bus in each account which receives events from AWS services. One can send events to an event bus via the PutEvents API.
Event – An event indicates a change in e.g., an AWS environment, a SaaS partner service or application or one of your applications.
Rule – A rule matches incoming events on an event bus and sends them to targets for processing. To react on a particular event, one creates a rule which matches this event. To learn more about the rule concept check out Rules on the EventBridge documentation.
Target – When an event matches the event pattern defined in a rule it is send to a target. There are currently more than 20 target types available in EventBridge. In this blog post I use the targets provided for: an event bus in a different account, a Lambda function, a CodeBuild project and an SNS topic. For a detailed list on available targets see Amazon EventBridge targets.

Solution Details:

In this section I walk through the most important parts of the solution. The complete code can be found on GitHub. For a detailed view on the resources created in each account please refer to the GitHub repository.

I use the AWS Cloud Development Kit (CDK) to create my infrastructure. For some of the resource types I create, no higher-level constructs are available yet (at the time of writing, I used AWS CDK version 1.108.1). This is why I sometimes use low-level AWS CloudFormation constructs or even use the provided escape hatches to use AWS CloudFormation constructs directly.

The code for the shared library producer and consumer is written in Java and uses Apache Maven for dependency management. However, the same concepts apply to e.g., Node.js and npm.

Notify another account of new releases

To send events from EventBridge to another account, the receiving account needs to specify an EventBusPolicy. The AWS CDK code on the consumer account looks like the following code:

new events.CfnEventBusPolicy(this, 'EventBusPolicy', {
    statementId: 'AllowCrossAccount',
    action: 'events:PutEvents',
    principal: consumerAccount
});

With that the producer account has the permission to publish events into the event bus of the consumer account.

I’m interested in CodeArtifact events that are published on the release of a new artifact. I first create a Rule which matches those events. Next, I add a target to the rule which targets the event bus of account B. As of this writing there is no CDK construct available to directly add another account as a target. That is why I use the underlying CloudFormation CfnRule to do that. This is called an escape hatch in CDK. For more information about escape hatches, see Escape hatches.

const onLibraryReleaseRule = new events.Rule(this, 'LibraryReleaseRule', {
  eventPattern: {
    source: [ 'aws.codeartifact' ],
    detailType: [ 'CodeArtifact Package Version State Change' ],
    detail: {
      domainOwner: [ this.account ],
      domainName: [ codeArtifactDomain.domainName ],
      repositoryName: [ codeArtifactRepo.repositoryName ],
      packageVersionState: [ 'Published' ],
      packageFormat: [ 'maven' ]
    }
  }
});
/* there is currently no CDK construct provided to add an event bus in another account as a target. 
That's why we use the underlying CfnRule directly */
const cfnRule = onLibraryReleaseRule.node.defaultChild as events.CfnRule;
cfnRule.targets = [ {arn: `arn:aws:events:${this.region}:${consumerAccount}:event-bus/default`, id: 'ConsumerAccount'} ];

For more information about event formats, see CodeArtifact event format and example.

Act on new releases in the consumer account

I established the connection between the events produced by Account A and Account B: The events now are available in Account B’s event bus. To use them, I add a rule which matches this event in Account B:

const onLibraryReleaseRule = new events.Rule(this, 'LibraryReleaseRule', {
  eventPattern: {
    source: [ 'aws.codeartifact' ],
    detailType: [ 'CodeArtifact Package Version State Change' ],
    detail: {
      domainOwner: [ producerAccount ],
      packageVersionState: [ 'Published' ],
      packageFormat: [ 'maven' ]
    }
  }
});

Add a Lambda function target

Now that I created a rule to trigger anytime a new package version is published, I will now add an EventBridge target which triggers my runTaskLambda Lambda Function. The below CDK code shows how I add our Lambda function as a target to the onLibraryRelease rule. Notice how I extract information from the event’s payload and pass it into the Lambda function’s invocation event.

onLibraryReleaseRule.addTarget(
    new targets.LambdaFunction( runTaskLambda,{
      event: events.RuleTargetInput.fromObject({
        groupId: events.EventField.fromPath('$.detail.packageNamespace'),
        artifactId: events.EventField.fromPath('$.detail.packageName'),
        version: events.EventField.fromPath('$.detail.packageVersion'),
        repoUrl: codeCommitRepo.repositoryCloneUrlHttp,
        region: this.region
      })
    }));

Filter SNAPSHOT versions

Because I’m not interested in Maven SNAPSHOT versions (such as 1.0.1-SNAPSHOT), I have to find a way to filter those and only act upon non-SNAPSHOT versions. Even though content-based filtering on event patterns is supported by Amazon EventBridge, filtering on suffixes is not supported as of this writing. This is why the Lambda function filters SNAPSHOT versions and only acts upon real, non-SNAPSHOT, releases. For those, I start a custom Amazon ECS Fargate task by using the AWS JavaScript SDK. My function passes some environment overrides to the Fargate task in order to have the required information about the artifact available at runtime.

In the following function code, I pass all required information to create a pull request into the environment of the Fargate task:

const AWS = require('aws-sdk');

const ECS = new AWS.ECS();
exports.handler = async (event) => {
    console.log(`Received event: ${JSON.stringify(event)}`)
    const artifactVersion = event.version;
    const artifactId = event.artifactId;
    if ( artifactVersion.indexOf('SNAPSHOT') > -1 ) {
        console.log(`Skipping SNAPSHOT version ${artifactVersion}`)
    } else {
        console.log(`Triggering task to create pull request for version ${artifactVersion} of artifact ${artifactId}`);
        const params = {
            launchType: 'FARGATE',
            taskDefinition: process.env.TASK_DEFINITION_ARN,
            cluster: process.env.CLUSTER_ARN,
            networkConfiguration: {
                awsvpcConfiguration: {
                    subnets: process.env.TASK_SUBNETS.split(',')
                }
            },
            overrides: {
                containerOverrides: [ {
                    name: process.env.CONTAINER_NAME,
                    environment: [
                        {name: 'REPO_URL', value: process.env.REPO_URL},
                        {name: 'REPO_NAME', value: process.env.REPO_NAME},
                        {name: 'REPO_REGION', value: process.env.REPO_REGION},
                        {name: 'ARTIFACT_VERSION', value: artifactVersion},
                        {name: 'ARTIFACT_ID', value: artifactId}
                    ]
                } ]
            }
        };
        await ECS.runTask(params).promise();
    }
};

Create the pull request

With the environment set, I can use a simple bash script inside the container to create a new Git branch, update the pom.xml with the new dependency version, push the branch to CodeCommit, and use the AWS Command Line Interface (AWS CLI) to create the pull request. The Docker entrypoint looks like the following code:

#!/usr/bin/env bash
set -e

# clone the repository and create a new branch for the change
git clone --depth 1 $REPO_URL repo && cd repo
branch="library_update_$(date +"%Y-%m-%d_%H-%M-%S")"
git checkout -b "$branch"

# replace whatever version is currently used by the new version of the library
sed -i "s/<shared\.library\.version>.*<\/shared\.library\.version>/<shared\.library\.version>${ARTIFACT_VERSION}<\/shared\.library\.version>/g" pom.xml

# stage, commit and push the change
git add pom.xml
git -c "user.name=ECS Pull Request Creator" -c "[email protected]" commit -m "Update version of ${ARTIFACT_ID} to ${ARTIFACT_VERSION}"
git push --set-upstream origin "$branch"

# create pull request
aws codecommit create-pull-request --title "Update version of ${ARTIFACT_ID} to ${ARTIFACT_VERSION}" --targets repositoryName="$REPO_NAME",sourceReference="$branch",destinationReference=main --region "$REPO_REGION"

After a successful run, I can check the CodeCommit UI for the created pull request. The following screenshot shows the changes introduced by one of my pull requests during testing:

Screenshot of the Pull Request in AWS CodeCommit

Now that I have the pull request in place, I want to verify that the dependency update does not break my consumer code. I do this by triggering a CodeBuild project with the help of EventBridge.

Build the pull request

The ingredients I use are the same as with the CodeArtifact event. I create a rule that matches the event emitted by CodeCommit (limiting it to branches that match the prefix used by our Fargate task). Afterwards I add a target to the rule to start the CodeBuild project:

const onPullRequestCreatedRule = new events.Rule(this, 'PullRequestCreatedRule', {
  eventPattern: {
    source: [ 'aws.codecommit' ],
    detailType: [ 'CodeCommit Pull Request State Change' ],
    resources: [ codeCommitRepo.repositoryArn ],
    detail: {
      event: [ 'pullRequestCreated' ],
      sourceReference: [ {
        prefix: 'refs/heads/library_update_'
      } ],
      destinationReference: [ 'refs/heads/main' ]
    }
  }
});
onPullRequestCreatedRule.addTarget( new targets.CodeBuildProject(codeBuild, {
  event: events.RuleTargetInput.fromObject( {
    projectName: codeBuild.projectName,
    sourceVersion: events.EventField.fromPath('$.detail.sourceReference')
  })
}));

This triggers the build whenever a new pull request is created with a branch prefix of refs/head/library_update_.
You can easily add the build results as a comment back to CodeCommit. For more information, see Validating AWS CodeCommit Pull Requests with AWS CodeBuild and AWS Lambda.

My last step is to notify an SNS topic in in case of a failing build. The SNS topic is a resource in Account A. To target a resource in a different account I need to forward the event to this account’s event bus. From there I then target the SNS topic.

First, I forward the failed build event from Account B into the default event bus of Account A:

const onFailedBuildRule = new events.Rule(this, 'BrokenBuildRule', {
  eventPattern: {
    detailType: [ 'CodeBuild Build State Change' ],
    source: [ 'aws.codebuild' ],
    detail: {
      'build-status': [ 'FAILED' ]
    }
  }
});
const producerAccountTarget = new targets.EventBus(events.EventBus.fromEventBusArn(this, 'cross-account-event-bus', `arn:aws:events:${this.region}:${producerAccount}:event-bus/default`))
onFailedBuildRule.addTarget(producerAccountTarget);

Then I target the SNS topic in Account A to be notified of failures:

const onFailedBuildRule = new events.Rule(this, 'BrokenBuildRule', {
  eventPattern: {
    detailType: [ 'CodeBuild Build State Change' ],
    source: [ 'aws.codebuild' ],
    account: [ consumerAccount ],
    detail: {
      'build-status': [ 'FAILED' ]
    }
  }
});
onFailedBuildRule.addTarget(new targets.SnsTopic(notificationTopic));

See it in action

I use the cdk-assume-role-credential-plugin to deploy to both accounts, producer and consumer, with a single CDK command issued to the producer account. To do this I create roles for cross account access from the producer account in the consumer account as described here. I also make sure that the accounts are bootstrapped for CDK as described here. After that I run the following steps:

Deploy the Stacks:
cd cdk && cdk deploy --context region=<YOUR_REGION> --context producerAccount=<PRODUCER_ACCOUNT_NO> --context consumerAccount==<CONSUMER_ACCOUNT_NO> --all && cd -
After a successful deployment CDK prints a set of export commands. I set my environment from those Outputs:
❯ export CODEARTIFACT_ACCOUNT=<MY_PRODUCER_ACCOUNT>
❯ export CODEARTIFACT_DOMAIN=<MY_CODEARTIFACT_DOMAIN>
❯ export CODEARTIFACT_REGION=<MY_REGION>
❯ export CODECOMMIT_URL=<MY_CODECOMMIT_URL>
Setup Maven to authenticate to CodeArtifact
export CODEARTIFACT_TOKEN=$(aws codeartifact get-authorization-token --domain $CODEARTIFACT_DOMAIN --domain-owner $CODEARTIFACT_ACCOUNT --query authorizationToken --output text)
Release the first version of the shared library to CodeArtifact:
cd library_producer/library && mvn --settings ./settings.xml deploy && cd -
From a console which is authenticated/authorized for CodeCommit in the Consumer Account
1. Setup git to work with CodeCommit
2. Push the code of the library consumer to CodeCommit:
  cd library_consumer/library && git init && git add . && git commit -m "Add consumer to codecommit" && git remote add codecommit $CODECOMMIT_URL && git push --set-upstream codecommit main && cd -
Release a new version of the shared library:
cd library_producer/library && sed -i '' 's/<version>1.0.0/<version>1.0.1/' pom.xml && mvn --settings settings.xml deploy && cd -
After 1-3 minutes a Pull Request is created in the CodeCommit repo in the Consumer Account and a build is run to verify this PR:
In case of a build failure, you can create a subscription to the SNS topic in Account A to act upon the broken build.

Clean up

In case you followed along with this blog post and want to prevent incurring costs you have to delete the created resources. Run cdk destroy --context region=<YOUR_REGION> --context producerAccount=<PRODUCER_ACCOUNT_NO> --context consumerAccount==<CONSUMER_ACCOUNT_NO> --all to delete the CloudFormation stacks.

Conclusion

In this post, I automated the manual task of updating a shared library dependency version. I used a workflow that not only updates the dependency version, but also notifies the library producer in case the new artifact introduces a regression (for example, an API incompatibility with an older version). By using Amazon EventBridge I’ve created a loosely coupled solution which can be used as a basis for a feedback loop between library creators and consumers.

What next?

To improve the solution, I suggest to look into possibilities of error handling for the Fargate task. What happens if the git operation fails? How do we signal such a failure? You might want to replace the AWS Fargate portion with a Lambda-only solution and use AWS Step Functions for better error handling.

As a next step, I could think of a solution that automates updates for libraries stored in Maven Central. Wouldn’t it be nice to never miss the release of a new Spring Boot version? A Fargate task run on a schedule and the following code should get you going:

curl -sS 'https://search.maven.org/solrsearch/select?q=g:org.springframework.boot%20a:spring-boot-starter&start=0&rows=1&wt=json' | jq -r '.response.docs[ 0 ].latestVersion'

Happy Building!

Author bio

Joerg is a Solutions Architect at AWS and works with manufacturing customers in Germany. As a former Developer, DevOps Engineer and SRE he enjoys building and automating things.

Building well-architected serverless applications: Managing application security boundaries – part 1

2021-06-22 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-managing-application-security-boundaries-part-1/

Security question SEC2: How do you manage your serverless application’s security boundaries?

Defining and securing your serverless application’s boundaries ensures isolation for, within, and between components.

Required practice: Evaluate and define resource policies

Resource policies are AWS Identity and Access Management (IAM) statements. They are attached to resources such as an Amazon S3 bucket, or an Amazon API Gateway REST API resource or method. The policies define what identities have fine-grained access to the resource. To see which services support resource-based policies, see “AWS Services That Work with IAM”. For more information on how resource policies and identity policies are evaluated, see “Identity-Based Policies and Resource-Based Policies”.

Understand and determine which resource policies are necessary

Resource policies can protect a component by restricting inbound access to managed services. Use resource policies to restrict access to your component based on a number of identities, such as the source IP address/range, function event source, version, alias, or queues. Resource policies are evaluated and enforced at IAM level before each AWS service applies it’s own authorization mechanisms, when available. For example, IAM resource policies for API Gateway REST APIs can deny access to an API before an AWS Lambda authorizer is called.

If you use multiple AWS accounts, you can use AWS Organizations to manage and govern individual member accounts centrally. Certain resource policies can be applied at the organizations level, providing guardrail for what actions AWS accounts within the organization root or OU can do. For more information see, “Understanding how AWS Organization Service Control Policies work”.

Review your existing policies and how they’re configured, paying close attention to how permissive individual policies are. Your resource policies should only permit necessary callers.

Implement resource policies to prevent unauthorized access

For Lambda, use resource-based policies to provide fine-grained access to what AWS IAM identities and event sources can invoke a specific version or alias of your function. Resource-based policies can also be used to control access to Lambda layers. You can combine resource policies with Lambda event sources. For example, if API Gateway invokes Lambda, you can restrict the policy to the API Gateway ID, HTTP method, and path of the request.

In the serverless airline example used in this series, the IngestLoyalty service uses a Lambda function that subscribes to an Amazon Simple Notification Service (Amazon SNS) topic. The Lambda function resource policy allows SNS to invoke the Lambda function.

Lambda resource policy document

API Gateway resource-based policies can restrict API access to specific Amazon Virtual Private Cloud (VPC), VPC endpoint, source IP address/range, AWS account, or AWS IAM users.

Amazon Simple Queue Service (SQS) resource-based policies provide fine-grained access to certain AWS services and AWS IAM identities (users, roles, accounts). Amazon SNS resource-based policies restrict authenticated and non-authenticated actions to topics.

Amazon DynamoDB resource-based policies provide fine-grained access to tables and indexes. Amazon EventBridge resource-based policies restrict AWS identities to send and receive events including to specific event buses.

For Amazon S3, use bucket policies to grant permission to your Amazon S3 resources.

The AWS re:Invent session Best practices for growing a serverless application includes further suggestions on enforcing security best practices.

Best practices for growing a serverless application

Good practice: Control network traffic at all layers

Apply controls for controlling both inbound and outbound traffic, including data loss prevention. Define requirements that help you protect your networks and protect against exfiltration.

Use networking controls to enforce access patterns

API Gateway and AWS AppSync have support for AWS Web Application Firewall (AWS WAF) which helps protect web applications and APIs from attacks. AWS WAF enables you to configure a set of rules called a web access control list (web ACL). These allow you to block, or count web requests based on customizable web security rules and conditions that you define. These can include specified IP address ranges, CIDR blocks, specific countries, or Regions. You can also block requests that contain malicious SQL code, or requests that contain malicious script. For more information, see How AWS WAF Works.

A private API endpoint is an API Gateway interface VPC endpoint that can only be accessed from your Amazon Virtual Private Cloud (Amazon VPC). This is an elastic network interface that you create in a VPC. Traffic to your private API uses secure connections and does not leave the Amazon network, it is isolated from the public internet. For more information, see “Creating a private API in Amazon API Gateway”.

To restrict access to your private API to specific VPCs and VPC endpoints, you must add conditions to your API’s resource policy. For example policies, see the documentation.

By default, Lambda runs your functions in a secure Lambda-owned VPC that is not connected to your account’s default VPC. Functions can access anything available on the public internet. This includes other AWS services, HTTPS endpoints for APIs, or services and endpoints outside AWS. The function cannot directly connect to your private resources inside of your VPC.

You can configure a Lambda function to connect to private subnets in a VPC in your account. When a Lambda function is configured to use a VPC, the Lambda function still runs inside the Lambda service VPC. The function then sends all network traffic through your VPC and abides by your VPC’s network controls. Functions deployed to virtual private networks must consider network access to restrict resource access.

AWS Lambda service VPC with VPC-to-VPT NAT to customer VPC

When you connect a function to a VPC in your account, the function cannot access the internet, unless the VPC provides access. To give your function access to the internet, route outbound traffic to a NAT gateway in a public subnet. The NAT gateway has a public IP address and can connect to the internet through the VPC’s internet gateway. For more information, see “How do I give internet access to my Lambda function in a VPC?”. Connecting a function to a public subnet doesn’t give it internet access or a public IP address.

You can control the VPC settings for your Lambda functions using AWS IAM condition keys. For example, you can require that all functions in your organization are connected to a VPC. You can also specify the subnets and security groups that the function’s users can and can’t use.

Unsolicited inbound traffic to a Lambda function isn’t permitted by default. There is no direct network access to the execution environment where your functions run. When connected to a VPC, function outbound traffic comes from your own network address space.

You can use security groups, which act as a virtual firewall to control outbound traffic for functions connected to a VPC. Use security groups to permit your Lambda function to communicate with other AWS resources. For example, a security group can allow the function to connect to an Amazon ElastiCache cluster.

To filter or block access to certain locations, use VPC routing tables to configure routing to different networking appliances. Use network ACLs to block access to CIDR IP ranges or ports, if necessary. For more information about the differences between security groups and network ACLs, see “Compare security groups and network ACLs.”

In addition to API Gateway private endpoints, several AWS services offer VPC endpoints, including Lambda. You can use VPC endpoints to connect to AWS services from within a VPC without an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.

Using tools to audit your traffic

When you configure a Lambda function to use a VPC, or use private API endpoints, you can use VPC Flow Logs to audit your traffic. VPC Flow Logs allow you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data can be published to Amazon CloudWatch Logs or S3 to see where traffic is being sent to at a granular level. Here are some flow log record examples. For more information, see “Learn from your VPC Flow Logs”.

Block network access when required

In addition to security groups and network ACLs, third-party tools allow you to disable outgoing VPC internet traffic. These can also be configured to allow traffic to AWS services or allow-listed services.

Conclusion

Managing your serverless application’s security boundaries ensures isolation for, within, and between components. In this post, I cover how to evaluate and define resource policies, showing what policies are available for various serverless services. I show some of the features of AWS WAF to protect APIs. Then I review how to control network traffic at all layers. I explain how Lambda functions connect to VPCs, and how to use private APIs and VPC endpoints. I walk through how to audit your traffic.

This well-architected question will be continued where I look at using temporary credentials between resources and components. I cover why smaller, single purpose functions are better from a security perspective, and how to audit permissions. I show how to use AWS Serverless Application Model (AWS SAM) to create per-function IAM roles.

For more serverless learning resources, visit https://serverlessland.com.

Creating a notification workflow from sensitive data discover with Amazon Macie, Amazon EventBridge, AWS Lambda, and Slack

2021-06-10 Bruno Silviera

Post Syndicated from Bruno Silviera original https://aws.amazon.com/blogs/security/creating-a-notification-workflow-from-sensitive-data-discover-with-amazon-macie-amazon-eventbridge-aws-lambda-and-slack/

Following the example of the EU in implementing the General Data Protection Regulation (GDPR), many countries are implementing similar data protection laws. In response, many companies are forming teams that are responsible for data protection. Considering the volume of information that companies maintain, it’s essential that these teams are alerted when sensitive data is at risk.

This post shows how to deploy a solution that uses Amazon Macie to discover sensitive data. This solution enables you to set up automatic notification to your company’s designated data protection team via a Slack channel when sensitive data that needs to be protected is discovered by Amazon EventBridge and AWS Lambda.

The challenge

Let’s imagine that you’re part of a team that’s responsible for classifying your organization’s data but the data structure isn’t documented. Amazon Macie provides you the ability to run a scheduled classification job that examines your data, and you want to notify the data protection team when there’s new sensitive data to classify. Let’s build a solution to automatically notify the data protection team.

Solution overview

To be scalable and cost-effective, this solution uses serverless technologies and managed AWS services, including:

Macie – A fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in Amazon Web Services (AWS).
EventBridge – A serverless event bus that connects application data from your apps, SaaS, and AWS services. EventBridge can respond to specific events or run according to a schedule. The solution presented in this post uses EventBridge to initiate a custom Lambda function in response to a specific event.
Lambda – Runs code in response to events such as changes in data, changes in application state, or user actions. In this solution, a Lambda function is initiated by EventBridge.

Solution architecture

The architecture workflow is shown in Figure 1 and includes the following steps:

Macie runs a classification job and publishes its findings to EventBridge as a JSON object.
The EventBridge rule captures the findings and invokes a Lambda function as a target.
The Lambda function parses the JSON object. The function then sends a custom message to a Slack channel with the sensitive data finding for the data protection team to evaluate and respond to.

Figure 1: Solution architecture workflow

Set up Slack

For this solution, you need a Slack workspace and an incoming webhook. The workspace must be in place before you create the webhook.

Create a Slack workspace

If you already have a Slack workspace in your environment, you can skip forward, to creating the webhook.

If you don’t have a Slack workspace, follow the steps in Create a Slack Workspace to create one.

Create an incoming webhook in Slack API

Go to your Slack API.
Choose Start Building to create an app.
Enter the following details for your app:
- App Name – macie-to-slack.
- Development Slack Workspace – Choose the Slack workspace—either an existing workspace or one you created for this solution—to receive the Macie findings.
Choose the Create App button.
In the left menu, choose Incoming Webhooks.
At the Activate Incoming Webhooks screen, move the slider from OFF to ON.
Scroll down and choose Add New Webhook to Workspace.
In the screen asking where your app should post, enter the name of the Slack channel from your Workspace that you want to send notification to and choose Authorize.
On the next screen, scroll down to the Webhook URL section. Make a note of the URL to use later.

Deploy the CloudFormation template with the solution

The deployment of the CloudFormation template automatically creates the following resources:

A Lambda function that begins with the name named macie-to-slack-lambdafindingsToSlack-.
An EventBridge rule named MacieFindingsToSlack.
An IAM role named MacieFindingsToSlackkRole.
A permission to invoke the Lambda function named LambdaInvokePermission.

Note: Before you proceed, make sure you’re deploying the template to the same Region that your production Macie is running.

To deploy the Cloudformation template

Download the YAML template to your computer.

Note: To save the template, you can right click the Raw button at the top of the code and then select Save link as if you’re using Chrome, or the equivalent in your browser. This file is used in Step 4.
Open CloudFormation in the AWS Management Console.
On the Welcome page, choose Create stack and then choose With new resources.
On Step 1 — Specify template, choose Upload a template file, select Choose file and then select the file template.yaml (the file extension might be .YML), then choose Next.
On Step 2 — Specify stack details:
1. Enter macie-to-slack as the Stack name.
2. At the Slack Incoming Web Hook URL, paste the webhook URL you copied earlier.
3. At Slack channel, enter the name of the channel in your workspace that will receive the alerts and choose Next.
Figure 2: Defining stack details
On Step 3 – Configure Stack options, you can leave the default settings, or change them for your environment. Choose Next to continue.
At the bottom of Step 4 – Review, select I acknowledge that AWS CloudFormation might create IAM resources, and choose Create stack.

Figure 3: Confirmation before stack creation
Wait for the stack to reach status CREATE_COMPLETE.

Running the solution

At this point, you’ve deployed the solution and your resources are created.

To test the solution, you can schedule a Macie job targeting a bucket that contains a file with sensitive information that Macie can detect.

Note: You can check the Amazon Macie documentation to see the list of supported managed data identifiers.

When the Macie job is complete, any findings are sent to the Slack channel.

Figure 4: Macie finding delivered to Slack channel

Select the link in the message sent to the Slack channel to open that finding in the Macie console, as shown in Figure 5.

Figure 5: Finding details

And you’re done!

Now your Macie finding results are delivered to your Slack channel where they can be easily monitored, reducing response time and risk exposure.

If you deployed this for testing purposes, or want to clean this up and move to your production account, you can delete the Cloudformation stack:

Open the CloudFormation console.
Select the stack and choose Delete.

Conclusion

In this blog post we walked through the steps to configure a notification workflow using Macie, Lambda, and EventBridge to send sensitive data findings to your data protection team via a Slack channel.

Your data protection team will appreciate the timely notifications of sensitive data findings, giving you the ability to focus on creating controls to improve data security and compliance with regulations related to protection and treatment of personal data.

For more information about data privacy on AWS, see Data Privacy FAQ.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Using API destinations with Amazon EventBridge

2021-03-05 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-api-destinations-with-amazon-eventbridge/

Amazon EventBridge enables developers to route events between AWS services, integrated software as a service (SaaS) applications, and your own applications. It can help decouple applications and produce more extensible, maintainable architectures. With the new API destinations feature, EventBridge can now integrate with services outside of AWS using REST API calls.

This feature enables developers to route events to existing SaaS providers that integrate with EventBridge, like Zendesk, PagerDuty, TriggerMesh, or MongoDB. Additionally, you can use other SaaS endpoints for applications like Slack or Contentful, or any other type of API or webhook. It can also provide an easier way to ingest data from serverless workloads into Splunk without needing to modify application code or install agents.

This blog post explains how to use API destinations and walks through integration examples you can use in your workloads.

How it works

API destinations are third-party targets outside of AWS that you can invoke with an HTTP request. EventBridge invokes the HTTP endpoint and delivers the event as a payload within the request. You can use any preferred HTTP method, such as GET or POST. You can use input transformers to change the payload format to match your target.

An API destination uses a Connection to manage the authentication credentials for a target. This defines the authorization type used, which can be an API key, OAuth client credentials grant, or a basic user name and password.

The service manages the secret in AWS Secrets Manager and the cost of storing the secret is included in the pricing for API destinations.

You create a connection to each different external API endpoint and share the connection with multiple endpoints. The API destinations console shows all configured connections, together with their authorization status. Any connections that cannot be established are shown here:

To create an API destination, you provide a name, the HTTP endpoint and method, and the connection:

When you configure the destination, you must also set an invocation rate limit between 1 and 300 events per second. This helps protect the downstream endpoint from surges in traffic. If the number of arriving events exceeds the limit, the EventBridge service queues up events. It delivers to the endpoint as quickly as possible within the rate limit.

It continues to do this for 24 hours. To make sure you retain any events that cannot be delivered, set up a dead-letter queue on the event bus. This ensures that if the event is not delivered within this timeframe, it is stored durably in an Amazon SQS queue for further processing. This can also be useful if the downstream API experiences an outage for extended periods of time.

Once you have configured the API destination, it becomes available in the list of targets for rules. Matching events are sent to the HTTP endpoint with the event serialized as part of the payload.

As with API Gateway targets in EventBridge, the maximum timeout for API destination is 5 seconds. If an API call exceeds this timeout, it is retried.

Debugging the payload from API destinations

You can send an event via API Destinations to debugging tools like Webhook.site to view the headers and payload of the API call:

Create a connection with the credential, such as an API key.
Create an API destination with the webhook URL endpoint and then create a rule to match and route events.
The Webhook.site testing service shows the headers and payload once the webhook is triggered. This can help you test rules if you are adding headers or manipulating the payload using an input transformer.

Customizing the payload

Third-party APIs often require custom headers or payload formats when accepting data. EventBridge rules allow you to customize header parameters, query strings, and payload formats without the need for custom code. Header parameters and query strings can be configured with static values or attributes from the event:

To customize the payload, configure an input transformer, which consists of an Input Path and Input template. You use an Input Path to define variables and use JSONPath query syntax to identify the variable source in the event. For example, to extract two attributes from an Amazon S3 PutObject event, the Input Path is:

{
  "key" : "$.[0].s3.object.key", 
  "bucket" : " $.[0].s3.bucket.name "
}

Next, the Input template defines the structure of the data passed to the target, which references the variables. With this release you can now use variables inside quotes in the input transformer. As a result, you can pass these values as a string or JSON, for example:

{
  "filename" : "<key>", 
  "container" : "mycontainer-<bucket>"
}

Sending AWS events to DataDog

Using API destinations, you can send any AWS-sourced event to third-party services like DataDog. This approach uses the DataDog API to put data into the service. To do this, you must sign up for an account and create an API key. In this example, I send S3 events via CloudTrail to DataDog for further analysis.

Navigate to the EventBridge console, select API destinations from the menu.
Select the Connections tab and choose Create connection:
Enter a connection name, then select API Key for Authorization Type. Enter the API key name DD-API-KEY and paste your secret API key as the value. Choose Create.
In the API destinations tab, choose Create API destination.
Enter a name, set the API destination endpoint to https://http-intake.logs.datadoghq.com/v1/input, and the HTTP method to POST. Enter 300 for Invocation rate limit and select the DataDog connection from the dropdown. Choose Create.
From the EventBridge console, select Rules and choose Create rule. Enter a name, select the default bus, and enter this event pattern:
```
{
  "source": ["aws.s3"]
}
```
In Select targets, choose API destination and select DataDog for the API destination. Expand, Configure Input.
In the Input transformer section, enter {"detail":"$.detail"} in the Input Path field and enter {"message": <detail>} in the Input Template.
You can optionally add a dead letter queue. To do this, open the Retry policy and dead-letter queue section. Under Dead-letter queue, select an existing SQS queue.
Choose Create.
Open AWS CloudShell and upload an object to an S3 bucket in your account to trigger an event:
```
echo "test" > testfile.txt
aws s3 cp testfile.txt s3://YOUR_BUCKET_NAME
```
The logs appear in the DataDog Logs console, where you can process the raw data for further analysis:

Sending AWS events to Zendesk

Zendesk is a SaaS provider that provides customer support solutions. It can already send events to EventBridge using a partner integration. This post shows how you can consume ticket events from Zendesk and run a sentiment analysis using Amazon Comprehend.

With API destinations, you can now use events to call the Zendesk API to create and modify tickets and interact with chats and customer profiles.

To create an API destination for Zendesk:

Log in with an existing Zendesk account or register for a trial account.
Navigate to the EventBridge console, select API destinations from the menu and choose Create API destination.
On the Create API destination, page:
1. Enter a name for the destination (e.g. “SendToZendesk”).
2. For API destination endpoint, enter https://<<your-subdomain>>.zendesk.com/api/v2/tickets.json.
3. For HTTP method, select POST.
4. For Invocation rate, enter 10.
In the connection section:
1. Select the Create a new connection radio button.
2. For Connection name, enter ZendeskConnection.
3. For Authorization type, select Basic (Username/Password).
4. Enter your Zendesk username and password.
Choose Create.

When you create a rule to route to this API destination, use the Input transformer to build the defined JSON payload, as shown in the previous DataDog example. When an event matches the rule, EventBridge calls the Zendesk Create Ticket API. The new ticket appears in the Zendesk dashboard:

For more information on the Zendesk API, visit the Zendesk Developer Portal.

Building an integration with AWS CloudFormation and AWS SAM

To support this new feature, there are two new AWS CloudFormation resources available. These can also be used in AWS Serverless Application Model (AWS SAM) templates:

The API destination resource is represented by AWS::Events::ApiDestination.
The connection resource is represented by AWS::Events::Connection.

The connection resource defines the connection credential and optional invocation HTTP parameters:

Resources:
  TestConnection:
    Type: AWS::Events::Connection
    Properties:
      AuthorizationType: API_KEY
      Description: 'My connection with an API key'
      AuthParameters:
        ApiKeyAuthParameters:
          ApiKeyName: VHS
          ApiKeyValue: Testing
        InvocationHttpParameters:
          BodyParameters:
          - Key: 'my-integration-key'
            Value: 'ABCDEFGH0123456'

Outputs:
  TestConnectionName:
    Value: !Ref TestConnection
  TestConnectionArn:
    Value: !GetAtt TestConnection.Arn

The API destination resource provides the connection, endpoint, HTTP method, and invocation limit:

Resources:
  TestApiDestination:
    Type: AWS::Events::ApiDestination
    Properties:
      Name: 'datadog-target'
      ConnectionArn: arn:aws:events:us-east-1:123456789012:connection/datadogConnection/2
      InvocationEndpoint: 'https://http-intake.logs.datadoghq.com/v1/input'
      HttpMethod: POST
      InvocationRateLimitPerSecond: 300

Outputs:
  TestApiDestinationName:
    Value: !Ref TestApiDestination
  TestApiDestinationArn:
    Value: !GetAtt TestApiDestination.Arn
  TestApiDestinationSecretArn:
    Value: !GetAtt TestApiDestination.SecretArn

You can use the existing AWS::Events::Rule resource to configure an input transformer for API destination targets:

ApiDestinationDeliveryRule:
  Type: AWS::Events::Rule
  Properties:
    EventPattern:
      source:
        - "EventsForMyAPIdestination"
    State: "ENABLED"
    Targets:
      -
        Arn: !Ref TestApiDestinationArn
        InputTransformer:
          InputPathsMap:
            detail: $.detail
          InputTemplate: >
            {
                    "message": <detail>
            }

Conclusion

The API destinations feature of EventBridge enables developers to integrate workloads with third-party applications using REST API calls. This provides an easier way to build decoupled, extensible applications that work with applications outside of the AWS Cloud.

To use this feature, you configure a connection and an API destination. You can use API destinations in the same way as existing targets for rules, and also customize headers, query strings, and payloads in the API call.

Learn more about using API destinations with the following SaaS providers: Datadog, Freshworks, MongoDB, TriggerMesh, and Zendesk.

For more serverless learning resources, visit Serverless Land.

Analyzing Freshdesk data using Amazon EventBridge and Amazon Athena

2021-03-03 Benjamin Smith

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/analyzing-freshdesk-data-using-amazon-eventbridge-and-amazon-athena/

This post is written by Shashi Shankar, Application Architect, Shared Delivery Teams

Freshdesk is an omnichannel customer service platform by Freshworks. It provides automation services to help speed up customer support processes.

The Freshworks connector to Amazon EventBridge allows real time streaming of Freshdesk events with minimal configuration and setup. This integration provides real-time insights into customer support operations without the operational overhead of provisioning and maintaining any servers.

In this blog post, I walk through a serverless approach to ingest and analyze Freshdesk data. This solution uses EventBridge, Amazon Kinesis Data Firehose, Amazon S3, and Amazon Athena. I also look at examples of customer service questions that can be answered using this approach.

The following diagram shows a high-level architecture of the proposed solution:

When a Freshdesk ticket is updated or created, the Freshworks connector pushes event data to the Amazon EventBridge partner event bus.
A rule on the partner event bus pushes the event data to Kinesis Data Firehose.
Kinesis Data Firehose batches data before sending to S3. An AWS Lambda function transforms the data by adding a new line to each record before sending.
Kinesis Data Firehose delivers the batch of records to S3.
Athena is used to query relevant data from S3 using standard SQL.

The walkthrough shows you how to:

Add the EventBridge app to Freshdesk account.
Configure a Freshworks partner event bus in EventBridge.
Deploy a Kinesis Data Firehose stream, a Lambda function, and an S3 bucket.
Set up a custom rule on the event bus to push data to Kinesis Data Firehose.
Generate sample Freshdesk data to validate the ingestion process.
Set up a table in Athena to query the S3 bucket.
Query and analyze data

Pre-requisites

A Freshdesk account (which can be created here).
An AWS account.
AWS Serverless Application Model (AWS SAM CLI), installed and configured.

Adding the Amazon EventBridge app to a Freshdesk account

Log in to your Freshdesk account and navigate to Admin Helpdesk Productivity Apps. Search for EventBridge:
Choose the Amazon EventBridge icon and choose Install.

Enter your AWS account number in the AWS Account ID field.
Enter “OnTicketCreate”, “OnTicketUpdate” in the Events field.
Enter the AWS Region to send the Freshdesk events in the Region field. This walkthrough uses the us-east-1 Region.

Configuring a Freshworks partner event bus in EventBridge

Once previous step is completed, a partner event source is automatically created in the EventBridge console. Copy the partner event source name to a clipboard.

Clone the GitHub repo and deploy the AWS SAM template:

git clone https://github.com/aws-samples/amazon-eventbridge-freshdesk-example.git
cd ./amazon-eventbridge-freshdesk-example
sam deploy --guided

PartnerEventSource – Enter partner event source name copied from the previous step.
S3BucketName – Enter an S3 bucket name to store Freshdesk ticket event data.

The AWS SAM template creates an association between the partner event source and event bus:

    Type: AWS::Events::EventBus
    Properties:
      EventSourceName: !Ref PartnerEventSource
      Name: !Ref PartnerEventSource

The template creates a Kinesis Data Firehose delivery stream, Lambda function, and S3 bucket to process and store the events from Freshdesk tickets. It also adds a rule to the custom event bus with the Kinesis Data Firehose stream as the target:

  PushToFirehoseRule:
    Type: "AWS::Events::Rule"
    Properties:
      Description: Test Freshdesk Events Rule
      EventBusName: !Ref PartnerEventSource
      EventPattern:
        account: [!Ref AWS::AccountId]
      Name: freshdeskeventrule
      State: ENABLED
      Targets:
        - Arn:
            Fn::GetAtt:
              - "FirehoseDeliveryStream"
              - "Arn"
          Id: "idfreshdeskeventrule"
          RoleArn: !GetAtt EventRuleTargetIamRole.Arn

  EventRuleTargetIamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: ""
            Effect: "Allow"
            Principal:
              Service:
                - "events.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: Invoke_Firehose
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action:
                  - "firehose:PutRecord"
                  - "firehose:PutRecordBatch"
                Resource:
                  - !GetAtt FirehoseDeliveryStream.Arn

Generating sample Freshdesk data to validate the ingestion process:

To generate sample Freshdesk data, login to the Freshdesk account and browse to the “Tickets” screen as shown:

Follow the steps to simulate two customer service operations:

To create a ticket of type “Refund”. Choose the New button and enter the details:
Update an existing ticket and change the priority to “Urgent”.
Within a few minutes of updating the ticket, the data is pushed via the Freshworks connector to the S3 bucket created using the AWS SAM template. To verify this, browse to the S3 bucket and see that a new object with the ticket data is created:

You can also use the S3 Select option under object actions to view the raw JSON data that is sent from the partner system. You are now ready to analyze the data using Athena.

Setting up a table in Athena to query the S3 bucket

If you are familiar with Apache Hive, you may find creating tables on Athena helpful. You can create tables by writing the DDL statement in the query editor or by using the wizard or JDBC driver. To create a table in Athena:

Copy and paste the following DDL statement in the Athena query editor to create a Freshdesk’s events table. For this example, the table is created in the default database.
Replace S3_Bucket_Name in the following query with the name of the S3 bucket created by deploying the previous AWS SAM template:

CREATE EXTERNAL TABLE ` freshdeskevents`(
  `id` string COMMENT 'from deserializer', 
  `detail-type` string COMMENT 'from deserializer', 
  `source` string COMMENT 'from deserializer', 
  `account` string COMMENT 'from deserializer', 
  `time` string COMMENT 'from deserializer', 
  `region` string COMMENT 'from deserializer', 
  `detail` struct<ticket:struct<subject:string,description:string,is_description_truncated:boolean,description_text:string,is_description_text_truncated:boolean,due_by:string,fr_due_by:string,fr_escalated:boolean,is_escalated:boolean,fwd_emails:array<string>,reply_cc_emails:array<string>,email_config_id:string,id:int,group_id:bigint,product_id:string,company_id:string,requester_id:bigint,responder_id:bigint,status:int,priority:int,type:string,tags:array<string>,spam:boolean,source:int,tweet_id:string,cc_emails:array<string>,to_emails:string,created_at:string,updated_at:string,attachments:array<string>,custom_fields:string,changes:struct<responder_id:array<bigint>,ticket_type:array<string>,status:array<int>,status_details:array<struct<id:int,name:string>>,group_id:array<bigint>>>,requester:struct<id:bigint,name:string,email:string,mobile:string,phone:string,language:string,created_at:string>> COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'paths'='account,detail,detail-type,id,region,resources,source,time,version') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION  's3://S3_Bucket_Name/'

The table is created on the data stored in S3 and is ready to be queried. Note that table freshdeskevents points at the bucket s3://S3_Bucket_Name/. As more data is added to the bucket, the table automatically grows, providing a near-real-time data analysis experience.

Querying and analyzing data

You can use the following examples to get started with querying the Athena table.

To get all the events data, run:

SELECT * FROM default.freshdeskevents  limit 10

The preceding output has a detail column containing the details related to the ticket. Tickets can be filtered on nested notations to build more insightful queries. Also, the detail-type column provides classification of tickets as new (onTicketCreate) vs updated (onTicketUpdate).

To show new tickets created today with the type “Refund”:

SELECT detail.ticket.subject,detail.ticket.description_text, detail.ticket.type  FROM default.freshdeskevents
where detail.ticket.type = 'Refund' and "detail-type" = 'onTicketCreate' and date(from_iso8601_timestamp(time)) = date(current_date)

All tickets with an “Urgent” priority but not assigned to an agent:

SELECT "detail-type", detail.ticket.responder_id,detail.ticket.priority, detail.ticket.subject, detail.ticket.type  FROM default.freshdeskevents
where detail.ticket.responder_id is null and detail.ticket.priority = 4

Conclusion

In this blog post, you learn how to configure Freshworks partner event source from the Freshdesk console. Once a partner event source is configured, an AWS SAM template is deployed that creates a custom event bus by attaching the partner event source. A Kinesis Data Firehose, Lambda function, and S3 bucket is used to ingest Freshdesk’s ticket events data for analysis. An EventBridge rule is configured to route the event data to the S3 bucket.

Once event data starts flowing into the S3 bucket, an Amazon Athena table is created to run queries and analyze the ticket events data. Alternative customer service data analysis use cases can be built on the architecture shown in this blog.

To learn more about other partner integrations and the native capabilities of EventBridge, visit the AWS Compute Blog.

Using AWS X-Ray tracing with Amazon EventBridge

2021-03-02 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-aws-x-ray-tracing-with-amazon-eventbridge/

AWS X-Ray allows developers to debug and analyze distributed applications. It can be useful for tracing transactions through microservices architectures, such as those typically used in serverless applications. Amazon EventBridge allows you to route events between AWS services, integrated software as a service (SaaS) applications, and your own applications. EventBridge can help decouple applications and produce more extensible, maintainable architectures.

EventBridge now supports trace context propagation for X-Ray, which makes it easier to trace transactions through event-based architectures. This means you can potentially trace a single request from an event producer through to final processing by an event consumer. These may be decoupled application stacks where the consumer has no knowledge of how the event is produced.

This blog post explores how to use X-Ray with EventBridge and shows how to implement tracing using the example application in this GitHub repo.

How it works

X-Ray works by adding a trace header to requests, which acts as a unique identifier. In the case of a serverless application using multiple AWS services, this allows X-Ray to group service interactions together as a single trace. X-Ray can then produce a service map of the transaction flow or provide the raw data for a trace:

When you send events to EventBridge, the service uses rules to determine how the events are routed from the event bus to targets. Any event that is put on an event bus with the PutEvents API can now support trace context propagation.

The trace header is provided as internal metadata to support X-Ray tracing. The header itself is not available in the event when it’s delivered to a target. For developers using the EventBridge archive feature, this means that a trace ID is not available for replay. Similarly, it’s not available on events sent to a dead-letter queue (DLQ).

Enabling tracing with EventBridge

To enable tracing, you don’t need to change the event structure to add the trace header. Instead, you wrap the AWS SDK client in a call to AWSXRay.captureAWSClient and grant IAM permissions to allow tracing. This enables X-Ray to instrument the call automatically with the X-Amzn-Trace-Id header.

For code using the AWS SDK for JavaScript, this requires changes to the way that the EventBridge client is instantiated. Without tracing, you declare the AWS SDK and EventBridge client with:

const AWS = require('aws-sdk')
const eventBridge = new AWS.EventBridge()

To use tracing, this becomes:

const AWSXRay = require('aws-xray-sdk')
const AWS = AWSXRay.captureAWS(require('aws-sdk'))
const eventBridge = new AWS.EventBridge()

The interaction with the EventBridge client remains the same but the calls are now instrumented by X-Ray. Events are put on the event bus programmatically using a PutEvents API call. In a Node.js Lambda function, the following code processes an event to send to an event bus, with tracing enabled:

const AWSXRay = require('aws-xray-sdk')
const AWS = AWSXRay.captureAWS(require('aws-sdk'))
const eventBridge = new AWS.EventBridge()

exports.handler = async (event) => {

  let myDetail = { "name": "Alice" }

  const myEvent = { 
    Entries: [{
      Detail: JSON.stringify({ myDetail }),
      DetailType: 'myDetailType',
      Source: 'myApplication',
      Time: new Date
    }]
  }

  // Send to EventBridge
  const result = await eventBridge.putEvents(myEvent).promise()

  // Log the result
  console.log('Result: ', JSON.stringify(result, null, 2))
}

You can also define a custom tracing header using the new TraceHeader attribute on the PutEventsRequestEntry API model. The unique value you provide overrides any trace header on the HTTP header. The value is also validated by X-Ray and discarded if it does not pass validation. See the X-Ray Developer Guide to learn about generating valid trace headers.

Deploying the example application

The example application consists of a webhook microservice that publishes events and target microservices that consume events. The generated event contains a target attribute to determine which target receives the event:

To deploy these microservices, you must have the AWS SAM CLI and Node.js 12.x installed. to To complete the deployment, follow the instructions in the GitHub repo.

EventBridge can route events to a broad range of target services in AWS. Targets that support active tracing for X-Ray can create comprehensive traces from the event source. The services offering active tracing are AWS Lambda, AWS Step Functions, and Amazon API Gateway. In each case, you can trace a request from the producer to the consumer of the event.

The GitHub repo contains examples showing how to use active tracing with EventBridge targets. The webhook application uses a query string parameter called target to determine which events are routed to these targets.

For X-Ray to detect each service in the webhook, tracing must be enabled on both the API Gateway stage and the Lambda function. In the AWS SAM template, the Tracing: Active property turns on active tracing for the Lambda function. If an IAM role is not specified, the AWS SAM CLI automatically adds the arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess policy to the Lambda function’s execution role. For the API definition, adding TracingEnabled: True enables tracing for this API stage.

When you invoke the webhook’s API endpoint, X-Ray generates a trace map of the request, showing each of the services from the REST API call to putting the event on the bus:

The CloudWatch Logs from the webhook’s Lambda function shows the event that has been put on the event bus:

Tracing with a Lambda target

In the targets-lambda example application, the Lambda function uses the X-Ray SDK and has active tracing enabled in the AWS SAM template:

Resources:
  ConsumerFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.handler
      MemorySize: 128
      Timeout: 3
      Runtime: nodejs12.x
      Tracing: Active

With these two changes, the target Lambda function propagates the tracing header from the original webhook request. When the webhook API is invoked, the X-Ray trace map shows the entire request through to the Lambda target. X-Ray shows two nodes for Lambda – one is the Lambda service and the other is the Lambda function invocation:

Tracing with an API Gateway target

Currently, active tracing is only supported by REST APIs but not HTTP APIs. You can enable X-Ray tracing from the AWS CLI or from the Stages menu in the API Gateway console, in the Logs/Tracing tab:

You cannot currently create an API Gateway target for EventBridge using AWS SAM. To invoke an API endpoint from the EventBridge console, create a rule and select the API as a target. The console automatically creates the necessary IAM permissions for EventBridge to invoke the endpoint.

If the API invokes downstream services with active tracing available, these services also appear as nodes in the X-Ray service graph. Using the webhook application to invoke the API Gateway target, the trace shows the entire request from the initial API call through to the second API target:

Tracing with a Step Functions target

To enable tracing for a Step Functions target, the state machine must have tracing enabled and have permissions to write to X-Ray. The AWS SAM template can enable tracing, define the EventBridge rule and the AWSXRayDaemonWriteAccess policy in one resource:

  WorkFlowStepFunctions:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: definition.asl.json
      DefinitionSubstitutions:
        LoggerFunctionArn: !GetAtt LoggerFunction.Arn
      Tracing:
        Enabled: True
      Events:
        UploadComplete:
          Type: EventBridgeRule
          Properties:
            Pattern:
              account: 
                - !Sub '${AWS::AccountId}'
              source:
                - !Ref EventSource
              detail:
                apiEvent:
                  target:
                    - 'sfn'

      Policies: 
        - AWSXRayDaemonWriteAccess
        - LambdaInvokePolicy:
            FunctionName: !Ref LoggerFunction

If the state machine uses services that support active tracing, these also appear in the trace map for individual requests. Using the webhook to invoke this target, X-Ray now shows the request trace to the state machine and the Lambda function it contains:

Adding X-Ray tracing to existing Lambda targets

To wrap the SDK client, you must enable active tracing and include the AWS X-Ray SDK in the Lambda function’s deployment package. Unlike the AWS SDK, the X-Ray SDK is not included in the Lambda execution environment.

Another option is to include the X-Ray SDK as a Lambda layer. You can build this layer by following the instructions in the GitHub repo. Once deployed, you can attach the X-Ray layer to any Lambda function either via the console or the CLI:

To learn more about using Lambda layers, read “Using Lambda layers to simplify your development process”.

Conclusion

X-Ray is a powerful tool for providing observability in serverless applications. With the launch of X-Ray trace context propagation in EventBridge, this allows you to trace requests across distributed applications more easily.

In this blog post, I walk through an example webhook application with three targets that support active tracing. In each case, I show how to enable tracing either via the console or using AWS SAM and show the resulting X-Ray trace map.

To learn more about how to use tracing with events, read the X-Ray Developer Guide or see the Amazon EventBridge documentation for this feature.

For more serverless learning resources, visit Serverless Land.

Discovering sensitive data in AWS CodeCommit with AWS Lambda

2021-01-04 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/discovering-sensitive-data-in-aws-codecommit-with-aws-lambda-2/

This post is courtesy of Markus Ziller, Solutions Architect.

Today, git is a de facto standard for version control in modern software engineering. The workflows enabled by git’s branching capabilities are a major reason for this. However, with git’s distributed nature, it can be difficult to reliably remove changes that have been committed from all copies of the repository. This is problematic when secrets such as API keys have been accidentally committed into version control. The longer it takes to identify and remove secrets from git, the more likely that the secret has been checked out by another user.

This post shows a solution that automatically identifies credentials pushed to AWS CodeCommit in near-real-time. I also show three remediation measures that you can use to reduce the impact of secrets pushed into CodeCommit:

Notify users about the leaked credentials.
Lock the repository for non-admins.
Hard reset the CodeCommit repository to a healthy state.

I use the AWS Cloud Development Kit (CDK). This is an open source software development framework to model and provision cloud application resources. Using the CDK can reduce the complexity and amount of code needed to automate the deployment of resources.

Overview of solution

The services in this solution are AWS Lambda, AWS CodeCommit, Amazon EventBridge, and Amazon SNS. These services are part of the AWS serverless platform. They help reduce undifferentiated work around managing servers, infrastructure, and the parts of the application that add less value to your customers. With serverless, the solution scales automatically, has built-in high availability, and you only pay for the resources you use.

This diagram outlines the workflow implemented in this blog:

After a developer pushes changes to CodeCommit, it emits an event to an event bus.
A rule defined on the event bus routes this event to a Lambda function.
The Lambda function uses the AWS SDK for JavaScript to get the changes introduced by commits pushed to the repository.
It analyzes the changes for secrets. If secrets are found, it publishes another event to the event bus.
Rules associated with this event type then trigger invocations of three Lambda functions A, B, and C with information about the problematic changes.
Each of the Lambda functions runs a remediation measure:
- Function A sends out a notification to an SNS topic that informs users about the situation (A1).
- Function B locks the repository by setting a tag with the AWS SDK (B2). It sends out a notification about this action (B2).
- Function C runs git commands that remove the problematic commit from the CodeCommit repository (C2). It also sends out a notification (C1).

Walkthrough

The following walkthrough explains the required components, their interactions and how the provisioning can be automated via CDK.

For this walkthrough, you need:

An AWS account
Installed and authenticated AWS CLI (authenticate with an IAM user or an AWS STS security token)
Installed Node.js, TypeScript
Installed git
AWS CDK installed (npm install aws-cdk -g)

Checkout and deploy the sample stack:

After completing the prerequisites, clone the associated GitHub repository by running the following command in a local directory:
```
git clone [email protected]:aws-samples/discover-sensitive-data-in-aws-codecommit-with-aws-lambda.git
```
Open the repository in a local editor and review the contents of cdk/lib/resources.ts, src/handlers/commits.ts, and src/handlers/remediations.ts.
Follow the instructions in the README.md to deploy the stack.

The CDK will deploy resources for the following services in your account.

Using CodeCommit to manage your git repositories

The CDK creates a new empty repository called TestRepository and adds a tag RepoState with an initial value of ok. You later use this tag in the LockRepo remediation strategy to restrict access.

It also creates two IAM groups with one user in each. Members of the CodeCommitSuperUsers group are always able to access the repository, while members of the CodeCommitUsers group can only access the repository when the value of the tag RepoState is not locked.

I also import the CodeCommitSystemUser into the CDK. Since the user requires git credentials in a downloaded CSV file, it cannot be created by the CDK. Instead it must be created as described in the README file.

The following CDK code sets up all the described resources:

const TAG_NAME = "RepoState";

const superUsers = new Group(this, "CodeCommitSuperUsers", { groupName: "CodeCommitSuperUsers" });
superUsers.addUser(new User(this, "CodeCommitSuperUserA", {
    password: new Secret(this, "CodeCommitSuperUserPassword").secretValue,
    userName: "CodeCommitSuperUserA"
}));

const users = new Group(this, "CodeCommitUsers", { groupName: "CodeCommitUsers" });
users.addUser(new User(this, "User", {
    password: new Secret(this, "CodeCommitUserPassword").secretValue,
    userName: "CodeCommitUserA"
}));

const systemUser = User.fromUserName(this, "CodeCommitSystemUser", props.codeCommitSystemUserName);

const repo = new Repository(this, "Repository", {
    repositoryName: "TestRepository",
    description: "The repository to test this project out",
});
Tags.of(repo).add(TAG_NAME, "ok");

users.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn],
    conditions: {
        StringNotEquals: {
            [`aws:ResourceTag/${TAG_NAME}`]: "locked"
        }
    }
}));

superUsers.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn]
}));

Using EventBridge to pass events between components

I use EventBridge, a serverless event bus, to connect the Lambda functions together. Many AWS services like CodeCommit are natively integrated into EventBridge and publish events about changes in their environment.

repo.onCommit is a higher-level CDK construct. It creates the required resources to invoke a Lambda function for every commit to a given repository. The created events rule looks like this:

Note that this event rule only matches commit events in TestRepository. To send commits of all repositories in that account to the inspecting Lambda function, remove the resources filter in the event pattern.

CodeCommit Repository State Change is a default event that is published by CodeCommit if changes are made to a repository. In addition, I define CodeCommit Security Event, a custom event, which Lambda publishes to the same event bus if secrets are discovered in the inspected code.

The sample below shows how you can set up Lambda functions as targets for both type of events.

const DETAIL_TYPE = "CodeCommit Security Event";
const eventBus = new EventBus(this, "CodeCommitEventBus", {
    eventBusName: "CodeCommitSecurityEvents"
});

repo.onCommit("AnyCommitEvent", {
    ruleName: "CallLambdaOnAnyCodeCommitEvent",
    target: new targets.LambdaFunction(commitInspectLambda)
});


new Rule(this, "CodeCommitSecurityEvent", {
    eventBus,
    enabled: true,
    ruleName: "CodeCommitSecurityEventRule",
    eventPattern: {
        detailType: [DETAIL_TYPE]
    },
    targets: [
        new targets.LambdaFunction(lockRepositoryLambda),
        new targets.LambdaFunction(raiseAlertLambda),
        new targets.LambdaFunction(forcefulRevertLambda)
    ]
});

Using Lambda functions to run remediation measures

AWS Lambda functions allow you to run code in response to events. The example defines four Lambda functions.

By comparing the delta to its predecessor, the commitInspectLambda function analyzes if secrets are introduced by a commit. With the CDK, you can create a Lambda function with:

const myLambdaInCDK = new Function(this, "UniqueIdentifierRequiredByCDK", {
    runtime: Runtime.NODEJS_12_X,
    handler: "<handlerfile>.<function name>",
    code: Code.fromAsset(path.join(__dirname, "..", "..", "src", "handlers")),
    // See git repository for complete code
});

The code for this Lambda function uses the AWS SDK for JavaScript to fetch the details of the commit, the differences introduced, and the new content.

The code checks each modified file line by line with a regular expression that matches typical secret formats. In src/handlers/regex.json, I provide a few regular expressions that match common secrets. You can extend this with your own patterns.

If a secret is discovered, a CodeCommit Security Event is published to the event bus. EventBridge then invokes all Lambda functions that are registered as targets with this event. This demo triggers three remediation measures.

The raiseAlertLambda function uses the AWS SDK for JavaScript to send out a notification to all subscribers (that is, CodeCommit administrators) on an SNS topic. It takes no further action.

SNS.publish({
    TopicArn: <TOPIC_ARN>,
    Subject: `[ACTION REQUIRED] Secrets discovered in <repo>`
    Message: `<Your message>
}

The lockRepositoryLambda function uses the AWS SDK for JavaScript to change the RepoState tag from ok to locked. This restricts access to members of the CodeCommitSuperUsers IAM group.

CodeCommit.tagResource({
    resourceArn: event.detail.repositoryArn,
    tags: {
        RepoState: "locked"
    }
})

In addition, the Lambda function uses SNS to send out a notification. The forcefulRevertLambda function runs the following git commands:

git clone <repository>
git checkout <branch>
git reset –hard <previousCommitId>
git push origin <branch> --force

These commands reset the repository to the last accepted commit, by forcefully removing the respective commit from the git history of your CodeCommit repo. I advise you to handle this with care and only activate it on a real project if you fully understand the consequences of rewriting git history.

The Node.js v12 runtime for Lambda does not have a git runtime installed by default. You can add one by using the git-lambda2 Lambda layer. This allows you to run git commands from within the Lambda function.

Finally, this Lambda function also sends out a notification. The complete code is available in the GitHub repo.

Using SNS to notify users

To notify users about secrets discovered and actions taken, you create an SNS topic and subscribe to it via email.

const topic = new Topic(this, "CodeCommitSecurityEventNotification", {
    displayName: "CodeCommitSecurityEventNotification",
});

topic.addSubscription(new subs.EmailSubscription(/* your email address */));

Testing the solution

You can test the deployed solution by running these two sets of commands. First, add a file with no credentials:

echo "Clean file - no credentials here" > clean_file.txt
git add clean_file.txt
git commit clean_file.txt -m "Adds clean_file.txt"
git push

Then add a file containing credentials:

SECRET_LIKE_STRING=$(cat /dev/urandom | env LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
echo "secret=$SECRET_LIKE_STRING" > problematic_file.txt
git add problematic_file.txt
git commit problematic_file.txt -m "Adds secret-like string to problematic_file.txt"
git push

This first command creates, commits and pushes an unproblematic file clean_file.txt that will pass the checks of commitInspectLambda. The second command creates, commits, and pushes problematic_file.txt, which matches the regular expressions and triggers the remediation measures.

If you check your email, you soon receive notifications about actions taken by the Lambda functions.

Cleaning up

To avoid incurring charges, delete the resources by running cdk destroy and confirming the deletion.

Conclusion

This post demonstrates how you can implement a solution to discover secrets in commits to AWS CodeCommit repositories. It also defines different strategies to remediate this.

The CDK code to set up all components is minimal and can be extended for remediation measures. The template is portable between Regions and uses serverless technologies to minimize cost and complexity.

For more serverless learning resources, visit Serverless Land.