Tag Archives: CloudWatch Logs

Building well-architected serverless applications: Understanding application health – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-understanding-application-health-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the nine serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the Introduction post for a table of contents and explaining the example application.

Question OPS1: How do you evaluate your serverless application’s health?

Evaluating your metrics, distributed tracing, and logging gives you insight into business and operational events, and helps you understand which services should be optimized to improve your customer’s experience. By understanding the health of your Serverless Application, you will know whether it is functioning as expected, and can proactively react to any signals that indicate it is becoming unhealthy.

Required practice: Understand, analyze, and alert on metrics provided out of the box

It is important to understand metrics for every AWS service used in your application so you can decide how to measure its behavior. AWS services provide a number of out-of-the-box standard metrics to help monitor the operational health of your application.

As these metrics are generated automatically, it is a simple way to start monitoring your application and can also be augmented with custom metrics.

The first stage is to identify which services the application uses. The airline booking component uses AWS Step Functions, AWS Lambda, Amazon SNS, and Amazon DynamoDB.

When I make a booking, as shown in the Introduction post, AWS services emit metrics to Amazon CloudWatch. These are processed asynchronously without impacting the application’s performance.

There are two default CloudWatch dashboards to visualize key metrics quickly: per service and cross service.

Per service

To view the per service metrics dashboard, I open the CloudWatch console.

per-service-metrics-dashboardI select a service where Overview is shown, such as Lambda. Now I can view the metrics for all Lambda functions in the account.

per-service-metrics-lambdaCross service

To see an overview of key metrics across all AWS services, open the CloudWatch console and choose View cross service dashboard.

cross-service-metrics-dashboardI see a list of all services with one or two key metrics displayed. This provides a good overview of all services your application uses.

Alerting

The next stage is to identify the key metrics for comparison and set up alerts for under- and over-performing services. Here are some recommended metrics to alarm on for a number of AWS services.

Alerts can be configured manually or via infrastructure as code tools such as the AWS Serverless Application Model, AWS CloudFormation, or third-party tools.

To configure a manual alert for Lambda function errors using CloudWatch Alarms:

  1. I open the CloudWatch console and select Alarms and select Create Alarm.
  2. I choose Select Metric and from AWS Namespaces, select Lambda, Across All Functions and select Errors and select Select metric.add-metric-to-alert
  3. I change the Statistic to Sum and the Period to 1 minute.metric-values
  4. Under Conditions, I select a Static threshold Greater than 1 and select Next.

Alarms can also be created using anomaly detection rather than static values if there is a discernible pattern or trend. Anomaly detection looks at past metric data and uses machine learning to create a model of expected values. Alerts can then be configured if they fall outside this band of “normal” values. I use a Static threshold for this alarm.

  1. For the notification, I set the trigger to alarm to an existing SNS topic with my email address, then choose Next.metric-notification
  2. I enter a descriptive alarm name such as serverlessairline-lambda-prod-errors > 1, select Next, and choose Create alarm.

I have now manually set up an alarm.

Use CloudWatch composite alarms to combine multiple alarms to reduce noise and focus on critical issues. For example, a single alarm could trigger if there are both Lambda function errors as well as high Lambda concurrent executions.

It is simpler and more scalable to include alerting within infrastructure as code. Here is an example of alerting programmatically using CloudFormation.

I view the out of the box standard metrics and in this example, manually create an alarm for Lambda function errors.

Improvement plan summary:

  1. Understand what metrics and dimensions each managed service used provides.
  2. Configure alerts on relevant metrics for when services are unhealthy.

Good practice: Use structured and centralized logging

Central logging provides a single place to search and analyze logs. Structured logging means selecting a consistent log format and content structure to simplify querying across multiple components.

To identify a business transaction across components, such as a particular flight booking, log operational information from upstream and downstream services. Add information such as customer_id along with business outcomes such as order=accepted or order=confirmed. Make sure you are not logging any sensitive or personal identifying data in any logs.

Use JSON as your logging output format. Log multiple fields in a single object or dictionary rather than many one line messages for simpler searching.

Here is an example of a structured logging format.

The airline booking component, which is written in Python, currently uses a shared library with a separate log processing stack.

Embedded Metrics Format is a simpler mechanism to replace the shared library and use structured logging. CloudWatch Embedded Metrics adds environmental metadata such as Lambda Function version and also automatically extracts custom metrics so you can visualize and alarm on them. There are open-source client libraries available for Node.js and Python.

I then add embedded metrics to the individual confirm booking module with the following steps:

  1. I install the aws-embedded-metrics library using the instructions.
  2. In the function init code, I import the module and create a metric_scope with the following code

from aws_embedded_metrics import metric_scope
@metric_scope

  1. In the function handler, I log the generated bookingReference with the following code.

metrics.set_property("BookingReference", ret["bookingReference"])

In this example I also log the entire incoming event details.

metrics.set_property("event", event)

It is best practice to only log what is required to avoid unnecessary costs. Ensure the event does not have any sensitive or personal identifying data which is available to anyone who has access to the logs.

To avoid the duplicate logging in this example airline application which adds cost, I remove the existing shared library logger.*() lines.

When I make a booking, the CloudWatch log message is in structured JSON format. It contains the properties I set event, BookingReference, as well as function metadata.

I can then search for all log activity related to a specific booking across multiple functions with booking_id. I can track customer activity across multiple bookings using customer_id.

Logging is often created as a shared library resource which all functions reference. Another option is using Lambda Layers, which lets functions import additional code such as external libraries. Multiple functions can share this code.

Improvement plan summary:

  1. Log request identifiers from downstream services, component name, component runtime information, unique correlation identifiers, and information that helps identify a business transaction.
  2. Use JSON as the logging output. Prefer logging entire objects/dictionaries rather than many one line messages. Mask or remove sensitive data when logging.
  3. Minimize logging debugging information to a minimum as they can incur both costs and increase noise to signal ratio

Conclusion

Evaluating serverless application health helps understand which services should be optimized to improve your customer’s experience. I cover out of the box metrics and alerts, as well as structured and centralized logging.

This well-architected question will be continued in an upcoming post where I look at custom metrics and distributed tracing.

Analyzing API Gateway custom access logs for custom domain names

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/analyzing-api-gateway-custom-access-logs-for-custom-domain-names/

This post is courtesy of Taka Matsumoto, Cloud Support Engineer, AWS

If you are using custom domain names in Amazon API Gateway, it can be useful to gain insights into requests sent to each custom domain name. Although API Gateway provides CloudWatch metrics and options to deliver request logs to Amazon CloudWatch Logs, there is no pre-defined metric or log specific to custom domain names. If there is more than one custom domain name mapped to a single API, understanding the quantity and type of requests by domain name may help understand request patterns.

Using the custom access logging option in API Gateway enables delivery of custom logs to CloudWatch Logs, which can be analyzed using CloudWatch Logs Insights. This blog post walks through the steps to create a CloudWatch log group for API custom access logging, and uses CloudWatch Logs Insights for analysis.

Overview

In the tutorial, you create a CloudWatch log group for custom access logging. You then enable custom access logging for an API stage associated with a custom domain name. The IAM role used in this tutorial must be able to create and update the relevant resources in CloudWatch, IAM, and API Gateway. For this tutorial, use the US East (N. Virginia) Region. In the next steps, the tutorial covers:

  1. Creating a CloudWatch Log group.
  2. Creating an IAM role for access logging.
  3. Enabling custom access logging.
  4. Testing an API using a custom domain name.
  5. Analyzing Logs in CloudWatch Logs Insights.

Create a CloudWatch Log group

Before enabling custom access logging for your API’s stage, create a CloudWatch log group to deliver custom logs. Create a log group called APIGateway_CustomDomainLogs by following these steps:

  1. Go to the CloudWatch Logs console.
  2. Under Actions, click on Create log group and name the log group APIGateway_CustomDomainLogs. Learn more about creating a log group in Working with Log Groups and Log Streams.

Create log group

Create an IAM role for access logging

You must use an IAM role to deliver logs from API Gateway to CloudWatch Logs. If there is no IAM role already available for logging in API Gateway, create a new IAM role:

  1. Navigate to the IAM console.
  2. Under Roles, choose Create role.
  3. Select API Gateway for the service and choose Next: Permissions.

    IAM selection

  4. Leave the attached IAM policy (AmazonAPIGatewayPushToCloudWatchLogs), and choose Next: Tags.
  5. No tags are required for this tutorial. Leave these blank and choose Next: Review.
  6. Name the role APIGatewayCloudWatchLogsRole and choose Create role.
    Create role

3. Enable custom access logging

Now you enable custom access logging. Select one of the API stages that you invoke through a custom domain name:

  1. If there is no CloudWatch log role set for API Gateway, go to the API Gateway Settings page to add the CloudWatch log role ARN. The IAM role ARN follows this format: arn:aws:iam::123456789012:role/APIGatewayCloudWatchLogsRole.
  2. For your API with a custom domain name, go to Stages page and select the Logs/Tracing tab.
  3. Enter the following fields:
    – For CloudWatch Group, add the ARN (for example, arn:aws:logs:us-east-1:123456789012:log-group:APIGateway_CustomDomainLogs).
    – For Log Format, enter:

    {
        "RequestId": "$context.requestId",
        "DomainName": "$context.domainName",
        "APIId": "$context.apiId",
        "RequestPath": "$context.path",
        "RequestTime": "$context.requestTime",
        "SourceIp": "$context.identity.sourceIp",
        "ResourcePath": "$context.resourcePath",
        "Stage": "$context.stage"
    }

    Custom access logging

  4.  Choose Save changes.

Learn more about custom access logging setup in Set up API Logging Using the API Gateway Console. In the Log Format configuration, $context variables retrieve a domain name as well as other API request information. Learn more in $context Variables for Data Models, Authorizers, Mapping Templates, and CloudWatch Access Logging.

4. Test invoke an API using a custom domain name

Once you enable custom access logging, invoke the API using the custom domain name. The logs appear in the specified CloudWatch log group shortly after. A sample response in the CloudWatch log stream looks like the following:

{
    "RequestId": "1b1ebe20-817f-11e9-a796-f5e0ffdcdac7",
    "DomainName": "test.example.com”,
    "APIId": "12345abcde",
    "RequestPath": "/dev",
    "RequestTime": "28/May/2019:19:30:52 +0000",
    "SourceIp": "1.2.3.4",
    "ResourcePath": "/",
    "Stage": "dev"
}

5. Analyze logs in CloudWatch Logs Insights

After setting up the custom access logs, you can query against them to find more insights using the custom domain name.

  1. Go to CloudWatch Logs Insights console.
  2. In the log group text field, select the CloudWatch log group, APIGateway_CustomDomainLogs.
  3. Enter the following query.
    fields @timestamp, @message
    | filter DomainName like /(?i)(test.example.com)/

    This query returns a list of log entries for the custom domain called test.example.com. To run in your account, replace this value with your custom domain name.
    Custom filter

If your network security does not allow the use of web sockets, you cannot access the CloudWatch Logs Insights console. Instead, use the CloudWatch Logs Insights query capabilities using the API. Here are some example queries use the AWS CLI:

1. A sample command for aws logs start-query:

aws logs start-query --log-group-name APIGateway_CustomDomainLogs --start-time 1557085225000 --end-time 1559763625000 --query-string 'fields @timestamp, @message | filter DomainName like /(?i)(test.example.com)/'

The response looks like:

{
    "queryId": "a1234567-bfde-47c7-9d44-41ebed011c66"
}

Learn more about start-query command in aws logs start-query.

2. Run aws logs get-query-results to retrieve the result of the query. A sample command for aws logs get-query-results:

aws logs get-query-results --query-id a1234567-bfde-47c7-9d44-41ebed011c66

The response looks like:

{
    "results": [
        [
            {
                "field": "@timestamp",
                "value": "2019-05-28 19:30:52.494"
            },
            {
                "field": "@message",
                "value": "{\"RequestId\": \"12345678-7cb3-11e9-8896-c30af5588427\",\"DomainName\":\"test.exmaple.com\"}"
            },
            {
                "field": "@ptr",
                "value": "CmEKKAokOTYxNTQyNjM4MjQzOkFQSUdhdGV3YXlfQ3VzdG9tRG9tYWluEAISNRoYAgXM/KQtAAAAAA0L2foABc5YFyAAAAHSIAEoxviFhK4tMM+HhoSuLTgCQOoBSN4OUN4KEAAYAQ=="
            }
        ]
    ],
    "statistics": {
        "recordsMatched": 1.0,
        "recordsScanned": 1.0,
        "bytesScanned": 153.0
    },
    "status": "Complete"
}

You can use other queries to filter the results based on other attributes in the logs. Learn more about get-query-results command in aws logs get-query-results.

Conclusion

In this blog post, I show how to deliver custom access logs from API Gateway to CloudWatch Logs. I also show how to use CloudWatch Logs Insights to run a query against the logs for custom domain name metrics, which help provide insights into custom domain name usage.

To learn more about the query syntax, visit CloudWatch Logs Insights Query Syntax.

Integrating Amazon EventBridge into your serverless applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/integrating-amazon-eventbridge-into-your-serverless-applications/

Event-driven architecture enables developers to create decoupled services across applications. When combined with the range of managed services available in AWS, this approach can make applications highly scalable and flexible, with minimal maintenance.

Many services in the AWS Cloud produce events, including integrated software as a service (SaaS) applications. Your custom applications can also produce and consume events. With so many events from different sources, you need a way to coordinate this traffic. Amazon EventBridge is a serverless event bus that helps manage how all these events are routed throughout your applications.

The routing logic is managed by rules that evaluate the events against event expressions. EventBridge delivers matching events to targets such as AWS Lambda, so you can process events with your custom business logic.

EventBridge architecture

In this blog post, I show how you can build an event producer and consumer in AWS Lambda, and create a rule to route events. The code uses the AWS Serverless Application Model (SAM), so you can deploy the application in your own AWS Account. This walkthrough uses AWS resources that are covered by the AWS Free Tier.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file.

How the example application works

In this example, a banking application for automated teller machine (ATM) produces events about transactions. It sends the events to EventBridge, which then uses rules defined by the application to route accordingly. There are three downstream services consuming a subset of these events.

 

Sample ATM application architecture

In the repo, the atmProducer subdirectory contains handler.js, which represents the ATM service producing events. This code is a Lambda handler written in Node.js, and publishes events to EventBridge via the AWS SDK using this line of code:

const result = await eventbridge.putEvents(params).promise()

This directory also contains events.js, listing several test transactions in an Entries array. A single event is defined as follows:

    {
      // Event envelope fields
      Source: 'custom.myATMapp',
      EventBusName: 'default',
      DetailType: 'transaction',
      Time: new Date(),

      // Main event body
      Detail: JSON.stringify({
        action: 'withdrawal',
        location: 'MA-BOS-01',
        amount: 300,
        result: 'approved',
        transactionId: '123456',
        cardPresent: true,
        partnerBank: 'Example Bank',
        remainingFunds: 722.34
      })
    }

The Detail section of the event specifies transaction attributes. These include the location of the ATM, the amount, the partner bank, and the result of the transaction.

The handler.js file in the atmConsumer subdirectory contains three functions:

exports.case1Handler = async (event) => {
  console.log('--- Approved transactions ---')
  console.log(JSON.stringify(event, null, 2))
}

exports.case2Handler = async (event) => {
  console.log('--- NY location transactions ---')
  console.log(JSON.stringify(event, null, 2))
}

exports.case3Handler = async (event) => {
  console.log('--- Unapproved transactions ---')
  console.log(JSON.stringify(event, null, 2))
}

Each function receives transaction events, which are logged via the console.log statements to Amazon CloudWatch Logs. The consumer functions operate independently of the producer and are unaware of the source of the events.

The routing logic is contained in the EventBridge rules that are deployed by the application’s SAM template. The rules evaluate the incoming stream of events, and route matching events to the target Lambda functions.

Running the ATM application

After deploying the sample application, you can generate test events by invoking the atmProducer Lambda function:

  1. Open the Lambda console in the same Region where you deployed the SAM application.

    AWS Lambda console

  2. There are four Lambda functions with the prefix atm-demo. Choose the atmProducerFn function, then choose Test.

    Testing the Lambda function

  3. For Event name, enter Test, then choose Create. Choose Test once more to invoke the function.

    Invoking the Lambda function

This puts the sample events onto the EventBridge default event bus. Next, inspect the logs from the three consumer functions to see which events route to each function:

  1. Navigate to the CloudWatch console in the same Region. Select Logs, then Log groups from the menu.

    CloudWatch console

  2. Select the log group containing atmConsumerCase1. You see two streams representing the two transactions approved by the ATM. Choose a log stream to view the output.

    Log stream output

  3. Navigate back to the list of log groups, then select the log group containing atmConsumerCase2. You see streams for the two transactions matching the “New York” location filter.

    Transaction matching "New York" filter

  4. Navigate back once more to the list of log groups and select the log group containing atmConsumerCase3. Open the stream to see the denied transaction.

    Denied transaction in logs

How EventBridge rules work

From the AWS Management Console, navigate to EventBridge from the Services dropdown. Choose Rules from the menu to see the rules created by the application deployment.

Amazon EventBridge rules

Choose one of the rules to see the configuration. Each rule is associated with a single event bus (the default bus for this application), which means it evaluates every event published to the bus. Scroll down to view the Event pattern used by the rule:

EventBridge event patterns

The event pattern is a JSON object with the same structure as the events they match. Each matching value must be wrapped in an array, and you can provide multiple values if necessary. If you use multiple values, this is compared using ‘or’ logic – only ones of the values needs to match the incoming event.

You can also use content-based filtering in event patterns to create more complex rules to match dynamically. The prefix operator above matches any event where the detail.location value begins with “NY-“.

Integrating EventBridge into SAM templates

You can build and test rules manually in the EventBridge console, which can help in the development process as you refine event patterns. However, once you are ready to deploy your application, it’s easier to use a framework like SAM to launch all your serverless resources consistently.

In the example application, open the template.yaml file to view the SAM template, which defines the four Lambda functions. This shows two different ways to integrate the Lambda functions with EventBridge. The first approach uses the Events property to configure the EventBridge rule:

  atmConsumerCase3Fn:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: atmConsumer/
      Handler: handler.case3Handler
      Runtime: nodejs12.x
      Events:
        Trigger:
          Type: CloudWatchEvent 
          Properties:
            Pattern:
              source:
                - custom.myATMapp
              detail-type:
                - transaction                
              detail:
                result:
                  - "anything-but": "approved"

The syntax defines an event that invokes the Lambda function. In the YAML, you only need to define the pattern, and SAM automatically creates an IAM role with the required permissions. The pattern is the YAML equivalent of the Event Pattern shown in the console earlier.

This example automatically creates the rule on the default event bus, which exists in every AWS account. To associate the rule with a custom event bus, you can add the EventBusName to the template. If this property is missing, SAM uses the default bus.

In the second approach to defining an EventBridge configuration in SAM, you can separate the resources more clearly in the template. First, you define the Lambda function:

  atmConsumerCase1Fn:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: atmConsumer/
      Handler: handler.case1Handler
      Runtime: nodejs12.x

Next, the rule is defined using an AWS::Events::Rule resource. The properties define the event pattern as before, but can also specify targets. Whereas the first method can only create a single, implied target (the parent Lambda function), you can explicitly define multiple targets using this syntax:

  EventRuleCase1: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "Approved transactions"
      EventPattern: 
        source: 
          - "custom.myATMapp"
        detail-type:
          - transaction   
        detail: 
          result: 
            - "approved"
      State: "ENABLED"
      Targets: 
        - 
          Arn: 
            Fn::GetAtt: 
              - "atmConsumerCase1Fn"
              - "Arn"
          Id: "atmConsumerTarget1"

Finally, there is an AWS::Lambda::Permission resource that grants permission to EventBridge to invoke the target:

  PermissionForEventsToInvokeLambda: 
    Type: AWS::Lambda::Permission
    Properties: 
      FunctionName: 
        Ref: "atmConsumerCase1Fn"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: 
        Fn::GetAtt: 
          - "EventRuleCase1"
          - "Arn"

For simple integrations where one Lambda function is invoked by one rule, the first approach is recommended. If you have complex routing logic, or you are connecting to resources outside of your SAM template, the second method is the better choice.

Conclusion

This walkthrough shows how to build a simple serverless application that produces and consumes events, using the EventBridge event bus to managing the routing. Using event patterns in rules, you can centralize the routing logic at EventBridge, helping reduce code in your downstream consuming services.

SAM templates make it simple to create EventBridge rules and define Lambda functions as targets. I show two ways to use SAM statements to define IAM permissions implicitly or explicitly. This allows you to decouple the services within your serverless applications, and take advantage of the routing offered by EventBridge with minimal configuration in your SAM templates.

To learn more, visit the Amazon EventBridge documentation.

 

ICYMI: Serverless pre:Invent 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-preinvent-2019/

With Contributions from Chris Munns – Sr Manager – Developer Advocacy – AWS Serverless

The last two weeks have been a frenzy of AWS service and feature launches, building up to AWS re:Invent 2019. As there has been a lot announced we thought we’d ship an ICYMI post summarizing the serverless service specific features that have been announced. We’ve also dropped in some service announcements from services that are commonly used in serverless application architectures or development.

AWS re:Invent

AWS re:Invent 2019

We also want you to know that we’ll be talking about many of these features (as well as those coming) in sessions at re:Invent.

Here’s what’s new!

AWS Lambda

On September 3, AWS Lambda started rolling out a major improvement to how AWS Lambda functions work with your Amazon VPC networks. This change brings both scale and performance improvements, and addresses several of the limitations of the previous networking model with VPCs.

On November 25, Lambda announced that the rollout of this new capability has completed in 6 additional regions including US East (Virginia) and US West (Oregon).

New VPC to VPC NAT for Lambda functions

New VPC to VPC NAT for Lambda functions

On November 18, Lambda announced three new runtime updates. Lambda now supports Node.js 12, Java 11, and Python 3.8. Each of these new runtimes has new language features and benefits so be sure to check out the linked release posts. These new runtimes are all based on an Amazon Linux 2 execution environment.

Lambda has released a number of controls for both stream and async based invocations:

  • For Lambda functions consuming events from Amazon Kinesis Data Streams or Amazon DynamoDB Streams, it’s now possible to limit the retry count, limit the age of records being retried, configure a failure destination, or split a batch to isolate a problem record. These capabilities will help you deal with potential “poison pill” records that would previously cause streams to pause in processing.
  • For asynchronous Lambda invocations, you can now set the maximum event age and retry attempts on the event. If either configured condition is met, the event can be routed to a dead letter queue (DLQ), Lambda destination, or it can be discarded.

In addition to the above controls, Lambda Destinations is a new feature that allows developers to designate an asynchronous target for Lambda function invocation results. You can set one destination for a success, and another for a failure. This unlocks really useful patterns for distributed event-based applications and can reduce code to send function results to a destination manually.

Lambda Destinations

Lambda Destinations

Lambda also now supports setting a Parallelization Factor, which allows you to set multiple Lambda invocations per shard for Amazon Kinesis Data Streams and Amazon DynamoDB Streams. This allows for faster processing without the need to increase your shard count, while still guaranteeing the order of records processed.

Lambda Parallelization Factor diagram

Lambda Parallelization Factor diagram

Lambda now supports Amazon SQS FIFO queues as an event source. FIFO queues guarantee the order of record processing compared to standard queues which are unordered. FIFO queues support messaging batching via a MessageGroupID attribute which allows for parallel Lambda consumers of a single FIFO queue. This allows for high throughput of record processing by Lambda.

Lambda now supports Environment Variables in AWS China (Beijing) Region, operated by Sinnet and the AWS China (Ningxia) Region, operated by NWCD.

Lastly, you can now view percentile statistics for the duration metric of your Lambda functions. Percentile statistics tell you the relative standing of a value in a dataset, and are useful when applied to metrics that exhibit large variances. They can help you understand the distribution of a metric, spot outliers, and find hard-to-spot situations that create a poor customer experience for a subset of your users.

AWS SAM CLI

AWS SAM CLI deploy command

AWS SAM CLI deploy command

The SAM CLI team simplified the bucket management and deployment process in the SAM CLI. You no longer need to manage a bucket for deployment artifacts – SAM CLI handles this for you. The deployment process has also been streamlined from multiple flagged commands to a single command, sam deploy.

AWS Step Functions

One of the powerful features of Step Functions is its ability to integrate directly with AWS services without you needing to write complicated application code. Step Functions has expanded its integration with Amazon SageMaker to simplify machine learning workflows, and added a new integration with Amazon EMR, making it faster to build and easier to monitor EMR big data processing workflows.

Step Functions step with EMR

Step Functions step with EMR

Step Functions now provides the ability to track state transition usage by integrating with AWS Budgets, allowing you to monitor and react to usage and spending trends on your AWS accounts.

You can now view CloudWatch Metrics for Step Functions at a one-minute frequency. This makes it easier to set up detailed monitoring for your workflows. You can use one-minute metrics to set up CloudWatch Alarms based on your Step Functions API usage, Lambda functions, service integrations, and execution details.

AWS Step Functions now supports higher throughput workflows, making it easier to coordinate applications with high event rates.

In US East (N. Virginia), US West (Oregon), and EU (Ireland), throughput has increased from 1,000 state transitions per second to 1,500 state transitions per second with bucket capacity of 5,000 state transitions. The default start rate for state machine executions has also increased from 200 per second to 300 per second, with bucket capacity of up to 1,300 starts in these regions.

In all other regions, throughput has increased from 400 state transitions per second to 500 state transitions per second with bucket capacity of 800 state transitions. The default start rate for AWS Step Functions state machine executions has also increased from 25 per second to 150 per second, with bucket capacity of up to 800 state machine executions.

Amazon SNS

Amazon SNS now supports the use of dead letter queues (DLQ) to help capture unhandled events. By enabling a DLQ, you can catch events that are not processed and re-submit them or analyze to locate processing issues.

Amazon CloudWatch

CloudWatch announced Amazon CloudWatch ServiceLens to provide a “single pane of glass” to observe health, performance, and availability of your application.

CloudWatch ServiceLens

CloudWatch ServiceLens

CloudWatch also announced a preview of a capability called Synthetics. CloudWatch Synthetics allows you to test your application endpoints and URLs using configurable scripts that mimic what a real customer would do. This enables the outside-in view of your customers’ experiences, and your service’s availability from their point of view.

On November 18, CloudWatch launched Embedded Metric Format to help you ingest complex high-cardinality application data in the form of logs and easily generate actionable metrics from them. You can publish these metrics from your Lambda function by using the PutLogEvents API or for Node.js or Python based applications using an open source library.

Lastly, CloudWatch announced a preview of Contributor Insights, a capability to identify who or what is impacting your system or application performance by identifying outliers or patterns in log data.

AWS X-Ray

X-Ray announced trace maps, which enable you to map the end to end path of a single request. Identifiers will show issues and how they affect other services in the request’s path. These can help you to identify and isolate service points that are causing degradation or failures.

X-Ray also announced support for Amazon CloudWatch Synthetics, currently in preview. X-Ray supports tracing canary scripts throughout the application providing metrics on performance or application issues.

X-Ray Service map with CloudWatch Synthetics

X-Ray Service map with CloudWatch Synthetics

Amazon DynamoDB

DynamoDB announced support for customer managed customer master keys (CMKs) to encrypt data in DynamoDB. This allows customers to bring your own key (BYOK) giving you full control over how you encrypt and manage the security of your DynamoDB data.

It is now possible to add global replicas to existing DynamoDB tables to provide enhanced availability across the globe.

Currently under preview, is another new DynamoDB capability to identify frequently accessed keys and database traffic trends. With this you can now more easily identify “hot keys” and understand usage of your DynamoDB tables.

CloudWatch Contributor Insights for DynamoDB

CloudWatch Contributor Insights for DynamoDB

Last but far from least for DynamoDB, is adaptive capacity, a feature which helps you handle imbalanced workloads by isolating frequently accessed items automatically and shifting data across partitions to rebalance them. This helps both reduce cost by enabling you to provision throughput for a more balanced out workload vs. over provisioning for uneven data access patterns.

AWS Serverless Application Repository

The AWS Serverless Application Repository (SAR) now offers Verified Author badges. These badges enable consumers to quickly and reliably know who you are. The badge will appear next to your name in the SAR and will deep-link to your GitHub profile.

SAR Verified developer badges

SAR Verified developer badges

AWS Code Services

AWS CodeCommit launched the ability for you to enforce rule workflows for pull requests, making it easier to ensure that code has pass through specific rule requirements. You can now create an approval rule specifically for a pull request, or create approval rule templates to be applied to all future pull requests in a repository.

AWS CodeBuild added beta support for test reporting. With test reporting, you can now view the detailed results, trends, and history for tests executed on CodeBuild for any framework that supports the JUnit XML or Cucumber JSON test format.

CodeBuild test trends

CodeBuild test trends

AWS Amplify and AWS AppSync

Instead of trying to summarize all the awesome things that our peers over in the Amplify and AppSync teams have done recently we’ll instead link you to their own recent summary: “A round up of the recent pre-re:Invent 2019 AWS Amplify Launches”.

AWS AppSync

AWS AppSync

Still looking for more?

We only covered a small bit of all the awesome new things that were recently announced. Keep your eyes peeled for more exciting announcements next week during re:Invent and for a future ICYMI Serverless Q4 roundup. We’ll also be kicking off a fresh series of Tech Talks in 2020 with new content helping to dive deeper on everything new coming out of AWS for serverless application developers.

New AWS Lambda controls for stream processing and asynchronous invocations

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/new-aws-lambda-controls-for-stream-processing-and-asynchronous-invocations/

Today AWS Lambda is introducing new controls for asynchronous and stream processing invocations. These new features allow you to customize responses to Lambda function errors and build more resilient event-driven and stream-processing applications.

Stream processing function invocations

When processing data from event sources such as Amazon Kinesis Data Streams, and Amazon DynamoDB Streams, Lambda reads records in batches via shards. A shard is a uniquely identified sequence of data records. Your function is then invoked to process records from the batch “in order.” If an error is returned, Lambda retries the batch until processing succeeds or the data expires. This retry behavior is desirable in many cases, but not all:

  1. Until the issue is resolved, no data in the shard is processed. A single malformed record can prevent processing on an entire shard since “in order” guarantee ensures that failed batches are retried until the data record expires. This is often referred to as a “poison pill” event in that it prevents the rest of the system from processing data.
  2. In some cases, it may not be helpful to retry if subsequent invocations will also fail.
  3. If the function is not idempotent, retrying might produce unexpected side effects, and may result in duplicate outputs (for example, multiple database entries or business transactions).
Stream processing function invocationsFigure 1: Default stream invocation retry logs.

The new Lambda controls for stream processing invocations

With the new customizable controls, users are able to control how function errors and retries impact stream processing function invocations. A new event source-mapping configuration subresource (shown below), allows developers to clearly specify which controls will apply specifically to invocations.

DestinationConfig: {
    OnFailure: {
       Destination: SNS/SQS arn (String)
    }
}
Figure 2: Stream processing destination configuration.
{
    "MaximumRetryAttempts": integer,
    "BisectBatchOnFunctionError" : boolean,
    "MaximumRecordAgeInSeconds" : integer
}
Figure 3: Stream processing failure configuration.

MaximumRetryAttempts

Set a maximum number of retry attempts for batches before they can be skipped to unblock processing and prevent duplicate outputs.

Minimum: 0 | maximum: 10,000 | default: 10,000

MaximumRecordAgeInSeconds

Define a record maximum age in seconds, with expired records skipped to allow processing to continue. Data records that do not get successfully processed within the defined age of submission are skipped

Minimum: 60 | default/maximum: 604,800

BisectBatchOnFunctionError

This gives the option to recursively split the failed batch and retry on a smaller subset of records, eventually isolating the metadata causing the error.

Default: false

On-failure Destination

When either MaximumRetryAttempts or MaximumRecordAgeInSeconds reaches the specified value, a record will be skipped. If ‘Destination’ is set, metadata about the skipped records can be sent to a target ARN (i.e. Amazon SQS or Amazon SNS). If no target is configured, then the record will be dropped.

Getting started with error handling for stream processing invocations

Here you will see how to use the new controls to create a customized error handling configuration for stream processing invocations. You will set the maximum retry attempts to 1, the maximum record age to 60s, and send the metadata of skipped record to an Amazon Simple Queue Service (SQS) queue. First create a Kinesis Stream to put records into, and create an SQS queue to hold the metadata of retry exhausted or expired data records.

  1. Go to the Lambda console and choose Create function.
  2. Ensure that Author from scratch is selected, and enter a function a name such as “customStreamErrorExample.”
  3. Select Runtime as Node.js.12.x.
  4. Choose Create function.
    To enable Kinesis as an event trigger and to send metadata of a failed invocation to the SQS standard queue, you must give the Lambda execution role the required permissions.
  5. Scroll down to the Execution Role section and select the view link below the Existing role drop-down.
    Existing role selection
  6. Choose Add inline policy > JSON, then paste the following into the text box replacing {yourSQSarn} with the ARN of the SQS queue you created earlier and replacing {yourKinesisarn} with the ARN of the stream you created earlier. Choose Review policy.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "sqs:SendMessage",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                    "kinesis:DescribeStream"
                ],
                "Resource": [
                    "{yourSQSarn}",
                    "{yourKinesisarn}"
                ]
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "kinesis:ListStreams",
                "Resource": "*"
            }
        ]
    }
    
  7. Give the policy a name such as CustomSQSKinesis and choose Create policy.
  8. Navigate back to your Lambda function and choose Add trigger.
  9. Select Kinesis from the drop-down. Select your stream in the Kinesis stream dropdown.
  10. To see the new control options, choose the drop-down arrow on the Additional settings section.
  11. Paste the ARN of the previously created SQS queue (refer to this link to find the ARN).Additional settings
  12. Set the Maximum retry attempts to 1. Set the Maximum record age to 60. Choose Add.
    The Kinesis trigger has now been configured and the Lambda function has the required permissions to write to SQS and CloudWatch Logs.Success messageKinesis trigger added
  13. Choose the Lambda icon in the Designer section. Scroll down to the Function code section and paste the following code:
    exports.handler = async (event) => {
        // TODO implement
        console.log(event)
        const response = {
            statusCode: 200,
            body: event,dfh
        };
        return response;
    };

    This code has a syntax error in order to force a failure response.

    Next, you will trigger the Lambda function by adding a record to the Kinesis Stream via the AWS CLI. See how to install the AWS CLI if you have not previously done so.

  14. In your preferred terminal run the following command, replacing {YourStreamName} with the name of the stream you have created.
    aws kinesis put-record --stream-name {YourStreamName} --partition-key 123 --data testdata

    You should see a response similar to this (your sequence number will be different):
    Terminal response

  15. In the AWS Lambda console, choose Monitoring > View logs in CloudWatch and select the most recent log group.CloudWatch Logs
  16. As you can see the Lambda function failed as expected, but this time with only a single retry attempt. This is because the max retry attempt value was set to 1 and the record is younger than the max record age that was set to 60.
  17. In the AWS Management Console navigate to your SQS queue, Services > SQS.
  18. Select your queue and choose Queue Actions > View/Delete Messages > Start Polling for Messages.View/delete messages

Here you will see the metadata of your failed invocation. It has successfully been sent to the SQS destination for further investigation or processing.

Asynchronous function invocations

When a function is asynchronously invoked, Lambda sends the event to a queue before it is processed by your function. Invocations that result in an exception in the function code are retried twice with a delay of one minute before the first retry and two minutes before the second. Some invocations may never run due to a throttle or service fault: these are retried with exponential backoff until the invocation is six hours old.

Default asynchronous invocation retry logs

This retry behaviour shown above, works well in situations where every invocation must run at least once. However, in situations with a large volume of errors subsequent invocations are increasingly delayed as the backlog increases, as each new error places another retry event back into the event queue. If it were possible to skip the event without retrying, it would eliminate this delay and continue processing new events.

In order to avoid this default auto-retry policy, there are several widely used approaches:

  • Function error handling with AWS Step Functions
  • Using the event handler for error routing logic.
  • Third party monitoring services for exception alerts

These workarounds require extra resources, code, or custom logic and add latency to your applications. Retries caused by system failures due to timeouts, lack of memory, early exits, etc. could still occur.

New Lambda controls for asynchronous event invocations

New asynchronous event invoke controls mean that functions can now skip the processing of certain events and discard unwanted or backlogged requests from the asynchronous event queue without the need for “workarounds.” The controls will be accessible from the console and from a new Event Invoke Config subresource, listed in detail below:

{
    "MaximumEventAgeInSeconds": integer,
    "MaximumRetryAttempts": integer
}
Figure 4: Asynchronous event invoke configuration.

MaximumRetryAttempts

Set a maximum number of retry attempts for events before they can be skipped to unblock processing and prevent duplicate outputs.

Minimum: 0 | maximum: 2 | default: 2

MaximumEventAgeInSeconds

Define an event maximum age in seconds, with expired events skipped to allow processing to continue. Events that do not get successfully processed within defined age of submission are written to the function’s configured Dead Letter Queue and/or On-failure Destinations for asynchronous invocations. If none is configured, the event is discarded.

Minimum: 60 | maximum: 21600 (6 hours)

These controls can also be configured from the AWS Lambda console, in the “Failure handling configuration” section shown below:

Edit failure handling configurations

When the same Lambda function is run with maximum retry attempts set to 0, the Amazon CloudWatch Logs show that the function is only invoked a single time, and not retried after the initial error.

CloudWatch Logs

Conclusion

The addition of these new Lambda controls allows you to handle function failures and retries appropriately for both asynchronous and stream processing. This eliminates the need for custom-built logic, alerts, and added overhead.

Developers understand the needs of their own applications. These new features allow them to decide how to react to errors on a per-function basis. Whether that means real-time stream processing, non-blocking event queues, or more retries, it depends on the application’s requirements. With the new controls in place, the power is in your hands.

You can get started with the new controls for stream processing and asynchronous invocations via the AWS Management Console, AWS CLI, AWS SAM, AWS CloudFormation, or AWS SDK for Lambda.

Investigating spikes in AWS Lambda function concurrency

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/investigating-spikes-in-aws-lambda-function-concurrency/

This post is courtesy of Ian Carlson, Principal Solutions Architect – AWS

As mentioned in an earlier post, a key benefit of serverless applications is the ease with which they can scale to meet traffic demands or requests. AWS Lambda is at the core of this platform.

Although this flexibility is hugely beneficial for our customers, sometimes an errant bit of code or upstream scaling can lead to spikes in concurrency. Unwanted usage can increase costs, pressure downstream systems, and throttle other functions in the account. Administrators new to serverless technologies need to leverage different metrics to manage their environment. In this blog, I walk through a sample scenario where I’m getting API errors. I trace those errors up through Amazon CloudWatch logs and leverage CloudWatch Metrics to understand what is happening in my environment. Finally, I show you how to set up alerts to reduce throttling surprises.

In my environment, I have a few Lambda functions configured. The first function is from Chris Munns’ concurrency blog, called concurrencyblog. I set up that function to execute behind an API hosted on Amazon API Gateway. In the background, I’m simulating activity with another function. This exercise uses the services in the following image.

To start, I make an API Gateway call to invoke the concurrencyblog function.

curl -i -XGET https://XXXXXXXX.execute-api.us-east-2.amazonaws.com/prod/concurrencyblog

I get the following output.

HTTP/2 502
content-type: application/json
content-length: 36
date: Wed, 01 Aug 2018 14:46:03 GMT
x-amzn-requestid: 9d5eca92-9599-11e8-bb13-dddafe0dbaa3
x-amz-apigw-id: K8ocOG_iiYcFa_Q=
x-cache: Error from cloudfront
via: 1.1 cb9e55028a8e7365209ebc8f2737b69b.cloudfront.net (CloudFront)
x-amz-cf-id: fk-gvFwSan8hzBtrC1hC_V5idaSDAKL9EwKDq205iN2RgQjnmIURYg==
 
{"message": "Internal server error"}

Hmmm, a 502 error. That shouldn’t happen. I don’t know the cause, but I configured logging for my API, so I can search for the requestid in CloudWatch logs. I navigate to the logs, select Search Log Group, and enter the x-amzn-requestid, enclosed in double quotes.

My API is invoking a Lambda function, and it’s getting an error from Lambda called ConcurrentInvocationLimitExceeded. This means my Lambda function was throttled. If I navigate to the function in the Lambda console, I get a similar message at the top.

If I scroll down, I observe that I don’t have throttling configured, so this must be coming from a different function or functions.

Using CloudWatch forensics

Lambda functions report lots of metrics in CloudWatch to tell you how they’re doing. Three of the metrics that I investigate here are Invocations, Duration, and ConcurrentExecutions. Invocations is incremented any time a Lambda function executes and is recorded for all functions and by individual functions. Duration is recorded to tell you how long Lambda functions take to execute. ConcurrentExecutions reports how many Lambda functions are executing at the same time and is emitted for the entire account and for functions that have a concurrency reservation set. Lambda emits CloudWatch metrics whenever there is Lambda activity in the account.

Lambda reports concurrency metrics for my account under AWS/Lambda/ConcurrentExecutions. To begin, I navigate to the Metrics pane of the CloudWatch console and choose Lambda on the All metrics tab.

Next, I choose Across All Functions.

Then I choose ConcurrentExecutions.

I choose the Graphed metrics tab and change Statistic from Average to Maximum, which shows me the peak concurrent executions in my account. For Period, I recommend reviewing 1-minute period data over the previous 2 weeks. After 2 weeks, the precision is aggregated over 5 minutes, which is a long time for Lambda!

In my test account, I find a concurrency spike at 14:46 UTC with 1000 concurrent executions.

Next, I want to find the culprit for this spike. I go back to the All metrics tab, but this time I choose By Function Name and enter Invocations in the search field. Then I select all of the functions listed.

The following image shows that BadLambdaConcurrency is the culprit.

It seems odd that there are only 331 invocations during that sample in the graph, so let’s dig in. Using the same method as before, I add the Duration metric for BadLambdaConcurrency. On average, this function is taking 30 seconds to complete, as shown in the following image.

Because there are 669 invocations the previous minute and the function is taking, on average, 30 seconds to complete, the next minute’s invocations (331) drives the concurrency up to 1000. Lambda functions can execute very quickly, so exact precision can be challenging, even over a 1-minute time period. However, this gives you a reasonable indication of the troublesome function in the account.

Automating this process

Investigating via the Lambda and CloudWatch console works fine if you have a few functions, but when you have tens or hundreds it can be pretty time consuming. Fortunately CloudWatch metrics are also available via API. To speed up this process I’ve written a script in Python that will go back over the last 7 days of metrics, find the minute with the highest concurrency, and output the Invocations and Average Duration for all functions for six minutes prior to that spike. You can download the script here. To execute, make sure you have rights to CloudWatch metrics, or are running from an EC2 instance that has those rights. Then you can execute:

sudo yum install python3
pip install boto3 --user
curl https://raw.githubusercontent.com/aws-samples/aws-lambda-concurrency-hunt/master/lambda-con-hunt.py -o lambda-con-hunt.py
python3 lambda-con-hunt.py

or, to output it to a file:

python3 lambda-con-hunt.py > output.csv

You should get output similar to the following image.

You can import this data into a spreadsheet program and sort it, or you can confirm visually that BadLambdaConcurrency is driving the concurrency.

Getting to the root cause

Now I want to understand what is driving that spike in Invocations for BadLambdaConcurrency, so I go to the Lambda console. It shows that API Gateway is triggering this Lambda function.

I choose API Gateway and scroll down to discover which API is triggering. Choosing the name (ConcurrencyTest) takes me to that API.

It’s the same API that I set up for concurrencyblog, but a different method. Because I already set up logging for this API, I can search the log group to check for interesting behavior. Perusing the logs, I check the method request headers for any insights as to who is calling this API. In real life I wouldn’t leave an API open without authentication, so I’ll have to do some guessing.

(a915ba7f-9591-11e8-8f19-a7737a1fb2d7) Method request headers: {CloudFront-Viewer-Country=US, CloudFront-Forwarded-Proto=https, CloudFront-Is-Tablet-Viewer=false, CloudFront-Is-Mobile-Viewer=false, User-Agent=hey/0.0.1, X-Forwarded-Proto=https, CloudFront-Is-SmartTV-Viewer=false, Host=xxxxxxxxx.execute-api.us-east-2.amazonaws.com, Accept-Encoding=gzip, X-Forwarded-Port=443, X-Amzn-Trace-Id=Root=1-5b61ba53-1958c6e2022ef9df9aac7bdb, Via=1.1 a0286f15cb377e35ea96015406919392.cloudfront.net (CloudFront), X-Amz-Cf-Id=O0GQ_V_eWRe5KydZNc46-aPSz7dfI19bmyhWCsbTBMoety73q0AtZA==, X-Forwarded-For=f.f.f.f, a.a.a.a, CloudFront-Is-Desktop-Viewer=true, Content-Type=text/html}

The method request headers have a user agent called hey. Hey, that’s a load testing utility! I bet that someone is load-testing this API, but it shouldn’t be allowed to consume all of my resources.

Applying rate and concurrency limiting

To keep this from happening, I place a throttle on the API method. In API Gateway console, in the APIs navigation pane, I choose Stages, choose prod, choose the Settings tab, and select the Enable throttling check box. Then I set a rate of 20 requests per second. It doesn’t sound like much, but with an average function duration of 30 seconds, 20 requests per second can use 600 concurrent Lambda executions.

I can also set a concurrency reservation on the function itself, as Chris pointed out in his blog.

If this is a bad function running amok or an emergency, I can throttle it directly, sometimes referred to as flipping a kill switch. I can do that quickly by choosing Throttle on the Lambda console.

I recommend throttling to zero only in emergency situations.

Investigating the duration

The other and larger problem is this function is taking 30 seconds to execute. That is a long time for an API, and the API Gateway integration timeout is 29 seconds. I wonder what is making it time out, so I check the traces in AWS X-Ray.

It initializes quickly enough, and I don’t find any downstream processes called. This function is a simple one, and the code is available from the Lambda console window. There I find my timeout culprit, a 30-second sleep call.

Not sure how that got through testing!

Setting up ongoing monitoring and alerting

To ensure that I’m not surprised again, let’s create a CloudWatch alert. In the CloudWatch console’s navigation pane, I choose Alarms and then choose Create Alarm.

When prompted, I choose Lambda and the ConcurrentExecutions metric across all functions, as shown in the following image.

Under Alarm Threshold, I give the alarm a name and description and enter 800 for is, as shown in the following image. I treat missing data as good because Lambda won’t publish a metric if there is no activity. I make sure that my period is 1 minute and use Maximum as the statistic. I want to be alerted only if this happens for any 2 minutes out of a 5-minute period. Finally, I can set up an Amazon SNS notification to alert me via email or text if this threshold is reached. This enables me to troubleshoot or request a limit increase for my account. Individual functions should be able to handle a throttling event through client-side retry and exponential backoff, but it’s still something that I want to know about.

Conclusion

In this blog, I walked through a method to investigate concurrency issues with Lambda, remediate those issues, and set up alerting. Managing concurrency is going to be new for a lot of people. As you deploy more applications, it’s especially important to segment them, monitor them, and understand how they are reporting their health. I hope you enjoyed this blog and start monitoring your functions today!

Developing .NET Core AWS Lambda functions

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/developing-net-core-aws-lambda-functions/

This post is courtesy of Mark Easton, Senior Solutions Architect – AWS

One of the biggest benefits of Lambda functions is that they isolate you from the underlying infrastructure. While that makes it easy to deploy and manage your code, it’s critical to have a clearly defined approach for testing, debugging, and diagnosing problems.

There’s a variety of best practices and AWS services to help you out. When developing Lambda functions in .NET, you can follow a four-pronged approach:

This post demonstrates the approach by creating a simple Lambda function that can be called from a gateway created by Amazon API Gateway and which returns the current UTC time. The post shows you how to design your code to allow for easy debugging, logging and tracing.

If you haven’t created Lambda functions with .NET Core before, then the following posts can help you get started:

Unit testing Lambda functions

One of the easiest ways to create a .NET Core Lambda function is to use the .NET Core CLI and create a solution using the Lambda Empty Serverless template.

If you haven’t already installed the Lambda templates, run the following command:

dotnet new -i Amazon.Lambda.Templates::*

You can now use the template to create a serverless project and unit test project, and then add them to a .NET Core solution by running the following commands:

dotnet new serverless.EmptyServerless -n DebuggingExample
cd DebuggingExample
dotnet new sln -n DebuggingExample\
dotnet sln DebuggingExample.sln add */*/*.csproj

Although you haven’t added any code yet, you can validate that everything’s working by executing the unit tests. Run the following commands:

cd test/DebuggingExample.Tests/
dotnet test

One of the key principles to effective unit testing is ensuring that units of functionality can be tested in isolation. It’s good practice to de-couple the Lambda function’s actual business logic from the plumbing code that handles the actual Lambda requests.

Using your favorite editor, create a new file, ITimeProcessor.cs, in the src/DebuggingExample folder, and create the following basic interface:

using System;

namespace DebuggingExample
{
    public interface ITimeProcessor
    {
        DateTime CurrentTimeUTC();
    }
}

Then, create a new TimeProcessor.cs file in the src/DebuggingExample folder. The file contains a concrete class implementing the interface.

using System;

namespace DebuggingExample
{
    public class TimeProcessor : ITimeProcessor
    {
        public DateTime CurrentTimeUTC()
        {
            return DateTime.UtcNow;
        }
    }
} 

Now add a TimeProcessorTest.cs file to the src/DebuggingExample.Tests folder. The file should contain the following code:

using System;
using Xunit;

namespace DebuggingExample.Tests
{
    public class TimeProcessorTest
    {
        [Fact]
        public void TestCurrentTimeUTC()
        {
            // Arrange
            var processor = new TimeProcessor();
            var preTestTimeUtc = DateTime.UtcNow;

            // Act
            var result = processor.CurrentTimeUTC();

            // Assert time moves forwards 
            var postTestTimeUtc = DateTime.UtcNow;
            Assert.True(result >= preTestTimeUtc);
            Assert.True(result <= postTestTimeUtc);
        }
    }
}

You can then execute all the tests. From the test/DebuggingExample.Tests folder, run the following command:

dotnet test

Surfacing business logic in a Lambda function

Now that you have your business logic written and tested, you can surface it as a Lambda function. Edit the src/DebuggingExample/Function.cs file so that it calls the CurrentTimeUTC method:

using System;
using System.Collections.Generic;
using System.Net;
using Amazon.Lambda.Core;
using Amazon.Lambda.APIGatewayEvents;
using Newtonsoft.Json;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(
typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))] 

namespace DebuggingExample
{
    public class Functions
    {
        ITimeProcessor processor = new TimeProcessor();

        public APIGatewayProxyResponse Get(
APIGatewayProxyRequest request, ILambdaContext context)
        {
            var result = processor.CurrentTimeUTC();

            return CreateResponse(result);
        }

APIGatewayProxyResponse CreateResponse(DateTime? result)
{
    int statusCode = (result != null) ? 
        (int)HttpStatusCode.OK : 
        (int)HttpStatusCode.InternalServerError;

    string body = (result != null) ? 
        JsonConvert.SerializeObject(result) : string.Empty;

    var response = new APIGatewayProxyResponse
    {
        StatusCode = statusCode,
        Body = body,
        Headers = new Dictionary<string, string>
        { 
            { "Content-Type", "application/json" }, 
            { "Access-Control-Allow-Origin", "*" } 
        }
    };
    
    return response;
}
    }
}

First, an instance of the TimeProcessor class is instantiated, and a Get() method is then defined to act as the entry point to the Lambda function.

By default, .NET Core Lambda function handlers expect their input in a Stream. This can be overridden by declaring a customer serializer, and then defining the handler’s method signature using a custom request and response type.

Because the project was created using the serverless.EmptyServerless template, it already overrides the default behavior. It does this by including a using reference to Amazon.Lambda.APIGatewayEvents and then declaring a custom serializer. For more information about using custom serializers in .NET, see the AWS Lambda for .NET Core repository on GitHub.

Get() takes a couple of parameters:

  • The APIGatewayProxyRequest parameter contains the request from the API Gateway fronting the Lambda function
  • The optional ILambdaContext parameter contains details of the execution context.

The Get() method calls CurrentTimeUTC() to retrieve the time from the business logic.

Finally, the result from CurrentTimeUTC() is passed to the CreateResponse() method, which converts the result into an APIGatewayResponse object to be returned to the caller.

Because the updated Lambda function no longer passes the unit tests, update the TestGetMethod in test/DebuggingExample.Tests/FunctionTest.cs file. Update the test by removing the following line:

Assert.Equal("Hello AWS Serverless", response.Body);

This leaves your FunctionTest.cs file as follows:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Xunit;
using Amazon.Lambda.Core;
using Amazon.Lambda.TestUtilities;
using Amazon.Lambda.APIGatewayEvents;
using DebuggingExample;

namespace DebuggingExample.Tests
{
    public class FunctionTest
    {
        public FunctionTest()
        {
        }

        [Fact]
        public void TetGetMethod()
        {
            TestLambdaContext context;
            APIGatewayProxyRequest request;
            APIGatewayProxyResponse response;

            Functions functions = new Functions();

            request = new APIGatewayProxyRequest();
            context = new TestLambdaContext();
            response = functions.Get(request, context);
            Assert.Equal(200, response.StatusCode);
        }
    }
}

Again, you can check that everything is still working. From the test/DebuggingExample.Tests folder, run the following command:

dotnet test

Local integration testing with the AWS SAM CLI

Unit testing is a great start for testing thin slices of functionality. But to test that your API Gateway and Lambda function integrate with each other, you can test locally by using the AWS SAM CLI, installed as described in the AWS Lambda Developer Guide.

Unlike unit testing, which allows you to test functions in isolation outside of their runtime environment, the AWS SAM CLI executes your code in a locally hosted Docker container. It can also simulate a locally hosted API gateway proxy, allowing you to run component integration tests.

After you’ve installed the AWS SAM CLI, you can start using it by creating a template that describes your Lambda function by saving a file named template.yaml in the DebuggingExample directory with the following contents:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Sample SAM Template for DebuggingExample

# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
    Function:
        Timeout: 10

Resources:

    DebuggingExampleFunction:
        Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
        Properties:
            FunctionName: DebuggingExample
			CodeUri: src/DebuggingExample/bin/Release/netcoreapp2.1/publish
            Handler: DebuggingExample::DebuggingExample.Functions::Get
            Runtime: dotnetcore2.1
            Environment: # More info about Env Vars: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#environment-object
                Variables:
                    PARAM1: VALUE
            Events:
                DebuggingExample:
                    Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api
                    Properties:
                        Path: /
                        Method: get

Outputs:

    DebuggingExampleApi:
      Description: "API Gateway endpoint URL for Prod stage for Debugging Example function"
      Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/DebuggingExample/"

    DebuggingExampleFunction:
      Description: "Debugging Example Lambda Function ARN"
      Value: !GetAtt DebuggingExampleFunction.Arn

    DebuggingExampleFunctionIamRole:
      Description: "Implicit IAM Role created for Debugging Example function"
      Value: !GetAtt DebuggingExampleFunctionRole.Arn

Now that you have an AWS SAM CLI template, you can test your code locally. Because the Lambda function expects a request from API Gateway, create a sample API Gateway request. Run the following command:

sam local generate-event api > testApiRequest.json

You can now publish your DebuggingExample code locally and invoke it by passing in the sample request as follows:

dotnet publish -c Release
sam local invoke "DebuggingExampleFunction" --event testApiRequest.json

The first time that you run it, it might take some time to pull down the container image in which to host the Lambda function. After you’ve invoked it one time, the container image is cached locally, and execution speeds up.

Finally, rather than testing your function by sending it a sample request, test it with a real API gateway request by running API Gateway locally:

sam local start-api

If you now navigate to http://127.0.0.1:3000/ in your browser, you can get the API gateway to send a request to your locally hosted Lambda function. See the results in your browser.

Logging events with CloudWatch

Having a test strategy allows you to execute, test, and debug Lambda functions. After you’ve deployed your functions to AWS, you must still log what the functions are doing so that you can monitor their behavior.

The easiest way to add logging to your Lambda functions is to add code that writes events to CloudWatch. To do this, add a new method, LogMessage(), to the src/DebuggingExample/Function.cs file.

void LogMessage(ILambdaContext ctx, string msg)
{
    ctx.Logger.LogLine(
        string.Format("{0}:{1} - {2}", 
            ctx.AwsRequestId, 
            ctx.FunctionName,
            msg));
}

This takes in the context object from the Lambda function’s Get() method, and sends a message to CloudWatch by calling the context object’s Logger.Logline() method.

You can now add calls to LogMessage in the Get() method to log events in CloudWatch. It’s also a good idea to add a Try… Catch… block to ensure that exceptions are logged as well.

        public APIGatewayProxyResponse Get(APIGatewayProxyRequest request, ILambdaContext context)
        {
            LogMessage(context, "Processing request started");

            APIGatewayProxyResponse response;
            try
            {
                var result = processor.CurrentTimeUTC();
                response = CreateResponse(result);

                LogMessage(context, "Processing request succeeded.");
            }
            catch (Exception ex)
            {
                LogMessage(context, string.Format("Processing request failed - {0}", ex.Message));
                response = CreateResponse(null);
            }

            return response;
        }

To validate that the changes haven’t broken anything, you can now execute the unit tests again. Run the following commands:

cd test/DebuggingExample.Tests/
dotnet test

Tracing execution with X-Ray

Your code now logs events in CloudWatch, which provides a solid mechanism to help monitor and diagnose problems.

However, it can also be useful to trace your Lambda function’s execution to help diagnose performance or connectivity issues, especially if it’s called by or calling other services. X-Ray provides a variety of features to help analyze and trace code execution.

To enable active tracing on your function you need to modify the SAM template we created earlier to add a new attribute to the function resource definition. With SAM this is as easy as adding the Tracing attribute and specifying it as Active below the Timeout attribute in the Globals section of the template.yaml file:

Globals:
    Function:
        Timeout: 10
        Tracing: Active

To call X-Ray from within your .NET Core code, you must add the AWSSDKXRayRecoder to your solution by running the following command in the src/DebuggingExample folder:

dotnet add package AWSXRayRecorder –-version 2.2.1-beta

Then, add the following using statement at the top of the src/DebuggingExample/Function.cs file:

using Amazon.XRay.Recorder.Core;

Add a new method to the Function class, which takes a function and name and then records an X-Ray subsegment to trace the execution of the function.

        private T TraceFunction<T>(Func<T> func, string subSegmentName)
        {
            AWSXRayRecorder.Instance.BeginSubsegment(subSegmentName);
            T result = func();
            AWSXRayRecorder.Instance.EndSubsegment();

            return result;
        } 

You can now update the Get() method by replacing the following line:

var result = processor.CurrentTimeUTC();

Replace it with this line:

var result = TraceFunction(processor.CurrentTimeUTC, "GetTime");

The final version of Function.cs, in all its glory, is now:

using System;
using System.Collections.Generic;
using System.Net;
using Amazon.Lambda.Core;
using Amazon.Lambda.APIGatewayEvents;
using Newtonsoft.Json;
using Amazon.XRay.Recorder.Core;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(
typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace DebuggingExample
{
    public class Functions
    {
        ITimeProcessor processor = new TimeProcessor();

        public APIGatewayProxyResponse Get(APIGatewayProxyRequest request, ILambdaContext context)
        {
            LogMessage(context, "Processing request started");

            APIGatewayProxyResponse response;
            try
            {
                var result = TraceFunction(processor.CurrentTimeUTC, "GetTime");
                response = CreateResponse(result);

                LogMessage(context, "Processing request succeeded.");
            }
            catch (Exception ex)
            {
                LogMessage(context, string.Format("Processing request failed - {0}", ex.Message));
                response = CreateResponse(null);
            }

            return response;
        }

        APIGatewayProxyResponse CreateResponse(DateTime? result)
        {
            int statusCode = (result != null) ?
                (int)HttpStatusCode.OK :
                (int)HttpStatusCode.InternalServerError;

            string body = (result != null) ?
                JsonConvert.SerializeObject(result) : string.Empty;

            var response = new APIGatewayProxyResponse
            {
                StatusCode = statusCode,
                Body = body,
                Headers = new Dictionary<string, string>
        {
            { "Content-Type", "application/json" },
            { "Access-Control-Allow-Origin", "*" }
        }
            };

            return response;
        }

        private void LogMessage(ILambdaContext context, string message)
        {
            context.Logger.LogLine(string.Format("{0}:{1} - {2}", context.AwsRequestId, context.FunctionName, message));
        }

        private T TraceFunction<T>(Func<T> func, string actionName)
        {
            AWSXRayRecorder.Instance.BeginSubsegment(actionName);
            T result = func();
            AWSXRayRecorder.Instance.EndSubsegment();

            return result;
        }
    }
}

Since AWS X-Ray requires an agent to collect trace information, if you want to test the code locally you should now install the AWS X-Ray agent. Once it’s installed, confirm the changes haven’t broken anything by running the unit tests again:

cd test/DebuggingExample.Tests/
dotnet test

For more information about using X-Ray from .NET Core, see the AWS X-Ray Developer Guide. For information about adding support for X-Ray in Visual Studio, see the New AWS X-Ray .NET Core Support post.

Deploying and testing the Lambda function remotely

Having created your Lambda function and tested it locally, you’re now ready to package and deploy your code.

First of all you need an Amazon S3 bucket to deploy the code into. If you don’t already have one, create a suitable S3 bucket.

You can now package the .NET Lambda Function and copy it to Amazon S3.

sam package \
  --template-file template.yaml \
  --output-template debugging-example.yaml \
  --s3-bucket debugging-example-deploy

Finally, deploy the Lambda function by running the following command:

sam deploy \
   --template-file debugging-example.yaml \
   --stack-name DebuggingExample \
   --capabilities CAPABILITY_IAM \
   --region eu-west-1

After your code has deployed successfully, test it from your local machine by running the following command:

dotnet lambda invoke-function DebuggingExample -–region eu-west-1

Diagnosing the Lambda function

Having run the Lambda function, you can now monitor its behavior by logging in to the AWS Management Console and then navigating to CloudWatch LogsCloudWatch Logs Console

You can now click on the /aws/lambda/DebuggingExample log group to view all the recorded log streams for your Lambda function.

If you open one of the log streams, you see the various messages recorded for the Lambda function, including the two events explicitly logged from within the Get() method.Lambda CloudWatch Logs

To review the logs locally, you can also use the AWS SAM CLI to retrieve CloudWatch logs and then display them in your terminal.

sam logs -n DebuggingExample --region eu-west-1

As a final alternative, you can also execute the Lambda function by choosing Test on the Lambda console. The execution results are displayed in the Log output section. Lambda Console Execution

In the X-Ray console, the Service Map page shows a map of the Lambda function’s connections.

Your Lambda function is essentially standalone. However, the Service Map page can be critical in helping to understand performance issues when a Lambda function is connected with a number of other services.X-Ray Service Map

If you open the Traces screen, the trace list showing all the trace results that it’s recorded. Open one of the traces to see a breakdown of the Lambda function performance.

X-Ray Traces UI

Conclusion

In this post, I showed you how to develop Lambda functions in .NET Core, how unit tests can be used, how to use the AWS SAM CLI for local integration tests, how CloudWatch can be used for logging and monitoring events, and finally how to use X-Ray to trace Lambda function execution.

Put together, these techniques provide a solid foundation to help you debug and diagnose your Lambda functions effectively. Explore each of the services further, because when it comes to production workloads, great diagnosis is key to providing a great and uninterrupted customer experience.

Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/

Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.

To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.

Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.

What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.

Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Solution overview

Here is how the solution is laid out:

 

 

The following sections walk you through each of these steps to set up the pipeline.

1. Define the schema

When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that many times, incoming events contain all or some of the expected fields based on which values the producers are advertising. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency due to the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.

To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.

The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.

CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")

That’s all you have to do to prepare the schema for Kinesis Data Firehose.

2. Define the data streams

Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events.  Open the Kinesis Data Streams console and create two streams.  You can configure them with only one shard each because you don’t have a lot of data right now.

3. Define the Kinesis Data Firehose delivery stream for Parquet

Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.

As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.

To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.

On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.

Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.

Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.

Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.

4. Set up the Amazon Connect contact center

After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.

After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.

At this point, your pipeline is complete.  Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.

So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.

To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.

5. Analyze the data with Athena

Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures.  The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.

Let’s run the first query to see which agents have logged into the system.

The query might look complex, but it’s fairly straightforward:

WITH dataset AS (
  SELECT 
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
    eventtype,
    -- CURRENT STATE
    json_extract_scalar(
      currentagentsnapshot,
      '$.agentstatus.name') AS current_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        currentagentsnapshot,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.firstname') AS current_firstname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.lastname') AS current_lastname,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.username') AS current_username,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS               current_outboundqueue,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
    -- PREVIOUS STATE
    json_extract_scalar(
      previousagentsnapshot, 
      '$.agentstatus.name') as prev_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        previousagentsnapshot, 
       '$.agentstatus.starttimestamp')) as prev_starttimestamp,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.firstname') as prev_firstname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.lastname') as prev_lastname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.username') as prev_username,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') as current_outboundqueue,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
  from kfhconnectblog
  where eventtype <> 'HEART_BEAT'
)
SELECT
  current_status as status,
  current_username as username,
  event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC

The query output looks something like this:

Here is another query that shows the sessions each of the agents engaged with. It tells us where they were incoming or outgoing, if they were completed, and where there were missed or failed calls.

WITH src AS (
  SELECT
     eventid,
     json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
     cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
     cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
  from kfhconnectblog
),
src2 AS (
  SELECT *
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
SELECT 
  eventid,
  username,
  json_extract_scalar(c_item, '$.contactid') as c_contactid,
  json_extract_scalar(c_item, '$.channel') as c_channel,
  json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
  json_extract_scalar(c_item, '$.queue.name') as c_queue,
  json_extract_scalar(c_item, '$.state') as c_state,
  from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
  
  json_extract_scalar(p_item, '$.contactid') as p_contactid,
  json_extract_scalar(p_item, '$.channel') as p_channel,
  json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
  json_extract_scalar(p_item, '$.queue.name') as p_queue,
  json_extract_scalar(p_item, '$.state') as p_state,
  from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
)
SELECT 
  username,
  c_channel as channel,
  c_direction as direction,
  p_state as prev_state,
  c_state as current_state,
  c_ts as current_ts,
  c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC

The query output looks similar to the following:

6. Analyze the data with Amazon Redshift Spectrum

With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.

Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.

SELECT 
  eventtype,
  json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
  json_extract_path_text(
    currentagentsnapshot,
    'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue,
FROM default_schema.kfhconnectblog

The following shows the query output:

Summary

In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.

 


Additional Reading

If you found this post useful, be sure to check out Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight and Work with partitioned data in AWS Glue.


About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan cheering his team on and hanging out with his family.

 

 

 

AWS AppSync – Production-Ready with Six New Features

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-appsync-production-ready-with-six-new-features/

If you build (or want to build) data-driven web and mobile apps and need real-time updates and the ability to work offline, you should take a look at AWS AppSync. Announced in preview form at AWS re:Invent 2017 and described in depth here, AWS AppSync is designed for use in iOS, Android, JavaScript, and React Native apps. AWS AppSync is built around GraphQL, an open, standardized query language that makes it easy for your applications to request the precise data that they need from the cloud.

I’m happy to announce that the preview period is over and that AWS AppSync is now generally available and production-ready, with six new features that will simplify and streamline your application development process:

Console Log Access – You can now see the CloudWatch Logs entries that are created when you test your GraphQL queries, mutations, and subscriptions from within the AWS AppSync Console.

Console Testing with Mock Data – You can now create and use mock context objects in the console for testing purposes.

Subscription Resolvers – You can now create resolvers for AWS AppSync subscription requests, just as you can already do for query and mutate requests.

Batch GraphQL Operations for DynamoDB – You can now make use of DynamoDB’s batch operations (BatchGetItem and BatchWriteItem) across one or more tables. in your resolver functions.

CloudWatch Support – You can now use Amazon CloudWatch Metrics and CloudWatch Logs to monitor calls to the AWS AppSync APIs.

CloudFormation Support – You can now define your schemas, data sources, and resolvers using AWS CloudFormation templates.

A Brief AppSync Review
Before diving in to the new features, let’s review the process of creating an AWS AppSync API, starting from the console. I click Create API to begin:

I enter a name for my API and (for demo purposes) choose to use the Sample schema:

The schema defines a collection of GraphQL object types. Each object type has a set of fields, with optional arguments:

If I was creating an API of my own I would enter my schema at this point. Since I am using the sample, I don’t need to do this. Either way, I click on Create to proceed:

The GraphQL schema type defines the entry points for the operations on the data. All of the data stored on behalf of a particular schema must be accessible using a path that begins at one of these entry points. The console provides me with an endpoint and key for my API:

It also provides me with guidance and a set of fully functional sample apps that I can clone:

When I clicked Create, AWS AppSync created a pair of Amazon DynamoDB tables for me. I can click Data Sources to see them:

I can also see and modify my schema, issue queries, and modify an assortment of settings for my API.

Let’s take a quick look at each new feature…

Console Log Access
The AWS AppSync Console already allows me to issue queries and to see the results, and now provides access to relevant log entries.In order to see the entries, I must enable logs (as detailed below), open up the LOGS, and check the checkbox. Here’s a simple mutation query that adds a new event. I enter the query and click the arrow to test it:

I can click VIEW IN CLOUDWATCH for a more detailed view:

To learn more, read Test and Debug Resolvers.

Console Testing with Mock Data
You can now create a context object in the console where it will be passed to one of your resolvers for testing purposes. I’ll add a testResolver item to my schema:

Then I locate it on the right-hand side of the Schema page and click Attach:

I choose a data source (this is for testing and the actual source will not be accessed), and use the Put item mapping template:

Then I click Select test context, choose Create New Context, assign a name to my test content, and click Save (as you can see, the test context contains the arguments from the query along with values to be returned for each field of the result):

After I save the new Resolver, I click Test to see the request and the response:

Subscription Resolvers
Your AWS AppSync application can monitor changes to any data source using the @aws_subscribe GraphQL schema directive and defining a Subscription type. The AWS AppSync client SDK connects to AWS AppSync using MQTT over Websockets and the application is notified after each mutation. You can now attach resolvers (which convert GraphQL payloads into the protocol needed by the underlying storage system) to your subscription fields and perform authorization checks when clients attempt to connect. This allows you to perform the same fine grained authorization routines across queries, mutations, and subscriptions.

To learn more about this feature, read Real-Time Data.

Batch GraphQL Operations
Your resolvers can now make use of DynamoDB batch operations that span one or more tables in a region. This allows you to use a list of keys in a single query, read records multiple tables, write records in bulk to multiple tables, and conditionally write or delete related records across multiple tables.

In order to use this feature the IAM role that you use to access your tables must grant access to DynamoDB’s BatchGetItem and BatchPutItem functions.

To learn more, read the DynamoDB Batch Resolvers tutorial.

CloudWatch Logs Support
You can now tell AWS AppSync to log API requests to CloudWatch Logs. Click on Settings and Enable logs, then choose the IAM role and the log level:

CloudFormation Support
You can use the following CloudFormation resource types in your templates to define AWS AppSync resources:

AWS::AppSync::GraphQLApi – Defines an AppSync API in terms of a data source (an Amazon Elasticsearch Service domain or a DynamoDB table).

AWS::AppSync::ApiKey – Defines the access key needed to access the data source.

AWS::AppSync::GraphQLSchema – Defines a GraphQL schema.

AWS::AppSync::DataSource – Defines a data source.

AWS::AppSync::Resolver – Defines a resolver by referencing a schema and a data source, and includes a mapping template for requests.

Here’s a simple schema definition in YAML form:

  AppSyncSchema:
    Type: "AWS::AppSync::GraphQLSchema"
    DependsOn:
      - AppSyncGraphQLApi
    Properties:
      ApiId: !GetAtt AppSyncGraphQLApi.ApiId
      Definition: |
        schema {
          query: Query
          mutation: Mutation
        }
        type Query {
          singlePost(id: ID!): Post
          allPosts: [Post]
        }
        type Mutation {
          putPost(id: ID!, title: String!): Post
        }
        type Post {
          id: ID!
          title: String!
        }

Available Now
These new features are available now and you can start using them today! Here are a couple of blog posts and other resources that you might find to be of interest:

Jeff;

 

 

Improve the Operational Efficiency of Amazon Elasticsearch Service Domains with Automated Alarms Using Amazon CloudWatch

Post Syndicated from Veronika Megler original https://aws.amazon.com/blogs/big-data/improve-the-operational-efficiency-of-amazon-elasticsearch-service-domains-with-automated-alarms-using-amazon-cloudwatch/

A customer has been successfully creating and running multiple Amazon Elasticsearch Service (Amazon ES) domains to support their business users’ search needs across products, orders, support documentation, and a growing suite of similar needs. The service has become heavily used across the organization.  This led to some domains running at 100% capacity during peak times, while others began to run low on storage space. Because of this increased usage, the technical teams were in danger of missing their service level agreements.  They contacted me for help.

This post shows how you can set up automated alarms to warn when domains need attention.

Solution overview

Amazon ES is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities along with the availability, scalability, and security that production workloads require.  The service offers built-in integrations with a number of other components and AWS services, enabling customers to go from raw data to actionable insights quickly and securely.

One of these other integrated services is Amazon CloudWatch. CloudWatch is a monitoring service for AWS Cloud resources and the applications that you run on AWS. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.

CloudWatch collects metrics for Amazon ES. You can use these metrics to monitor the state of your Amazon ES domains, and set alarms to notify you about high utilization of system resources.  For more information, see Amazon Elasticsearch Service Metrics and Dimensions.

While the metrics are automatically collected, the missing piece is how to set alarms on these metrics at appropriate levels for each of your domains. This post includes sample Python code to evaluate the current state of your Amazon ES environment, and to set up alarms according to AWS recommendations and best practices.

There are two components to the sample solution:

  • es-check-cwalarms.py: This Python script checks the CloudWatch alarms that have been set, for all Amazon ES domains in a given account and region.
  • es-create-cwalarms.py: This Python script sets up a set of CloudWatch alarms for a single given domain.

The sample code can also be found in the amazon-es-check-cw-alarms GitHub repo. The scripts are easy to extend or combine, as described in the section “Extensions and Adaptations”.

Assessing the current state

The first script, es-check-cwalarms.py, is used to give an overview of the configurations and alarm settings for all the Amazon ES domains in the given region. The script takes the following parameters:

python es-checkcwalarms.py -h
usage: es-checkcwalarms.py [-h] [-e ESPREFIX] [-n NOTIFY] [-f FREE][-p PROFILE] [-r REGION]
Checks a set of recommended CloudWatch alarms for Amazon Elasticsearch Service domains (optionally, those beginning with a given prefix).
optional arguments:
  -h, --help   		show this help message and exit
  -e ESPREFIX, --esprefix ESPREFIX	Only check Amazon Elasticsearch Service domains that begin with this prefix.
  -n NOTIFY, --notify NOTIFY    List of CloudWatch alarm actions; e.g. ['arn:aws:sns:xxxx']
  -f FREE, --free FREE  Minimum free storage (MB) on which to alarm
  -p PROFILE, --profile PROFILE     IAM profile name to use
  -r REGION, --region REGION       AWS region for the domain. Default: us-east-1

The script first identifies all the domains in the given region (or, optionally, limits them to the subset that begins with a given prefix). It then starts running a set of checks against each one.

The script can be run from the command line or set up as a scheduled Lambda function. For example, for one customer, it was deemed appropriate to regularly run the script to check that alarms were correctly set for all domains. In addition, because configuration changes—cluster size increases to accommodate larger workloads being a common change—might require updates to alarms, this approach allowed the automatic identification of alarms no longer appropriately set as the domain configurations changed.

The output shown below is the output for one domain in my account.

Starting checks for Elasticsearch domain iotfleet , version is 53
Iotfleet Automated snapshot hour (UTC): 0
Iotfleet Instance configuration: 1 instances; type:m3.medium.elasticsearch
Iotfleet Instance storage definition is: 4 GB; free storage calced to: 819.2 MB
iotfleet Desired free storage set to (in MB): 819.2
iotfleet WARNING: Not using VPC Endpoint
iotfleet WARNING: Does not have Zone Awareness enabled
iotfleet WARNING: Instance count is ODD. Best practice is for an even number of data nodes and zone awareness.
iotfleet WARNING: Does not have Dedicated Masters.
iotfleet WARNING: Neither index nor search slow logs are enabled.
iotfleet WARNING: EBS not in use. Using instance storage only.
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.yellow-Alarm ClusterStatus.yellow
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.red-Alarm ClusterStatus.red
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-CPUUtilization-Alarm CPUUtilization
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-JVMMemoryPressure-Alarm JVMMemoryPressure
iotfleet WARNING: Missing alarm!! ('ClusterIndexWritesBlocked', 'Maximum', 60, 5, 'GreaterThanOrEqualToThreshold', 1.0)
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-AutomatedSnapshotFailure-Alarm AutomatedSnapshotFailure
iotfleet Alarm: Threshold does not match: Test-Elasticsearch-iotfleet-FreeStorageSpace-Alarm Should be:  819.2 ; is 3000.0

The output messages fall into the following categories:

  • System overview, Informational: The Amazon ES version and configuration, including instance type and number, storage, automated snapshot hour, etc.
  • Free storage: A calculation for the appropriate amount of free storage, based on the recommended 20% of total storage.
  • Warnings: best practices that are not being followed for this domain. (For more about this, read on.)
  • Alarms: An assessment of the CloudWatch alarms currently set for this domain, against a recommended set.

The script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. Using the array allows alarm parameters (such as free space) to be updated within the code based on current domain statistics and configurations.

For a given domain, the script checks if each alarm has been set. If the alarm is set, it checks whether the values match those in the array esAlarms. In the output above, you can see three different situations being reported:

  • Alarm ok; definition matches. The alarm set for the domain matches the settings in the array.
  • Alarm: Threshold does not match. An alarm exists, but the threshold value at which the alarm is triggered does not match.
  • WARNING: Missing alarm!! The recommended alarm is missing.

All in all, the list above shows that this domain does not have a configuration that adheres to best practices, nor does it have all the recommended alarms.

Setting up alarms

Now that you know that the domains in their current state are missing critical alarms, you can correct the situation.

To demonstrate the script, set up a new domain named “ver”, in us-west-2. Specify 1 node, and a 10-GB EBS disk. Also, create an SNS topic in us-west-2 with a name of “sendnotification”, which sends you an email.

Run the second script, es-create-cwalarms.py, from the command line. This script creates (or updates) the desired CloudWatch alarms for the specified Amazon ES domain, “ver”.

python es-create-cwalarms.py -r us-west-2 -e test -c ver -n "['arn:aws:sns:us-west-2:xxxxxxxxxx:sendnotification']"
EBS enabled: True type: gp2 size (GB): 10 No Iops 10240  total storage (MB)
Desired free storage set to (in MB): 2048.0
Creating  Test-Elasticsearch-ver-ClusterStatus.yellow-Alarm
Creating  Test-Elasticsearch-ver-ClusterStatus.red-Alarm
Creating  Test-Elasticsearch-ver-CPUUtilization-Alarm
Creating  Test-Elasticsearch-ver-JVMMemoryPressure-Alarm
Creating  Test-Elasticsearch-ver-FreeStorageSpace-Alarm
Creating  Test-Elasticsearch-ver-ClusterIndexWritesBlocked-Alarm
Creating  Test-Elasticsearch-ver-AutomatedSnapshotFailure-Alarm
Successfully finished creating alarms!

As with the first script, this script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. This approach allows you to add or modify alarms based on your use case (more on that below).

After running the script, navigate to Alarms on the CloudWatch console. You can see the set of alarms set up on your domain.

Because the “ver” domain has only a single node, cluster status is yellow, and that alarm is in an “ALARM” state. It’s already sent a notification that the alarm has been triggered.

What to do when an alarm triggers

After alarms are set up, you need to identify the correct action to take for each alarm, which depends on the alarm triggered. For ideas, guidance, and additional pointers to supporting documentation, see Get Started with Amazon Elasticsearch Service: Set CloudWatch Alarms on Key Metrics. For information about common errors and recovery actions to take, see Handling AWS Service Errors.

In most cases, the alarm triggers due to an increased workload. The likely action is to reconfigure the system to handle the increased workload, rather than reducing the incoming workload. Reconfiguring any backend store—a category of systems that includes Elasticsearch—is best performed when the system is quiescent or lightly loaded. Reconfigurations such as setting zone awareness or modifying the disk type cause Amazon ES to enter a “processing” state, potentially disrupting client access.

Other changes, such as increasing the number of data nodes, may cause Elasticsearch to begin moving shards, potentially impacting search performance on these shards while this is happening. These actions should be considered in the context of your production usage. For the same reason I also do not recommend running a script that resets all domains to match best practices.

Avoid the need to reconfigure during heavy workload by setting alarms at a level that allows a considered approach to making the needed changes. For example, if you identify that each weekly peak is increasing, you can reconfigure during a weekly quiet period.

While Elasticsearch can be reconfigured without being quiesced, it is not a best practice to automatically scale it up and down based on usage patterns. Unlike some other AWS services, I recommend against setting a CloudWatch action that automatically reconfigures the system when alarms are triggered.

There are other situations where the planned reconfiguration approach may not work, such as low or zero free disk space causing the domain to reject writes. If the business is dependent on the domain continuing to accept incoming writes and deleting data is not an option, the team may choose to reconfigure immediately.

Extensions and adaptations

You may wish to modify the best practices encoded in the scripts for your own environment or workloads. It’s always better to avoid situations where alerts are generated but routinely ignored. All alerts should trigger a review and one or more actions, either immediately or at a planned date. The following is a list of common situations where you may wish to set different alarms for different domains:

  • Dev/test vs. production
    You may have a different set of configuration rules and alarms for your dev environment configurations than for test. For example, you may require zone awareness and dedicated masters for your production environment, but not for your development domains. Or, you may not have any alarms set in dev. For test environments that mirror your potential peak load, test to ensure that the alarms are appropriately triggered.
  • Differing workloads or SLAs for different domains
    You may have one domain with a requirement for superfast search performance, and another domain with a heavy ingest load that tolerates slower search response. Your reaction to slow response for these two workloads is likely to be different, so perhaps the thresholds for these two domains should be set at a different level. In this case, you might add a “max CPU utilization” alarm at 100% for 1 minute for the fast search domain, while the other domain only triggers an alarm when the average has been higher than 60% for 5 minutes. You might also add a “free space” rule with a higher threshold to reflect the need for more space for the heavy ingest load if there is danger that it could fill the available disk quickly.
  • “Normal” alarms versus “emergency” alarms
    If, for example, free disk space drops to 25% of total capacity, an alarm is triggered that indicates action should be taken as soon as possible, such as cleaning up old indexes or reconfiguring at the next quiet period for this domain. However, if free space drops below a critical level (20% free space), action must be taken immediately in order to prevent Amazon ES from setting the domain to read-only. Similarly, if the “ClusterIndexWritesBlocked” alarm triggers, the domain has already stopped accepting writes, so immediate action is needed. In this case, you may wish to set “laddered” alarms, where one threshold causes an alarm to be triggered to review the current workload for a planned reconfiguration, but a different threshold raises a “DefCon 3” alarm that immediate action is required.

The sample scripts provided here are a starting point, intended for you to adapt to your own environment and needs.

Running the scripts one time can identify how far your current state is from your desired state, and create an initial set of alarms. Regularly re-running these scripts can capture changes in your environment over time and adjusting your alarms for changes in your environment and configurations. One customer has set them up to run nightly, and to automatically create and update alarms to match their preferred settings.

Removing unwanted alarms

Each CloudWatch alarm costs approximately $0.10 per month. You can remove unwanted alarms in the CloudWatch console, under Alarms. If you set up a “ver” domain above, remember to remove it to avoid continuing charges.

Conclusion

Setting CloudWatch alarms appropriately for your Amazon ES domains can help you avoid suboptimal performance and allow you to respond to workload growth or configuration issues well before they become urgent. This post gives you a starting point for doing so. The additional sleep you’ll get knowing you don’t need to be concerned about Elasticsearch domain performance will allow you to focus on building creative solutions for your business and solving problems for your customers.

Enjoy!


Additional Reading

If you found this post useful, be sure to check out Analyzing Amazon Elasticsearch Service Slow Logs Using Amazon CloudWatch Logs Streaming and Kibana and Get Started with Amazon Elasticsearch Service: How Many Shards Do I Need?

 


About the Author

Dr. Veronika Megler is a senior consultant at Amazon Web Services. She works with our customers to implement innovative big data, AI and ML projects, helping them accelerate their time-to-value when using AWS.

 

 

 

Central Logging in Multi-Account Environments

Post Syndicated from matouk original https://aws.amazon.com/blogs/architecture/central-logging-in-multi-account-environments/

Centralized logging is often required in large enterprise environments for a number of reasons, ranging from compliance and security to analytics and application-specific needs.

I’ve seen that in a multi-account environment, whether the accounts belong to the same line of business or multiple business units, collecting logs in a central, dedicated logging account is an established best practice. It helps security teams detect malicious activities both in real-time and during incident response. It provides protection to log data in case it is accidentally or intentionally deleted. It also helps application teams correlate and analyze log data across multiple application tiers.

This blog post provides a solution and building blocks to stream Amazon CloudWatch log data across accounts. In a multi-account environment this repeatable solution could be deployed multiple times to stream all relevant Amazon CloudWatch log data from all accounts to a centralized logging account.

Solution Summary 

The solution uses Amazon Kinesis Data Streams and a log destination to set up an endpoint in the logging account to receive streamed logs and uses Amazon Kinesis Data Firehose to deliver log data to the Amazon Simple Storage Solution (S3) bucket. Application accounts will subscribe to stream all (or part) of their Amazon CloudWatch logs to a defined destination in the logging account via subscription filters.

Below is a diagram illustrating how the various services work together.


In logging an account, a Kinesis Data Stream is created to receive streamed log data and a log destination is created to facilitate remote streaming, configured to use the Kinesis Data Stream as its target.

The Amazon Kinesis Data Firehose stream is created to deliver log data from the data stream to S3. The delivery stream uses a generic AWS Lambda function for data validation and transformation.

In each application account, a subscription filter is created between each Amazon CloudWatch log group and the destination created for this log group in the logging account.

The following steps are involved in setting up the central-logging solution:

  1. Create an Amazon S3 bucket for your central logging in the logging account
  2. Create an AWS Lambda function for log data transformation and decoding in logging account
  3. Create a central logging stack as a logging-account destination ready to receive streamed logs and deliver them to S3
  4. Create a subscription in application accounts to deliver logs from a specific CloudWatch log group to the logging account destination
  5. Create Amazon Athena tables to query and analyze log data in your logging account

Creating a log destination in your logging account

In this section, we will setup the logging account side of the solution, providing detail on the list above. The example I use is for the us-east-1 region, however any region where required services are available could be used.

It’s important to note that your logging-account destination and application-account subscription must be in the same region. You can deploy the solution multiple times to create destinations in all required regions if application accounts use multiple regions.

Step 1: Create an S3 bucket

Use the CloudFormation template below to create S3 bucket in logging account. This template also configures the bucket to archive log data to Glacier after 60 days.


{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "CF Template to create S3 bucket for central logging",
  "Parameters":{

    "BucketName":{
      "Type":"String",
      "Default":"",
      "Description":"Central logging bucket name"
    }
  },
  "Resources":{
                        
   "CentralLoggingBucket" : {
      "Type" : "AWS::S3::Bucket",
      "Properties" : {
        "BucketName" : {"Ref": "BucketName"},
        "LifecycleConfiguration": {
            "Rules": [
                {
                  "Id": "ArchiveToGlacier",
                  "Prefix": "",
                  "Status": "Enabled",
                  "Transitions":[{
                      "TransitionInDays": "60",
                      "StorageClass": "GLACIER"
                  }]
                }
            ]
        }
      }
    }

  },
  "Outputs":{
    "CentralLogBucket":{
    	"Description" : "Central log bucket",
    	"Value" : {"Ref": "BucketName"} ,
    	"Export" : { "Name" : "CentralLogBucketName"}
    }
  }
} 

To create your central-logging bucket do the following:

  1. Save the template file to your local developer machine as “central-log-bucket.json”
  2. From the CloudFormation console, select “create new stack” and import the file “central-log-bucket.json”
  3. Fill in the parameters and complete stack creation steps (as indicated in the screenshot below)
  4. Verify the bucket has been created successfully and take a note of the bucket name

Step 2: Create data processing Lambda function

Use the template below to create a Lambda function in your logging account that will be used by Amazon Firehose for data transformation during the delivery process to S3. This function is based on the AWS Lambda kinesis-firehose-cloudwatch-logs-processor blueprint.

The function could be created manually from the blueprint or using the cloudformation template below. To find the blueprint navigate to Lambda -> Create -> Function -> Blueprints

This function will unzip the event message, parse it and verify that it is a valid CloudWatch log event. Additional processing can be added if needed. As this function is generic, it could be reused by all log-delivery streams.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create cloudwatch data processing lambda function",
  "Resources":{
      
    "LambdaRole": {
        "Type": "AWS::IAM::Role",
        "Properties": {
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Effect": "Allow",
                        "Principal": {
                            "Service": "lambda.amazonaws.com"
                        },
                        "Action": "sts:AssumeRole"
                    }
                ]
            },
            "Path": "/",
            "Policies": [
                {
                    "PolicyName": "firehoseCloudWatchDataProcessing",
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                "Effect": "Allow",
                                "Action": [
                                    "logs:CreateLogGroup",
                                    "logs:CreateLogStream",
                                    "logs:PutLogEvents"
                                ],
                                "Resource": "arn:aws:logs:*:*:*"
                            }
                        ]
                    }
                }
            ]
        }
    },
      
    "FirehoseDataProcessingFunction": {
        "Type": "AWS::Lambda::Function",
        "Properties": {
            "Handler": "index.handler",
            "Role": {"Fn::GetAtt": ["LambdaRole","Arn"]},
            "Description": "Firehose cloudwatch data processing",
            "Code": {
                "ZipFile" : { "Fn::Join" : ["\n", [
                  "'use strict';",
                  "const zlib = require('zlib');",
                  "function transformLogEvent(logEvent) {",
                  "       return Promise.resolve(`${logEvent.message}\n`);",
                  "}",
                  "exports.handler = (event, context, callback) => {",
                  "    Promise.all(event.records.map(r => {",
                  "        const buffer = new Buffer(r.data, 'base64');",
                  "        const decompressed = zlib.gunzipSync(buffer);",
                  "        const data = JSON.parse(decompressed);",
                  "        if (data.messageType !== 'DATA_MESSAGE') {",
                  "            return Promise.resolve({",
                  "                recordId: r.recordId,",
                  "                result: 'ProcessingFailed',",
                  "            });",
                  "         } else {",
                  "            const promises = data.logEvents.map(transformLogEvent);",
                  "            return Promise.all(promises).then(transformed => {",
                  "                const payload = transformed.reduce((a, v) => a + v, '');",
                  "                const encoded = new Buffer(payload).toString('base64');",
                  "                console.log('---------------payloadv2:'+JSON.stringify(payload, null, 2));",
                  "                return {",
                  "                    recordId: r.recordId,",
                  "                    result: 'Ok',",
                  "                    data: encoded,",
                  "                };",
                  "           });",
                  "        }",
                  "    })).then(recs => callback(null, { records: recs }));",
                    "};"

                ]]}
            },
            "Runtime": "nodejs6.10",
            "Timeout": "60"
        }
    }

  },
  "Outputs":{
   "Function" : {
      "Description": "Function ARN",
      "Value": {"Fn::GetAtt": ["FirehoseDataProcessingFunction","Arn"]},
      "Export" : { "Name" : {"Fn::Sub": "${AWS::StackName}-Function" }}
    }
  }
}

To create the function follow the steps below:

  1. Save the template file as “central-logging-lambda.json”
  2. Login to logging account and, from the CloudFormation console, select “create new stack”
  3. Import the file “central-logging-lambda.json” and click next
  4. Follow the steps to create the stack and verify successful creation
  5. Take a note of Lambda function arn from the output section

Step 3: Create log destination in logging account

Log destination is used as the target of a subscription from application accounts, log destination can be shared between multiple subscriptions however according to the architecture suggested in this solution all logs streamed to the same destination will be stored in the same S3 location, if you would like to store log data in different hierarchy or in a completely different bucket you need to create separate destinations.

As noted previously, your destination and subscription have to be in the same region

Use the template below to create destination stack in logging account.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create log destination and required resources",
  "Parameters":{

    "LogBucketName":{
      "Type":"String",
      "Default":"central-log-do-not-delete",
      "Description":"Destination logging bucket"
    },
    "LogS3Location":{
      "Type":"String",
      "Default":"<BU>/<ENV>/<SOURCE_ACCOUNT>/<LOG_TYPE>/",
      "Description":"S3 location for the logs streamed to this destination; example marketing/prod/999999999999/flow-logs/"
    },
    "ProcessingLambdaARN":{
      "Type":"String",
      "Default":"",
      "Description":"CloudWatch logs data processing function"
    },
    "SourceAccount":{
      "Type":"String",
      "Default":"",
      "Description":"Source application account number"
    }
  },
    
  "Resources":{
    "MyStream": {
      "Type": "AWS::Kinesis::Stream",
      "Properties": {
        "Name": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-Stream"] ]},
        "RetentionPeriodHours" : 48,
        "ShardCount": 1,
        "Tags": [
          {
            "Key": "Solution",
            "Value": "CentralLogging"
          }
       ]
      }
    },
    "LogRole" : {
      "Type"  : "AWS::IAM::Role",
      "Properties" : {
          "AssumeRolePolicyDocument" : {
              "Statement" : [ {
                  "Effect" : "Allow",
                  "Principal" : {
                      "Service" : [ {"Fn::Join": [ "", [ "logs.", { "Ref": "AWS::Region" }, ".amazonaws.com" ] ]} ]
                  },
                  "Action" : [ "sts:AssumeRole" ]
              } ]
          },         
          "Path" : "/service-role/"
      }
    },
      
    "LogRolePolicy" : {
        "Type" : "AWS::IAM::Policy",
        "Properties" : {
            "PolicyName" : {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-LogPolicy"] ]},
            "PolicyDocument" : {
              "Version": "2012-10-17",
              "Statement": [
                {
                  "Effect": "Allow",
                  "Action": ["kinesis:PutRecord"],
                  "Resource": [{ "Fn::GetAtt" : ["MyStream", "Arn"] }]
                },
                {
                  "Effect": "Allow",
                  "Action": ["iam:PassRole"],
                  "Resource": [{ "Fn::GetAtt" : ["LogRole", "Arn"] }]
                }
              ]
            },
            "Roles" : [ { "Ref" : "LogRole" } ]
        }
    },
      
    "LogDestination" : {
      "Type" : "AWS::Logs::Destination",
      "DependsOn" : ["MyStream","LogRole","LogRolePolicy"],
      "Properties" : {
        "DestinationName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-Destination"] ]},
        "RoleArn": { "Fn::GetAtt" : ["LogRole", "Arn"] },
        "TargetArn": { "Fn::GetAtt" : ["MyStream", "Arn"] },
        "DestinationPolicy": { "Fn::Join" : ["",[
		
				"{\"Version\" : \"2012-10-17\",\"Statement\" : [{\"Effect\" : \"Allow\",",
                " \"Principal\" : {\"AWS\" : \"", {"Ref":"SourceAccount"} ,"\"},",
                "\"Action\" : \"logs:PutSubscriptionFilter\",",
                " \"Resource\" : \"", 
                {"Fn::Join": [ "", [ "arn:aws:logs:", { "Ref": "AWS::Region" }, ":" ,{ "Ref": "AWS::AccountId" }, ":destination:",{ "Ref" : "AWS::StackName" },"-Destination" ] ]}  ,"\"}]}"

			]]}
          
          
      }
    },
      
    "S3deliveryStream": {
      "DependsOn": ["S3deliveryRole", "S3deliveryPolicy"],
      "Type": "AWS::KinesisFirehose::DeliveryStream",
      "Properties": {
        "DeliveryStreamName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-DeliveryStream"] ]},
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": { "Fn::GetAtt" : ["MyStream", "Arn"] },
            "RoleARN": {"Fn::GetAtt" : ["S3deliveryRole", "Arn"] }
        },
        "ExtendedS3DestinationConfiguration": {
          "BucketARN": {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]},
          "BufferingHints": {
            "IntervalInSeconds": "60",
            "SizeInMBs": "50"
          },
          "CompressionFormat": "UNCOMPRESSED",
          "Prefix": {"Ref": "LogS3Location"},
          "RoleARN": {"Fn::GetAtt" : ["S3deliveryRole", "Arn"] },
          "ProcessingConfiguration" : {
              "Enabled": "true",
              "Processors": [
              {
                "Parameters": [ 
                { 
                    "ParameterName": "LambdaArn",
                    "ParameterValue": {"Ref":"ProcessingLambdaARN"}
                }],
                "Type": "Lambda"
              }]
          }
        }

      }
    },
      
    "S3deliveryRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                "Service": "firehose.amazonaws.com"
              },
              "Action": "sts:AssumeRole",
              "Condition": {
                "StringEquals": {
                  "sts:ExternalId": {"Ref":"AWS::AccountId"}
                }
              }
            }
          ]
        }
      }
    },
      
    "S3deliveryPolicy": {
      "Type": "AWS::IAM::Policy",
      "Properties": {
        "PolicyName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-FirehosePolicy"] ]},
        "PolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject"
              ],
              "Resource": [
                {"Fn::Join": ["", [ {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]}]]},
                {"Fn::Join": ["", [ {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]}, "*"]]}
              ]
            },
            {
              "Effect": "Allow",
              "Action": [
                "lambda:InvokeFunction",
                "lambda:GetFunctionConfiguration",
                "logs:PutLogEvents",
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kms:Decrypt"
              ],
              "Resource": "*"
            }
          ]
        },
        "Roles": [{"Ref": "S3deliveryRole"}]
      }
    }

  },
  "Outputs":{
      
   "Destination" : {
      "Description": "Destination",
      "Value": {"Fn::Join": [ "", [ "arn:aws:logs:", { "Ref": "AWS::Region" }, ":" ,{ "Ref": "AWS::AccountId" }, ":destination:",{ "Ref" : "AWS::StackName" },"-Destination" ] ]},
      "Export" : { "Name" : {"Fn::Sub": "${AWS::StackName}-Destination" }}
    }

  }
} 

To create log your destination and all required resources, follow these steps:

  1. Save your template as “central-logging-destination.json”
  2. Login to your logging account and, from the CloudFormation console, select “create new stack”
  3. Import the file “central-logging-destination.json” and click next
  4. Fill in the parameters to configure the log destination and click Next
  5. Follow the default steps to create the stack and verify successful creation
    1. Bucket name is the same as in the “create central logging bucket” step
    2. LogS3Location is the directory hierarchy for saving log data that will be delivered to this destination
    3. ProcessingLambdaARN is as created in “create data processing Lambda function” step
    4. SourceAccount is the application account number where the subscription will be created
  6. Take a note of destination ARN as it appears in outputs section as you did above.

Step 4: Create the log subscription in your application account

In this section, we will create the subscription filter in one of the application accounts to stream logs from the CloudWatch log group to the log destination that was created in your logging account.

Create log subscription filter

The subscription filter is created between the CloudWatch log group and a destination endpoint. Asubscription could be filtered to send part (or all) of the logs in the log group. For example,you can create a subscription filter to stream only flow logs with status REJECT.

Use the CloudFormation template below to create subscription filter. Subscription filter and log destination must be in the same region.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create log subscription filter for a specific Log Group",
  "Parameters":{

    "DestinationARN":{
      "Type":"String",
      "Default":"",
      "Description":"ARN of logs destination"
    },
    "LogGroupName":{
      "Type":"String",
      "Default":"",
      "Description":"Name of LogGroup to forward logs from"
    },
    "FilterPattern":{
      "Type":"String",
      "Default":"",
      "Description":"Filter pattern to filter events to be sent to log destination; Leave empty to send all logs"
    }
  },
    
  "Resources":{
    "SubscriptionFilter" : {
      "Type" : "AWS::Logs::SubscriptionFilter",
      "Properties" : {
        "LogGroupName" : { "Ref" : "LogGroupName" },
        "FilterPattern" : { "Ref" : "FilterPattern" },
        "DestinationArn" : { "Ref" : "DestinationARN" }
      }
    }
  }
}

To create a subscription filter for one of CloudWatch log groups in your application account, follow the steps below:

  1. Save the template as “central-logging-subscription.json”
  2. Login to your application account and, from the CloudFormation console, select “create new stack”
  3. Select the file “central-logging-subscription.json” and click next
  4. Fill in the parameters as appropriate to your environment as you did above
    a.  DestinationARN is the value of obtained in “create log destination in logging account” step
    b.  FilterPatterns is the filter value for log data to be streamed to your logging account (leave empty to stream all logs in the selected log group)
    c.  LogGroupName is the log group as it appears under CloudWatch Logs
  5. Verify successful creation of the subscription

This completes the deployment process in both the logging- and application-account side. After a few minutes, log data will be streamed to the central-logging destination defined in your logging account.

Step 5: Analyzing log data

Once log data is centralized, it opens the door to run analytics on the consolidated data for business or security reasons. One of the powerful services that AWS offers is Amazon Athena.

Amazon Athena allows you to query data in S3 using standard SQL.

Follow the steps below to create a simple table and run queries on the flow logs data that has been collected from your application accounts

  1. Login to your logging account and from the Amazon Athena console, use the DDL below in your query  editor to create a new table

CREATE EXTERNAL TABLE IF NOT EXISTS prod_vpc_flow_logs (

Version INT,

Account STRING,

InterfaceId STRING,

SourceAddress STRING,

DestinationAddress STRING,

SourcePort INT,

DestinationPort INT,

Protocol INT,

Packets INT,

Bytes INT,

StartTime INT,

EndTime INT,

Action STRING,

LogStatus STRING

)

ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.RegexSerDe’

WITH SERDEPROPERTIES (

“input.regex” = “^([^ ]+)\\s+([0-9]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([0-9]+)\\s+([0-9]+)\\s+([^ ]+)\\s+([^ ]+)$”)

LOCATION ‘s3://central-logging-company-do-not-delete/’;

2. Click ”run query” and verify a successful run/ This creates the table “prod_vpc_flow_logs”

3. You can then run queries against the table data as below:

Conclusion

By following the steps I’ve outlined, you will build a central logging solution to stream CloudWatch logs from one application account to a central logging account. This solution is repeatable and could be deployed multiple times for multiple accounts and logging requirements.

 

About the Author

Mahmoud Matouk is a Senior Cloud Infrastructure Architect. He works with our customers to help accelerate migration and cloud adoption at the enterprise level.

 

Amazon Relational Database Service – Looking Back at 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-relational-database-service-looking-back-at-2017/

The Amazon RDS team launched nearly 80 features in 2017. Some of them were covered in this blog, others on the AWS Database Blog, and the rest in What’s New or Forum posts. To wrap up my week, I thought it would be worthwhile to give you an organized recap. So here we go!

Certification & Security

Features

Engine Versions & Features

Regional Support

Instance Support

Price Reductions

And That’s a Wrap
I’m pretty sure that’s everything. As you can see, 2017 was quite the year! I can’t wait to see what the team delivers in 2018.

Jeff;

 

Migrating Your Amazon ECS Containers to AWS Fargate

Post Syndicated from Tiffany Jernigan original https://aws.amazon.com/blogs/compute/migrating-your-amazon-ecs-containers-to-aws-fargate/

AWS Fargate is a new technology that works with Amazon Elastic Container Service (ECS) to run containers without having to manage servers or clusters. What does this mean? With Fargate, you no longer need to provision or manage a single virtual machine; you can just create tasks and run them directly!

Fargate uses the same API actions as ECS, so you can use the ECS console, the AWS CLI, or the ECS CLI. I recommend running through the first-run experience for Fargate even if you’re familiar with ECS. It creates all of the one-time setup requirements, such as the necessary IAM roles. If you’re using a CLI, make sure to upgrade to the latest version

In this blog, you will see how to migrate ECS containers from running on Amazon EC2 to Fargate.

Getting started

Note: Anything with code blocks is a change in the task definition file. Screen captures are from the console. Additionally, Fargate is currently available in the us-east-1 (N. Virginia) region.

Launch type

When you create tasks (grouping of containers) and clusters (grouping of tasks), you now have two launch type options: EC2 and Fargate. The default launch type, EC2, is ECS as you knew it before the announcement of Fargate. You need to specify Fargate as the launch type when running a Fargate task.

Even though Fargate abstracts away virtual machines, tasks still must be launched into a cluster. With Fargate, clusters are a logical infrastructure and permissions boundary that allow you to isolate and manage groups of tasks. ECS also supports heterogeneous clusters that are made up of tasks running on both EC2 and Fargate launch types.

The optional, new requiresCompatibilities parameter with FARGATE in the field ensures that your task definition only passes validation if you include Fargate-compatible parameters. Tasks can be flagged as compatible with EC2, Fargate, or both.

"requiresCompatibilities": [
    "FARGATE"
]

Networking

"networkMode": "awsvpc"

In November, we announced the addition of task networking with the network mode awsvpc. By default, ECS uses the bridge network mode. Fargate requires using the awsvpc network mode.

In bridge mode, all of your tasks running on the same instance share the instance’s elastic network interface, which is a virtual network interface, IP address, and security groups.

The awsvpc mode provides this networking support to your tasks natively. You now get the same VPC networking and security controls at the task level that were previously only available with EC2 instances. Each task gets its own elastic networking interface and IP address so that multiple applications or copies of a single application can run on the same port number without any conflicts.

The awsvpc mode also provides a separation of responsibility for tasks. You can get complete control of task placement within your own VPCs, subnets, and the security policies associated with them, even though the underlying infrastructure is managed by Fargate. Also, you can assign different security groups to each task, which gives you more fine-grained security. You can give an application only the permissions it needs.

"portMappings": [
    {
        "containerPort": "3000"
    }
 ]

What else has to change? First, you only specify a containerPort value, not a hostPort value, as there is no host to manage. Your container port is the port that you access on your elastic network interface IP address. Therefore, your container ports in a single task definition file need to be unique.

"environment": [
    {
        "name": "WORDPRESS_DB_HOST",
        "value": "127.0.0.1:3306"
    }
 ]

Additionally, links are not allowed as they are a property of the “bridge” network mode (and are now a legacy feature of Docker). Instead, containers share a network namespace and communicate with each other over the localhost interface. They can be referenced using the following:

localhost/127.0.0.1:<some_port_number>

CPU and memory

"memory": "1024",
 "cpu": "256"

"memory": "1gb",
 "cpu": ".25vcpu"

When launching a task with the EC2 launch type, task performance is influenced by the instance types that you select for your cluster combined with your task definition. If you pick larger instances, your applications make use of the extra resources if there is no contention.

In Fargate, you needed a way to get additional resource information so we created task-level resources. Task-level resources define the maximum amount of memory and cpu that your task can consume.

  • memory can be defined in MB with just the number, or in GB, for example, “1024” or “1gb”.
  • cpu can be defined as the number or in vCPUs, for example, “256” or “.25vcpu”.
    • vCPUs are virtual CPUs. You can look at the memory and vCPUs for instance types to get an idea of what you may have used before.

The memory and CPU options available with Fargate are:

CPUMemory
256 (.25 vCPU)0.5GB, 1GB, 2GB
512 (.5 vCPU)1GB, 2GB, 3GB, 4GB
1024 (1 vCPU)2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB
2048 (2 vCPU)Between 4GB and 16GB in 1GB increments
4096 (4 vCPU)Between 8GB and 30GB in 1GB increments

IAM roles

Because Fargate uses awsvpc mode, you need an Amazon ECS service-linked IAM role named AWSServiceRoleForECS. It provides Fargate with the needed permissions, such as the permission to attach an elastic network interface to your task. After you create your service-linked IAM role, you can delete the remaining roles in your services.

"executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole"

With the EC2 launch type, an instance role gives the agent the ability to pull, publish, talk to ECS, and so on. With Fargate, the task execution IAM role is only needed if you’re pulling from Amazon ECR or publishing data to Amazon CloudWatch Logs.

The Fargate first-run experience tutorial in the console automatically creates these roles for you.

Volumes

Fargate currently supports non-persistent, empty data volumes for containers. When you define your container, you no longer use the host field and only specify a name.

Load balancers

For awsvpc mode, and therefore for Fargate, use the IP target type instead of the instance target type. You define this in the Amazon EC2 service when creating a load balancer.

If you’re using a Classic Load Balancer, change it to an Application Load Balancer or a Network Load Balancer.

Tip: If you are using an Application Load Balancer, make sure that your tasks are launched in the same VPC and Availability Zones as your load balancer.

Let’s migrate a task definition!

Here is an example NGINX task definition. This type of task definition is what you’re used to if you created one before Fargate was announced. It’s what you would run now with the EC2 launch type.

{
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "hostPort": "80",
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "nginx-ec2"
}

OK, so now what do you need to do to change it to run with the Fargate launch type?

  • Add FARGATE for requiredCompatibilities (not required, but a good safety check for your task definition).
  • Use awsvpc as the network mode.
  • Just specify the containerPort (the hostPortvalue is the same).
  • Add a task executionRoleARN value to allow logging to CloudWatch.
  • Provide cpu and memory limits for the task.
{
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "networkMode": "awsvpc",
    "executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole",
    "family": "nginx-fargate",
    "memory": "512",
    "cpu": "256"
}

Are there more examples?

Yep! Head to the AWS Samples GitHub repo. We have several sample task definitions you can try for both the EC2 and Fargate launch types. Contributions are very welcome too :).

 

tiffany jernigan
@tiffanyfayj

Invoking AWS Lambda from Amazon MQ

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/invoking-aws-lambda-from-amazon-mq/

Contributed by Josh Kahn, AWS Solutions Architect

Message brokers can be used to solve a number of needs in enterprise architectures, including managing workload queues and broadcasting messages to a number of subscribers. Amazon MQ is a managed message broker service for Apache ActiveMQ that makes it easy to set up and operate message brokers in the cloud.

In this post, I discuss one approach to invoking AWS Lambda from queues and topics managed by Amazon MQ brokers. This and other similar patterns can be useful in integrating legacy systems with serverless architectures. You could also integrate systems already migrated to the cloud that use common APIs such as JMS.

For example, imagine that you work for a company that produces training videos and which recently migrated its video management system to AWS. The on-premises system used to publish a message to an ActiveMQ broker when a video was ready for processing by an on-premises transcoder. However, on AWS, your company uses Amazon Elastic Transcoder. Instead of modifying the management system, Lambda polls the broker for new messages and starts a new Elastic Transcoder job. This approach avoids changes to the existing application while refactoring the workload to leverage cloud-native components.

This solution uses Amazon CloudWatch Events to trigger a Lambda function that polls the Amazon MQ broker for messages. Instead of starting an Elastic Transcoder job, the sample writes the received message to an Amazon DynamoDB table with a time stamp indicating the time received.

Getting started

To start, navigate to the Amazon MQ console. Next, launch a new Amazon MQ instance, selecting Single-instance Broker and supplying a broker name, user name, and password. Be sure to document the user name and password for later.

For the purposes of this sample, choose the default options in the Advanced settings section. Your new broker is deployed to the default VPC in the selected AWS Region with the default security group. For this post, you update the security group to allow access for your sample Lambda function. In a production scenario, I recommend deploying both the Lambda function and your Amazon MQ broker in your own VPC.

After several minutes, your instance changes status from “Creation Pending” to “Available.” You can then visit the Details page of your broker to retrieve connection information, including a link to the ActiveMQ web console where you can monitor the status of your broker, publish test messages, and so on. In this example, use the Stomp protocol to connect to your broker. Be sure to capture the broker host name, for example:

<BROKER_ID>.mq.us-east-1.amazonaws.com

You should also modify the Security Group for the broker by clicking on its Security Group ID. Click the Edit button and then click Add Rule to allow inbound traffic on port 8162 for your IP address.

Deploying and scheduling the Lambda function

To simplify the deployment of this example, I’ve provided an AWS Serverless Application Model (SAM) template that deploys the sample function and DynamoDB table, and schedules the function to be invoked every five minutes. Detailed instructions can be found with sample code on GitHub in the amazonmq-invoke-aws-lambda repository, with sample code. I discuss a few key aspects in this post.

First, SAM makes it easy to deploy and schedule invocation of our function:

SubscriberFunction:
	Type: AWS::Serverless::Function
	Properties:
		CodeUri: subscriber/
		Handler: index.handler
		Runtime: nodejs6.10
		Role: !GetAtt SubscriberFunctionRole.Arn
		Timeout: 15
		Environment:
			Variables:
				HOST: !Ref AmazonMQHost
				LOGIN: !Ref AmazonMQLogin
				PASSWORD: !Ref AmazonMQPassword
				QUEUE_NAME: !Ref AmazonMQQueueName
				WORKER_FUNCTIOn: !Ref WorkerFunction
		Events:
			Timer:
				Type: Schedule
				Properties:
					Schedule: rate(5 minutes)

WorkerFunction:
Type: AWS::Serverless::Function
	Properties:
		CodeUri: worker/
		Handler: index.handler
		Runtime: nodejs6.10
Role: !GetAtt WorkerFunctionRole.Arn
		Environment:
			Variables:
				TABLE_NAME: !Ref MessagesTable

In the code, you include the URI, user name, and password for your newly created Amazon MQ broker. These allow the function to poll the broker for new messages on the sample queue.

The sample Lambda function is written in Node.js, but clients exist for a number of programming languages.

stomp.connect(options, (error, client) => {
	if (error) { /* do something */ }

	let headers = {
		destination: ‘/queue/SAMPLE_QUEUE’,
		ack: ‘auto’
	}

	client.subscribe(headers, (error, message) => {
		if (error) { /* do something */ }

		message.readString(‘utf-8’, (error, body) => {
			if (error) { /* do something */ }

			let params = {
				FunctionName: MyWorkerFunction,
				Payload: JSON.stringify({
					message: body,
					timestamp: Date.now()
				})
			}

			let lambda = new AWS.Lambda()
			lambda.invoke(params, (error, data) => {
				if (error) { /* do something */ }
			})
		}
})
})

Sending a sample message

For the purpose of this example, use the Amazon MQ console to send a test message. Navigate to the details page for your broker.

About midway down the page, choose ActiveMQ Web Console. Next, choose Manage ActiveMQ Broker to launch the admin console. When you are prompted for a user name and password, use the credentials created earlier.

At the top of the page, choose Send. From here, you can send a sample message from the broker to subscribers. For this example, this is how you generate traffic to test the end-to-end system. Be sure to set the Destination value to “SAMPLE_QUEUE.” The message body can contain any text. Choose Send.

You now have a Lambda function polling for messages on the broker. To verify that your function is working, you can confirm in the DynamoDB console that the message was successfully received and processed by the sample Lambda function.

First, choose Tables on the left and select the table name “amazonmq-messages” in the middle section. With the table detail in view, choose Items. If the function was successful, you’ll find a new entry similar to the following:

If there is no message in DynamoDB, check again in a few minutes or review the CloudWatch Logs group for Lambda functions that contain debug messages.

Alternative approaches

Beyond the approach described here, you may consider other approaches as well. For example, you could use an intermediary system such as Apache Flume to pass messages from the broker to Lambda or deploy Apache Camel to trigger Lambda via a POST to API Gateway. There are trade-offs to each of these approaches. My goal in using CloudWatch Events was to introduce an easily repeatable pattern familiar to many Lambda developers.

Summary

I hope that you have found this example of how to integrate AWS Lambda with Amazon MQ useful. If you have expertise or legacy systems that leverage APIs such as JMS, you may find this useful as you incorporate serverless concepts in your enterprise architectures.

To learn more, see the Amazon MQ website and Developer Guide. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.