
Developing enterprise application patterns with the AWS CDK

Post Syndicated from Krishnakumar Rengarajan original https://aws.amazon.com/blogs/devops/developing-application-patterns-cdk/

Enterprises often need to standardize their infrastructure as code (IaC) for governance, compliance, and quality control reasons. You also need to manage and centrally publish updates to your IaC libraries. In this post, we demonstrate how to use the AWS Cloud Development Kit (AWS CDK) to define patterns for IaC and publish them for consumption in controlled releases using AWS CodeArtifact.

AWS CDK is an open-source software development framework to model and provision cloud application resources in programming languages such as TypeScript, JavaScript, Python, Java, and C#/.Net. The basic building blocks of AWS CDK are called constructs, which map to one or more AWS resources, and can be composed of other constructs. Constructs allow high-level abstractions to be defined as patterns. You can synthesize constructs into AWS CloudFormation templates and deploy them into an AWS account.

CodeArtifact is a fully managed service for managing the lifecycle of software artifacts. You can use CodeArtifact to securely store, publish, and share software artifacts. Software artifacts are stored in repositories, which are aggregated into a domain. A CodeArtifact domain allows organizational policies to be applied across multiple repositories. You can use CodeArtifact with common build tools and package managers such as Maven, Gradle, npm, yarn, pip, and twine.

Solution overview

In this solution, we complete the following steps:

  1. Create two AWS CDK pattern constructs in TypeScript: one for traditional three-tier web applications and a second for serverless web applications.
  2. Publish the pattern constructs to CodeArtifact as npm packages. npm is the package manager for Node.js.
  3. Consume the pattern construct npm packages from CodeArtifact and use them to provision the AWS infrastructure.

We provide more information about the pattern constructs in the following sections. The source code mentioned in this post is available on GitHub.

Traditional three-tier web application construct

The first pattern construct is for a traditional three-tier web application running on Amazon Elastic Compute Cloud (Amazon EC2), with AWS resources consisting of an Application Load Balancer, an Auto Scaling group and EC2 launch configuration, an Amazon Relational Database Service (Amazon RDS) or Amazon Aurora database, and AWS Secrets Manager. The following diagram illustrates this architecture.

 

Traditional stack architecture

Serverless web application construct

The second pattern construct is for a serverless application with AWS resources in AWS Lambda, Amazon API Gateway, and Amazon DynamoDB.

Serverless application architecture

Publishing and consuming pattern constructs

Both constructs are written in TypeScript and published to CodeArtifact as npm packages. The construct packages follow a semantic versioning scheme. After a package is published to CodeArtifact, teams can consume it to deploy AWS resources. The following diagram illustrates this architecture.

Pattern constructs

Prerequisites

Before getting started, complete the following steps:

  1. Clone the code from the GitHub repository for the traditional and serverless web application constructs:
    git clone git@github.com:aws-samples/aws-cdk-developing-application-patterns-blog.git
    cd aws-cdk-developing-application-patterns-blog
  2. Configure AWS Identity and Access Management (IAM) permissions by attaching IAM policies to the user, group, or role implementing this solution. The following policy files are in the iam folder in the root of the cloned repo:
    • BlogPublishArtifacts.json – The IAM policy to configure CodeArtifact and publish packages to it.
    • BlogConsumeTraditional.json – The IAM policy to consume the traditional three-tier web application construct from CodeArtifact and deploy it to an AWS account.
    • PublishArtifacts.json – The IAM policy to consume the serverless construct from CodeArtifact and deploy it to an AWS account.

Configuring CodeArtifact

In this step, we configure CodeArtifact for publishing the pattern constructs as npm packages. The following AWS resources are created:

  • A CodeArtifact domain named blog-domain
  • Two CodeArtifact repositories:
    • blog-npm-store – For configuring the upstream npm repository.
    • blog-repository – For publishing custom packages.

Deploy the CodeArtifact resources with the following code:

cd prerequisites/
rm -rf package-lock.json node_modules
npm install
cdk deploy --require-approval never
cd ..

Log in to the blog-repository. This step is needed for publishing and consuming the npm packages. See the following code:

aws codeartifact login \
     --tool npm \
     --domain blog-domain \
     --domain-owner $(aws sts get-caller-identity --output text --query 'Account') \
     --repository blog-repository

Publishing the pattern constructs

  1. Change the directory to the serverless construct:
    cd serverless
  2. Install the required npm packages:
    rm package-lock.json && rm -rf node_modules ;
    npm install ;
    
  3. Build the npm project:
    npm run build
  4. Publish the construct npm package to the CodeArtifact repository:
    npm publish

    Repeat the preceding steps to build and publish the traditional (load balancer plus Amazon EC2) web app construct by running the same commands in the traditional directory.

    If the publishing is successful, you see messages like the following screenshots. The following screenshot shows the traditional infrastructure.

    The following screenshot shows the message for the serverless infrastructure.

    We just published version 1.0.1 of both the traditional and serverless web app constructs. To release a new version, we can simply update the version attribute in the package.json file in the traditional or serverless folder and repeat the last two steps.

    The following code snippet is for the traditional construct:

    {
        "name": "traditional-infrastructure",
        "main": "lib/index.js",
        "files": [
            "lib/*.js",
            "src"
        ],
        "types": "lib/index.d.ts",
        "version": "1.0.1",
    ...
    }

    The following code snippet is for the serverless construct:

    {
        "name": "serverless-infrastructure",
        "main": "lib/index.js",
        "files": [
            "lib/*.js",
            "src"
        ],
        "types": "lib/index.d.ts",
        "version": "1.0.1",
    ...
    }
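
The release step described above (edit the version attribute, then rebuild and publish) can be sketched in a few lines. This is only an illustration in Python, not part of the sample repository: the helper name bump_version is hypothetical, and it assumes a plain MAJOR.MINOR.PATCH version string without pre-release suffixes.

```python
import json

def bump_version(package_json_text, part="patch"):
    """Return package.json text with its semantic version incremented.

    Hypothetical helper for illustration; assumes a plain
    MAJOR.MINOR.PATCH version such as "1.0.1".
    """
    package = json.loads(package_json_text)
    major, minor, patch = (int(n) for n in package["version"].split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part}")
    package["version"] = f"{major}.{minor}.{patch}"
    return json.dumps(package, indent=4)

# A trimmed package.json using the name and version shown in the post.
example = '{"name": "serverless-infrastructure", "version": "1.0.1"}'
print(json.loads(bump_version(example))["version"])
```

After writing the updated text back to package.json, running npm run build and npm publish releases the new version to the repository.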

Consuming the pattern constructs from CodeArtifact

In this step, we demonstrate how the pattern constructs published in the previous steps can be consumed and used to provision AWS infrastructure.

  1. From the root of the cloned repository, change the directory to the examples directory containing code for consuming traditional or serverless constructs. To consume the traditional construct, use the following code:
    cd examples/traditional

    To consume the serverless construct, use the following code:

    cd examples/serverless
  2. Open the package.json file in either directory and note that the packages we consume are listed in the dependencies section, along with their versions.
    The following code shows the traditional web app construct dependencies:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "traditional-infrastructure": "1.0.1",
        "aws-cdk": "1.47.0"
    }

    The following code shows the serverless web app construct dependencies:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "serverless-infrastructure": "1.0.1",
        "aws-cdk": "1.47.0"
    }
  3. Install the pattern artifact npm package along with the dependencies:
    rm package-lock.json && rm -rf node_modules
    npm install
    
  4. As an optional step, build the npm project. The following command builds the Lambda function source code:
    npm run build

    This step is applicable if the Lambda function is being overridden.

  5. Bootstrap the project with the following code:
    cdk bootstrap

    This step is applicable for serverless applications only. It creates the Amazon Simple Storage Service (Amazon S3) staging bucket where the Lambda function code and artifacts are stored.

  6. Deploy the construct:
    cdk deploy --require-approval never

    If the deployment is successful, you see messages similar to the following screenshots. The following screenshot shows the traditional stack output, with the URL of the Load Balancer endpoint.

    Traditional CloudFormation stack outputs

    The following screenshot shows the serverless stack output, with the URL of the API Gateway endpoint.

    Serverless CloudFormation stack outputs

    You can test the endpoint for both constructs using a web browser or the following curl command:

    curl <endpoint output>

    The traditional web app endpoint returns a response similar to the following:

    [{"app": "traditional", "id": 1605186496, "purpose": "blog"}]

    The serverless stack returns two outputs. Use the output named ServerlessStack-v1.Api. The endpoint returns a response similar to the following:

    [{"purpose":"blog","app":"serverless","itemId":"1605190688947"}]

  7. Optionally, upgrade to a new version of the pattern construct.
    Let’s assume that a new version of the serverless construct, version 1.0.2, has been published, and we want to upgrade our AWS infrastructure to this version. To do this, edit the package.json file and change the serverless-infrastructure package version in the dependencies section to 1.0.2 (for the traditional construct, change traditional-infrastructure instead). See the following code example:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "serverless-infrastructure": "1.0.2",
        "aws-cdk": "1.47.0"
    }

    To update the serverless-infrastructure package to 1.0.2, run the following command:

    npm update

    Then redeploy the CloudFormation stack:

    cdk deploy --require-approval never

Cleaning up

To avoid incurring future charges, clean up the resources you created.

  1. Delete all AWS resources that were created using the pattern constructs. We can use the AWS CDK toolkit to clean up all the resources:
    cdk destroy --force

    For more information about the AWS CDK toolkit, see Toolkit reference. Alternatively, delete the stack on the AWS CloudFormation console.

  2. Delete the CodeArtifact resources by deleting the CloudFormation stack that was deployed via AWS CDK:
    cd prerequisites
    cdk destroy --force
    

Conclusion

In this post, we demonstrated how to publish AWS CDK pattern constructs to CodeArtifact as npm packages. We also showed how teams can consume the published pattern constructs and use them to provision their AWS infrastructure.

This mechanism allows your infrastructure for AWS services to be provisioned from the configuration that has been vetted for quality control and security and governance checks. It also provides control over when new versions of the pattern constructs are released, and when the teams consuming the constructs can upgrade to the newly released versions.

About the Authors

Usman Umar

 

Usman Umar is a Sr. Applications Architect at AWS Professional Services. He is passionate about developing innovative ways to solve hard technical problems for the customers. In his free time, he likes going on biking trails, doing car modifications, and spending time with his family.

 

 

Krishnakumar Rengarajan

 

Krishnakumar Rengarajan is a DevOps Consultant with AWS Professional Services. He enjoys working with customers and focuses on building and delivering automated solutions that enable customers on their AWS cloud journeys.

Optimizing AWS Lambda cost and performance using AWS Compute Optimizer

Post Syndicated from Chad Schmutzer original https://aws.amazon.com/blogs/compute/optimizing-aws-lambda-cost-and-performance-using-aws-compute-optimizer/

This post is authored by Brooke Chen, Senior Product Manager for AWS Compute Optimizer, Letian Feng, Principal Product Manager for AWS Compute Optimizer, and Chad Schmutzer, Principal Developer Advocate for Amazon EC2

Optimizing compute resources is a critical component of any application architecture. Over-provisioning compute can lead to unnecessary infrastructure costs, while under-provisioning compute can lead to poor application performance.

Launched in December 2019, AWS Compute Optimizer is a recommendation service for optimizing the cost and performance of AWS compute resources. It generates actionable optimization recommendations tailored to your specific workloads. Over the last year, thousands of AWS customers reduced compute costs up to 25% by using Compute Optimizer to help choose the optimal Amazon EC2 instance types for their workloads.

One of the most frequent requests from customers is for AWS Lambda recommendations in Compute Optimizer. Today, we announce that Compute Optimizer now supports memory size recommendations for Lambda functions. This allows you to reduce costs and increase performance for your Lambda-based serverless workloads. To get started, opt in for Compute Optimizer to start finding recommendations.

Overview

With Lambda, there are no servers to manage, it scales automatically, and you only pay for what you use. However, choosing the right memory size settings for a Lambda function is still an important task. Compute Optimizer uses machine learning-based memory recommendations to help with this task.

These recommendations are available through the Compute Optimizer console, AWS CLI, AWS SDK, and the Lambda console. Compute Optimizer continuously monitors Lambda functions, using historical performance metrics to improve recommendations over time. In this blog post, we walk through an example to show how to use this feature.

Using Compute Optimizer for Lambda

This tutorial uses the AWS CLI v2 and the AWS Management Console.

In this tutorial, we set up two compute jobs that run every minute in the AWS Region US East (N. Virginia). One job is more CPU intensive than the other. Initial tests show that invocations of both jobs typically last less than 60 seconds. The goal is to either reduce cost without much increase in duration, or reduce the duration in a cost-efficient manner.

Based on these requirements, a serverless solution can help with this task. Amazon EventBridge can schedule the Lambda functions using rules. To ensure that the functions are optimized for cost and performance, you can use the memory recommendation support in Compute Optimizer.

In your AWS account, opt in to Compute Optimizer to start analyzing AWS resources. Ensure you have the appropriate IAM permissions configured – follow these steps for guidance. If you prefer to use the console to opt in, follow these steps. To opt in, enter the following command in a terminal window:

$ aws compute-optimizer update-enrollment-status --status Active

Once you enable Compute Optimizer, it starts to scan for functions that have been invoked at least 50 times over the trailing 14 days. The next section shows two example scheduled Lambda functions for analysis.

Example Lambda functions

The code for the non-CPU intensive job is below. A Lambda function named lambda-recommendation-test-sleep is created with memory size configured as 1024 MB. An EventBridge rule is created to trigger the function on a recurring 1-minute schedule:

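import json
import time

def lambda_handler(event, context):
  # Idle for 30 seconds to simulate a job that waits rather than computes.
  time.sleep(30)
  # Allocate a large list (about 800 MB on 64-bit CPython) to drive up memory use.
  x=[0]*100000000
  return {
    'statusCode': 200,
    'body': json.dumps('Hello World!')
  }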

The code for the CPU intensive job is below. A Lambda function named lambda-recommendation-test-busy is created with memory size configured as 128 MB. An EventBridge rule is created to trigger the function on a recurring 1-minute schedule:

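import json
import random

def lambda_handler(event, context):
  # Fixed seed so every invocation performs the same amount of work.
  random.seed(1)
  x=0
  # Sum 20 million random numbers to keep the CPU busy.
  for i in range(0, 20000000):
    x+=random.random()

  return {
    'statusCode': 200,
    'body': json.dumps('Sum:' + str(x))
  }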

Understanding the Compute Optimizer recommendations

Compute Optimizer needs a history of at least 50 invocations of a Lambda function over the trailing 14 days to deliver recommendations. Recommendations are created by analyzing function metadata such as memory size, timeout, and runtime, in addition to CloudWatch metrics such as number of invocations, duration, error count, and success rate.
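
As a rough illustration of the history requirement, the following Python sketch checks whether a list of invocation timestamps meets the threshold of 50 invocations in the trailing 14 days. The function name has_enough_history and the sample schedules are hypothetical, and the sketch says nothing about how Compute Optimizer counts invocations internally.

```python
from datetime import datetime, timedelta, timezone

MIN_INVOCATIONS = 50  # threshold stated in the post
LOOKBACK_DAYS = 14    # trailing window stated in the post

def has_enough_history(invocation_times, now):
    """Return True if enough invocations fall inside the lookback window."""
    window_start = now - timedelta(days=LOOKBACK_DAYS)
    recent = [t for t in invocation_times if window_start <= t <= now]
    return len(recent) >= MIN_INVOCATIONS

# Hypothetical schedules: one function runs every minute, one runs daily.
now = datetime(2020, 12, 23, tzinfo=timezone.utc)
every_minute = [now - timedelta(minutes=i) for i in range(60)]
once_a_day = [now - timedelta(days=i) for i in range(30)]

print(has_enough_history(every_minute, now))  # True: 60 invocations in the last hour
print(has_enough_history(once_a_day, now))    # False: only 15 fall inside 14 days
```

A function on a 1-minute schedule, like the two examples in this post, crosses the threshold in under an hour.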

Compute Optimizer will gather the necessary information to provide memory recommendations for Lambda functions, and make them available within 48 hours. Afterwards, these recommendations will be refreshed daily.

These are recent invocations for the non-CPU intensive function:

Recent invocations for the non-CPU intensive function

Function duration is approximately 31.3 seconds with a memory setting of 1024 MB, resulting in a duration cost of about $0.00052 per invocation. Here are the recommendations for this function in the Compute Optimizer console:

Recommendations for this function in the Compute Optimizer console

The function is Not optimized with a reason of Memory over-provisioned. You can also fetch the same recommendation information via the CLI:

$ aws compute-optimizer \
  get-lambda-function-recommendations \
  --function-arns arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-sleep
{
    "lambdaFunctionRecommendations": [
        {
            "utilizationMetrics": [
                {
                    "name": "Duration",
                    "value": 31333.63587049883,
                    "statistic": "Average"
                },
                {
                    "name": "Duration",
                    "value": 32522.04,
                    "statistic": "Maximum"
                },
                {
                    "name": "Memory",
                    "value": 817.67049838188,
                    "statistic": "Average"
                },
                {
                    "name": "Memory",
                    "value": 819.0,
                    "statistic": "Maximum"
                }
            ],
            "currentMemorySize": 1024,
            "lastRefreshTimestamp": 1608735952.385,
            "numberOfInvocations": 3090,
            "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-sleep:$LATEST",
            "memorySizeRecommendationOptions": [
                {
                    "projectedUtilizationMetrics": [
                        {
                            "name": "Duration",
                            "value": 30015.113193697029,
                            "statistic": "LowerBound"
                        },
                        {
                            "name": "Duration",
                            "value": 31515.86878891883,
                            "statistic": "Expected"
                        },
                        {
                            "name": "Duration",
                            "value": 33091.662123300975,
                            "statistic": "UpperBound"
                        }
                    ],
                    "memorySize": 900,
                    "rank": 1
                }
            ],
            "functionVersion": "$LATEST",
            "finding": "NotOptimized",
            "findingReasonCodes": [
                "MemoryOverprovisioned"
            ],
            "lookbackPeriodInDays": 14.0,
            "accountId": "123456789012"
        }
    ]
}

The Compute Optimizer recommendation contains useful information about the function. Most importantly, it has determined that the function is over-provisioned for memory. The attribute findingReasonCodes shows the value MemoryOverprovisioned. In memorySizeRecommendationOptions, Compute Optimizer has found that using a memory size of 900 MB results in an expected invocation duration of approximately 31.5 seconds.
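
If you save the CLI output as JSON, a short script can pull out the fields discussed above. The following Python sketch is one possible approach, assuming the response shape shown in this post; the function name top_recommendation is hypothetical, and the sample data is a trimmed copy of the response above.

```python
import json

def top_recommendation(response):
    """Summarize the best-ranked memory option for each function."""
    summaries = []
    for rec in response["lambdaFunctionRecommendations"]:
        # Rank 1 is the top option; pick the lowest rank present.
        best = min(rec["memorySizeRecommendationOptions"], key=lambda o: o["rank"])
        expected_ms = next(
            m["value"]
            for m in best["projectedUtilizationMetrics"]
            if m["name"] == "Duration" and m["statistic"] == "Expected"
        )
        summaries.append({
            "function_arn": rec["functionArn"],
            "finding_reasons": rec["findingReasonCodes"],
            "current_memory_mb": rec["currentMemorySize"],
            "recommended_memory_mb": best["memorySize"],
            "expected_duration_s": round(expected_ms / 1000, 1),
        })
    return summaries

# Trimmed copy of the CLI response shown in this post.
response = json.loads("""
{
  "lambdaFunctionRecommendations": [
    {
      "currentMemorySize": 1024,
      "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-sleep:$LATEST",
      "memorySizeRecommendationOptions": [
        {
          "projectedUtilizationMetrics": [
            {"name": "Duration", "value": 30015.113193697029, "statistic": "LowerBound"},
            {"name": "Duration", "value": 31515.86878891883, "statistic": "Expected"},
            {"name": "Duration", "value": 33091.662123300975, "statistic": "UpperBound"}
          ],
          "memorySize": 900,
          "rank": 1
        }
      ],
      "finding": "NotOptimized",
      "findingReasonCodes": ["MemoryOverprovisioned"]
    }
  ]
}
""")

for summary in top_recommendation(response):
    print(summary)
```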

For non-CPU intensive jobs, reducing the memory setting of the function often doesn’t have a negative impact on function duration. The recommendation confirms that you can reduce the memory size from 1024 MB to 900 MB, saving cost without significantly impacting duration. The new duration cost per invocation is approximately 12% lower.

The Compute Optimizer console validates these calculations:

Compute Optimizer console validates these calculations

These are recent invocations for the second function which is CPU-intensive:

Recent invocations for the second function which is CPU-intensive

The function duration is about 37.5 seconds with a memory setting of 128 MB, resulting in a duration cost of about $0.000078 per invocation. The recommendations for this function appear in the Compute Optimizer console:

Recommendations for this function appear in the Compute Optimizer console

The function is also Not optimized with a reason of Memory under-provisioned. The same recommendation information is available via the CLI:

$ aws compute-optimizer \
  get-lambda-function-recommendations \
  --function-arns arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-busy
{
    "lambdaFunctionRecommendations": [
        {
            "utilizationMetrics": [
                {
                    "name": "Duration",
                    "value": 36006.85851551957,
                    "statistic": "Average"
                },
                {
                    "name": "Duration",
                    "value": 38540.43,
                    "statistic": "Maximum"
                },
                {
                    "name": "Memory",
                    "value": 53.75978407557355,
                    "statistic": "Average"
                },
                {
                    "name": "Memory",
                    "value": 55.0,
                    "statistic": "Maximum"
                }
            ],
            "currentMemorySize": 128,
            "lastRefreshTimestamp": 1608725151.752,
            "numberOfInvocations": 741,
            "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-busy:$LATEST",
            "memorySizeRecommendationOptions": [
                {
                    "projectedUtilizationMetrics": [
                        {
                            "name": "Duration",
                            "value": 27340.37604781184,
                            "statistic": "LowerBound"
                        },
                        {
                            "name": "Duration",
                            "value": 28707.394850202432,
                            "statistic": "Expected"
                        },
                        {
                            "name": "Duration",
                            "value": 30142.764592712556,
                            "statistic": "UpperBound"
                        }
                    ],
                    "memorySize": 160,
                    "rank": 1
                }
            ],
            "functionVersion": "$LATEST",
            "finding": "NotOptimized",
            "findingReasonCodes": [
                "MemoryUnderprovisioned"
            ],
            "lookbackPeriodInDays": 14.0,
            "accountId": "123456789012"
        }
    ]
}

For this function, Compute Optimizer has determined that the function’s memory is under-provisioned. The value of findingReasonCodes is MemoryUnderprovisioned. The recommendation is to increase the memory from 128 MB to 160 MB.

This recommendation may seem counter-intuitive, since the function only uses 55 MB of memory per invocation. However, Lambda allocates CPU and other resources linearly in proportion to the amount of memory configured. This means that increasing the memory allocation to 160 MB also reduces the expected duration to around 28.7 seconds. This is because a CPU-intensive task also benefits from the increased CPU performance that comes with the additional memory.

After applying this recommendation, the new expected duration cost per invocation is approximately $0.000075. This means that for almost no change in duration cost, the job latency is reduced from 37.5 seconds to 28.7 seconds.
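
The duration costs quoted for both functions can be reproduced with simple arithmetic: duration in seconds, times configured memory in GB, times a price per GB-second. The following Python sketch assumes a rate of $0.0000166667 per GB-second, which is not stated in this post, so check current Lambda pricing for your Region and architecture. Per-request charges are excluded.

```python
# Assumed price in USD per GB-second; not taken from this post.
PRICE_PER_GB_SECOND = 0.0000166667

def duration_cost(duration_seconds, memory_mb):
    """Estimate the duration charge for one invocation (request fee excluded)."""
    return duration_seconds * (memory_mb / 1024) * PRICE_PER_GB_SECOND

# Non-CPU intensive function: 1024 MB today, 900 MB recommended.
sleep_before = duration_cost(31.3, 1024)
sleep_after = duration_cost(31.5, 900)
sleep_savings = 1 - sleep_after / sleep_before

# CPU-intensive function: 128 MB today, 160 MB recommended.
busy_before = duration_cost(37.5, 128)
busy_after = duration_cost(28.7, 160)

print(f"sleep: ${sleep_before:.6f} -> ${sleep_after:.6f} ({sleep_savings:.1%} lower)")
print(f"busy:  ${busy_before:.6f} -> ${busy_after:.6f}")
```

Under that assumed rate, the results line up with the figures in the text: roughly $0.00052 per invocation and about 12% savings for the first function, and roughly $0.000078 versus $0.000075 for the second, which runs about 9 seconds faster at almost the same cost.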

The Compute Optimizer console validates these calculations:

Compute Optimizer console validates these calculations

Applying the Compute Optimizer recommendations

To optimize the Lambda functions using Compute Optimizer recommendations, use the following CLI command:

$ aws lambda update-function-configuration \
  --function-name lambda-recommendation-test-sleep \
  --memory-size 900

After invoking the function multiple times, we can see metrics of these invocations in the console. This shows that the function duration has not changed significantly after reducing the memory size from 1024 MB to 900 MB. The Lambda function has been successfully cost-optimized without increasing job duration:

Console shows the metrics from recent invocations

To apply the recommendation to the CPU-intensive function, use the following CLI command:

$ aws lambda update-function-configuration \
  --function-name lambda-recommendation-test-busy \
  --memory-size 160

After invoking the function multiple times, the console shows that the invocation duration is reduced to about 28 seconds. This matches the recommendation’s expected duration. This shows that the function is now performance-optimized without a significant cost increase:

Console shows that the invocation duration is reduced to about 28 seconds
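
If you apply recommendations to many functions, you may prefer to generate the CLI commands instead of typing them. The following Python sketch builds the same update-function-configuration command shown above from a function ARN and a recommended memory size. The helper name update_memory_command is hypothetical, and it only prints the command rather than running it, so you can review it first.

```python
import shlex

def update_memory_command(function_arn, memory_size_mb):
    """Build the AWS CLI command that applies a memory size recommendation."""
    # ARN layout: arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
    parts = function_arn.split(":")
    if len(parts) < 7 or parts[5] != "function":
        raise ValueError(f"not a Lambda function ARN: {function_arn}")
    function_name = parts[6]
    args = [
        "aws", "lambda", "update-function-configuration",
        "--function-name", function_name,
        "--memory-size", str(memory_size_mb),
    ]
    # shlex.join quotes arguments where needed for safe pasting into a shell.
    return shlex.join(args)

arn = "arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-busy:$LATEST"
print(update_memory_command(arn, 160))
```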

Final notes

A couple of final notes:

  • Not every function will receive a recommendation. Compute Optimizer only delivers recommendations when it has high confidence that these recommendations may help reduce cost or reduce execution duration.
  • As with any changes you make to an environment, we strongly advise that you test recommended memory size configurations before applying them into production.

Conclusion

You can now use Compute Optimizer for serverless workloads using Lambda functions. This can help identify the optimal Lambda function configuration options for your workloads. Compute Optimizer supports memory size recommendations for Lambda functions in all AWS Regions where Compute Optimizer is available. These recommendations are available to you at no additional cost. You can get started with Compute Optimizer from the console.

To learn more visit Getting started with AWS Compute Optimizer.

 

Announcing Amazon Managed Service for Grafana (in Preview)

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/announcing-amazon-managed-grafana-service-in-preview/

Today, in partnership with Grafana Labs, we are excited to announce the preview of Amazon Managed Service for Grafana (AMG), a fully managed service that makes it easy to create on-demand, scalable, and secure Grafana workspaces to visualize and analyze your data from multiple sources.

Grafana is one of the most popular open source technologies used to create observability dashboards for your applications. It has a pluggable data source model and support for different kinds of time series databases and cloud monitoring vendors. Grafana centralizes your application data from multiple open-source, cloud, and third-party data sources.

Many of our customers love Grafana, but don’t want the burden of self-hosting and managing it. AMG manages the provisioning, setup, scaling, version upgrades and security patching of Grafana, eliminating the need for customers to do it themselves. AMG automatically scales to support thousands of users with high availability.

With AMG, you will get a fully managed and secure data visualization service where you can query, correlate, and visualize operational metrics, logs and traces across multiple data sources including cloud services such as AWS, Google, and Microsoft. AMG is integrated with AWS data sources, such as Amazon CloudWatch, Amazon Elasticsearch Service, AWS X-Ray, AWS IoT SiteWise, Amazon Timestream, and others to collect operational data in a simple way. Additionally, AMG also provides plug-ins to connect to popular third-party data sources, such as Datadog, Splunk, ServiceNow, and New Relic by upgrading to Grafana Enterprise directly from the AWS Console.

Screenshot for creating and configuring a managed Grafana workspace

AMG integrates directly into your AWS Organizations. You can define an AMG workspace in one AWS account that allows you to discover and access data sources in all your accounts and regions across your AWS organization. Creating dashboards in Grafana is easy as all these different data sources are discoverable in one place.

Customers really like Grafana for the ease of creating dashboards. It comes with many built-in dashboards to use when you add a new data source, or you can take advantage of its broad community of pre-built dashboards. For example, the following image shows a dashboard that AMG created for me from one of my AWS Lambda functions.

Screenshot of an automatic dashboard for Lambda function

One of my favorite things from AMG is the built-in security features. You can easily enable single sign-on using AWS Single Sign-On, restrict access to data sources and dashboards to the right users, and access audit logs via AWS CloudTrail for your hosted Grafana workspace. With AWS Single Sign-On you can leverage your existing corporate directories to enforce authentication and authorization permissions.

Another powerful feature that AMG has is support for Alerts. AMG integrates with Amazon Simple Notification Service (SNS) so customers can send Grafana alerts to SNS as a notification destination. It also has support for four other alert destinations including PagerDuty, Slack, VictorOps and OpsGenie.

There are no up-front investments required to use AMG, and you only pay a monthly active user license fee. This means that you can provision many users to access your Grafana workspace, but you are only billed for active users that log in and use the workspace that month. Users who are granted access but do not log in are not billed that month. You can also upgrade to Grafana Enterprise using AWS Marketplace, to get access to enterprise plugins, support, and training content directly from Grafana Labs.

Availability

This service is available in US East (N. Virginia) and Europe (Ireland) regions. To learn more visit the AMG service page, and be sure to join our re:Invent session tomorrow 12/16 from 8:00am – 8:30am PST for a demo!

AMG is now available in preview; to get access to this service fill out the registration form here.

Marcia

Join the Preview – Amazon Managed Service for Prometheus (AMP)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/join-the-preview-amazon-managed-service-for-prometheus-amp/

Observability is an essential aspect of running cloud infrastructure at scale. You need to know that your resources are healthy and performing as expected, and that your system is delivering the desired level of performance to your customers.

A lot of challenges arise when monitoring container-based applications. First, because container resources are transient and there are lots of metrics to watch, the monitoring data has strikingly high cardinality. In plain language this means that there are lots of unique values, which can make it harder to define a space-efficient storage model and to create queries that return meaningful results. Second, because a well-architected container-based system is composed using a large number of moving parts, ingesting, processing, and storing the monitoring data can become an infrastructure challenge of its own.
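
To make cardinality concrete, here is a small Python sketch that counts how many distinct time series a single metric could produce when every combination of label values occurs. The label names and values are hypothetical and chosen only for illustration.

```python
def series_cardinality(label_values):
    """Upper bound on distinct time series: product of unique values per label."""
    count = 1
    for values in label_values.values():
        count *= len(set(values))
    return count

# Hypothetical labels for one request-count metric.
labels = {
    "pod": [f"web-{i}" for i in range(200)],  # transient container names
    "status_code": ["200", "404", "500"],
    "method": ["GET", "POST"],
}

print(series_cardinality(labels))  # 200 * 3 * 2 = 1200 possible series
```

Because pods are transient, the pod label keeps gaining new values as containers are replaced, so the number of series for even one metric grows over time.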

Prometheus is a leading open-source monitoring solution with an active developer and user community. It has a multi-dimensional data model that is a great fit for time series data collected from containers.

Introducing Amazon Managed Service for Prometheus (AMP)
Today we are launching a preview of Amazon Managed Service for Prometheus (AMP). This fully-managed service is 100% compatible with Prometheus. It supports the same metrics, the same PromQL queries, and can also make use of the 150+ Prometheus exporters. AMP runs across multiple Availability Zones for high availability, and is powered by CNCF Cortex for horizontal scalability. AMP will easily scale to ingest, store, and query millions of time series metrics.

The preview includes support for Amazon Elastic Kubernetes Service (EKS) and Amazon Elastic Container Service (ECS). It can also be used to monitor your self-managed Kubernetes clusters that are running in the cloud or on-premises.

Getting Started with Amazon Managed Service for Prometheus (AMP)
After joining the preview, I open the AMP Console, enter a name for my AMP workspace, and click Create to get started (API and CLI support is also available):

My workspace is active within a minute or so. The console provides me with the endpoints that I can use to write data to my workspace, and to issue queries:

It also provides guidance on how to configure an existing Prometheus server to send metrics to the AMP workspace:

I can also use AWS Distro for OpenTelemetry to scrape Prometheus metrics and send them to my AMP workspace.

Once I have stored some metrics in my workspace, I can run PromQL queries and I can use Grafana to create dashboards and other visualizations. Here’s a sample Grafana dashboard:

Join the Preview
As noted earlier, we’re launching Amazon Managed Service for Prometheus (AMP) in preview form and you are welcome to try it out today.

We’ll have more info (and a more detailed blog post) at launch time.

Jeff;

Rapid and flexible Infrastructure as Code using the AWS CDK with AWS Solutions Constructs

Post Syndicated from Biff Gaut original https://aws.amazon.com/blogs/devops/rapid-flexible-infrastructure-with-solutions-constructs-cdk/

Introduction

As workloads move to the cloud and all infrastructure becomes virtual, infrastructure as code (IaC) becomes essential to leverage the agility of this new world. JSON and YAML are the powerful, declarative modeling languages of AWS CloudFormation, allowing you to define complex architectures using IaC. Just as higher level languages like BASIC and C abstracted away the details of assembly language and made developers more productive, the AWS Cloud Development Kit (AWS CDK) provides a programming model above the native template languages, a model that makes developers more productive when creating IaC. When you instantiate CDK objects in your TypeScript (or Python, Java, etc.) application, those objects “compile” into a YAML template that the CDK deploys as an AWS CloudFormation stack.

AWS Solutions Constructs take this simplification a step further by providing a library of common service patterns built on top of the CDK. These multi-service patterns allow you to deploy multiple resources with a single object, resources that follow best practices by default – both independently and throughout their interaction.

Comparison of an Application stack with Assembly Language, 4th generation language and Object libraries such as Hibernate with an IaC stack of CloudFormation, AWS CDK and AWS Solutions Constructs

Application Development Stack vs. IaC Development Stack

Solution overview

To demonstrate how using Solutions Constructs can accelerate the development of IaC, in this post you will create an architecture that ingests and stores sensor readings using Amazon Kinesis Data Streams, AWS Lambda, and Amazon DynamoDB.

An architecture diagram showing sensor readings being sent to a Kinesis data stream. A Lambda function will receive the Kinesis records and store them in a DynamoDB table.

Prerequisite – Setting up the CDK environment

Tip – If you want to try this example but are concerned about the impact of changing the tools or versions on your workstation, try running it on AWS Cloud9. An AWS Cloud9 environment is launched with an AWS Identity and Access Management (AWS IAM) role and doesn’t require configuring with an access key. It uses the current region as the default for all CDK infrastructure.

To prepare your workstation for CDK development, confirm the following:

  • Node.js 10.3.0 or later is installed on your workstation (regardless of the language used to write CDK apps).
  • You have configured credentials for your environment. If you’re running locally you can do this by configuring the AWS Command Line Interface (AWS CLI).
  • TypeScript 2.7 or later is installed globally (npm -g install typescript)

Before creating your CDK project, install the CDK toolkit using the following command:

npm install -g aws-cdk

Create the CDK project

  1. First create a project folder called stream-ingestion with these two commands:

mkdir stream-ingestion
cd stream-ingestion

  2. Now create your CDK application using this command:

npx cdk@1.68.0 init app --language=typescript

Tip – This example will be written in TypeScript – you can also specify other languages for your projects.

At this time, you must use the same version of the CDK and Solutions Constructs. We’re using version 1.68.0 of both based upon what’s available at publication time, but you can update this with a later version for your projects in the future.

Let’s explore the files in the application this command created:

  • bin/stream-ingestion.ts – This is the module that launches the application. The key line of code is:

new StreamIngestionStack(app, 'StreamIngestionStack');

This creates the actual stack, and it’s in StreamIngestionStack that you will write the CDK code that defines the resources in your architecture.

  • lib/stream-ingestion-stack.ts – This is the important class. In the constructor of StreamIngestionStack you will add the constructs that will create your architecture.

During the deployment process, the CDK uploads your Lambda function to an Amazon S3 bucket so it can be incorporated into your stack.

  3. To create that S3 bucket and any other infrastructure the CDK requires, run this command:

cdk bootstrap

The CDK uses the same supporting infrastructure for all projects within a region, so you only need to run the bootstrap command once in any region in which you create CDK stacks.

  4. To install the required Solutions Constructs packages for our architecture, run these two commands from the command line:

npm install @aws-solutions-constructs/aws-kinesisstreams-lambda@1.68.0
npm install @aws-solutions-constructs/aws-lambda-dynamodb@1.68.0

Write the code

First you will write the Lambda function that processes the Kinesis data stream messages.

  1. Create a folder named lambda under stream-ingestion
  2. Within the lambda folder save a file called lambdaFunction.js with the following contents:
var AWS = require("aws-sdk");

// Set the region before creating any service objects
AWS.config.update({ region: process.env.AWS_REGION });

// Create the DynamoDB service object
var ddb = new AWS.DynamoDB({ apiVersion: "2012-08-10" });

// We will configure our construct to 
// look for the .handler function
exports.handler = async function (event) {
  try {
    // Kinesis will deliver records 
    // in batches, so we need to iterate through
    // each record in the batch
    for (let record of event.Records) {
      const reading = parsePayload(record.kinesis.data);
      await writeRecord(record.kinesis.partitionKey, reading);
    }
  } catch (err) {
    console.log(`Write failed, err:\n${JSON.stringify(err, null, 2)}`);
    throw err;
  }
  return;
};

// Write the provided sensor reading data to the DynamoDB table
async function writeRecord(partitionKey, reading) {

  var params = {
    // Notice that Constructs automatically sets up 
    // an environment variable with the table name.
    TableName: process.env.DDB_TABLE_NAME,
    Item: {
      partitionKey: { S: partitionKey },  // sensor Id
      timestamp: { S: reading.timestamp },
      value: { N: reading.value}
    },
  };

  // Call DynamoDB to add the item to the table
  await ddb.putItem(params).promise();
}

// Decode the payload and extract the sensor data from it
function parsePayload(payload) {

  const decodedPayload = Buffer.from(payload, "base64").toString(
    "ascii"
  );

  // Our CLI command will send the records to Kinesis
  // with the values delimited by '|'
  const payloadValues = decodedPayload.split("|", 2)
  return {
    value: payloadValues[0],
    timestamp: payloadValues[1]
  }
}

We won’t spend a lot of time explaining this function – it’s pretty straightforward and heavily commented. It receives an event with one or more sensor readings, and for each reading it extracts the pertinent data and saves it to the DynamoDB table.
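If you want to check the decoding logic without deploying anything, a minimal sketch like the following round-trips one reading through base64. The encodeReading helper and the trimmed-down sampleEvent are assumptions made for illustration; a real Kinesis event carries more fields than the two the handler reads.

```javascript
// Encode a reading the way it arrives in a Kinesis event: base64 text.
// (Hypothetical helper, used here only to build test input.)
function encodeReading(value, timestamp) {
  return Buffer.from(`${value}|${timestamp}`, "ascii").toString("base64");
}

// Same decoding logic as parsePayload in lambdaFunction.js.
function parsePayload(payload) {
  const decodedPayload = Buffer.from(payload, "base64").toString("ascii");
  const payloadValues = decodedPayload.split("|", 2);
  return { value: payloadValues[0], timestamp: payloadValues[1] };
}

// A hypothetical event containing only the fields the handler reads.
const sampleEvent = {
  Records: [
    {
      kinesis: {
        partitionKey: "1301",
        data: encodeReading("15.4", "2020-08-22T01:16:36+00:00"),
      },
    },
  ],
};

for (const record of sampleEvent.Records) {
  console.log(record.kinesis.partitionKey, parsePayload(record.kinesis.data));
}
```

Note that the value stays a string after parsing, which is what the DynamoDB N attribute type expects in the low-level API.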

You will use two Solutions Constructs to create your infrastructure:

The aws-kinesisstreams-lambda construct deploys an Amazon Kinesis data stream and a Lambda function.

  • aws-kinesisstreams-lambda creates the Kinesis data stream and Lambda function that subscribes to that stream. To support this, it also creates other resources, such as IAM roles and encryption keys.

The aws-lambda-dynamodb construct deploys a Lambda function and a DynamoDB table.

  • aws-lambda-dynamodb creates an Amazon DynamoDB table and a Lambda function with permission to access the table.
  3. To deploy the first of these two constructs, replace the code in lib/stream-ingestion-stack.ts with the following code:
import * as cdk from "@aws-cdk/core";
import * as lambda from "@aws-cdk/aws-lambda";
import { KinesisStreamsToLambda } from "@aws-solutions-constructs/aws-kinesisstreams-lambda";

import * as ddb from "@aws-cdk/aws-dynamodb";
import { LambdaToDynamoDB } from "@aws-solutions-constructs/aws-lambda-dynamodb";

export class StreamIngestionStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const kinesisLambda = new KinesisStreamsToLambda(
      this,
      "KinesisLambdaConstruct",
      {
        lambdaFunctionProps: {
          // Where the CDK can find the lambda function code
          runtime: lambda.Runtime.NODEJS_10_X,
          handler: "lambdaFunction.handler",
          code: lambda.Code.fromAsset("lambda"),
        },
      }
    );

    // Next Solutions Construct goes here
  }
}

Let’s explore this code:

  • It instantiates a new KinesisStreamsToLambda object. This Solutions Construct will launch a new Kinesis data stream and a new Lambda function, setting up the Lambda function to receive all the messages in the Kinesis data stream. It will also deploy all the additional resources and policies required for the architecture to follow best practices.
  • The third argument to the constructor is the properties object, where you specify overrides of default values or any other information the construct needs. In this case you provide properties for the encapsulated Lambda function that tell the CDK where to find the code for the Lambda function that you stored as lambda/lambdaFunction.js earlier.
  4. Now you’ll add the second construct that connects the Lambda function to a new DynamoDB table. In the same lib/stream-ingestion-stack.ts file, replace the line // Next Solutions Construct goes here with the following code:
    // Define the primary key for the new DynamoDB table
    const primaryKeyAttribute: ddb.Attribute = {
      name: "partitionKey",
      type: ddb.AttributeType.STRING,
    };

    // Define the sort key for the new DynamoDB table
    const sortKeyAttribute: ddb.Attribute = {
      name: "timestamp",
      type: ddb.AttributeType.STRING,
    };

    const lambdaDynamoDB = new LambdaToDynamoDB(
      this,
      "LambdaDynamodbConstruct",
      {
        // Tell construct to use the Lambda function in
        // the first construct rather than deploy a new one
        existingLambdaObj: kinesisLambda.lambdaFunction,
        tablePermissions: "Write",
        dynamoTableProps: {
          partitionKey: primaryKeyAttribute,
          sortKey: sortKeyAttribute,
          billingMode: ddb.BillingMode.PROVISIONED,
          removalPolicy: cdk.RemovalPolicy.DESTROY
        },
      }
    );

    // Add autoscaling
    const readScaling = lambdaDynamoDB.dynamoTable.autoScaleReadCapacity({
      minCapacity: 1,
      maxCapacity: 50,
    });

    readScaling.scaleOnUtilization({
      targetUtilizationPercent: 50,
    });

Let’s explore this code:

  • The first two const objects define the names and types for the partition key and sort key of the DynamoDB table.
  • The LambdaToDynamoDB construct instantiated creates a new DynamoDB table and grants access to your Lambda function. The key to this call is the properties object you pass in the third argument.
    • The first property sent to LambdaToDynamoDB is existingLambdaObj – by setting this value to the Lambda function created by KinesisStreamsToLambda, you’re telling the construct to not create a new Lambda function, but to grant the Lambda function in the other Solutions Construct access to the DynamoDB table. This illustrates how you can chain many Solutions Constructs together to create complex architectures.
    • The second property sent to LambdaToDynamoDB tells the construct to limit the Lambda function’s access to the table to write only.
    • The third property sent to LambdaToDynamoDB is actually a full properties object defining the DynamoDB table. It provides the two attribute definitions you created earlier as well as the billing mode. It also sets the RemovalPolicy to DESTROY. This policy setting ensures that the table is deleted when you delete this stack – in most cases you should accept the default setting to protect your data.
  • The last two lines of code show how you can use statements to modify a construct outside the constructor. In this case we set up auto scaling on the new DynamoDB table, which we can access with the dynamoTable property on the construct we just instantiated.

That’s all it takes to create all the resources for your architecture.

  5. Save all the files, then compile the TypeScript into a CDK program using this command:

npm run build

  6. Finally, launch the stack using this command:

cdk deploy

(Enter “y” in response to Do you wish to deploy all these changes (y/n)?)

You will see some warnings where you override CDK default values. Because you are doing this intentionally you may disregard these, but it’s always a good idea to review these warnings when they occur.

Tip – Many mysterious CDK project errors stem from mismatched versions. If you get stuck on an inexplicable error, check package.json and confirm that all CDK and Solutions Constructs libraries have the same version number (with no leading caret ^). If necessary, correct the version numbers, delete the package-lock.json file and node_modules tree and run npm install. Think of this as the “turn it off and on again” first response to CDK errors.
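The version check described in this tip can be automated. The following is a minimal sketch, assuming the CDK and Solutions Constructs libraries are the packages named aws-cdk or scoped under @aws-cdk/ and @aws-solutions-constructs/. The findMismatches function and the example package.json contents are hypothetical, not part of the CDK toolkit.

```javascript
// Return the CDK-related dependencies whose version is not exactly the expected one.
// A leading caret (^) or tilde (~) counts as a mismatch, because it lets npm
// resolve a newer release than the rest of the libraries.
function findMismatches(packageJson, expectedVersion) {
  const all = { ...packageJson.dependencies, ...packageJson.devDependencies };
  return Object.entries(all)
    .filter(([name]) =>
      name === "aws-cdk" ||
      name.startsWith("@aws-cdk/") ||
      name.startsWith("@aws-solutions-constructs/"))
    .filter(([, version]) => version !== expectedVersion)
    .map(([name, version]) => `${name}@${version}`);
}

// Hypothetical package.json contents for illustration.
const examplePackageJson = {
  dependencies: {
    "@aws-cdk/core": "1.68.0",
    "@aws-cdk/aws-lambda": "^1.68.0",
    "@aws-solutions-constructs/aws-lambda-dynamodb": "1.67.0",
    "source-map-support": "^0.5.16",
  },
  devDependencies: { "aws-cdk": "1.68.0", typescript: "~3.9.7" },
};

console.log(findMismatches(examplePackageJson, "1.68.0"));
```

In a real project you could pass in require("./package.json") instead of the example object. Unrelated packages such as typescript are ignored on purpose, since only the CDK libraries need to move in lockstep.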

You have now deployed the entire architecture for the demo – open the CloudFormation stack in the AWS Management Console and take a few minutes to explore all 12 resources that the program deployed (and the 380-line template generated to create them).

Feed the Stream

Now use the CLI to send some data through the stack.

Go to the Kinesis Data Streams console and copy the name of the data stream. Replace the stream name in the following command and run it from the command line.

aws kinesis put-records \
--stream-name StreamIngestionStack-KinesisLambdaConstructKinesisStreamXXXXXXXX-XXXXXXXXXXXX \
--records \
PartitionKey=1301,'Data=15.4|2020-08-22T01:16:36+00:00' \
PartitionKey=1503,'Data=39.1|2020-08-22T01:08:15+00:00'

Tip – If you are using the AWS CLI v2, the previous command will result in an “Invalid base64…” error because v2 expects the inputs to be Base64 encoded by default. Adding the argument --cli-binary-format raw-in-base64-out will fix the issue.

To confirm that the messages made it through the service, open the DynamoDB console – you should see the two records in the table.

Now that you’ve got it working, pause to think about what you just did. You deployed a system that can ingest and store sensor readings and scale to handle heavy loads. You did that by instantiating two objects – well under 60 lines of code. Experiment with changing some property values and deploying the changes by running npm run build and cdk deploy again.

Cleanup

To clean up the resources in the stack, run this command:

cdk destroy

Conclusion

Just as languages like BASIC and C allowed developers to write programs at a higher level of abstraction than assembly language, the AWS CDK and AWS Solutions Constructs allow us to create CloudFormation stacks in TypeScript, Java, or Python instead of JSON or YAML. Just as there will always be a place for assembly language, there will always be situations where we want to write CloudFormation templates manually – but for most situations, we can now use the AWS CDK and AWS Solutions Constructs to create complex and complete architectures in a fraction of the time with very little code.

AWS Solutions Constructs can currently be used in CDK applications written in TypeScript, JavaScript, Java, and Python, and will be available in C# applications soon.

About the Author

Biff Gaut has been shipping software since 1983, from small startups to large IT shops. Along the way he has contributed to 2 books, spoken at several conferences and written many blog posts. He is now a Principal Solutions Architect at AWS working on the AWS Solutions Constructs team, helping customers deploy better architectures more quickly.

AWS Solutions Constructs – A Library of Architecture Patterns for the AWS CDK

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-solutions-constructs-a-library-of-architecture-patterns-for-the-aws-cdk/

Cloud applications are built using multiple components, such as virtual servers, containers, serverless functions, storage buckets, and databases. Being able to provision and configure these resources in a safe, repeatable way is incredibly important to automate your processes and let you focus on the unique parts of your implementation.

With the AWS Cloud Development Kit, you can leverage the expressive power of your favorite programming languages to model your applications. You can use high-level components called constructs, preconfigured with “sensible defaults” that you can customize, to quickly build a new application. The CDK provisions your resources using AWS CloudFormation to get all the benefits of managing your infrastructure as code. One of the reasons I like the CDK is that you can compose and share your own custom components as higher-level constructs.

As you can imagine, there are recurring patterns that can be useful to more than one customer. For this reason, today we are launching the AWS Solutions Constructs, an open source extension library for the CDK that provides well-architected patterns to help you build your unique solutions. CDK constructs mostly cover single services. AWS Solutions Constructs provide multi-service patterns that combine two or more CDK resources, and implement best practices such as logging and encryption.

Using AWS Solutions Constructs
To see the power of a pattern-based approach, let’s take a look at how that works when building a new application. As an example, I want to build an HTTP API to store data in an Amazon DynamoDB table. To keep the content of the table small, I can use DynamoDB Time to Live (TTL) to expire items after a few days. After the TTL expires, data is deleted from the table and sent, via DynamoDB Streams, to an AWS Lambda function to archive the expired data on Amazon Simple Storage Service (S3).

To build this application, I can use a few components:

  • An Amazon API Gateway endpoint for the API.
  • A DynamoDB table to store data.
  • A Lambda function to process the API requests, and store data in the DynamoDB table.
  • DynamoDB Streams to capture data changes.
  • A Lambda function processing data changes to archive the expired data.

Can I make it simpler? Looking at the available patterns in the AWS Solutions Constructs, I find two that can help me build my app:

  • aws-apigateway-lambda, a Construct that implements an API Gateway REST API connected to a Lambda function. As an example of the “sensible defaults” used by AWS Solutions Constructs, this pattern enables CloudWatch logging for the API Gateway.
  • aws-dynamodb-stream-lambda, a Construct implementing a DynamoDB table streaming data changes to a Lambda function with the least privileged permissions.

To build the final architecture, I simply connect those two Constructs together:

I am using TypeScript to define the CDK stack, and Node.js for the Lambda functions. Let’s start with the CDK stack:

 

import * as cdk from '@aws-cdk/core';
import * as lambda from '@aws-cdk/aws-lambda';
import * as apigw from '@aws-cdk/aws-apigateway';
import * as dynamodb from '@aws-cdk/aws-dynamodb';
import { ApiGatewayToLambda } from '@aws-solutions-constructs/aws-apigateway-lambda';
import { DynamoDBStreamToLambda } from '@aws-solutions-constructs/aws-dynamodb-stream-lambda';

export class DemoConstructsStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const apiGatewayToLambda = new ApiGatewayToLambda(this, 'ApiGatewayToLambda', {
      deployLambda: true,
      lambdaFunctionProps: {
        code: lambda.Code.fromAsset('lambda'),
        runtime: lambda.Runtime.NODEJS_12_X,
        handler: 'restApi.handler'
      },
      apiGatewayProps: {
        defaultMethodOptions: {
          authorizationType: apigw.AuthorizationType.NONE
        }
      }
    });

    const dynamoDBStreamToLambda = new DynamoDBStreamToLambda(this, 'DynamoDBStreamToLambda', {
      deployLambda: true,
      lambdaFunctionProps: {
        code: lambda.Code.fromAsset('lambda'),
        runtime: lambda.Runtime.NODEJS_12_X,
        handler: 'processStream.handler'
      },
      dynamoTableProps: {
        tableName: 'my-table',
        partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
        timeToLiveAttribute: 'ttl'
      }
    });

    const apiFunction = apiGatewayToLambda.lambdaFunction;
    const dynamoTable = dynamoDBStreamToLambda.dynamoTable;

    dynamoTable.grantReadWriteData(apiFunction);
    apiFunction.addEnvironment('TABLE_NAME', dynamoTable.tableName);
  }
}

At the beginning of the stack, I import the standard CDK constructs for the Lambda function, the API Gateway endpoint, and the DynamoDB table. Then, I add the two patterns from the AWS Solutions Constructs, ApiGatewayToLambda and DynamoDBStreamToLambda.

After declaring the ApiGatewayToLambda and DynamoDBStreamToLambda constructs, I store the Lambda function, created by the ApiGatewayToLambda construct, and the DynamoDB table, created by DynamoDBStreamToLambda, in two variables.

At the end of the stack, I “connect” the two patterns together by granting permissions to the Lambda function to read/write in the DynamoDB table, and add the name of the DynamoDB table to the environment of the Lambda function, so that it can be used in the function code to store data in the table.

The code of the two Lambda functions is in the lambda folder of the CDK application. I am using the Node.js 12 runtime.

The restApi.js function implements the API and writes data to the DynamoDB table. The URL path is used as the partition key, and all the query string parameters in the URL are stored as attributes. The TTL for the item is computed by adding a time window of 7 days to the current time.

const { DynamoDB } = require("aws-sdk");

const docClient = new DynamoDB.DocumentClient();

const TABLE_NAME = process.env.TABLE_NAME;
const TTL_WINDOW = 7 * 24 * 60 * 60; // 7 days expressed in seconds

exports.handler = async function (event) {

  // queryStringParameters is null when the URL has no query string
  const item = { ...(event.queryStringParameters || {}) };
  item.id = event.pathParameters.proxy;

  const now = new Date(); 
  item.ttl = Math.round(now.getTime() / 1000) + TTL_WINDOW;

  let statusCode = 204;

  try {
    // promise() rejects on failure, so errors must be caught here
    await docClient.put({
      TableName: TABLE_NAME,
      Item: item
    }).promise();
  } catch (err) {
    console.error('request: ', JSON.stringify(event, undefined, 2));
    console.error('error: ', err);
    statusCode = 500;
  }

  return {
    statusCode: statusCode
  };
};
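The item-building part of restApi.js can be exercised on its own, without DynamoDB. The following is a minimal sketch: buildItem is a hypothetical helper extracted for illustration, and the sample event contains only the fields the function reads. The current time is passed in as a parameter so that the TTL arithmetic is easy to verify.

```javascript
const TTL_WINDOW = 7 * 24 * 60 * 60; // 7 days expressed in seconds

// Build the DynamoDB item from an API Gateway proxy event.
// nowMs is passed in (instead of read from the clock) to keep the function testable.
function buildItem(event, nowMs) {
  // Query string parameters become attributes; null means no query string
  const item = { ...(event.queryStringParameters || {}) };
  // The URL path is the partition key
  item.id = event.pathParameters.proxy;
  // TTL is an epoch timestamp in seconds, 7 days from now
  item.ttl = Math.round(nowMs / 1000) + TTL_WINDOW;
  return item;
}

// Hypothetical event with only the fields the function reads.
const sampleEvent = {
  pathParameters: { proxy: "danilop" },
  queryStringParameters: { name: "Danilo", company: "AWS" },
};

console.log(buildItem(sampleEvent, Date.now()));
```

DynamoDB TTL expects the attribute to hold epoch time in seconds, which is why the millisecond timestamp is divided by 1000 before the window is added.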

The processStream.js function is processing data capture records from the DynamoDB Stream, looking for the items deleted by TTL. The archive functionality is not implemented in this sample code.

exports.handler = async function (event) {
  event.Records.forEach((record) => {
    console.log('Stream record: ', JSON.stringify(record, null, 2));
    // userIdentity is only present on records written by a service such as TTL
    if (record.userIdentity &&
      record.userIdentity.type == "Service" &&
      record.userIdentity.principalId == "dynamodb.amazonaws.com") {

      // Record deleted by DynamoDB Time to Live (TTL)
      
      // I can archive the record to S3, for example using Kinesis Data Firehose.
    }
  });
};
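The TTL check can also be pulled into a small predicate and tried against hand-written records. This is a minimal sketch: isTtlDelete is a hypothetical helper, and the two sample records are trimmed-down stand-ins for real stream records. It additionally checks that the event is a REMOVE, which the sample function above does not do.

```javascript
// True when a stream record is a delete performed by DynamoDB Time to Live (TTL).
function isTtlDelete(record) {
  const identity = record.userIdentity;
  return record.eventName === "REMOVE" &&
    identity != null &&
    identity.type === "Service" &&
    identity.principalId === "dynamodb.amazonaws.com";
}

// Hypothetical records with only the fields the predicate reads.
const ttlDelete = {
  eventName: "REMOVE",
  userIdentity: { type: "Service", principalId: "dynamodb.amazonaws.com" },
};
const userDelete = { eventName: "REMOVE" }; // deleted by an application, no userIdentity

console.log(isTtlDelete(ttlDelete), isTtlDelete(userDelete));
```

Filtering with a predicate like this keeps the archive step from running for items that an application deleted on purpose.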

Let’s see if this works! First, I need to install all dependencies. To simplify dependencies, each release of AWS Solutions Constructs is linked to the corresponding version of the CDK. In this case, I am using version 1.46.0 for both the CDK and the AWS Solutions Constructs patterns. The first three commands install plain CDK constructs. The last two commands install the AWS Solutions Constructs patterns I am using for this application.

npm install @aws-cdk/aws-lambda@1.46.0
npm install @aws-cdk/aws-apigateway@1.46.0
npm install @aws-cdk/aws-dynamodb@1.46.0
npm install @aws-solutions-constructs/aws-apigateway-lambda@1.46.0
npm install @aws-solutions-constructs/aws-dynamodb-stream-lambda@1.46.0

Now, I build the application and use the CDK to deploy the application.

npm run build
cdk deploy

Towards the end of the output of the cdk deploy command, a green check mark tells me that the deployment of the stack is complete. Right after that, in the Outputs, I find the endpoint of the API Gateway.

 ✅  DemoConstructsStack

Outputs:
DemoConstructsStack.ApiGatewayToLambdaLambdaRestApiEndpoint9800D4B5 = https://1a2c3c4d.execute-api.eu-west-1.amazonaws.com/prod/

I can now use curl to test the API:

curl "https://1a2c3c4d.execute-api.eu-west-1.amazonaws.com/prod/danilop?name=Danilo&company=AWS"

Let’s have a look at the DynamoDB table:

The item is stored, and the TTL is set. After a week, the item will be deleted and sent via DynamoDB Streams to the processStream.js function.

After I complete my testing, I use the CDK again to quickly delete all resources created for this application:

cdk destroy

Available Now
The AWS Solutions Constructs are available now for TypeScript and Python. The AWS Solutions Builders team is working to make these constructs also available when using Java and C# with the CDK, so stay tuned. There is no cost for using the AWS Solutions Constructs or the CDK; you only pay for the resources created when deploying the stack.

In this first release, 25 patterns are included, covering lots of different use cases. Which new patterns and features should we focus on now? Give us your feedback in the open source project repository!

Danilo

Simplified Time-Series Analysis with Amazon CloudWatch Contributor Insights

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/simplified-time-series-analysis-with-amazon-cloudwatch-contributor-insights/

Inspecting multiple log groups and log streams can make it more difficult and time consuming to analyze and diagnose the impact of an issue in real time. What customers are affected? How badly? Are some affected more than others, or are they outliers? Perhaps you performed a deployment of an update using a staged rollout strategy and now want to know if any customers have hit issues or if everything is behaving as expected for the target customers before continuing further. All of the data points that help answer these questions are potentially buried in a mass of logs that engineers query for ad-hoc measurements, or track with custom dashboards they must build and maintain.

Amazon CloudWatch Contributor Insights, generally available today, is a new feature to help simplify analysis of Top-N contributors to time-series data in CloudWatch Logs that can help you more quickly understand who or what is impacting system and application performance, in real-time, at scale. This saves you time during an operational problem by helping you understand what is contributing to the operational issue and who or what is most affected. Amazon CloudWatch Contributor Insights can also help with ongoing analysis for system and business optimization by easily surfacing outliers, performance bottlenecks, top customers, or top utilized resources, all at a glance. In addition to logs, Amazon CloudWatch Contributor Insights can also be used with other products in the CloudWatch portfolio, including Metrics and Alarms.

Amazon CloudWatch Contributor Insights can analyze structured logs in either JSON or Common Log Format (CLF). Log data can be sourced from Amazon Elastic Compute Cloud (EC2) instances, AWS CloudTrail, Amazon Route 53, Apache Access and Error Logs, Amazon Virtual Private Cloud (VPC) Flow Logs, AWS Lambda Logs, and Amazon API Gateway Logs. You also have the choice of using structured logs published directly to CloudWatch, or using the CloudWatch Agent. Amazon CloudWatch Contributor Insights will evaluate these log events in real-time and display reports that show the top contributors and number of unique contributors in a dataset. A contributor is an aggregate metric based on dimensions contained as log fields in CloudWatch Logs, for example account-id, or interface-id in Amazon Virtual Private Cloud Flow Logs, or any other custom set of dimensions. You can sort and filter contributor data based on your own custom criteria. Report data from Amazon CloudWatch Contributor Insights can be displayed on CloudWatch dashboards, graphed alongside CloudWatch metrics, and added to CloudWatch alarms. For example customers can graph values from two Amazon CloudWatch Contributor Insights reports into a single metric describing the percentage of customers impacted by faults, and configure alarms to alert when this percentage breaches pre-defined thresholds.

Getting Started with Amazon CloudWatch Contributor Insights
To use Amazon CloudWatch Contributor Insights I simply need to define one or more rules. A rule is simply a snippet of data that defines what contextual data to extract for metrics reported from CloudWatch Logs. To configure a rule to identify the top contributors for a specific metric I supply three items of data – the log group (or groups), the dimensions for which the top contributors are evaluated, and filters to narrow down those top contributors. To do this, I head to the Amazon CloudWatch console dashboard and select Contributor Insights from the left-hand navigation links. This takes me to the Amazon CloudWatch Contributor Insights home where I can click Create a rule to get started.

To get started quickly, I can select from a library of sample rules for various services that send logs to CloudWatch Logs. You can see above that there are currently a variety of sample rules for Amazon API Gateway, Amazon Route 53 Query Logs, Amazon Virtual Private Cloud Flow Logs, and logs for container services. Alternatively, I can define my own rules, as I’ll do in the rest of this post.

Let’s say I have a deployed application that is publishing structured log data in JSON format directly to CloudWatch Logs. This application has two API versions, one that has been used for some time and is considered stable, and a second that I have just started to roll out to my customers. I want to know as early as possible if anyone who has received the new version, targeting the new API, is receiving any faults and how many faults are being triggered. My stable API version is sending log data to one log group and my new version is using a different group, so I need to monitor multiple log groups (since I also want to know if anyone is experiencing any error, regardless of version).

The JSON to define my rule, to report on 500 errors coming from any of my APIs, and to use account ID, HTTP method, and resource path as dimensions, is shown below.

{
  "Schema": {
    "Name": "CloudWatchLogRule",
    "Version": 1
  },
  "AggregateOn": "Count",
  "Contribution": {
    "Filters": [
      {
        "Match": "$.status",
        "EqualTo": 500
      }
    ],
    "Keys": [
      "$.accountId",
      "$.httpMethod",
      "$.resourcePath"
    ]
  },
  "LogFormat": "JSON",
  "LogGroupNames": [
    "MyApplicationLogsV*"
  ]
}
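
To make the rule's semantics concrete, here is a minimal Python sketch of the evaluation the rule describes: keep the events that pass every filter, then count them per unique combination of keys. It is an illustration only, not how Contributor Insights is implemented. The helper names (field, top_contributors) are hypothetical, and the sketch handles only flat $.name paths.

```python
import json
from collections import Counter

# The Contribution section of the rule, copied as a Python dict.
CONTRIBUTION = {
    "Filters": [{"Match": "$.status", "EqualTo": 500}],
    "Keys": ["$.accountId", "$.httpMethod", "$.resourcePath"],
}

def field(event, path):
    """Look up a flat '$.name' path in a parsed log event (hypothetical helper)."""
    return event.get(path[2:])  # drop the leading "$."

def top_contributors(messages, contribution, limit=10):
    """Count matching events per key combination, most frequent first."""
    counts = Counter()
    for message in messages:
        event = json.loads(message)
        if all(field(event, f["Match"]) == f["EqualTo"] for f in contribution["Filters"]):
            counts[tuple(field(event, k) for k in contribution["Keys"])] += 1
    return counts.most_common(limit)

# Made-up sample events, shaped like the log messages used in this post.
sample = [
    json.dumps({"accountId": 42, "httpMethod": "POST", "resourcePath": "/api/v2/x", "status": 500}),
    json.dumps({"accountId": 42, "httpMethod": "POST", "resourcePath": "/api/v2/x", "status": 500}),
    json.dumps({"accountId": 7, "httpMethod": "GET", "resourcePath": "/api/v1/x", "status": 200}),
]
print(top_contributors(sample, CONTRIBUTION))
```

Only the two failing POST events contribute, so the single contributor reported is the (accountId, httpMethod, resourcePath) combination they share.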

I can set up my rule using either the Wizard tab, or I can paste the JSON above into the Rule body field on the Syntax tab. Even though I have the JSON above, I’ll show using the Wizard tab in this post and you can see the completed fields below. When selecting log groups I can either select them from the drop down, if they already exist, or I can use wildcard syntax in the Select by prefix match option (MyApplicationLogsV* for example).

Clicking Create saves the new rule and makes it immediately start processing and analyzing data (unless I elect to create it in a disabled state, of course). Note that Amazon CloudWatch Contributor Insights only processes log data created once the rule is active. It does not perform historical inspection, so I need to build rules for operational scenarios that I anticipate happening in the future.

With the rule in place I need to start generating some log data! To do that I’m going to use a script, written using the AWS Tools for PowerShell, to simulate my deployed application being invoked by a set of customers. Of those customers, a select few (let’s call them the unfortunate ones) will be directed to the new API version which will randomly fail on HTTP POST requests. Customers using the old API version will always succeed. The script, which runs for 5000 iterations, is shown below. The cmdlets being used to work with CloudWatch Logs are the ones with CWL in the name, for example Write-CWLLogEvent.

# Set up some random customer ids, and select a third of them to be our unfortunates
# who will experience random errors due to a bad api update being shipped!
$allCustomerIds = @( 1..15 | % { Get-Random })
$faultingIds = $allCustomerIds | Get-Random -Count 5

# Setup some log groups
$group1 = 'MyApplicationLogsV1'
$group2 = 'MyApplicationLogsV2'
$stream = "MyApplicationLogStream"

# When writing to a log stream we need to specify a sequencing token
$group1Sequence = $null
$group2Sequence = $null

$group1, $group2 | % {
    if (!(Get-CWLLogGroup -LogGroupName $_)) {
        New-CWLLogGroup -LogGroupName $_
        New-CWLLogStream -LogGroupName $_ -LogStreamName $stream
    } else {
        # When the log group and stream exist, we need to seed the sequence token to
        # the next expected value
        $logstream = Get-CWLLogStream -LogGroupName $_ -LogStreamName $stream
        $token = $logstream.UploadSequenceToken
        if ($_ -eq $group1) {
            $group1Sequence = $token
        } else {
            $group2Sequence = $token
        }
    }
}

# generate some log data with random failures for the subset of customers
1..5000 | % {

    Write-Host "Log event iteration $_" # just so we know where we are progress-wise

    $customerId = Get-Random $allCustomerIds

    # first select whether the user called the v1 or the v2 api
    $useV2Api = ((Get-Random) % 2 -eq 1)
    if ($useV2Api) {
        $resourcePath = '/api/v2/some/resource/path/'
        $targetLogGroup = $group2
        $nextToken = $group2Sequence
    } else {
        $resourcePath = '/api/v1/some/resource/path/'
        $targetLogGroup = $group1
        $nextToken = $group1Sequence
    }

    # now select whether they failed or not. GET requests for all customers on
    # all api paths succeed. POST requests to the v2 api fail for a subset of
    # customers.
    $status = 200
    $errorMessage = ''
    if ((Get-Random) % 2 -eq 0) {
        $httpMethod = "GET"
    } else {
        $httpMethod = "POST"
        if ($useV2Api -And $faultingIds.Contains($customerId)) {
            $status = 500
            $errorMessage = 'Uh-oh, something went wrong...'
        }
    }

    # Write an event and gather the sequence token for the next event
    $nextToken = Write-CWLLogEvent -LogGroupName $targetLogGroup -LogStreamName $stream -SequenceToken $nextToken -LogEvent @{
        TimeStamp = [DateTime]::UtcNow
        Message = (ConvertTo-Json -Compress -InputObject @{
            requestId = [Guid]::NewGuid().ToString("D")
            httpMethod = $httpMethod
            resourcePath = $resourcePath
            status = $status
            protocol = "HTTP/1.1"
            accountId = $customerId
            errorMessage = $errorMessage
        })
    }

    if ($targetLogGroup -eq $group1) {
        $group1Sequence = $nextToken
    } else {
        $group2Sequence = $nextToken
    }

    # -Milliseconds is used because -Seconds only accepts whole numbers in Windows PowerShell 5.1
    Start-Sleep -Milliseconds 250
}
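
For readers who do not use PowerShell, the failure logic of the script can be sketched in Python without any AWS calls. This is an assumption-laden restatement, not a port: generate_events is a hypothetical helper, it omits the requestId, protocol, and errorMessage fields, and it returns the messages instead of writing them to CloudWatch Logs.

```python
import json
import random

def generate_events(count, seed=0):
    """Return ((log_group, message) pairs, faulting customer ids)."""
    rng = random.Random(seed)
    all_ids = rng.sample(range(1, 1_000_000), 15)
    faulting = set(rng.sample(all_ids, 5))  # a third of the customers
    events = []
    for _ in range(count):
        customer = rng.choice(all_ids)
        use_v2 = rng.random() < 0.5
        method = rng.choice(["GET", "POST"])
        # Only POST requests to the v2 API from a faulting customer fail.
        failed = use_v2 and method == "POST" and customer in faulting
        message = json.dumps({
            "accountId": customer,
            "httpMethod": method,
            "resourcePath": f"/api/v{2 if use_v2 else 1}/some/resource/path/",
            "status": 500 if failed else 200,
        })
        group = "MyApplicationLogsV2" if use_v2 else "MyApplicationLogsV1"
        events.append((group, message))
    return events, faulting

events, faulting = generate_events(5000)
failures = sum(1 for _, m in events if json.loads(m)["status"] == 500)
print(f"{failures} of {len(events)} simulated requests failed")
```

Seeding the generator keeps a run reproducible, which is handy when comparing the simulated failures against what the rule reports.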

I start the script running, and with my rule enabled, I start to see failures show up in my graph. Below is a snapshot after several minutes of running the script. I can clearly see a subset of my simulated customers are having issues with HTTP POST requests to the new v2 API.

From the Actions pull down in the Rules panel, I could now go on to create a single metric from this report, describing the percentage of customers impacted by faults, and then configure an alarm on this metric to alert when this percentage breaches pre-defined thresholds.
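
The arithmetic behind such a metric is simple enough to sketch in Python. This assumes two inputs that the post does not name explicitly: the number of unique contributors seen by a fault-only rule and the number seen by a rule covering all requests. The function names and the 10 percent threshold are hypothetical.

```python
def percent_impacted(faulting_contributors, all_contributors):
    """Percentage of unique customers that saw at least one fault."""
    if all_contributors == 0:
        return 0.0  # no traffic, so nobody is impacted
    return 100.0 * faulting_contributors / all_contributors

def should_alarm(percent, threshold=10.0):
    """True when the impacted percentage breaches the (assumed) threshold."""
    return percent > threshold

# With the simulated data in this post: 5 faulting customers out of 15.
pct = percent_impacted(5, 15)
print(round(pct, 1), should_alarm(pct))
```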

For the sample scenario outlined here I would use the alarm to halt the rollout of the new API if it fired, preventing the impact spreading to additional customers, while investigation of what is behind the increased faults is performed. Details on how to set up metrics and alarms can be found in the user guide.

Amazon CloudWatch Contributor Insights is available now to users in all commercial AWS Regions, including China and GovCloud.

— Steve

Visualize and Monitor Highly Distributed Applications with Amazon CloudWatch ServiceLens

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/visualize-and-monitor-highly-distributed-applications-with-amazon-cloudwatch-servicelens/

Increasingly distributed applications, with thousands of metrics and terabytes of logs, can be a challenge to visualize and monitor. Gaining an end-to-end insight of the applications and their dependencies to enable rapid pinpointing of performance bottlenecks, operational issues, and customer impact quite often requires the use of multiple dedicated tools each presenting their own particular facet of information. This in turn leads to more complex data ingestion, manual stitching together of the various insights to determine overall performance, and increased costs from maintaining multiple solutions.

Amazon CloudWatch ServiceLens, announced today, is a new fully managed observability solution to help with visualizing and analyzing the health, performance, and availability of highly distributed applications, including those with dependencies on serverless and container-based technologies, all in one place. By enabling you to easily isolate endpoints and resources that are experiencing issues, together with analysis of correlated metrics, logs, and application traces, CloudWatch ServiceLens helps reduce Mean Time to Resolution (MTTR) by consolidating all of this data in a single place using a service map. From this map you can understand the relationships and dependencies within your applications, and dive deep into the various logs, metrics, and traces from a single tool to help you quickly isolate faults. Crucial time spent correlating metrics, logs, and trace data from across various tools is saved, thus reducing any downtime incurred by end users.

Getting Started with Amazon CloudWatch ServiceLens
Let’s see how we can take advantage of Amazon CloudWatch ServiceLens to diagnose the root cause of an alarm triggered from an application. My sample application reads and writes transaction data to an Amazon DynamoDB table using AWS Lambda functions. An Amazon API Gateway is my application’s front-end, with resources for GET and PUT requests, directing traffic to the corresponding GET and PUT Lambda functions. The API Gateway resources and the Lambda functions have AWS X-Ray tracing enabled, and the API calls to DynamoDB from within the Lambda functions are wrapped using the AWS X-Ray SDK. You can read more about how to instrument your code, and work with AWS X-Ray, in the Developer Guide.

An error condition has triggered an alarm for my application, so my first stop is the Amazon CloudWatch Console, where I click the Alarm link. I can see that there is some issue with availability with one or more of my API Gateway resources.

Let’s drill down to see what might be going on. First I want to get an overview of the running application so I click Service Map under ServiceLens in the left-hand navigation panel. The map displays nodes representing the distributed resources in my application. The relative size of the nodes represents the amount of request traffic that each is receiving, as does the thickness of the links between them. I can toggle the map between Requests mode and Latency mode. Using the same drop-down I can also toggle the option to change relative sizing of the nodes. The data shown for Requests mode or Latency mode helps me isolate nodes that I need to triage first. Clicking View connections can also be used to aid in the process, since it helps me understand incoming and outgoing calls, and their impact on the individual nodes.

I’ve closed the map legend in the screenshot so we can get a good look at all the nodes; for reference, here it is below.

From the map I can immediately see that my front-end gateway is the source of the triggered alarm. The red indicator on the node is showing me that there are 5xx faults associated with the resource, and the length of the indicator relative to the circumference gives me some idea of how many requests are faulting compared to successful requests. Secondly, I can see that the Lambda functions that are handling PUT requests through the API are showing 4xx errors. Finally I can see a purple indicator on the DynamoDB table, indicating throttling is occurring. At this point I have a pretty good idea of what might be happening, but let’s dig a little deeper to see what CloudWatch ServiceLens can help me prove.

In addition to Map view, I can also toggle List view. This gives me at-a-glance information on average latency, faults, and requests/min for all nodes and is specifically ordered by default to show the “worst” node first, using a sort order of descending by fault rate – descending by number of alarms in alarm – ascending by Node name.
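
The three-level sort order described above can be expressed as a single sort key. The Python sketch below uses made-up node data and hypothetical field names (fault_rate, alarms_in_alarm); it only illustrates the ordering, not how the console computes it.

```python
# Made-up nodes; the field names are assumptions for illustration.
nodes = [
    {"name": "GetLambda", "fault_rate": 0.0, "alarms_in_alarm": 0},
    {"name": "ApiGateway", "fault_rate": 0.12, "alarms_in_alarm": 1},
    {"name": "PutLambda", "fault_rate": 0.12, "alarms_in_alarm": 0},
    {"name": "DynamoDBTable", "fault_rate": 0.0, "alarms_in_alarm": 0},
]

def worst_first(nodes):
    """Sort by fault rate (desc), alarms in alarm (desc), then name (asc)."""
    return sorted(
        nodes,
        key=lambda n: (-n["fault_rate"], -n["alarms_in_alarm"], n["name"]),
    )

print([n["name"] for n in worst_first(nodes)])
# prints ['ApiGateway', 'PutLambda', 'DynamoDBTable', 'GetLambda']
```

Negating the two numeric fields gives descending order for them while the name stays ascending, so ties fall through to the next criterion.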

Returning to Map view, hovering my mouse over the node representing my API front-end also gives me similar insight into traffic and faulting request percentage, specific to the node.

To see even more data, for any node, clicking it will open a drawer below the map containing graphed data for that resource. In the screenshot below I have clicked on the ApiGatewayPutLambdaFunction node.

The drawer, for each resource, enables me to jump to view logs for the resource (View logs), traces (View traces), or a dashboard (View dashboard). Below, I have clicked View dashboard for the same Lambda function. Scrolling through the data presented for that resource, I note that the duration of execution is not high, while all invokes are going into error in tandem.

Returning to the API front-end that is showing the alarm, I’d like to take a look at the request traces so I click the node to open the drawer, then click View traces. I already know from the map that 5xx and 4xx status codes are being generated in the code paths selected by PUT requests coming into my application so I switch Filter type to be Status code, then select both 502 and 504 entries in the list, finally clicking Add to filter. The Traces view switches to show me the traces that resulted in those error codes, the response time distribution and a set of traces.

Ok, now we’re getting close! Clicking the first trace ID, I get a wealth of data including exception messages about that request – more than I can show in a single screenshot! For example, I can see the timelines of each segment that was traced as part of the request.

Scrolling down further, I can view exception details (below this I can also see log messages specific to the trace too) – and here lies my answer, confirming the throttling indicator that I saw in the original map. I can also see this exception message in the log data specific to the trace, shown at the foot of the page. Previously, I would have had to scan through logs for the entire application in the hope of spotting this message. Being able to drill down from the map is a significant time saver.

Now I know how to fix the issue and get the alarm resolved – increase the write provisioning for the table!

In conjunction with CloudWatch ServiceLens, Amazon CloudWatch has also launched a preview of CloudWatch Synthetics that helps to monitor endpoints using canaries that run 24×7, 365 days a year, so that I am quickly alerted of issues that my customers are facing. These are also visualized on the Service Map and just as I did above, I can drill down to the traces to view transactions that originated from a canary. The faster I can dive deep into a consolidated view of an operational failure or an alarm, the faster I can root cause the issue and help reduce time to resolution and mitigate the customer impact.

Amazon CloudWatch ServiceLens is available now in all commercial AWS Regions.

— Steve

AWS Cloud Development Kit (CDK) – Java and .NET are Now Generally Available

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/aws-cloud-development-kit-cdk-java-and-net-are-now-generally-available/

Today, we are happy to announce that Java and .NET support inside the AWS Cloud Development Kit (CDK) is now generally available. The AWS CDK is an open-source software development framework to model and provision your cloud application resources through AWS CloudFormation. The AWS CDK also offers support for TypeScript and Python.

With the AWS CDK, you can design, compose, and share your own custom resources that incorporate your unique requirements. For example, you can use the AWS CDK to model a VPC, with its associated routing and security configurations. You could then wrap that code into a construct and then share it with the rest of your organization. In this way, you can start to build up libraries of these constructs that you can use to standardize the way your organization creates AWS resources.

I like that by using the AWS CDK, you can build your application, including the infrastructure, in your favorite IDE, using the same programming language that you use for your application code. As you code your AWS CDK model in either .NET or Java, you get productivity benefits like code completion and inline documentation, which make it faster to model your infrastructure.

How the AWS CDK Works
Everything in the AWS CDK is a construct. You can think of constructs as cloud components that can represent architectures of any complexity: a single resource, such as an Amazon Simple Storage Service (S3) bucket or an Amazon Simple Notification Service (SNS) topic, a static website, or even a complex, multi-stack application that spans multiple AWS accounts and regions. You compose constructs together into stacks that you can deploy into an AWS environment, and apps – a collection of one or more stacks.

The AWS CDK includes the AWS Construct Library, which contains constructs representing AWS resources.

How to use the AWS CDK
I’m going to use the AWS CDK to build a simple queue, but rather than handcraft a CloudFormation template in YAML or JSON, the AWS CDK allows me to use a familiar programming language to generate and deploy AWS CloudFormation templates.

To get started, I need to install the AWS CDK command-line interface using npm. Once this download completes, I can code my infrastructure in TypeScript, Python, JavaScript, Java, or .NET.

npm i -g aws-cdk

On my local machine, I create a new folder and navigate into it.

mkdir cdk-newsblog-dotnet && cd cdk-newsblog-dotnet

Now that I have installed the CLI, I can execute commands such as cdk init and pass a language switch. In this instance I am using .NET, so I create the sample app with the csharp language switch.

cdk init sample-app --language csharp

If I wanted to use Java rather than .NET, I would change the --language switch to java.

cdk init sample-app --language java

Since I am in the terminal, I type code . which is a shortcut to open the current folder in VS Code. You could, of course, use any editor, such as Visual Studio or JetBrains Rider. As you can see below, the init command has created a basic .NET AWS CDK project.

If I look into Program.cs, the Main method creates an App and then a CDKDotnetStack. This stack is defined in the CDKDotnetStack.cs file. This is where the meat of the project resides and where all the AWS resources are defined.

Inside the CDKDotnetStack.cs file, some code creates an Amazon Simple Queue Service (SQS) queue, then an Amazon Simple Notification Service (SNS) topic, and finally subscribes the queue to the topic.

Now that I have written the code, the next step is to deploy it. When I do, the AWS CDK will compile and execute this project, converting my .NET code into an AWS CloudFormation template.

If I were to just deploy this now, I wouldn’t actually see the CloudFormation template, so the AWS CDK provides a command cdk synth that takes my application, compiles it, executes it, and then outputs a CloudFormation template.

This is just standard CloudFormation. If you look through it, you will find the following items:

  • AWS::SQS::Queue – The queue I added.
  • AWS::SQS::QueuePolicy – An IAM policy that allows my topic to send messages to my queue. I didn’t actually define this in code, but the AWS CDK is smart enough to know I need one of these, and so creates one.
  • AWS::SNS::Topic – The topic I created.
  • AWS::SNS::Subscription – The subscription between the queue and the topic.
  • AWS::CDK::Metadata – This section is specific to the AWS CDK and is automatically added by the toolkit to every stack. It is used by the AWS CDK team for analytics and to allow us to identify versions if there are any issues.
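
To show how these resources relate to each other, the Python sketch below builds a heavily simplified template as a dictionary. It is not the actual output of cdk synth: the logical IDs (Queue, Topic, and so on) are hypothetical, the AWS::CDK::Metadata resource is omitted, and the policy statement is my assumption of what a minimal SNS-to-SQS permission looks like.

```python
import json

def sample_template():
    """Build a simplified template with the four resource types listed above."""
    queue_arn = {"Fn::GetAtt": ["Queue", "Arn"]}
    topic_arn = {"Ref": "Topic"}  # Ref on an SNS topic resolves to its ARN
    return {
        "Resources": {
            "Queue": {"Type": "AWS::SQS::Queue"},
            "Topic": {"Type": "AWS::SNS::Topic"},
            "QueuePolicy": {
                "Type": "AWS::SQS::QueuePolicy",
                "Properties": {
                    "Queues": [{"Ref": "Queue"}],
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "sns.amazonaws.com"},
                            "Action": "sqs:SendMessage",
                            "Resource": queue_arn,
                            # Only this topic may deliver to the queue.
                            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
                        }],
                    },
                },
            },
            "Subscription": {
                "Type": "AWS::SNS::Subscription",
                "Properties": {
                    "Protocol": "sqs",
                    "TopicArn": topic_arn,
                    "Endpoint": queue_arn,
                },
            },
        }
    }

print(json.dumps(sample_template(), indent=2))
```

The queue policy is the piece that is easy to forget when writing a template by hand, which is why it is convenient that the AWS CDK generates it.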

Before I deploy this project to my AWS account, I will use cdk bootstrap. The bootstrap command will create an Amazon Simple Storage Service (S3) bucket for me, which will be used by the AWS CDK to store any assets that might be required during deployment. In this example, I am not using any assets, so technically, I could skip this step. However, it is good practice to bootstrap your environment from the start, so you don’t get deployment errors later if you choose to use assets.

I’m now ready to deploy my project, and to do that I issue the following command: cdk deploy

This command first creates the AWS CloudFormation template then deploys it into my account. Since my project will make a security change, it asks me if I wish to deploy these changes. I select yes, and a CloudFormation changeset is created, and my resources start building.

Once complete, I can go over to the CloudFormation console and see that all the resources are now part of an AWS CloudFormation stack.

That’s it, my resources have been successfully built in the cloud, all using .NET.

With the addition of Java and .NET, the AWS CDK now supports 5 programming languages in total, giving you more options in how you build your AWS resources. Why not install the AWS CDK today and give it a try in whichever language is your favorite?

— Martin

AWS Systems Manager Explorer – A Multi-Account, Multi-Region Operations Dashboard

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/aws-systems-manager-explorer-a-multi-account-multi-region-operations-dashboard/

Since 2006, Amazon Web Services has been striving to simplify IT infrastructure. Thanks to services like Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Amazon Relational Database Service (RDS), AWS CloudFormation and many more, millions of customers can build reliable, scalable, and secure platforms in any AWS region in minutes. Having spent 10 years procuring, deploying and managing more hardware than I care to remember, I’m still amazed every day by the pace of innovation that builders achieve with our services.

With great power comes great responsibility. The second you create AWS resources, you’re responsible for them: security of course, but also cost and scaling. This makes monitoring and alerting all the more important, which is why we built services like Amazon CloudWatch, AWS Config and AWS Systems Manager.

Still, customers told us that their operations work would be much simpler if they could just look at a single dashboard, listing potential issues on AWS resources no matter which ones of their accounts or which region they’ve been created in.

We got to work, and today we’re very happy to announce the availability of AWS Systems Manager Explorer, a unified operations dashboard built as part of Systems Manager.

Introducing AWS Systems Manager Explorer
Collecting monitoring information and alerts from EC2, Config, CloudWatch and Systems Manager, Explorer presents you with an intuitive graphical dashboard that lets you quickly view and browse problems affecting your AWS resources. By default, this data comes from the account and region you’re running in, and you can easily include other regions as well as other accounts managed with AWS Organizations.

Specifically, Explorer can provide operational information about:

  • EC2 issues, such as unhealthy instances,
  • EC2 instances that have a non-compliant patching status,
  • AWS resources that don’t comply with Config rules (predefined or your own),
  • AWS resources that have triggered a CloudWatch Events rule (predefined or your own).

Each issue is stored as an OpsItem in AWS Systems Manager OpsCenter, and is assigned a status (open, in progress, resolved), a severity and a category (security, performance, cost, etc.). Widgets let you quickly browse OpsItems, and a timeline of all OpsItems is also available.

In addition to OpsItems, the Explorer dashboard also includes widgets that show consolidated information on EC2 instances:

  • Instance count, with a tag filter,
  • Instances managed by Systems Manager, as well as unmanaged instances,
  • Instances sorted by AMI id.

As you would expect, all information can be exported to S3 for archival or further processing, and you can also set up Amazon Simple Notification Service (SNS) notifications. Last but not least, all data visible on the dashboard can be accessed from the AWS CLI or any AWS SDK with the GetOpsSummary API.

Let’s take a quick tour.

A Look at AWS Systems Manager Explorer
Before using Explorer, we recommend that you first set up Config and Systems Manager. This will help populate your Explorer dashboard immediately. No setup is required for CloudWatch Events.

Setting up Config is a best practice, and the procedure is extremely simple: don’t forget to enable EC2 rules too. Setting up Systems Manager is equally simple, thanks to the quick setup procedure: add managed instances and check for patch compliance in just a few clicks! Don’t forget to do this in all regions and accounts you want Explorer to manage.

If you set these services up later, you’ll have to wait for a little while for data to be retrieved and displayed.

Now, let’s head out to the AWS console for Explorer.

Once I’ve completed the one-click setup page creating a service role and enabling data sources, a quick look at the CloudWatch Events console confirms that rules have been created automatically.

Explorer recommends that I add regions and accounts in order to get a unified view. Of course, you can skip this step if you just want a quick taste of the service.

If you’re keen on synchronizing data, you can easily create a resource data sync, which will fetch operations data coming from other regions and other accounts. I’m going all in here, but please make sure you tick the boxes that work for you.

Once data has been retrieved and processed, my dashboard lights up. Good thing it’s only a test account!

I can also see information on all EC2 instances.

From here on, I can group OpsItems and instances according to different dimensions (accounts, regions, tags). I can also drill down on OpsItems, view their details in OpsCenter, and apply runbooks to fix them. If you want to know more about OpsCenter, here’s the launch post.
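
Grouping by a dimension amounts to counting items per distinct value of that dimension. The Python sketch below illustrates the idea on made-up OpsItems. The dictionary fields (account, region, status) are hypothetical and do not reflect the shape of the data Explorer actually returns.

```python
from collections import Counter

# Made-up OpsItems; field names are assumptions for illustration.
ops_items = [
    {"account": "111111111111", "region": "us-east-1", "status": "open"},
    {"account": "111111111111", "region": "eu-west-1", "status": "in progress"},
    {"account": "222222222222", "region": "us-east-1", "status": "resolved"},
    {"account": "222222222222", "region": "us-east-1", "status": "open"},
]

def unresolved_by(items, dimension):
    """Count OpsItems that are not yet resolved, grouped by one dimension."""
    return Counter(
        item[dimension] for item in items if item["status"] != "resolved"
    )

print(unresolved_by(ops_items, "region"))
print(unresolved_by(ops_items, "account"))
```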

Now Available!
We believe AWS Systems Manager Explorer will help Operations teams find and solve problems more easily and quickly, no matter what the scale of their AWS infrastructure is.

This feature is available today in all regions where AWS Systems Manager OpsCenter is available. Give it a try, and please share your feedback in the AWS forum for AWS Systems Manager, or with your usual AWS support contacts.

Julien

New – Import Existing Resources into a CloudFormation Stack

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-import-existing-resources-into-a-cloudformation-stack/

With AWS CloudFormation, you can model your entire infrastructure with text files. In this way, you can treat your infrastructure as code and apply software development best practices, such as putting it under version control, or reviewing architectural changes with your team before deployment.

Sometimes AWS resources initially created using the console or the AWS Command Line Interface (CLI) need to be managed using CloudFormation. For example, you (or a different team) may create an IAM role, a Virtual Private Cloud, or an RDS database in the early stages of a migration, and then you have to spend time to include them in the same stack as the final application. In such cases, you often end up recreating the resources from scratch using CloudFormation, and then migrating configuration and data from the original resource.

To make these steps easier for our customers, you can now import existing resources into a CloudFormation stack!

It was already possible to remove resources from a stack without deleting them by setting the DeletionPolicy to Retain. This, together with the new import operation, enables a new range of possibilities. For example, you are now able to:

  • Create a new stack importing existing resources.
  • Import existing resources in an already created stack.
  • Migrate resources across stacks.
  • Remediate a detected drift.
  • Refactor nested stacks by deleting children stacks from one parent and then importing them into another parent stack.

To import existing resources into a CloudFormation stack, you need to provide:

  • A template that describes the entire stack, including both the resources to import and (for existing stacks) the resources that are already part of the stack.
  • A DeletionPolicy attribute in the template for each resource to import. This makes it easy to revert the operation safely.
  • A unique identifier for each target resource, for example the name of the Amazon DynamoDB table or of the Amazon Simple Storage Service (S3) bucket you want to import.
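
Before starting an import, it can be worth checking the template locally for the DeletionPolicy requirement. The Python sketch below does that on a template that has already been parsed into a dictionary. The helper name missing_deletion_policy is hypothetical, and the check is a convenience of my own, not part of CloudFormation.

```python
def missing_deletion_policy(template, logical_ids):
    """Return the logical IDs to import that lack a DeletionPolicy attribute."""
    resources = template.get("Resources", {})
    return [
        logical_id
        for logical_id in logical_ids
        if "DeletionPolicy" not in resources.get(logical_id, {})
    ]

# A parsed template with one compliant and one non-compliant resource.
template = {
    "Resources": {
        "ImportedTable": {"Type": "AWS::DynamoDB::Table", "DeletionPolicy": "Retain"},
        "ImportedBucket": {"Type": "AWS::S3::Bucket"},
    }
}
print(missing_deletion_policy(template, ["ImportedTable", "ImportedBucket"]))
# prints ['ImportedBucket']
```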

During the resource import operation, CloudFormation checks that:

  • The imported resources do not already belong to another stack in the same region (be careful with global resources such as IAM roles).
  • The target resources exist and you have sufficient permissions to perform the operation.
  • The properties and configuration values are valid against the resource type schema, which defines the required properties, the accepted properties, and their supported values.

The resource import operation does not check that the template configuration and the actual configuration are the same. Since the import operation supports the same resource types as drift detection, I recommend running drift detection after importing resources in a stack.

Importing Existing Resources into a New Stack
In my AWS account, I have an S3 bucket and a DynamoDB table, both with some data inside, and I’d like to manage them using CloudFormation. In the CloudFormation console, I have two new options:

  • I can create a new stack importing existing resources.

  • I can import resources into an existing stack.

In this case, I want to start from scratch, so I create a new stack. The next step is to provide a template with the resources to import.

I upload the following template with two resources to import: a DynamoDB table and an S3 bucket.

AWSTemplateFormatVersion: "2010-09-09"
Description: Import test
Resources:

  ImportedTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    Properties: 
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions: 
        - AttributeName: id
          AttributeType: S
      KeySchema: 
        - AttributeName: id
          KeyType: HASH

  ImportedBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain

In this template I am setting DeletionPolicy to Retain for both resources. In this way, if I remove them from the stack, they will not be deleted. This is a good option for resources which contain data you don’t want to delete by mistake, or that you may want to move to a different stack in the future. It is mandatory for imported resources to have a deletion policy set, so you can safely and easily revert the operation, and be protected from mistakenly deleting resources that were imported by someone else.

I now have to provide an identifier to map the logical IDs in the template with the existing resources. In this case, I use the DynamoDB table name and the S3 bucket name. For other resource types, there may be multiple ways to identify them and you can select which property to use in the drop-down menus.
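
The mapping itself is a small data structure: one entry per imported resource, giving its type, its logical ID in the template, and the identifier of the existing resource. The Python sketch below builds it from a parsed template. To my understanding the entry shape mirrors the ResourcesToImport parameter of the CreateChangeSet API, but treat the field names as something to verify against the documentation. The table name my-existing-table is a placeholder.

```python
import json

def resources_to_import(template, identifiers):
    """Build one entry per imported resource: type, logical ID, identifier."""
    return [
        {
            "ResourceType": template["Resources"][logical_id]["Type"],
            "LogicalResourceId": logical_id,
            "ResourceIdentifier": identifier,
        }
        for logical_id, identifier in identifiers.items()
    ]

template = {
    "Resources": {
        "ImportedTable": {"Type": "AWS::DynamoDB::Table", "DeletionPolicy": "Retain"},
        "ImportedBucket": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Retain"},
    }
}
identifiers = {
    "ImportedTable": {"TableName": "my-existing-table"},  # placeholder name
    "ImportedBucket": {"BucketName": "danilop-toimport"},
}
print(json.dumps(resources_to_import(template, identifiers), indent=2))
```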

In the final recap, I review changes before applying them. Here I check that I’m targeting the right resources to import with the right identifiers. This is actually a CloudFormation Change Set that will be executed when I import the resources.

When importing resources into an existing stack, no changes are allowed to the existing resources of the stack. The import operation will only allow the Change Set action of Import. Changes to parameters are allowed as long as they don’t cause changes to resolved values of properties in existing resources. You can change the template for existing resources to replace hard coded values with a Ref to a resource being imported. For example, you may have a stack with an EC2 instance using an existing IAM role that was created using the console. You can now import the IAM role into the stack and replace in the template the hard coded value used by the EC2 instance with a Ref to the role.

Moving on, each resource has its corresponding import events in the CloudFormation console.

When the import is complete, in the Resources tab, I see that the S3 bucket and the DynamoDB table are now part of the stack.

To be sure the imported resources are in sync with the stack template, I use drift detection.
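
Conceptually, drift is the set of properties whose value in the template differs from the live resource. The Python sketch below shows that comparison on plain dictionaries. It is a simplification under my own assumptions: it compares only properties declared in the template, and the function name property_drift and the sample values are hypothetical.

```python
def property_drift(expected, actual):
    """Return {property: (expected, actual)} for each declared value that differs."""
    return {
        name: (value, actual.get(name))
        for name, value in expected.items()
        if actual.get(name) != value
    }

# Properties declared in the template versus the (made-up) live configuration.
declared = {"BillingMode": "PAY_PER_REQUEST"}
live = {"BillingMode": "PROVISIONED", "TableClass": "STANDARD"}
print(property_drift(declared, live))
# prints {'BillingMode': ('PAY_PER_REQUEST', 'PROVISIONED')}
```

An empty result means the declared properties match, which is the state you want to see after an import.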

All stack-level tags, including automatically created tags, are propagated to resources that CloudFormation supports. For example, I can use the AWS CLI to get the tag set associated with the S3 bucket I just imported into my stack. Those tags give me the CloudFormation stack name and ID, and the logical ID of the resource in the stack template:

$ aws s3api get-bucket-tagging --bucket danilop-toimport

{
  "TagSet": [
    {
      "Key": "aws:cloudformation:stack-name",
      "Value": "imported-stack"
    },
    {
      "Key": "aws:cloudformation:stack-id",
      "Value": "arn:aws:cloudformation:eu-west-1:123412341234:stack/imported-stack/..."
    },
    {
      "Key": "aws:cloudformation:logical-id",
      "Value": "ImportedBucket"
    }
  ]
}
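If you want to read those tags programmatically, a minimal Python sketch could parse the CLI output as follows. It assumes the JSON shape shown above; the function name cloudformation_tags is hypothetical, and the stack ID stays elided exactly as in the sample.

```python
import json

# Output copied from the get-bucket-tagging call above (stack ID left elided).
SAMPLE = """
{
  "TagSet": [
    {"Key": "aws:cloudformation:stack-name", "Value": "imported-stack"},
    {"Key": "aws:cloudformation:stack-id",
     "Value": "arn:aws:cloudformation:eu-west-1:123412341234:stack/imported-stack/..."},
    {"Key": "aws:cloudformation:logical-id", "Value": "ImportedBucket"}
  ]
}
"""

PREFIX = "aws:cloudformation:"


def cloudformation_tags(tagging_json: str) -> dict:
    """Return the aws:cloudformation:* tags keyed by their short name."""
    tag_set = json.loads(tagging_json)["TagSet"]
    return {
        tag["Key"][len(PREFIX):]: tag["Value"]
        for tag in tag_set
        if tag["Key"].startswith(PREFIX)
    }


tags = cloudformation_tags(SAMPLE)
print(tags["stack-name"], tags["logical-id"])
```

The same function works on the output of any tagging call that returns a TagSet list in this shape.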

Available Now
You can use the new CloudFormation import operation via the console, AWS Command Line Interface (CLI), or AWS SDKs, in the following regions: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Canada (Central), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), EU (Frankfurt), EU (Ireland), EU (London), EU (Paris), and South America (São Paulo).

It is now simpler to manage your infrastructure as code. You can learn more about bringing existing resources into CloudFormation management in the documentation.

Danilo

Operational Insights for Containers and Containerized Applications

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/operational-insights-for-containers-and-containerized-applications/

The increasing adoption of containerized applications and microservices also brings an increased burden for monitoring and management. Builders have an expectation of, and requirement for, the same level of monitoring as would be used with longer-lived infrastructure such as Amazon Elastic Compute Cloud (EC2) instances. By contrast, containers are relatively short-lived, and usually subject to continuous deployment. This can make it difficult to reliably collect monitoring data and to analyze performance or other issues, which in turn affects remediation time. In addition, builders have to resort to a disparate collection of tools to perform this analysis and inspection, manually correlating context across a set of infrastructure and application metrics, logs, and other traces.

Announcing general availability of Amazon CloudWatch Container Insights
At the AWS Summit in New York this past July, Amazon CloudWatch Container Insights support for Amazon ECS and AWS Fargate was announced as an open preview for new clusters. Starting today Container Insights is generally available, with the added ability to now also monitor existing clusters. Immediate insights into compute utilization and failures for both new and existing cluster infrastructure and containerized applications can be easily obtained from container management services including Kubernetes, Amazon Elastic Container Service for Kubernetes, Amazon ECS, and AWS Fargate.

Once enabled, Amazon CloudWatch discovers all of the running containers in a cluster and collects performance and operational data at every layer in the container stack. It also continuously monitors and updates as changes occur in the environment, reducing the number of tools required to collect, monitor, act on, and analyze container metrics and logs, and giving complete end-to-end visibility.

Being able to easily access this data means customers can shift focus to increased developer productivity and away from building mechanisms to curate and build dashboards.

Getting started with Amazon CloudWatch Container Insights
I can enable Container Insights by following the instructions in the documentation. Once it is enabled and new clusters are launched, I visit the CloudWatch console for my region and see a new option for Container Insights in the list of dashboards available to me.

Clicking this takes me to the relevant dashboard where I can select the container management service that hosts the clusters that I want to observe.

In the image below, I have selected to view metrics for my ECS clusters that are hosting a sample application I have deployed in AWS Fargate. I can examine the metrics for standard time periods such as 1 hour or 3 hours, but I can also specify custom time periods. Here I am looking at the metrics for a custom time period of the past 15 minutes.

You can see that I can quickly gain operational oversight of the overall performance of the cluster. Clicking the cluster name takes me deeper to view the metrics for the tasks inside the cluster.

Selecting a container allows me to then dive into either AWS X-Ray traces or performance logs.

Selecting performance logs takes me to the Amazon CloudWatch Logs Insights page where I can run queries against the performance events collected for my container ecosystem (e.g., Container, Task/Pod, Cluster, etc.) that I can then use to troubleshoot and dive deeper.

Container Insights makes it easy for me to get started monitoring my containers and enables me to quickly drill down into performance metrics and log analytics without the need to build custom dashboards to curate data from multiple tools. Beyond monitoring and troubleshooting I can also use the data and dashboards Container Insights provides me to support other use cases such as capacity requirements and planning, by helping me understand compute utilization by Pod/Task, Container, and Service for example.

Availability
Amazon CloudWatch Container Insights is generally available today to customers in all public AWS regions where Amazon Elastic Container Service for Kubernetes, Kubernetes, Amazon ECS, and AWS Fargate are present.

— Steve

Deploying GitOps with Weave Flux and Amazon EKS

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/deploying-gitops-with-weave-flux-and-amazon-eks/

This post is contributed by Jon Jozwiak | Senior Solutions Architect, AWS

 

You have countless options for deploying resources into an Amazon EKS cluster. GitOps—a term coined by Weaveworks—provides some substantial advantages over the alternatives. With only Git as the single, central source for controlling deployment into your cluster, GitOps provides easy version control on a platform your team already knows. Getting started with GitOps is straightforward: create a pull request, merge, and the configuration deploys to the EKS cluster.

Weave Flux makes running GitOps in your EKS cluster fast and easy, as it monitors your configuration in Git and image repositories and automates deployments. Weave Flux follows a pull model, automatically triggering deployments based on changes. This provides better security than most continuous deployment tools, which need permissions to access your cluster. This approach also provides Git with version control over your configuration and enables rollback.
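To make the pull model concrete, here is a toy Python sketch of the comparison step: the desired state read from Git is compared with what is running, and only the differences are applied. It illustrates the idea and is not how Flux is implemented; the function name plan_sync, the resource names, and the string specs are all hypothetical.

```python
from typing import Dict, List


def plan_sync(desired: Dict[str, str], running: Dict[str, str]) -> List[str]:
    """Return the names of resources whose desired spec differs from what is running."""
    return sorted(
        name for name, spec in desired.items() if running.get(name) != spec
    )


# Desired state as committed to Git (hypothetical, simplified to strings).
desired = {
    "deployment/eks-example": "image:v2",
    "service/eks-example": "port:80",
}
# State currently observed in the cluster.
running = {
    "deployment/eks-example": "image:v1",
    "service/eks-example": "port:80",
}

to_apply = plan_sync(desired, running)
print(to_apply)
```

Because the agent inside the cluster does the comparing and applying, nothing outside the cluster needs credentials to push changes into it.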

This post walks through implementing Weave Flux and deploying resources to EKS using Git. To simplify the image build pipeline, I use AWS Service Catalog to provide a standardized pipeline. AWS Service Catalog lets you centrally define a portfolio of approved products that AWS users can provision. An AWS CloudFormation template defines each product, which can be version-controlled.

After you deploy the sample resources, I quickly demonstrate the GitOps approach where a new image results in the configuration automatically deploying to EKS. This new image may be a commit of Kubernetes manifests or a commit of Helm release definitions.

The following diagram shows the workflow.

Prerequisites

In GitOps, you manage Docker image builds separately from deployment configuration. For image builds, this example uses AWS CodePipeline and AWS CodeBuild, which provide a managed workflow from GitHub source through to an image landing in Amazon Elastic Container Registry (ECR).

This post assumes that you already have an EKS cluster deployed, including kubectl access. It also assumes that you have a GitHub account.

GitHub setup

First, create a GitHub repository to store the Kubernetes manifests (configuration files) to apply to the cluster.

In GitHub, create a repository to hold the Kubernetes manifests for your deployments. Name the repository k8s-config to align with this post. Leave it as a public repository, check the box for Initialize this repository with a README, and choose Create Repo.

On the GitHub repository page, choose Clone or Download and save the SSH string:

git@github.com:youruser/k8s-config.git

Next, create a GitHub token that allows creating and deleting repositories so AWS Service Catalog can deploy and remove pipelines.

  1. In your GitHub profile, access your token settings.
  2. Choose Generate New Token.
  3. Name your new token CodePipeline Service Catalog, and select the following options:
    • repo scopes (repo:status, repo_deployment, public_repo, and repo:invite)
    • read:org
    • write:public_key and read:public_key
    • write:repo_hook and read:repo_hook
    • read:user and user:email
    • delete_repo
  4. Choose Generate Token.
  5. Copy and save your access token for later use.
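As a sanity check on the token, GitHub's REST API reports a classic token's scopes in the X-OAuth-Scopes response header. The following Python sketch compares such a header value against the list above. It is a hedged illustration: the function name missing_scopes and the sample header are hypothetical, and GitHub may collapse child scopes into a parent scope, so treat a "missing" result as a prompt to look at the token settings rather than as proof.

```python
# Scopes selected in the steps above.
REQUIRED_SCOPES = {
    "repo:status", "repo_deployment", "public_repo", "repo:invite",
    "read:org",
    "write:public_key", "read:public_key",
    "write:repo_hook", "read:repo_hook",
    "read:user", "user:email",
    "delete_repo",
}


def missing_scopes(header_value: str) -> set:
    """Return the required scopes that do not appear in a comma-separated scope header."""
    granted = {scope.strip() for scope in header_value.split(",") if scope.strip()}
    return REQUIRED_SCOPES - granted


# Hypothetical header value from a token that was created with too few scopes.
example_header = "repo:status, repo_deployment, public_repo, repo:invite, read:org"
missing = missing_scopes(example_header)
print(sorted(missing))
```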

 

Deploy Helm

Helm is a package manager for Kubernetes that allows you to define a chart. Charts are collections of related resources that let you create, version, share, and publish applications. By deploying Helm into your cluster, you make it much easier to deploy Weave Flux and other systems. If you’ve deployed Helm already, skip this section.

First, install the Helm client with the following command:

curl -LO https://git.io/get_helm.sh
chmod 700 get_helm.sh
./get_helm.sh

On macOS, you could alternatively enter the following command:

brew install kubernetes-helm

 

Next, set up a service account with cluster role for Tiller, Helm’s server-side component. This allows Tiller to manage resources in your cluster.

kubectl -n kube-system create sa tiller
kubectl create clusterrolebinding tiller-cluster-rule \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller

Finally, initialize Helm and verify your version. Tiller takes a few seconds to start.

helm init --service-account tiller --history-max 200
helm version

Deploy Weave Flux

With Helm installed, proceed with the Weave Flux installation. Begin by installing the Flux Custom Resource Definition.

kubectl apply -f https://raw.githubusercontent.com/fluxcd/flux/helm-0.10.1/deploy-helm/flux-helm-release-crd.yaml

Now add the Weave Flux Helm repository and proceed with the install. Make sure that you update the git.url to match the GitHub repository that you created earlier.

helm repo add fluxcd https://charts.fluxcd.io
helm upgrade -i flux \
  --set helmOperator.create=true \
  --set helmOperator.createCRD=false \
  --set git.url=git@github.com:YOURUSER/k8s-config \
  --namespace flux fluxcd/flux

You can use the following command to verify that you successfully deployed Flux. You should see three pods running:

kubectl get pods -n flux
NAME                                 READY     STATUS    RESTARTS   AGE
flux-5bd7fb6bb6-4sc78                1/1       Running   0          52s
flux-helm-operator-df5746688-84kw8   1/1       Running   0          52s
flux-memcached-6f8c446979-f45wj      1/1       Running   0          52s

Flux requires a deploy key to work with the GitHub repository. In this post, Flux generates the SSH key pair itself, but you can also specify a different key pair when deploying. To access the key, download fluxctl, a command line utility that interacts with the Flux API. The following steps work for Linux. For other OS platforms, see Installing fluxctl.

sudo wget -O /usr/local/bin/fluxctl https://github.com/fluxcd/flux/releases/download/1.14.1/fluxctl_linux_amd64
sudo chmod 755 /usr/local/bin/fluxctl

Validate that fluxctl installed successfully, then retrieve the public key using the following commands. Specify the namespace where you deployed Flux.

fluxctl version
fluxctl --k8s-fwd-ns=flux identity

Copy the key and add that as a deploy key in your GitHub repository.

  1. In your GitHub repository, choose Settings, Deploy Keys.
  2. Choose Add deploy key and name the key Flux Deploy Key.
  3. Paste the key from fluxctl identity.
  4. Choose Allow Write Access, Add Key.

Now use AWS Service Catalog to set up your image build pipeline.

 

Set up AWS Service Catalog

To allow end users to consume product portfolios, you must associate a portfolio with an IAM principal (or principals): a user, group, or role. For this example, associate your current identity. After you master these basics, there are additional resources to teach you how to set up a multi-region, multi-account catalog.

To retrieve your current identity, use the AWS CLI to get your ARN:

aws sts get-caller-identity
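The command prints a small JSON document, and the value you need is its Arn field. As a minimal sketch, the following Python snippet extracts it; the sample output uses placeholder values, and the function name caller_arn is hypothetical.

```python
import json

# Illustrative output shape; the user ID, account ID, and user name are placeholders.
SAMPLE_OUTPUT = """
{
  "UserId": "AIDAEXAMPLEUSERID",
  "Account": "123412341234",
  "Arn": "arn:aws:iam::123412341234:user/example-user"
}
"""


def caller_arn(cli_output: str) -> str:
    """Return the Arn field from the JSON printed by get-caller-identity."""
    return json.loads(cli_output)["Arn"]


arn = caller_arn(SAMPLE_OUTPUT)
print(arn)
```

If you only need the value at the shell, the AWS CLI's global --query Arn and --output text options return it directly.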

Deploy the product portfolio that contains an image build pipeline service by doing the following:

  1. In the AWS CloudFormation console, launch the CloudFormation stack with the following link:
  2. Choose Next.
  3. On the Specify Details page, enter your ARN from get-caller-identity. Also enter an environment tag, which AWS applies to all resources from this portfolio.
  4. Choose Next.
  5. On the Options page, choose Next.
  6. On the Review page, select the check box displayed next to I acknowledge that AWS CloudFormation might create IAM resources.
  7. Choose Create. CloudFormation takes a few minutes to create your resources.

Deploy the image pipeline

The image pipeline provisions a GitHub repository, Amazon ECR repository, and AWS CodeBuild project. It also uses AWS CodePipeline to build a Docker image.

  1. In the AWS Management Console, go to the AWS Service Catalog products list and choose Pipeline for Docker Images.
  2. Choose Launch Product.
  3. For Name, enter ExamplePipeline, and choose Next.
  4. On the Parameters page, fill in a project name, description, and unique S3 bucket name. The specifics don’t matter, but make a note of the name and S3 bucket for later use.
  5. Fill in your GitHub User and GitHub Token values from earlier. Leave the rest of the fields as the default values.
  6. To clean up your GitHub repository on stack delete, change Delete Repository to true.
  7. Choose Next.
  8. On the TagOptions screen, choose Next.
  9. Choose Next on the Notifications page.
  10. On the Review page, choose Launch.

The launch process takes 1–2 minutes. You can verify that you now have a repository matching your project name (eks-example) in GitHub. You can also look at the pipeline created in the AWS CodePipeline console.

 

Deploying with GitOps

You can now provision workloads into the EKS cluster. With a GitOps approach, you only commit code and Kubernetes resource definitions to GitHub. AWS CodePipeline handles the image builds, and Weave Flux applies the desired state to Kubernetes.

First, create a simple Hello World application in your example pipeline. Clone the GitHub repository that you created in the previous step and substitute your GitHub user below.

git clone git@github.com:youruser/eks-example.git
cd eks-example

Create a base README file and a source directory, then download a simple NGINX configuration (hello.conf), home page (index.html), and Dockerfile.

echo "# eks-example" > README.md
mkdir src
wget -O src/hello.conf https://blog-gitops-eks.s3.amazonaws.com/hello.conf
wget -O src/index.html https://blog-gitops-eks.s3.amazonaws.com/index.html
wget https://blog-gitops-eks.s3.amazonaws.com/Dockerfile

Now that you have a simple Hello World app with Dockerfile, commit the changes to kick off the pipeline.

git add .
git commit -am "Initial commit"
[master (root-commit) d69a6ba] Initial commit
 4 files changed, 34 insertions(+)
 create mode 100644 Dockerfile
 create mode 100644 README.md
 create mode 100644 src/hello.conf
 create mode 100644 src/index.html
git push

Watch the AWS CodePipeline console to see the image build in progress. This may take a minute to start. When it’s done, look in the ECR console to see the first version of the container image.

To deploy this image and the Hello World application, commit Kubernetes manifests for Flux. Create a namespace, deployment, and service in the Kubernetes Git repository (k8s-config) you created. Make sure that you aren’t in your eks-example repository directory.

cd ..
git clone git@github.com:youruser/k8s-config.git
cd k8s-config
mkdir charts namespaces releases workloads

The preceding directory structure helps organize the repository but isn’t necessary. Flux can descend into subdirectories and look for YAML files to apply.
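To preview which files such a recursive search would pick up, you could run a small Python sketch like the one below against a checkout of the repository. It only approximates the idea of descending into subdirectories and collecting YAML files; Flux's actual discovery rules may differ, and the function name find_manifests is hypothetical. The demo builds a throwaway directory tree so it can run anywhere.

```python
import tempfile
from pathlib import Path


def find_manifests(repo_root: Path) -> list:
    """Return relative paths of .yaml/.yml files under repo_root, skipping .git."""
    found = []
    for path in repo_root.rglob("*"):
        relative = path.relative_to(repo_root)
        if ".git" in relative.parts:
            continue
        if path.is_file() and path.suffix in {".yaml", ".yml"}:
            found.append(relative.as_posix())
    return sorted(found)


# Recreate the directory layout from this post in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for directory in ("charts", "namespaces", "releases", "workloads"):
        (root / directory).mkdir()
    (root / "namespaces" / "eks-example.yaml").write_text("kind: Namespace\n")
    (root / "workloads" / "eks-example-dep.yaml").write_text("kind: Deployment\n")
    (root / "README.md").write_text("# k8s-config\n")
    manifests = find_manifests(root)

print(manifests)
```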

Create a namespace Kubernetes manifest.

cat << EOF > namespaces/eks-example.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: eks-example
  name: eks-example
EOF

Now create a deployment manifest. Make sure that you update this image to point to your repository and image tag. For example, <Account ID>.dkr.ecr.us-east-1.amazonaws.com/eks-example:d69a6bac.

cat << EOF > workloads/eks-example-dep.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eks-example
  namespace: eks-example
  labels:
    app: eks-example
  annotations:
    # Container Image Automated Updates
    flux.weave.works/automated: "true"
    # do not apply this manifest on the cluster
    #flux.weave.works/ignore: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: eks-example
  template:
    metadata:
      labels:
        app: eks-example
    spec:
      containers:
      - name: eks-example
        image: <Your Account>.dkr.ecr.us-east-1.amazonaws.com/eks-example:d69a6bac
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: http
        readinessProbe:
          httpGet:
            path: /
            port: http
EOF
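The image reference in the manifest above follows the usual Amazon ECR pattern of account ID, Region, repository, and tag. A small helper like this one can assemble it so that the pieces are easier to substitute. This is a sketch only: the function name ecr_image_uri is hypothetical and the account ID is a placeholder.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build an ECR image reference: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID; the repository and tag match the example in this post.
uri = ecr_image_uri("123412341234", "us-east-1", "eks-example", "d69a6bac")
print(uri)
```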

 

Finally, create a service manifest to create a load balancer.

cat << EOF > workloads/eks-example-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: eks-example
  namespace: eks-example
  labels:
    app: eks-example
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: eks-example
EOF

 

In the deployment manifest, there are two Kubernetes annotations for Flux. The first, flux.weave.works/automated, tells Flux whether the container image should be automatically updated. This example sets the value to true, enabling updates to your deployment as new images arrive in the registry. This example comments out the second annotation, flux.weave.works/ignore. However, you can use it to tell Flux to ignore the deployment temporarily.

Commit the changes. After you push them, Flux deploys them automatically within a few minutes.

git add .
git commit -am "eks-example deployment"
[master 954908c] eks-example deployment
 3 files changed, 64 insertions(+)
 create mode 100644 namespaces/eks-example.yaml
 create mode 100644 workloads/eks-example-dep.yaml
 create mode 100644 workloads/eks-example-svc.yaml

 

Make sure that you push your changes.

git push

Now check the logs of your Flux pod:

kubectl get pods -n flux

Update the name below to reflect the name of the pod in your deployment. This sample polls Git for changes every five minutes. When a sync triggers, you should see kubectl apply log messages to create the namespace, service, and deployment.

kubectl logs flux-5bd7fb6bb6-4sc78 -n flux

Find the load balancer address for your service with the following:

kubectl describe service eks-example -n eks-example

Now when you connect to the load balancer address in a browser, you can see the Hello World app.

Change the eks-example source code in a small way (such as changing index.html to say Hello World Deployment 2), then commit and push to Git.

After a few minutes, refresh your browser to see the deployed change. You can watch the changes in AWS CodePipeline, in ECR, and through Flux logs. Weave Flux automatically updated your deployment manifests in the k8s-config repository to deploy the new image as it detected it. To back out that change, use a git revert or git reset command.

Finally, you can use the same approach to deploy Helm charts. You can host these charts within the configuration Git repository (k8s-config in this example), or on an external chart repository. In the following example, you use an external chart repository.

In your k8s-config directory, get the latest changes from your repository and then create a Helm release from an external chart.

cd k8s-config
git pull

First, create the namespace manifest.

cat << EOF > namespaces/nginx.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: nginx
  name: nginx
EOF

 

Then create the Helm release manifest. HelmRelease is a custom resource; Weave Flux provides its definition, which you installed earlier.

cat << EOF > releases/nginx.yaml
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: mywebserver
  namespace: nginx
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.nginx: semver:~1.16
    flux.weave.works/locked: 'true'
    flux.weave.works/locked_msg: '"Halt updates for now"'
    flux.weave.works/locked_user: User Name <[email protected]>
spec:
  releaseName: mywebserver
  chart:
    repository: https://charts.bitnami.com/bitnami/
    name: nginx
    version: 3.3.2
  values:
    usePassword: true
    image:
      registry: docker.io
      repository: bitnami/nginx
      tag: 1.16.0-debian-9-r46
    service:
      type: LoadBalancer
      port: 80
      nodePorts:
        http: ""
      externalTrafficPolicy: Cluster
    ingress:
      enabled: false
    livenessProbe:
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 30
      timeoutSeconds: 5
      failureThreshold: 6
    readinessProbe:
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 5
      timeoutSeconds: 3
      periodSeconds: 5
    metrics:
      enabled: false
EOF

git add . 
git commit -am "Adding NGINX Helm release"
git push

 

There are a few new annotations for Flux above. The flux.weave.works/locked annotation tells Flux to lock the deployment. This is useful if you find a known bad image and must roll back to a previous version. In addition, the flux.weave.works/tag.nginx annotation filters image tags by semantic versioning.

Wait up to five minutes for Flux to pull the configuration, then verify this deployment as you did in the previous example:

kubectl get pods -n flux
kubectl logs flux-5bd7fb6bb6-4sc78 -n flux
kubectl get all -n nginx

If this doesn’t deploy, make sure that Helm initialized as described earlier in this post.

kubectl get pods -n kube-system | grep tiller
kubectl get pods -n flux
kubectl logs flux-helm-operator-df5746688-84kw8 -n flux

Clean up

Log in as an administrator and follow these steps to clean up your sample deployment.

  1. Delete all images from the Amazon ECR repository.
  2. In AWS Service Catalog provisioned products, select the three dots to the left of your ExamplePipeline service and choose Terminate provisioned product. Wait until it completes termination (1–2 minutes).
  3. Delete your Amazon S3 artifact bucket.
  4. Delete Weave Flux:

helm delete flux --purge
kubectl delete ns flux
kubectl delete crd helmreleases.flux.weave.works

  5. Delete the load balancer services:

helm delete mywebserver --purge
kubectl delete ns nginx
kubectl delete svc eks-example -n eks-example
kubectl delete deployment eks-example -n eks-example
kubectl delete ns eks-example

  6. Clean up your GitHub repositories:

 – Go to your k8s-config repository in GitHub, choose Settings, scroll to the bottom and choose Delete this repository. If you set delete to false in the pipeline service, you also must delete your eks-example repository.

 – Delete the personal access token that you created.

  7. If you provisioned an EKS cluster at the beginning of this post, delete it:

eksctl get cluster
eksctl delete cluster <clustername>

  8. In the AWS CloudFormation console, select the DevServiceCatalog stack, and choose Actions, Delete Stack.

Conclusion

In this post, I demonstrated how to use a GitOps approach, which allows you to focus on committing code and configuration to Git rather than learning new CI/CD tooling. Git acts as the single source of truth, and Weave Flux pulls changes and ensures that the Kubernetes cluster configuration matches the desired state.

In addition, AWS Service Catalog can be used to create a portfolio of services that enables you to standardize your offerings, such as an image build pipeline based on AWS CodePipeline.

As always, AWS welcomes feedback. Please submit comments or questions below.

AWS Cloud Development Kit (CDK) – TypeScript and Python are Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-cloud-development-kit-cdk-typescript-and-python-are-now-generally-available/

Managing your Infrastructure as Code provides great benefits and is often a stepping stone for a successful application of DevOps practices. In this way, instead of relying on manually performed steps, both administrators and developers can automate provisioning of compute, storage, network, and application services required by their applications using configuration files.

For example, defining your Infrastructure as Code makes it possible to:

  • Keep infrastructure and application code in the same repository
  • Make infrastructure changes repeatable and predictable across different environments, AWS accounts, and AWS regions
  • Replicate production in a staging environment to enable continuous testing
  • Replicate production in a performance test environment that you use just for the time required to run a stress test
  • Release infrastructure changes using the same tools as code changes, so that deployments include infrastructure updates
  • Apply software development best practices to infrastructure management, such as code reviews, or deploying small changes frequently

Configuration files used to manage your infrastructure are traditionally implemented as YAML or JSON text files, but that way you miss most of the advantages of modern programming languages. With YAML in particular, it can be very difficult to detect a file truncated during transfer to another system, or a line that went missing when copying and pasting from one template to another.

Wouldn’t it be better if you could use the expressive power of your favorite programming language to define your cloud infrastructure? For this reason, last year we introduced the AWS Cloud Development Kit (CDK) in developer preview. It is an extensible open-source software development framework to model and provision your cloud infrastructure using familiar programming languages.

I am super excited to share that the AWS CDK for TypeScript and Python is generally available today!

With the AWS CDK you can design, compose, and share your own custom components that incorporate your unique requirements. For example, you can create a component setting up your own standard VPC, with its associated routing and security configurations. Or a standard CI/CD pipeline for your microservices using tools like AWS CodeBuild and CodePipeline.

Personally I really like that by using the AWS CDK, you can build your application, including the infrastructure, in your IDE, using the same programming language and with the support of autocompletion and parameter suggestion that modern IDEs have built in, without having to do a mental switch between one tool, or technology, and another. The AWS CDK makes it really fun to quickly code up your AWS infrastructure, configure it, and tie it together with your application code!

How the AWS CDK works
Everything in the AWS CDK is a construct. You can think of constructs as cloud components that can represent architectures of any complexity: a single resource, such as an S3 bucket or an SNS topic, a static website, or even a complex, multi-stack application that spans multiple AWS accounts and regions. To foster reusability, constructs can include other constructs. You compose constructs together into stacks, which you can deploy into an AWS environment, and apps, which are collections of one or more stacks.
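The containment idea can be pictured as a tree. The following Python sketch is a toy model only: the Node class is hypothetical and is not the CDK's Construct class. It shows an app containing a stack, which contains a higher-level component, which in turn contains a single resource.

```python
class Node:
    """Toy stand-in for a construct: a named node that can contain other nodes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            # Registering with the parent is what builds the tree.
            parent.children.append(self)

    @property
    def path(self):
        """Slash-separated path from the root of the tree to this node."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path}/{self.name}"

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


app = Node("MyApp")
stack = Node("MyStack", app)
website = Node("StaticWebsite", stack)
bucket = Node("Bucket", website)

print(bucket.path)
print([node.name for node in app.walk()])
```

Passing the parent as a constructor argument mirrors the way the TypeScript and Python examples later in this post pass this or self as the first argument of every construct.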

How to use the AWS CDK
We continuously add new features based on the feedback of our customers. That means that when creating an AWS resource, you often have to specify many options and dependencies. For example, if you create a VPC you have to think about how many Availability Zones (AZs) to use and how to configure subnets to give private and public access to the resources that will be deployed in the VPC.

To make it easy to define the state of AWS resources, the AWS Construct Library exposes the full richness of many AWS services with sensible defaults that you can customize as needed. In the case above, the VPC construct creates by default public and private subnets for all the AZs in the VPC, using 3 AZs if not specified.
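As a rough picture of that default, the Python sketch below lists one public and one private subnet for each Availability Zone, capped at three AZs when no limit is given. It only mirrors the behavior described above and is not the CDK's implementation; the function name default_subnet_layout is hypothetical, and details such as CIDR allocation and NAT gateways are left out.

```python
def default_subnet_layout(available_azs, max_azs=3):
    """Return (az, subnet_type) pairs: one public and one private subnet per AZ used."""
    used_azs = list(available_azs)[:max_azs]
    return [(az, kind) for az in used_azs for kind in ("public", "private")]


layout = default_subnet_layout(
    ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]
)
for az, kind in layout:
    print(az, kind)
```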

For creating and managing CDK apps, you can use the AWS CDK Command Line Interface (CLI), a command-line tool that requires Node.js and can be installed quickly with:

npm install -g aws-cdk

After that, you can use the CDK CLI with different commands:

  • cdk init to initialize in the current directory a new CDK project in one of the supported programming languages
  • cdk synth to print the CloudFormation template for this app
  • cdk deploy to deploy the app in your AWS Account
  • cdk diff to compare what is in the project files with what has been deployed

Just run cdk to see more of the available commands and options.

You can easily include the CDK CLI in your deployment automation workflow, for example using Jenkins or AWS CodeBuild.
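For example, an automation job might assemble the CLI calls as argument lists before running them. The Python sketch below only builds and prints the commands, so it runs without the CDK installed. The stack name MyStack and the function names are hypothetical, and --require-approval never is used on the assumption that the job should not stop at an interactive approval prompt.

```python
import shlex
from typing import List


def cdk_commands(stack_name: str) -> List[List[str]]:
    """Return the CDK CLI invocations for a simple synth-then-deploy job."""
    return [
        ["cdk", "synth", stack_name],
        # Skip the interactive approval prompt, which would block an unattended job.
        ["cdk", "deploy", stack_name, "--require-approval", "never"],
    ]


def render(argv: List[str]) -> str:
    """Render an argument list as a shell-safe command line for logging."""
    return " ".join(shlex.quote(arg) for arg in argv)


commands = cdk_commands("MyStack")
for argv in commands:
    print(render(argv))
```

On a build host where the CDK CLI is installed, each list can be passed to subprocess.run(argv, check=True) so that a failing step stops the job.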

Let’s use the AWS CDK to build two sample projects, using different programming languages.

An example in TypeScript
For the first project I am using TypeScript to define the infrastructure:

cdk init app --language=typescript

Here’s a simplified view of what I want to build, not entering into the details of the public/private subnets in the VPC. There is an online frontend, writing messages in a queue, and an asynchronous backend, consuming messages from the queue:

Inside the stack, the following TypeScript code defines the resources I need, and their relations:

  • First I define the VPC and an Amazon ECS cluster in that VPC. By using the defaults provided by the AWS Construct Library, I don’t need to specify any parameter here.
  • Then I use an ECS pattern that in a few lines of code sets up an Amazon SQS queue and an ECS service running on AWS Fargate to consume the messages in that queue.
  • The ECS pattern library provides higher-level ECS constructs which follow common architectural patterns, such as load balanced services, queue processing, and scheduled tasks.
  • A Lambda function has the name of the queue, created by the ECS pattern, passed as an environment variable and is granted permissions to send messages to the queue.
  • The code of the Lambda function and the Docker image are passed as assets. Assets allow you to bundle files or directories from your project and use them with Lambda or ECS.
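  • Finally, an Amazon API Gateway endpoint provides an HTTPS REST interface to the function.

// Imports assumed at the top of the stack file (CDK v1 module names).
import * as apigateway from "@aws-cdk/aws-apigateway";
import * as ec2 from "@aws-cdk/aws-ec2";
import * as ecs from "@aws-cdk/aws-ecs";
import * as ecs_patterns from "@aws-cdk/aws-ecs-patterns";
import * as lambda from "@aws-cdk/aws-lambda";
import { Duration } from "@aws-cdk/core";

// The statements below go inside the stack's constructor, where `this` is the stack.

// VPC and ECS cluster, using the defaults of the AWS Construct Library.
const myVpc = new ec2.Vpc(this, "MyVPC");

const myCluster = new ecs.Cluster(this, "MyCluster", {
  vpc: myVpc
});

// ECS pattern: an SQS queue plus a Fargate service that consumes its messages.
const myQueueProcessingService = new ecs_patterns.QueueProcessingFargateService(
  this, "MyQueueProcessingService", {
    cluster: myCluster,
    memoryLimitMiB: 512,
    image: ecs.ContainerImage.fromAsset("my-queue-consumer")
  });

// Frontend function; it receives the queue name as an environment variable.
const myFunction = new lambda.Function(
  this, "MyFrontendFunction", {
    runtime: lambda.Runtime.NODEJS_10_X,
    timeout: Duration.seconds(3),
    handler: "index.handler",
    code: lambda.Code.asset("my-front-end"),
    environment: {
      QUEUE_NAME: myQueueProcessingService.sqsQueue.queueName
    }
  });

// Allow the function to send messages to the queue.
myQueueProcessingService.sqsQueue.grantSendMessages(myFunction);

// HTTPS REST endpoint in front of the function.
const myApi = new apigateway.LambdaRestApi(
  this, "MyFrontendApi", {
    handler: myFunction
  });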

I find this code very readable and easier to maintain than the corresponding JSON or YAML. By the way, cdk synth in this case outputs more than 800 lines of plain CloudFormation YAML.

An example in Python
For the second project I am using Python:

cdk init app --language=python

I want to build a Lambda function that is executed every 10 minutes:

When you initialize a CDK project in Python, a virtualenv is set up for you. You can activate the virtualenv and install your project requirements with:

source .env/bin/activate

pip install -r requirements.txt

Note that Python autocompletion may not work with some editors, like Visual Studio Code, if you don’t start the editor from an active virtualenv.

Inside the stack, here’s the Python code defining the Lambda function and the CloudWatch Event rule to invoke the function periodically as target:

myFunction = aws_lambda.Function(
    self, "MyPeriodicFunction",
    code=aws_lambda.Code.asset("src"),
    handler="index.main",
    timeout=core.Duration.seconds(30),
    runtime=aws_lambda.Runtime.PYTHON_3_7,
)

myRule = aws_events.Rule(
    self, "MyRule",
    schedule=aws_events.Schedule.rate(core.Duration.minutes(10)),
)
myRule.add_target(aws_events_targets.LambdaFunction(myFunction))

Again, this is easy to understand even if you don’t know the details of AWS CDK. For example, durations include the time unit and you don’t have to wonder if they are expressed in seconds, milliseconds, or days. The output of cdk synth in this case is more than 90 lines of plain CloudFormation YAML.
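The stack points the function at index.main inside the src directory, but the handler itself is not shown. A minimal sketch of what src/index.py could look like follows; the log message and the return value are illustrative assumptions, not part of the original project.

```python
# src/index.py: hypothetical handler for the "MyPeriodicFunction" Lambda.
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def main(event, context):
    """Entry point referenced by handler="index.main" in the stack."""
    # Scheduled events include a "time" field with the invocation timestamp.
    logger.info("Invoked by schedule: %s", json.dumps(event))
    # The return value is ignored for scheduled invocations; it is only
    # returned here to make local testing easier (an assumption of this sketch).
    return {"status": "ok", "time": event.get("time")}
```

Calling main({"time": "2019-07-11T00:00:00Z"}, None) locally is a quick way to check the handler before deploying.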

Available Now
There is no charge for using the AWS CDK; you pay only for the AWS resources that are deployed by the tool.

To quickly get hands-on with the CDK, start with this awesome step-by-step online tutorial!

More examples of CDK projects, using different programming languages, are available in this repository:

https://github.com/aws-samples/aws-cdk-examples

You can find more information on writing your own constructs here.

The AWS CDK is open source and we welcome your contribution to make it an even better tool:

https://github.com/awslabs/aws-cdk

Check out our source code on GitHub, start building your infrastructure today using TypeScript or Python, or try different languages in developer preview, such as C# and Java, and give us your feedback!

Scaling Kubernetes deployments with Amazon CloudWatch metrics

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/

This post is contributed by Kwunhok Chan | Solutions Architect, AWS

 

In an earlier post, AWS introduced Horizontal Pod Autoscaler and Kubernetes Metrics Server support for Amazon Elastic Kubernetes Service. These tools make it easy to scale your Kubernetes workloads managed by EKS in response to built-in metrics like CPU and memory.

However, one common use case for applications running on EKS is the integration with AWS services. For example, you administer an application that processes messages published to an Amazon SQS queue. You want the application to scale according to the number of messages in that queue. The Amazon CloudWatch Metrics Adapter for Kubernetes (k8s-cloudwatch-adapter) helps.

 

Amazon CloudWatch Metrics Adapter for Kubernetes

The k8s-cloudwatch-adapter is an implementation of the Kubernetes Custom Metrics API and External Metrics API with integration for CloudWatch metrics. It allows you to scale your Kubernetes deployment using the Horizontal Pod Autoscaler (HPA) with CloudWatch metrics.

 

Prerequisites

Before starting, you need the following:

 

Getting started

Before using the k8s-cloudwatch-adapter, set up a way to provide IAM credentials to your Kubernetes pods. The CloudWatch Metrics Adapter requires the following permission to access metric data from CloudWatch:

cloudwatch:GetMetricData

Create an IAM policy with the following template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

For demo purposes, I’m granting admin permissions to my Kubernetes worker nodes. Don’t do this in your production environment. To associate IAM roles with your Kubernetes pods, you may want to look at kube2iam or kiam.

If you’re using an EKS cluster, you most likely provisioned it with AWS CloudFormation. The following command uses AWS CloudFormation stacks to update the proper instance policy with the correct permissions:

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
--role-name $(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[0].Parameters[?ParameterKey==`NodeInstanceRoleName`].ParameterValue' | jq -r ".[0]")

 

Make sure to replace ${STACK_NAME} with the nodegroup stack name from the AWS CloudFormation console.
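To make the --query expression in the command less opaque, here is a small Python sketch of the same lookup: it takes the JSON that describe-stacks returns and picks out the NodeInstanceRoleName parameter of the first stack. The sample response is a trimmed, hypothetical one used only for illustration.

```python
import json


def node_instance_role_name(describe_stacks_output: str) -> str:
    """Return the NodeInstanceRoleName parameter value of the first stack."""
    stacks = json.loads(describe_stacks_output)["Stacks"]
    for param in stacks[0].get("Parameters", []):
        if param["ParameterKey"] == "NodeInstanceRoleName":
            return param["ParameterValue"]
    raise KeyError("NodeInstanceRoleName parameter not found")


# Hypothetical, trimmed-down describe-stacks response.
sample = json.dumps({
    "Stacks": [{
        "StackName": "my-nodegroup",
        "Parameters": [
            {"ParameterKey": "NodeGroupName", "ParameterValue": "workers"},
            {"ParameterKey": "NodeInstanceRoleName", "ParameterValue": "my-node-role"},
        ],
    }]
})

print(node_instance_role_name(sample))
```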

 

You can now deploy the k8s-cloudwatch-adapter to your Kubernetes cluster.

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml

 

This deployment creates a new namespace—custom-metrics—and deploys the necessary ClusterRole, Service Account, and Role Binding values, along with the deployment of the adapter. Use the created custom resource definition (CRD) to define the configuration for the external metrics to retrieve from CloudWatch. The adapter reads the configuration defined in ExternalMetric CRDs and loads its external metrics. That allows you to use HPA to autoscale your Kubernetes pods.

 

Verifying the deployment

Next, query the metrics APIs to see if the adapter is deployed correctly. Run the following command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
  ]
}

There are no resources in the response because you haven’t registered any metric resources yet.

 

Deploying an Amazon SQS application

Next, deploy a sample SQS application to test out k8s-cloudwatch-adapter. The SQS producer and consumer are provided, together with the YAML files for deploying the consumer, metric configuration, and HPA.

Both the producer and consumer use an SQS queue named helloworld. If it doesn’t exist already, the producer creates this queue.

Deploy the consumer with the following command:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/consumer-deployment.yaml

 

You can verify that the consumer is running with the following command:

$ kubectl get deploy sqs-consumer
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sqs-consumer   1         1         1            0           5s

 

Set up Amazon CloudWatch metric and HPA

Next, create an ExternalMetric resource for the CloudWatch metric. Take note of the Kind value for this resource. This CRD resource tells the adapter how to retrieve metric data from CloudWatch.

You define the query parameters used to retrieve the ApproximateNumberOfMessagesVisible for an SQS queue named helloworld. For details about how metric data queries work, see CloudWatch GetMetricData API.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: hello-queue-length
spec:
  name: hello-queue-length
  resource:
    resource: "deployment"
  queries:
    - id: sqs_helloworld
      metricStat:
        metric:
          namespace: "AWS/SQS"
          metricName: "ApproximateNumberOfMessagesVisible"
          dimensions:
            - name: QueueName
              value: "helloworld"
        period: 300
        stat: Average
        unit: Count
      returnData: true
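The queries section of this resource mirrors the MetricDataQueries structure of the CloudWatch GetMetricData API. As a rough sketch of that correspondence (not the adapter's actual implementation, which this post does not show), the following Python function translates one query from the ExternalMetric spec into the shape GetMetricData expects. The function name is hypothetical.

```python
def to_metric_data_query(query: dict) -> dict:
    """Translate one ExternalMetric query (camelCase keys) into the
    PascalCase structure used by GetMetricData's MetricDataQueries."""
    stat = query["metricStat"]
    metric = stat["metric"]
    return {
        "Id": query["id"],
        "MetricStat": {
            "Metric": {
                "Namespace": metric["namespace"],
                "MetricName": metric["metricName"],
                "Dimensions": [
                    {"Name": d["name"], "Value": d["value"]}
                    for d in metric.get("dimensions", [])
                ],
            },
            "Period": stat["period"],
            "Stat": stat["stat"],
            "Unit": stat["unit"],
        },
        "ReturnData": query.get("returnData", True),
    }


# The single query from the hello-queue-length resource, as a Python dict.
hello_queue_query = {
    "id": "sqs_helloworld",
    "metricStat": {
        "metric": {
            "namespace": "AWS/SQS",
            "metricName": "ApproximateNumberOfMessagesVisible",
            "dimensions": [{"name": "QueueName", "value": "helloworld"}],
        },
        "period": 300,
        "stat": "Average",
        "unit": "Count",
    },
    "returnData": True,
}

metric_data_query = to_metric_data_query(hello_queue_query)
```

With boto3 installed and credentials configured, a list containing metric_data_query could be passed as MetricDataQueries to the CloudWatch client's get_metric_data call, together with StartTime and EndTime.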

 

Create the ExternalMetric resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/externalmetric.yaml

 

Then, set up the HPA for your consumer. Here is the configuration to use:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: hello-queue-length
      targetValue: 30

 

This HPA rule starts scaling out when the number of messages visible in your SQS queue exceeds 30, and scales in when there are fewer than 30 messages in the queue.
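To see how the replica count follows the queue length, here is a sketch of the calculation the Horizontal Pod Autoscaler documents for a metric with a target value: the current replica count is multiplied by the ratio of the observed metric to the target, rounded up, and clamped to the configured bounds. The 10% tolerance is the controller's documented default and is treated as an assumption here; your cluster may be configured differently.

```python
import math


def desired_replicas(current_replicas, metric_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a target-value metric."""
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current size.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 90, 30))    # queue at three times the target
print(desired_replicas(2, 600, 30))   # very long queue, capped at maxReplicas
print(desired_replicas(4, 5, 30))     # queue nearly empty, scale in
```

This simplification ignores stabilization windows and pod readiness, which also influence the real controller's decisions.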

Create the HPA resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/hpa.yaml

 

Generate load using a producer

Finally, you can start generating messages to the queue:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/producer-deployment.yaml

In a separate terminal, you can now watch your HPA retrieve the queue length and start scaling the replicas. SQS metrics are published at five-minute intervals, so give the process a few minutes:

$ kubectl get hpa sqs-consumer-scaler -w

 

Clean up

After you complete this experiment, you can delete the Kubernetes deployment and respective resources.

Run the following commands to remove the producer, consumer, external metric, HPA, and SQS queue:

$ kubectl delete deploy sqs-producer
$ kubectl delete hpa sqs-consumer-scaler
$ kubectl delete externalmetric sqs-helloworld-length
$ kubectl delete deploy sqs-consumer

$ aws sqs delete-queue --queue-url "$(aws sqs get-queue-url --queue-name helloworld --query QueueUrl --output text)"

 

Other CloudWatch integrations

AWS recently announced the preview for Amazon CloudWatch Container Insights, which monitors, isolates, and diagnoses containerized applications running on EKS and Kubernetes clusters. To get started, see Using Container Insights.

 

Get involved

This project is currently under development. AWS welcomes issues and pull requests, and would love to hear your feedback.

How could this adapter be best implemented to work in your environment? Visit the Amazon CloudWatch Metrics Adapter for Kubernetes project on GitHub and let AWS know what you think.

OpsCenter – A New Feature to Streamline IT Operations

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/opscenter-a-new-feature-to-streamline-it-operations/

The AWS teams are always listening to customers and trying to understand how they can improve services to make customers more productive. A new feature in AWS Systems Manager called OpsCenter exemplifies this approach by enabling customers to aggregate issues, events, and alerts across services, so they can go to one place to view, investigate, and remediate issues, reducing the need to navigate across multiple AWS services.

Issues, events and alerts appear as operations items (OpsItems) in this new console and provide contextual information, historical guidance, and quick solution steps. The feature aims to improve the mean time to resolution, making engineers more productive by ensuring key investigation data is available in one place.

Engineers working on an OpsItem get access to information such as:

  • Event, resource and account details
  • Past OpsItems with similar characteristics
  • Related AWS Config changes and relationships
  • AWS CloudTrail logs
  • Amazon CloudWatch alarms
  • AWS CloudFormation Stack information
  • Other quick-links to access logs and metrics
  • List of runbooks and recommended runbooks
  • Additional information passed to OpsCenter through AWS services

This information helps engineers to investigate and remediate operational issues faster. Engineers can use OpsCenter to view and address problems using the Systems Manager console or via the Systems Manager OpsCenter APIs.

I’ll spend the rest of this blog exploring the capabilities of this new feature. To get started, I open the Systems Manager Console, make sure that I am in the region of interest, and click OpsCenter inside the Operations Management menu which is on the far left of the screen.

After arriving at the OpsCenter screen for the first time and clicking on “Getting Started” I am prompted with a configure sources screen. This screen sets up the systems with some example CloudWatch rules that will create OpsItems when specific rules trigger. For example, one of the CloudWatch rules will alert if an AutoScaling EC2 instance is stopped or terminated. On this screen, you need to configure and add the ARN of an IAM role that has permission to create OpsItems. This security role is used by the CloudWatch rules to create the OpsItems. You can, of course, create your OpsItems through the API or by creating custom CloudWatch rules.
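For the API route, the Systems Manager CreateOpsItem operation takes a title, a description, and a source, plus optional fields such as priority and operational data. The sketch below only builds the request parameters so that it runs without AWS credentials; the helper name and the sample values are hypothetical.

```python
def build_ops_item_request(title, description, source,
                           priority=3, operational_data=None):
    """Build keyword arguments for the Systems Manager CreateOpsItem call."""
    request = {
        "Title": title,
        "Description": description,
        "Source": source,
        "Priority": priority,  # valid priorities range from 1 to 5
    }
    if operational_data:
        # Each entry needs a Value and a Type; SearchableString entries
        # can be used to filter OpsItems later.
        request["OperationalData"] = {
            key: {"Value": str(value), "Type": "SearchableString"}
            for key, value in operational_data.items()
        }
    return request


request = build_ops_item_request(
    title="Auto Scaling EC2 instance launch failed",
    description="Launch failed because the AMI no longer exists.",
    source="custom-script",                       # hypothetical source name
    operational_data={"asgName": "BookingsAppASG"},
)
print(request)
```

With boto3 installed and an IAM identity allowed to create OpsItems, the request could be sent with boto3.client("ssm").create_ops_item(**request).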

Now that the system has set up some CloudWatch rules for me, I will test it out by triggering an alert. In the EC2 console, I will intentionally deregister (delete) the Amazon Machine Image that is associated with my AutoScaling Group. I will then increase the Desired Capacity of my AutoScaling group from 2 to 4. The AutoScaling group will then try to create new instances; however, it will fail because I have deleted the AMI.

As I expected, this triggered the CloudWatch rule to create an OpsItem in the OpsCenter console. There is now one item open in the OpsItem status summary dashboard. I click on this to get more detail on the open OpsItems.

This gives me a list of all the open OpsItems, and I can see that I have one with the title “Auto Scaling EC2 instance launch failed” which has been created by CloudWatch rules because I deleted the AMI associated with the AutoScaling group. Clicking on that OpsItem takes me to more detail about it.

From this overview screen, I can start to explore the item. Looking around this screen, I can find out more information about this OpsItem and see that it is collecting data from numerous services and presenting it in one place.

Further down the screen I can see other Similar OpsItems and can explore them. In a real situation, this might give me contextual information as to how similar problems were solved in the past, ensuring that operations teams learn from their previous collective experience. I can also manually add a relationship between OpsItems if they are connected. Importantly the Operational data section gives me information about the cause. The status message is particularly useful since it’s calling out the issue: that the AMI does not exist.

On the related resources details screen, I can find out more information about this OpsItem. For example, I can see tag information about the resources alongside relevant CloudWatch alarms. I can explore details from AWS Config as well as drill into CloudTrail logs. I can even see if the resources are associated with any CloudFormation stacks.

Earlier on, I created a CloudWatch alarm that will alert when the number of instances on my AutoScaling group falls below the desired instance threshold (4 Instances). As you can see, I don’t need to go into the CloudWatch console to view this, I can see right from this screen that I have an Alarm State for Booking App Instance Count Low.

The Runbooks section is fascinating; what it is offering me is automated ways in which I can resolve this issue. There are several built-in Runbooks; however, I have a custom one which, luckily enough, automates the fix for this exact problem. It will create a new AMI based upon one of the healthy instances in my AutoScaling Group and then update the config to use that new AMI when it creates instances. To run this automation, I select the runbook and press execute.

It asks me to provide some parameters for the automation job. I paste the AutoScaling Group Name (BookingsAppASG) as the only required parameter and press Execute.

After a minute or so a green success signifier appears in the Latest Status column of the runbook and I am now able to view the logs and even save the output to operation data on the OpsItem so that other engineers can clearly see what I have done.

 

Back in the OpsCenter OpsItem related resource details screen, I can now see that my CloudWatch alarm is green and in an OK state, signifying that my AutoScaling group currently has four instances running and I am safe to resolve the OpsItem.

This service is available now, and you can start using it today in all public AWS regions. Why not open up the console and start exploring all the ways that you can save yourself and your team valuable time?

AWS Control Tower – Set up & Govern a Multi-Account AWS Environment

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-control-tower-set-up-govern-a-multi-account-aws-environment/

Earlier this month I met with an enterprise-scale AWS customer. They told me that they are planning to go all-in on AWS, and want to benefit from all that we have learned about setting up and running AWS at scale. In addition to setting up a Cloud Center of Excellence, they want to set up a secure environment for teams to provision development and production accounts in alignment with our recommendations and best practices.

AWS Control Tower
Today we are announcing general availability of AWS Control Tower. This service automates the process of setting up a new baseline multi-account AWS environment that is secure, well-architected, and ready to use. Control Tower incorporates the knowledge that AWS Professional Services has gained over the course of thousands of successful customer engagements, and also draws from the recommendations found in our whitepapers, documentation, the Well-Architected Framework, and training. The guidance offered by Control Tower is opinionated and prescriptive, and is designed to accelerate your cloud journey!

AWS Control Tower builds on multiple AWS services including AWS Organizations, AWS Identity and Access Management (IAM) (including Service Control Policies), AWS Config, AWS CloudTrail, and AWS Service Catalog. You get a unified experience built around a collection of workflows, dashboards, and setup steps. AWS Control Tower automates a landing zone to set up a baseline environment that includes:

  • A multi-account environment using AWS Organizations.
  • Identity management using AWS Single Sign-On (SSO).
  • Federated access to accounts using AWS SSO.
  • Centralized logging from AWS CloudTrail and AWS Config, stored in Amazon S3.
  • Cross-account security audits using AWS IAM and AWS SSO.

Before diving in, let’s review a couple of key Control Tower terms:

Landing Zone – The overall multi-account environment that Control Tower sets up for you, starting from a fresh AWS account.

Guardrails – Automated implementations of policy controls, with a focus on security, compliance, and cost management. Guardrails can be preventive (blocking actions that are deemed as risky), or detective (raising an alert on non-conformant actions).

Blueprints – Well-architected design patterns that are used to set up the Landing Zone.

Environment – An AWS account and the resources within it, configured to run an application. Users make requests (via Service Catalog) for new environments and Control Tower uses automated workflows to provision them.

Using Control Tower
Starting from a brand new AWS account that is both Master Payer and Organization Master, I open the Control Tower Console and click Set up landing zone to get started:

AWS Control Tower will create AWS accounts for log archiving and for auditing, and requires email addresses that are not already associated with an AWS account. I enter two addresses, review the information within Service permissions, give Control Tower permission to administer AWS resources and services, and click Set up landing zone:

The setup process runs for about an hour, and provides status updates along the way:

Early in the process, Control Tower sends a handful of email requests to verify ownership of the account, invite the account to participate in AWS SSO, and to subscribe to some SNS topics. The requests contain links that I must click in order for the setup process to proceed. The second email also requests that I create an AWS SSO password for the account. After the setup is complete, AWS Control Tower displays a status report:

The console offers some recommended actions:

At this point, the mandatory guardrails have been applied and the optional guardrails can be enabled:

I can see the Organizational Units (OUs) and accounts, and the compliance status of each one (with respect to the guardrails):

 

Using the Account Factory
The navigation on the left lets me access all of the AWS resources created and managed by Control Tower. Now that my baseline environment is set up, I can click Account factory to provision AWS accounts for my teams, applications, and so forth.

The Account factory displays my network configuration (I’ll show you how to edit it later), and gives me the option to Edit the account factory network configuration or to Provision new account:

I can control the VPC configuration that is used for new accounts, including the regions where VPCs are created when an account is provisioned:

The account factory is published to AWS Service Catalog automatically. I can provision managed accounts as needed, as can the developers in my organization. I click AWS Control Tower Account Factory to proceed:

I review the details and click LAUNCH PRODUCT to provision a new account:

Working with Guardrails
As I mentioned earlier, Control Tower’s guardrails provide guidance that is either Mandatory or Strongly Recommended:

Guardrails are implemented via an IAM Service Control Policy (SCP) or an AWS Config rule, and can be enabled on an OU-by-OU basis:

Now Available
AWS Control Tower is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions, with more to follow. There is no charge for the Control Tower service; you pay only for the AWS resources that it creates on your behalf.

In addition to adding support for more AWS regions, we are working to allow you to set up a parallel landing zone next to an existing AWS account, and to give you the ability to build and use custom guardrails.

Jeff;

 

New – How to better monitor your custom application metrics using Amazon CloudWatch Agent

Post Syndicated from Helen Lin original https://aws.amazon.com/blogs/devops/new-how-to-better-monitor-your-custom-application-metrics-using-amazon-cloudwatch-agent/

This blog was contributed by Zhou Fang, Sr. Software Development Engineer for Amazon CloudWatch and Helen Lin, Sr. Product Manager for Amazon CloudWatch

Amazon CloudWatch collects monitoring and operational data from both your AWS resources and on-premises servers, providing you with a unified view of your infrastructure and application health. By default, CloudWatch automatically collects and stores many of your AWS services’ metrics and enables you to monitor and alert on metrics such as high CPU utilization of your Amazon EC2 instances. With the CloudWatch Agent that launched last year, you can also deploy the agent to collect system metrics and application logs from both your Windows and Linux environments. Using this data collected by CloudWatch, you can build operational dashboards to monitor your service and application health, set high-resolution alarms to alert and take automated actions, and troubleshoot issues using Amazon CloudWatch Logs.

We recently introduced CloudWatch Agent support for collecting custom metrics using StatsD and collectd. It’s important to collect system metrics like available memory, and you might also want to monitor custom application metrics. You can use custom application metrics such as request count to understand the traffic going through your application, or latency to be alerted when requests take too long to process. StatsD and collectd are popular, open-source solutions that gather system statistics for a wide variety of applications. By combining the system metrics the agent already collects with the StatsD protocol for instrumenting your own metrics and collectd’s numerous plugins, you can better monitor, analyze, alert, and troubleshoot the performance of your systems and applications.

Let’s dive into an example that demonstrates how to monitor your applications using the CloudWatch Agent.  I am operating a RESTful service that performs simple text encoding. I want to use CloudWatch to help monitor a few key metrics:

  • How many requests are coming into my service?
  • How many of these requests are unique?
  • What is the typical size of a request?
  • How long does it take to process a job?

These metrics help me understand my application performance and throughput, in addition to setting alarms on critical metrics that could indicate service degradation, such as request latency.

Step 1. Collecting StatsD metrics

My service is running on an EC2 instance, using Amazon Linux AMI 2018.03.0. Make sure to attach the CloudWatchAgentServerPolicy AWS managed policy so that the CloudWatch agent can collect and publish metrics from this instance:

Here is the service structure:

 

The “/encode” handler simply returns the base64 encoded string of an input text.  To monitor key metrics, such as total and unique request count as well as request size and method response time, I used StatsD to define these custom metrics.

@RestController

public class EncodeController {

    @RequestMapping("/encode")
    public String encode(@RequestParam(value = "text") String text) {
        long startTime = System.currentTimeMillis();
        statsd.incrementCounter("totalRequest.count", new String[]{"path:/encode"});
        statsd.recordSetValue("uniqueRequest.count", text, new String[]{"path:/encode"});
        statsd.recordHistogramValue("request.size", text.length(), new String[]{"path:/encode"});
        String encodedString = Base64.getEncoder().encodeToString(text.getBytes());
        statsd.recordExecutionTime("latency", System.currentTimeMillis() - startTime, new String[]{"path:/encode"});
        return encodedString;
    }
}

Note that I need to first choose a StatsD client from here.

The “/status” handler responds with a health check ping.  Here I am monitoring my available JVM memory:

@RestController
public class StatusController {

    @RequestMapping("/status")
    public int status() {
        statsd.recordGaugeValue("memory.free", Runtime.getRuntime().freeMemory(), new String[]{"path:/status"});
        return 0;
    }
}
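Behind these client calls, each metric travels as a short text line over UDP. The Python sketch below shows the line format for the metric types used above (c for counter, s for set, h for histogram, ms for timing, g for gauge), with tags appended in the DogStatsD style. It assumes the agent listens on the conventional StatsD port 8125 on localhost; the function names are hypothetical.

```python
import socket


def format_metric(name, value, metric_type, tags=None):
    """Render one StatsD line: <name>:<value>|<type>[|#tag:value,...]."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        tag_text = ",".join(f"{key}:{val}" for key, val in tags.items())
        line += f"|#{tag_text}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


tags = {"path": "/encode"}
lines = [
    format_metric("totalRequest.count", 1, "c", tags),
    format_metric("uniqueRequest.count", "some-text", "s", tags),
    format_metric("request.size", 9, "h", tags),
    format_metric("latency", 12, "ms", tags),
]
for line in lines:
    print(line)
```

Calling send_metric(line) for each line would emit the same kind of data the Java client produces, which can be handy for testing the agent without the service running.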

 

Step 2. Emit custom metrics using collectd (optional)

collectd is another popular, open-source daemon for collecting application metrics. If I want to use the hundreds of available collectd plugins to gather application metrics, I can also use the CloudWatch Agent to publish collectd metrics to CloudWatch with 15-month retention. In practice, I might choose to use either StatsD or collectd to collect custom metrics, or I have the option to use both. All of these use cases are supported by the CloudWatch agent.

Using the same demo RESTful service, I’ll show you how to monitor my service health using the collectd cURL plugin, which passes the collectd metrics to CloudWatch Agent via the network plugin.

For my RESTful service, the “/status” handler returns HTTP code 200 to signify that it’s up and running. This is important for monitoring the health of my service and triggering an alert when the application does not respond with an HTTP 200 success code. Additionally, I want to monitor the elapsed time for each health check request.

To collect these metrics using collectd, I have a collectd daemon installed on the EC2 instance, running version 5.8.0. Here is my collectd config:

LoadPlugin logfile
LoadPlugin curl
LoadPlugin network

<Plugin logfile>
  LogLevel "debug"
  File "/var/log/collectd.log"
  Timestamp true
</Plugin>

<Plugin curl>
    <Page "status">
        URL "http://localhost:8080/status"
        MeasureResponseTime true
        MeasureResponseCode true
    </Page>
</Plugin>

<Plugin network>
    <Server "127.0.0.1" "25826">
        SecurityLevel Encrypt
        Username "user"
        Password "secret"
    </Server>
</Plugin>

 

For the cURL plugin, I configured it to measure response time (latency) and response code (HTTP status code) from the RESTful service.

Note that for the network plugin, I used Encrypt mode which requires an authentication file for the CloudWatch Agent to authenticate incoming collectd requests.  Click here for full details on the collectd installation script.

 

Step 3. Configure the CloudWatch agent

So far, I have shown you how to:

A.  Use StatsD to emit custom metrics to monitor my service health
B.  Optionally use collectd to collect metrics using plugins

Next, I will install and configure the CloudWatch agent to accept metrics from both the StatsD client and collectd plugins.

I installed the CloudWatch Agent following the instructions in the user guide, but here are the detailed steps:

Install CloudWatch Agent:

wget https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip -O AmazonCloudWatchAgent.zip && unzip -o AmazonCloudWatchAgent.zip && sudo ./install.sh

Configure CloudWatch Agent to receive metrics from StatsD and collectd:

{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "collectd": {},
      "statsd": {}
    }
  }
}

Pass the above config (config.json) to the CloudWatch Agent:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:config.json -s
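The empty statsd and collectd sections rely on the agent's defaults. If you prefer to spell the StatsD settings out, the following Python sketch generates an equivalent configuration with the listener address and the collection and aggregation intervals made explicit. The values shown (port 8125, 10 seconds, 60 seconds) are what I understand the documented defaults to be; treat them as assumptions and check the CloudWatch agent user guide before relying on them.

```python
import json


def build_agent_config(statsd_port=8125, collection_interval=10,
                       aggregation_interval=60):
    """Build a CloudWatch agent config that accepts StatsD and collectd metrics."""
    return {
        "metrics": {
            "append_dimensions": {
                "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                "InstanceId": "${aws:InstanceId}",
            },
            "metrics_collected": {
                "collectd": {},
                "statsd": {
                    "service_address": f":{statsd_port}",
                    "metrics_collection_interval": collection_interval,
                    "metrics_aggregation_interval": aggregation_interval,
                },
            },
        }
    }


config = build_agent_config()
# Redirect this output to config.json to use it with the fetch-config command.
print(json.dumps(config, indent=2))
```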

In case you want to skip these steps and just execute my sample agent install script, you can find it here.

 

Step 4. Generate and monitor application traffic in CloudWatch

Now that I have the CloudWatch agent installed and configured to receive StatsD and collectd metrics, I’m going to generate traffic through the service:

echo "send 100 requests"
for i in {1..100}
do
   curl "localhost:8080/encode?text=TextToEncode_${i}"
   echo ""
   sleep 1
done
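If the text to encode contains spaces or special characters, it needs to be URL-encoded before it goes into the query string. Here is a hedged Python alternative to the shell loop that handles the encoding and also computes the Base64 string the service should return, assuming the service treats the text as UTF-8. The helper names are hypothetical.

```python
import base64
import time
import urllib.parse
import urllib.request


def encode_url(text, base="http://localhost:8080/encode"):
    """Build the request URL with the text safely URL-encoded."""
    return base + "?" + urllib.parse.urlencode({"text": text})


def expected_response(text):
    """Base64 string the /encode handler should return (UTF-8 assumed)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def send_requests(count=100, delay_seconds=1):
    """Send `count` requests, one per `delay_seconds`, and print each response."""
    for i in range(1, count + 1):
        with urllib.request.urlopen(encode_url(f"TextToEncode_{i}")) as resp:
            print(resp.read().decode("utf-8"))
        time.sleep(delay_seconds)


print(encode_url("a b&c"))
print(expected_response("hello"))
```

Calling send_requests() with the service running on port 8080 generates the same traffic as the loop above.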

 

Next, I log in to the CloudWatch console and check that the service is up and running. Here’s a graph of the StatsD metrics:

 

Here is a graph of the collectd metrics:

 

Conclusion

With StatsD and collectd support, you can now use the CloudWatch Agent to collect and monitor your custom application metrics in addition to the system metrics and application logs it already collects. Furthermore, you can create operational dashboards with these metrics, set alarms to take automated actions when free memory is low, and troubleshoot issues by diving into the application logs. Note that StatsD supports both Windows and Linux operating systems while collectd is Linux only. For Windows, you can also continue to use Windows Performance Counters to collect custom metrics instead.

The CloudWatch Agent with custom metrics support (version 1.203420.0 or later) is available in all public AWS Regions and AWS GovCloud (US), with AWS China (Beijing) and AWS China (Ningxia) coming soon.

The agent is free to use; you pay the usual CloudWatch prices for logs and custom metrics.

For more details, head over to the CloudWatch user guide for StatsD and collectd.

Building an Amazon CloudWatch Dashboard Outside of the AWS Management Console

Post Syndicated from Stephen McCurry original https://aws.amazon.com/blogs/devops/building-an-amazon-cloudwatch-dashboard-outside-of-the-aws-management-console/

Steve McCurry is a Senior Product Manager for CloudWatch

This is the second in a series of two blog posts that demonstrate how to use the new CloudWatch
snapshot graphs feature. You can find the first post here.

A key challenge for any DevOps team is to provide sufficient monitoring visibility on service
health. Although CloudWatch dashboards are a powerful tool for monitoring your systems and
applications, the dashboards are accessible only to users with permissions to the AWS
Management Console. You can now use a new CloudWatch feature, snapshot graphs, to create
dashboards that contain CloudWatch graphs and are available outside of the AWS Management
Console. You can display CloudWatch snapshot graphs on your internal wiki pages or TV-based
dashboards. You can integrate them with chat applications and ticketing and bug tracking tools.

This blog post shows you how to embed CloudWatch snapshot graphs into your websites using a
lightweight, embeddable widget written in JavaScript.

Snapshot graphs overview

CloudWatch snapshot graphs are images of CloudWatch charts that are useful for building
custom dashboards or integrating with tools outside of AWS. Although the images are static,
they can be refreshed frequently to create a live dashboard experience.

CloudWatch dashboards and charts provide flexible, interactive visualizations that can be used to
create unified operational views across your AWS resources and metrics. However, maybe you
want to display CloudWatch charts on a TV screen for team-level visibility, take snapshots of
charts for auditing in ticketing systems and bug tracking tools, or share snapshots in chat
applications to collaborate on an issue. For these use cases and more, snapshot graphs are an
ideal tool for integrating CloudWatch charts with your webpages and third-party applications.

Snapshot graphs are available through the CloudWatch API, which you can use through the
AWS SDKs or CLI. The charts you request through the API are represented as JSON. To copy
the JSON definition of the graph and use it in the API request, open the Amazon CloudWatch
console. You’ll find the JSON on the Source tab of the Metrics page, as shown here.

All of the features of the CloudWatch line and stacked graphs are available in snapshot graphs,
including vertical and horizontal annotations.
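
As a rough illustration of what such a JSON definition can look like, the following JavaScript sketch builds one for the CPUUtilization metric of a single EC2 instance and adds a horizontal annotation. The function name, instance ID, image size, time range, and 80 percent threshold are placeholder assumptions, not values taken from the sample repository.

```javascript
// Build the JSON definition for a CPUUtilization line graph.
// All concrete values here (sizes, time range, threshold) are illustrative.
function buildCpuWidget(instanceId) {
  return {
    view: 'timeSeries',   // line graph
    stacked: false,       // set to true for a stacked graph
    width: 600,
    height: 400,
    start: '-PT3H',       // last three hours, relative to now
    end: 'P0D',
    period: 300,          // seconds per data point
    stat: 'Average',
    title: `CPU utilization for ${instanceId}`,
    metrics: [['AWS/EC2', 'CPUUtilization', 'InstanceId', instanceId]],
    annotations: {
      horizontal: [{ label: 'High CPU', value: 80 }],
    },
  };
}

// The API takes the definition as a JSON string in its MetricWidget parameter.
const metricWidget = JSON.stringify(buildCpuWidget('i-0123456789abcdef0'));
console.log(metricWidget);
```

The resulting string is what you would pass as the MetricWidget parameter of a GetMetricWidgetImage request.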

Embedding a snapshot graph in your webpage

In this demonstration, we will set up monitoring for an EC2 instance and embed a CloudWatch
snapshot graph for CPUUtilization in a website outside of the AWS Management Console. The
embeddable widget can be configured to support any CloudWatch line or stacked chart. This
demonstration involves these steps:

  1. Create an EC2 instance to monitor.
  2. Create the Lambda function that calls CloudWatch GetMetricWidgetImage.
  3. Create an API Gateway endpoint that proxies requests to the Lambda function.
  4. Embed the widget into a website and configure it for the API Gateway request.

The code for this solution is available from the SnapshotWidgetDemo GitHub repo.

The embeddable JavaScript widget will communicate with CloudWatch through a gateway in
Amazon API Gateway and an AWS Lambda backend. The advantage of using API Gateway is
the additional flexibility you have to secure the endpoint and create fine-grained access control.
For example, you can block access to the endpoint from outside of your corporate network.
Amazon Route 53 could be an alternative solution.

The end goal is to have a webpage running on your local machine that displays a CloudWatch
snapshot graph displaying live metric data from a sample EC2 instance. The sample code
includes a basic webpage containing the embed code.

The JavaScript widget requests a snapshot graph from an API Gateway endpoint. API Gateway
proxies the request to a Lambda function that calls the new CloudWatch API action,
GetMetricWidgetImage. The retrieved snapshot graph is returned in binary and displayed on the
website in an HTML IMG tag.
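
The Lambda side of this flow might look roughly like the sketch below. It is not the code from the SnapshotWidgetDemo repo: the helper names (toProxyResponse, makeHandler), the assumption that the graph definition arrives as the request body, and the stub client are all hypothetical. The only AWS call it relies on is getMetricWidgetImage from the AWS SDK for JavaScript (v2 style, with .promise()), and the client is injected so the sketch runs without the SDK installed.

```javascript
// Convert raw image bytes into an API Gateway Lambda proxy response.
// Binary payloads are base64-encoded and flagged with isBase64Encoded.
function toProxyResponse(imageBytes) {
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'image/png' },
    isBase64Encoded: true,
    body: Buffer.from(imageBytes).toString('base64'),
  };
}

// Build a Lambda handler around an injected CloudWatch client.
// In Lambda, the client would be `new AWS.CloudWatch()` from the AWS SDK.
function makeHandler(cloudwatch) {
  return async function handler(event) {
    // Assumption: the widget sends the graph's JSON definition as the body.
    const params = { MetricWidget: event.body, OutputFormat: 'png' };
    const result = await cloudwatch.getMetricWidgetImage(params).promise();
    return toProxyResponse(result.MetricWidgetImage);
  };
}

// Stub client standing in for the real SDK, for local experimentation only.
const stubClient = {
  getMetricWidgetImage: () => ({
    promise: async () => ({ MetricWidgetImage: Buffer.from('png-bytes') }),
  }),
};

makeHandler(stubClient)({ body: '{}' }).then((res) => console.log(res.statusCode));
```

Injecting the client keeps the handler easy to exercise locally. For API Gateway to hand the image back to the browser as binary rather than base64 text, the API typically also needs binary media types configured for the response content type.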

Here is what the end-to-end solution looks like:

Server setup

  1. Download the repository.
  2. Navigate to ./server and run npm install
  3. From the server folder, run zip -r snapshotwidgetdemo.zip ./*
  4. Upload snapshotwidgetdemo.zip to any S3 bucket.
  5. Upload ./server/apigateway-lambda.json to any S3 bucket.
  6. Navigate to the AWS CloudFormation console and choose Create Stack.
    • Point the new stack to the S3 location in step 5.
    • During setup, you will be asked for the Lambda S3 bucket name from step 4.

The AWS CloudFormation template creates all the required server-side components described in
the previous section.

Client setup

  1. Navigate to ./client and run npm install
  2. Edit ./demo/index.html to replace the following placeholders with your values.
    a. <YOUR_INSTANCE_ID> You can find the instance ID in the AWS
    CloudFormation stack output.
    b. <YOUR_API_GATEWAY_URL> You can find the full URL in the AWS
    CloudFormation stack output.
    c. <YOUR_API_KEY> The API gateway requires a key. The key reference, but not
    the key itself, appears in the AWS CloudFormation stack output. To retrieve the
    key value, go to the Keys tab of the Amazon API Gateway console.
  3. Build the component using webpack: ./node_modules/.bin/webpack --config webpack.config.js
  4. Serve the demo webpage on localhost: ./node_modules/.bin/webpack-dev-server --open

The browser should open at index.html automatically. The page contains one embedded snapshot
graph with the CPU utilization of your EC2 instance.
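
To make the client side more concrete, here is a minimal sketch of how a page could request a snapshot graph and turn the response into an image source. It is an assumption-laden illustration rather than the widget's actual code: the helper names are hypothetical, the request is assumed to be a POST carrying the graph definition, and the endpoint is assumed to return the PNG as base64 text. The x-api-key header is the standard way to pass an API key to API Gateway.

```javascript
// Build the arguments for a fetch() call that requests a snapshot graph.
// apiUrl and apiKey correspond to <YOUR_API_GATEWAY_URL> and <YOUR_API_KEY>.
function buildSnapshotRequest(apiUrl, apiKey, widgetDefinition) {
  return {
    url: apiUrl,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': apiKey, // header API Gateway reads API keys from
      },
      body: JSON.stringify(widgetDefinition),
    },
  };
}

// Turn a base64-encoded PNG into a value usable as an <img> src attribute.
function toImageSrc(base64Png) {
  return `data:image/png;base64,${base64Png}`;
}

// In a browser, the two helpers could be combined like this (not executed here):
//   const { url, options } = buildSnapshotRequest(apiUrl, apiKey, widget);
//   const base64 = await (await fetch(url, options)).text();
//   document.querySelector('#graph').src = toImageSrc(base64);
// Repeating that call on a timer (for example with setInterval) refreshes the graph.
```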

Troubleshooting

If you don’t see anything on the webpage, use the browser console tools to check for console
error messages.

If you still can’t debug the problem, go to the Amazon API Gateway console. On the Logs tab,
make sure that Enable CloudWatch Logs is selected, as shown here:

To check the Lambda logs, in the CloudWatch console, choose the Logs tab, and then search for
the name of your Lambda function.

Summary

This blog post provided a solution for embedding CloudWatch snapshot graphs into webpages
and wikis outside of the AWS Management Console. To read the other blog post in this series
about CloudWatch snapshot graphs, see Reduce Time to Resolution with Amazon CloudWatch
Snapshot Graphs and Alerts.

For more information, see the snapshot graphs API documentation or visit our home page to
learn more about how Amazon CloudWatch achieves monitoring visibility for your cloud
resources and applications.

It would be great to hear your feedback.

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PT – Get Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.

June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

AWS re:Invent

June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.

Compute

June 25, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PT – Ensuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

Containers

June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS, including how to set up masters, networking, and security, and how to add auto-scaling to your cluster.

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PT – Oracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.

DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PT – Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

Enterprise & Hybrid

June 18, 2018 | 09:00 AM – 09:45 AM PT – De-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new AWS Environments.

June 21, 2018 | 11:00 AM – 11:45 AM PT – Leading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.

IoT

June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PT – Integrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PT – Building Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PT – Optimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

Mobile

June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PT – Understanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.

June 28, 2018 | 09:00 AM – 09:45 AM PT – Using Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PT – Productionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connect your applications to the scalable and reliable AWS storage services.

June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.

June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.