Tag Archives: Management Tools

Visualize and Monitor Highly Distributed Applications with Amazon CloudWatch ServiceLens

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/visualize-and-monitor-highly-distributed-applications-with-amazon-cloudwatch-servicelens/

Increasingly distributed applications, with thousands of metrics and terabytes of logs, can be a challenge to visualize and monitor. Gaining an end-to-end insight of the applications and their dependencies to enable rapid pinpointing of performance bottlenecks, operational issues, and customer impact quite often requires the use of multiple dedicated tools each presenting their own particular facet of information. This in turn leads to more complex data ingestion, manual stitching together of the various insights to determine overall performance, and increased costs from maintaining multiple solutions.

Amazon CloudWatch ServiceLens, announced today, is a new fully managed observability solution to help with visualizing and analyzing the health, performance, and availability of highly distributed applications, including those with dependencies on serverless and container-based technologies, all in one place. By enabling you to easily isolate endpoints and resources that are experiencing issues, together with analysis of correlated metrics, logs, and application traces, CloudWatch ServiceLens helps reduce Mean Time to Resolution (MTTR) by consolidating all of this data in a single place using a service map. From this map you can understand the relationships and dependencies within your applications, and dive deep into the various logs, metrics, and traces from a single tool to help you quickly isolate faults. Crucial time spent correlating metrics, logs, and trace data from across various tools is saved, thus reducing any downtime incurred by end users.

Getting Started with Amazon CloudWatch ServiceLens
Let’s see how we can take advantage of Amazon CloudWatch ServiceLens to diagnose the root cause of an alarm triggered from an application. My sample application reads and writes transaction data to a Amazon DynamoDB table using AWS Lambda functions. An Amazon API Gateway is my application’s front-end, with resources for GET and PUT requests, directing traffic to the corresponding GET and PUT lambda functions. The API Gateway resources and the Lambda functions have AWS X-Ray tracing enabled, and the API calls to DynamoDB from within the Lambda functions are wrapped using the AWS X-Ray SDK. You can read more about how to instrument your code, and work with AWS X-Ray, in the Developer Guide.

An error condition has triggered an alarm for my application, so my first stop is the Amazon CloudWatch Console, where I click the Alarm link. I can see that there is some issue with availability with one or more of my API Gateway resources.

Let’s drill down to see what might be going on. First I want to get an overview of the running application so I click Service Map under ServiceLens in the left-hand navigation panel. The map displays nodes representing the distributed resources in my application. The relative size of the nodes represents the amount of request traffic that each is receiving, as does the thickness of the links between them. I can toggle the map between showing Requests modes or Latency mode. Using the same drop-down I can also toggle the option to change relative sizing of the nodes. The data shown for Request mode or Latency mode helps me isolate nodes that I need to triage first. Clicking View connections can also be used to aid in the process, since it helps me understand incoming and outgoing calls, and their impact on the individual nodes.

I’ve closed the map legend in the screenshot so we can get a good look at all the nodes, for reference here it is below.

From the map I can immediately see that my front-end gateway is the source of the triggered alarm. The red indicator on the node is showing me that there are 5xx faults associated with the resource, and the length of the indicator relative to the circumference gives me some idea of how many requests are faulting compared to successful requests. Secondly, I can see that the Lambda functions that are handling PUT requests through the API are showing 4xx errors. Finally I can see a purple indicator on the DynamoDB table, indicating throttling is occurring. At this point I have a pretty good idea of what might be happening, but let’s dig a little deeper to see what CloudWatch ServiceLens can help me prove.

In addition to Map view, I can also toggle List view. This gives me at-a-glance information on average latency, faults, and requests/min for all nodes and is specifically ordered by default to show the “worst” node first, using a sort order of descending by fault rate – descending by number of alarms in alarm – ascending by Node name.

Returning to Map view, hovering my mouse over the node representing my API front-end also gives me similar insight into traffic and faulting request percentage, specific to the node.

To see even more data, for any node, clicking it will open a drawer below the map containing graphed data for that resource. In the screenshot below I have clicked on the ApiGatewayPutLambdaFunction node.

The drawer, for each resource, enables me to jump to view logs for the resource (View logs), traces (View traces), or a dashboard (View dashboard). Below, I have clicked View dashboard for the same Lambda function. Scrolling through the data presented for that resource, I note that the duration of execution is not high, while all invokes are going into error in tandem.

Returning to the API front-end that is showing the alarm, I’d like to take look at the request traces so I click the node to open the drawer, then click View traces. I already know from the map that 5xx and 4xx status codes are being generated in the code paths selected by PUT requests coming into my application so I switch Filter type to be Status code, then select both 502 and 504 entries in the list, finally clicking Add to filter. The Traces view switches to show me the traces that resulted in those error codes, the response time distribution and a set of traces.

Ok, now we’re getting close! Clicking the first trace ID, I get a wealth of data including exception messages about that request – more than I can show in a single screenshot! For example, I can see the timelines of each segment that was traced as part of the request.

Scrolling down further, I can view exception details (below this I can also see log messages specific to the trace too) – and here lays my answer, confirming the throttling indicator that I saw in the original map. I can also see this exception message in the log data specific to the trace, shown at the foot of the page. Previously, I would have had to scan through logs for the entire application to hopefully spot this message, being able to drill down from the map is a significant time saver.

Now I know how to fix the issue and get the alarm resolved – increase the write provisioning for the table!

In conjunction with CloudWatch ServiceLens, Amazon CloudWatch has also launched a preview of CloudWatch Synthetics that helps to monitor endpoints using canaries that run 24×7, 365 days a year, so that I am quickly alerted of issues that my customers are facing. These are also visualized on the Service Map and just as I did above, I can drill down to the traces to view transactions that originated from a canary. The faster I can dive deep into a consolidated view of an operational failure or an alarm, the faster I can root cause the issue and help reduce time to resolution and mitigate the customer impact.

Amazon CloudWatch ServiceLens is available now in all commercial AWS Regions.

— Steve

AWS Cloud Development Kit (CDK) – Java and .NET are Now Generally Available

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/aws-cloud-development-kit-cdk-java-and-net-are-now-generally-available/

Today, we are happy to announce that Java and .NET support inside the AWS Cloud Development Kit (CDK) is now generally available. The AWS CDK is an open-source software development framework to model and provision your cloud application resources through AWS CloudFormationAWS CDK also offers support for TypeScript and Python.

With the AWS CDK, you can design, compose, and share your own custom resources that incorporate your unique requirements. For example, you can use the AWS CDK to model a VPC, with its associated routing and security configurations. You could then wrap that code into a construct and then share it with the rest of your organization. In this way, you can start to build up libraries of these constructs that you can use to standardize the way your organization creates AWS resources.

I like that by using the AWS CDK, you can build your application, including the infrastructure, in your favorite IDE, using the same programming language that you use for your application code. As you code your AWS CDK model in either .NET or Java, you get productivity benefits like code completion and inline documentation, which make it faster to model your infrastructure.

How the AWS CDK Works
Everything in the AWS CDK is a construct. You can think of constructs as cloud components that can represent architectures of any complexity: a single resource, such as a Amazon Simple Storage Service (S3) bucket or a Amazon Simple Notification Service (SNS) topic, a static website, or even a complex, multi-stack application that spans multiple AWS accounts and regions. You compose constructs together into stacks that you can deploy into an AWS environment, and apps – a collection of one or more stacks.

The AWS CDK includes the AWS Construct Library, which contains constructs representing AWS resources.

How to use the AWS CDK
I’m going to use the AWS CDK to build a simple queue, but rather than handcraft a CloudFormation template in YAML or JSON, the AWS CDK allows me to use a familiar programming language to generate and deploy AWS CloudFormation templates.

To get started, I need to install the AWS CDK command-line interface using NPM. Once this download completes, I can code my infrastructure in either TypeScript, Python, JavaScript, Java, or, .NET.

npm i -g aws-cdk

On my local machine, I create a new folder and navigate into it.

mkdir cdk-newsblog-dotnet && cd cdk-newsblog-dotnet

Now I have installed the CLI I can execute commands such as cdk init and pass a language switch, in this instance, I am using .NET, and the sample app with the csharp language switch.

cdk init sample-app --language csharp

If I wanted to use Java rather than .NET, I would change the --language switch to java.

cdk init sample-app --language java

Since I am in the terminal, I type code . which is a shortcut to open the current folder in VS Code. You could, of course, use any editor, such as Visual Studio or JetBrains Rider. As you can see below, the init command has created a basic .NET AWS CDK project.

If I look into the Program.cs, the Main void creates an App and then a CDKDotnetStack. This stack CDKDotnetStack is defined in the CDKDotnetStack.cs file. This is where the meat of the project resides and where all the AWS resources are defined.

Inside the CDKDotnetStack.cs file, some code creates a Amazon Simple Queue Service (SQS) then a topic and then finally adds a Amazon Simple Notification Service (SNS) subscription to the topic.

Now that I have written the code, the next step is to deploy it. When I do, the AWS CDK will compile and execute this project, converting my .NET code into a AWS CloudFormation template.

If I were to just deploy this now, I wouldn’t actually see the CloudFormation template, so the AWS CDK provides a command cdk synth that takes my application, compiles it, executes it, and then outputs a CloudFormation template.

This is just standard CloudFormation, if you look through it, you will find the following items:

  • AWS::SQS::Queue – The queue I added.
  • AWS::SQS::QueuePolicy – An IAM policy that allows my topic to send messages to my queue. I didn’t actually define this in code, but the AWS CDK is smart enough to know I need one of these, and so creates one.
  • AWS::SNS::Topic – The topic I created.
  • AWS::SNS::Subscription – The subscription between the queue and the topic.
  • AWS::CDK::Metadata This section is specific to the AWS CDK and is automatically added by the toolkit to every stack. It is used by the AWS CDK team for analytics and to allow us to identify versions if there are any issues.

Before I deploy this project to my AWS account, I will use cdk bootstrap. The bootstrap command will create a Amazon Simple Storage Service (S3) bucket for me, which will be used by the AWS CDK to store any assets that might be required during deployment. In this example, I am not using any assets, so technically, I could skip this step. However, it is good practice to bootstrap your environment from the start, so you don’t get deployment errors later if you choose to use assets.

I’m now ready to deploy my project and to do that I issue the following command cdk deploy

This command first creates the AWS CloudFormation template then deploys it into my account. Since my project will make a security change, it asks me if I wish to deploy these changes. I select yes, and a CloudFormation changeset is created, and my resources start building.

Once complete, I can go over to the CloudFormation console and see that all the resources are now part of a AWS CloudFormation stack.

That’s it, my resources have been successfully built in the cloud, all using .NET.

With the addition of Java and .NET, the AWS CDK now supports 5 programming languages in total, giving you more options in how you build your AWS resources. Why not install the AWS CDK today and give it a try in whichever language is your favorite?

— Martin

 

AWS Systems Manager Explorer – A Multi-Account, Multi-Region Operations Dashboard

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/aws-systems-manager-explorer-a-multi-account-multi-region-operations-dashboard/

Since 2006, Amazon Web Services has been striving to simplify IT infrastructure. Thanks to services like Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Amazon Relational Database Service (RDS), AWS CloudFormation and many more, millions of customers can build reliable, scalable, and secure platforms in any AWS region in minutes. Having spent 10 years procuring, deploying and managing more hardware than I care to remember, I’m still amazed every day by the pace of innovation that builders achieve with our services.

With great power come great responsibility. The second you create AWS resources, you’re responsible for them: security of course, but also cost and scaling. This makes monitoring and alerting all the more important, which is why we built services like Amazon CloudWatch, AWS Config and AWS Systems Manager.

Still, customers told us that their operations work would be much simpler if they could just look at a single dashboard, listing potential issues on AWS resources no matter which ones of their accounts or which region they’ve been created in.

We got to work, and today we’re very happy to announce the availability of AWS Systems Manager Explorer, a unified operations dashboard built as part of Systems Manager.

Introducing AWS Systems Manager Explorer
Collecting monitoring information and alerts from EC2, Config, CloudWatch and Systems Manager, Explorer presents you with an intuitive graphical dashboard that lets you quickly view and browse problems affecting your AWS resources. By default, this data comes from the account and region you’re running in, and you can easily include other regions as well as other accounts managed with AWS Organizations.

Specifically, Explorer can provide operational information about:

  • EC2 issues, such as unhealthy instances,
  • EC2 instances that have a non-compliant patching status,
  • AWS resources that don’t comply with Config rules (predefined or your own),
  • AWS resources that have triggered a CloudWatch Events rule (predefined or your own).

Each issue is stored as an OpsItem in AWS Systems Manager OpsCenter, and is assigned a status (open, in progress, resolved), a severity and a category (security, performance, cost, etc.). Widgets let you quickly browse OpsItems, and a timeline of all OpsItems is also available.

In addition to OpsItems, the Explorer dashboard also includes widgets that show consolidated information on EC2 instances:

  • Instance count, with a tag filter,
  • Instances managed by Systems Manager, as well as unmanaged instances,
  • Instances sorted by AMI id.

As you would expect, all information can be exported to S3 for archival or further processing, and you can also set up Amazon Simple Notification Service (SNS) notifications. Last but not least, all data visible on the dashboard can be accessed from the AWS CLI or any AWS SDK with the GetOpsSummary API.

Let’s take a quick tour.

A Look at AWS Systems Manager Explorer
Before using Explorer, we recommend that you first set up Config and Systems Manager. This will help populate your Explorer dashboard immediately. No setup is required for CloudWatch events.

Setting up Config is a best practice, and the procedure is extremely simple: don’t forget to enable EC2 rules too. Setting up Systems Manager is equally simple, thanks to the quick setup procedure: add managed instances and check for patch compliance in just a few clicks! Don’t forget to do this in all regions and accounts you want Explorer to manage.

If you set these services up later, you’ll have to wait for a little while for data to be retrieved and displayed.

Now, let’s head out to the AWS console for Explorer.

Once I’ve completed the one-click setup page creating a service role and enabling data sources, a quick look at the CloudWatch Events console confirms that rules have been created automatically.

Explorer recommends that I add regions and accounts in order to get a unified view. Of course, you can skip this step if you just want a quick taste of the service.

If you’re keen on synchronizing data, you can easily create a resource data sync, which will fetch operations data coming from other regions and other accounts. I’m going all in here, but please make sure you tick the boxes that work for you.

Once data has been retrieved and processed, my dashboard lights up. Good thing it’s only a test account!

I can also see information on all EC2 instances.

From here on, I can group OpsItems and instances according to different dimensions (accounts, regions, tags). I can also drill down on OpsItems, view their details in Opscenter, and apply runbooks to fix them. If you want to know more about Opscenter, here’s the launch post.

Now Available!
We believe AWS Systems Manager Explorer will help Operations teams find and solve problems easier and faster, no matter what the scale of their AWS infrastructure is.

This feature is available today in all regions where AWS Systems Manager OpsCenter is available. Give it a try, and please share your feedback in the AWS forum for AWS Systems Manager, or with your usual AWS support contacts.

Julien

New – Import Existing Resources into a CloudFormation Stack

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-import-existing-resources-into-a-cloudformation-stack/

With AWS CloudFormation, you can model your entire infrastructure with text files. In this way, you can treat your infrastructure as code and apply software development best practices, such as putting it under version control, or reviewing architectural changes with your team before deployment.

Sometimes AWS resources initially created using the console or the AWS Command Line Interface (CLI) need to be managed using CloudFormation. For example, you (or a different team) may create an IAM role, a Virtual Private Cloud, or an RDS database in the early stages of a migration, and then you have to spend time to include them in the same stack as the final application. In such cases, you often end up recreating the resources from scratch using CloudFormation, and then migrating configuration and data from the original resource.

To make these steps easier for our customers, you can now import existing resources into a CloudFormation stack!

It was already possible to remove resources from a stack without deleting them by setting the DeletionPolicy to Retain. This, together with the new import operation, enables a new range of possibilities. For example, you are now able to:

  • Create a new stack importing existing resources.
  • Import existing resources in an already created stack.
  • Migrate resources across stacks.
  • Remediate a detected drift.
  • Refactor nested stacks by deleting children stacks from one parent and then importing them into another parent stack.

To import existing resources into a CloudFormation stack, you need to provide:

  • A template that describes the entire stack, including both the resources to import and (for existing stacks) the resources that are already part of the stack.
  • Each resource to import must have a DeletionPolicy attribute in the template. This enables easy reverting of the operation in a completely safe manner.
  • A unique identifier for each target resource, for example the name of the Amazon DynamoDB table or of the Amazon Simple Storage Service (S3) bucket you want to import.

During the resource import operation, CloudFormation checks that:

  • The imported resources do not already belong to another stack in the same region (be careful with global resources such as IAM roles).
  • The target resources exist and you have sufficient permissions to perform the operation.
  • The properties and configuration values are valid against the resource type schema, which defines its required, acceptable properties, and supported values.

The resource import operation does not check that the template configuration and the actual configuration are the same. Since the import operation supports the same resource types as drift detection, I recommend running drift detection after importing resources in a stack.

Importing Existing Resources into a New Stack
In my AWS account, I have an S3 bucket and a DynamoDB table, both with some data inside, and I’d like to manage them using CloudFormation. In the CloudFormation console, I have two new options:

  • I can create a new stack importing existing resources.

  • I can import resources into an existing stack.

In this case, I want to start from scratch, so I create a new stack. The next step is to provide a template with the resources to import.

I upload the following template with two resources to import: a DynamoDB table and an S3 bucket.

AWSTemplateFormatVersion: "2010-09-09"
Description: Import test
Resources:

  ImportedTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    Properties: 
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions: 
        - AttributeName: id
          AttributeType: S
      KeySchema: 
        - AttributeName: id
          KeyType: HASH

  ImportedBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain

In this template I am setting DeletionPolicy  to Retain for both resources. In this way, if I remove them from the stack, they will not be deleted. This is a good option for resources which contain data you don’t want to delete by mistake, or that you may want to move to a different stack in the future. It is mandatory for imported resources to have a deletion policy set, so you can safely and easily revert the operation, and be protected from mistakenly deleting resources that were imported by someone else.

I now have to provide an identifier to map the logical IDs in the template with the existing resources. In this case, I use the DynamoDB table name and the S3 bucket name. For other resource types, there may be multiple ways to identify them and you can select which property to use in the drop-down menus.

In the final recap, I review changes before applying them. Here I check that I’m targeting the right resources to import with the right identifiers. This is actually a CloudFormation Change Set that will be executed when I import the resources.

When importing resources into an existing stack, no changes are allowed to the existing resources of the stack. The import operation will only allow the Change Set action of Import. Changes to parameters are allowed as long as they don’t cause changes to resolved values of properties in existing resources. You can change the template for existing resources to replace hard coded values with a Ref to a resource being imported. For example, you may have a stack with an EC2 instance using an existing IAM role that was created using the console. You can now import the IAM role into the stack and replace in the template the hard coded value used by the EC2 instance with a Ref to the role.

Moving on, each resource has its corresponding import events in the CloudFormation console.

When the import is complete, in the Resources tab, I see that the S3 bucket and the DynamoDB table are now part of the stack.

To be sure the imported resources are in sync with the stack template, I use drift detection.

All stack-level tags, including automatically created tags, are propagated to resources that CloudFormation supports. For example, I can use the AWS CLI to get the tag set associated with the S3 bucket I just imported into my stack. Those tags give me the CloudFormation stack name and ID, and the logical ID of the resource in the stack template:

$ aws s3api get-bucket-tagging --bucket danilop-toimport

{
  "TagSet": [
    {
      "Key": "aws:cloudformation:stack-name",
      "Value": "imported-stack"
    },
    {
      "Key": "aws:cloudformation:stack-id",
      "Value": "arn:aws:cloudformation:eu-west-1:123412341234:stack/imported-stack/..."
    },
    {
      "Key": "aws:cloudformation:logical-id",
      "Value": "ImportedBucket"
    }
  ]
}

Available Now
You can use the new CloudFormation import operation via the console, AWS Command Line Interface (CLI), or AWS SDKs, in the following regions: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Canada (Central), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), EU (Frankfurt), EU (Ireland), EU (London), EU (Paris), and South America (São Paulo).

It is now simpler to manage your infrastructure as code, you can learn more on bringing existing resources into CloudFormation management in the documentation.

Danilo

Operational Insights for Containers and Containerized Applications

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/operational-insights-for-containers-and-containerized-applications/

The increasing adoption of containerized applications and microservices also brings an increased burden for monitoring and management. Builders have an expectation of, and requirement for, the same level of monitoring as would be used with longer lived infrastructure such as Amazon Elastic Compute Cloud (EC2) instances. By contrast containers are relatively short-lived, and usually subject to continuous deployment. This can make it difficult to reliably collect monitoring data and to analyze performance or other issues, which in turn affects remediation time. In addition builders have to resort to a disparate collection of tools to perform this analysis and inspection, manually correlating context across a set of infrastructure and application metrics, logs, and other traces.

Announcing general availability of Amazon CloudWatch Container Insights
At the AWS Summit in New York this past July, Amazon CloudWatch Container Insights support for Amazon ECS and AWS Fargate was announced as an open preview for new clusters. Starting today Container Insights is generally available, with the added ability to now also monitor existing clusters. Immediate insights into compute utilization and failures for both new and existing cluster infrastructure and containerized applications can be easily obtained from container management services including Kubernetes, Amazon Elastic Container Service for Kubernetes, Amazon ECS, and AWS Fargate.

Once enabled Amazon CloudWatch discovers all of the running containers in a cluster and collects performance and operational data at every layer in the container stack. It also continuously monitors and updates as changes occur in the environment, simplifying the number of tools required to collect, monitor, act, and analyze container metrics and logs giving complete end to end visibility.

Being able to easily access this data means customers can shift focus to increased developer productivity and away from building mechanisms to curate and build dashboards.

Getting started with Amazon CloudWatch Container Insights
I can enable Container Insights by following the instructions in the documentation. Once enabled and new clusters launched, when I visit the CloudWatch console for my region I see a new option for Container Insights in the list of dashboards available to me.

Clicking this takes me to the relevant dashboard where I can select the container management service that hosts the clusters that I want to observe.

In the below image I have selected to view metrics for my ECS Clusters that are hosting a sample application I have deployed in AWS Fargate. I can examine the metrics for standard time periods such as 1 hour, 3 hours, etc but can also specify custom time periods. Here I am looking at the metrics for a custom time period of the past 15 minutes.

You can see that I can quickly gain operational oversight of the overall performance of the cluster. Clicking the cluster name takes me deeper to view the metrics for the tasks inside the cluster.

Selecting a container allows me to then dive into either AWS X-Ray traces or performance logs.

Selecting performance logs takes me to the Amazon CloudWatch Logs Insights page where I can run queries against the performance events collected for my container ecosystem (e.g., Container, Task/Pod, Cluster, etc.) that I can then use to troubleshoot and dive deeper.

Container Insights makes it easy for me to get started monitoring my containers and enables me to quickly drill down into performance metrics and log analytics without the need to build custom dashboards to curate data from multiple tools. Beyond monitoring and troubleshooting I can also use the data and dashboards Container Insights provides me to support other use cases such as capacity requirements and planning, by helping me understand compute utilization by Pod/Task, Container, and Service for example.

Availability
Amazon CloudWatch Container Insights is generally available today to customers in all public AWS regions where Amazon Elastic Container Service for Kubernetes, Kubernetes, Amazon ECS, and AWS Fargate are present.

— Steve

Deploying GitOps with Weave Flux and Amazon EKS

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/deploying-gitops-with-weave-flux-and-amazon-eks/

This post is contributed by Jon Jozwiak | Senior Solutions Architect, AWS

 

You have countless options for deploying resources into an Amazon EKS cluster. GitOps—a term coined by Weaveworks—provides some substantial advantages over the alternatives. With only Git as the single, central source for controlling deployment into your cluster, GitOps provides easy version control on a platform your team already knows. Getting started with GitOps is straightforward: create a pull request, merge, and the configuration deploys to the EKS cluster.

Weave Flux makes running GitOps in your EKS cluster fast and easy, as it monitors your configuration in Git and image repositories and automates deployments. Weave Flux follows a pull model, automatically triggering deployments based on changes. This provides better security than most continuous deployment tools, which need permissions to access your cluster. This approach also provides Git with version control over your configuration and enables rollback.

This post walks through implementing Weave Flux and deploying resources to EKS using Git. To simplify the image build pipeline, I use AWS Service Catalog to provide a standardized pipeline. AWS Service Catalog lets you centrally define a portfolio of approved products that AWS users can provision. An AWS CloudFormation template defines each product, which can be version-controlled.

After you deploy the sample resources, I quickly demonstrate the GitOps approach where a new image results in the configuration automatically deploying to EKS. This new image may be a commit of Kubernetes manifests or a commit of Helm release definitions.

The following diagram shows the workflow.

Prerequisites

In GitOps, you manage Docker image builds separately from deployment configuration. For image builds, this example uses AWS CodePipeline and AWS CodeBuild, which provide a managed workflow from GitHub source through to an image landing in Amazon Elastic Container Registry (ECR).

This post assumes that you already have an EKS cluster deployed, including kubectl access. It also assumes that you have a GitHub account.

GitHub setup

First, create a GitHub repository to store the Kubernetes manifests (configuration files) to apply to the cluster.

In GitHub, create a GitHub repository. This repository holds Kubernetes manifests for your deployments. Name the repository k8s-config to align with this post. Leave it as a public repository, check the box for Initialize this repository with a README, and choose Create Repo.

On the GitHub repository page, choose Clone or Download and save the SSH string:

[email protected]:youruser/k8s-config.git

Next, create a GitHub token that allows creating and deleting repositories so AWS Service Catalog can deploy and remove pipelines.

  1. In your GitHub profile, access your token settings.
  2. Choose Generate New Token.
  3. Name your new token CodePipeline Service Catalog, and select the following options:
  • repo scopes (repo:status, repo_deployment, public_repo, and repo:invite)
  • read:org
  • write:public_key and read:public_key
  • write:repo_hook and read:repo_hook
  • read:user and user:email
  • delete_repo

4 . Choose Generate Token.

5. Copy and save your access token for future access.

 

Deploy Helm

Helm is a package manager for Kubernetes that allows you to define a chart. Charts are collections of related resources that let you create, version, share, and publish applications. By deploying Helm into your cluster, you make it much easier to deploy Weave Flux and other systems. If you’ve deployed Helm already, skip this section.

First, install the Helm client with the following command:

curl -LO https://git.io/get_helm.sh

chmod 700 get_helm.sh

./get_helm.sh

 

On macOS, you could alternatively enter the following command:

brew install kubernetes-helm

 

Next, set up a service account with cluster role for Tiller, Helm’s server-side component. This allows Tiller to manage resources in your cluster.

kubectl -n kube-system create sa tiller

kubectl create clusterrolebinding tiller-cluster-rule \

--clusterrole=cluster-admin \

--serviceaccount=kube-system:tiller

 

Finally, initialize Helm and verify your version. Tiller takes a few seconds to start.

helm init --service-account tiller --history-max 200

helm version

 

Deploy Weave Flux

With Helm installed, proceed with the Weave Flux installation. Begin by installing the Flux Custom Resource Definition.

kubectl apply -f https://raw.githubusercontent.com/fluxcd/flux/helm-0.10.1/deploy-helm/flux-helm-release-crd.yaml

Now add the Weave Flux Helm repository and proceed with the install. Make sure that you update the git.url to match the GitHub repository that you created earlier.

helm repo add fluxcd https://charts.fluxcd.io

helm upgrade -i flux --set helmOperator.create=true --set helmOperator.createCRD=false --set [email protected]:YOURUSER/k8s-config --namespace flux fluxcd/flux

 

You can use the following code to verify that you successfully deployed Flux. You should see three pods running:

kubectl get pods -n flux

NAME                                 READY     STATUS    RESTARTS   AGE

flux-5bd7fb6bb6-4sc78                1/1       Running   0          52s

flux-helm-operator-df5746688-84kw8   1/1       Running   0          52s

flux-memcached-6f8c446979-f45wj      1/1       Running   0          52s

 

Flux requires a deploy key to work with the GitHub repository. In this post, Flux generates the SSH key pair itself, but you can also specify a different key pair when deploying. To access the key, download fluxctl, a command line utility that interacts with the Flux API. The following steps work for Linux. For other OS platforms, see Installing fluxctl.

sudo wget -O /usr/local/bin/fluxctl https://github.com/fluxcd/flux/releases/download/1.14.1/fluxctl_linux_amd64

sudo chmod 755 /usr/local/bin/fluxctl

 

Validate that fluxctl installed successfully, then retrieve the public key pair using the following command. Specify the namespace where you deployed Flux.

fluxctl version

fluxctl --k8s-fwd-ns=flux identity

 

Copy the key and add that as a deploy key in your GitHub repository.

  1. In your GitHub repository, choose Settings, Deploy Keys.
  2. Choose Add deploy key and name the key Flux Deploy Key.
  3. Paste the key from fluxctl identity.
  4. Choose Allow Write Access, Add Key.

Now use AWS Service Catalog to set up your image build pipeline.

 

Set up AWS Service Catalog

To allow end users to consume product portfolios, you must associate a portfolio with an IAM principal (or principals): a user, group, or role. For this example, associate your current identity. After you master these basics, there are additional resources to teach you how to set up a multi-region, multi-account catalog.

To retrieve your current identity, use the AWS CLI to get your ARN:

aws sts get-caller-identity

Deploy the product portfolio that contains an image build pipeline service by doing the following:

  1. In the AWS CloudFormation console, launch the CloudFormation stack with the following link:

 

 

2. Choose Next.

3. On the Specify Details page, enter your ARN from get-caller-identity. Also enter an environment tag, which AWS applies to all resources from this portfolio.

4. Choose Next.

5. On the Options page, choose Next.

6. On the Review page, select the check box displayed next to I acknowledge that AWS CloudFormation might create IAM resources.

7. Choose Create. CloudFormation takes a few minutes to create your resources.

 

Deploy the image pipeline

The image pipeline provisions a GitHub repository, Amazon ECR repository, and AWS CodeBuild project. It also uses AWS CodePipeline to build a Docker image.

  1. In the AWS Management Console, go to the AWS Service Catalog products list and choose Pipeline for Docker Images.
  2. Choose Launch Product.
  3. For Name, enter ExamplePipeline, and choose Next.
  4. On the Parameters page, fill in a project name, description, and unique S3 bucket name. The specifics don’t matter, but make a note of the name and S3 bucket for later use.
  5. Fill in your GitHub User and GitHub Token values from earlier. Leave the rest of the fields as the default values.
  6. To clean up your GitHub repository on stack delete, change Delete Repository to true.
  7. Choose Next.
  8. On the TagOptions screen, choose Next.
  9. Choose Next on the Notifications page.
  10. On the Review page, choose Launch.

The launch process takes 1–2 minutes. You can verify that you now have a repository matching your project name (eks-example) in GitHub. You can also look at the pipeline created in the AWS CodePipeline console.

 

Deploying with GitOps

You can now provision workloads into the EKS cluster. With a GitOps approach, you only commit code and Kubernetes resource definitions to GitHub. AWS CodePipeline handles the image builds, and Weave Flux applies the desired state to Kubernetes.

First, create a simple Hello World application in your example pipeline. Clone the GitHub repository that you created in the previous step and substitute your GitHub user below.

git clone [email protected]:youruser/eks-example.git

cd eks-example

Create a base README file, a source directory, and download a simple NGINX configuration (hello.conf), home page (index.html), and Dockerfile.

echo "# eks-example" > README.md

mkdir src

wget -O src/hello.conf https://blog-gitops-eks.s3.amazonaws.com/hello.conf

wget -O src/index.html https://blog-gitops-eks.s3.amazonaws.com/index.html

wget https://blog-gitops-eks.s3.amazonaws.com/Dockerfile

 

Now that you have a simple Hello World app with Dockerfile, commit the changes to kick off the pipeline.

git add .

git commit -am "Initial commit"

[master (root-commit) d69a6ba] Initial commit

4 files changed, 34 insertions(+)

create mode 100644 Dockerfile

create mode 100644 README.md

create mode 100644 src/hello.conf

create mode 100644 src/index.html

git push

 

Watch in the AWS CodePipeline console to see the image build in process. This may take a minute to start. When it’s done, look in the ECR console to see the first version of the container image.

To deploy this image and the Hello World application, commit Kubernetes manifests for Flux. Create a namespace, deployment, and service in the Kubernetes Git repository (k8s-config) you created. Make sure that you aren’t in your eks-example repository directory.

cd ..

git clone [email protected]:youruser/k8s-config.git

cd k8s-config

mkdir charts namespaces releases workloads

 

The preceding directory structure helps organize the repository but isn’t necessary. Flux can descend into subdirectories and look for YAML files to apply.

Create a namespace Kubernetes manifest.

cat << EOF > namespaces/eks-example.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: eks-example
  name: eks-example
EOF

Now create a deployment manifest. Make sure that you update this image to point to your repository and image tag. For example, <Account ID>.dkr.ecr.us-east-1.amazonaws.com/eks-example:d69a6bac.

cat << EOF > workloads/eks-example-dep.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eks-example
  namespace: eks-example
  labels:
    app: eks-example
  annotations:
    # Container Image Automated Updates
    flux.weave.works/automated: "true"
    # do not apply this manifest on the cluster
    #flux.weave.works/ignore: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: eks-example
  template:
    metadata:
      labels:
        app: eks-example
    spec:
      containers:
      - name: eks-example
        image: <Your Account>.dkr.ecr.us-east-1.amazonaws.com/eks-example:d69a6bac
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: http
        readinessProbe:
          httpGet:
            path: /
            port: http
EOF

 

Finally, create a service manifest to create a load balancer.

cat << EOF > workloads/eks-example-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: eks-example
  namespace: eks-example
  labels:
    app: eks-example
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: eks-example
EOF

 

In the preceding code, there are two Kubernetes annotations for Flux. The first, flux.weave.works/automated, tells Flux whether the container image should be automatically updated. This example sets the value to true, enabling updates to your deployment as new images arrive in the registry. This example comments out the second annotation, flux.weave.works/ignore. However, you can use it to tell Flux to ignore the deployment temporarily.

Commit the changes, and in a few minutes, it automatically deploys.

git add .
git commit -am "eks-example deployment"
[master 954908c] eks-example deployment
 3 files changed, 64 insertions(+)
 create mode 100644 namespaces/eks-example.yaml
 create mode 100644 workloads/eks-example-dep.yaml
 create mode 100644 workloads/eks-example-svc.yaml

 

Make sure that you push your changes.

git push

Now check the logs of your Flux pod:

kubectl get pods -n flux

Update the name below to reflect the name of the pod in your deployment. This sample pulls every five minutes for changes. When it triggers, you should see kubectl apply log messages to create the namespace, service, and deployment.

kubectl logs flux-5bd7fb6bb6-4sc78 -n flux

Find the load balancer input for your service with the following:

kubectl describe service eks-example -n eks-example

Now when you connect to the load balancer address in a browser, you can see the Hello World app.

Change the eks-example source code in a small way (such as changing index.html to say Hello World Deployment 2), then commit and push to Git.

After a few minutes, refresh your browser to see the deployed change. You can watch the changes in AWS CodePipeline, in ECR, and through Flux logs. Weave Flux automatically updated your deployment manifests in the k8s-config repository to deploy the new image as it detected it. To back out that change, use a git revert or git reset command.

Finally, you can use the same approach to deploy Helm charts. You can host these charts within the configuration Git repository (k8s-config in this example), or on an external chart repository. In the following example, you use an external chart repository.

In your k8s-config directory, get the latest changes from your repository and then create a Helm release from an external chart.

cd k8s-config

git pull

 

First, create the namespace manifest.

cat << EOF > namespaces/nginx.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: nginx
  name: nginx
EOF

 

Then create the Helm release manifest. This is a custom resource definition provided by Weave Flux.

cat << EOF > releases/nginx.yaml
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: mywebserver
  namespace: nginx
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.nginx: semver:~1.16
    flux.weave.works/locked: 'true'
    flux.weave.works/locked_msg: '"Halt updates for now"'
    flux.weave.works/locked_user: User Name <[email protected]>
spec:
  releaseName: mywebserver
  chart:
    repository: https://charts.bitnami.com/bitnami/
    name: nginx
    version: 3.3.2
  values:
    usePassword: true
    image:
      registry: docker.io
      repository: bitnami/nginx
      tag: 1.16.0-debian-9-r46
    service:
      type: LoadBalancer
      port: 80
      nodePorts:
        http: ""
      externalTrafficPolicy: Cluster
    ingress:
      enabled: false
    livenessProbe:
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 30
      timeoutSeconds: 5
      failureThreshold: 6
    readinessProbe:
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 5
      timeoutSeconds: 3
      periodSeconds: 5
    metrics:
      enabled: false
EOF

git add . 
git commit -am "Adding NGINX Helm release"
git push

 

There are a few new annotations for Flux above. The flux.weave.works/locked annotation tells Flux to lock the deployment. This is useful if you find a known bad image and must roll back to a previous version. In addition, the flux.weave.works/tag.nginx annotation filters image tags by semantic versioning.

Wait up to five minutes for Flux to pull the configuration and verify this deployment as you did in the previous example:

kubectl get pods -n flux

kubectl logs flux-5bd7fb6bb6-4sc78 -n flux

 

kubectl get all -n nginx

 

If this doesn’t deploy, ensure Helm initialized as described earlier in this post.

kubectl get pods -n kube-system | grep tiller

kubectl get pods -n flux

kubectl logs flux-helm-operator-df5746688-84kw8 -n flux

 

Clean up

Log in as an administrator and follow these steps to clean up your sample deployment.

  1. Delete all images from the Amazon ECR repository.

2. In AWS Service Catalog provisioned products, select the three dots to the left of your ExamplePipeline service and choose Terminate provisioned product. Wait until it completes termination (1–2 minutes).

3. Delete your Amazon S3 artifact bucket.

4. Delete Weave Flux:

helm delete flux --purge

kubectl delete ns flux

kubectl delete crd helmreleases.flux.weave.works

5. Delete the load balancer services:

helm delete mywebserver --purge

kubectl delete ns nginx

kubectl delete svc eks-example -n eks-example

kubectl delete deployment eks-example -n eks-example

kubectl delete ns eks-example

6. Clean up your GitHub repositories:

 – Go to your k8s-config repository in GitHub, choose Settings, scroll to the bottom and choose Delete this repository. If you set delete to false in the pipeline service, you also must delete your eks-example repository.

 – Delete the personal access token that you created.

7.     If you provisioned an EKS cluster at the beginning of this post, delete it:

eksctl get cluster

eksctl delete cluster <clustername>

8.     In the AWS CloudFormation console, select the DevServiceCatalog stack, and choose the Actions, Delete Stack.

Conclusion

In this post, I demonstrated how to use a GitOps approach, which allows you to focus on committing code and configuration to Git rather than learning new CI/CD tooling. Git acts as the single source of truth, and Weave Flux pulls changes and ensures that the Kubernetes cluster configuration matches the desired state.

In addition, AWS Service Catalog can be used to create a portfolio of services that enables you to standardize your offerings, such as an image build pipeline based on AWS CodePipeline.

As always, AWS welcomes feedback. Please submit comments or questions below.

AWS Cloud Development Kit (CDK) – TypeScript and Python are Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-cloud-development-kit-cdk-typescript-and-python-are-now-generally-available/

Managing your Infrastructure as Code provides great benefits and is often a stepping stone for a successful application of DevOps practices. In this way, instead of relying on manually performed steps, both administrators and developers can automate provisioning of compute, storage, network, and application services required by their applications using configuration files.

For example, defining your Infrastructure as Code makes it possible to:

  • Keep infrastructure and application code in the same repository
  • Make infrastructure changes repeatable and predictable across different environments, AWS accounts, and AWS regions
  • Replicate production in a staging environment to enable continuous testing
  • Replicate production in a performance test environment that you use just for the time required to run a stress test
  • Release infrastructure changes using the same tools as code changes, so that deployments include infrastructure updates
  • Apply software development best practices to infrastructure management, such as code reviews, or deploying small changes frequently

Configuration files used to manage your infrastructure are traditionally implemented as YAML or JSON text files, but in this way you’re missing most of the advantages of modern programming languages. Specifically with YAML, it can be very difficult to detect a file truncated while transferring to another system, or a missing line when copying and pasting from one template to another.

Wouldn’t it be better if you could use the expressive power of your favorite programming language to define your cloud infrastructure? For this reason, we introduced last year in developer preview the AWS Cloud Development Kit (CDK), an extensible open-source software development framework to model and provision your cloud infrastructure using familiar programming languages.

I am super excited to share that the AWS CDK for TypeScript and Python is generally available today!

With the AWS CDK you can design, compose, and share your own custom components that incorporate your unique requirements. For example, you can create a component setting up your own standard VPC, with its associated routing and security configurations. Or a standard CI/CD pipeline for your microservices using tools like AWS CodeBuild and CodePipeline.

Personally I really like that by using the AWS CDK, you can build your application, including the infrastructure, in your IDE, using the same programming language and with the support of autocompletion and parameter suggestion that modern IDEs have built in, without having to do a mental switch between one tool, or technology, and another. The AWS CDK makes it really fun to quickly code up your AWS infrastructure, configure it, and tie it together with your application code!

How the AWS CDK works
Everything in the AWS CDK is a construct. You can think of constructs as cloud components that can represent architectures of any complexity: a single resource, such as an S3 bucket or an SNS topic, a static website, or even a complex, multi-stack application that spans multiple AWS accounts and regions. To foster reusability, constructs can include other constructs. You compose constructs together into stacks, that you can deploy into an AWS environment, and apps, a collection of one of more stacks.

How to use the AWS CDK
We continuously add new features based on the feedback of our customers. That means that when creating an AWS resource, you often have to specify many options and dependencies. For example, if you create a VPC you have to think about how many Availability Zones (AZs) to use and how to configure subnets to give private and public access to the resources that will be deployed in the VPC.

To make it easy to define the state of AWS resources, the AWS Construct Library exposes the full richness of many AWS services with sensible defaults that you can customize as needed. In the case above, the VPC construct creates by default public and private subnets for all the AZs in the VPC, using 3 AZs if not specified.

For creating and managing CDK apps, you can use the AWS CDK Command Line Interface (CLI), a command-line tool that requires Node.js and can be installed quickly with:

npm install -g aws-cdk

After that, you can use the CDK CLI with different commands:

  • cdk init to initialize in the current directory a new CDK project in one of the supported programming languages
  • cdk synth to print the CloudFormation template for this app
  • cdk deploy to deploy the app in your AWS Account
  • cdk diff to compare what is in the project files with what has been deployed

Just run cdk to see more of the available commands and options.

You can easily include the CDK CLI in your deployment automation workflow, for example using Jenkins or AWS CodeBuild.

Let’s use the AWS CDK to build two sample projects, using different programming languages.

An example in TypeScript
For the first project I am using TypeScript to define the infrastructure:

cdk init app --language=typescript

Here’s a simplified view of what I want to build, not entering into the details of the public/private subnets in the VPC. There is an online frontend, writing messages in a queue, and an asynchronous backend, consuming messages from the queue:

Inside the stack, the following TypeScript code defines the resources I need, and their relations:

  • First I define the VPC and an Amazon ECS cluster in that VPC. By using the defaults provided by the AWS Construct Library, I don’t need to specify any parameter here.
  • Then I use an ECS pattern that in a few lines of code sets up an Amazon SQS queue and an ECS service running on AWS Fargate to consume the messages in that queue.
  • The ECS pattern library provides higher-level ECS constructs which follow common architectural patterns, such as load balanced services, queue processing, and scheduled tasks.
  • A Lambda function has the name of the queue, created by the ECS pattern, passed as an environment variable and is granted permissions to send messages to the queue.
  • The code of the Lambda function and the Docker image are passed as assets. Assets allow you to bundle files or directories from your project and use them with Lambda or ECS.
  • Finally, an Amazon API Gateway endpoint provides an HTTPS REST interface to the function.
const myVpc = new ec2.Vpc(this, "MyVPC");

const myCluster = new ecs.Cluster(this, "MyCluster", {
  vpc: myVpc
});

const myQueueProcessingService = new ecs_patterns.QueueProcessingFargateService(
  this, "MyQueueProcessingService", {
    cluster: myCluster,
    memoryLimitMiB: 512,
    image: ecs.ContainerImage.fromAsset("my-queue-consumer")
  });

const myFunction = new lambda.Function(
  this, "MyFrontendFunction", {
    runtime: lambda.Runtime.NODEJS_10_X,
    timeout: Duration.seconds(3),
    handler: "index.handler",
    code: lambda.Code.asset("my-front-end"),
    environment: {
      QUEUE_NAME: myQueueProcessingService.sqsQueue.queueName
    }
  });

myQueueProcessingService.sqsQueue.grantSendMessages(myFunction);

const myApi = new apigateway.LambdaRestApi(
  this, "MyFrontendApi", {
    handler: myFunction
  });

I find this code very readable and easier to maintain than the corresponding JSON or YAML. By the way, cdk synth in this case outputs more than 800 lines of plain CloudFormation YAML.

An example in Python
For the second project I am using Python:

cdk init app --language=python

I want to build a Lambda function that is executed every 10 minutes:

When you initialize a CDK project in Python, a virtualenv is set up for you. You can activate the virtualenv and install your project requirements with:

source .env/bin/activate

pip install -r requirements.txt

Note that Python autocompletion may not work with some editors, like Visual Studio Code, if you don’t start the editor from an active virtualenv.

Inside the stack, here’s the Python code defining the Lambda function and the CloudWatch Event rule to invoke the function periodically as target:

myFunction = aws_lambda.Function(
    self, "MyPeriodicFunction",
    code=aws_lambda.Code.asset("src"),
    handler="index.main",
    timeout=core.Duration.seconds(30),
    runtime=aws_lambda.Runtime.PYTHON_3_7,
)

myRule = aws_events.Rule(
    self, "MyRule",
    schedule=aws_events.Schedule.rate(core.Duration.minutes(10)),
)
myRule.add_target(aws_events_targets.LambdaFunction(myFunction))

Again, this is easy to understand even if you don’t know the details of AWS CDK. For example, durations include the time unit and you don’t have to wonder if they are expressed in seconds, milliseconds, or days. The output of cdk synth in this case is more than 90 lines of plain CloudFormation YAML.

Available Now
There is no charge for using the AWS CDK, you pay for the AWS resources that are deployed by the tool.

To quickly get hands-on with the CDK, start with this awesome step-by-step online tutorial!

More examples of CDK projects, using different programming languages, are available in this repository:

https://github.com/aws-samples/aws-cdk-examples

You can find more information on writing your own constructs here.

The AWS CDK is open source and we welcome your contribution to make it an even better tool:

https://github.com/awslabs/aws-cdk

Check out our source code on GitHub, start building your infrastructure today using TypeScript or Python, or try different languages in developer preview, such as C# and Java, and give us your feedback!

Scaling Kubernetes deployments with Amazon CloudWatch metrics

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/

This post is contributed by Kwunhok Chan | Solutions Architect, AWS

 

In an earlier post, AWS introduced Horizontal Pod Autoscaler and Kubernetes Metrics Server support for Amazon Elastic Kubernetes Service. These tools make it easy to scale your Kubernetes workloads managed by EKS in response to built-in metrics like CPU and memory.

However, one common use case for applications running on EKS is the integration with AWS services. For example, you administer an application that processes messages published to an Amazon SQS queue. You want the application to scale according to the number of messages in that queue. The Amazon CloudWatch Metrics Adapter for Kubernetes (k8s-cloudwatch-adapter) helps.

 

Amazon CloudWatch Metrics Adapter for Kubernetes

The k8s-cloudwatch-adapter is an implementation of the Kubernetes Custom Metrics API and External Metrics API with integration for CloudWatch metrics. It allows you to scale your Kubernetes deployment using the Horizontal Pod Autoscaler (HPA) with CloudWatch metrics.

 

Prerequisites

Before starting, you need the following:

 

Getting started

Before using the k8s-cloudwatch-adapter, set up a way to manage IAM credentials to Kubernetes pods. The CloudWatch Metrics Adapter requires the following permissions to access metric data from CloudWatch:

cloudwatch:GetMetricData

Create an IAM policy with the following template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

For demo purposes, I’m granting admin permissions to my Kubernetes worker nodes. Don’t do this in your production environment. To associate IAM roles to your Kubernetes pods, you may want to look at kube2iam or kiam.

If you’re using an EKS cluster, you most likely provisioned it with AWS CloudFormation. The following command uses AWS CloudFormation stacks to update the proper instance policy with the correct permissions:

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
--role-name $(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[0].Parameters[?ParameterKey==`NodeInstanceRoleName`].ParameterValue' | jq -r ".[0]")

 

Make sure to replace ${STACK_NAME} with the nodegroup stack name from the AWS CloudFormation console .

 

You can now deploy the k8s-cloudwatch-adapter to your Kubernetes cluster.

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml

 

This deployment creates a new namespace—custom-metrics—and deploys the necessary ClusterRole, Service Account, and Role Binding values, along with the deployment of the adapter. Use the created custom resource definition (CRD) to define the configuration for the external metrics to retrieve from CloudWatch. The adapter reads the configuration defined in ExternalMetric CRDs and loads its external metrics. That allows you to use HPA to autoscale your Kubernetes pods.

 

Verifying the deployment

Next, query the metrics APIs to see if the adapter is deployed correctly. Run the following command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq.
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
  ]
}

There are no resources from the response because you haven’t registered any metric resources yet.

 

Deploying an Amazon SQS application

Next, deploy a sample SQS application to test out k8s-cloudwatch-adapter. The SQS producer and consumer are provided, together with the YAML files for deploying the consumer, metric configuration, and HPA.

Both the producer and consumer use an SQS queue named helloworld. If it doesn’t exist already, the producer creates this queue.

Deploy the consumer with the following command:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/consumer-deployment.yaml

 

You can verify that the consumer is running with the following command:

$ kubectl get deploy sqs-consumer
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sqs-consumer   1         1         1            0           5s

 

Set up Amazon CloudWatch metric and HPA

Next, create an ExternalMetric resource for the CloudWatch metric. Take note of the Kind value for this resource. This CRD resource tells the adapter how to retrieve metric data from CloudWatch.

You define the query parameters used to retrieve the ApproximateNumberOfMessagesVisible for an SQS queue named helloworld. For details about how metric data queries work, see CloudWatch GetMetricData API.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric:
  metadata:
    name: hello-queue-length
  spec:
    name: hello-queue-length
    resource:
      resource: "deployment"
      queries:
        - id: sqs_helloworld
          metricStat:
            metric:
              namespace: "AWS/SQS"
              metricName: "ApproximateNumberOfMessagesVisible"
              dimensions:
                - name: QueueName
                  value: "helloworld"
            period: 300
            stat: Average
            unit: Count
          returnData: true

 

Create the ExternalMetric resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/externalmetric.yaml

 

Then, set up the HPA for your consumer. Here is the configuration to use:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: hello-queue-length
      targetValue: 30

 

This HPA rule starts scaling out when the number of messages visible in your SQS queue exceeds 30, and scales in when there are fewer than 30 messages in the queue.

Create the HPA resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/hpa.yaml

 

Generate load using a producer

Finally, you can start generating messages to the queue:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/producer-deployment.yaml

On a separate terminal, you can now watch your HPA retrieving the queue length and start scaling the replicas. SQS metrics generate at five-minute intervals, so give the process a few minutes:

$ kubectl get hpa sqs-consumer-scaler -w

 

Clean up

After you complete this experiment, you can delete the Kubernetes deployment and respective resources.

Run the following commands to remove the consumer, external metric, HPA, and SQS queue:

$ kubectl delete deploy sqs-producer
$ kubectl delete hpa sqs-consumer-scaler
$ kubectl delete externalmetric sqs-helloworld-length
$ kubectl delete deploy sqs-consumer

$ aws sqs delete-queue helloworld

 

Other CloudWatch integrations

AWS recently announced the preview for Amazon CloudWatch Container Insights, which monitors, isolates, and diagnoses containerized applications running on EKS and Kubernetes clusters. To get started, see Using Container Insights.

 

Get involved

This project is currently under development. AWS welcomes issues and pull requests, and would love to hear your feedback.

How could this adapter be best implemented to work in your environment? Visit the Amazon CloudWatch Metrics Adapter for Kubernetes project on GitHub and let AWS know what you think.

OpsCenter – A New Feature to Streamline IT Operations

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/opscenter-a-new-feature-to-streamline-it-operations/

The AWS teams are always listening to customers and trying to understand how they can improve services to make customers more productive. A new feature in AWS Systems Manager called OpsCenter exemplifies this approach by enabling customers to aggregate issues, events and alerts, across services. So customers can go to one place to view, investigate, and remediate issues reducing the need to navigate across multiple different AWS services.

Issues, events and alerts appear as operations items (OpsItems) in this new console and provide contextual information, historical guidance, and quick solution steps. The feature aims to improve the mean time to resolution, making engineers more productive by ensuring key investigation data is available in one place.

Engineers working on an OpsItem get access to information such as:

  • Event, resource and account details
  • Past OpsItems with similar characteristics
  • Related AWS Config changes and relationships
  • AWS CloudTrail logs
  • Amazon CloudWatch alarms
  • AWS CloudFormation Stack information
  • Other quick-links to access logs and metrics
  • List of runbooks and recommended runbooks
  • Additional information passed to OpsCenter through AWS services

This information helps engineers to investigate and remediate operational issues faster. Engineers can use OpsCenter to view and address problems using the Systems Manager console or via the Systems Manager OpsCenter APIs.

I’ll spend the rest of this blog exploring the capabilities of this new feature. To get started, I open the Systems Manager Console, make sure that I am in the region of interest, and click OpsCenter inside the Operations Management menu which is on the far left of the screen.

After arriving at the OpsCenter screen for the first time and clicking on “Getting Started” I am prompted with a configure sources screen. This screen sets up the systems with some example CloudWatch rules that will create OpsItems when specific rules trigger. For example, one of the CloudWatch rules will alert if an AutoScaling EC2 instance is stopped or terminated. On this screen, you need to configure and add the ARN of an IAM role that has permission to create OpsItems. This security role is used by the CloudWatch rules to create the OpsItems. You can, of course, create your OpsItems through the API or by creating custom CloudWatch rules.

Now the system has set me up some CloudWatch rules I thought I would test it out by triggering an alert. In the EC2 console, I will intentionally deregister (delete) the Amazon Machine Image that is associated with my AutoScaling Group. I will then increase the Desired Capacity of my AutoScaling group from 2 to 4. The AutoScaling group will later try to create new instances; however, it will fail because I have deleted the AMI.

As I expected this triggered the CloudWatch rule to create an OpsItem in OpsCenter console. There is now one item open in the OpsItem status summary dashboard. I click on this to get more detail on the open OpsItems.

This gives me a list of all the open OpsItems, and I can see that I have one with the title “Auto Scaling EC2 instance launch failed” which has been created by CloudWatch rules because I deleted the AMI associated with the AutoScaling group. Clicking on that OpsItems takes me to more detail of the OpsItem.

I can from this overview screen start to explore the item. Looking around this screen, I can find out more information about this OpsItem and see it is collecting data from numerous services and presenting it in one place.

Further down the screen I can see other Similar OpsItems and can explore them. In a real situation, this might give me contextual information as to how similar problems were solved in the past, ensuring that operations teams learn from their previous collective experience. I can also manually add a relationship between OpsItems if they are connected. Importantly the Operational data section gives me information about the cause. The status message is particularly useful since it’s calling out the issue: that the AMI does not exist.

On the related resources details screen, I can find out more information about this OpsItem. For example, I can see tag information about the resources alongside relevant CloudWatch alarms. I can explore details from AWS config as well as drill into CloudTrail logs. I can even see if the resources are associated with any CloudFormation stacks.

Earlier on, I created a CloudWatch alarm that will alert when the number of instances on my AutoScaling group falls below the desired instance threshold (4 Instances). As you can see, I don’t need to go into the CloudWatch console to view this, I can see right from this screen that I have an Alarm State for Booking App Instance Count Low.

The Runbooks section is fascinating; what it is offering me is automated ways in which I can resolve this issue. There are several built-in Runbooks; however, I have a custom one which, luckily enough, automates the fix for this exact problem. It will create a new AMI based upon one of the healthy instances in my AutoScaling Group and then update the config to use that new AMI when it creates instances. To run this automation, I select the runbook and press execute.

It asks me to provide some parameters for the automation job. I paste the AutoScaling Group Name (BookingsAppASG) as the only required parameter and press Execute.

After a minute or so a green success signifier appears in the Latest Status column of the runbook and I am now able to view the logs and even save the output to operation data on the OpsItem so that other engineers can clearly see what I have done.

 

Back in the OpsCenter OpsItem related resource details screen, I can now see that my CloudWatch alarm is green and in an OK state, signifying that my AutoScalling group currently has four instances running and I am safe to resolve the OpsItem.

This service is available now, and you can start using it today in all public AWS regions so why not open up the console and start exploring all the ways that you can save you and your team valuable time.

AWS Control Tower – Set up & Govern a Multi-Account AWS Environment

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-control-tower-set-up-govern-a-multi-account-aws-environment/

Earlier this month I met with an enterprise-scale AWS customer. They told me that they are planning to go all-in on AWS, and want to benefit from all that we have learned about setting up and running AWS at scale. In addition to setting up a Cloud Center of Excellence, they want to set up a secure environment for teams to provision development and production accounts in alignment with our recommendations and best practices.

AWS Control Tower
Today we are announcing general availability of AWS Control Tower. This service automates the process of setting up a new baseline multi-account AWS environment that is secure, well-architected, and ready to use. Control Tower incorporates the knowledge that AWS Professional Service has gained over the course of thousands of successful customer engagements, and also draws from the recommendations found in our whitepapers, documentation, the Well-Architected Framework, and training. The guidance offered by Control Tower is opinionated and prescriptive, and is designed to accelerate your cloud journey!

AWS Control Tower builds on multiple AWS services including AWS Organizations, AWS Identity and Access Management (IAM) (including Service Control Policies), AWS Config, AWS CloudTrail, and AWS Service Catalog. You get a unified experience built around a collection of workflows, dashboards, and setup steps. AWS Control Tower automates a landing zone to set up a baseline environment that includes:

  • A multi-account environment using AWS Organizations.
  • Identity management using AWS Single Sign-On (SSO).
  • Federated access to accounts using AWS SSO.
  • Centralize logging from AWS CloudTrail, and AWS Config stored in Amazon S3.
  • Cross-account security audits using AWS IAM and AWS SSO.

Before diving in, let’s review a couple of key Control Tower terms:

Landing Zone – The overall multi-account environment that Control Tower sets up for you, starting from a fresh AWS account.

Guardrails – Automated implementations of policy controls, with a focus on security, compliance, and cost management. Guardrails can be preventive (blocking actions that are deemed as risky), or detective (raising an alert on non-conformant actions).

Blueprints – Well-architected design patterns that are used to set up the Landing Zone.

Environment – An AWS account and the resources within it, configured to run an application. Users make requests (via Service Catalog) for new environments and Control Tower uses automated workflows to provision them.

Using Control Tower
Starting from a brand new AWS account that is both Master Payer and Organization Master, I open the Control Tower Console and click Set up landing zone to get started:

AWS Control Tower will create AWS accounts for log arching and for auditing, and requires email addresses that are not already associated with an AWS account. I enter two addresses, review the information within Service permissions, give Control Tower permission to administer AWS resources and services, and click Set up landing zone:

The setup process runs for about an hour, and provides status updates along the way:

Early in the process, Control Tower sends a handful of email requests to verify ownership of the account, invite the account to participate in AWS SSO, and to subscribe to some SNS topics. The requests contain links that I must click in order for the setup process to proceed. The second email also requests that I create an AWS SSO password for the account. After the setup is complete, AWS Control Tower displays a status report:

The console offers some recommended actions:

At this point, the mandatory guardrails have been applied and the optional guardrails can be enabled:

I can see the Organizational Units (OUs) and accounts, and the compliance status of each one (with respect to the guardrails):

 

Using the Account Factory
The navigation on the left lets me access all of the AWS resources created and managed by Control Tower. Now that my baseline environment is set up, I can click Account factory to provision AWS accounts for my teams, applications, and so forth.

The Account factory displays my network configuration (I’ll show you how to edit it later), and gives me the option to Edit the account factory network configuration or to Provision new account:

I can control the VPC configuration that is used for new accounts, including the regions where VPCs are created when an account is provisioned:

The account factory is published to AWS Service Catalog automatically. I can provision managed accounts as needed, as can the developers in my organization. I click AWS Control Tower Account Factory to proceed:

I review the details and click LAUNCH PRODUCT to provision a new account:

Working with Guardrails
As I mentioned earlier, Control Tower’s guardrails provide guidance that is either Mandatory or Strongly Recommended:

Guardrails are implemented via an IAM Service Control Policy (SCP) or an AWS Config rule, and can be enabled on an OU-by-OU basis:

Now Available
AWS Control Tower is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions, with more to follow. There is no charge for the Control Tower service; you pay only for the AWS resources that it creates on your behalf.

In addition to adding support for more AWS regions, we are working to allow you to set up a parallel landing zone next to an existing AWS account, and to give you the ability to build and use custom guardrails.

Jeff;

 

New – How to better monitor your custom application metrics using Amazon CloudWatch Agent

Post Syndicated from Helen Lin original https://aws.amazon.com/blogs/devops/new-how-to-better-monitor-your-custom-application-metrics-using-amazon-cloudwatch-agent/

This blog was contributed by Zhou Fang, Sr. Software Development Engineer for Amazon CloudWatch and Helen Lin, Sr. Product Manager for Amazon CloudWatch

Amazon CloudWatch collects monitoring and operational data from both your AWS resources and on-premises servers, providing you with a unified view of your infrastructure and application health. By default, CloudWatch automatically collects and stores many of your AWS services’ metrics and enables you to monitor and alert on metrics such as high CPU utilization of your Amazon EC2 instances. With the CloudWatch Agent that launched last year, you can also deploy the agent to collect system metrics and application logs from both your Windows and Linux environments. Using this data collected by CloudWatch, you can build operational dashboards to monitor your service and application health, set high-resolution alarms to alert and take automated actions, and troubleshoot issues using Amazon CloudWatch Logs.

We recently introduced CloudWatch Agent support for collecting custom metrics using StatsD and collectd. It’s important to collect system metrics like available memory, and you might also want to monitor custom application metrics. You can use these custom application metrics, such as request count to understand the traffic going through your application or understand latency so you can be alerted when requests take too long to process. StatsD and collectd are popular, open-source solutions that gather system statistics for a wide variety of applications. By combining the system metrics the agent already collects, with the StatsD protocol for instrumenting your own metrics and collectd’s numerous plugins, you can better monitor, analyze, alert, and troubleshoot the performance of your systems and applications.

Let’s dive into an example that demonstrates how to monitor your applications using the CloudWatch Agent.  I am operating a RESTful service that performs simple text encoding. I want to use CloudWatch to help monitor a few key metrics:

  • How many requests are coming into my service?
  • How many of these requests are unique?
  • What is the typical size of a request?
  • How long does it take to process a job?

These metrics help me understand my application performance and throughput, in addition to setting alarms on critical metrics that could indicate service degradation, such as request latency.

Step 1. Collecting StatsD metrics

My service is running on an EC2 instance, using Amazon Linux AMI 2018.03.0. Make sure to attach the CloudWatchAgentServerPolicy AWS managed policy so that the CloudWatch agent can collect and publish metrics from this instance:

Here is the service structure:

 

The “/encode” handler simply returns the base64 encoded string of an input text.  To monitor key metrics, such as total and unique request count as well as request size and method response time, I used StatsD to define these custom metrics.

@RestController

public class EncodeController {

    @RequestMapping("/encode")
    public String encode(@RequestParam(value = "text") String text) {
        long startTime = System.currentTimeMillis();
        statsd.incrementCounter("totalRequest.count", new String[]{"path:/encode"});
        statsd.recordSetValue("uniqueRequest.count", text, new String[]{"path:/encode"});
        statsd.recordHistogramValue("request.size", text.length(), new String[]{"path:/encode"});
        String encodedString = Base64.getEncoder().encodeToString(text.getBytes());
        statsd.recordExecutionTime("latency", System.currentTimeMillis() - startTime, new String[]{"path:/encode"});
        return encodedString;
    }
}

Note that I need to first choose a StatsD client from here.

The “/status” handler responds with a health check ping.  Here I am monitoring my available JVM memory:

@RestController
public class StatusController {

    @RequestMapping("/status")
    public int status() {
        statsd.recordGaugeValue("memory.free", Runtime.getRuntime().freeMemory(), new String[]{"path:/status"});
        return 0;
    }
}

 

Step 2. Emit custom metrics using collectd (optional)

collectd is another popular, open-source daemon for collecting application metrics. If I want to use the hundreds of available collectd plugins to gather application metrics, I can also use the CloudWatch Agent to publish collectd metrics to CloudWatch for 15-months retention. In practice, I might choose to use either StatsD or collectd to collect custom metrics, or I have the option to use both. All of these use cases  are supported by the CloudWatch agent.

Using the same demo RESTful service, I’ll show you how to monitor my service health using the collectd cURL plugin, which passes the collectd metrics to CloudWatch Agent via the network plugin.

For my RESTful service, the “/status” handler returns HTTP code 200 to signify that it’s up and running. This is important to monitor the health of my service and trigger an alert when the application does not respond with a HTTP 200 success code. Additionally, I want to monitor the lapsed time for each health check request.

To collect these metrics using collectd, I have a collectd daemon installed on the EC2 instance, running version 5.8.0. Here is my collectd config:

LoadPlugin logfile
LoadPlugin curl
LoadPlugin network

<Plugin logfile>
  LogLevel "debug"
  File "/var/log/collectd.log"
  Timestamp true
</Plugin>

<Plugin curl>
    <Page "status">
        URL "http://localhost:8080/status";
        MeasureResponseTime true
        MeasureResponseCode true
    </Page>
</Plugin>

<Plugin network>
    <Server "127.0.0.1" "25826">
        SecurityLevel Encrypt
        Username "user"
        Password "secret"
    </Server>
</Plugin>

 

For the cURL plugin, I configured it to measure response time (latency) and response code (HTTP status code) from the RESTful service.

Note that for the network plugin, I used Encrypt mode which requires an authentication file for the CloudWatch Agent to authenticate incoming collectd requests.  Click here for full details on the collectd installation script.

 

Step 3. Configure the CloudWatch agent

So far, I have shown you how to:

A.  Use StatsD to emit custom metrics to monitor my service health
B.  Optionally use collectd to collect metrics using plugins

Next, I will install and configure the CloudWatch agent to accept metrics from both the StatsD client and collectd plugins.

I installed the CloudWatch Agent following the instructions in the user guide, but here are the detailed steps:

Install CloudWatch Agent:

wget https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip -O AmazonCloudWatchAgent.zip && unzip -o AmazonCloudWatchAgent.zip && sudo ./install.sh

Configure CloudWatch Agent to receive metrics from StatsD and collectd:

{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "collectd": {},
      "statsd": {}
    }
  }
}

Pass the above config (config.json) to the CloudWatch Agent:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:config.json -s

In case you want to skip these steps and just execute my sample agent install script, you can find it here.

 

Step 4. Generate and monitor application traffic in CloudWatch

Now that I have the CloudWatch agent installed and configured to receive StatsD and collect metrics, I’m going to generate traffic through the service:

echo "send 100 requests"
for i in {1..100}
do
   curl "localhost:8080/encode?text=TextToEncode_${i}[email protected]#%"
   echo ""
   sleep 1
done

 

Next, I log in to the CloudWatch console and check that the service is up and running. Here’s a graph of the StatsD metrics:

 

Here is a graph of the collectd metrics:

 

Conclusion

With StatsD and collectd support, you can now use the CloudWatch Agent to collect and monitor your custom applications in addition to the system metrics and application logs it already collects. Furthermore, you can create operational dashboards with these metrics, set alarms to take automated actions when free memory is low, and troubleshoot issues by diving into the application logs.  Note that StatsD supports both Windows and Linux operating systems while collectd is Linux only.  For Windows, you can also continue to use Windows Performance Counters to collect custom metrics instead.

The CloudWatch Agent with custom metrics support (version 1.203420.0 or later) is available in all public AWS Regions, AWS GovCloud (US), with AWS China (Beijing) and AWS China (Ningxia) coming soon.

The agent is free to use; you pay the usual CloudWatch prices for logs and custom metrics.

For more details, head over to the CloudWatch user guide for StatsD and collectd.

Building an Amazon CloudWatch Dashboard Outside of the AWS Management Console

Post Syndicated from Stephen McCurry original https://aws.amazon.com/blogs/devops/building-an-amazon-cloudwatch-dashboard-outside-of-the-aws-management-console/

Steve McCurry is a Senior Product Manager for CloudWatch

This is the second in a series of two blog posts that demonstrate how to use the new CloudWatch
snapshot graphs feature. You can find the first post here.

A key challenge for any DevOps team is to provide sufficient monitoring visibility on service
health. Although CloudWatch dashboards are a powerful tool for monitoring your systems and
applications, the dashboards are accessible only to users with permissions to the AWS
Management Console. You can now use a new CloudWatch feature, snapshot graphs, to create
dashboards that contain CloudWatch graphs and are available outside of the AWS Management
Console. You can display CloudWatch snapshot graphs on your internal wiki pages or TV-based
dashboards. You can integrate them with chat applications and ticketing and bug tracking tools.

This blog post shows you how to embed CloudWatch snapshot graphs into your websites using a
lightweight, embeddable widget written in JavaScript.

Snapshot graphs overview

CloudWatch snapshot graphs are images of CloudWatch charts that are useful for building
custom dashboards or integrating with tools outside of AWS. Although the images are static,
they can be refreshed frequently to create a live dashboard experience.

CloudWatch dashboards and charts provide flexible, interactive visualizations that can be used to
create unified operational views across your AWS resources and metrics. However, maybe you
want to display CloudWatch charts on a TV screen for team-level visibility, take snapshots of
charts for auditing in ticketing systems and bug tracking tools, or share snapshots in chat
applications to collaborate on an issue. For these use cases and more, snapshot graphs are an
ideal tool for integrating CloudWatch charts with your webpages and third-party applications.

Snapshot graphs are available through the CloudWatch API, which you can use through the
AWS SDKs or CLI. The charts you request through the API are represented as JSON. To copy
the JSON definition of the graph and use it in the API request, open the Amazon CloudWatch
console. You’ll find the JSON on the Source tab of the Metrics page, as shown here.

All of the features of the CloudWatch line and stacked graphs are available in snapshot graphs,
including vertical and horizontal annotations.

Embedding a snapshot graph in your webpage

In this demonstration, we will set up monitoring for an EC2 instance and embed a CloudWatch
snapshot graph for CPUUtilization in a website outside of the AWS Management Console. The
embeddable widget can be configured to support any CloudWatch line or stacked chart. This
demonstration involves these steps:

  1. Create an EC2 instance to monitor.
  2. Create the Lambda function that calls CloudWatch GetMetricWidgetImage.
  3. Create an API Gateway endpoint that proxies requests to the Lambda function.
  4. Embed the widget into a website and configure it for the API Gateway request.

The code for this solution is available from the SnapshotWidgetDemo GitHub repo.

The embeddable JavaScript widget will communicate with CloudWatch through a gateway in
Amazon API Gateway and an AWS Lambda backend. The advantage of using API Gateway is
the additional flexibility you have to secure the endpoint and create fine-grained access control.
For example, you can block access to the endpoint from outside of your corporate network.
Amazon Route 53 could be an alternative solution.

The end goal is to have a webpage running on your local machine that displays a CloudWatch
snapshot graph displaying live metric data from a sample EC2 instance. The sample code
includes a basic webpage containing the embed code.

The JavaScript widget requests a snapshot graph from an API Gateway endpoint. API Gateway
proxies the request to a Lambda function that calls the new CloudWatch API service,
GetMetricWidgetImage. The retrieved snapshot graph is returned in binary and displayed on the
website in an IMG HTML tag.

Here is what the end-to-end solution looks like:

Server setup

  1. Download the repository.
  2. Navigate to ./server and run npm install
  3. From the server folder, run zip -r snapshotwidgetdemo.zip ./*
  4. Upload snapshotwidgetdemo.zip to any S3 bucket.
  5. Upload ./server/apigateway-lambda.json to any S3 bucket.
  6. Navigate to the AWS CloudFormation console and choose Create Stack.
    • Point the new stack to the S3 location in step 5.
    • During setup, you will be asked for the Lambda S3 bucket name from step 4.

The AWS CloudFormation script will create all the required server-side components described in
the previous section.

Client setup

  1. Navigate to ./client and run npm install
  2. Edit ./demo/index.html to replace the following placeholders with your values.
    a. <YOUR_INSTANCE_ID> You can find the instance ID in the AWS
    CloudFormation stack output.
    b. <YOUR_API_GATEWAY_URL> You can find the full URL in the AWS
    CloudFormation stack output.
    c. <YOUR_API_KEY> The API gateway requires a key. The key reference but not
    the key itself appears in theAWS CloudFormation stack output. To retrieve the
    key value, go to the Keys tab of the Amazon API Gateway console.
  3. Build the component using WebPack ./node_modules/.bin/webpack –config
    webpack.config.js
  4. Server the demo webpage on localhost ./node_modules/.bin/webpack-dev-server —
    open

The browser should open at index.html automatically. The page contains one embedded snapshot
graph with the CPU utilization of your EC2 instance.

Troubleshooting

If you don’t see anything on the webpage, use the browser console tools to check for console
error messages.

If you still can’t debug the problem, go to the Amazon API Gateway console. On the Logs tab,
make sure that Enable CloudWatch Logs is selected, as shown here:

To check the Lambda logs, in the CloudWatch console, choose the Logs tab, and then search for
the name of your Lambda function.

Summary

This blog post provided a solution for embedding CloudWatch snapshot graphs into webpages
and wikis outside of the AWS Management Console. To read the other blog post in this series
about CloudWatch snapshot graphs, see Reduce Time to Resolution with Amazon CloudWatch
Snapshot Graphs and Alerts.

For more information, see the snapshot graphs API documentation or visit our home page to
learn more about how Amazon CloudWatch achieves monitoring visibility for your cloud
resources and applications.

It would be great to hear your feedback.

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

AWS Online Tech Talks – June 2018

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

 

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PTGet Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.
June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

 

AWS re:Invent
June 13, 2018 | 05:00 PM – 05:30 PM PTEpisode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.
Compute

June 25, 2018 | 01:00 PM – 01:45 PM PTAccelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PTEnsuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

 

Containers
June 25, 2018 | 09:00 AM – 09:45 AM PTRunning Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.

 

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PTOracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.
DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PTSet Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

 

Enterprise & Hybrid
June 18, 2018 | 09:00 AM – 09:45 AM PTDe-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PTLaunch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new

 

AWS Environments

June 21, 2018 | 11:00 AM – 11:45 AM PTLeading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PTEnabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PTFireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.
IoT

June 27, 2018 | 11:00 AM – 11:45 AM PTAWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

 

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PTIntegrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PTBuilding Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

 

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PTOptimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

 

Mobile
June 25, 2018 | 11:00 AM – 11:45 AM PTDrive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

 

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PTUnderstanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.
June 28, 2018 | 09:00 AM – 09:45 AM PTUsing Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

 

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PTProductionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

 

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PTDeep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connecting your applications to the scalable and reliable AWS storage services.
June 27, 2018 | 01:00 PM – 01:45 PM PTChanging the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.
June 28, 2018 | 11:00 AM – 11:45 AM PTBig Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

AWS Online Tech Talks – May and Early June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-may-and-early-june-2018/

AWS Online Tech Talks – May and Early June 2018  

Join us this month to learn about some of the exciting new services and solution best practices at AWS. We also have our first re:Invent 2018 webinar series, “How to re:Invent”. Sign up now to learn more, we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

May 21, 2018 | 11:00 AM – 11:45 AM PT Integrating Amazon Elasticsearch with your DevOps Tooling – Learn how you can easily integrate Amazon Elasticsearch Service into your DevOps tooling and gain valuable insight from your log data.

May 23, 2018 | 11:00 AM – 11:45 AM PTData Warehousing and Data Lake Analytics, Together – Learn how to query data across your data warehouse and data lake without moving data.

May 24, 2018 | 11:00 AM – 11:45 AM PTData Transformation Patterns in AWS – Discover how to perform common data transformations on the AWS Data Lake.

Compute

May 29, 2018 | 01:00 PM – 01:45 PM PT – Creating and Managing a WordPress Website with Amazon Lightsail – Learn about Amazon Lightsail and how you can create, run and manage your WordPress websites with Amazon’s simple compute platform.

May 30, 2018 | 01:00 PM – 01:45 PM PTAccelerating Life Sciences with HPC on AWS – Learn how you can accelerate your Life Sciences research workloads by harnessing the power of high performance computing on AWS.

Containers

May 24, 2018 | 01:00 PM – 01:45 PM PT – Building Microservices with the 12 Factor App Pattern on AWS – Learn best practices for building containerized microservices on AWS, and how traditional software design patterns evolve in the context of containers.

Databases

May 21, 2018 | 01:00 PM – 01:45 PM PTHow to Migrate from Cassandra to Amazon DynamoDB – Get the benefits, best practices and guides on how to migrate your Cassandra databases to Amazon DynamoDB.

May 23, 2018 | 01:00 PM – 01:45 PM PT5 Hacks for Optimizing MySQL in the Cloud – Learn how to optimize your MySQL databases for high availability, performance, and disaster resilience using RDS.

DevOps

May 23, 2018 | 09:00 AM – 09:45 AM PT.NET Serverless Development on AWS – Learn how to build a modern serverless application in .NET Core 2.0.

Enterprise & Hybrid

May 22, 2018 | 11:00 AM – 11:45 AM PTHybrid Cloud Customer Use Cases on AWS – Learn how customers are leveraging AWS hybrid cloud capabilities to easily extend their datacenter capacity, deliver new services and applications, and ensure business continuity and disaster recovery.

IoT

May 31, 2018 | 11:00 AM – 11:45 AM PTUsing AWS IoT for Industrial Applications – Discover how you can quickly onboard your fleet of connected devices, keep them secure, and build predictive analytics with AWS IoT.

Machine Learning

May 22, 2018 | 09:00 AM – 09:45 AM PTUsing Apache Spark with Amazon SageMaker – Discover how to use Apache Spark with Amazon SageMaker for training jobs and application integration.

May 24, 2018 | 09:00 AM – 09:45 AM PTIntroducing AWS DeepLens – Learn how AWS DeepLens provides a new way for developers to learn machine learning by pairing the physical device with a broad set of tutorials, examples, source code, and integration with familiar AWS services.

Management Tools

May 21, 2018 | 09:00 AM – 09:45 AM PTGaining Better Observability of Your VMs with Amazon CloudWatch – Learn how CloudWatch Agent makes it easy for customers like Rackspace to monitor their VMs.

Mobile

May 29, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive on Amazon Pinpoint Segmentation and Endpoint Management – See how segmentation and endpoint management with Amazon Pinpoint can help you target the right audience.

Networking

May 31, 2018 | 09:00 AM – 09:45 AM PTMaking Private Connectivity the New Norm via AWS PrivateLink – See how PrivateLink enables service owners to offer private endpoints to customers outside their company.

Security, Identity, & Compliance

May 30, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Certificate Manager Private Certificate Authority (CA) – Learn how AWS Certificate Manager (ACM) Private Certificate Authority (CA), a managed private CA service, helps you easily and securely manage the lifecycle of your private certificates.

June 1, 2018 | 09:00 AM – 09:45 AM PTIntroducing AWS Firewall Manager – Centrally configure and manage AWS WAF rules across your accounts and applications.

Serverless

May 22, 2018 | 01:00 PM – 01:45 PM PTBuilding API-Driven Microservices with Amazon API Gateway – Learn how to build a secure, scalable API for your application in our tech talk about API-driven microservices.

Storage

May 30, 2018 | 11:00 AM – 11:45 AM PTAccelerate Productivity by Computing at the Edge – Learn how AWS Snowball Edge support for compute instances helps accelerate data transfers, execute custom applications, and reduce overall storage costs.

June 1, 2018 | 11:00 AM – 11:45 AM PTLearn to Build a Cloud-Scale Website Powered by Amazon EFS – Technical deep dive where you’ll learn tips and tricks for integrating WordPress, Drupal and Magento with Amazon EFS.

 

 

 

 

Wanted: Sales Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-sales-engineer/

At inception, Backblaze was a consumer company. Thousands upon thousands of individuals came to our website and gave us $5/mo to keep their data safe. But, we didn’t sell business solutions. It took us years before we had a sales team. In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so it’s time to hire our first Sales Engineer!

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Backblaze B2 cloud storage is a building block for almost any computing service that requires storage. Customers need our help integrating B2 into iOS apps to Docker containers. Some customers integrate directly to the API using the programming language of their choice, others want to solve a specific problem using ready made software, already integrated with B2.

At the same time, our computer backup product is deepening it’s integration into enterprise IT systems. We are commonly asked for how to set Windows policies, integrate with Active Directory, and install the client via remote management tools.

We are looking for a sales engineer who can help our customers navigate the integration of Backblaze into their technical environments.

Are you 1/2” deep into many different technologies, and unafraid to dive deeper?

Can you confidently talk with customers about their technology, even if you have to look up all the acronyms right after the call?

Are you excited to setup complicated software in a lab and write knowledge base articles about your work?

Then Backblaze is the place for you!

Enough about Backblaze already, what’s in it for me?
In this role, you will be given the opportunity to learn about the technologies that drive innovation today; diverse technologies that customers are using day in and out. And more importantly, you’ll learn how to learn new technologies.

Just as an example, in the past 12 months, we’ve had the opportunity to learn and become experts in these diverse technologies:

  • How to setup VM servers for lab environments, both on-prem and using cloud services.
  • Create an automatically “resetting” demo environment for the sales team.
  • Setup Microsoft Domain Controllers with Active Directory and AD Federation Services.
  • Learn the basics of OAUTH and web single sign on (SSO).
  • Archive video workflows from camera to media asset management systems.
  • How upload/download files from Javascript by enabling CORS.
  • How to install and monitor online backup installations using RMM tools, like JAMF.
  • Tape (LTO) systems. (Yes – people still use tape for storage!)

How can I know if I’ll succeed in this role?

You have:

  • Confidence. Be able to ask customers questions about their environments and convey to them your technical acumen.
  • Curiosity. Always want to learn about customers’ situations, how they got there and what problems they are trying to solve.
  • Organization. You’ll work with customers, integration partners, and Backblaze team members on projects of various lengths. You can context switch and either have a great memory or keep copious notes. Your checklists have their own checklists.

You are versed in:

  • The fundamentals of Windows, Linux and Mac OS X operating systems. You shouldn’t be afraid to use a command line.
  • Building, installing, integrating and configuring applications on any operating system.
  • Debugging failures – reading logs, monitoring usage, effective google searching to fix problems excites you.
  • The basics of TCP/IP networking and the HTTP protocol.
  • Novice development skills in any programming/scripting language. Have basic understanding of data structures and program flow.
  • Your background contains:

  • Bachelor’s degree in computer science or the equivalent.
  • 2+ years of experience as a pre or post-sales engineer.
  • The right extra credit:
    There are literally hundreds of previous experiences you can have had that would make you perfect for this job. Some experiences that we know would be helpful for us are below, but make sure you tell us your stories!

  • Experience using or programming against Amazon S3.
  • Experience with large on-prem storage – NAS, SAN, Object. And backing up data on such storage with tools like Veeam, Veritas and others.
  • Experience with photo or video media. Media archiving is a key market for Backblaze B2.
  • Program arduinos to automatically feed your dog.
  • Experience programming against web or REST APIs. (Point us towards your projects, if they are open source and available to link to.)
  • Experience with sales tools like Salesforce.
  • 3D print door stops.
  • Experience with Windows Servers, Active Directory, Group policies and the like.
  • What’s it like working with the Sales team?
    The Backblaze sales team collaborates. We help each other out by sharing ideas, templates, and our customer’s experiences. When we talk about our accomplishments, there is no “I did this,” only “we”. We are truly a team.

    We are honest to each other and our customers and communicate openly. We aim to have fun by embracing crazy ideas and creative solutions. We try to think not outside the box, but with no boxes at all. Customers are the driving force behind the success of the company and we care deeply about their success.

    If this all sounds like you:

    1. Send an email to [email protected] with the position in the subject line.
    2. Tell us a bit about your Sales Engineering experience.
    3. Include your resume.

    The post Wanted: Sales Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.