All posts by Chris Barclay

Validating AWS CodeCommit Pull Requests with AWS CodeBuild and AWS Lambda

Post Syndicated from Chris Barclay original

Thanks to Jose Ferraris and Flynn Bundy for this great post about how to validate AWS CodeCommit pull requests with AWS CodeBuild and AWS Lambda. Both are DevOps Consultants from the AWS Professional Services’ EMEA team.

You can help ensure a high level of code quality and avoid merging code that does not integrate with previous changes by testing proposed code changes in pull requests before they are allowed to be merged. In this blog post, we’ll show you how to set up this kind of validation using AWS CodeCommit, AWS CodeBuild, and AWS Lambda. In addition, we’ll show you how to set up a pipeline to automatically build your tested, approved, and merged code changes using AWS CodePipeline.

When we talk with customers and partners, we find that they are in different stages in the adoption of DevOps methodologies such as Continuous Integration and Continuous Deployment (CI/CD). However, one of the main requirements we see is a strong emphasis on automation of delivering resources in a safe, secure, and repeatable manner. One of the fundamental principles of CI/CD is aimed at keeping everyone on the team in sync about changes happening in the codebase. With this in mind, it’s important to fail fast and fail early within a CI/CD workflow to ensure that potential issues are caught before making their way into production.

To do this, we can use services such as AWS CodeBuild for running our tests, along with AWS CodeCommit to store our source code. One of the ways we can “fail fast” is to validate pull requests with tests to see how they will integrate with the current master branch of a repository when first opened in AWS CodeCommit. By running our tests against the proposed changes prior to merging them into the master branch, we can ensure a high level of quality early on, catch any potential issues, and boost the confidence of the developer in relation to their changes. In this way, you can start validating your pull requests in AWS CodeCommit by utilizing AWS Lambda and AWS CodeBuild to automatically trigger builds and tests of your development branches.

We can also use services such as AWS CodePipeline for visualizing and creating our pipeline, and automatically building and deploying merged code that has met the validation bar for pull requests.

The following diagram shows the workflow of a pull request. The AWS CodeCommit repository contains two branches, the master branch that contains approved code, and the development branch, where changes to the code are developed. In this workflow, a pull request is created with the new code in the development branch, which the developer wants to merge into the master branch. The creation of the pull request is an event detected by AWS CloudWatch. This event will start two separate actions:
• It triggers an AWS Lambda function that will post an automated comment to the pull request that indicates a build to test the changes is about to begin.
• It also triggers an AWS CodeBuild project that will build and validate those changes.

When the build completes, AWS CloudWatch detects that event. Another AWS Lambda function posts an automated comment to the pull request with the results of the build and a link to the build logs. Based on this automated testing, the developer who opened the pull request can update the code to address any build failures, and then update the pull request with those changes. Those updates will be built, and the build results are then posted to the pull request as a comment.

Let’s show how this works in a specific example project. This project has its own set of tasks defined in the build specification file that will execute and validate this specific pull request. The buildspec.yml for our example AWS CloudFormation template contains the following code:

version: 0.2

      - pip install cfn-lint
      - cfn-lint --template ./template.yaml --regions $AWS_REGION
      - aws cloudformation validate-template --template-body file://$(pwd)/template.yaml
    - '*'

In this example we are installing cfn-lint, which perform various checks against our template, we are also running the AWS CloudFormation validate-template command via the AWS CLI.

Once the code included in the pull request has been built, AWS CloudWatch detects the build complete event and passes along the outcome to a Lambda function that will update the specific commit with a comment that notifies the users of the results. It also includes a link to build logs in AWS CodeBuild. This process repeats any time the pull request is updated. For example, if an initial pull request was opened but failed the set of tests associated with the project, the developer might fix the code and make an update to the currently opened pull request. This will in turn trigger the function to run again and update the comments section with the test results.

Testing and validating pull requests before they can be merged into production code is a common approach and a best practice when working with CI/CD. Once the pull request is approved and merged into the production branch, it is also a good CI/CD practice to automatically build, test, and deploy that code. This is why we’ve structured this into two different AWS CloudFormation stacks (both can be found in our GitHub repository). One contains a base layer template that contains the resources you would only need to create once, in this case the AWS Lambda functions that test and update pull requests. The second stack includes an example of a CI/CD pipeline defined in AWS CloudFormation that imports the resources from the base layer stack.

We start by creating our base layer, which creates the Lambda functions and sets up AWS IAM roles that the functions will use to interact with the various AWS services. Once this stack is in place, we can add one or more pipeline stacks which import some of the values from the base layer. The pipeline will automatically build any changes merged into the master branch of the repository. Once any pipeline stack is complete, we have an AWS CodeCommit repository, AWS CodeBuild project, and an AWS CodePipeline pipeline set up and ready for deployment.

We can now push some code into our repository on the master branch to trigger a run-through of our pipeline.

In this example we will use the following AWS CloudFormation template. This template creates a single Amazon S3 bucket. This template will be the artifact that we push through our CI/CD pipeline and deploy to our stages.

AWSTemplateFormatVersion: '2010-09-09'
Description: 'A sample CloudFormation template that we can use to validate in our pipeline'
    Type: 'AWS::S3::Bucket'

Once this code is tested and approved in a pull request, it will be merged into the production branch as part of the pull request approve and merge process. This will automatically start our pipeline in AWS CodePipeline, and will run through to the stages defined for it. For example:

Now we can make some changes to our code base in the development branch and open a pull request. First, edit the file to make a typo in our CloudFormation template so we can test the validation.

AWSTemplateFormatVersion: '2010-09-09'
  License: Apache-2.0
Description: 'A sample CloudFormation template that we can use to validate in our pipeline'
    Type: 'AWS::S3::Bucket1'

Notice that we changed the S3 bucket to be AWS::S3::Bucket1. This doesn’t exist, so cfn-lint will return a failure when it attempts to validate the template.

Now push this change into our development branch in the AWS CodeCommit repository and open the pull request against the production (master) branch.

From there, navigate to the comments section of the pull request. You should see a status update that the pull request is currently building.

Once the build is complete, you should see feedback on the outcome of the build and its results given to us as a comment.

Choose the Logs link to view details about the failure. We can see that we were able to catch an error related to linting rules failing.

We can remedy this and update our pull request with the updated code. Upon doing so, we can see another build has been kicked off by looking at the comments of the pull request. Once this has been completed we can confirm that our pull request has been validated as desired and our tests have passed.

Once this pull request is approved and merged to master, this will start our pipeline in AWS CodePipeline, which will take this code change through the specified stages.


Using Federated Identities with AWS CodeCommit

Post Syndicated from Chris Barclay original

Thanks to Raja Mani, AWS Solutions Architect, for this great blog that describes how federated users can access AWS CodeCommit.

You can access repositories in AWS CodeCommit using the identities used in your business. This is useful because you can reuse your existing organizational identities and authentication methods. In this blog post, we’ll focus on authenticating with Active Directory Federation Services (AD FS), but the concepts apply to other federated identity providers as well, such as Okta.

AWS Federation helps to manage access to your AWS resources centrally. With federation, you can single sign-on to your AWS accounts using your corporate directory credentials. This authenticates through AWS Identity and Access Management (IAM) policies once a user assumes an IAM role. AWS offers multiple options for federating your identities. You can learn more about them here.

If you are federating with AWS for the first time, refer the following for more information and implementation guidance:

There are two solutions available when using federated identities with AWS CodeCommit: AWS Single Sign-On and AWS Process Credential Provider.

Using CodeCommit with AWS Single Sign-On

Your first option is to use AWS Single Sign-On (AWS SSO). You can access AWS CodeCommit repositories by using temporary credentials obtained from the AWS SSO user portal. You might want to use AWS SSO if you have multiple AWS accounts and business applications and you want to manage them centrally. This solution is documented in the blog post “AWS Single Sign-On Now Enables Command Line Interface Access for AWS Account Using Corporate Credentials”.

The AWS SSO Service connects to an on-premises Microsoft Active Directory via AD Connector/AD trust. In the AWS SSO Console, you can obtain temporary credentials for the AWS Command Line Interface (AWS CLI). It is easy to manage federation for multiple AWS accounts centrally using AWS SSO. However, as of this writing, AWS SSO supports Microsoft Active Directory only. If you are using a different identity provider, use the second solution.

The high-level architecture for this solution looks like this:

  1. The user signs in to the AWS SSO Console and chooses the account they want to use for CodeCommit access. Then, they select the particular permission set for that account that has access to the CodeCommit Repository.
  2. The user then clicks the “Command line or programmatic access” link corresponding to the permission set to get the AWS Credentials.
  3. The user configures the AWS CLI to use the credentials obtained in the previous step.
  4. The user configures the Git command line tool to use the AWS CLI via the AWS CLI Credential Helper. For more information, see this page.

After completing all the above steps, the user can run commands from the Git client.

Using CodeCommit with AWS Process Credential Provider

Your second option is to use the AWS Process Credential Provider utility. You might want to use this solution if you do not currently use AWS SSO to centrally manage access to AWS accounts. The following diagram shows a high-level architectural view of what happens when the AWS Process Credential Provider utility is invoked from the AWS CLI.

  1. The AWS Process Credential Provider utility is invoked by AWS CLI. It calls the Active Directory Federation Services portal sign-in page and provides the Active Directory authentication credentials. If you are using a different federated access solution, it works in a similar manner.
  2. AD FS authenticates the user against Active Directory.
  3. AD FS builds the Security Assertion Markup Language (SAML) response and sends it back to the AWS Process Credential Provider utility.
  4. The AWS Process Credential Provider utility connects to AWS Simple Token Service (STS) using STS AssumeRoleWithSAML.
  5. STS sends temporary credentials to the AWS Process Credential Provider utility.
  6. The user is authenticated and can successfully connect to the AWS CodeCommit repository from their local Git client, command line, or terminal.

I am going to demonstrate this solution by cloning an AWS CodeCommit repository called ExampleCorpRepository from the US East (Virginia) Region (us-east-1) region by authenticating against AD FS. I used an Amazon EC2 instance to demonstrate this solution, but the same steps can be used in other environments, such as your business intranet.


  • Setup AWS Federated authentication with your identity provider. You can use the blog post mentioned at the beginning of this blog post for reference.
  • Install Git on your local computer, including a Git command line from here:
  • Install the AWS Process Credential Provider utility “awsprocesscreds” on your local computer according to the installation instructions on this page:
  • Install a Git CodeCommit helper utility called “git-remote-codecommit” on your local computer from this page:
  • Enable the ADFS Sign-on page (You can get it from your ADFS Admin) in your corporate network. Make sure it has the loginToRp parameter set with the correct value. Your ADFS admin can provide that value too.
  • Create an AWS IAM role that grants access to AWS CodeCommit repositories in the US East (N. Virginia) region. Make sure to record the Amazon Resource Name (ARN) for the role that grants users who assume the role access to AWS CodeCommit repositories. Refer this page for more info about CodeCommit.
  • Create an AD FS user and password.
  • Create an AWS CodeCommit repository in the US East (Virginia) region named ExampleCorpRepository.

Step 1: Create an AWS Command Line Interface (CLI) Profile
This step configures the AWS CLI to use the AWS Process Credential Provider utility you installed as part of the prerequisites. It will authenticate against ADFS using the user name and password you provide. If the authentication is successful, it will obtain temporary AccessKeyId and SecretAccessKey tokens from AWS STS and pass them to the AWS CLI.

1. Open the .aws/config file in a plain-text editor. Create a new profile entry named adfs in the config file and set it up to use the AWS Process Credential Provider utility. For example, I created the following entry in the config file (in my environment, the config file is found in the /home/ec2-user/.aws directory).

[profile adfs]
credential_process=awsprocesscreds-saml -e '' -u '[email protected]' -p adfs -a arn:aws:iam::xxxxxxxxxxxx:role/ADFS-Production

It takes 4 parameters: the URL of your ADFS Sign-on page (-e flag), the user name of your AD_FS user (-u flag), the name of your SAML provider (-p flag) and the ARN of the role in AWS IAM your user will assume for access to AWS CodeCommit (-a flag).

  • The first parameter is AD_FS Sign-on page and it is also known as IdP-initiated login page. For AD_FS, the Sign-on page takes the form of https://<fqdn>/adfs/ls/IdpInitiatedSignOn.aspx with query string value of loginToRp=urn:amazon:webservices.
  • The second parameter is AD_FS user name which will be your corporate user id / user id you created specifically for CodeCommit access in the prerequisite steps. In a real-world example, this is something your users would obtain from your AD FS administrator.
  • The third parameter is the name of your federated access identity. Currently the only supported providers are AD_FS and Okta.
  • The fourth parameter is the AWS IAM role ARN that the AD_FS user will assume in order to be granted access to the AWS CodeCommit repository. This is the role you created in the prerequisites. Refer to for more information.

The AWS Process Credential Provider utility will prompt you for the password for the configured AD_FS user the first time you invoke an AWS CLI command at the command line or terminal. If authentication is successful, it caches the security tokens for further AWS CLI usage until the cache expires. By default, tokens are valid for an hour. If you want to change the default, you can change it as per the instructions in this blog post (Refer to the section “Adjusting Session Duration”).

2. Save your config file changes and open a new command line or terminal session.

Step 2: Clone the AWS CodeCommit Repository to your Local Computer

This step uses git-remote-codecommit, the Git CodeCommit helper utility you installed in the prerequisities section. It simplifies the way you push and pull changes to repositories from your local Git client to AWS CodeCommit. You can run the following command to clone your repository. In my case, I have a repository called ExampleCorpRepository. I cloned it to my local computer, creating a local repo, by using the following command:

git clone codecommit://[email protected]

The above command takes one parameter in the form of codecommit://<AWS CLI Profile Name>@<AWS CodeCommit Repository Name>. The AWS CLI profile name is the one you created in Step 1 (in the example, it is named adfs). The repository name is the actual name of the AWS CodeCommit repository you want to clone. Git will remember the repository information. You won’t need to pass the parameter again when you run additional Git commands, such as git commit or git push. It will only prompt you for a password if the session is expired.


We hope this has illustrated how you can use CodeCommit with federated users. Happy coding!

Continuous Deployment to Kubernetes using AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, Amazon ECR and AWS Lambda

Post Syndicated from Chris Barclay original

Thank you to my colleague Omar Lari for this blog on how to create a continuous deployment pipeline for Kubernetes!

You can use Kubernetes and AWS together to create a fully managed, continuous deployment pipeline for container based applications. This approach takes advantage of Kubernetes’ open-source system to manage your containerized applications, and the AWS developer tools to manage your source code, builds, and pipelines.

This post describes how to create a continuous deployment architecture for containerized applications. It uses AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS Lambda to deploy containerized applications into a Kubernetes cluster. In this environment, developers can remain focused on developing code without worrying about how it will be deployed, and development managers can be satisfied that the latest changes are always deployed.

What is Continuous Deployment?

There are many articles, posts and even conferences dedicated to the practice of continuous deployment. For the purposes of this post, I will summarize continuous delivery into the following points:

  • Code is more frequently released into production environments
  • More frequent releases allow for smaller, incremental changes reducing risk and enabling simplified roll backs if needed
  • Deployment is automated and requires minimal user intervention

For a more information, see “Practicing Continuous Integration and Continuous Delivery on AWS”.

How can you use continuous deployment with AWS and Kubernetes?

You can leverage AWS services that support continuous deployment to automatically take your code from a source code repository to production in a Kubernetes cluster with minimal user intervention. To do this, you can create a pipeline that will build and deploy committed code changes as long as they meet the requirements of each stage of the pipeline.

To create the pipeline, you will use the following services:

  • AWS CodePipeline. AWS CodePipeline is a continuous delivery service that models, visualizes, and automates the steps required to release software. You define stages in a pipeline to retrieve code from a source code repository, build that source code into a releasable artifact, test the artifact, and deploy it to production. Only code that successfully passes through all these stages will be deployed. In addition, you can optionally add other requirements to your pipeline, such as manual approvals, to help ensure that only approved changes are deployed to production.
  • AWS CodeCommit. AWS CodeCommit is a secure, scalable, and managed source control service that hosts private Git repositories. You can privately store and manage assets such as your source code in the cloud and configure your pipeline to automatically retrieve and process changes committed to your repository.
  • AWS CodeBuild. AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces artifacts that are ready to deploy. You can use AWS CodeBuild to both build your artifacts, and to test those artifacts before they are deployed.
  • AWS Lambda. AWS Lambda is a compute service that lets you run code without provisioning or managing servers. You can invoke a Lambda function in your pipeline to prepare the built and tested artifact for deployment by Kubernetes to the Kubernetes cluster.
  • Kubernetes. Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It provides a platform for running, deploying, and managing containers at scale.

An Example of Continuous Deployment to Kubernetes:

The following example illustrates leveraging AWS developer tools to continuously deploy to a Kubernetes cluster:

  1. Developers commit code to an AWS CodeCommit repository and create pull requests to review proposed changes to the production code. When the pull request is merged into the master branch in the AWS CodeCommit repository, AWS CodePipeline automatically detects the changes to the branch and starts processing the code changes through the pipeline.
  2. AWS CodeBuild packages the code changes as well as any dependencies and builds a Docker image. Optionally, another pipeline stage tests the code and the package, also using AWS CodeBuild.
  3. The Docker image is pushed to Amazon ECR after a successful build and/or test stage.
  4. AWS CodePipeline invokes an AWS Lambda function that includes the Kubernetes Python client as part of the function’s resources. The Lambda function performs a string replacement on the tag used for the Docker image in the Kubernetes deployment file to match the Docker image tag applied in the build, one that matches the image in Amazon ECR.
  5. After the deployment manifest update is completed, AWS Lambda invokes the Kubernetes API to update the image in the Kubernetes application deployment.
  6. Kubernetes performs a rolling update of the pods in the application deployment to match the docker image specified in Amazon ECR.
    The pipeline is now live and responds to changes to the master branch of the CodeCommit repository. This pipeline is also fully extensible, you can add steps for performing testing or adding a step to deploy into a staging environment before the code ships into the production cluster.

An example pipeline in AWS CodePipeline that supports this architecture can be seen below:


We are excited to see how you leverage this pipeline to help ease your developer experience as you develop applications in Kubernetes.

You’ll find an AWS CloudFormation template with everything necessary to spin up your own continuous deployment pipeline at the CodeSuite – Continuous Deployment Reference Architecture for Kubernetes repo on GitHub. The repository details exactly how the pipeline is provisioned and how you can use it to deploy your own applications. If you have any questions, feedback, or suggestions, please let us know!

Using AWS CodeCommit Pull Requests to request code reviews and discuss code

Post Syndicated from Chris Barclay original

Thank you to Michael Edge, Senior Cloud Architect, for a great blog on CodeCommit pull requests.


AWS CodeCommit is a fully managed service for securely hosting private Git repositories. CodeCommit now supports pull requests, which allows repository users to review, comment upon, and interactively iterate on code changes. Used as a collaboration tool between team members, pull requests help you to review potential changes to a CodeCommit repository before merging those changes into the repository. Each pull request goes through a simple lifecycle, as follows:

  • The new features to be merged are added as one or more commits to a feature branch. The commits are not merged into the destination branch.
  • The pull request is created, usually from the difference between two branches.
  • Team members review and comment on the pull request. The pull request might be updated with additional commits that contain changes made in response to comments, or include changes made to the destination branch.
  • Once team members are happy with the pull request, it is merged into the destination branch. The commits are applied to the destination branch in the same order they were added to the pull request.

Commenting is an integral part of the pull request process, and is used to collaborate between the developers and the reviewer. Reviewers add comments and questions to a pull request during the review process, and developers respond to these with explanations. Pull request comments can be added to the overall pull request, a file within the pull request, or a line within a file.

To make the comments more useful, sign in to the AWS Management Console as an AWS Identity and Access Management (IAM) user. The username will then be associated with the comment, indicating the owner of the comment. Pull request comments are a great quality improvement tool as they allow the entire development team visibility into what reviewers are looking for in the code. They also serve as a record of the discussion between team members at a point in time, and shouldn’t be deleted.

AWS CodeCommit is also introducing the ability to add comments to a commit, another useful collaboration feature that allows team members to discuss code changed as part of a commit. This helps you discuss changes made in a repository, including why the changes were made, whether further changes are necessary, or whether changes should be merged. As is the case with pull request comments, you can comment on an overall commit, on a file within a commit, or on a specific line or change within a file, and other repository users can respond to your comments. Comments are not restricted to commits, they can also be used to comment on the differences between two branches, or between two tags. Commit comments are separate from pull request comments, i.e. you will not see commit comments when reviewing a pull request – you will only see pull request comments.

A pull request example

Let’s get started by running through an example. We’ll take a typical pull request scenario and look at how we’d use CodeCommit and the AWS Management Console for each of the steps.

To try out this scenario, you’ll need:

  • An AWS CodeCommit repository with some sample code in the master branch. We’ve provided sample code below.
  • Two AWS Identity and Access Management (IAM) users, both with the AWSCodeCommitPowerUser managed policy applied to them.
  • Git installed on your local computer, and access configured for AWS CodeCommit.
  • A clone of the AWS CodeCommit repository on your local computer.

In the course of this example, you’ll sign in to the AWS CodeCommit console as one IAM user to create the pull request, and as the other IAM user to review the pull request. To learn more about how to set up your IAM users and how to connect to AWS CodeCommit with Git, see the following topics:

  • Information on creating an IAM user with AWS Management Console access.
  • Instructions on how to access CodeCommit using Git.
  • If you’d like to use the same ‘hello world’ application as used in this article, here is the source code:

public class Main {
	public static void main(String[] args) {

		System.out.println("Hello, world");

The scenario below uses the us-east-2 region.

Creating the branches

Before we jump in and create a pull request, we’ll need at least two branches. In this example, we’ll follow a branching strategy similar to the one described in GitFlow. We’ll create a new branch for our feature from the main development branch (the default branch). We’ll develop the feature in the feature branch. Once we’ve written and tested the code for the new feature in that branch, we’ll create a pull request that contains the differences between the feature branch and the main development branch. Our team lead (the second IAM user) will review the changes in the pull request. Once the changes have been reviewed, the feature branch will be merged into the development branch.

Figure 1: Pull request link

Sign in to the AWS CodeCommit console with the IAM user you want to use as the developer. You can use an existing repository or you can go ahead and create a new one. We won’t be merging any changes to the master branch of your repository, so it’s safe to use an existing repository for this example. You’ll find the Pull requests link has been added just above the Commits link (see Figure 1), and below Commits you’ll find the Branches link. Click Branches and create a new branch called ‘develop’, branched from the ‘master’ branch. Then create a new branch called ‘feature1’, branched from the ‘develop’ branch. You’ll end up with three branches, as you can see in Figure 2. (Your repository might contain other branches in addition to the three shown in the figure).

Figure 2: Create a feature branch

If you haven’t cloned your repo yet, go to the Code link in the CodeCommit console and click the Connect button. Follow the instructions to clone your repo (detailed instructions are here). Open a terminal or command line and paste the git clone command supplied in the Connect instructions for your repository. The example below shows cloning a repository named codecommit-demo:

git clone

If you’ve previously cloned the repo you’ll need to update your local repo with the branches you created. Open a terminal or command line and make sure you’re in the root directory of your repo, then run the following command:

git remote update origin

You’ll see your new branches pulled down to your local repository.

$ git remote update origin
Fetching origin
 * [new branch]      develop    -> origin/develop
 * [new branch]      feature1   -> origin/feature1

You can also see your new branches by typing:

git branch --all

$ git branch --all
* master

Now we’ll make a change to the ‘feature1’ branch. Open a terminal or command line and check out the feature1 branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Make code changes

Edit a file in the repo using your favorite editor and save the changes. Commit your changes to the local repository, and push your changes to CodeCommit. For example:

git commit -am 'added new feature'
git push origin feature1

$ git commit -am 'added new feature'
[feature1 8f6cb28] added new feature
1 file changed, 1 insertion(+), 1 deletion(-)

$ git push origin feature1
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (9/9), 617 bytes | 617.00 KiB/s, done.
Total 9 (delta 2), reused 0 (delta 0)
   2774a53..8f6cb28  feature1 -> feature1

Creating the pull request

Now we have a ‘feature1’ branch that differs from the ‘develop’ branch. At this point we want to merge our changes into the ‘develop’ branch. We’ll create a pull request to notify our team members to review our changes and check whether they are ready for a merge.

In the AWS CodeCommit console, click Pull requests. Click Create pull request. On the next page select ‘develop’ as the destination branch and ‘feature1’ as the source branch. Click Compare. CodeCommit will check for merge conflicts and highlight whether the branches can be automatically merged using the fast-forward option, or whether a manual merge is necessary. A pull request can be created in both situations.

Figure 3: Create a pull request

After comparing the two branches, the CodeCommit console displays the information you’ll need in order to create the pull request. In the ‘Details’ section, the ‘Title’ for the pull request is mandatory, and you may optionally provide comments to your reviewers to explain the code change you have made and what you’d like them to review. In the ‘Notifications’ section, there is an option to set up notifications to notify subscribers of changes to your pull request. Notifications will be sent on creation of the pull request as well as for any pull request updates or comments. And finally, you can review the changes that make up this pull request. This includes both the individual commits (a pull request can contain one or more commits, available in the Commits tab) as well as the changes made to each file, i.e. the diff between the two branches referenced by the pull request, available in the Changes tab. After you have reviewed this information and added a title for your pull request, click the Create button. You will see a confirmation screen, as shown in Figure 4, indicating that your pull request has been successfully created, and can be merged without conflicts into the ‘develop’ branch.

Figure 4: Pull request confirmation page

Reviewing the pull request

Now let’s view the pull request from the perspective of the team lead. If you set up notifications for this CodeCommit repository, creating the pull request would have sent an email notification to the team lead, and he/she can use the links in the email to navigate directly to the pull request. In this example, sign in to the AWS CodeCommit console as the IAM user you’re using as the team lead, and click Pull requests. You will see the same information you did during creation of the pull request, plus a record of activity related to the pull request, as you can see in Figure 5.

Figure 5: Team lead reviewing the pull request

Commenting on the pull request

You now perform a thorough review of the changes and make a number of comments using the new pull request comment feature. To gain an overall perspective on the pull request, you might first go to the Commits tab and review how many commits are included in this pull request. Next, you might visit the Changes tab to review the changes, which displays the differences between the feature branch code and the develop branch code. At this point, you can add comments to the pull request as you work through each of the changes. Let’s go ahead and review the pull request. During the review, you can add review comments at three levels:

  • The overall pull request
  • A file within the pull request
  • An individual line within a file

The overall pull request
In the Changes tab near the bottom of the page you’ll see a ‘Comments on changes’ box. We’ll add comments here related to the overall pull request. Add your comments as shown in Figure 6 and click the Save button.

Figure 6: Pull request comment

A specific file in the pull request
Hovering your mouse over a filename in the Changes tab will cause a blue ‘comments’ icon to appear to the left of the filename. Clicking the icon will allow you to enter comments specific to this file, as in the example in Figure 7. Go ahead and add comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 7: File comment

A specific line in a file in the pull request
A blue ‘comments’ icon will appear as you hover over individual lines within each file in the pull request, allowing you to create comments against lines that have been added, removed or are unchanged. In Figure 8, you add comments against a line that has been added to the source code, encouraging the developer to review the naming standards. Go ahead and add line comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 8: Line comment

A pull request that has been commented at all three levels will look similar to Figure 9. The pull request comment is shown expanded in the ‘Comments on changes’ section, while the comments at file and line level are shown collapsed. A ‘comment’ icon indicates that comments exist at file and line level. Clicking the icon will expand and show the comment. Since you are expecting the developer to make further changes based on your comments, you won’t merge the pull request at this stage, but will leave it open awaiting feedback. Each comment you made results in a notification being sent to the developer, who can respond to the comments. This is great for remote working, where developers and team lead may be in different time zones.

Figure 9: Fully commented pull request

Adding a little complexity

A typical development team is going to be creating pull requests on a regular basis. It’s highly likely that the team lead will merge other pull requests into the ‘develop’ branch while pull requests on feature branches are in the review stage. This may result in a change to the ‘Mergable’ status of a pull request. Let’s add this scenario into the mix and check out how a developer will handle this.

To test this scenario, we could create a new pull request and ask the team lead to merge this to the ‘develop’ branch. But for the sake of simplicity we’ll take a shortcut. Clone your CodeCommit repo to a new folder, switch to the ‘develop’ branch, and make a change to one of the same files that were changed in your pull request. Make sure you change a line of code that was also changed in the pull request. Commit and push this back to CodeCommit. Since you’ve just changed a line of code in the ‘develop’ branch that has also been changed in the ‘feature1’ branch, the ‘feature1’ branch cannot be cleanly merged into the ‘develop’ branch. Your developer will need to resolve this merge conflict.

A developer reviewing the pull request would see the pull request now looks similar to Figure 10, with a ‘Resolve conflicts’ status rather than the ‘Mergable’ status it had previously (see Figure 5).

Figure 10: Pull request with merge conflicts

Reviewing the review comments

Once the team lead has completed his review, the developer will review the comments and make the suggested changes. As a developer, you’ll see the list of review comments made by the team lead in the pull request Activity tab, as shown in Figure 11. The Activity tab shows the history of the pull request, including commits and comments. You can reply to the review comments directly from the Activity tab, by clicking the Reply button, or you can do this from the Changes tab. The Changes tab shows the comments for the latest commit, as comments on previous commits may be associated with lines that have changed or been removed in the current commit. Comments for previous commits are available to view and reply to in the Activity tab.

In the Activity tab, use the shortcut link (which looks like this </>) to move quickly to the source code associated with the comment. In this example, you will make further changes to the source code to address the pull request review comments, so let’s go ahead and do this now. But first, you will need to resolve the ‘Resolve conflicts’ status.

Figure 11: Pull request activity

Resolving the ‘Resolve conflicts’ status

The ‘Resolve conflicts’ status indicates there is a merge conflict between the ‘develop’ branch and the ‘feature1’ branch. This will require manual intervention to restore the pull request back to the ‘Mergable’ state. We will resolve this conflict next.

Open a terminal or command line and check out the develop branch by running the following command:

git checkout develop

$ git checkout develop
Switched to branch 'develop'
Your branch is up-to-date with 'origin/develop'.

To incorporate the changes the team lead made to the ‘develop’ branch, merge the remote ‘develop’ branch with your local copy:

git pull

$ git pull
remote: Counting objects: 9, done.
Unpacking objects: 100% (9/9), done.
   af13c82..7b36f52  develop    -> origin/develop
Updating af13c82..7b36f52
 src/main/java/com/amazon/helloworld/ | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Then checkout the ‘feature1’ branch:

git checkout feature1

$ git checkout feature1
Switched to branch 'feature1'
Your branch is up-to-date with 'origin/feature1'.

Now merge the changes from the ‘develop’ branch into your ‘feature1’ branch:

git merge develop

$ git merge develop
Auto-merging src/main/java/com/amazon/helloworld/
CONFLICT (content): Merge conflict in src/main/java/com/amazon/helloworld/
Automatic merge failed; fix conflicts and then commit the result.

Yes, this fails. The file has been changed in both branches, resulting in a merge conflict that can’t be resolved automatically. However, will now contain markers that indicate where the conflicting code is, and you can use these to resolve the issues manually. Edit using your favorite IDE, and you’ll see it looks something like this:


import java.util.*;

 * This class prints a hello world message

public class Main {
   public static void main(String[] args) {

<<<<<<< HEAD
        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earthling. Today's date is: " + todaysdate);
      System.out.println("Hello, earth");
>>>>>>> develop

The code between HEAD and ‘===’ is the code the developer added in the ‘feature1’ branch (HEAD represents ‘feature1’ because this is the current checked out branch). The code between ‘===’ and ‘>>> develop’ is the code added to the ‘develop’ branch by the team lead. We’ll resolve the conflict by manually merging both changes, resulting in an updated


import java.util.*;

 * This class prints a hello world message

public class Main {
   public static void main(String[] args) {

        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysdate);

After saving the change you can add and commit it to your local repo:

git add src/
git commit -m 'fixed merge conflict by merging changes'

Fixing issues raised by the reviewer

Now you are ready to address the comments made by the team lead. If you are no longer pointing to the ‘feature1’ branch, check out the ‘feature1’ branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Edit the source code in your favorite IDE and make the changes to address the comments. In this example, the developer has updated the source code as follows:


import java.util.*;

 *  This class prints a hello world message
 * @author Michael Edge
 * @see HelloEarth
 * @version 1.0

public class Main {
   public static void main(String[] args) {

        Date todaysDate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysDate);

After saving the changes, commit and push to the CodeCommit ‘feature1’ branch as you did previously:

git commit -am 'updated based on review comments'
git push origin feature1

Responding to the reviewer

Now that you’ve fixed the code issues you will want to respond to the review comments. In the AWS CodeCommit console, check that your latest commit appears in the pull request Commits tab. You now have a pull request consisting of more than one commit. The pull request in Figure 12 has four commits, which originated from the following activities:

  • 8th Nov: the original commit used to initiate this pull request
  • 10th Nov, 3 hours ago: the commit by the team lead to the ‘develop’ branch, merged into our ‘feature1’ branch
  • 10th Nov, 24 minutes ago: the commit by the developer that resolved the merge conflict
  • 10th Nov, 4 minutes ago: the final commit by the developer addressing the review comments

Figure 12: Pull request with multiple commits

Let’s reply to the review comments provided by the team lead. In the Activity tab, reply to the pull request comment and save it, as shown in Figure 13.

Figure 13: Replying to a pull request comment

At this stage, your code has been committed and you’ve updated your pull request comments, so you are ready for a final review by the team lead.

Final review

The team lead reviews the code changes and comments made by the developer. As team lead, you own the ‘develop’ branch and it’s your decision on whether to merge the changes in the pull request into the ‘develop’ branch. You can close the pull request with or without merging using the Merge and Close buttons at the bottom of the pull request page (see Figure 13). Clicking Close will allow you to add comments on why you are closing the pull request without merging. Merging will perform a fast-forward merge, incorporating the commits referenced by the pull request. Let’s go ahead and click the Merge button to merge the pull request into the ‘develop’ branch.

Figure 14: Merging the pull request

After merging a pull request, development of that feature is complete and the feature branch is no longer needed. It’s common practice to delete the feature branch after merging. CodeCommit provides a check box during merge to automatically delete the associated feature branch, as seen in Figure 14. Clicking the Merge button will merge the pull request into the ‘develop’ branch, as shown in Figure 15. This will update the status of the pull request to ‘Merged’, and will close the pull request.


This blog has demonstrated how pull requests can be used to request a code review, and enable reviewers to get a comprehensive summary of what is changing, provide feedback to the author, and merge the code into production. For more information on pull requests, see the documentation.

Deep Learning on AWS Batch

Post Syndicated from Chris Barclay original

Thanks to my colleague Kiuk Chung for this great post on Deep Learning using AWS Batch.


GPU instances naturally pair with deep learning as neural network algorithms can take advantage of their massive parallel processing power. AWS provides GPU instance families, such as g2 and p2, which allow customers to run scalable GPU workloads. You can leverage such scalability efficiently with AWS Batch.

AWS Batch manages the underlying compute resources on-your behalf, allowing you to focus on modeling tasks without the overhead of resource management. Compute environments (that is, clusters) in AWS Batch are pools of instances in your account, which AWS Batch dynamically scales up and down, provisioning and terminating instances with respect to the numbers of jobs. This minimizes idle instances, which in turn optimizes cost.

Moreover, AWS Batch ensures that submitted jobs are scheduled and placed onto the appropriate instance, hence managing the lifecycle of the jobs. With the addition of customer-provided AMIs, AWS Batch users can now take advantage of this elasticity and convenience for jobs that require GPU.

This post illustrates how you can run GPU-based deep learning workloads on AWS Batch. I walk you through an example of training a convolutional neural network (the LeNet architecture), using Apache MXNet to recognize handwritten digits using the MNIST dataset.

Running an MXNet job in AWS Batch

Apache MXNet is a full-featured, flexibly programmable, and highly scalable deep learning framework that supports state-of-the-art deep models, including convolutional neural networks (CNNs) and long short-term memory networks (LSTMs).

There are three steps to running an AWS Batch job:

  • Create a custom AMI
  • Create AWS Batch entities
  • Submit a training job

Create a custom AMI

Start by creating an AMI that includes the NVIDIA driver and the Amazon ECS agent. In AWS Batch, instances can be launched with the specific AMI of your choice by specifying imageId when you create your compute environment. Because you are running a job that requires GPU, you need an AMI that has the NVIDIA driver installed.

Choose Launch Stack to launch the CloudFormation template in us-east-1 in your account:

As shown below, take note of the AMI value in the Outputs tab of the CloudFormation stack. You use this as the imageId value when creating the compute environment in the next section.

Alternatively, you may follow the AWS Batch documentation to create a GPU-enabled AMI.

Create AWS Batch resources

After you have built the AMI, create the following resources:

A compute environment, is a collection of instances (compute resources) of the same or different instance types. In this case, you create a managed compute environment in which the instances are of type p2.xlarge. For imageId, specify the AMI you built in the previous section.

Then, create a job queue. In AWS Batch, jobs are submitted to a job queue that are associated to an ordered list of compute environments. After a lower order compute environment is filled, jobs spill over to the next compute environment. For this example, you associate a single compute environment to the job queue.

Finally, create a job definition, which is a template for a job specification. For those familiar with Amazon ECS, this is analogous to task definitions. You mount the directory containing the NVIDIA driver on the host to /usr/local/nvidia on the container. You also need to set the privileged flag on the container properties.

The following code creates the aforementioned resources in AWS Batch. For more information, see the AWS Batch User Guide.

git clone
cd aws-batch-helpers/gpu-example

 --subnets <subnet1,subnet2,…>\
 --security-groups <sg1,sg2,…>\
 --key-pair \
 --instance-role \
 --image-id \

Submit a training job

Now you submit a job that trains a convolutional neural network model for handwritten digit recognition. Much like Amazon ECS tasks, jobs in AWS Batch are run as commands in a Docker container. To use MXNet as your deep learning library, you need a Docker image containing MXNet. For this example, use mxnet/python:gpu.

The script submits the job, and tails the output from CloudWatch Logs.

# cd aws-batch-helpers/gpu-example
python --wait

You should see an output that looks like the following:

Submitted job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] to the job queue [gpu_queue]
Job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] is RUNNING.
Output [train_imagenet/e1bccebc-76d9-4cd1-885b-667ef93eb1f5/12030dd3-0734-42bf-a3d1-d99118b401eb]:

[2017-04-25T19:02:57.076Z] INFO:root:Epoch[0] Batch [100]	Speed: 15554.63 samples/sec Train-accuracy=0.861077
[2017-04-25T19:02:57.428Z] INFO:root:Epoch[0] Batch [200]	Speed: 18224.89 samples/sec Train-accuracy=0.954688
[2017-04-25T19:02:57.755Z] INFO:root:Epoch[0] Batch [300]	Speed: 19551.42 samples/sec Train-accuracy=0.965313
[2017-04-25T19:02:58.080Z] INFO:root:Epoch[0] Batch [400]	Speed: 19697.65 samples/sec Train-accuracy=0.969531
[2017-04-25T19:02:58.405Z] INFO:root:Epoch[0] Batch [500]	Speed: 19705.82 samples/sec Train-accuracy=0.968281
[2017-04-25T19:02:58.734Z] INFO:root:Epoch[0] Batch [600]	Speed: 19486.54 samples/sec Train-accuracy=0.971719
[2017-04-25T19:02:59.058Z] INFO:root:Epoch[0] Batch [700]	Speed: 19735.59 samples/sec Train-accuracy=0.973281
[2017-04-25T19:02:59.384Z] INFO:root:Epoch[0] Batch [800]	Speed: 19631.17 samples/sec Train-accuracy=0.976562
[2017-04-25T19:02:59.713Z] INFO:root:Epoch[0] Batch [900]	Speed: 19490.74 samples/sec Train-accuracy=0.979062
[2017-04-25T19:02:59.834Z] INFO:root:Epoch[0] Train-accuracy=0.976774
[2017-04-25T19:02:59.834Z] INFO:root:Epoch[0] Time cost=3.190
[2017-04-25T19:02:59.850Z] INFO:root:Saved checkpoint to "/mnt/model/mnist-0001.params"
[2017-04-25T19:03:00.079Z] INFO:root:Epoch[0] Validation-accuracy=0.969148

Job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] SUCCEEDED

In reality, you may want to modify the job command to save the trained model artifact to Amazon S3 so that subsequent prediction jobs can generate predictions against the model. For information about how to reference objects in Amazon S3 in your jobs, see the Creating a Simple “Fetch & Run” AWS Batch Job post.


In this post, I walked you through an example of running a GPU-enabled job in AWS Batch, using MXNet as the deep learning library. AWS Batch exposes primitives to allow you to focus on implementing the most efficient algorithm for your workload. It enables you to manage the lifecycle of submitted jobs and dynamically adapt the infrastructure requirements of your jobs within the specified bounds. It’s easy to take advantage of the horizontal scalability of compute instances provided by AWS in a cost-efficient manner.

MXNet, on the other hand, provides a rich set of highly optimized and scalable building blocks to start implementing your own deep learning algorithms. Together, you can not only solve problems requiring large neural network models, but also cut down on iteration time by harnessing the seemingly unlimited compute resources in Amazon EC2.

With AWS Batch managing the resources on your behalf, you can easily implement workloads such as hyper-parameter optimization to fan out tens or even hundreds of searches in parallel to find the best set of model parameters for your problem space. Moreover, because your jobs are run inside Docker containers, you may choose the tools and libraries that best fit your needs, build a Docker image, and submit your jobs using the image of your choice.

We encourage you to try it yourself and let us know what you think!

Amazon ECS Events in February

Post Syndicated from Chris Barclay original

Here are some upcoming events for Amazon ECS this month:

Container World: Abby Fuller, senior AWS technical evangelist, will be speaking about Amazon ECS at Container World on Feb 21-23. Check out her schedule.

Microservices Day @ AWS NY Loft: Microservices Day is on Feb 24 as part of the DevOps | AWS Loft Architecture Week. Learn more about how to build and deploy microservices architectures on AWS. We will cover how to use Amazon ECS and AWS Lambda to build microservices. Signup here.

Seattle AWS Architects & Engineers Meetup: Join us Feb 28 at SURF Incubator to learn more about AWS Batch and Amazon ECS. Food and drinks provided. RSVP here.

Managing Secrets for Amazon ECS Applications Using Parameter Store and IAM Roles for Tasks

Post Syndicated from Chris Barclay original

Thanks to my colleague Stas Vonholsky  for a great blog on managing secrets with Amazon ECS applications.


As containerized applications and microservice-oriented architectures become more popular, managing secrets, such as a password to access an application database, becomes more challenging and critical.

Some examples of the challenges include:

  • Support for various access patterns across container environments such as dev, test, and prod
  • Isolated access to secrets on a container/application level rather than at the host level
  • Multiple decoupled services with their own needs for access, both as services and as clients of other services

This post focuses on newly released features that support further improvements to secret management for containerized applications running on Amazon ECS. My colleague, Matthew McClean, also published an excellent post on the AWS Security Blog, How to Manage Secrets for Amazon EC2 Container Service–Based Applications by Using Amazon S3 and Docker, which discusses some of the limitations of passing and storing secrets with container parameter variables.

Most secret management tools provide the following functionality:

  • Highly secured storage system
  • Central management capabilities
  • Secure authorization and authentication mechanisms
  • Integration with key management and encryption providers
  • Secure introduction mechanisms for access
  • Auditing
  • Secret rotation and revocation

Amazon EC2 Systems Manager Parameter Store

Parameter Store is a feature of Amazon EC2 Systems Manager. It provides a centralized, encrypted store for sensitive information and has many advantages when combined with other capabilities of Systems Manager, such as Run Command and State Manager. The service is fully managed, highly available, and highly secured.

Because Parameter Store is accessible using the Systems Manager API, AWS CLI, and AWS SDKs, you can also use it as a generic secret management store. Secrets can be easily rotated and revoked. Parameter Store is integrated with AWS KMS so that specific parameters can be encrypted at rest with the default or custom KMS key. Importing KMS keys enables you to use your own keys to encrypt sensitive data.

Access to Parameter Store is enabled by IAM policies and supports resource level permissions for access. An IAM policy that grants permissions to specific parameters or a namespace can be used to limit access to these parameters. CloudTrail logs, if enabled for the service, record any attempt to access a parameter.

While Amazon S3 has many of the above features and can also be used to implement a central secret store, Parameter Store has the following added advantages:

  • Easy creation of namespaces to support different stages of the application lifecycle.
  • KMS integration that abstracts parameter encryption from the application while requiring the instance or container to have access to the KMS key and for the decryption to take place locally in memory.
  • Stored history about parameter changes.
  • A service that can be controlled separately from S3, which is likely used for many other applications.
  • A configuration data store, reducing overhead from implementing multiple systems.
  • No usage costs.

Note: At the time of publication, Systems Manager doesn’t support VPC private endpoint functionality. To enforce stricter access to a Parameter Store endpoint from a private VPC, use a NAT gateway with a set Elastic IP address together with IAM policy conditions that restrict parameter access to a limited set of IP addresses.

IAM roles for tasks

With IAM roles for Amazon ECS tasks, you can specify an IAM role to be used by the containers in a task. Applications interacting with AWS services must sign their API requests with AWS credentials. This feature provides a strategy for managing credentials for your applications to use, similar to the way that Amazon EC2 instance profiles provide credentials to EC2 instances.

Instead of creating and distributing your AWS credentials to the containers or using the EC2 instance role, you can associate an IAM role with an ECS task definition or the RunTask API operation. For more information, see IAM Roles for Tasks.

You can use IAM roles for tasks to securely introduce and authenticate the application or container with the centralized Parameter Store. Access to the secret manager should include features such as:

  • Limited TTL for credentials used
  • Granular authorization policies
  • An ID to track the requests in the logs of the central secret manager
  • Integration support with the scheduler that could map between the container or task deployed and the relevant access privileges

IAM roles for tasks support this use case well, as the role credentials can be accessed only from within the container for which the role is defined. The role exposes temporary credentials and these are rotated automatically. Granular IAM policies are supported with optional conditions about source instances, source IP addresses, time of day, and other options.

The source IAM role can be identified in the CloudTrail logs based on a unique Amazon Resource Name and the access permissions can be revoked immediately at any time with the IAM API or console. As Parameter Store supports resource level permissions, a policy can be created to restrict access to specific keys and namespaces.

Dynamic environment association

In many cases, the container image does not change when moving between environments, which supports immutable deployments and ensures that the results are reproducible. What does change is the configuration: in this context, specifically the secrets. For example, a database and its password might be different in the staging and production environments. There’s still the question of how do you point the application to retrieve the correct secret? Should it retrieve prod.app1.secret, test.app1.secret or something else?

One option can be to pass the environment type as an environment variable to the container. The application then concatenates the environment type (prod, test, etc.) with the relative key path and retrieves the relevant secret. In most cases, this leads to a number of separate ECS task definitions.

When you describe the task definition in a CloudFormation template, you could base the entry in the IAM role that provides access to Parameter Store, KMS key, and environment property on a single CloudFormation parameter, such as “environment type.” This approach could support a single task definition type that is based on a generic CloudFormation template.

Walkthrough: Securely access Parameter Store resources with IAM roles for tasks

This walkthrough is configured for the North Virginia region (us-east-1). I recommend using the same region.

Step 1: Create the keys and parameters

First, create the following KMS keys with the default security policy to be used to encrypt various parameters:

  • prod-app1 –used to encrypt any secrets for app1.
  • license-key –used to encrypt license-related secrets.
aws kms create-key --description prod-app1 --region us-east-1
aws kms create-key --description license-code --region us-east-1

Note the KeyId property in the output of both commands. You use it throughout the walkthrough to identify the KMS keys.

The following commands create three parameters in Parameter Store:

  • prod.app1.db-pass (encrypted with the prod-app1 KMS key)
  • general.license-code (encrypted with the license-key KMS key)
  • prod.app2.user-name (stored as a standard string without encryption)
aws ssm put-parameter --name prod.app1.db-pass --value "AAAAAAAAAAA" --type SecureString --key-id "<key-id-for-prod-app1-key>" --region us-east-1
aws ssm put-parameter --name general.license-code --value "CCCCCCCCCCC" --type SecureString --key-id "<key-id-for-license-code-key>" --region us-east-1
aws ssm put-parameter --name prod.app2.user-name --value "BBBBBBBBBBB" --type String --region us-east-1

Step 2: Create the IAM role and policies

Now, create a role and an IAM policy to be associated later with the ECS task that you create later on.
The trust policy for the IAM role needs to allow the ecs-tasks entity to assume the role.

   "Version": "2012-10-17",
   "Statement": [
       "Sid": "",
       "Effect": "Allow",
       "Principal": {
         "Service": ""
       "Action": "sts:AssumeRole"

Save the above policy as a file in the local directory with the name ecs-tasks-trust-policy.json.

aws iam create-role --role-name prod-app1 --assume-role-policy-document file://ecs-tasks-trust-policy.json

The following policy is attached to the role and later associated with the app1 container. Access is granted to the prod.app1.* namespace parameters, the encryption key required to decrypt the prod.app1.db-pass parameter and the license code parameter. The namespace resource permission structure is useful for building various hierarchies (based on environments, applications, etc.).

Make sure to replace <key-id-for-prod-app1-key> with the key ID for the relevant KMS key and <account-id> with your account ID in the following policy.

     "Version": "2012-10-17",
     "Statement": [
             "Effect": "Allow",
             "Action": [
             "Resource": "*"
             "Sid": "Stmt1482841904000",
             "Effect": "Allow",
             "Action": [
             "Resource": [
             "Sid": "Stmt1482841948000",
             "Effect": "Allow",
             "Action": [
             "Resource": [

Save the above policy as a file in the local directory with the name app1-secret-access.json:

aws iam create-policy --policy-name prod-app1 --policy-document file://app1-secret-access.json

Replace <account-id> with your account ID in the following command:

aws iam attach-role-policy --role-name prod-app1 --policy-arn "arn:aws:iam::<account-id>:policy/prod-app1"

Step 3: Add the testing script to an S3 bucket

Create a file with the script below, name it and add it to an S3 bucket in your account. Make sure the object is publicly accessible and note down the object link, for example

#This is simple bash script that is used to test access to the EC2 Parameter store.
# Install the AWS CLI
apt-get -y install python2.7 curl
curl -O
pip install awscli
# Getting region
EC2_AVAIL_ZONE=`curl -s`
EC2_REGION="`echo \"$EC2_AVAIL_ZONE\" | sed -e 's:\([0-9][0-9]*\)[a-z]*\$:\\1:'`"
# Trying to retrieve parameters from the EC2 Parameter Store
APP1_WITH_ENCRYPTION=`aws ssm get-parameters --names prod.app1.db-pass --with-decryption --region $EC2_REGION --output text 2>&1`
APP1_WITHOUT_ENCRYPTION=`aws ssm get-parameters --names prod.app1.db-pass --no-with-decryption --region $EC2_REGION --output text 2>&1`
LICENSE_WITH_ENCRYPTION=`aws ssm get-parameters --names general.license-code --with-decryption --region $EC2_REGION --output text 2>&1`
LICENSE_WITHOUT_ENCRYPTION=`aws ssm get-parameters --names general.license-code --no-with-decryption --region $EC2_REGION --output text 2>&1`
APP2_WITHOUT_ENCRYPTION=`aws ssm get-parameters --names prod.app2.user-name --no-with-decryption --region $EC2_REGION --output text 2>&1`
# The nginx server is started after the script is invoked, preparing folder for HTML.
if [ ! -d /usr/share/nginx/html/ ]; then
mkdir -p /usr/share/nginx/html/;
chmod 755 /usr/share/nginx/html/

# Creating an HTML file to be accessed at http://<public-instance-DNS-name>/ecs.html
cat > /usr/share/nginx/html/ecs.html <<EOF
<!DOCTYPE html>
body {padding: 20px;margin: 0 auto;font-family: Tahoma, Verdana, Arial, sans-serif;}
code {white-space: pre-wrap;}
result {background: hsl(220, 80%, 90%);}
<h1>Hi there!</h1>
<p style="padding-bottom: 0.8cm;">Following are the results of different access attempts as expirienced by "App1".</p>

<p><b>Access to prod.app1.db-pass:</b><br/>
<pre><code>aws ssm get-parameters --names prod.app1.db-pass --with-decryption</code><br/>
<code>aws ssm get-parameters --names prod.app1.db-pass --no-with-decryption</code><br/>

<p><b>Access to general.license-code:</b><br/>
<pre><code>aws ssm get-parameters --names general.license-code --with-decryption</code><br/>
<code>aws ssm get-parameters --names general.license-code --no-with-decryption</code><br/>

<p><b>Access to prod.app2.user-name:</b><br/>
<pre><code>aws ssm get-parameters --names prod.app2.user-name --no-with-decryption</code><br/>

<p><em>Thanks for visiting</em></p>

Step 4: Create a test cluster

I recommend creating a new ECS test cluster with the latest ECS AMI and ECS agent on the instance. Use the following field values:

  • Cluster name: access-test
  • EC2 instance type: t2.micro
  • Number of instances: 1
  • Key pair: No EC2 key pair is required, unless you’d like to SSH to the instance and explore the running container.
  • VPC: Choose the default VPC. If unsure, you can find the VPC ID with the IP range in the Amazon VPC console.
  • Subnets: Pick a subnet in the default VPC.
  • Security group: Create a new security group with CIDR block and port 80 for inbound access.

Leave other fields with the default settings.

Create a simple task definition that relies on the public NGINX container and the role that you created for app1. Specify the properties such as the available container resources and port mappings. Note the command option is used to download and invoke a test script that installs the AWS CLI on the container, runs a number of get-parameter commands, and creates an HTML file with the results.

Replace <account-id> with your account ID, <your-S3-URI> with a link to the S3 object created in step 3 in the following commands:

aws ecs register-task-definition --family access-test --task-role-arn "arn:aws:iam::<account-id>:role/prod-app1" --container-definitions name="access-test",image="nginx",portMappings="[{containerPort=80,hostPort=80,protocol=tcp}]",readonlyRootFilesystem=false,cpu=512,memory=490,essential=true,entryPoint="sh,-c",command="\"/bin/sh -c \\\"apt-get update ; apt-get -y install curl ; curl -O <your-S3-URI> ; chmod +x ; ./ ; nginx -g 'daemon off;'\\\"\"" --region us-east-1

aws ecs run-task --cluster access-test --task-definition access-test --count 1 --region us-east-1

Verifying access

After the task is in a running state, check the public DNS name of the instance and navigate to the following page:


You should see the results of running different access tests from the container after a short duration.

If the test results don’t appear immediately, wait a few seconds and refresh the page.
Make sure that inbound traffic for port 80 is allowed on the security group attached to the instance.

The results you see in the static results HTML page should be the same as running the following commands from the container.


aws ssm get-parameters --names prod.app1.db-pass --with-decryption --region us-east-1
aws ssm get-parameters --names prod.app1.db-pass --no-with-decryption --region us-east-1

Both commands should work, as the policy provides access to both the parameter and the required KMS key.


aws ssm get-parameters --names general.license-code --no-with-decryption --region us-east-1
aws ssm get-parameters --names general.license-code --with-decryption --region us-east-1

Only the first command with the “no-with-decryption” parameter should work. The policy allows access to the parameter in Parameter Store but there’s no access to the KMS key. The second command should fail with an access denied error.


aws ssm get-parameters --names prod.app2.user-name –no-with-decryption --region us-east-1

The command should fail with an access denied error, as there are no permissions associated with the namespace for prod.app2.

Finishing up

Remember to delete all resources (such as the KMS keys and EC2 instance), so that you don’t incur charges.


Central secret management is an important aspect of securing containerized environments. By using Parameter Store and task IAM roles, customers can create a central secret management store and a well-integrated access layer that allows applications to access only the keys they need, to restrict access on a container basis, and to further encrypt secrets with custom keys with KMS.

Whether the secret management layer is implemented with Parameter Store, Amazon S3, Amazon DynamoDB, or a solution such as Vault or KeyWhiz, it’s a vital part to the process of managing and accessing secrets.

How to Automate Container Instance Draining in Amazon ECS

Post Syndicated from Chris Barclay original

My colleague Madhuri Peri sent a nice guest post that describes how to use container instance draining to remove tasks from an instance before scaling down a cluster with Auto Scaling Groups.

There are times when you might need to remove an instance from an Amazon ECS cluster; for example, to perform system updates, update the Docker daemon, or scale down the cluster size. Container instance draining enables you to remove a container instance from a cluster without impacting tasks in your cluster. It works by preventing new tasks from being scheduled for placement on the container instance while it is in the DRAINING state, replacing service tasks on other container instances in the cluster if the resources are available, and enabling you to wait until tasks have successfully moved before terminating the instance.

You can change a container instance’s state to DRAINING manually, but in this post, I demonstrate how to use container instance draining with Auto Scaling groups and AWS Lambda to automate the process.

Amazon ECS overview

Amazon ECS is a container management service that makes it easy to run, stop, and manage Docker containers on a cluster, or logical grouping of EC2 instances. When you run tasks using ECS, you place them on a cluster. Amazon ECS downloads your container images from a registry that you specify, and runs those images on the container instances within your cluster.

Using the container instance draining state

Auto Scaling groups support lifecycle hooks that can be invoked to allow custom processes to finish before instances launch or terminate. For this example, the lifecycle hook invokes a Lambda function that performs two tasks:

  1. Sets the ECS container instance state to DRAINING.
  2. Checks if there are any tasks left on the container instance. If there are running tasks still in process of draining, it posts a message to SNS so that the Lambda function is called again.

Lambda repeats step 2 until there are no tasks running on the container instance OR the heartbeat timeout on the lifecycle hook is reached (set to TTL 15 minutes in the sample CloudFormation template), whichever occurs first. Afterward, control is returned to the Auto Scaling lifecycle hook, and the instance terminates. This process is shown in the following diagram:

Try it out!

Use the CloudFormation template to set up the resources described in this post. To use the CloudFormation template you will need to upload the Lambda deployment package to an S3 bucket in your account. This template creates the following resources:

  • The VPC and associated network elements (subnets, security groups, route table, etc.)
  • An ECS cluster, ECS service, and sample ECS task definition
  • An Auto Scaling group with two EC2 instances and a termination lifecycle hook
  • A Lambda function
  • An SNS topic
  • IAM roles for Lambda to execute

Create the CloudFormation stack and then see how this works by triggering an instance termination event.

In the Amazon EC2 console, choose Auto Scaling Groups and select the name of the Auto Scaling group created by CloudFormation (from the resources section of the CloudFormation template).

Select Actions, Edit and update the service to reduce the desired number of instances by “1”. This initiates one of the instances’ termination process.

Select the Auto Scaling group Instances tab; one instance state value should show the lifecycle state “Terminating:Wait”.

This is when the lifecycle hook gets activated and posts a message to SNS. The Lambda function is then executed in response to the SNS message trigger.

The Lambda function changes the ECS container instance state to DRAINING. The ECS service scheduler then stop the tasks on the instance and starts tasks on an available instance.

You can go to the ECS console to confirm that the container instance state is DRAINING.

After the tasks have drained, the Auto Scaling group activity history confirms that the EC2 instance is terminated.

How it works

Take a moment to see the inner workings of the Lambda function. The function first checks to see if the event received has a LifecycleTransition value matching autoscaling:EC2_INSTANCE_TERMINATING.

# If the event received is instance terminating...
if 'LifecycleTransition' in message.keys():
print("message autoscaling {}".format(message['LifecycleTransition']))
if message['LifecycleTransition'].find('autoscaling:EC2_INSTANCE_TERMINATING') > -1:

If there is a match, it proceeds to call the function “checkContainerInstanceTaskStatus”. This function gets the container instance ID of the EC2 instance ID received, and sets the container instance state to ‘DRAINING’.

# Get lifecycle hook name
lifecycleHookName = message['LifecycleHookName']
print("Setting lifecycle hook name {} ".format(lifecycleHookName))

# Check if there are any tasks running on the instance
tasksRunning = checkContainerInstanceTaskStatus(Ec2InstanceId)

It then checks to see if there are tasks running on the instance. If there are tasks, it publishes a message to the SNS topic to trigger the Lambda function again and then exits.

# Use Task ARNs to get describe tasks
descTaskResp = ecsClient.describe_tasks(cluster=clusterName, tasks=listTaskResp['taskArns'])
for key in descTaskResp['tasks']:
print("Task status {}".format(key['lastStatus']))
print("Container instance ARN {}".format(key['containerInstanceArn']))
print("Task ARN {}".format(key['taskArn']))

# Check if any tasks are running
if len(descTaskResp['tasks']) > 0:
print("Tasks are still running..")
return 1
print("NO tasks are on this instance {}..".format(Ec2InstanceId))
return 0

When the Lambda function sees that no more tasks are running on the container instance, it proceeds to complete the lifecycle hook and terminate the EC2 instance.

#Complete lifecycle hook.
response = asgClient.complete_lifecycle_action(
print("Response = {}".format(response))
print("Completedlifecycle hook action")
except Exception, e:


Container instance draining simplifies cluster scale-down and operational activities such as new AMI rollouts. For example, with the integration described in this post, you could use CloudFormation and CodePipeline to create a rolling deployment that launches new instances and terminates instances in batches.

To learn more about container instance draining, see the Amazon ECS Developer Guide.

If you have questions or suggestions, please comment below.

Continuous Deployment to Amazon ECS using AWS CodePipeline, AWS CodeBuild, Amazon ECR, and AWS CloudFormation

Post Syndicated from Chris Barclay original

Thanks to my colleague John Pignata for a great blog on how to create a continuous deployment pipeline to Amazon ECS.

Delivering new iterations of software at a high velocity is a competitive advantage in today’s business environment. The speed at which organizations can deliver innovations to customers and adapt to changing markets is increasingly a pivotal attribute that can make the difference between success and failure.

AWS provides a set of flexible services designed to enable organizations to embrace the combination of cultural philosophies, practices, and tools called DevOps that increases an organization’s ability to deliver applications and services at high velocity.

In this post, I explore the DevOps practice called continuous deployment and outline a reference architecture to implement an automated deployment pipeline for applications delivered as Docker containers onto Amazon ECS using AWS CodePipeline, AWS CodeBuild, and AWS CloudFormation.

What is continuous deployment?

Agility is often cited as a key advantage of cloud computing over the traditional delivery of IT resources. Instead of waiting weeks or months for other departments to provision a new server, developers can create new instances with a click or API call and start using it within minutes. This newfound speed and autonomy frees developers to experiment and deliver new products and features to their customers as quickly as possible.

On top of the cloud, teams are embracing DevOps practices in order to achieve a faster time-to-market, better code quality, and more reliable releases of their products and services. Continuous deployment is a DevOps practice in which new software revisions are automatically built, tested, packaged, and released to production.

Continuous deployment enables developers to ship features and fixes through an entirely automated software release process. Instead of batching up large releases over a period of weeks or months and conducting deployments manually, developers can use automation to deliver versions of their applications many times a day as new software revisions are ready for users. In the same way cloud computing abbreviates the delivery time of resources, continuous deployment reduces the release cycle of new software to your users from weeks or months to minutes.

Embracing this speed and agility has many benefits including:

  • New features and bug fixes are released to users quickly; code sitting in a source code repository does not deliver business value or benefit your customers. By releasing new software revisions as close to immediately as possible, customers start benefiting from your work more quickly and teams can get more focused feedback.
  • Change sets are smaller; large change sets create challenges in pinpointing root causes of issues, bugs, and other regressions. By releasing smaller change sets more frequently, teams can more easily attribute and correct introduced issues.
  • Automated deployment encourages best practices; as any change committed to your source code repository can be deployed immediately via automation, teams have to ensure that changes are well-tested and that their production environments are closely monitored.

How does continuous deployment work?

Continuous deployment is conducted by an automated pipeline that coordinates the activities related to software release and provides visibility into the process. During the process, a releasable artifact is built, tested, packaged, and deployed into a production environment. The releasable artifact might be an executable file, a package of script files, a container, or some other component that ultimately must be delivered to production.

AWS CodePipeline is a continuous delivery and deployment service that coordinates the building, testing, and deployment of your code each time there is a new software revision. CodePipeline provides visible, central orchestration for taking a code change and moving it through a workflow and ultimately into the hands of your users. The pipeline defines stages to retrieve code from a source code repository, build the source code into a releasable artifact, test that artifact, and deliver it to production while ensuring that these stages happen in order and are halted if a failure occurs.

While CodePipeline powers the delivery pipeline and orchestrates the process, it does not have facilities for building or testing the software itself. For these stages, CodePipeline integrates with several other tools, including AWS CodeBuild, which is a fully managed build service. CodeBuild compiles source code, runs tests, and produces software packages that are ready to deploy. That makes it ideal for the build and test stages of a continuous deployment pipeline. Out of the box, CodeBuild has native support for many different kinds of build environments, including building Docker containers.

Containers are a powerful mechanism for software delivery, as they allow for a predictable and reproducible environment and provide a high level of confidence that changes tested in one environment can be successfully deployed. AWS provides several services to run and manage Docker container images. Amazon ECS is a highly scalable and high performance container management service that allows you to run applications on a cluster of Amazon EC2 instances. Amazon ECR is a fully managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images.

Finally, CodePipeline integrates with several services to facilitate deployment, including AWS Elastic Beanstalk, AWS CodeDeploy, AWS OpsWorks, and your own custom deployment code or process using AWS Lambda or AWS CloudFormation. These deployment actions can be used to power the final step in your pipeline to push the newly built changes live onto your production environment.

Continuous deployment to Amazon ECS

Here’s a reference architecture that puts these components together to deliver a continuous deployment pipeline of Docker applications onto ECS:

This architecture demonstrates how to deploy containers onto ECS and ECR using CodePipeline to build a fully automated continuous deployment pipeline on top of AWS. This approach to continuous deployment is entirely serverless and uses managed services for the orchestration, build, and deployment of your software.

The pipeline created in the reference architecture looks like the following:

In this post, I discuss each stage in this reference architecture. What happens when a developer changes some copy on a landing page and pushes that change into the source code repository?

First, in the Source stage, the pipeline is configured with details for accessing a source code repository system. In the reference architecture, you have a sample application hosted in a GitHub repository. CodePipeline polls this repository and initiates a new pipeline execution for each new commit. In addition to GitHub, CodePipeline also supports source locations such as a Git repository in AWS CodeCommit or a versioned object stored in Amazon S3. Each new build is retrieved from the source code repository, packaged as a zip file, stored on S3, and sent to the next stage of the pipeline.

The Source stage also defines a template artifact stored on Amazon S3. This is the template that defines the deployment environment used by the deployment stage after a successful build of the application.

The Build stage uses CodeBuild to create a new Docker container image based upon the latest source code and pushes it to an ECR repository. CodePipeline also integrates with a number of third-party build systems, such as Jenkins, CloudBees, Solano CI, and TeamCity.

Finally, the Deploy stage uses CloudFormation to create a new task definition revision that points to the newly built Docker container image and updates the ECS service to use the new task definition revision. After this is done, ECS initiates a deployment by fetching the new Docker container from ECR and restarting the service.

After all of the pipeline’s stages are green, you can reload the application in a web browser and see the developer’s copy changes live in production. This happened automatically without any human invention.

This pipeline is now in production, listening for new code in the source code repository, and ready to ship any future changes that your team pushes into production. It’s also extensible, meaning that new stages can be added to include additional steps. For example, you could include a test stage to execute unit and acceptance tests to ensure the new code revision is safe to deploy to production. After it’s deployed, a notification step could be added to alert your team via email or a Slack channel that a new version is live, along with the details about the change set deployed to production.


We’re excited to see what kinds of applications you can deliver to your users using this approach and how it affects your product development processes. The cloud unlocks massive advantages in agility, and the ability to implement techniques like continuous deployment unlocks a significant competitive advantage.

You’ll find an AWS CloudFormation template with everything necessary to spin up your own continuous deployment pipeline at the AWS Labs EC2 Container Service – Reference Architecture: Continuous Deployment repo on GitHub. If you have any questions, feedback, or suggestions, please let us know!

Amazon EC2 Container Service at AWS re:Invent 2016 – Wrap-up

Post Syndicated from Chris Barclay original

We wanted to summarize a few of the highlights from this year’s AWS re:Invent.


On Thursday December 1, Werner Vogels announced two new features for Amazon ECS.

Blox is a new open source project that enables users to build custom schedulers and other tooling on top of Amazon ECS. Our goal with Blox is to provide tools that simplify the creation of custom schedulers, dashboards and other extensions, so that customers can meet the needs of their specific use cases. Werner also announced that new task placement strategies are coming later this year. Watch the keynote or see the AWS Compute blog for more details on these announcements.

Werner also announced three other services that can be used with Amazon ECS. EC2 Systems Manager parameter store provides a centralized, encrypted store for sensitive information​ that can be used to configure microservices; see the docs for more info. CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages and Docker images that are ready to deploy; see the docs for more info. AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture; see the docs for more info on how to use X-Ray with ECS.


There were multiple sessions that included deep information about Amazon ECS:

CON301 – Operations Management with Amazon ECS [video]
CON302 – Development Workflow with Docker and Amazon ECS [video]
CON303 – Introduction to Container Management on AWS [video]
CON307 – Advanced Task Scheduling with Amazon ECS and Blox [video]
CON308 – Service Integration Delivery and Automation Using Amazon ECS [video]
CON309 – Running Microservices on Amazon ECS [video]
CON310 – Running Batch Jobs on Amazon ECS [video]
CON311 – Operations Automation and Infrastructure Management with Amazon ECS [video]
CON312 – Deploying Scalable SAP Hybris Clusters using Docker [video]
CON313 – Netflix: Container Scheduling, Execution, and Integration with AWS [video]
CON316 – State of the Union: Containers [video]
CON401 – Amazon ECR Deep Dive on Image Optimization [video]
CON402 – Securing Container-Based Applications [video]
DEV313 – Infrastructure Continuous Deployment Using AWS CloudFormation [video]
GAM401 – Riot Games: Standardizing Application Deployments Using Amazon ECS and Terraform [video]
NET203 – From EC2 to ECS: How Capital One uses Application Load Balancer Features to Serve Traffic at Scale [video]

We enjoyed meeting everyone at re:Invent and appreciate all the feedback you had about Amazon ECS, and look forward to hearing about how you use the new features we announced.

— The Amazon ECS Team

Introducing Blox from Amazon EC2 Container Service

Post Syndicated from Chris Barclay original

Today we are announcing Blox, a new open source project from the Amazon ECS team that enables users to build custom schedulers and other tooling on top of ECS. Our goal with Blox is to provide tools that simplify the creation of custom schedulers, dashboards and other extensions, so that customers can meet the needs of their specific use cases.

ECS recently announced the availability of an event stream that delivers ECS container instance and task state changes to Amazon CloudWatch Events. Customers that build scheduling workflows often need to consume the events generated in the ECS cluster, persist this state locally and operate on the local cluster state. Blox includes a cluster-state-service that provides this functionality and offers REST APIs on top of the local cluster state. Blox is targeted at developers that want to build custom schedulers or processes that need the current state of resources in the ECS cluster and developers that want to take action based on cluster events.

Blox also ships with a daemon-scheduler that supports launching one and only one copy of a task across all container instances in an ECS cluster. The scheduler monitors for new container instances joining the cluster and will place the task on them. Blox daemon-scheduler enables tasks like log agents and metric collection agents to run on ECS clusters.

We are excited to release Blox as open source software and plan to build an ecosystem of tools around ECS. If you are interested in using or contributing to Blox, come visit the Blox GitHub repository. We are tracking a number of feature proposals that we are evaluating for the roadmap. We invite you to come participate in our GitHub repository and help identify and prioritize improvements.

Deploying Blox

Blox Deployment on a Local Environment

Our recommended way for getting started with Blox is to deploy the framework on your local Docker installation. Blox offers a Docker Compose file that enables deployment in local environments. This allows you to get started with building custom schedulers using the cluster-state-service.

Here is the Blox architecture when run locally:

  • ECS pushes the cluster state changes as CloudWatch events.
  • CloudWatch events is configured to send to the SQS Queue.
  • Blox cluster-state-service consumes these events and recreates and stores the cluster state locally and offers REST APIs.
  • Blox daemon-scheduler uses the cluster-state-service APIs to track container instances in ECS cluster and launch tasks on them.

Step 1: Create SQS Queue and Configure CloudWatch Events to send ECS events to the SQS Queue

Blox depends on an ECS event stream that is delivered via CloudWatch events. In order to use Blox, you need to create an SQS queue and configure CloudWatch to deliver the ECS events to this SQS queue. Blox provides a pre-built AWS CloudFormation template that will deploy and configure the required Amazon AWS components. Once you have pulled the CloudFormation template from the Blox repository, run the following command using the AWS CLI:

$ aws --region  cloudformation create-stack --stack-name BloxLocal --template-body file://cloudformation_template.json

In a few minutes, the CloudFormation template will finish setting up the CloudWatch event and the SQS queue and you will be ready to deploy Blox.

Step 2: Launch Blox

Next, download the Docker Compose file from the Blox repo. Before launching Blox, you will first need to update docker-compose.yml with the following changes:

  • Update the AWS_REGION value with the region of your ECS and SQS resources.
  • Update the AWS_PROFILE value with your profile name in ~/.aws/credentials. You can skip this step if you are using the default profile.

After you have updated docker-compose.yml, you can use the following commands to launch the Blox containers on your local Docker environment.

# From the folder where you downloaded docker-compose.yml
$ docker-compose up –d
$ docker-compose ps

You will see output that shows the Blox cluster-state-service, daemon-scheduler, and etcd storage:

Name             Command                          State   Ports
etcd_1        /usr/local/bin/etcd --data ...   Up      2379/tcp, 2380/tcp
scheduler_1   --bind --css- ...   Up>2000/tcp
css_1         --bind --etcd ...   Up      3000/tcp

You have now completed the local installation of Blox. You can begin consuming the Scheduler API at http://localhost:2000/.

Using the daemon-scheduler

The daemon-scheduler uses the following concepts:

  • An environment represents the configuration for desired state of the tasks to be maintained. For daemon-scheduler, the environment indicates the task definition to launch in a specific cluster.
  • A deployment is the operation that brings the environment into existence. A deployment indicates to the scheduler that the desired configuration state in the environment should be established in the cluster.

Step 1: Create an ECS cluster

If you don’t have an ECS cluster, follow our Create Cluster guide.

Step 2: Register Task Definition

In order to launch tasks in ECS cluster, you need to register a task definition with ECS. Here is a Task definition you can use, if you don’t have one already.

$ cat > /tmp/nginx.json << EOF
    "family": "nginx",
    "containerDefinitions": [{
        "name": "nginx",
        "image": "nginx",
        "cpu": 1024,
        "memory": 128
    } ]

$ aws ecs register-task-definition --cli-input-json file:///tmp/nginx.json

Query the ARN for the nginx task definition. You need this for the next step.

$ aws ecs list-task-definitions
   "taskDefinitionArns": [

Launch Daemon workloads using the daemon-scheduler

For this exercise, we will be using the demo-cli that Blox provides to interact with the scheduler. Please consult the Blox GitHub repository regarding the APIs that the daemon-scheduler exposes.

Step 3: Create an environment
Create an environment by replacing the cluster name and task definition ARN in the following command:

./ --environment TestEnvironment --cluster <MyClusterName> --task-definition <task-def-arn>

Sample output:

  "items": [
      "deploymentToken": "17fb6b8b-abf3-4e7b-b9f4-fdb431d53887",
      "health": "healthy",
      "name": "releaseenvironment",
      "instanceGroup": {
        "cluster": "arn:aws:ecs:us-west-2:203719379804:cluster/BloxTestCluster-1123-2"

Upon successful creation of the environment, the daemon-scheduler response will have a deploymentToken that will be used in our next step.

Step 4: Create a Deployment

In order to bring this environment into existence in your ECS cluster, you need to perform a deployment operation:

./ --environment TestEnvironment --deploymentToken <deploymentToken>

Creating a deployment will result in the scheduler launching the task definition attached to the environment across all the container instances in your cluster. You can now go to the ECS console and check out the tasks running on your container instances. You have now successfully used the daemon-scheduler to launch daemon workloads in your ECS cluster.

Blox Deployment on AWS

Blox can also be deployed on AWS. Use the Blox CloudFormation template to create:

  • A new ECS cluster with one container instance on which the Blox components are setup as an ECS service with cluster-state-service, daemon-scheduler and etcd containers making up a single task.
  • An Application Load Balancer is created in front of the daemon-scheduler endpoint.
  • An API Gateway is set up as the public facing frontend to Blox and provides the authentication mechanism. This API Gateway can be used to reach the scheduler and manage tasks on the ECS cluster.
  • A Lambda function that acts as a simple proxy enables the public facing API Gateway endpoint to forward requests onto the ALB listener in the VPC.

This Blox deployment can then be used to manage ECS clusters associated with the account.

See the instructions in our GitHub repo for the steps to configure this option.

Available now

Blox is available now. To learn more, see the Blox documentation in our GitHub repo.

Automated Cleanup of Unused Images in Amazon ECR

Post Syndicated from Chris Barclay original

Thanks to my colleague Anuj Sharma for a great blog on cleaning up old images in Amazon ECR using AWS Lambda.

When you use Amazon ECR as part of a container build and deployment pipeline, a new image is created and pushed to ECR on every code change. As a result, repositories tend to quickly fill up with new revisions. Each Docker push command creates a version that may or may not be in use by Amazon ECS tasks, depending on which version of the image is actually used to run the task.

It’s important to have an automated way to discover the image that is in use by running tasks and to delete the stale images. You may also want to keep a predetermined number of images in your repositories for easy fallback on a previous version. In this way, repositories can be kept clean and manageable, and save costs by cleaning undesired images.

In this post, I show you how to query running ECS tasks, identify the images which are in use, and delete the remaining images in ECR. To keep a predetermined number of images irrespective of images in use by tasks, the solution should keep the latest n number of images. As ECR repositories may be hosted in multiple regions, the solution should also query either all available regions where ECR is available or only a specific region. You should be able to identify and print the images to be deleted, so that extra caution can be exercised. Finally, the solution should run at a scheduled time on a reoccurring basis, if required.

Cleaning solution

The solution contains two components:

Python script

The Python script is available from the AWS Labs repository. The logic coded is as follows:

  • Get a list of all the ECS clusters using ListClusters API operation.
  • For each cluster, get a list of all the running tasks using ListTasks.
  • Get the ARNs for each running task by calling DescribeTasks.
  • Get the container image for each running task using DescribeTaskDefinition.
  • Filter out only those container images that contain “.dkr.ecr.” and “:” . This ensures that you get a list of all the containers running on ECR that have an associated tag.
  • Get a list of all the repositories using DescribeRepositories.
  • For each repository, get the imagePushedAt value, tags, and SHA for every image using DescribeImages.
  • Ignore those images from the list that have a “latest” tag or which are currently running (as discovered in the earlier steps).
  • Delete the images that have the tags as discovered earlier, using BatchDeleteImage.
    • The -dryrun flag prints out the images marked for deletion, without deleting them. Defaults to True, which only prints the images to be deleted.
    • The -region flag deletes images only in the specified region. If not specified, all regions where ECS is available are queried and images deleted. Defaults to None, which queries all regions.
    • The -imagestokeep flag keeps the latest n number of specified images and does not deletes them. Defaults to 100. For example, if –imagestokeep 20 is specified, the last 20 images in each repository are not deleted and the remaining images that satisfy the logic mentioned above are deleted.

Lambda function

The Lambda function runs the Python script, which is executed using CloudWatch scheduled events. Here’s a diagram of the workflow:

(1) The build system builds and pushes the container image. The build system also tags the image being pushed, either with the source control commit SHA hash value or the incremental package version. This gives full control over running a versioned container image in the task definition. The versioned images (images tagged in this way version the container images) are further used to run the tasks, using your own scheduler or by using the ECS scheduling capabilities.
(2) The Lambda function gets invoked by CloudWatch Events at the specified time and queries all available clusters.
(3) Based on the coded Python logic, the container image tags being used by running tasks are discovered.
(4) The Lambda function also queries all images available in ECR and, based on the coded logic, decides on the image tags to be cleaned.
(5) The Lambda function deletes the images as discovered by the coded logic.

Important: It’s a good practice to run the task definition with an image, which has a version in image tag, so that there is a better visibility and control over the version of running container. The build system can tag the image with the source code version or package version before pushing the image so that each image has the version tagged with each push, which can be used further to run the task. The script assumes that the task definitions have a versioned container specified. I recommend running the script first with the –dryrun flag to identify the images to be deleted.

Usage examples

Prints the images that are not used by running tasks and which are older than the last 100 versions, in all regions: python

Deletes the images that are not used by running tasks and which are older than the last 100 versions, in all regions: python -dryrun False

Deletes the images that are not used by running tasks and which are older than the last 20 versions (in each repository), in all regions: python -dryrun False -imagestokeep 20

Deletes the images that are not used by running tasks and which are older than the last 20 versions (in each repository), in Oregon only: python -dryrun False -imagestokeep 20 -region us-west-2


I’ve made the following assumptions for this solution:

  • Your ECR repository is in the same account as the running ECS clusters.
  • Each container image that has been pushed is tagged with a unique tag.

Docker tags and container image names are mutable. Multiple names can point to the same underlying image and the same image name can point to different underlying images at different points in time.

To trace the version of running code in a container to the source code, it’s a good practice to push the container image with a unique tag, which could be the source control commit SHA hash value or a package’s incremental version number; never reuse the same tag. In this way, you have better control over identifying the version of code to run in the container image, with no ambiguity.

Lambda function deployment

Now it’s time to create and deploy the Lambda function. You can do this through the console or the AWS CLI.

Using the AWS Management Console

Use the console to create a policy, role, and function.

Create a policy

Create a policy with the following policy test. This defines the access for the necessary APIs.

Sign in to the IAM console and choose Policies, Create Policy, Create Your Own Policy. Copy and paste the following policies, type a name, and choose Create.

    "Version": "2012-10-17",
    "Statement": [
            "Sid": "Stmt1476919244000",
            "Effect": "Allow",
            "Action": [
            "Resource": [
            "Sid": "Stmt1476919353000",
            "Effect": "Allow",
            "Action": [
            "Resource": [

Create a Lambda role

Create a role for the Lambda function, using the policy that you just created.

In the IAM console, choose Roles and enter a name, such as LAMBDA_ECR_CLEANUP. Choose AWS Lambda, select your custom policy, and choose Create Role.

Create a Lambda function

Create a Lambda function, with the role that you just created and the Python code available from the Lambda_ECR_Cleanup repository. For more information, see the Lambda function handler page in the Lambda documentation.

Schedule the Lambda function

Add the trigger for your Lambda function.

In the CloudWatch console, choose Events, Create rule, and then under Event selector, choose Schedule. For example, you can put the cron schedule as * 22 * * * to run the function every day at 10PM. For more information, see Scenario 2: Take Scheduled EBS Snapshots.

Using the AWS CLI

Use the following code examples to create the Lambda function using the AWS CLI. Replace values as needed.

Create a policy

Create a file with the following contents, called LAMBDA_ECR_DELETE.json. You reference this file when you create the IAM role.

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "Service": [
      "Action": [

      "Resource": ["*"]


Create a Lambda role

aws iam create-role --role-name LAMBDA_ECR_DELETE --assume-role-policy-document file://LAMBDA_ECR_DELETE.json

Create a Lambda function

aws lambda create-function --function-name {NAME_OF_FUNCTION} --runtime python2.7 
    --role {ARN_NUMBER} --handler main.handler --timeout 15 
    --zip-file fileb://{ZIP_FILE_PATH}

Schedule a Lambda function

For more information, see Scenario 6: Run an AWS Lambda Function on a Schedule Using the AWS CLI.


The SDK for Python provides methods and access to ECS and ECR APIs, which can be leveraged to write logic to identify stale container images. Lambda can be used to run the code and avoid provisioning any servers. CloudWatch Events can be used to trigger the Lambda code on a recurring schedule, to keep your repositories clean and manageable and control costs.

If you have questions or suggestions, please comment below.

Create and Manage Clusters on the Amazon ECS Console

Post Syndicated from Chris Barclay original

We recently added three Amazon ECS console improvements to help you create and manage clusters.

Resource provisioning

The first change is a wizard for creating clusters that takes care of provisioning all the resources required by the cluster such as the Auto Scaling group, VPC, subnets, and security group. You can also choose to use existing VPCs and subnets. Now, any cluster that you create requires only a few inputs and the console can create the resources for you. You can also scale the number of container instances in the cluster by changing the desired count on the Container Instance tab.

Cluster dashboard

The second change is a new console dashboard that provides visibility into the state of your clusters, including CPU, memory utilization, and the number of running tasks. You can now get an overview of your clusters at a glance, without clicking through multiple screens.

ECS agent version

The third change is better visibility into the ECS agent version. The ECS agent runs on each instance within an ECS cluster. It sends information about the instance’s current running tasks and resource utilization to ECS, and starts and stops tasks whenever it receives a request from ECS. Now, you’ll see when there’s a new agent version available, and be able to take advantage of the latest functionality.

We hope you enjoy these new console features. If you have questions or suggestions, please comment below.

Monitor Cluster State with Amazon ECS Event Stream

Post Syndicated from Chris Barclay original

Thanks to my colleague Jay Allen for this great blog on how to use the ECS Event stream for operational tasks.


In the past, in order to obtain updates on the state of a running Amazon ECS cluster, customers have had to rely on periodically polling the state of container instances and tasks using the AWS CLI or an SDK. With the new Amazon ECS event stream feature, it is now possible to retrieve near real-time, event-driven updates on the state of your Amazon ECS tasks and container instances. Events are delivered through Amazon CloudWatch Events, and can be routed to any valid CloudWatch Events target, such as an AWS Lambda function or an Amazon SNS topic.

In this post, I show you how to create a simple serverless architecture that captures, processes, and stores event stream updates. You first create a Lambda function that scans all incoming events to determine if there is an error related to any running tasks (for example, if a scheduled task failed to start); if so, the function immediately sends an SNS notification. Your function then stores the entire message as a document inside of an Elasticsearch cluster using Amazon Elasticsearch Service, where you and your development team can use the Kibana interface to monitor the state of your cluster and search for diagnostic information in response to issues reported by users.

Understanding the structure of event stream events

An ECS event stream sends two types of event notifications:

  • Task state change notifications, which ECS fires when a task starts or stops
  • Container instance state change notifications, which ECS fires when the resource utilization or reservation for an instance changes

A single event may result in ECS sending multiple notifications of both types. For example, if a new task starts, ECS first sends a task state change notification to signal that the task is starting, followed by a notification when the task has started (or has failed to start); additionally, ECS also fires container instance state change notifications when the utilization of the instance on which ECS launches the task changes.

Event stream events are sent using CloudWatch Events, which structures events as JSON messages divided into two sections: the envelope and the payload. The detail section of each event contains the payload data, and the structure of the payload is specific to the event being fired. The following example shows the JSON representation of a container state change event. Notice that the properties at to the top level of the JSON document describe event properties, such as the event name and time the event occurred, while the detail section contains the information about the task and container instance that triggered the event.

The following JSON depicts an ECS task state change event signifying that the essential container for a task running on an ECS cluster has exited, and thus the task has been stopped on the ECS cluster:

  "version": "0",
  "id": "8f07966c-b005-4a0f-9ee9-63d2c41448b3",
  "detail-type": "ECS Task State Change",
  "source": "aws.ecs",
  "account": "244698725403",
  "time": "2016-10-17T20:29:14Z",
  "region": "us-east-1",
  "resources": [
  "detail": {
    "clusterArn": "arn:aws:ecs:us-east-1:123456789012:cluster/eventStreamTestCluster",
    "containerInstanceArn": "arn:aws:ecs:us-east-1:123456789012:container-instance/f813de39-e42c-4a27-be3c-f32ebb79a5dd",
    "containers": [
        "containerArn": "arn:aws:ecs:us-east-1:123456789012:container/4b5f2b75-7d74-4625-8dc8-f14230a6ae7e",
        "exitCode": 1,
        "lastStatus": "STOPPED",
        "name": "web",
        "networkBindings": [
            "bindIP": "",
            "containerPort": 80,
            "hostPort": 80,
            "protocol": "tcp"
        "taskArn": "arn:aws:ecs:us-east-1:123456789012:task/cdf83842-a918-482b-908b-857e667ce328"
    "createdAt": "2016-10-17T20:28:53.671Z",
    "desiredStatus": "STOPPED",
    "lastStatus": "STOPPED",
    "overrides": {
      "containerOverrides": [
          "name": "web"
    "startedAt": "2016-10-17T20:29:14.179Z",
    "stoppedAt": "2016-10-17T20:29:14.332Z",
    "stoppedReason": "Essential container in task exited",
    "updatedAt": "2016-10-17T20:29:14.332Z",
    "taskArn": "arn:aws:ecs:us-east-1:123456789012:task/cdf83842-a918-482b-908b-857e667ce328",
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/wpunconfiguredfail:1",
    "version": 3

Setting up an Elasticsearch cluster

Before you dive into the code for handling events, set up your Elasticsearch cluster. On the console, choose Elasticsearch Service, Create a New Domain. In Elasticsearch domain name, type elasticsearch-ecs-events, then choose Next.

For Step 2: Configure cluster, accept all of the defaults by choosing Next.

For Step 3: Set up access policy, choose Next. This page lets you establish a resource-based policy for accessing your cluster; to allow access to the cluster’s actions, use an identity-based policy associated with your Lambda function.

Finally, on the Review page, choose Confirm and create. This starts spinning up your cluster.

While your cluster is being created, set up the SNS topic and Lambda function you need to start capturing and issuing notifications about events.

Create an SNS topic

Because your Lambda function emails you when a task fails unexpectedly due to an error condition, you need to set up an Amazon SNS topic to which your Lambda function can write.

In the console, choose SNS, Create Topic. For Topic name, type ECSTaskErrorNotification, and then choose Create topic.

When you’re done, copy the Topic ARN value, and save it to a text editor on your local desktop; you need it to configure permissions for your Lambda function in the next step. Finally, choose Create subscription to subscribe to an email address for which you have access, so that you receive these event notifications. Remember to click the link in the confirmation email, or you won’t receive any events.

The eagle-eyed among you may realize that you haven’t given your future Lambda function permission to call your SNS topic. You grant this permission to the Lambda execution role when you create your Lambda function in the following steps.

Handling event stream events in a Lambda function

For the next step, create your Lambda function to capture events. Here’s the code for your function (written in Python 2.7):

import requests
import json
from requests_aws_sign import AWSV4Sign
from boto3 import session, client
from elasticsearch import Elasticsearch, RequestsHttpConnection

es_host = '<insert your own Amazon ElasticSearch endpoint here>'
sns_topic = '<insert your own SNS topic ARN here>'

def lambda_handler(event, context):
    # Establish credentials
    session_var = session.Session()
    credentials = session_var.get_credentials()
    region = session_var.region_name or 'us-east-1'

    # Check to see if this event is a task event and, if so, if it contains
    # information about an event failure. If so, send an SNS notification.
    if "detail-type" not in event:
        raise ValueError("ERROR: event object is not a valid CloudWatch Logs event")
        if event["detail-type"] == "ECS Task State Change":
            detail = event["detail"]
            if detail["lastStatus"] == "STOPPED":
                if detail["stoppedReason"] == "Essential container in task exited":
                  # Send an error status message.
                  sns_client = client('sns')
                      Subject="ECS task failure detected for container",

    # Elasticsearch connection. Note that you must sign your requests in order
    # to call the Elasticsearch API anonymously. Use the requests_aws_sign
    # package for this.
    service = 'es'
    auth=AWSV4Sign(credentials, region, service)
    es_client = Elasticsearch(host=es_host,

    es_client.index(index="ecs-index", doc_type="eventstream", body=event)

Break this down: First, the function inspects the event to see if it is a task change event. If so, it further looks to see if the event is reporting a stopped task, and whether that task stopped because one of its essential containers terminated. If these conditions are true, it sends a notification to the SNS topic that you created earlier.

Second, the function creates an Elasticsearch connection to your Amazon ES instance. The function uses the requests_aws_sign library to implement Sig4 signing because, in order to call Amazon ES, you need to sign all requests with the Sig4 signing process. After the Sig4 signature is generated, the function calls Amazon ES and adds the event to an index for later retrieval and inspection.

To get this code to work, your Lambda function must have permission to perform HTTP POST requests against your Amazon ES instance, and to publish messages to your SNS topic. Configure this by setting up your Lambda function with an execution role that grants the appropriate permission to these resources in your account.

To get started, you need to prepare a ZIP file for the above code that contains both the code and its prerequisites. Create a directory named lambda_eventstream, and save the code above to a file named In your favorite text editor, replace the es_host and sns_topic variables with your own Amazon ES endpoint and SNS topic ARN, respectively.
Next, on the command line (Linux, Windows or Mac), change to the directory that you just created, and run the following command for pip (the de facto standard Python installation utility) to download all of the required prerequisites for this code into the directory. You need to ship these dependencies with your code, as they are not pre-installed on the instance that runs your Lambda function.

NOTE: You need to be on a machine with Python and pip already installed. If you are using Python 2.7.9 or greater, pip is installed as part of your standard Python installation. If you are not using Python 2.7.9 or greater, consult the pip page for installation instructions.

pip install requests_aws_sign elasticsearch -t .

Finally, zip all of the contents of this directory into a single zip file. Make sure that the file is at the top of the file hierarchy within the zip file, and that it is not contained within another directory. From within the lambda-eventstream directory, you can use the following command on Linux and MacOS systems:

zip *

On Windows clients with the 7-Zip utility installed, you can run the following command from PowerShell or, if you’re really so inclined, a command prompt:

7z a -tzip *

Now that your function and its dependencies are properly packaged, install and test it. Navigate to the Lambda console, choose Create a Lambda Function, and then on the Select Blueprint page, choose Blank function. Choose Next on the Configure triggers screen; you wire up your function to your ECS event stream in the next section.

On the Configure function page, for Name, enter lambda-eventstream. For Runtime, choose Python 2.7. Under Lambda function code, for Code entry type, choose Upload a .ZIP file, and choose Upload to select the ZIP file that you just created.

Under Lambda function handler and role, for Role, choose Create a custom role. This opens a new window for configuring your policy. For IAM Role, choose Create a New IAM Role, and type a name. Then choose View Policy Document, Edit. Paste in the IAM policy below, making sure to replace every instance of AWSAccountID with your own AWS account ID.

          "Effect": "Allow",
          "Action": [
          "Resource": "arn:aws:es:us-east-1:<AWSAccountID>:domain/ecs-events-cluster/*"
            "Effect": "Allow",
            "Action": [
            "Resource": "arn:aws:sns:us-east-1:<AWSAccountID>:ECSTaskErrorNotification"        

This policy establishes every permission that your Lambda function requires for execution, including permission to:

  • Create a new CloudWatch Events log group, and save all outputs from your Lambda function to this group
  • Perform HTTP PUT commands on your Elasticsearch cluster
  • Publish messages to your SNS topic

When you’re done, you can test your configuration by scrolling up to the sample event stream message provided earlier in this post, and using it to test your Lambda function in the console. On the dashboard page for your new function, choose Test, and in the Input test event window, enter the JSON-formatted event from earlier.

Note that, if you haven’t correctly input your account ID in the correct places in your IAM policy file, you may receive a message along the lines of:

User: arn:aws:sts::123456789012:assumed-role/LambdaEventStreamTake2/awslambda_421_20161017203411268 is not authorized to perform: es:ESHttpPost on resource: ecs-events-cluster.

Edit the policy associated with your Lambda execution role in the IAM console and try again.

Send event stream events to your Lambda function

Almost there! Now with your SNS topic, Elasticsearch cluster, and Lambda function all in place, the only remaining element is to wire up your ECS event stream events and route them to your Lambda function. The CloudWatch Events console offer everything you need to set this up quickly and easily.

From the console, choose CloudWatch, Events. On Step 1: Create Rule, under Event selector, choose Amazon EC2 Container Service. CloudWatch Events enables you to filter by the type of message (task state change or container instance state change), as well as to select a specific cluster from which to receive events. For the purposes of this post, keep the default settings of Any detail type and Any cluster.

Under Targets, choose Lambda function. For Function, choose lambda-eventstream. Behind the scenes, this sends events from your ECS clusters to your Lambda function and also creates the service role required for CloudWatch Events to call your Lambda function.

Verify your work

Now it’s time to verify that messages sent from your ECS cluster flow through your Lambda function, trigger an SNS message for failed tasks, and are stored in your Elasticsearch cluster for future retrieval. To test this workflow, you can use the following ECS task definition, which attempts to start the official WordPress image without configuring an SQL database for storage:

    "taskDefinition": {
        "status": "ACTIVE",
        "family": "wpunconfiguredfail",
        "volumes": [],
        "taskDefinitionArn": "arn:aws:ecs:us-east-1:244698725403:task-definition/wpunconfiguredfail:1",
        "containerDefinitions": [
                "environment": [],
                "name": "web",
                "mountPoints": [],
                "image": "wordpress",
                "cpu": 99,
                "portMappings": [
                        "protocol": "tcp",
                        "containerPort": 80,
                        "hostPort": 80
                "memory": 100,
                "essential": true,
                "volumesFrom": []
        "revision": 1

Create this task definition using either the AWS Management Console or the AWS CLI, and then start a task from this task definition. For more detailed instructions, see Launching a Container Instance.

A few minutes after launching this task definition, you should receive an SNS message with the contents of the task state change JSON indicating that the task failed. You can also examine your Elasticsearch cluster in the console by selecting the name of your cluster and choosing Indicates, ecs-index. For Count, you should see that you have multiple records stored.

You can also search the messages that have been stored by opening up access to your Kibana endpoint. Kibana provides a host of visualization and search capabilities for data stored in Amazon ES. To open up access to Kibana to your computer, find your computer’s IP address, and then choose Modify access policy for your Elasticsearch cluster. For Set the domain access policy to, choose Allow access to the domain from specific IP(s) and enter your IP address.

(A more robust and scalable solution for securing Kibana is to front it with a proxy. Details on this approach can be found in Karthi Thyagarajan’s post How to Control Access to Your Amazon Elasticsearch Service Domain.)

You should now be able to kick the Kibana endpoint for your cluster, and search for messages stored in your cluster’s indexes.


After you have this basic, serverless architecture set up for consuming ECS cluster-related event notifications, the possibilities are limitless. For example, instead of storing the events in Amazon ES, you could store them in Amazon DynamoDB, and use the resulting tables to build a UI that materializes the current state of your clusters.

You could also use this information to drive container placement and scaling automatically, allowing you to “right-size” your clusters to a very granular level. By delivering cluster state information in near-real time using an event-driven model as opposed to a pull model, the new ECS event stream feature opens up a much wider array of possibilities for monitoring and scaling your container infrastructure.

If you have questions or suggestions, please comment below.