Contributed by Nathan Taber and Michael Hausenblas
At re:Invent 2017 we introduced the Amazon Elastic Container Service for Kubernetes, or Amazon EKS for short. We consider these tenets as valid today as they were at launch:
EKS is a platform to run production-grade workloads. This means that security and reliability are our first priority. After that, we focus on doing the heavy lifting for you in the control plane, including lifecycle-related tasks like version upgrades.
EKS provides a native and upstream Kubernetes experience. This means that with EKS you get vanilla, un-forked Kubernetes. Of course, in keeping with our first tenet, we ensure the Kubernetes versions we run have security-related patches, even for older supported versions, as quickly as possible. However, in terms of portability there’s no special sauce and no lock-in.
If you want to use additional AWS services, the integrations are as seamless as possible.
The EKS team at AWS actively contributes to the upstream Kubernetes project, both at the technical level and in the community, from communicating good practices to participating in SIGs and working groups.
The first two tenets are highlighted and that is for a good reason: on the one hand we aim to go in lock-step with the upstream release cadence as much as possible, including outcomes of the SIG PM as well as the LTS Working Group. Given that running a service for production applications is our main focus, we want to make sure that you can rely on the Kubernetes we run for you. This includes, but is not limited to, security considerations around community support for ongoing bug fixes and patches for critical vulnerabilities and exposures (CVEs).
In this post, we want to give you a heads-up on upcoming changes to how Amazon EKS manages the lifecycle of Kubernetes versions, walk you through the process in general, and then look at a concrete example: Kubernetes version 1.10. This version happens to be the first that will be deprecated on Amazon EKS.
But why now?
Glad you asked. It’s really all about security. Past a certain point (usually 1 year), the Kubernetes community stops releasing bug and CVE patches. Additionally, the Kubernetes project does not encourage CVE submission for deprecated versions. This means that vulnerabilities specific to an older version of Kubernetes may not even be reported, leaving users exposed with no notice in the case of a vulnerability. We consider this to be an unacceptable security posture for our customers.
Earlier this year we announced support for Kubernetes 1.12 in EKS. That, together with our commitment to supporting three Kubernetes versions at any given point in time and the fact that 1.13 will land very soon in EKS, means that we have to deprecate 1.10, after which the three supported versions, unsurprisingly, will be 1.11, 1.12, and (you guessed it) 1.13. OK, with that out of the way, let’s have a look at the options you have to move to the latest Kubernetes versions with Amazon EKS, and then dive into the update and deprecation process in greater detail:
Ideally, you test a new version and move to one of the three supported ones, in time (details below).
If you are still on a version we deprecate, you will be upgraded automatically, after some time (details, again, below).
If you’re using a deprecated version beyond a certain point and we can’t upgrade the cluster, we may deactivate it.
A quick Kubernetes release cycle refresher
In a nutshell, the Kubernetes versioning and release regime roughly follows a four-releases-per-year pattern, with the cadence varying between 70 and 130 days. It also lays out an expectation in terms of upgrades:
We expect users to stay reasonably up-to-date with the versions of Kubernetes they use in production, but understand that it may take time to upgrade, especially for production-critical components.
The formal API versioning allows for a strict deprecation policy which states, amongst other things, that stable (GA) API support is “12 months or 3 releases (whichever is longer)”.
Now that we’re on the same page about how upstream Kubernetes releases are managed, let’s have a look at how we at AWS implement the process in EKS.
The EKS Process
In line with the Kubernetes community support for Kubernetes versions, Amazon EKS is committed to running at least three production-ready versions of Kubernetes at any given time, with a fourth version in deprecation. A new Kubernetes version is released as generally available by the Kubernetes project every 70 to 130 days (we assume an average of 90 days for simplicity). New GA versions are supported by EKS some time after the GA release (typically at the first patch release, 1.XX.1, but sometimes later). This means that the total time a version is in production with EKS should be roughly 270 days.
We will announce the deprecation of a given Kubernetes version at least 60 days before the deprecation date. Over time, we will align the deprecation of a Kubernetes version on EKS to be on or after the date the Kubernetes project stops supporting the version upstream.
For example, we will announce deprecation of version 1.10 while 1.12 is available for EKS and complete the deprecation process after version 1.13 is available for EKS. We will announce the deprecation of 1.11 after 1.13 is available and complete the deprecation after 1.14 is available for EKS.
The following table shows how this will work:
EKS Version      | Today | Soon | About +90 days | About +180 days | About +270 days
Latest Available | 1.12  | 1.13 | 1.14           | 1.15            | 1.16
Default          | 1.11  | 1.12 | 1.13           | 1.14            | 1.15
Oldest           | 1.10  | 1.11 | 1.12           | 1.13            | 1.14
In Deprecation   | -     | 1.10 | 1.11           | 1.12            | 1.13
When we announce the deprecation, we will give customers a specific date when new cluster creation will be disabled for the version targeted for deprecation. On this date, EKS clusters running the version targeted for deprecation will begin to be updated to the next EKS-supported version of Kubernetes. This means that if the deprecated version is 1.10, clusters will be automatically updated to version 1.11. If a cluster is automatically updated by EKS, customers need to update the version of their worker nodes after the update is complete. Kubernetes maintains compatibility between masters and workers for at least two minor versions, so 1.10 workers continue to operate when orchestrated by a 1.11 control plane.
Upcoming deprecation of Kubernetes 1.10 in EKS
Amazon EKS will deprecate Kubernetes version 1.10 on July 22, 2019. On this day, you will no longer be able to create new 1.10 clusters and all EKS clusters running Kubernetes version 1.10 will be updated to the latest available platform version of Kubernetes version 1.11.
We recommend that all Amazon EKS customers update their 1.10 clusters to Kubernetes version 1.11 or 1.12 as soon as possible.
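If you manage clusters with the AWS CLI, checking and upgrading a cluster is a short exercise. A minimal sketch (the cluster name my-cluster is a placeholder):
# Check which Kubernetes version the control plane is running
aws eks describe-cluster --name my-cluster --query cluster.version --output text

# Start a managed control plane upgrade to 1.11
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.11

# Track the update until its status is Successful
aws eks list-updates --name my-cluster
Remember to update your worker nodes to a matching AMI after the control plane upgrade completes.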
Wrapping up
What can you do today to prepare? Well, first off, internalize the timeline and try to align internal processes with it. Our documentation has more information about the EKS Kubernetes version deprecation process and EKS updates. If you have any questions, send us a note on our version deprecation issue in the public containers roadmap on GitHub.
Today, App Mesh is available as a public preview. In the coming months, we plan to add new functionality and integrations.
Why App Mesh?
Many of our customers are building applications with microservices architectures, breaking applications into many separate, smaller pieces of software that are independently deployed and operated. Microservices help to increase the availability and scalability of an application by allowing each component to scale independently based on demand. Each microservice interacts with the other microservices through an API.
When you start building more than a few microservices within an application, it becomes difficult to identify and isolate issues. These can include high latencies, error rates, or error codes across the application. There is no dynamic way to route network traffic when there are failures or when new containers need to be deployed.
You can address these problems by adding custom code and libraries into each microservice and using open source tools that manage communications for each microservice. However, these solutions can be hard to install, difficult to update across teams, and complex to manage for availability and resiliency.
AWS App Mesh implements a new architectural pattern that helps solve many of these challenges and provides a consistent, dynamic way to manage the communications between microservices. With App Mesh, the logic for monitoring and controlling communications between microservices is implemented as a proxy that runs alongside each microservice, instead of being built into the microservice code. The proxy handles all of the network traffic into and out of the microservice and provides consistency for visibility, traffic control, and security capabilities to all of your microservices.
Use App Mesh to model how all of your microservices connect. App Mesh automatically computes and sends the appropriate configuration information to each microservice proxy. This gives you standardized, easy-to-use visibility and traffic controls across your entire application. App Mesh uses Envoy, an open source proxy. That makes it compatible with a wide range of AWS partner and open source tools for monitoring microservices.
Using App Mesh, you can export observability data to multiple AWS and third-party tools, including Amazon CloudWatch, AWS X-Ray, or any third-party monitoring and tracing tool that integrates with Envoy. You can configure new traffic routing controls to enable dynamic blue/green canary deployments for your services.
Getting started
Here’s a sample application with two services, where service A receives traffic from the internet and uses service B for some backend processing. You want to route traffic dynamically between services B and B’, a new version of B deployed to act as the canary.
First, create a mesh, a namespace that groups related microservices that must interact.
Next, create virtual nodes to represent services in the mesh. A virtual node can represent a microservice or a specific microservice version. In this example, service A and B participate in the mesh and you manage the traffic to service B using App Mesh.
Now, deploy your services with the required Envoy proxy and with a mapping to the node in the mesh.
After you have defined your virtual nodes, you can define how the traffic flows between your microservices. To do this, define a virtual router and routes for communications between microservices.
A virtual router handles traffic for your microservices. After you create a virtual router, you create routes to direct traffic appropriately. These routes include the connection requests that the route should accept, where they should go, and the weighted amount of traffic to send. All of these changes to adjust traffic between services are computed and sent dynamically to the appropriate proxies by App Mesh to execute your deployment.
You now have a virtual router set up that accepts all traffic from virtual node A sending to the existing version of service B, as well some traffic to the new version, B’.
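To make the routing step concrete, here is a hedged sketch of what a weighted route might look like with the AWS CLI; the mesh, router, and node names are hypothetical, and the preview API may differ in detail:
# Send roughly 90% of traffic to service B and 10% to the canary B'
aws appmesh create-route \
    --mesh-name my-mesh \
    --virtual-router-name serviceB-router \
    --route-name serviceB-canary \
    --spec '{"httpRoute": {"match": {"prefix": "/"}, "action": {"weightedTargets": [{"virtualNode": "serviceB", "weight": 9}, {"virtualNode": "serviceB-v2", "weight": 1}]}}}'
Shifting more traffic to B' is then just an update to the weights, with no redeployment of either service.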
Exporting metrics, logs, and traces
One of the benefits of placing a proxy in front of every microservice is that you can automatically capture metrics, logs, and traces about the communication between your services. App Mesh enables you to easily collect and export this data to the tools of your choice. Envoy is already integrated with several tools like Prometheus and Datadog.
During the preview, we are adding support for AWS services such as Amazon CloudWatch and AWS X-Ray. We have a lot more integrations planned as well.
Available now
AWS App Mesh is available as a public preview and you can start using it today in the North Virginia, Ohio, Oregon, and Ireland AWS Regions. During the preview, we plan to add new features and want to hear your feedback. You can check out our GitHub repository for examples and our roadmap.
This post contributed by Scott Malkie, AWS Solutions Architect
Amazon EC2 P3 and P2 instances, featuring NVIDIA GPUs, power some of the most computationally advanced workloads today, including machine learning (ML), high performance computing (HPC), financial analytics, and video transcoding. Now Amazon Elastic Container Service for Kubernetes (Amazon EKS) supports P3 and P2 instances, making it easy to deploy, manage, and scale GPU-based containerized applications.
This blog post walks through how to start up GPU-powered worker nodes and connect them to an existing Amazon EKS cluster. Then it demonstrates an example application to show how containers can take advantage of all that GPU power!
Prerequisites
You need an existing Amazon EKS cluster, kubectl, and the aws-iam-authenticator set up according to Getting Started with Amazon EKS.
Two steps are required to enable GPU workloads. First, join Amazon EC2 P3 or P2 GPU compute instances as worker nodes to the Kubernetes cluster. Second, configure pods to enable container-level access to the node’s GPUs.
Spinning up Amazon EC2 GPU instances and joining them to an existing Amazon EKS Cluster
Subscribe to the AMI and then launch it using the AWS CloudFormation template. The template takes care of networking, configuring kubelets, and placing your worker nodes into an Auto Scaling group, as shown in the following image.
This template creates an Auto Scaling group with up to two p3.8xlarge Amazon EC2 GPU instances. Powered by up to eight NVIDIA Tesla V100 GPUs, these instances deliver up to 1 petaflop of mixed-precision performance per instance to significantly accelerate ML and HPC applications. Amazon EC2 P3 instances have been proven to reduce ML training times from days to hours and to reduce time-to-results for HPC.
After the AWS CloudFormation template completes, the Outputs view contains the NodeInstanceRole parameter, as shown in the following image.
NodeInstanceRole needs to be passed in to the AWS Authenticator ConfigMap, as documented in the AWS EKS Getting Started Guide. To do so, edit the ConfigMap template and run the command kubectl apply -f aws-auth-cm.yaml in your terminal to apply the ConfigMap. You can then run kubectl get nodes --watch to watch the two Amazon EC2 GPU instances join the cluster, as shown in the following image.
Configuring Kubernetes pods to access GPU resources
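Before pods can request GPUs, the NVIDIA device plugin must be running as a DaemonSet on the GPU nodes so that the kubelet advertises nvidia.com/gpu as an allocatable resource. A minimal sketch of deploying it (the manifest URL and version tag are assumptions; use the release of NVIDIA's k8s-device-plugin that matches your cluster version):
# Deploy the NVIDIA device plugin DaemonSet
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

# Confirm the plugin pods are running on the GPU nodes
kubectl get pods -n kube-system -o wide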
Once the device plugin DaemonSet is running on the GPU-powered worker nodes, use the following command to verify that each node has allocatable GPUs:
kubectl get nodes \
    "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
The following output shows that each node has four GPUs available:
Next, modify any Kubernetes pod manifests, such as the following one, to take advantage of these GPUs. In general, adding a nvidia.com/gpu entry under the resources configuration (resources: limits:) in a pod manifest gives the container access to one GPU. A pod can request up to all of the GPUs available on the node that it's running on.
As a more specific example, the following sample manifest displays the results of the nvidia-smi binary, which shows diagnostic information about all GPUs visible to the container.
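The manifest itself is not reproduced here, so the following is a reconstruction of the kind of pod spec described, written straight to nvidia-smi-pod.yaml (the CUDA image tag is an assumption; any image containing the nvidia-smi binary works):
# Write a pod spec that runs nvidia-smi once, requesting a single GPU
cat <<'EOF' > nvidia-smi-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:9.2-devel
    args:
    - nvidia-smi
    resources:
      limits:
        nvidia.com/gpu: 1
EOF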
Save this manifest as nvidia-smi-pod.yaml and launch it with kubectl apply -f nvidia-smi-pod.yaml.
To confirm successful nvidia-smi execution, use the following command to examine the log.
kubectl logs nvidia-smi
The log contains the familiar nvidia-smi diagnostic table for the GPUs allocated to the pod.
Existing limitations
GPUs cannot be overprovisioned – containers and pods cannot share GPUs
The maximum number of GPUs that you can schedule to a pod is capped by the number of GPUs available to that pod’s node
Depending on your account, you might have Amazon EC2 service limits on how many and which type of Amazon EC2 GPU compute instances you can launch simultaneously
Please leave any comments about this post and share what you’re working on. I can’t wait to see what you build with GPU-powered workloads on Amazon EKS!
This post contributed by Subhrangshu Kumar Sarkar, Sr. Technical Account Manager at AWS
The Amazon ECS–optimized Amazon Machine Image (AMI) comes prepackaged with the Amazon Elastic Container Service (ECS) container agent, Docker, and the ecs-init service. When updates to these components are released, try to integrate them as quickly as possible. Doing so helps you maintain a safe, secure, and reliable environment for running your containers.
Each release of the ECS-optimized AMI includes bug fixes and feature updates. AWS recommends refreshing your container instance fleet with the latest AMI whenever possible, rather than trying to patch instances in place. Periodic replacement of your ECS instances aligns with the immutable infrastructure paradigm, which is less prone to human error. It’s also less susceptible to configuration drift because infrastructure is managed through code.
In this post, I show you how to manually refresh the container instances in an active ECS cluster with new container instances built from a newly released AMI. You also see how to refresh the ECS instance fleet when it is part of an Auto Scaling group, and when it is not.
Solution Overview
The following flow chart shows the strategy to be used in refreshing the cluster.
Prerequisites
An AWS account with enough headroom to run as many additional EC2 instances as there are container instances in your ECS cluster, on top of the EC2 instances you already have, during the refresh period. For example, if you have a total of 10 t2.medium instances in an AWS Region where an ECS cluster with four container instances is running, you should be able to spawn four more t2.medium instances. Your instance count comes down to 10 again after your old instances are deregistered and terminated at the end of the refresh period.
An existing ECS cluster (preferably with one or more container instances built with an old AMI), with or without a service running on it.
A Linux system with the AWS CLI and jq installed, so that you can try the programmatic method of refreshing the cluster. You can SSH into an EC2 instance if you do not have local access to a Linux system.
An IAM user with permissions to view ECS resources, deregister and terminate the ECS instances, revise a task definition, and update a service.
A specified AWS Region. In this post, the cluster is in us-east-1 and that is the region for all AWS CLI commands mentioned.
Use the following steps to test if you have all the resources and permissions to proceed.
In the ECS console, on the Clusters page, select the cluster to refresh.
You should be able to see the details of the services, tasks, and the container instance on the respective tabs.
1. Retrieve the latest ECS–optimized AMI metadata
Previously, to make sure that you were using the latest ECS–optimized AMI, you had to either consult the ECS documentation or subscribe to the ECS AMI Amazon SNS topic.
Now, you can query the AWS Systems Manager Parameter Store API to get the latest AMI version ID or a list of available AMI IDs and their corresponding Docker runtime and ECS agent versions. You can query the Parameter Store API using the AWS CLI or any of the AWS SDKs. In fact, you can now use a Systems Manager parameter in AWS CloudFormation to launch EC2 instances with the latest ECS-optimized AMI.
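For example, the following sketch retrieves the recommended AMI metadata with the AWS CLI (this is the documented parameter path for the Amazon Linux variant; use amazon-linux-2 in the path for the Amazon Linux 2 variant):
# Get image ID, ECS agent version, and Docker version for the latest ECS-optimized AMI
aws ssm get-parameters \
    --names /aws/service/ecs/optimized-ami/amazon-linux/recommended \
    --region us-east-1 \
    --query "Parameters[0].Value" \
    --output text | jq .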
2. List the outdated container instances
Now, find the corresponding EC2 instance IDs for these container instances. The IDs are then used to find the corresponding Auto Scaling group from which to detach the instances.
Using the AWS CLI
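A sketch of how you might list the cluster's container instances and map them to their EC2 instance IDs (the cluster name is a placeholder):
# List the container instances registered to the cluster
aws ecs list-container-instances --cluster workshop-app-cluster

# Map each container instance to its EC2 instance ID
aws ecs describe-container-instances \
    --cluster workshop-app-cluster \
    --container-instances <container instance id #1> <container instance id #2> \
    --query "containerInstances[].[containerInstanceArn,ec2InstanceId]" \
    --output text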
An ECS container instance is an EC2 instance that is running the ECS container agent and has been registered into a cluster. In the sample output pair:
2db66342-5f69-4782-89a3-f9b707f979ab is the container instance ID
i-08e8cfc073db135a9 is an EC2 instance ID
Using the AWS Console
In the ECS console, choose Clusters, select the cluster, and choose ECS Instances.
Select Filter by attributes and choose ecs.ami-id as the attribute on which to filter.
Select an AMI ID that is not the same as the latest AMI ID, in this case ami-aff65ad2.
For all resulting ECS instances, the container instance ID and the EC2 instance IDs are both visible.
3. List the instances that are part of an Auto Scaling group
If your cluster was created with the console first-run experience after November 24, 2015, then the Auto Scaling group associated with the AWS CloudFormation stack created for your cluster can be scaled up or down to add or remove container instances. You can perform this scaling operation from within the ECS console.
Use the following steps to list the outdated ECS instances that are part of an Auto Scaling group.
Using the AWS CLI
Run the following command:
aws autoscaling describe-auto-scaling-instances --instance-ids <instance id #1> <instance id #2>
The response shows that the instances are part of the EC2ContainerService-workshop-app-cluster-EcsInstanceAsg-1IVVUK4CR81X1 Auto Scaling group.
Using the AWS Console
If the ECS cluster was created from the console, you likely have an associated CloudFormation stack. By default, the stack name is EC2ContainerService-cluster_name.
In the CloudFormation console, select the cluster, choose Outputs, and note the corresponding stack for your cluster.
In the EC2 console, choose Auto Scaling groups.
Select the group and check that the EC2 instance IDs for the ECS instances are registered.
4. Create a new Auto Scaling group
If the container instances are not part of any Auto Scaling group, create a new group from one of the existing container instances and then add all other container instances to it. A launch configuration is automatically created for the new Auto Scaling group.
Using the AWS CLI
Run the following command to create an Auto Scaling group using the EC2 instance ID for an existing container instance:
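A sketch of that command (the group name and instance ID are placeholders):
# Create the group from an existing container instance; a launch configuration
# is created automatically from that instance's settings
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name <auto-scaling-group-name> \
    --instance-id <instance id #1> \
    --min-size 0 \
    --max-size 6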
Keep the min-size parameter at 0 and set max-size to more than the number of instances that you are going to add to this Auto Scaling group.
At this point, your Auto Scaling group does not contain any instances, and the only subnets and Availability Zones it covers are those of the instance from which you created it. To add all of the old instances (including that one) to this Auto Scaling group, first find the subnets and Availability Zones to which they are attached.
After you have all the Availability Zones and subnets to be added to the Auto Scaling group, run the following command to update the Auto Scaling group:
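The update and attach steps might look like the following sketch (the zones and subnet IDs are placeholders):
# Add the Availability Zones and subnets used by the old instances
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name <auto-scaling-group-name> \
    --availability-zones us-east-1a us-east-1b \
    --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"

# Attach the old container instances to the group
aws autoscaling attach-instances \
    --instance-ids <instance id #1> <instance id #2> \
    --auto-scaling-group-name <auto-scaling-group-name>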
Now, all existing container instances are part of an Auto Scaling group, which is attached to a launch configuration capable of launching instances with the old AMI.
When you attach instances, Auto Scaling increases the desired capacity of the group by the number of instances being attached.
Using the AWS Console
To create an Auto Scaling group from an existing container instance, do the following steps:
In the ECS console, on the EC2 Instances tab, open the EC2 instance ID for the container instance.
Select the instance and choose Actions, Instance Settings, and Attach to Auto Scaling Group.
On the Attach to Auto Scaling Group page, select a new Auto Scaling group, enter a name for the group, and then choose Attach.
The new Auto Scaling group is created using a new launch configuration with the same name that you specified for the Auto Scaling group. The launch configuration gets its settings (for example, security group and IAM role) from the instance that you attached. The Auto Scaling group also gets settings (for example, Availability Zone and subnet) from the instance that you attached, and has a desired capacity and maximum size of 1.
Now that you have an Auto Scaling group and launch configuration ready, set the max value for the Auto Scaling group to the total number of existing container instances in the ECS cluster.
To add other container instances of the ECS cluster to this Auto Scaling group:
On the navigation pane, under Auto Scaling, choose Auto Scaling Groups, select the new Auto Scaling group, and choose Edit.
Add subnets for other instances to the Subnet(s) section and save the configuration.
For each of the other container instances of the cluster, open the EC2 instance ID, select the instance, and then choose Actions, Instance Settings, and Attach to Auto Scaling Group.
On the Attach to Auto Scaling Group page, select an existing Auto Scaling group, select the Auto Scaling group that you just created, and then choose Attach.
If the instance doesn’t meet the criteria (for example, if it’s not in the same Availability Zone as the Auto Scaling group), you get an error message with the details. Choose Close and try again with an instance that meets the criteria.
5. Create a new launch configuration
Create a new launch configuration for the Auto Scaling group. This launch configuration should be able to launch instances with the new ECS–optimized AMI. It should also put the user data in the instances to allow them to join the ECS cluster when they are created.
Using the AWS CLI
First, run the following command to get the launch configuration for the Auto Scaling group:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names <Auto Scaling group name> \
    --query "AutoScalingGroups[].LaunchConfigurationName" --output text
Now, create a new launch configuration with the new image ID, based on this existing launch configuration. Create a launch configuration called New-AMI-launch. Substitute the existing launch configuration name for launch-configuration-names and the image ID corresponding to the new AMI for image_id.

aws autoscaling describe-launch-configurations --launch-configuration-names <launch-configuration-name> \
    --query "LaunchConfigurations[0]" | \
    jq 'del(.LaunchConfigurationARN)' | jq 'del(.CreatedTime)' | \
    jq 'del(.KernelId)' | jq 'del(.RamdiskId)' | \
    jq '. += {"LaunchConfigurationName": "New-AMI-launch"}' | \
    jq '. += {"ImageId": "<image_id>"}' > new-launch-config.json
aws autoscaling create-launch-configuration --cli-input-json file://new-launch-config.json
At this point, the New-AMI-launch launch configuration is ready. Update the Auto Scaling group with the new launch configuration:
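A sketch of the update command:
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name <auto-scaling-group-name> \
    --launch-configuration-name New-AMI-launch
Instances already in the group keep running from the old launch configuration; only instances launched from this point on use the new AMI.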
To add block devices to the launch configuration, you can always override the block device mapping for the new launch configuration.
Using the AWS Console
On the Auto Scaling groups page, choose Details in the bottom pane and note the launch configuration for your Auto Scaling group.
On the Launch configurations page, select the launch configuration and choose Copy launch configuration.
On the AMI details page, choose Edit AMI.
In the search box, enter the latest AMI image ID (in this case, ami-aff65ad2) and choose Select.
On the Configure details page, enter a new name for the launch configuration.
Keep everything else the same and choose Create.
On the Auto Scaling groups page, choose Edit.
Select the newly created launch configuration and choose Save.
6. Detach the old ECS instances from the Auto Scaling group
Now that you have a new launch configuration with the Auto Scaling group, detach the old instances from the group.
For every old instance detached, add a new instance through the new launch configuration. This keeps the desired count for the Auto Scaling group unchanged.
Using the AWS CLI
Run the following command:
aws autoscaling detach-instances --instance-ids <instance id #1> <instance id #2> \
    --auto-scaling-group-name <auto-scaling-group-name> --no-should-decrement-desired-capacity
When this is done, the following command should show a blank result:
aws autoscaling describe-auto-scaling-instances --instance-ids <instance id #1> <instance id #2>
The following command should show the new ECS instances, for every old instance detached from the Auto Scaling group:
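For example:
# Newly launched replacement instances register with the cluster
aws ecs list-container-instances --cluster <cluster name>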
The old container instances have been detached from the Auto Scaling group but they are still registered in the ECS cluster.
Using the AWS Console
On the Auto Scaling groups page, select the group.
On the instance tab, select the old container instances.
In the bottom pane, choose Actions, Detach.
In the Detach Instances dialog box, select the check box for Add new instances to the Auto Scaling group to balance the load and choose Detach instances.
7. Revise the task definition and update the service
Now revise the task definition in use to impose a constraint. Subsequent tasks spawned from this task definition are hosted only on ECS instances built with the new AMI.
Using the AWS CLI
Run the following command to get the task definition for the service running on the cluster:
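A sketch of such a command (the cluster and service names are placeholders):
aws ecs describe-services \
    --cluster <cluster name> \
    --services <service name> \
    --query "services[0].taskDefinition" \
    --output text
The output is the task definition ARN, which ends in family:revision (workshop-task:9 in the example that follows).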
Here, workshop-task is the family and 9 is the revision. Now, update the task definition with the constraint. Use the built-in attribute, ecs.ami-id, to impose the constraint. Replace the image_id value in the following command with the value found by querying Parameter Store.

aws ecs describe-task-definition --task-definition <task definition family:revision> --query taskDefinition | \
    jq '. + {placementConstraints: [{"expression": "attribute:ecs.ami-id == <image_id>", "type": "memberOf"}]}' | \
    jq 'del(.status)' | jq 'del(.revision)' | jq 'del(.requiresAttributes)' | \
    jq '. + {containerDefinitions:[.containerDefinitions[] + {"memory":256, "memoryReservation": 128}]}' | \
    jq 'del(.compatibilities)' | jq 'del(.taskDefinitionArn)' > new-task-def.json

Even if your original container definition doesn't have a memory or memoryReservation key, you must provide one of those values when updating the task definition. For this post, I have used the task-level memory allocation value (256) and an arbitrary value (128) for those keys, respectively.
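The revised definition then has to be registered and the service pointed at the new revision. A sketch (the names are placeholders):
# Register the revised task definition; note the new revision in the output
aws ecs register-task-definition --cli-input-json file://new-task-def.json

# Update the service to use the new revision
aws ecs update-service \
    --cluster <cluster name> \
    --service <service name> \
    --task-definition workshop-task:10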
After the service is updated with the revised task definition, the new tasks constituting the service should come up on the new ECS instances, thanks to the constraint in the new task definition.
Run the following command against each of the old container instances until there are no task ARNs in the output:
aws ecs list-tasks --cluster <cluster name> --container-instance <container instance id>
Using the AWS Console
In the ECS console, on the Task definitions page, select your task definition and choose Create new revision.
On the Create new revision of task definition page, choose Add constraint.
For Expression, add attribute:ecs.ami-id == <AMI ID for new ECS optimized AMI> and choose Create. You see a new revision of the task definition being created. In this case, workshop-task:10 got created.
To update the service, on the Clusters page, select the service corresponding to the revised task definition.
On the Configure service page, for Task definition, select the appropriate task definition version and choose Next step.
Keep the remaining default values. On the Review page, choose Update service.
On the service page, on the Events tab, you see events corresponding to the old tasks getting stopped and new tasks getting started on the new ECS instances.
Wait until no tasks are running on the old ECS instances and you see all tasks starting on the new ECS instances.
9. Deregister and terminate the old ECS instances
Using the AWS CLI
For each of the old container instances, run the following command:
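A sketch of the deregister-and-terminate pair (the IDs are placeholders):
# Deregister the container instance; --force deregisters it
# even if it still has tasks assigned
aws ecs deregister-container-instance \
    --cluster <cluster name> \
    --container-instance <container instance id> \
    --force

# Terminate the underlying EC2 instance
aws ec2 terminate-instances --instance-ids <EC2 instance id>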
Using the AWS Console
In the ECS console, choose Clusters, ECS Instances.
Note the EC2 instance ID displayed in the EC2 Instance column and keep the EC2 instance detail page open.
Open the container instance ID for the ECS instance to deregister.
On the container instance page, choose Deregister.
After the container instance is deregistered, terminate the EC2 instance from the instance detail page that you kept open.
At this point, your ECS cluster has been refreshed with the EC2 instances built with the new ECS–optimized AMI.
Conclusion
In this post, I demonstrated how to refresh the container instances in an active ECS cluster with instances built from a newly released ECS-optimized AMI. You can use either the AWS Management Console or the AWS CLI to refresh your ECS cluster in a few quick steps.
AWS Fargate is a service that’s designed to remove the need to do these types of operations by running and managing all the EC2 infrastructure necessary to support your containers for you. With Fargate, your containers are always started with the latest ECS agent and Docker version.
This post contributed by Duke Takle and Kevin Hayes at Corteva Agriscience
At Corteva Agriscience, the agricultural division of DowDuPont, our purpose is to enrich the lives of those who produce and those who consume, ensuring progress for generations to come. As a global business, we support a network of research stations to improve agricultural productivity around the world.
As analytical technology advances the volume of data, as well as the speed at which it must be processed, meeting the needs of our scientists poses unique challenges. Corteva Cloud Engineering teams are responsible for collaborating with and enabling software developers, data scientists, and others. Their work allows Corteva research and development to become the most efficient innovation machine in the agricultural industry.
Recently, our Systems and Innovations for Breeding and Seed Products organization approached the Cloud Engineering team with the challenge of how to deploy a novel machine learning (ML) algorithm for scoring genetic markers. The solution would require supporting labs across six continents in a process that is run daily. This algorithm replaces time-intensive manual scoring of genotypic assays with a robust, automated solution. When examining the solution space for this challenge, the main requirements for our solution were global deployability, application uptime, and scalability.
Before implementing this algorithm on AWS, ML autoscoring was done as a proof of concept using pre-production instances on premises. It required several technicians to continue processing assays by hand. After implementing on AWS, we have freed those technicians to work in other areas, such as technology development.
Solutions Considered
A RESTful web service seemed to be an obvious way to solve the problem presented. AWS has several patterns that could implement a RESTful web service, such as Amazon API Gateway, AWS Lambda, Amazon EC2, AWS Auto Scaling, Amazon Elastic Container Service (ECS) using the EC2 launch type, and AWS Fargate.
At the time the project came into our backlog, we had just heard of Fargate. Fargate does have a few limitations (scratch storage, CPU, and memory), none of which were a problem for us. So EC2, Auto Scaling, and ECS with the EC2 launch type were ruled out because they would have introduced unneeded complexity, mostly around managing the EC2 instances needed to run either the application or the container for the solution.
When the project came into our group, there had been a substantial proof-of-concept done with a Docker container. While we are strong API Gateway and Lambda proponents, there is no need to duplicate processes or services that AWS provides. We also knew that we needed to be able to move fast. We wanted to put the power in the hands of our developers to focus on building out the solution. Additionally, we needed something that could scale across our organization and provide some rationalization in how we approach these problems. AWS services, such as Fargate, AWS CodePipeline, and AWS CloudFormation, made that possible.
Solution Overview
Our group prefers using existing AWS services to bring a complete project to the production environment.
AWS Fargate is used for the actual application stack, obviously.
CI/CD Pipeline
A complete discussion of the CI/CD pipeline for the project is beyond the scope of this post. However, in broad strokes, the pipeline is:
Compile some C++ code wrapped in Python, create a Python wheel, and publish it to an artifact store.
Create a Docker image with that wheel installed and publish it to ECR.
Deploy and test the new image to our test environment.
Deploy the new image to the production environment.
Solution
As mentioned earlier, the application is a Docker container deployed with the Fargate launch type. It uses an Aurora PostgreSQL DB instance for the backend data. The application itself is only needed internally so the Application Load Balancer is created with the scheme set to “internal” and deployed into our private application subnets.
Our environments are all constructed with CloudFormation templates. Each environment is constructed in a separate AWS account and connected back to a central utility account. The infrastructure stacks export a number of useful bits like the VPC, subnets, IAM roles, security groups, etc. This scheme allows us to move projects through the several accounts without changing the CloudFormation templates, just the parameters that are fed into them.
For this solution, we use an existing VPC, set of subnets, IAM role, and ACM certificate in the us-east-1 Region. The solution CloudFormation stack describes and manages the following resources:
A complete discussion of all the resources for the solution is beyond the scope of this post. However, we can explore the resource definitions of the components specific to Fargate. The following three simple segments of CloudFormation are all that is needed to create a Fargate stack: an ECS cluster, task definition, and service. More complete examples of the CloudFormation templates are linked at the end of this post, with stack creation instructions.
The ECS Cluster resource is a simple grouping for the other ECS resources to be created. The cluster created in this stack holds the tasks and service that implement the actual solution. Finally, in the AWS Management Console, the cluster is the entry point to find info about your ECS resources.
The ECS Task Definition is where we specify and configure the container. Interesting things to note are the CPU and memory configuration items. It is important to use a valid combination of CPU and memory settings, as shown in the following table (values as documented at the time of writing):

CPU (units)    | Valid memory values
256 (.25 vCPU) | 0.5 GB, 1 GB, 2 GB
512 (.5 vCPU)  | 1 GB to 4 GB (in 1 GB increments)
1024 (1 vCPU)  | 2 GB to 8 GB (in 1 GB increments)
2048 (2 vCPU)  | 4 GB to 16 GB (in 1 GB increments)
4096 (4 vCPU)  | 8 GB to 30 GB (in 1 GB increments)
The ECS Service resource is how we can configure where and how many instances of tasks are executed to solve our problem. In this case, we see that there are at least minimumCount instances of the task running in any of three private subnets in our VPC.
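The CloudFormation segments themselves are not reproduced here (the complete templates are linked at the end of this post). As an illustration of the same three resources, here is a hedged AWS CLI sketch; every name, ARN, image URI, and network ID below is a placeholder:
# The ECS cluster: a simple logical grouping
aws ecs create-cluster --cluster-name scoring-cluster

# The task definition: the container image plus a valid Fargate CPU/memory combination
aws ecs register-task-definition \
    --family scoring-task \
    --requires-compatibilities FARGATE \
    --network-mode awsvpc \
    --cpu 1024 --memory 2048 \
    --execution-role-arn <task execution role ARN> \
    --container-definitions '[{"name": "scoring", "image": "<ECR image URI>", "portMappings": [{"containerPort": 8080}]}]'

# The service: keep a minimum count of tasks running across the private subnets
aws ecs create-service \
    --cluster scoring-cluster \
    --service-name scoring-service \
    --task-definition scoring-task \
    --desired-count 3 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[<subnet-1>,<subnet-2>,<subnet-3>],securityGroups=[<security group>]}"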
Conclusion
Deploying this algorithm on AWS using containers and Fargate allowed us to start running the application at scale with low support overhead. This has resulted in faster turnaround time with fewer staff and a concomitant reduction in cost.
“We are very excited with the deployment of Polaris, the autoscoring of the marker lab genotyping data using AWS technologies. This key technology deployment has enhanced performance, scalability, and efficiency of our global labs to deliver over 1.4 Billion data points annually to our key customers in Plant Breeding and Integrated Operations.”
Sandra Milach, Director of Systems and Innovations for Breeding and Seed Products.
We are distributing this solution to all our worldwide laboratories to harmonize data quality and speed. We hope this enables an increase in the velocity of genetic gain to increase yields of crops for farmers around the world.
You can learn more about the work we do at Corteva at www.corteva.com.
Try it yourself:
The snippets above are instructive but not complete. We have published two repositories on GitHub that you can explore to see how we built this solution:
In this post, we discuss the various options available for ensuring that certificates can be securely and reliably made available to containers. By simplifying the process of distributing or generating certificates and other secrets, it’s easier for you to build inherently secure architectures without compromising scalability.
There are several ways to achieve this:
1. Storing the certificate and private key in the Docker image
Certificates and keys can be included in the Docker image and made available to the container at runtime. This approach makes the deployment of containers with certificates and keys simple and easy.
However, there are some drawbacks. First, the certificates and keys need to be created, stored securely, and then included in the Docker image. There are some manual or additional automation steps required to securely create, retrieve, and include them for every new revision of the Docker image.
The following example Docker file creates an NGINX container that has the certificate and the key included:
FROM nginx:alpine
# Copy in secret materials
RUN mkdir -p /root/certs/nginxdemotls.com
COPY nginxdemotls.com.key /root/certs/nginxdemotls.com/nginxdemotls.com.key
COPY nginxdemotls.com.crt /root/certs/nginxdemotls.com/nginxdemotls.com.crt
RUN chmod 400 /root/certs/nginxdemotls.com/nginxdemotls.com.key
# Copy in nginx configuration files
COPY nginx.conf /etc/nginx/nginx.conf
COPY nginxdemo.conf /etc/nginx/conf.d
COPY nginxdemotls.conf /etc/nginx/conf.d
# Create folders to hold web content and copy in HTML files.
RUN mkdir -p /var/www/nginxdemo.com
RUN mkdir -p /var/www/nginxdemotls.com
COPY index.html /var/www/nginxdemo.com/index.html
COPY indextls.html /var/www/nginxdemotls.com/index.html
From a security perspective, this approach has additional drawbacks. Because certificates and private keys are bundled with the Docker images, anyone with access to a Docker image can also retrieve the certificate and private key. The other drawback is the updated certificates are not replaced automatically and the Docker image must be re-created to include any updated certificates. Running containers must either be restarted with the new image, or have the certificates updated.
2. Storing the certificates in AWS Systems Manager Parameter Store and Amazon S3
Some certificates can be stored in Parameter Store using the ‘Secure String’ type, with KMS providing the encryption. With this approach, you can make an API call to retrieve the certificate when the container is deployed. As mentioned earlier, access to the certificate can be controlled through the IAM role used to retrieve it. The advantage of this approach is that the certificate can be replaced: the next time the container is deployed, it picks up the new certificate, and there is no need to update the Docker image.
Currently, there is a limit of 4,096 characters on values stored in Parameter Store. This may not be sufficient for some types of certificates. For example, some x509 certificates include the chain and so can exceed the 4,096-character limit. To avoid this size limitation, Amazon S3 can be used alongside Parameter Store: the certificate is stored on Amazon S3 and encrypted with KMS, while the private key (or its password) is stored in Parameter Store.
With this approach, there is no limitation on certificate length and the private key remains secured with KMS. However, it does involve some additional complexity in setting up the process of creating the certificates, storing them in S3, and then storing the password or private keys in Parameter Store. That is in addition to securing, trusting, and auditing the system handling the private keys and certificates.
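At container startup, the retrieval might look like the following sketch (the bucket, object, and parameter names are hypothetical):
# Fetch the KMS-encrypted certificate from S3
aws s3 cp s3://my-secure-bucket/nginxdemotls.com.crt /root/certs/nginxdemotls.com.crt

# Fetch the private key from Parameter Store, decrypting the SecureString
aws ssm get-parameter \
    --name /tls/nginxdemotls.com/private-key \
    --with-decryption \
    --query Parameter.Value \
    --output text > /root/certs/nginxdemotls.com.key
chmod 400 /root/certs/nginxdemotls.com.key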
3. Storing the certificates in AWS Secrets Manager
AWS Secrets Manager offers a number of features to allow you to store and manage credentials, keys, and other secret materials. This eliminates the need to store these materials with the application code and instead allows them to be referenced on demand. By centralizing the management of secret materials, this single service can manage fine-grained access control through granular IAM policies as well as the revocation and rotation, all through API calls.
All materials stored in the AWS Secrets Manager are encrypted with the customer’s choice of KMS key. The post AWS Secrets Manager: Store, Distribute, and Rotate Credentials Securely shows how AWS Secrets Manager can be used to store RDS database credentials. However, the same process can apply to TLS certificates and keys.
Secrets currently have a limit of 4,096 characters. This approach may be unsuitable for some x509 certificates that include the chain and can exceed this limit. This limit applies to the sum of all key-value pairs within a single secret, so certificates and keys may need to be stored in separate secrets.
After the secure material is in place, it can be retrieved by the container instance at runtime via the AWS Command Line Interface (AWS CLI) or directly from within the application code. All that’s required is for the container task role to have the requisite permissions in IAM to read the secrets.
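For example, an entrypoint script might pull the key at startup with the AWS CLI (the secret name is hypothetical):
# The task role needs secretsmanager:GetSecretValue on this secret
aws secretsmanager get-secret-value \
    --secret-id tls/nginxdemotls.com/key \
    --query SecretString \
    --output text > /root/certs/nginxdemotls.com.key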
With the exception of rotating RDS credentials, AWS Secrets Manager requires the user to provide Lambda function code, which is called on a configurable schedule to manage the rotation. This rotation would need to consider the generation of new keys and certificates and redeploying the containers.
4. Using self-signed certificates, generated as the Docker container is created
The advantage of this approach is that it allows the use of TLS communications without any of the complexity of distributing certificates or private keys. However, this approach does require implicit trust of the server. Some applications may generate warnings that there is no acceptable root of trust.
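A sketch of generating such a certificate in the container's entrypoint (the subject name is hypothetical):
# Generate a throwaway self-signed certificate and key at startup
openssl req -x509 -nodes -newkey rsa:2048 \
    -keyout /root/certs/selfsigned.key \
    -out /root/certs/selfsigned.crt \
    -days 30 \
    -subj "/CN=nginxdemotls.internal"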
5. Building and managing a private certificate authority
A private certificate authority (CA) can offer greater security and flexibility than the solutions outlined earlier. Typically, a private CA solution would manage the following for each ‘Common name’:
A private key
A certificate, created with the private key
Lists of certificates issued and those that have been revoked
Policies for managing certificates, for example which services have the right to make a request for a new certificate
Audit logs to track the lifecycle of certificates, in particular to ensure timely renewal where necessary
It is possible for an organization to build and maintain their own certificate issuing platform. This approach requires the implementation of a platform that is highly available and secure. These types of systems add to the overall overhead of maintaining infrastructures from a security, availability, scalability, and maintenance perspective. Some customers have also implemented Lambda functions to achieve the same functionality when it comes to issuing private certificates.
While it’s possible to build a private CA for internal services, there are some challenges to be aware of. Any solution should provide a number of features that are key to ensuring appropriate management of the certificates throughout their lifecycle.
For instance, the solution must support the creation, tracking, distribution, renewal, and revocation of certificates. All of these operations must be provided with the requisite security and authentication controls to ensure that certificates are distributed appropriately.
Scalability is another consideration. As applications become increasingly stateless and elastic, it’s conceivable that certificates may be required for every new container instance or wildcard certificates created to support an environment. Whatever CA solution is implemented must be ready to accommodate such a load while also providing high availability.
These types of approaches have drawbacks from various perspectives:
Custom code can be hard to maintain
Additional security measures must be implemented
Certificate renewal and revocation mechanisms also must be implemented
The platform must be maintained and kept up-to-date from a patching perspective while maintaining high availability
6. Using the new ACM Private CA to issue private certificates
ACM Private CA offers a secure, managed infrastructure to support the issuance and revocation of private digital certificates. It supports RSA and ECDSA key types for CA keys used for the creation of new certificates, as well as certificate revocation lists (CRLs) to inform clients when a certificate should no longer be trusted. Currently, ACM Private CA does not offer root CA support.
The following screenshot shows a subordinate certificate that is available for use:
The private key for any private CA that you create with ACM Private CA is created and stored in a FIPS 140-2 Level 3 Hardware Security Module (HSM) managed by AWS. The ACM Private CA is also integrated with AWS CloudTrail, which allows you to record the audit trail of API calls made using the AWS Management Console, AWS CLI, and AWS SDKs.
Setting up ACM Private CA requires a root CA. This can be used to sign a certificate signing request (CSR) for the new subordinate (CA), which is then imported into ACM Private CA. After this is complete, it’s possible for containers within your platform to generate their own key-value pairs at runtime using OpenSSL. They can then use the key-value pairs to make their own CSR and ultimately receive their own certificate.
More specifically, the container would complete the following steps at runtime:
Add OpenSSL to the Docker image (if it is not already included).
Generate a key-value pair (a cryptographically related private and public key).
Use that private key to make a CSR.
Call the ACM Private CA API or CLI issue-certificate operation, which issues a certificate based on the CSR.
Call the ACM Private CA API or CLI get-certificate operation, which returns an issued certificate.
The following diagram shows these steps:
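In script form, the runtime steps might look like the following sketch (the CA ARN is a placeholder and the validity period is arbitrary):
# Steps 2-3: generate a key pair and create a CSR from it
openssl req -new -newkey rsa:2048 -nodes \
    -keyout service.key -out service.csr \
    -subj "/CN=myservice.internal"

# Step 4: ask ACM Private CA to issue a certificate for the CSR
CERT_ARN=$(aws acm-pca issue-certificate \
    --certificate-authority-arn <private CA ARN> \
    --csr fileb://service.csr \
    --signing-algorithm SHA256WITHRSA \
    --validity Value=7,Type=DAYS \
    --query CertificateArn --output text)

# Step 5: retrieve the issued certificate and chain once issuance completes
aws acm-pca get-certificate \
    --certificate-authority-arn <private CA ARN> \
    --certificate-arn "$CERT_ARN"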
The authorization to successfully request a certificate is controlled via IAM policies, which can be attached via a role to the Amazon ECS task. Containers require the ‘Allow’ effect for at least the acm-pca:IssueCertificate and acm-pca:GetCertificate actions. The following is a sample IAM policy:
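A reconstruction of the kind of policy described, attached inline to the task role (scope Resource to your CA ARN rather than a wildcard where possible):
cat <<'EOF' > acm-pca-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "acm-pca:IssueCertificate",
        "acm-pca:GetCertificate"
      ],
      "Resource": "<private CA ARN>"
    }
  ]
}
EOF
aws iam put-role-policy \
    --role-name <ECS task role name> \
    --policy-name AcmPcaIssueAndGet \
    --policy-document file://acm-pca-policy.json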
For additional security, it is possible to store the certificate and keys in a temporary volume mounted in memory through the ‘tmpfs’ parameter. With this option enabled, the secure material is never written to the filesystem of the host machine.
Note: This feature is not currently available for containers run on AWS Fargate.
The task now has the necessary materials and starts up. Clients should be able to establish the trust hierarchy from the server, through ACM Private CA to the root or intermediate CA.
One consideration to be aware of is that ACM Private CA currently has a limit of 50,000 certificates for each CA in each Region. If the requirement is for each short-lived container instance to have a separate certificate, this limit could be reached.
Summary
The approaches outlined in this post describe the available options for ensuring that the generation, storage, and distribution of sensitive material is done efficiently and securely. It should also be done in a way that supports the ephemeral, automatic scaling capabilities of container-based architectures. ACM Private CA provides a single interface to manage public and now private certificates, and it integrates seamlessly with other AWS services.
If you have questions or suggestions, please comment below.
This post was contributed by Jason Umiker, AWS Solutions Architect.
Whether it’s helping facilitate a journey to microservices or deploying existing tools more easily and repeatably, many customers are moving toward containerized infrastructure and workflows. AWS provides many of the services and mechanisms to help you with that.
Amazon Elastic Container Service (ECS) helps schedule and orchestrate containers across a fleet of servers. It involves installing an agent on each container host that takes instructions from the ECS control plane and relays them to the local Docker daemon on each one. ECS makes this easy by providing an optimized Amazon Machine Image (AMI) that the ECS console or CLI can launch automatically for you, and that you can also use to launch container hosts yourself.
It is up to you to choose the appropriate instance types, sizes, and quantity for your cluster fleet. You should have the capacity to deploy and scale workloads as well as to spread them across enough failure domains for high availability. Features like Auto Scaling groups help with that.
Also, while AWS provides Amazon Linux and Windows AMIs pre-configured for ECS, you are responsible for ongoing maintenance of the OS, which includes patching and security. Items that require regular patching or updating in this model are the OS, Docker, the ECS agent, and of course the contents of the container images.
Two of the key ECS concepts are tasks and services. A task is one or more containers that are scheduled together by ECS. A service is like an Auto Scaling group for tasks. It defines the quantity of tasks to run across the cluster and where they should be running (for example, across multiple Availability Zones), automatically associates them with a load balancer, and scales horizontally based on metrics that you define, like CPU or memory utilization.
What is Fargate?
AWS Fargate is a new compute engine for Amazon ECS that runs containers without requiring you to deploy or manage the underlying Amazon EC2 instances. With Fargate, you specify an image to deploy and the amount of CPU and memory it requires. Fargate handles the updating and securing of the underlying Linux OS, Docker daemon, and ECS agent as well as all the infrastructure capacity management and scaling.
How to use Fargate?
Fargate is exposed as a launch type for ECS. It uses an ECS task and service definition that is similar to the traditional EC2 launch mode, with a few minor differences, and it is easy to move tasks and services back and forth between launch types. The differences include the following (a sketch follows the list):
Using the awsvpc network mode
Specifying the CPU and memory requirements for the task in the definition
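As a hedged sketch of those two differences in practice, the NGINX task definition created in the console walkthrough below could equally be registered with the CLI like this (the execution role ARN is a placeholder):
# awsvpc network mode plus task-level CPU/memory:
# 256 CPU units = 0.25 vCPU, 512 MiB = 0.5 GB
aws ecs register-task-definition \
    --family NGINX \
    --requires-compatibilities FARGATE \
    --network-mode awsvpc \
    --cpu 256 --memory 512 \
    --execution-role-arn <task execution role ARN> \
    --container-definitions '[{"name": "NGINX", "image": "nginx:1.13.9-alpine", "portMappings": [{"containerPort": 80}]}]'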
The best way to learn how to use Fargate is to walk through the process and see it in action.
Walkthrough: Deploying a service with Fargate in the console
At the time of publication, Fargate for ECS is available in the N. Virginia, Ohio, Oregon, and Ireland AWS regions. This walkthrough works in any AWS region where Fargate is available.
If you’d prefer to use a CloudFormation template, this one covers Steps 1-4. After launching this template you can skip ahead to Explore Running Service after Step 4.
Step 1 – Create an ECS cluster
An ECS cluster is a logical construct for running groups of containers known as tasks. Clusters can also be used to segregate different environments or teams from each other. In the traditional EC2 launch mode, there are specific EC2 instances associated with and managed by each ECS cluster, but this is transparent to the customer with Fargate.
Open the ECS console and ensure that Fargate is available in the selected Region (for example, N. Virginia).
Choose Clusters, Create Cluster.
Choose Networking only, Next step.
For Cluster name, enter “Fargate”. If you don’t already have a VPC to use, select the Create VPC check box and accept the defaults as well. Choose Create.
Step 2 – Create a task definition, CloudWatch log group, and task execution role
A task is a collection of one or more containers that is the smallest deployable unit of your application. A task definition is a JSON document that serves as the blueprint for ECS to know how to deploy and run your tasks.
The console makes it easier to create this definition by exposing all the parameters graphically. In addition, the console creates two dependencies:
The Amazon CloudWatch log group to store the aggregated logs from the task
The task execution IAM role that gives Fargate the permissions to run the task
In the left navigation pane, choose Task Definitions, Create new task definition.
Under Select launch type compatibility, choose FARGATE, Next step.
For Task Definition Name, enter NGINX.
If you had an IAM role for your task, you would enter it in Task Role but you don’t need one for this example.
The Network Mode is automatically set to awsvpc for Fargate.
Under Task size, for Task memory, choose 0.5 GB. For Task CPU, enter 0.25.
Choose Add container.
For Container name, enter NGINX.
For Image, put nginx:1.13.9-alpine.
For Port mappings type 80 into Container port.
Choose Add, Create.
Step 3 – Create an Application Load Balancer
Sending incoming traffic through a load balancer is often a key piece of making an application both scalable and highly available. It can balance the traffic between multiple tasks, as well as ensure that traffic is only sent to healthy tasks. You can have the service manage the addition or removal of tasks from an Application Load Balancer as they come and go but that must be specified when the service is created. It’s a dependency that you create first.
Open the EC2 console.
In the left navigation pane, choose Load Balancers, Create Load Balancer.
Under Application Load Balancer, choose Create.
For Name, put NGINX.
Choose the appropriate VPC (10.0.0.0/16 if you let ECS create it for you).
For Availability Zones, select both and choose Next: Configure Security Settings.
Choose Next: Configure Security Groups.
For Assign a security group, choose Create a new security group. Choose Next: Configure Routing.
For Name, enter NGINX. For Target type, choose ip. Choose Next: Register Targets, Next: Review, and then Create.
Select the new load balancer and note its DNS name (this is the public address for the service).
Step 4 – Create an ECS service using Fargate
A service in ECS using Fargate serves a similar purpose to an Auto Scaling group in EC2. It ensures that the needed number of tasks are running both for scaling as well as spreading the tasks over multiple Availability Zones for high availability. A service creates and destroys tasks as part of its role and can optionally add or remove them from an Application Load Balancer as targets as it does so.
Open the ECS console and ensure that Fargate is available in the selected Region (for example, N. Virginia).
In the left navigation pane, choose Task Definitions.
Select the NGINX task definition that you created and choose Actions, Create Service.
For Launch Type, select Fargate.
For Service name, enter NGINX.
For Number of tasks, enter 1.
Choose Next step.
Under Subnets, choose both of the options.
For Load balancer type, choose Application Load Balancer. It should then default to the NGINX version that you created earlier.
Choose Add to load balancer.
For Target group name, choose NGINX.
Under DNS records for service discovery, for TTL, enter 60.
Choose Next step, Next step, and Create Service.
Explore the running service
At this point, you have a running NGINX service using Fargate. You can now explore what you have running and how it works. You can also ask it to scale up to two tasks across two Availability Zones in the console.
Go into the service and see details about the associated load balancer, tasks, events, metrics, and logs:
Scale the service from one task to multiple tasks:
Choose Update.
For Number of tasks, enter 2.
Choose Next step, Next step, Next step, and then Update Service.
Watch the event that is logged and the new additional task both appear.
On the service Details tab, open the NGINX Target Group Name link and see the registered IP targets spread across the two Availability Zones.
Go to the DNS name for the Application Load Balancer in your browser (get the value from the Load Balancers dashboard in the EC2 console) and see the default NGINX page.
Walkthrough: Adding a CI/CD pipeline to your service
Now, I’m going to show you how to set up a CI/CD pipeline around this service. It watches a GitHub repo for changes and rebuilds the container with CodeBuild based on the buildspec.yml file and Dockerfile in the repo. If that build is successful, it then updates your Fargate service to deploy the new image.
If you’d prefer to use a CloudFormation Template, this one covers the creation of the dependencies so that the console will pre-fill these (CodeBuild Project and IAM Roles) during the creation of the CodePipeline in the steps below.
Step 1 – Create an ECR repository for the rebuilt container image
An ECR repository is a place to store your container images in a secure and reliable manner. Scaling and self-healing of Fargate tasks requires these images to be always available to be pulled when required. This is an important part of a container platform.
Open the ECS console and ensure that Fargate is available in the selected Region (for example, N. Virginia).
In the left navigation pane, under Amazon ECR, choose Repositories, Get started.
For Repository name, enter nginx (ECR repository names must be lowercase) and choose Next step.
Step 2 – Fork the nginx-codebuild example into your own GitHub account
I have created an example project that takes the Dockerfile and config files for the official NGINX Docker Hub image and adds a buildspec.yml file to tell CodeBuild how to build the container and push it to your new ECR repository on completion (a sketch of that buildspec follows the steps below). You can fork it into your own GitHub account for this CI/CD demo.
Go to https://github.com/jasonumiker/nginx-codebuild.
In the upper right corner, choose Fork.
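The buildspec.yml in the repo looks roughly like the following sketch (this is not the exact file from the fork; it assumes the AWS_ACCOUNT_ID and IMAGE_REPO_NAME environment variables that you configure on the CodeBuild project in the next step, plus the AWS_DEFAULT_REGION variable that CodeBuild provides automatically):

version: 0.2
phases:
  pre_build:
    commands:
      # Log in to ECR so that docker push is authorized
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  build:
    commands:
      - docker build -t $IMAGE_REPO_NAME:latest .
      - docker tag $IMAGE_REPO_NAME:latest $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest
  post_build:
    commands:
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest
      # Emit the image definition file that the ECS deploy action reads
      - printf '[{"name":"NGINX","imageUri":"%s"}]' $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest > images.json
artifacts:
  files:
    - images.json

The "name" in images.json must match the container name in the task definition (NGINX, in this walkthrough), and the filename must match the Image filename you enter when creating the pipeline.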
Step 3 – Create the pipeline and associated IAM roles
You have two complementary AWS services for building a CI/CD pipeline for your containers. CodeBuild executes the build jobs and CodePipeline kicks off those builds when it notices that the source GitHub or CodeCommit repo changes. If successful, CodePipeline then deploys the new container image to Fargate.
The CodePipeline console can create the associated CodeBuild project, in addition to other dependencies such as the required IAM roles.
Open the CodePipeline console and ensure that Fargate is available in the selected Region (for example, N. Virginia).
Choose Get started.
For Pipeline name, enter NGINX and choose Next step.
For Source provider, choose GitHub.
Choose Connect to GitHub and log in.
For Repository, choose your forked nginx-codebuild repo. For Branch, enter master. Choose Next step.
For Build provider, choose AWS CodeBuild.
Select Create a new build project.
For Project name, enter NGINX.
For Operating system, choose Ubuntu. For Runtime, choose Docker. For Version, select the latest version.
Expand Advanced and set the following environment variables:
AWS_ACCOUNT_ID with a value of the account number
IMAGE_REPO_NAME with a value of nginx (or whatever ECR repository name you used)
Choose Save build project, Next step.
For Deployment provider, choose Amazon ECS.
For Cluster name, enter Fargate.
For Service name, choose NGINX.
For Image filename, enter images.json.
Choose Next step.
Choose Create role, Allow, Next step, and then choose Create pipeline.
Open the IAM console.
In the left navigation pane, choose Roles.
Choose the code-build-nginx-service-role that was just created and choose Attach policy.
Select the AmazonEC2ContainerRegistryPowerUser policy and choose Attach policy.
Step 4 – Start the pipeline
You now have CodePipeline watching the GitHub repo for changes. It kicks off a CodeBuild build job on a change and, if the build is successful, creates a new deployment of the Fargate service with the new image.
Make a change to the source repo (even just adding a new dummy file) and then commit it and push it to master on your GitHub fork. This automatically kicks off the pipeline to build and deploy the change.
Conclusion
As you’ve seen, Fargate is fast and easy to set up, integrates well with the rest of the AWS platform, and saves you from much of the heavy lifting of running containers reliably at scale.
While it is useful to create resources in the console to understand them better, we suggest automating them with infrastructure-as-code patterns, such as AWS CloudFormation, to ensure that they are repeatable and that any changes can be managed. There are some example templates in this post to help you get started.
In addition, adding unit and integration testing, blue/green deployments, and/or manual approval gates to CodePipeline is often a good idea before deploying patterns like this to production.
This post was contributed by Nare Hayrapetyan, Sr. Software Engineer
Many customers are excited about new microservices management tools and technologies like service mesh. Specifically, they ask how to get started using Envoy on AWS. In this post, I walk through setting up an Envoy reverse proxy on Amazon Elastic Container Service (Amazon ECS). This example is based on the Envoy front proxy sandbox provided in the Envoy documentation.
The Envoy front proxy acts as a reverse proxy. It accepts incoming requests and routes them to ECS service tasks that can have an envoy sidecar themselves. The envoy sidecar then redirects the request to the service on the local host.
The reverse proxy provides the following features:
Terminates TLS.
Supports both HTTP/1.1 and HTTP/2.
Supports full HTTP L7 routing.
Talks to ECS services via an envoy sidecar, which means that the reverse proxy and the service hosts are operated in the same way and emit the same statistics because both are running Envoy.
To get started, create the following task definitions:
One contains an envoy image acting as a front proxy
One contains an envoy image acting as a service sidecar and an image for the service task itself
The envoy images are the same. The only difference is the configuration provided to the envoy process that defines how the proxy acts.
Service + Envoy Sidecar
Create a simple service that returns the hostname that it’s running on. This allows you to test the Envoy load balancing capabilities.
Create the following:
A simple service running on Amazon ECS
A script to start the service
A Dockerfile that copies the service code and starts it
The envoy configuration
A Dockerfile to download the envoy image and start it with the provided configuration
Images from both Dockerfiles, pushed to a registry so that they can be accessed by ECS
An ECS task definition that points to the envoy and service images
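A minimal service.py in this spirit might look like the following sketch (not the exact code; it assumes the /service route and port 8080 used later in this walkthrough):

from flask import Flask
import socket

app = Flask(__name__)

@app.route('/service')
def service():
    # Return the container hostname so that Envoy's load balancing is visible
    return 'Hello from behind Envoy! hostname: {}\n'.format(socket.gethostname())

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

The Dockerfile that packages and starts the service looks like this: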
FROM alpine:latest
RUN apk update && apk add python3 bash
RUN python3 --version && pip3 --version
RUN pip3 install -q Flask==0.11.1 requests==2.18.4
RUN mkdir /code
ADD ./service.py /code
ADD ./start_service.sh /usr/local/bin/start_service.sh
RUN chmod u+x /usr/local/bin/start_service.sh
ENTRYPOINT /usr/local/bin/start_service.sh
Here’s the configuration for Envoy as a service sidecar. Use the awsvpc networking mode to allow Envoy to access the service on the local host (see the task definition below).
The envoy process is configured to listen on port 80 and redirect to localhost:8080 (127.0.0.1:8080), which is the address on which the service is listening. The service can be reached on localhost because the task runs in awsvpc mode, which allows containers in the task to communicate with each other over the loopback interface.
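A minimal service-envoy.yaml along these lines would do the job (a sketch based on the Envoy front proxy sandbox configuration, not the exact file; adjust names and timeouts as needed):

static_resources:
  listeners:
    - address:
        socket_address:
          address: 0.0.0.0
          port_value: 80
      filter_chains:
        - filters:
            - name: envoy.http_connection_manager
              config:
                codec_type: auto
                stat_prefix: ingress_http
                route_config:
                  virtual_hosts:
                    - name: service
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/service" }
                          route: { cluster: local_service }
                http_filters:
                  - name: envoy.router
  clusters:
    - name: local_service
      connect_timeout: 0.25s
      type: static
      lb_policy: round_robin
      hosts:
        - socket_address:
            address: 127.0.0.1
            port_value: 8080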
Create the envoy sidecar image that has access to the envoy configuration.
FROM envoyproxy/envoy:latest
COPY service-envoy.yaml /etc/envoy/service-envoy.yaml
CMD /usr/local/bin/envoy -c /etc/envoy/service-envoy.yaml
And here’s the task definition containing the service and envoy images.
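A sketch of that task definition follows (image URIs and sizes are placeholders; both containers share the task’s elastic network interface because of awsvpc mode):

{
  "family": "envoy-service",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "service",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/service:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }]
    },
    {
      "name": "envoy-sidecar",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/envoy-sidecar:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }]
    }
  ]
}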
The task definition is used to launch an Amazon ECS service. To test load balancing in Envoy, scale the service to a couple of tasks. Also, add ECS service discovery to the service so that the front proxy can discover the service endpoints.
Envoy Front Proxy
For the front proxy setup, you need a Dockerfile with an envoy image and front proxy envoy configuration. Similar to the task definition earlier, you create a Docker image from the Dockerfile and push it to a repository to be accessed by the ECS task definition.
FROM envoyproxy/envoy:latest
COPY front-envoy.yaml /etc/envoy/front-envoy.yaml
CMD /usr/local/bin/envoy -c /etc/envoy/front-envoy.yaml
The differences between the envoy configurations are the hosts in the cluster. The front proxy envoy uses ECS service discovery—set up when the service was being created—to discover the service endpoints.
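For example, the clusters section of front-envoy.yaml can use DNS-based discovery against the service discovery name (the name envoy-service.local here is an assumption; use whatever name you registered when creating the service):

clusters:
  - name: backend_service
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    hosts:
      - socket_address:
          address: envoy-service.local
          port_value: 80

With strict_dns, Envoy resolves the DNS name continuously and load balances across all A records it returns, which is how the front proxy spreads requests over the service tasks.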
After you push the front proxy image to ECR and create an ECS task definition, launch both services (using the front proxy and the service task definitions) in the same VPC. Now the calls to the front proxy are redirected to one of the service envoys discovered by ECS service discovery.
Now test Envoy’s load balancing capabilities:
$ curl (front-proxy-private-ip):80/service
Hello from behind Envoy! hostname: 6ae1c4ff6b5d

$ curl (front-proxy-private-ip):80/service
Hello from behind Envoy! hostname: 6203f60d9d5c
Conclusion
Now you should be all set! As you can see, getting started with Envoy and ECS can be simple and straightforward. I’m excited to see how you can use these technologies to build next-gen applications!
This post was contributed by Christoph Kassen, AWS Solutions Architect
With the emergence of microservices architectures, the number of services that make up a web application has grown significantly. It’s no longer unusual to build and operate hundreds of separate microservices, all as part of the same application.
Think of a typical e-commerce application that displays products, recommends related items, provides search and faceting capabilities, and maintains a shopping cart. Behind the scenes, many more services are involved, such as clickstream tracking, ad display, targeting, and logging. When handling a single user request, many of these microservices are involved in responding. Understanding, analyzing, and debugging the landscape is becoming complex.
AWS X-Ray provides application-tracing functionality, giving deep insights into all microservices deployed. With X-Ray, every request can be traced as it flows through the involved microservices. This provides your DevOps teams the insights they need to understand how your services interact with their peers and enables them to analyze and debug issues much faster.
With microservices architectures, every service should be self-contained and use the technologies best suited for the problem domain. Depending on how the service is built, it is deployed and hosted differently.
One of the most popular choices for packaging and deploying microservices at the moment is containers. The application and its dependencies are clearly defined, the container can be built on CI infrastructure, and the deployment is simplified greatly. Container schedulers, such as Kubernetes and Amazon Elastic Container Service (Amazon ECS), greatly simplify deploying and running containers at scale.
Running X-Ray on Kubernetes
Kubernetes is an open-source container management platform that automates deployment, scaling, and management of containerized applications.
This post shows you how to run X-Ray on top of Kubernetes to provide application tracing capabilities to services hosted on a Kubernetes cluster. Additionally, X-Ray also works for applications hosted on Amazon ECS, AWS Elastic Beanstalk, Amazon EC2, and even when building services with AWS Lambda functions. This flexibility helps you pick the technology you need while still being able to trace requests through all of the services running within your AWS environment.
The complete code, including a simple Node.js-based demo application, is available in the corresponding aws-xray-kubernetes GitHub repository, so you can get started with X-Ray quickly.
The sample application within the repository consists of two simple microservices, Service-A and Service-B. The following architecture diagram shows how each service is deployed with two Pods on the Kubernetes cluster:
Requests are sent to the Service-A from clients.
Service-A then contacts Service-B.
The requests are serviced by Service-B.
Service-B adds a random delay to each request to show different response times in X-Ray.
To test the sample applications on your own Kubernetes cluster, use the Dockerfiles provided in the GitHub repository to build the two containers, push them to a container registry, and apply the YAML configuration to your Kubernetes cluster with kubectl.
Prerequisites
If you currently do not have a cluster running within your AWS environment, take a look at Amazon Elastic Container Service for Kubernetes (Amazon EKS), or use the instructions from the Manage Kubernetes Clusters on AWS Using Kops blog post to spin up a self-managed Kubernetes cluster.
Security Setup for X-Ray
The nodes in the Kubernetes cluster hosting web application Pods need IAM permissions so that the Pods hosting the X-Ray daemon can send traces to the X-Ray service backend.
The easiest way is to set up a new IAM policy allowing all worker nodes within your Kubernetes cluster to write data to X-Ray. In the IAM console or AWS CLI, create a new policy like the following:
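{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "xray:PutTraceSegments",
                "xray:PutTelemetryRecords"
            ],
            "Resource": [
                "arn:aws:xray:*:000000000000:*"
            ]
        }
    ]
}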
Adjust the AWS account ID within the resource. Give the policy a descriptive name, such as k8s-nodes-XrayWriteAccess.
Next, attach the policy to the instance profile for the Kubernetes worker nodes: select the IAM role assigned to your worker instances (check the EC2 console if you are unsure) and attach the IAM policy created earlier. Alternatively, attach the IAM policy directly from the command line with the following command:
aws iam attach-role-policy --role-name k8s-nodes --policy-arn arn:aws:iam::000000000000:policy/k8s-nodes-XrayWriteAccess
Build the X-Ray daemon Docker image
The X-Ray daemon is available as a single, statically compiled binary that can be downloaded directly from the AWS website.
The first step is to create a Docker container hosting the X-Ray daemon binary and exposing port 2000 via UDP. The daemon is configured either via command line parameters or a configuration file. The most important option is to bind the daemon to an address on which it can accept tracing requests from application Pods.
To build your own Docker image containing the X-Ray daemon, use the Dockerfile shown below.
# Use Amazon Linux Version 1
FROM amazonlinux:1
# Download latest 2.x release of X-Ray daemon
RUN yum install -y unzip && \
cd /tmp/ && \
curl https://s3.dualstack.us-east-2.amazonaws.com/aws-xray-assets.us-east-2/xray-daemon/aws-xray-daemon-linux-2.x.zip > aws-xray-daemon-linux-2.x.zip && \
unzip aws-xray-daemon-linux-2.x.zip && \
cp xray /usr/bin/xray && \
rm aws-xray-daemon-linux-2.x.zip && \
rm cfg.yaml
# Expose port 2000 on udp
EXPOSE 2000/udp
ENTRYPOINT ["/usr/bin/xray"]
# No cmd line parameters, use default configuration
CMD [""]
This container image is based on Amazon Linux, which results in a small container image. To build and tag it, run the following command:
docker build -t xray:latest .
Create an Amazon ECR repository
Create a repository in Amazon Elastic Container Registry (Amazon ECR) to hold your X-Ray Docker image. Your Kubernetes cluster pulls the image from this repository when the X-Ray Pods are deployed.
Use the following CLI command to create your repository or alternatively the AWS Management Console:
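aws ecr create-repository --repository-name xray-daemon

(The repository name xray-daemon is an example; any name works, as long as the image reference in the DaemonSet manifest matches. After building, tag and push the xray:latest image to this repository.)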
Make sure that the kubectl tool is configured properly for your cluster so that you can deploy the X-Ray Pods. After the X-Ray Pods are deployed to your Kubernetes cluster, applications can send tracing information to the X-Ray daemon on their host. The biggest advantage is that you do not need to provide X-Ray as a sidecar container alongside your application. This simplifies the configuration and deployment of your applications and saves resources on your cluster overall.
To deploy the X-Ray daemon as Pods onto your Kubernetes cluster, run the following from the cloned GitHub repository:
kubectl apply -f xray-k8s-daemonset.yaml
This deploys and maintains an X-Ray Pod on each worker node; the Pods accept tracing data from your microservices and forward it to the X-Ray service backend. When deploying the container using a DaemonSet, the X-Ray port is also exposed directly on the host. This way, clients can connect directly to the daemon on their node, which avoids unnecessary network traffic going across your cluster.
Connecting to the X-Ray daemon
To integrate application tracing with your applications, use the X-Ray SDK for one of the supported programming languages:
Java
Node.js
.NET (Framework and Core)
Go
Python
The SDKs provide classes and methods for generating and sending trace data to the X-Ray daemon. Trace data includes information about incoming HTTP requests served by the application, and calls that the application makes to downstream services using the AWS SDK or HTTP clients.
By default, the X-Ray SDK expects the daemon to be available on 127.0.0.1:2000. This needs to be changed in this setup, as the daemon is not part of each Pod but hosted within its own Pod.
The deployed X-Ray DaemonSet exposes all Pods via the Kubernetes service discovery, so applications can use this endpoint to discover the X-Ray daemon. If you deployed to the default namespace, the endpoint is:
xray-service.default
Applications now need to set the daemon address either with the AWS_XRAY_DAEMON_ADDRESS environment variable (preferred) or directly within the SDK setup code:
To set up the environment variable, include the following in your Kubernetes application deployment description YAML (a minimal sketch, assuming the DaemonSet service was deployed to the default namespace):
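env:
  - name: AWS_XRAY_DAEMON_ADDRESS
    value: xray-service.default:2000

This exposes the X-Ray service address via the environment variable, where it is picked up automatically by the SDK.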
Sending tracing information from your application is straightforward with the X-Ray SDKs. The example code below serves as a starting point to instrument your application with traces. Take a look at the two sample applications in the GitHub repository to see how to send traces from Service A to Service B. The diagram below visualizes the flow of requests between the services.
Because your application is running within containers, enable both the EC2Plugin and ECSPlugin, which gives you information about the Kubernetes node hosting the Pod as well as the container name. Despite the name ECSPlugin, this plugin gives you additional information about your container when running your application on Kubernetes.
var AWSXRay = require('aws-xray-sdk');
// Configure the plugins before opening any segments so that traces
// include EC2 instance and container metadata.
AWSXRay.config([AWSXRay.plugins.EC2Plugin, AWSXRay.plugins.ECSPlugin]);

var app = express();
//...
app.use(AWSXRay.express.openSegment('defaultName')); //required at the start of your routes

app.get('/', function (req, res) {
  res.render('index');
});

app.use(AWSXRay.express.closeSegment()); //required at the end of your routes / first in error handling routes
For more information about all options and possibilities to instrument your application code, see the X-Ray documentation page for the corresponding SDK information.
The picture below shows the resulting service map that provides insights into the flow of requests through the microservice landscape. You can drill down here into individual traces and see which path each request has taken.
From the service map, you can drill down into individual requests and see where they originated from and how much time was spent in each service processing the request.
You can also view details about every individual segment of the trace by clicking on it. This displays more details.
On the Resources tab, you will see the Kubernetes Pod picked up by the ECSPlugin, which handled the request, as well as the instance that Pod was running on.
Summary
In this post, I shared how to deploy and run X-Ray on an existing Kubernetes cluster. Using tracing gives you deep insights into your applications to ease analysis and spot potential problems early. With X-Ray, you get these insights for all your applications running on AWS, no matter if they are hosted on Amazon ECS, AWS Lambda, or a Kubernetes cluster.
This post contributed by AWS Senior Cloud Infrastructure Architect Anabell St Vincent.
Some systems or applications require Transport Layer Security (TLS) traffic from the client all the way through to the Docker container, without offloading or terminating certificates at a load balancer. Some highly time-sensitive services may require communication over TLS without any decryption and re-encryption in the communication path.
There are multiple options for this type of implementation on AWS. One option is to use a service discovery tool to implement the requirements, but that creates overhead from an implementation and management perspective. In this post, I examine the option of using Amazon Elastic Container Service (Amazon ECS) with a Network Load Balancer.
Amazon ECS is a highly scalable, high-performance service for running Docker containers on AWS. It is integrated with each of the Elastic Load Balancing load balancers offered by AWS:
The Classic Load Balancer supports application or network level traffic and can be used to pass through TLS traffic when configured with TCP listeners. This approach limits the number of containers that Amazon ECS can deploy to one per EC2 host, and it is only available for the EC2 launch type, not Fargate. Because the Classic Load Balancer does not support target groups, dynamic port mapping can’t be used for this type of implementation.
The Application Load Balancer functions at the application layer, layer 7 of the Open Systems Interconnection (OSI) model, and supports the HTTP and HTTPS protocols. By operating at layer 7, the Application Load Balancer is able to route traffic based on the request path in the URL. It can also provide SSL offloading, or termination of the SSL certificates, for applications by hosting the SSL certificate. The Application Load Balancer integrates well with ECS by providing target groups for the container instances, which allow for port mapping and targeting container groups. Port mapping allows containers to run on different ports while still being represented by the one port configured on the ALB as part of the task’s target group.
The Network Load Balancer is a high-performance, ultra-low latency load balancer that operates at layer 4. It is designed to handle tens of millions of requests per second while maintaining high throughput at ultra-low latency. Operating at layer 4 means that traffic is passed to targets based on IP addresses and ports, as opposed to session information or cookies, as is the case with Application Load Balancers. The Network Load Balancer also supports long-lived TCP connections, which are ideal for WebSocket-style applications. Some applications communicate over TCP protocols rather than solely HTTP and HTTPS, and the Network Load Balancer provides support for load balancing between services that require protocols besides HTTP and HTTPS.
The Network Load Balancer is the best option for managing secure traffic as it provides support for TCP traffic pass through, without decrypting and then re-encrypting the traffic. Additionally, the Network Load Balancer supports target groups, which means port mapping can be used, as well as configuring the load balancer as part of the task definition for the containers.
Architecture
The diagram below shows a high-level architecture with TLS traffic coming from the internet that passes through the VPC to the Network Load Balancer, then to a container without offloading or terminating the certificate in the path.
In this diagram, there are three containers running in an ECS cluster. The ECS cluster can be either the EC2 or Fargate launch type.
The following diagram shows an architecture that contains two different services deployed in two different ECS clusters.
In this architecture, the containers in each of the clusters can reference the other containers securely via the Network Load Balancer without terminating or offloading the certificates until it reaches the destination container.
This approach addresses the requirements where containers need to communicate securely whether they are deployed in one VPC, across VPCs, or in separate AWS accounts.
The following diagram shows both TLS connections from the internet, as well as connections from within the same cluster.
This approach uses the same Network Load Balancer for the cluster to serve two different services. Implement it by using two different target groups (one for each service) and associating them with the ECS task definitions. The containers use the DNS name associated with the Network Load Balancer to access the containers in the other service.
Summary
The architectures in this post show how the Network Load Balancer integrates seamlessly with ECS and other AWS services, providing end-to-end TLS communication across services without offloading or terminating the certificates. This gives you the ability to use dynamic ports in ECS containers. It can also handle tens of millions of requests per second while maintaining high throughput at ultra-low latency for applications that require the TCP protocol.
If you have questions or suggestions, please comment below.
Building and deploying containerized services manually is slow and prone to errors. Continuous delivery with automated build and test mechanisms helps detect errors early, saves time, and reduces failures, making this a popular model for application deployments. Previously, to automate your container workflows with ECS, you had to build your own solution using AWS CloudFormation. Now, you can integrate CodePipeline and CodeBuild with ECS to automate your workflows in just a few steps.
A typical continuous delivery workflow with CodePipeline, CodeBuild, and ECS might look something like the following:
Choosing your source
Building your project
Deploying your code
We also have a continuous deployment reference architecture on GitHub for this workflow.
Getting Started
First, create a new project with CodePipeline and give the project a name, such as “demo”.
Next, choose a source location where the code is stored. This could be AWS CodeCommit, GitHub, or Amazon S3. For this example, choose GitHub and then give CodePipeline access to the repository.
Next, add a build step. You can import an existing build, such as a Jenkins server URL or CodeBuild project, or create a new step with CodeBuild. If you don’t have an existing build project in CodeBuild, create one from within CodePipeline:
Build provider: AWS CodeBuild
Configure your project: Create a new build project
Environment image: Use an image managed by AWS CodeBuild
Operating system: Ubuntu
Runtime: Docker
Version: aws/codebuild/docker:1.12.1
Build specification: Use the buildspec.yml in the source code root directory
Now that you’ve created the CodeBuild step, you can use it as an existing project in CodePipeline.
Next, add a deployment provider. This is where your built code is placed. It can be a number of different options, such as AWS CodeDeploy, AWS Elastic Beanstalk, AWS CloudFormation, or Amazon ECS. For this example, connect to Amazon ECS.
For CodeBuild to deploy to ECS, you must create an image definition JSON file. This requires adding some instructions to the pre-build, build, and post-build phases of the CodeBuild build process in your buildspec.yml file. For help with creating the image definition file, see Step 1 of the Tutorial: Continuous Deployment with AWS CodePipeline.
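For a service whose container is named web, the resulting web.json is a JSON array that maps the container name to the freshly pushed image. A minimal sketch (the image URI is a placeholder):

[
  {
    "name": "web",
    "imageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest"
  }
]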
Deployment provider: Amazon ECS
Cluster name: enter your project name from the build step
Service name: web
Image filename: enter your image definition filename (“web.json”).
You are almost done!
You can now choose an existing IAM service role that CodePipeline can use to access resources in your account, or let CodePipeline create one. For this example, use the wizard, and go with the role that it creates (AWS-CodePipeline-Service).
Finally, review all of your changes, and choose Create pipeline.
After the pipeline is created, you’ll have a model of your entire pipeline where you can view your executions, add different tests, add manual approvals, or release a change.
This post contributed by Laura Thomson, Senior Product Manager for Amazon EC2.
As you start planning for the new year, I want to give you a heads up that Amazon EC2 is migrating to longer format, 17-character resource IDs. Instances and volumes already receive this ID format. Beginning in July 2018, all newly created EC2 resources receive longer IDs as well.
The switch-over will not impact most customers. However, I wanted to make you aware so that you can schedule time at the beginning of 2018 to test your systems with the longer format. If you have a system that parses or stores resource IDs, you may be affected.
From January 2018 through the end of June 2018, there will be a transition period, during which you can opt in to receive longer IDs. To make this easy, AWS will provide an option to opt in with one click for all regions, resources, and users. AWS will also provide more granular controls via API operations and console support. More information on the opt-in process will be sent out in January.
We need to do this given how fast AWS is continuing to grow. We will start to run low on IDs for certain resources within a year or so. In order to enable the long-term, uninterrupted creation of new resources, we need to move to the longer ID format.
The current format is a resource identifier followed by an eight-character string. The new format is the same resource identifier followed by a 17-character string. For example, your current VPCs have resource identifiers such as “vpc-1234abc0”. Starting July 2018, new VPCs will be assigned an identifier such as “vpc-1234567890abcdef0”. You can continue using the existing eight-character IDs for your existing resources, which won’t change and will continue to be supported. Only new resources will receive the 17-character IDs and only after you opt in to the new format.
This post was developed and written by Jeremy Cowan, Thomas Fuller, Samuel Karp, and Akram Chetibi.
—
Containers have revolutionized the way that developers build, package, deploy, and run applications. Initially, containers only supported code and tooling for Linux applications. With the release of Docker Engine for Windows Server 2016, Windows developers have started to realize the gains that their Linux counterparts have experienced for the last several years.
This week, we’re adding support for running production workloads in Windows containers using Amazon Elastic Container Service (Amazon ECS). Now, Amazon ECS provides an ECS-Optimized Windows Server Amazon Machine Image (AMI). This AMI is based on the EC2 Windows Server 2016 Datacenter AMI and includes Docker 17.06.2-ee-5 and version 1.16 of the ECS agent, which now runs as a native Windows service. It provides improved instance and container launch time performance.
In this post, I discuss the benefits of this new support, and walk you through getting started running Windows containers with Amazon ECS.
When AWS released the Windows Server 2016 Base with Containers AMI, the ECS agent ran as a process, which made it difficult to monitor and manage. As a service, the agent can be health-checked, managed, and restarted just like any other Windows service. The AMI also includes pre-cached images for Windows Server Core 2016 and Windows Server Nano Server 2016. By caching the images in the AMI, launching new Windows containers is significantly faster: when a Docker image includes a layer that’s already cached on the instance, Docker re-uses that layer instead of pulling it from the Docker registry.
The ECS agent and an accompanying ECS PowerShell module used to install, configure, and run the agent come pre-installed on the AMI. This guarantees that a specific platform version is available on the container instance at launch. Because the software is included, you don’t have to download it from the internet, which saves startup time.
The Windows-compatible ECS-optimized AMI also reports CPU and memory utilization and reservation metrics to Amazon CloudWatch. Using the CloudWatch integration with ECS, you can create alarms that trigger dynamic scaling events to automatically add or remove capacity to your EC2 instances and ECS tasks.
Getting started
To help you get started running Windows containers on ECS, I’ve forked the ECS reference architecture to build an ECS cluster composed of Windows instances instead of Linux instances. You can pull the latest version of the reference architecture for Windows.
The reference architecture is a layered CloudFormation stack, in that it calls other stacks to create the environment. Within the stack, the ecs-windows-cluster.yaml file contains the instructions for bootstrapping the Windows instances and configuring the ECS cluster. To configure the instances outside of AWS CloudFormation (for example, through the CLI or the console), you can add the following commands to your instance’s user data:
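<powershell>
# A minimal sketch; the cluster name is an example, and the ECSTools
# module ships with the ECS-Optimized Windows AMI.
Import-Module ECSTools
Initialize-ECSAgent -Cluster 'windows' -EnableIAMTaskRole
</powershell>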
If you don’t specify a cluster name when you initialize the agent, the instance is joined to the default cluster.
Adding -EnableIAMTaskRole when initializing the agent adds support for IAM roles for tasks. Previously, enabling this setting meant running a complex script and setting an environment variable before you could assign roles to your ECS tasks.
When you enable IAM roles for tasks on Windows, it consumes port 80 on the host. If you have tasks that listen on port 80 on the host, I recommend configuring a service for them that uses load balancing. You can use port 80 on the load balancer, and the traffic can be routed to another host port on your container instances. For more information, see Service Load Balancing.
Create a cluster
To create a new ECS cluster, choose Launch stack, or pull the GitHub project to your local machine and run the following command:
aws cloudformation create-stack --template-body file://<path to master-windows.yaml> --stack-name <name>
Upload your container image
Now that you have a cluster running, step through how to build and push an image into a container repository. You use a repository hosted in Amazon Elastic Container Registry (Amazon ECR) for this, but you could also use Docker Hub. To build and push an image to a repository, install Docker on your Windows* workstation. You also create a repository and assign the necessary permissions to the account that pushes your image to Amazon ECR. For detailed instructions, see Pushing an Image.
* If you are building an image that is based on Windows layers, then you must use a Windows environment to build and push your image to the registry.
Write your task definition
Now that your image is built and ready, the next step is to run your Windows containers using a task.
Start by creating a new task definition based on the windows-simple-iis image from Docker Hub.
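Here is a sketch of such a task definition (the resource values are illustrative; note the hostPort of 8080, discussed below):

{
  "family": "windows-simple-iis",
  "containerDefinitions": [
    {
      "name": "windows-simple-iis",
      "image": "microsoft/iis",
      "cpu": 512,
      "memory": 768,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 8080,
          "protocol": "tcp"
        }
      ]
    }
  ]
}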
You can now go back into the Task Definition page and see windows-simple-iis as an available task definition.
There are a few important aspects of the task definition file to note when working with Windows containers. First, the hostPort is configured as 8080, which is necessary because the ECS agent currently uses port 80 to enable IAM roles for tasks required for least-privilege security configurations.
There are also some fairly standard task parameters that are intentionally not included. For example, network mode is not available with Windows at the time of this release, so keep that setting blank to allow Docker to configure WinNAT, the only option available today.
Also, some parameters work differently with Windows than they do with Linux. The CPU limits that you define in the task definition are absolute, whereas on Linux they are weights. For information about other task parameters that are supported or possibly different with Windows, see the documentation.
Run your containers
At this point, you are ready to run containers. There are two options to run containers with ECS:
Task
Service
A task is typically a short-lived process that ECS creates. It can’t be configured to actively monitor or scale. A service is meant for longer-running containers and can be configured to use a load balancer, minimum/maximum capacity settings, and a number of other knobs and switches to help ensure that your code keeps running. In both cases, you are able to pick a placement strategy and a specific IAM role for your container.
Select the task definition that you created above and choose Action, Run Task.
Leave the settings on the next page to the default values.
Select the ECS cluster created when you ran the CloudFormation template.
Choose Run Task to start the process of scheduling a Docker container on your ECS cluster.
You can now go to the cluster and watch the status of your task. It may take 5–10 minutes for the task to go from PENDING to RUNNING, mostly because it takes time to download all of the layers necessary to run the microsoft/iis image. After the status is RUNNING, you should see the following results:
You may have noticed that the example task definition is named windows-simple-iis:2. This is because I created a second version of the task definition, which is one of the powerful capabilities of using ECS. You can make the task definitions part of your source code and then version them. You can also roll out new versions and practice blue/green deployment, switching to reduce downtime and improve the velocity of your deployments!
After the task has moved to RUNNING, you can see your website hosted in ECS. Find the public IP or DNS for your ECS host. Remember that you are hosting on port 8080. Make sure that the security group allows ingress from your client IP address to that port and that your VPC has an internet gateway associated with it. You should see a page that looks like the following:
This is a nice start to deploying a simple single instance task, but what if you had a Web API to be scaled out and in based on usage? This is where you could look at defining a service and collecting CloudWatch data to add and remove both instances of the task. You could also use CloudWatch alarms to add more ECS container instances and keep up with the demand. The former is built into the configuration of your service.
Select the task definition and choose Create Service.
Associate a load balancer.
Set up Auto Scaling.
The following screenshot shows an example where you would add an additional task instance when the CPU Utilization CloudWatch metric is over 60% on average over three consecutive measurements. This may not be aggressive enough for your requirements; it’s meant to show you the option to scale tasks the same way you scale ECS instances with an Auto Scaling group. The difference is that these tasks start much faster because all of the base layers are already on the ECS host.
This is just scratching the surface of the flexibility that you get from using containers and Amazon ECS. For more information, see the Amazon ECS Developer Guide and ECS Resources.
Contributed by Tiffany Jernigan, Developer Advocate for Amazon ECS
Get ready for takeoff!
We made sure that this year’s re:Invent is chock-full of containers: there are over 40 sessions! New to containers? No problem, we have several introductory sessions for you to dip your toes. Been using containers for years and know the ins and outs? Don’t miss our technical deep-dives and interactive chalk talks led by container experts.
If you can’t make it to Las Vegas, you can catch the keynotes and session recaps from our livestream and on Twitch.
Session types
Not everyone learns the same way, so we have multiple types of breakout content:
Birds of a Feather: An interactive discussion with industry leaders about containers on AWS.
Breakout sessions: 60-minute presentations about building on AWS. Sessions are delivered by both AWS experts and customers and span all content levels.
Workshops: 2.5-hour, hands-on sessions that teach how to build on AWS. AWS credits are provided. Bring a laptop, and have an active AWS account.
Chalk Talks: 1-hour, highly interactive sessions with a smaller audience. They begin with a short lecture delivered by an AWS expert, followed by a discussion with the audience.
Session levels
Whether you’re new to containers or you’ve been using them for years, you’ll find useful information at every level.
Introductory Sessions are focused on providing an overview of AWS services and features, with the assumption that attendees are new to the topic.
Advanced Sessions dive deeper into the selected topic. Presenters assume that the audience has some familiarity with the topic, but may or may not have direct experience implementing a similar solution.
Expert Sessions are for attendees who are deeply familiar with the topic, have implemented a solution on their own already, and are comfortable with how the technology works across multiple services, architectures, and implementations.
Session locations
All container sessions are located in the Aria Resort.
MONDAY 11/27
Breakout sessions
Level 200 (Introductory)
CON202 – Getting Started with Docker and Amazon ECS By packaging software into standardized units, Docker gives code everything it needs to run, ensuring consistency from your laptop all the way into production. But once you have your code ready to ship, how do you run and scale it in the cloud? In this session, you become comfortable running containerized services in production using Amazon ECS. We cover container deployment, cluster management, service auto-scaling, service discovery, secrets management, logging, monitoring, security, and other core concepts. We also cover integrated AWS services and supplementary services that you can take advantage of to run and scale container-based services in the cloud.
Chalk talks
Level 200 (Introductory)
CON211 – Reducing your Compute Footprint with Containers and Amazon ECS Tomas Riha, platform architect for Volvo, shows how Volvo transitioned its WirelessCar platform from using Amazon EC2 virtual machines to containers running on Amazon ECS, significantly reducing cost. Tomas dives deep into the architecture that Volvo used to achieve the migration in under four months, including Amazon ECS, Amazon ECR, Elastic Load Balancing, and AWS CloudFormation.
CON212 – Anomaly Detection Using Amazon ECS, AWS Lambda, and Amazon EMR Learn about the architecture that Cisco CloudLock uses to enable automated security and compliance checks throughout the entire development lifecycle, from the first line of code through runtime. It includes integration with IAM roles, Amazon VPC, and AWS KMS.
Level 400 (Expert)
CON410 – Advanced CICD with Amazon ECS Control Plane Mohit Gupta, product and engineering lead for Clever, demonstrates how to extend the Amazon ECS control plane to optimize management of container deployments and how the control plane can be broadly applied to take advantage of new AWS services. This includes ark—an AWS CLI-based deployment to Amazon ECS, Dapple—a slack-based automation system for deployments and notifications, and Kayvee—log and event routing libraries based on Amazon Kinesis.
Workshops
Level 200 (Introductory)
CON209 – Interstella 8888: Learn How to Use Docker on AWS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to get hands-on experience with Docker as you containerize Interstella 8888’s aging monolithic application and deploy it using Amazon ECS.
CON213 – Hands-on Deployment of Kubernetes on AWS In this workshop, attendees get hands-on experience using Kubernetes and Kops (Kubernetes Operations), as described in our recent blog post. Attendees learn how to provision a cluster, assign role-based permissions and security, and launch a container. If you’re interested in learning best practices for running Kubernetes on AWS, don’t miss this workshop.
TUESDAY 11/28
Breakout Sessions
Level 200 (Introductory)
CON206 – Docker on AWS In this session, Docker Technical Staff Member Patrick Chanezon discusses how Finnish Rail, the national train system for Finland, is using Docker on Amazon Web Services to modernize their customer facing applications, from ticket sales to reservations. Patrick also shares the state of Docker development and adoption on AWS, including explaining the opportunities and implications of efforts such as Project Moby, Docker EE, and how developers can use and contribute to Docker projects.
CON208 – Building Microservices on AWS Increasingly, organizations are turning to microservices to help them empower autonomous teams, letting them innovate and ship software faster than ever before. But implementing a microservices architecture comes with a number of new challenges that need to be dealt with. Chief among these is finding an appropriate platform to help manage a growing number of independently deployable services. In this session, Sam Newman, author of Building Microservices and a renowned expert in microservices strategy, discusses strategies for building scalable and robust microservices architectures. He also tells you how to choose the right platform for building microservices, and about common challenges and mistakes organizations make when they move to microservices architectures.
Level 300 (Advanced)
CON302 – Building a CICD Pipeline for Containers on AWS Containers can make it easier to scale applications in the cloud, but how do you set up your CICD workflow to automatically test and deploy code to containerized apps? In this session, we explore how developers can build effective CICD workflows to manage their containerized code deployments on AWS.
Ajit Zadgaonkar, Director of Engineering and Operations at Edmunds, walks through best practices for CICD architectures used by his team to deploy containers. We also deep dive into topics such as how to create an accessible CICD platform and architect for safe blue/green deployments.
CON307 – Building Effective Container Images Sick of getting paged at 2am and wondering “where did all my disk space go?” New Docker users often start with a stock image in order to get up and running quickly, but this can cause problems as your application matures and scales. Creating efficient container images is important to maximize resources, and deliver critical security benefits.
In this session, AWS Sr. Technical Evangelist Abby Fuller covers how to create effective images to run containers in production. This includes an in-depth discussion of how Docker image layers work, things you should think about when creating your images, working with Amazon ECR, and mise-en-place for installing dependencies. Prakash Janakiraman, Co-Founder and Chief Architect at Nextdoor, discusses high-level and language-specific best practices for building images and how Nextdoor uses these practices to successfully scale their containerized services with a small team.
CON309 – Containerized Machine Learning on AWS Image recognition is a field of deep learning that uses neural networks to recognize the subject and traits for a given image. In Japan, Cookpad uses Amazon ECS to run an image recognition platform on clusters of GPU-enabled EC2 instances. In this session, hear from Cookpad about the challenges they faced building and scaling this advanced, user-friendly service to ensure high-availability and low-latency for tens of millions of users.
CON320 – Monitoring, Logging, and Debugging for Containerized Services As containers become more embedded in the platform, debug tools, traces, and logs become increasingly important. Nare Hayrapetyan, Senior Software Engineer, and Calvin French-Owen, Senior Technical Officer for Segment, discuss the principles of monitoring and debugging containers and the tools Segment has implemented and built for logging, alerting, metric collection, and debugging of containerized services running on Amazon ECS.
Chalk Talks
Level 300 (Advanced)
CON314 – Automating Zero-Downtime Production Cluster Upgrades for Amazon ECS Containers make it easy to deploy new code into production to update the functionality of a service, but what happens when you need to update the Amazon EC2 compute instances that your containers are running on? In this talk, we deep dive into how to upgrade the Amazon EC2 infrastructure underlying a live production Amazon ECS cluster without affecting service availability. Matt Callanan, Engineering Manager at Expedia, walks through Expedia’s “PRISM” project, which safely relocates hundreds of tasks onto new Amazon EC2 instances with zero downtime to applications.
CON322 – Maximizing Amazon ECS for Large-Scale Workloads Head of Mobfox DevOps, David Spitzer, shows how Mobfox used Docker and Amazon ECS to scale the Mobfox services and development teams to achieve low-latency networking and automatic scaling. This session covers Mobfox’s ecosystem architecture: how it has evolved from 2015 to today, the challenges Mobfox faced in growing their platform, and how they overcame them.
CON323 – Microservices Architectures for the Enterprise Salva Jung, Principal Engineer for Samsung Mobile, shares how Samsung Connect is architected as microservices running on Amazon ECS to securely, stably, and efficiently handle requests from millions of mobile and IoT devices around the world.
CON324 – Windows Containers on Amazon ECS Docker containers are commonly regarded as powerful and portable runtime environments for Linux code, but Docker also offers API and toolchain support for running Windows Servers in containers. In this talk, we discuss the various options for running windows-based applications in containers on AWS.
CON326 – Remote Sensing and Image Processing on AWS Learn how Encirca services by DuPont Pioneer uses Amazon ECS powered by GPU-instances and Amazon EC2 Spot Instances to run proprietary image-processing algorithms against satellite imagery. Mark Lanning and Ethan Harstad, engineers at DuPont Pioneer show how this architecture has allowed them to process satellite imagery multiple times a day for each agricultural field in the United States in order to identify crop health changes.
Workshops
Level 300 (Advanced)
CON317 – Advanced Container Management at Catsndogs.lol Catsndogs.lol is a (fictional) company that needs help deploying and scaling its container-based application. During this workshop, attendees join the new DevOps team at Catsndogs.lol and help the company manage their applications using Amazon ECS, and release new features to make customers happier than ever. Attendees get hands-on with service and container-instance auto-scaling, spot-fleet integration, container placement strategies, service discovery, secrets management with AWS Systems Manager Parameter Store, time-based and event-based scheduling, and automated deployment pipelines. If you are a developer interested in learning more about how Amazon ECS can accelerate your application development and deployment workflows, or if you are a systems administrator or DevOps person interested in understanding how Amazon ECS can simplify the operational model associated with running containers at scale, then this workshop is for you. You should have basic familiarity with Amazon ECS, Amazon EC2, and IAM.
Additional requirements:
The AWS CLI or AWS Tools for PowerShell installed
An AWS account with administrative permissions (including the ability to create IAM roles and policies) created at least 24 hours in advance.
WEDNESDAY 11/29
Birds of a Feather (BoF)
CON01 – Birds of a Feather: Containers and Open Source at AWS Cloud native architectures take advantage of on-demand delivery, global deployment, elasticity, and higher-level services to enable developer productivity and business agility. Open source is a core part of making cloud native possible for everyone. In this session, we welcome thought leaders from the CNCF, Docker, and AWS to discuss the cloud’s direction for growth and enablement of the open source community. We also discuss how AWS is integrating open source code into its container services and its contributions to open source projects.
Breakout Sessions
Level 300 (Advanced)
CON308 – Mastering Kubernetes on AWS Much progress has been made on bootstrapping clusters since Kubernetes’ first commit; it is now only a matter of minutes to go from zero to a running cluster on Amazon Web Services. However, evolving a simple Kubernetes architecture to be ready for production in a large enterprise can quickly become overwhelming with the number of options for configuration and customization.
In this session, Arun Gupta, Open Source Strategist for AWS and Raffaele Di Fazio, software engineer at leading European fashion platform Zalando, show the common practices for running Kubernetes on AWS and share insights from experience in operating tens of Kubernetes clusters in production on AWS. We cover options and recommendations on how to install and manage clusters, configure high availability, perform rolling upgrades and handle disaster recovery, as well as continuous integration and deployment of applications, logging, and security.
CON310 – Moving to Containers: Building with Docker and Amazon ECS If you’ve ever considered moving part of your application stack to containers, don’t miss this session. We cover best practices for containerizing your code, implementing automated service scaling and monitoring, and setting up automated CI/CD pipelines with fail-safe deployments. Manjeeva Silva and Thilina Gunasinghe show how McDonalds implemented their home delivery platform in four months using Docker containers and Amazon ECS to serve tens of thousands of customers.
Level 400 (Expert)
CON402 – Advanced Patterns in Microservices Implementation with Amazon ECS Scaling a microservice-based infrastructure can be challenging in terms of both technical implementation and developer workflow. In this talk, AWS Solutions Architect Pierre Steckmeyer is joined by Will McCutchen, Architect at BuzzFeed, to discuss Amazon ECS as a platform for building a robust infrastructure for microservices. We look at the key attributes of microservice architectures and how Amazon ECS supports these requirements in production, from configuration to sophisticated workload scheduling to networking capabilities to resource optimization. We also examine what it takes to build an end-to-end platform on top of the wider AWS ecosystem, and what it’s like to migrate a large engineering organization from a monolithic approach to microservices.
CON404 – Deep Dive into Container Scheduling with Amazon ECS As your application’s infrastructure grows and scales, well-managed container scheduling is critical to ensuring high availability and resource optimization. In this session, we deep dive into the challenges and opportunities around container scheduling, as well as the different tools available within Amazon ECS and AWS to carry out efficient container scheduling. We discuss patterns for container scheduling available with Amazon ECS, the Blox scheduling framework, and how you can customize and integrate third-party scheduler frameworks to manage container scheduling on Amazon ECS.
Chalk Talks
Level 300 (Advanced)
CON312 – Building a Selenium Fleet on the Cheap with Amazon ECS with Spot Fleet Roberto Rivera and Matthew Wedgwood, engineers at RetailMeNot, give a practical overview of setting up a fleet of Selenium nodes running on Amazon ECS with Spot Fleet. They discuss the challenges of running Selenium with high availability at minimum cost, using Amazon ECS container introspection to connect the Selenium Hub with its nodes.
CON315 – Virtually There: Building a Render Farm with Amazon ECS Learn how 8i Corp scales its multi-tenanted, volumetric render farm up to thousands of instances using AWS, Docker, and an API-driven infrastructure. This render farm enables them to turn the video footage from an array of synchronized cameras into a photo-realistic hologram capable of playback on a range of devices, from mobile phones to high-end head mounted displays. Join Owen Evans, VP of Engineering for 8i, as they dive deep into how 8i’s rendering infrastructure is built and maintained by just a handful of people and powered by Amazon ECS.
CON325 – Developing Microservices – from Your Laptop to the Cloud Wesley Chow, Staff Engineer at Adroll, shows how his team extends Amazon ECS by enabling local development capabilities. Hologram, Adroll’s local development program, brings the capabilities of the Amazon EC2 instance metadata service to non-EC2 hosts, so that developers can run the same software on local machines with the same credentials source as in production.
CON327 – Patterns and Considerations for Service Discovery Roven Drabo, head of cloud operations at Kaplan Test Prep, illustrates Kaplan’s complete container automation solution using Amazon ECS along with how his team uses NGINX and HashiCorp Consul to provide an automated approach to service discovery and container provisioning.
CON328 – Building a Development Platform on Amazon ECS Quinton Anderson, Head of Engineering for Commonwealth Bank of Australia, walks through how they migrated their internal development and deployment platform from Mesos/Marathon to Amazon ECS. The platform uses a custom DSL to abstract a layered application architecture, in a way that makes it easy to plug or replace new implementations into each layer in the stack.
Workshops
Level 300 (Advanced)
CON318 – Interstella 8888: Monolith to Microservices with Amazon ECS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to get hands-on experience deploying Docker containers as you break Interstella 8888’s aging monolithic application into containerized microservices. Using Amazon ECS and an Application Load Balancer, you create API-based microservices and deploy them leveraging integrations with other AWS services.
CON332 – Build a Java Spring Application on Amazon ECS This workshop teaches you how to lift and shift existing Spring and Spring Cloud applications onto the AWS platform. Learn how to build a Spring application container, understand bootstrap secrets, push container images to Amazon ECR, and deploy the application to Amazon ECS. Then, learn how to configure the deployment for production.
THURSDAY 11/30
Breakout Sessions
Level 200 (Introductory)
CON201 – Containers on AWS – State of the Union Just over four years after the first public release of Docker, and three years to the day after the launch of Amazon ECS, the use of containers has surged to run a significant percentage of production workloads at startups and enterprise organizations. Join Deepak Singh, General Manager of Amazon Container Services, as he covers the state of containerized application development and deployment trends, new container capabilities on AWS that are available now, options for running containerized applications on AWS, and how AWS customers successfully run container workloads in production.
Level 300 (Advanced)
CON304 – Batch Processing with Containers on AWS Batch processing is useful to analyze large amounts of data. But configuring and scaling a cluster of virtual machines to process complex batch jobs can be difficult. In this talk, we show how to use containers on AWS for batch processing jobs that can scale quickly and cost-effectively. We also discuss AWS Batch, our fully managed batch-processing service. You also hear from GoPro and Here about how they use AWS to run batch processing jobs at scale including best practices for ensuring efficient scheduling, fine-grained monitoring, compute resource automatic scaling, and security for your batch jobs.
Level 400 (Expert)
CON406 – Architecting Container Infrastructure for Security and Compliance While organizations gain agility and scalability when they migrate to containers and microservices, they also benefit from compliance and security, advantages that are often overlooked. In this session, Kelvin Zhu, lead software engineer at Okta, joins Mitch Beaumont, enterprise solutions architect at AWS, to discuss security best practices for containerized infrastructure. Learn how Okta built their development workflow with an emphasis on security through testing and automation. Dive deep into how containers enable automated security and compliance checks throughout the development lifecycle. Also understand best practices for implementing AWS security and secrets management services for any containerized service architecture.
Chalk Talks
Level 300 (Advanced)
CON329 – Full Software Lifecycle Management for Containers Running on Amazon ECS Learn how The Washington Post uses Amazon ECS to run Arc Publishing, a digital journalism platform that powers The Washington Post and a growing number of major media websites. Amazon ECS enabled The Washington Post to containerize their existing microservices architecture, avoiding a complete rewrite that would have delayed the platform’s launch by several years. In this session, Jason Bartz, Technical Architect at The Washington Post, discusses the platform’s architecture. He addresses the challenges of optimizing Arc Publishing’s workload, and managing the application lifecycle to support 2,000 containers running on more than 50 Amazon ECS clusters.
CON330 – Running Containerized HIPAA Workloads on AWS Nihar Pasala, Engineer at Aetion, discusses the Aetion Evidence Platform, a system for generating the real-world evidence used by healthcare decision makers to implement value-based care. This session discusses the architecture Aetion uses to run HIPAA workloads using containers on Amazon ECS, best practices, and learnings.
Level 400 (Expert)
CON408 – Building a Machine Learning Platform Using Containers on AWS DeepLearni.ng develops and implements machine learning models for complex enterprise applications. In this session, Thomas Rogers, Engineer for DeepLearni.ng discusses how they worked with Scotiabank to leverage Amazon ECS, Amazon ECR, Docker, GPU-accelerated Amazon EC2 instances, and TensorFlow to develop a retail risk model that helps manage payment collections for millions of Canadian credit card customers.
Workshops
Level 300 (Advanced)
CON319 – Interstella 8888: CICD for Containers on AWS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to learn how to set up a CI/CD pipeline for containerized microservices. You get hands-on experience deploying Docker container images using Amazon ECS, AWS CloudFormation, AWS CodeBuild, and AWS CodePipeline, automating everything from code check-in to production.
FRIDAY 12/1
Breakout Sessions
Level 400 (Expert)
CON405 – Moving to Amazon ECS – the Not-So-Obvious Benefits If you ask 10 teams why they migrated to containers, you will likely get answers like ‘developer productivity’, ‘cost reduction’, and ‘faster scaling’. But teams often find there are several other ‘hidden’ benefits to using containers for their services. In this talk, Franziska Schmidt, Platform Engineer at Mapbox, and Yaniv Donenfeld from AWS discuss the obvious and not-so-obvious benefits of moving to a containerized architecture. These include using Docker and Amazon ECS to achieve shared libraries for dev teams, separating private infrastructure from shareable code, and making it easier for non-ops engineers to run services.
Chalk Talks
Level 300 (Advanced)
CON331 – Deploying a Regulated Payments Application on Amazon ECS Travelex discusses how they built an FCA-compliant international payments service using a microservices architecture on AWS. This chalk talk covers the challenges of designing and operating an Amazon ECS-based PaaS in a regulated environment using a DevOps model.
Workshops
Level 400 (Expert)
CON407 – Interstella 8888: Advanced Microservice Operations Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. In this workshop, you help Interstella 8888 build a modern microservices-based logistics system to save the company from financial ruin. We give you the hands-on experience you need to run microservices in the real world. This includes implementing advanced container scheduling and scaling to deal with variable service requests, implementing a service mesh, issue tracing with AWS X-Ray, container and instance-level logging with Amazon CloudWatch, and load testing.
Know before you go
Want to brush up on your container knowledge before re:Invent? Here are some helpful resources to get started:
Contributed by Trevor Sullivan, AWS Solutions Architect
When you deploy containers with Amazon ECS, are you gathering all of the key metrics so that you can correctly monitor the overall health of your ECS cluster?
By default, ECS writes metrics to Amazon CloudWatch in 5-minute increments. For complex or large services, this may not be sufficient to make scaling decisions quickly. You may want to respond immediately to changes in workload or to identify application performance problems. Last July, CloudWatch announced support for high-resolution metrics, up to a per-second basis.
These high-resolution metrics can be used to give you a clearer picture of the load and performance for your applications, containers, clusters, and hosts. In this post, I discuss how you can use AWS Step Functions, along with AWS Lambda, to cost effectively record high-resolution metrics into CloudWatch. You implement this solution using a serverless architecture, which keeps your costs low and makes it easier to troubleshoot the solution.
To show how this works, you retrieve some useful metric data from an ECS cluster running in the same AWS account and region (Oregon, us-west-2) as the Step Functions state machine and Lambda function. However, you can use this architecture to retrieve any custom application metrics from any resource in any AWS account and region.
Why Step Functions?
Step Functions enables you to orchestrate multi-step tasks in the AWS Cloud that run for any period of time, up to a year. Effectively, you’re building a blueprint for an end-to-end process. After it’s built, you can execute the process as many times as you want.
For this architecture, you gather metrics from an ECS cluster, every five seconds, and then write the metric data to CloudWatch. After your ECS cluster metrics are stored in CloudWatch, you can create CloudWatch alarms to notify you. An alarm can also trigger an automated remediation activity such as scaling ECS services, when a metric exceeds a threshold defined by you.
When you build a Step Functions state machine, you define the different states inside it as JSON objects. The bulk of the work in Step Functions is handled by the common task state, which invokes Lambda functions or Step Functions activities. There is also a built-in library of other useful states that allow you to control the execution flow of your program.
One of the most useful state types in Step Functions is the parallel state. Each parallel state in your state machine can have one or more branches, each of which is executed in parallel. Another useful state type is the wait state, which waits for a period of time before moving to the next state.
In this walkthrough, you combine these three states (parallel, wait, and task) to create a state machine that triggers a Lambda function, which then gathers metrics from your ECS cluster.
Step Functions pricing
This state machine is executed every minute, resulting in 60 executions per hour and 1,440 executions per day. Step Functions is billed per state transition, including the Start and End state transitions, which gives you approximately 37,440 state transitions per day. To reach this number, I’m using this estimated math:
26 state transitions per-execution x 60 minutes x 24 hours
Based on current pricing, at $0.000025 per state transition, the daily cost of this metric gathering state machine would be $0.936.
Step Functions offers 4,000 free state transitions every month, indefinitely. This benefit is available to all customers, not just those still within the 12-month AWS Free Tier. For more information and cost example scenarios, see Step Functions pricing.
Why Lambda?
The goal is to capture metrics from an ECS cluster and write the metric data to CloudWatch. This is a straightforward, short-running process, which makes Lambda the perfect place to run your code. Lambda is one of the key services that make up “Serverless” application architectures. It enables you to consume compute capacity only when your code is actually executing.
The process of gathering metric data from ECS and writing it to CloudWatch takes a short period of time. In fact, my Lambda function execution time, while developing this post, averaged only about 250 milliseconds. For every five-second interval that occurs, I’m only using 1/20th of the compute time that I’d otherwise be paying for.
Lambda pricing
For billing purposes, Lambda execution time is rounded up to the nearest 100-ms interval. In general, based on the metrics that I observed during development, a 250-ms runtime would be billed at 300 ms. Here, I calculate the cost of this Lambda function executing on a daily basis.
Assuming 31 days in each month, there would be 535,680 five-second intervals (31 days x 24 hours x 60 minutes x 12 five-second intervals = 535,680). The Lambda function is invoked every five-second interval, by the Step Functions state machine, and runs for a 300-ms period. At current Lambda pricing, for a 128-MB function, you would be paying approximately the following:
Total compute
Total executions = 535,680
Total compute = total executions x (3 x $0.000000208 per 100 ms) = $0.334 per day
Total requests
Total requests = (535,680 / 1000000) * $0.20 per million requests = $0.11 per day
Total Lambda Cost
$0.11 requests + $0.334 compute time = $0.444 per day
Similar to Step Functions, Lambda offers a free tier that doesn’t expire. For more information, see Lambda Pricing.
Walkthrough
In the following sections, I step through the process of configuring the solution just discussed. If you follow along, at a high level, you will:
Configure an IAM role and policy
Create a Step Functions state machine to control metric gathering execution
Create a metric-gathering Lambda function
Configure a CloudWatch Events rule to trigger the state machine
Validate the solution
Prerequisites
You should already have an AWS account with a running ECS cluster. If you don’t have one running, you can easily deploy a Docker container on an ECS cluster using the AWS Management Console. In the example produced for this post, I use an ECS cluster running Windows Server (currently in beta), but either a Linux or Windows Server cluster works.
Create an IAM role and policy
First, create an IAM role and policy that enables Step Functions, Lambda, and CloudWatch to communicate with each other.
The CloudWatch Events rule needs permissions to trigger the Step Functions state machine.
The Step Functions state machine needs permissions to trigger the Lambda function.
The Lambda function needs permissions to query ECS and then write to CloudWatch Logs and metrics.
When you create the state machine, Lambda function, and CloudWatch Events rule, you assign this role to each of those resources. Upon execution, each of these resources assumes the specified role and executes using the role’s permissions.
Open the IAM console.
Choose Roles, Create New Role.
For Role Name, enter WriteMetricFromStepFunction.
Choose Save.
Create the IAM role trust relationship
The trust relationship (also known as the assume role policy document) for your IAM role looks like the following JSON document. As you can see from the document, your IAM role needs to trust the Lambda, CloudWatch Events, and Step Functions services. By configuring your role to trust these services, they can assume this role and inherit the role permissions.
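The original document isn’t reproduced here; a minimal sketch consistent with that description might look like the following. Note the regional Step Functions service principal, matching the us-west-2 region used in this walkthrough.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "events.amazonaws.com",
          "states.us-west-2.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}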
Open the IAM console.
Choose Roles and select the IAM role previously created.
After you’ve finished configuring your role’s trust relationship, grant the role access to the other AWS resources that make up the solution.
The IAM policy is what gives your IAM role permissions to access various resources. You must explicitly whitelist the specific resources to which your role has access, because the default IAM behavior is to deny access to AWS resources.
I’ve tried to keep this policy document as generic as possible, without allowing permissions to be too open. If the name of your ECS cluster is different than the one in the example policy below, make sure that you update the policy document before attaching it to your IAM role. You can attach this policy as an inline policy, instead of creating the policy separately first. However, either approach is valid.
Open the IAM console.
Select the IAM role, and choose Permissions.
Choose Add in-line policy.
Choose Custom Policy and then enter the following policy. The inline policy name does not matter.
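The policy document itself isn’t reproduced here; a sketch granting the permissions described in this post might look like the following. The resources are left open for brevity; scope them down to your state machine, Lambda function, and ECS cluster where possible.
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "states:StartExecution", "Resource": "*" },
    { "Effect": "Allow", "Action": "lambda:InvokeFunction", "Resource": "*" },
    { "Effect": "Allow", "Action": "ecs:DescribeClusters", "Resource": "*" },
    { "Effect": "Allow", "Action": "cloudwatch:PutMetricData", "Resource": "*" },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}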
Create a Step Functions state machine
In this section, you create a Step Functions state machine that invokes the metric-gathering Lambda function every five seconds over a one-minute period. If you divide a minute (60 seconds) into five-second intervals, you get 12. Based on this math, you create 12 branches, in a single parallel state, in the state machine. Each branch triggers the metric-gathering Lambda function at a different five-second marker throughout the one-minute period. After all of the parallel branches finish executing, the Step Functions execution completes and another begins.
Follow these steps to create your Step Functions state machine:
Open the Step Functions console.
Choose Dashboard, Create State Machine.
For State Machine Name, enter WriteMetricFromStepFunction.
Enter the state machine code into the editor; an abbreviated sketch appears after these steps. Make sure that you insert your own AWS account ID for every instance of “676655494xxx”.
Choose Create State Machine.
Select the WriteMetricFromStepFunction IAM role that you previously created.
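The full state machine code isn’t reproduced here. An abbreviated sketch, with ten of the twelve parallel branches omitted, might look like the following; the state names are illustrative, and every Task state points at the same Lambda function.
{
  "Comment": "Invokes the metric-gathering Lambda function every 5 seconds for one minute (branches 3-12 omitted from this sketch)",
  "StartAt": "GatherMetrics",
  "States": {
    "GatherMetrics": {
      "Type": "Parallel",
      "End": true,
      "Branches": [
        {
          "StartAt": "WriteMetricAt0",
          "States": {
            "WriteMetricAt0": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "WaitFor5",
          "States": {
            "WaitFor5": { "Type": "Wait", "Seconds": 5, "Next": "WriteMetricAt5" },
            "WriteMetricAt5": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        }
      ]
    }
  }
}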
Now you’ve got a shiny new Step Functions state machine! However, you might ask yourself, “After the state machine has been created, how does it get executed?” Before I answer that question, create the Lambda function that writes the custom metric, and then you get the end-to-end process moving.
Create a Lambda function
The meaty part of the solution is a Lambda function, written for the Python 3.6 runtime, that retrieves metric values from ECS and then writes them to CloudWatch. This Lambda function is what the Step Functions state machine triggers every five seconds, via the Task states. Key points to remember:
The Lambda function needs permission to:
Write CloudWatch metrics (PutMetricData API).
Retrieve metrics from ECS clusters (DescribeClusters API).
Write StdOut to CloudWatch Logs.
Boto3, the AWS SDK for Python, is included in the Lambda execution environment for Python 2.x and 3.x.
Because Lambda includes the AWS SDK, you don’t have to worry about packaging it up and uploading it to Lambda. You can focus on writing code and automatically take a dependency on boto3.
As for permissions, you’ve already created the IAM role and attached a policy to it that enables your Lambda function to access the necessary API actions. When you create your Lambda function, make sure that you select the correct IAM role, to ensure it is invoked with the correct permissions.
The following Lambda function code is generic. So how does the Lambda function know which ECS cluster to gather metrics for? Your Step Functions state machine automatically passes in its state to the Lambda function. When you create your CloudWatch Events rule, you specify a simple JSON object that passes the desired ECS cluster name into your Step Functions state machine, which then passes it to the Lambda function.
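The function code itself isn’t reproduced here. A minimal sketch of such a handler, assuming the cluster-level task counts are the metrics of interest, might look like the following; the metric names are illustrative, and the ECSClusterName key matches the input you define for the CloudWatch Events rule later in this post.
import boto3

ecs = boto3.client('ecs')
cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    # The state machine passes its input straight through, e.g.
    # {"ECSClusterName": "ECSEsgaroth"} from the CloudWatch Events rule.
    cluster_name = event['ECSClusterName']
    cluster = ecs.describe_clusters(clusters=[cluster_name])['clusters'][0]

    metrics = {
        'RunningTaskCount': cluster['runningTasksCount'],
        'PendingTaskCount': cluster['pendingTasksCount'],
        'RegisteredContainerInstancesCount': cluster['registeredContainerInstancesCount'],
    }

    # StorageResolution=1 marks these as high-resolution (per-second) metrics.
    cloudwatch.put_metric_data(
        Namespace='ECS',
        MetricData=[
            {
                'MetricName': name,
                'Dimensions': [{'Name': 'ClusterName', 'Value': cluster_name}],
                'Value': float(value),
                'Unit': 'Count',
                'StorageResolution': 1,
            }
            for name, value in metrics.items()
        ],
    )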
Use the following property values as you create your Lambda function:
Function Name: WriteMetricFromStepFunction Description: This Lambda function retrieves metric values from an ECS cluster and writes them to Amazon CloudWatch. Runtime: Python3.6 Memory: 128 MB IAM Role: WriteMetricFromStepFunction
Now you’ve created an IAM role and policy, Step Functions state machine, and Lambda function. How do these components actually start communicating with each other? The final step in this process is to set up a CloudWatch Events rule that triggers your metric-gathering Step Functions state machine every minute. You have two choices for your CloudWatch Events rule expression: rate or cron. In this example, use the cron expression.
A couple key learning points from creating the CloudWatch Events rule:
You can specify one or more targets, of different types (for example, Lambda function, Step Functions state machine, SNS topic, and so on).
You’re required to specify an IAM role with permissions to trigger your target. NOTE: This applies only to certain types of targets, including Step Functions state machines.
Each target that supports IAM roles can be triggered using a different IAM role, in the same CloudWatch Events rule.
Optional: You can provide custom JSON that is passed to your target Step Functions state machine as input.
Follow these steps to create the CloudWatch Events rule:
Open the CloudWatch console.
Choose Events, Rules, Create Rule.
Select Schedule, Cron Expression, and then enter the following rule: 0/1 * * * ? *
Choose Add Target, Step Functions State Machine, WriteMetricFromStepFunction.
For Configure Input, select Constant (JSON Text).
Enter the following JSON input, which is passed to Step Functions, while changing the cluster name accordingly: { "ECSClusterName": "ECSEsgaroth" }
Choose Use Existing Role, WriteMetricFromStepFunction (the IAM role that you previously created).
After you’ve completed these steps, your screen should look similar to this:
Validate the solution
Now that you have finished implementing the solution to gather high-resolution metrics from ECS, validate that it’s working properly.
Open the CloudWatch console.
Choose Metrics.
Choose custom and select the ECS namespace.
Choose the ClusterName metric dimension.
You should see your metrics listed below.
Troubleshoot configuration issues
If you aren’t receiving the expected ECS cluster metrics in CloudWatch, check for the following common configuration issues. Review the earlier procedures to make sure that the resources were properly configured.
The IAM role’s trust relationship is incorrectly configured. Make sure that the IAM role trusts Lambda, CloudWatch Events, and Step Functions in the correct region.
The IAM role does not have the correct policies attached to it. Make sure that you have copied the IAM policy correctly as an inline policy on the IAM role.
The CloudWatch Events rule is not triggering new Step Functions executions. Make sure that the target configuration on the rule has the correct Step Functions state machine and IAM role selected.
The Step Functions state machine is being executed, but failing part way through. Examine the detailed error message on the failed state within the failed Step Functions execution. It’s possible that the IAM role does not have permissions to trigger the target Lambda function, that the target Lambda function does not exist, or that the Lambda function failed to complete successfully due to invalid permissions.
Although the above list covers several potential configuration issues, it is not comprehensive. Make sure that you understand how the services are connected to each other, how permissions are granted through IAM policies, and how IAM trust relationships work.
Conclusion
In this post, you implemented a Serverless solution to gather and record high-resolution application metrics from containers running on Amazon ECS into CloudWatch. The solution consists of a Step Functions state machine, Lambda function, CloudWatch Events rule, and an IAM role and policy. The data that you gather from this solution helps you rapidly identify issues with an ECS cluster.
To gather high-resolution metrics from any service, modify your Lambda function to gather the correct metrics from your target. If you prefer not to use Python, you can implement a Lambda function using one of the other supported runtimes, including Node.js, Java, or .NET Core. However, this post should give you the fundamentals of capturing high-resolution metrics in CloudWatch.
If you found this post useful, or have questions, please comment below.
This post courtesy of ECS Sr. Software Dev Engineer Anirudh Aithal.
Today, AWS announced Task Networking for Amazon ECS. This feature brings Amazon EC2 networking capabilities to tasks using elastic network interfaces.
An elastic network interface is a virtual network interface that you can attach to an instance in a VPC. When you launch an EC2 virtual machine, an elastic network interface is automatically provisioned to provide networking capabilities for the instance.
A task is a logical group of running containers. Previously, tasks running on Amazon ECS shared the elastic network interface of their EC2 host. Now, the new awsvpc networking mode lets you attach an elastic network interface directly to a task.
This simplifies network configuration, allowing you to treat each container just like an EC2 instance with full networking features, segmentation, and security controls in the VPC.
In this post, I cover how awsvpc mode works and show you how you can start using elastic network interfaces with your tasks running on ECS.
Background: Elastic network interfaces in EC2
When you launch EC2 instances within a VPC, you don’t have to configure an additional overlay network for those instances to communicate with each other. By default, routing tables in the VPC enable seamless communication between instances and other endpoints. This is made possible by virtual network interfaces in VPCs called elastic network interfaces. Every EC2 instance that launches is automatically assigned an elastic network interface (the primary network interface). All networking parameters—such as subnets, security groups, and so on—are handled as properties of this primary network interface.
Furthermore, an IPv4 address is allocated to every elastic network interface by the VPC at creation (the primary IPv4 address). This primary address is unique and routable within the VPC. This effectively makes your VPC a flat network, resulting in a simple networking topology.
Elastic network interfaces can be treated as fundamental building blocks for connecting various endpoints in a VPC, upon which you can build higher-level abstractions. This allows elastic network interfaces to be leveraged for:
VPC-native IPv4 addressing and routing (between instances and other endpoints in the VPC)
Network traffic isolation
Network policy enforcement using ACLs and firewall rules (security groups)
IPv4 address range enforcement (via subnet CIDRs)
Why use awsvpc?
Previously, ECS relied on the networking capability provided by Docker’s default networking behavior to set up the network stack for containers. With the default bridge network mode, containers on an instance are connected to each other using the docker0 bridge. Containers use this bridge to communicate with endpoints outside of the instance, using the primary elastic network interface of the instance on which they are running. Containers share and rely on the networking properties of the primary elastic network interface, including the firewall rules (security group subscription) and IP addressing.
This means you cannot address these containers with the IP address allocated by Docker (it’s allocated from a pool of locally scoped addresses), nor can you enforce finely grained network ACLs and firewall rules. Instead, containers are addressable in your VPC by the combination of the IP address of the primary elastic network interface of the instance, and the host port to which they are mapped (either via static or dynamic port mapping). Also, because a single elastic network interface is shared by multiple containers, it can be difficult to create easily understandable network policies for each container.
The awsvpc networking mode addresses these issues by provisioning elastic network interfaces on a per-task basis. Hence, containers no longer share or contend for these resources. This enables you to:
Run multiple copies of the container on the same instance using the same container port without needing to do any port mapping or translation, simplifying the application architecture.
Extract higher network performance from your applications as they no longer contend for bandwidth on a shared bridge.
Enforce finer-grained access controls for your containerized applications by associating security group rules for each Amazon ECS task, thus improving the security for your applications.
Associating security group rules with a container or containers in a task allows you to restrict the ports and IP addresses from which your application accepts network traffic. For example, you can enforce a policy allowing SSH access to your instance, but blocking the same for containers. Alternatively, you could also enforce a policy where you allow HTTP traffic on port 80 for your containers, but block the same for your instances. Enforcing such security group rules greatly reduces the surface area of attack for your instances and containers.
ECS manages the lifecycle and provisioning of elastic network interfaces for your tasks, creating them on-demand and cleaning them up after your tasks stop. You can specify the same properties for the task as you would when launching an EC2 instance. This means that containers in such tasks are:
Addressable by IP addresses and the DNS name of the elastic network interface
Attachable as ‘IP’ targets to Application Load Balancers and Network Load Balancers
Observable from VPC flow logs
Access controlled by security groups
This also enables you to run multiple copies of the same task definition on the same instance, without needing to worry about port conflicts. You benefit from higher performance because you don’t need to perform any port translations or contend for bandwidth on the shared docker0 bridge, as you do with the bridge networking mode.
Getting started
If you don’t already have an ECS cluster, you can create one using the create cluster wizard. In this post, I use “awsvpc-demo” as the cluster name. Also, if you are following along with the command line instructions, make sure that you have the latest version of the AWS CLI or SDK.
Registering the task definition
The only change to make in your task definition for task networking is to set the networkMode parameter to awsvpc. In the ECS console, enter this value for Network Mode.
If you plan on registering a container in this task definition with an ECS service, also specify a container port in the task definition. This example specifies an NGINX container exposing port 80:
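The task definition JSON itself isn’t shown in this post; a minimal sketch might look like the following, where the CPU and memory values are illustrative:
{
  "family": "nginx-awsvpc",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "nginx",
      "image": "nginx:latest",
      "cpu": 100,
      "memory": 512,
      "essential": true,
      "portMappings": [
        { "containerPort": 80, "protocol": "tcp" }
      ]
    }
  ]
}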
This creates a task definition named “nginx-awsvpc” with the networking mode set to awsvpc. The following command illustrates registering the task definition from the command line:
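Assuming the sketch above is saved as nginx-awsvpc.json, registration might look like this:
aws ecs register-task-definition --cli-input-json file://nginx-awsvpc.json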
To run a task with this task definition, navigate to the cluster in the Amazon ECS console and choose Run new task. Specify the task definition as “nginx-awsvpc”. Next, specify the set of subnets in which to run this task. You must have instances registered with ECS in at least one of these subnets. Otherwise, ECS can’t find a candidate instance to attach the elastic network interface.
You can use the console to narrow down the subnets by selecting a value for Cluster VPC:
Next, select a security group for the task. For the purposes of this example, create a new security group that allows ingress only on port 80. Alternatively, you can also select security groups that you’ve already created.
Next, run the task by choosing Run Task.
You should have a running task now. If you look at the details of the task, you see that it has an elastic network interface allocated to it, along with the IP address of the elastic network interface:
When you describe an “awsvpc” task, details of the elastic network interface are returned via the “attachments” object. You can also get this information from the “containers” object. For example:
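The response excerpt isn’t reproduced here; an abridged attachments object might look like the following sketch, where the subnet and network interface IDs are placeholders and the address matches the example below:
"attachments": [
  {
    "type": "ElasticNetworkInterface",
    "status": "ATTACHED",
    "details": [
      { "name": "subnetId", "value": "subnet-0abc1234" },
      { "name": "networkInterfaceId", "value": "eni-0123456789abcdef0" },
      { "name": "privateIPv4Address", "value": "10.0.0.35" }
    ]
  }
]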
The nginx container is now addressable in your VPC via the 10.0.0.35 IPv4 address. You did not have to modify the security group on the instance to allow requests on port 80, thus improving instance security. Also, you ensured that all ports apart from port 80 were blocked for this application without modifying the application itself, which makes it easier to manage your task on the network. You did not have to interact with any of the elastic network interface API operations, as ECS handled all of that for you.
In this post, I take a closer look at how this new container-native “awsvpc” network mode is implemented using container networking interface plugins on ECS managed instances (referred to as container instances).
This post is a deep dive into how task networking works with Amazon ECS. If you want to learn more about how you can start using task networking for your containerized applications, see Introducing Cloud Native Networking for Amazon ECS Containers. Cloud Native Computing Foundation (CNCF) hosts the Container Networking Interface (CNI) project, which consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers. For more about cloud native computing in AWS, see Adrian Cockcroft’s post on Cloud Native Computing.
Container instance setup
Before I discuss the details of enabling task networking on container instances, look at how a typical instance looks in ECS.
The diagram above shows a typical container instance. The ECS agent, which itself is running as a container, is responsible for:
Registering the EC2 instance with the ECS backend
Ensuring that task state changes communicated to it by the ECS backend are enacted on the container instance
Interacting with the Docker daemon to create, start, stop, and monitor containers
Relaying container state and task state transitions to the ECS backend
Because the ECS agent is just acting as the supervisor for containers under its management, it offloads the problem of setting up networking for containers to either the Docker daemon (for containers configured with one of Docker’s default networking modes) or a set of CNI plugins (for containers in tasks with the networking mode set to awsvpc).
In either case, network stacks of containers are configured via network namespaces. As per the ip-netns(8) manual, “A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices.” The network namespace construct makes the partitioning of network stack between processes and containers running on a host possible.
Network namespaces and CNI plugins
CNI plugins are executable files that comply with the CNI specification and configure the network connectivity of containers. The CNI project defines a specification for the plugins and provides a library for interacting with plugins, thus providing a consistent, reliable, and simple interface with which to interact with the plugins.
You specify the container or its network namespace and invoke the plugin with the ADD command to add network interfaces to a container, and then the DEL command to tear them down. For example, the reference bridge plugin adds all containers on the same host into a bridge that resides in the host network namespace.
This plugin model fits in nicely with the ECS agent’s “minimal intrusion in the container lifecycle” model, as the agent doesn’t need to concern itself with the details of the network setup for containers. It’s also an extensible model, which allows the agent to switch to a different set of plugins if the need arises in the future. Finally, the ECS agent doesn’t need to monitor the liveness of these plugins, as they are only invoked when required.
Invoking CNI plugins from the ECS agent
When ECS attaches an elastic network interface to the instance and sends the message to the agent to provision the elastic network interface for containers in a task, the elastic network interface (as with any network device) shows up in the global default network namespace of the host. The ECS agent invokes a chain of CNI plugins to ensure that the elastic network interface is configured appropriately in the container’s network namespace. You can review these plugins in the amazon-ecs-cni-plugins GitHub repo.
The first plugin invoked in this chain is the ecs-eni plugin, which ensures that the elastic network interface is attached to the container’s network namespace and configured with the VPC-allocated IP addresses and the default route to use the subnet gateway. The container also needs to make HTTP requests to the credentials endpoint (hosted by the ECS agent) to get IAM role credentials. This is handled by the ecs-bridge and ecs-ipam plugins, which are invoked next. The CNI library provides mechanisms to interpret the results from the execution of these plugins, which enables efficient error handling in the agent. The following diagram illustrates the different steps in this process:
To avoid a race condition between configuring the network stack and commands being invoked in application containers, the ECS agent creates an additional “pause” container for each task before starting the containers in the task definition. It then sets up the network namespace of the pause container by executing the previously mentioned CNI plugins, and starts the rest of the containers in the task so that they share the network stack of the pause container. This means that all containers in a task are addressable by the IP addresses of the elastic network interface, and they can communicate with each other over the localhost interface.
In this example setup, you have two containers in a task behind an elastic network interface. The following commands show that they have a similar view of the network stack and can talk to each other over the localhost interface.
List the last three containers running on the host (you launched a task with two containers and the ECS agent launched the additional container to configure the network namespace):
$ docker ps -n 3 --format "{{.ID}}\t{{.Names}}\t{{.Command}}\t{{.Status}}"
7d7b7fbc30b9 ecs-front-envoy-5-envoy-sds-ecs-ce8bd9eca6dd81a8d101 "/bin/sh -c '/usr/..." Up 3 days
dfdcb2acfc91 ecs-front-envoy-5-front-envoy-faeae686adf9c1d91000 "/bin/sh -c '/usr/..." Up 3 days
f731f6dbb81c ecs-front-envoy-5-internalecspause-a8e6e19e909fa9c9e901 "./pause" Up 3 days
List interfaces for these containers and make sure that they are the same:
$ for id in `docker ps -n 3 -q`; do pid=`docker inspect $id -f '{{.State.Pid}}'`; echo container $id; sudo nsenter -t $pid -n ip link show; done
container 7d7b7fbc30b9
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: ecs-eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 0a:58:a9:fe:ac:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
27: eth12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 02:5a:a1:1a:43:42 brd ff:ff:ff:ff:ff:ff
container dfdcb2acfc91
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: ecs-eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 0a:58:a9:fe:ac:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
27: eth12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 02:5a:a1:1a:43:42 brd ff:ff:ff:ff:ff:ff
container f731f6dbb81c
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: ecs-eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 0a:58:a9:fe:ac:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
27: eth12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 02:5a:a1:1a:43:42 brd ff:ff:ff:ff:ff:ff
Starting today, customers can keep their container image repositories tidy by automatically removing old or unused images using lifecycle policies, now available as part of Amazon EC2 Container Registry (Amazon ECR).
Amazon ECR is a fully managed Docker container registry that makes it easy to store, manage, and deploy Docker container images without worrying about the typical challenges of scaling a service to handle pulling hundreds of images at one time. This scale means that development teams using Amazon ECR actively often find that their repositories fill up with many container image versions. This makes it difficult to find the code changes that matter and incurs unnecessary storage costs. Previously, cleaning up your repository meant spending time manually deleting old images, or writing and executing scripts.
Now, lifecycle policies allow you to define a set of rules to remove old container images automatically. You can also preview rules to see exactly which container images are affected when the rule runs. This allows repositories to be better organized, makes it easier to find the code revisions that matter, and lowers storage costs.
Look at how lifecycle policies work.
Ground Rules
One of the biggest benefits of deploying code in containers is the ability to quickly and easily roll back to a previous version. You can deploy with less risk because, if something goes wrong, it is easy to revert to the previous container version and know that your application will run as it did before the failed deployment. Most people probably never roll back past a few versions. If your situation is similar, one simple lifecycle rule might be to keep only the last 30 images.
Last 30 Images
In your ECR registry, choose Dry-Run Lifecycle Rules, Add.
For Image Status, select Untagged.
Under Match criteria, for Count Type, enter Image Count More Than.
For Count Number, enter 30.
For Rule action, choose expire.
Choose Save. To see which images would be cleaned up, choose Save and dry-run rules.
Of course, there are teams who, for compliance reasons, might prefer to keep certain images for a period of time, rather than keeping by count. For that situation, you can choose to clean up images older than 90 days.
Last 90 Days
Select the rule that you just created and choose Edit. Change the parameters to keep only 90 days of untagged images:
Under Match criteria, for Count Type, enter Since Image Pushed.
For Count Number, enter 90.
For Count Unit, enter days.
Tags
Certainly, 90 days is an arbitrary timeframe, and your team might have policies in place that require a longer timeframe for certain kinds of images. If that’s the case, but you still want to continue with the spring cleaning, consider removing images based on their tag prefixes.
Here is the list of rules I came up with to groom untagged, development, staging, and production images:
Remove untagged images over 90 days old
Remove development-tagged images over 90 days old
Remove staging-tagged images over 180 days old
Remove production-tagged images over 1 year old
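Expressed as an ECR lifecycle policy document, the first two of these rules might look like the following sketch. The “dev” tag prefix is an assumption; the staging and production rules follow the same pattern with different prefixes and ages.
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Remove untagged images over 90 days old",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 90
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Remove development-tagged images over 90 days old",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["dev"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 90
      },
      "action": { "type": "expire" }
    }
  ]
}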
As you can see, the new Amazon ECR lifecycle policies are powerful, and help you easily keep the images you need, while cleaning out images you may never use again. This feature is available starting today, in all regions where Amazon ECR is available, at no extra charge. For more information, see Amazon ECR Lifecycle Policies in the AWS technical documentation.
When you use Amazon Relational Database Service (Amazon RDS), depending on the logging levels on the RDS instances and the volume of transactions, you could generate a lot of log data. To ensure that everything is running smoothly, many customers search for log error patterns using different log aggregation and visualization systems, such as Amazon Elasticsearch Service, Splunk, or another tool of their choice. A module needs to periodically retrieve the RDS logs using the SDK and then send them to Amazon S3. From there, you can stream them to your log aggregation tool.
One option is writing an AWS Lambda function to retrieve the log files. However, depending on the volume of log files retrieved and transferred, the function could exceed the Lambda execution time limit. Another approach is launching an Amazon EC2 instance that runs this job periodically. However, this would require you to run an EC2 instance continuously, which is not an optimal use of time or money.
Using the new Amazon CloudWatch integration with Amazon EC2 Container Service, you can trigger this job to run in a container on an existing Amazon ECS cluster. Additionally, this would allow you to improve costs by running containers on a fleet of Spot Instances.
In this post, I will show you how to use the new scheduled tasks (cron) feature in Amazon ECS and launch tasks using CloudWatch events, while leveraging Spot Fleet to maximize availability and cost optimization for containerized workloads.
Architecture
The following diagram shows how the various components described schedule a task that retrieves log files from Amazon RDS database instances, and deposits the logs into an S3 bucket.
Amazon ECS cluster container instances are using Spot Fleet, which is a perfect match for a workload that can run whenever capacity is available. This reduces cluster costs.
The task definition defines which Docker image to retrieve from the Amazon EC2 Container Registry (Amazon ECR) repository and run on the Amazon ECS cluster.
The container image has Python code that makes AWS API calls using boto3. It iterates over the RDS database instances, retrieves the logs, and deposits them in the S3 bucket. Many customers then deliver these logs to their centralized log store. CloudWatch Events defines the schedule on which the container task is launched.
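The container’s code isn’t reproduced in this post, but its core loop might look like the following sketch. The function name and bucket parameter are illustrative, and a production version would page through download_db_log_file_portion using its Marker token for large log files.
import boto3

rds = boto3.client('rds')
s3 = boto3.client('s3')

def transfer_rds_logs(bucket_name):
    # Iterate over every RDS database instance in the region.
    for db in rds.describe_db_instances()['DBInstances']:
        db_id = db['DBInstanceIdentifier']
        for log in rds.describe_db_log_files(DBInstanceIdentifier=db_id)['DescribeDBLogFiles']:
            # Download the log file and deposit it in the S3 bucket.
            portion = rds.download_db_log_file_portion(
                DBInstanceIdentifier=db_id,
                LogFileName=log['LogFileName'],
            )
            s3.put_object(
                Bucket=bucket_name,
                Key='{}/{}'.format(db_id, log['LogFileName']),
                Body=(portion.get('LogFileData') or '').encode('utf-8'),
            )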
Walkthrough
To provide the basic framework, we have built an AWS CloudFormation template that creates the following resources:
Amazon ECR repository for storing the Docker image to be used in the task definition
S3 bucket that holds the transferred logs
Task definition, with image name and S3 bucket as environment variables provided via input parameter
CloudWatch Events rule
Amazon ECS cluster
Amazon ECS container instances using Spot Fleet
IAM roles required for the container instance profiles
Before you begin
Ensure that Git, Docker, and the AWS CLI are installed on your computer.
Run the CloudFormation stack and get the names for the Amazon ECR repo and S3 bucket. In the stack, choose Outputs.
Open the ECS console and choose Repositories. The rdslogs repo has been created. Choose View Push Commands and follow the instructions to connect to the repository and push the image for the code that you built in Step 2. The screenshot shows the final result:
Associate the CloudWatch scheduled task with the created Amazon ECS task definition, using a new CloudWatch event rule that is scheduled to run at intervals. The following rule is scheduled to run every 15 minutes: aws --profile default --region us-west-2 events put-rule --name demo-ecs-task-rule --schedule-expression "rate(15 minutes)"
CloudWatch requires IAM permissions to place a task on the Amazon ECS cluster when the CloudWatch event rule is executed, in addition to an IAM role that can be assumed by CloudWatch Events. This is done in three steps:
Create the IAM role to be assumed by CloudWatch. aws --profile default --region us-west-2 iam create-role --role-name Test-Role --assume-role-policy-document file://event-role.json
Create the IAM policy defining the ECS cluster and task definition. You need to get these values from the CloudFormation outputs and resources. aws --profile default --region us-west-2 iam create-policy --policy-name test-policy --policy-document file://event-policy.json
Attach the IAM policy to the role. aws --profile default --region us-west-2 iam attach-role-policy --role-name Test-Role --policy-arn arn:aws:iam::1234567890:policy/test-policy
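The two policy documents referenced in these commands aren’t reproduced in this post; minimal sketches might look like the following, with the task definition ARN taken from your CloudFormation outputs. First, event-role.json, the trust policy that lets CloudWatch Events assume the role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "events.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
And event-policy.json, which allows the role to place the task on the cluster:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecs:RunTask",
      "Resource": "arn:aws:ecs:us-west-2:12345678901:task-definition/test-cwe-blog-taskdef:*"
    }
  ]
}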
Associate the CloudWatch rule created earlier to place the task on the ECS cluster. The following command shows an example. Replace the AWS account ID and region with your settings. aws events put-targets --rule demo-ecs-task-rule --targets "Id"="1","Arn"="arn:aws:ecs:us-west-2:12345678901:cluster/test-cwe-blog-ecsCluster-15HJFWCH4SP67","EcsParameters"={"TaskDefinitionArn"="arn:aws:ecs:us-west-2:12345678901:task-definition/test-cwe-blog-taskdef:8"},"RoleArn"="arn:aws:iam::12345678901:role/Test-Role"
{
"FailedEntries": [],
"FailedEntryCount": 0
}
That’s it. The log transfer task now runs on the defined schedule.
To test this, open the Amazon ECS console, select the Amazon ECS cluster that you created, and then choose Tasks, Run New Task. Select the task definition created by the CloudFormation template, and the cluster should be selected automatically. As this runs, the S3 bucket should be populated with the RDS logs for the instance.
Conclusion
In this post, you’ve seen that the choices for workloads that need to run at a scheduled time include Lambda with CloudWatch Events or EC2 with cron. However, sometimes the job could run longer than the Lambda execution time limit, or a continuously running EC2 instance might not be cost-effective.
In such cases, you can schedule the tasks on an ECS cluster using CloudWatch rules. In addition, you can use a Spot Fleet cluster with Amazon ECS for cost-conscious workloads that do not have hard requirements on execution time or instance availability in the Spot Fleet. For more information, see Powering your Amazon ECS Cluster with Amazon EC2 Spot Instances and Scheduled Events.
If you have questions or suggestions, please comment below.
This post and accompanying code graciously contributed by:
Huy Huynh Sr. Solutions Architect
Magnus Bjorkman Solutions Architect
Java is a popular language used by many enterprises today. To simplify and accelerate Java application development, many companies are moving from a monolithic to microservices architecture. For some, it has become a strategic imperative. Containerization technology, such as Docker, lets enterprises build scalable, robust microservice architectures without major code rewrites.
In this post, I cover how to containerize a monolithic Java application to run on Docker. Then, I show how to deploy it on AWS using Amazon EC2 Container Service (Amazon ECS), a high-performance container management service. Finally, I show how to break the monolith into multiple services, all running in containers on Amazon ECS.
Application Architecture
For this example, I use the Spring Pet Clinic, a monolithic Java application for managing a veterinary practice. It is a simple REST API, which allows the client to manage and view Owners, Pets, Vets, and Visits.
It is a simple three-tier architecture:
Client You simulate this by using curl commands.
Web/app server This is the Java and Spring-based application that you run using the embedded Tomcat. As part of this post, you run this within Docker containers.
Database server This is the relational database for your application that stores information about owners, pets, vets, and visits. For this post, use MySQL RDS.
I decided to not put the database inside a container as containers were designed for applications and are transient in nature. The choice was made even easier because you have a fully managed database service available with Amazon RDS.
RDS manages the work involved in setting up a relational database, from provisioning the infrastructure capacity that you request to installing the database software. After your database is up and running, RDS automates common administrative tasks, such as performing backups and patching the software that powers your database. With optional Multi-AZ deployments, Amazon RDS also manages synchronous data replication across Availability Zones with automatic failover.
You need the following to walk through this solution:
An AWS account
An access key and secret key for a user in the account
The AWS CLI installed
Also, install the latest versions of the following:
Java
Maven
Python
Docker
Step 1: Move the existing Java Spring application to a container deployed using Amazon ECS
First, move the existing monolith application to a container and deploy it using Amazon ECS. This is a great first step, because you get some benefits even before breaking the monolith apart:
An improved pipeline. Containers allow an engineering organization to create a standard pipeline for the application lifecycle.
The following diagram is an overview of what the setup looks like for Amazon ECS and related services:
This setup consists of the following resources:
The client application that makes a request to the load balancer.
The load balancer that distributes requests across all available ports and instances registered in the application’s target group using round-robin.
The target group that is updated by Amazon ECS to always have an up-to-date list of all the service containers in the cluster. This includes the port on which they are accessible.
One Amazon ECS cluster that hosts the container for the application.
A VPC network to host the Amazon ECS cluster and associated security groups.
Each container has a single application process that is bound to port 8080 within its namespace. In reality, each container is exposed on a different, randomly assigned port on the host.
The architecture is containerized but still monolithic, because each container has all the same features as the rest of the containers.
The following is also part of the solution but not depicted in the above diagram:
A service/task definition that spins up containers on the instances of the Amazon ECS cluster.
A MySQL RDS instance that hosts the application’s schema. The information about the MySQL RDS instance is sent in through environment variables to the containers, so that the application can connect to it.
The Python script calls the CloudFormation template for the initial setup of the VPC, Amazon ECS cluster, and RDS instance. It then extracts the outputs from the template and uses those for API calls to create Amazon ECR repositories, tasks, services, Application Load Balancer, and target groups.
Environment variables and Spring properties binding
As part of the Python script, you pass in a number of environment variables to the container as part of the task/container definition:
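The variable list isn’t reproduced in this post; inside the container definition it might look like the following sketch. SPRING_PROFILES_ACTIVE is referenced below; the datasource variable names rely on Spring Boot’s relaxed binding and are assumptions here.
"environment": [
  { "name": "SPRING_PROFILES_ACTIVE", "value": "mysql" },
  { "name": "SPRING_DATASOURCE_URL", "value": "jdbc:mysql://<rds-endpoint>:3306/petclinic" },
  { "name": "SPRING_DATASOURCE_USERNAME", "value": "<db-user>" },
  { "name": "SPRING_DATASOURCE_PASSWORD", "value": "<db-password>" }
]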
The preceding environment variables work in concert with the Spring property system. The value in the variable SPRING_PROFILES_ACTIVE makes Spring use the MySQL version of the application property file. The other environment variables override properties in that file, such as the datasource URL and credentials.
Use the Spotify Docker Maven plugin to create the image and push it directly to Amazon ECR as part of the regular Maven build, integrating image generation into the overall build process. Use an explicit Dockerfile as input to the plugin:
# Minimal JDK 8 base image
FROM frolvlad/alpine-oraclejdk8:slim
# Mount point for the temporary files that Spring Boot writes
VOLUME /tmp
# Copy the application JAR built by Maven into the image as /app.jar
ADD spring-petclinic-rest-1.7.jar app.jar
# Give the JAR a current modification time
RUN sh -c 'touch /app.jar'
# Allow JVM options to be overridden at run time
ENV JAVA_OPTS=""
# Start the application; the egd setting speeds up secure random seeding
ENTRYPOINT [ "sh", "-c", "java $JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -jar /app.jar" ]
The Python script discussed earlier uses the AWS CLI to authenticate you with AWS. It places the resulting token in the appropriate location so that the plugin can push directly to the Amazon ECR repository.
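The post doesn't show how the token is obtained; one way a script can do it (a sketch assuming boto3, not necessarily the script's actual approach) is to request an ECR authorization token and feed it to docker login:

import base64
import subprocess
import boto3

ecr = boto3.client('ecr')

# The token is a base64-encoded "user:password" pair, valid for 12 hours.
auth = ecr.get_authorization_token()['authorizationData'][0]
user, password = base64.b64decode(
    auth['authorizationToken']).decode().split(':', 1)

# Log Docker in to the registry endpoint so that pushes to ECR succeed.
subprocess.run(['docker', 'login', '-u', user, '-p', password,
                auth['proxyEndpoint']], check=True)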
Test setup
You can test the setup by running the Python script: python setup.py -m setup -r <your region>
After the script has successfully run, you can test by querying an endpoint: curl <your endpoint from output above>/owner
You can clean this up before going to the next section: python setup.py -m cleanup -r <your region>
Step 2: Convert the monolith into microservices running on Amazon ECS
The second step is to convert the monolith into microservices. For a real application, you would likely not do this in one step, but rather re-architect the application piece by piece. You would continue to run the monolith, but it would keep getting smaller as each piece is broken apart.
By migrating to microservices, you get four main benefits:
Isolation of crashes: If one microservice in your application is crashing, then only that part of your application goes down. The rest of your application continues to work properly.
Isolation of security: When microservice best practices are followed, the result is that if an attacker compromises one service, they only gain access to the resources of that service. They can't horizontally access other resources from other services without breaking into those services as well.
Independent scaling: When features are broken out into microservices, the amount of infrastructure and number of instances of each microservice class can be scaled up and down independently.
Development velocity: In a monolith, adding a new feature can potentially impact every other feature that the monolith contains. A proper microservice architecture, on the other hand, has new code for a new feature going into a new service. You can be confident that any code you write won't impact the existing code at all, unless you explicitly write a connection between two microservices.
Find the microservices example at 2_ECS_Java_Spring_PetClinic_Microservices. You break apart the Spring Pet Clinic application by creating a microservice for each REST API operation, as well as one for the system services.
Java code changes
Comparing the project structure between the monolith and the microservices version, you can see that each service is now its own separate build. In the monolith version, each API operation is a subpackage under the org.springframework.samples.petclinic package, all part of the same monolithic application. In the microservices version, each API operation is its own separate build, which you can build and deploy independently. Some code is also duplicated across the different microservices, such as the classes under the model subpackage. This is intentional: you don't want to introduce artificial dependencies among the microservices, and duplicating the classes allows them to evolve independently for each microservice.
You also make the API operations more loosely coupled. In the monolithic version, the components are tightly coupled and use object-based invocation.
Here is an example of this from the OwnerController operation, where the class is directly calling PetRepository to get information about pets. PetRepository is the Repository class (Spring data access layer) to the Pet table in the RDS instance for the Pet API:
@RestController
class OwnerController {

    @Inject
    private PetRepository pets;

    @Inject
    private OwnerRepository owners;

    private static final Logger logger = LoggerFactory.getLogger(OwnerController.class);

    @RequestMapping(value = "/owner/{ownerId}/getVisits", method = RequestMethod.GET)
    public ResponseEntity<List<Visit>> getOwnerVisits(@PathVariable int ownerId) {
        // Walk the object graph in-process: load the owner, then
        // collect the visits from each of the owner's pets.
        List<Pet> petList = this.owners.findById(ownerId).getPets();
        List<Visit> visitList = new ArrayList<Visit>();
        petList.forEach(pet -> visitList.addAll(pet.getVisits()));
        return new ResponseEntity<List<Visit>>(visitList, HttpStatus.OK);
    }
}
In the microservices version, call the Pet API operation instead of PetRepository directly. The components are decoupled by using interprocess communication, in this case the REST API. This provides for fault tolerance and disposability.
@RestController
class OwnerController {

    // Endpoint of the Pet microservice, injected from the environment,
    // with a localhost default for local testing.
    @Value("#{environment['SERVICE_ENDPOINT'] ?: 'localhost:8080'}")
    private String serviceEndpoint;

    @Inject
    private OwnerRepository owners;

    private static final Logger logger = LoggerFactory.getLogger(OwnerController.class);

    @RequestMapping(value = "/owner/{ownerId}/getVisits", method = RequestMethod.GET)
    public ResponseEntity<List<Visit>> getOwnerVisits(@PathVariable int ownerId) {
        List<Pet> petList = this.owners.findById(ownerId).getPets();
        List<Visit> visitList = new ArrayList<Visit>();
        petList.forEach(pet -> {
            // Call the Pet service once per pet and reuse the result.
            List<Visit> petVisits = getPetVisits(pet.getId());
            logger.info(petVisits.toString());
            visitList.addAll(petVisits);
        });
        return new ResponseEntity<List<Visit>>(visitList, HttpStatus.OK);
    }

    // Fetch a pet's visits over HTTP from the Pet microservice.
    private List<Visit> getPetVisits(int petId) {
        RestTemplate restTemplate = new RestTemplate();
        Pet pet = restTemplate.getForObject("http://" + serviceEndpoint + "/pet/" + petId, Pet.class);
        logger.info(pet.getVisits().toString());
        return pet.getVisits();
    }
}
You now have an additional method that calls the API. You also hand in the service endpoint to call, so that you can easily inject dynamic endpoints based on the current deployment.
Container deployment overview
Here is an overview of what the setup looks like for Amazon ECS and the related services. It consists of the following resources:
The client application that makes a request to the load balancer.
The Application Load Balancer that inspects the client request. Based on routing rules, it directs the request to an instance and port from the target group that matches the rule.
A target group for each microservice. The target groups are used by the corresponding services to register available container instances. Each target group has a path, so when you call the path for a particular microservice, it is mapped to the correct target group. This allows you to use one Application Load Balancer to serve all the different microservices, accessed by path. For example, https://<load balancer endpoint>/owner/* is mapped and directed to the Owner microservice.
One Amazon ECS cluster that hosts the containers for each microservice of the application.
A VPC network to host the Amazon ECS cluster and associated security groups.
Because you are running multiple containers on the same instances, use dynamic port mapping to avoid port clashing. By using dynamic port mapping, the container is allocated an anonymous port on the host to which the container port (8080) is mapped. The anonymous port is registered with the Application Load Balancer and target group so that traffic is routed correctly.
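As a minimal sketch (assuming boto3; the family name, container name, and image URI are hypothetical), requesting dynamic port mapping in a task definition comes down to setting hostPort to 0:

import boto3

ecs = boto3.client('ecs')

# hostPort 0 asks ECS to assign a random, unused port on the host;
# the service then registers that port with its target group.
ecs.register_task_definition(
    family='petclinic-owner',
    containerDefinitions=[{
        'name': 'owner',
        'image': '<account>.dkr.ecr.<region>.amazonaws.com/owner:latest',
        'memory': 512,
        'portMappings': [{'containerPort': 8080, 'hostPort': 0}],
    }],
)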
The following is also part of the solution:
One Amazon ECR repository for each microservice.
A service/task definition per microservice that spins up containers on the instances of the Amazon ECS cluster.
A MySQL RDS instance that hosts the application's schema. Connection information for the MySQL RDS instance is passed to the containers through environment variables, so that the application can connect to it.
The CloudFormation template remains the same as in the previous section. In the Python script, you now build five different Java applications, one for each microservice (including one for the system services). There is a separate Maven POM file for each one. The resulting Docker images are pushed to their own Amazon ECR repositories and deployed separately using their own service/task definitions. This is critical to getting the benefits of microservices described earlier.
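To make the path-based routing concrete, here is a sketch of the per-microservice wiring, assuming boto3 and hypothetical names and ARNs: a target group, a listener rule that forwards the service's path to it, and a service attached to that target group.

import boto3

elbv2 = boto3.client('elbv2')
ecs = boto3.client('ecs')

# One target group per microservice; here, the Owner service.
tg = elbv2.create_target_group(
    Name='owner', Protocol='HTTP', Port=8080,
    VpcId='<vpc id>', HealthCheckPath='/owner',
)['TargetGroups'][0]['TargetGroupArn']

# Forward /owner/* requests on the shared ALB listener to that
# target group. Rule priorities must be unique per listener.
elbv2.create_rule(
    ListenerArn='<listener arn>',
    Priority=1,
    Conditions=[{'Field': 'path-pattern', 'Values': ['/owner*']}],
    Actions=[{'Type': 'forward', 'TargetGroupArn': tg}],
)

# The service registers its containers (and their dynamic host
# ports) with the target group as tasks start and stop.
ecs.create_service(
    cluster='<cluster name>',
    serviceName='owner',
    taskDefinition='petclinic-owner',
    desiredCount=2,
    role='<ecs service role arn>',
    loadBalancers=[{
        'targetGroupArn': tg,
        'containerName': 'owner',
        'containerPort': 8080,
    }],
)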
Each microservice, such as the Owner microservice, has its own POM file; it builds that service's JAR and uses the same Spotify Docker Maven plugin setup to build and push the image to the service's own repository (see the example project for the full POM file).
As in the first section, you can set up this version by running the Python script: python setup.py -m setup -r <your region>. After the script has successfully run, you can test by querying an endpoint:
curl <your endpoint from output above>/owner
Conclusion
Migrating a monolithic application to a containerized set of microservices can seem like a daunting task. Following the steps outlined in this post, you can containerize a monolithic Java app, take advantage of the container runtime environment, and begin the process of re-architecting it into microservices. On the whole, containerized microservices are faster to develop, easier to iterate on, and more cost effective to maintain and secure.
This post focused on the first steps of microservice migration. You can learn more about optimizing and scaling your microservices with components such as service discovery, blue/green deployment, circuit breakers, and configuration servers at http://aws.amazon.com/containers.
If you have questions or suggestions, please comment below.