For .NET developers, leveraging Team Foundation Server (TFS) has been the cornerstone for CI/CD over the years. As more and more .NET developers start to deploy onto AWS, they have been asking questions about using the same tools to deploy to the AWS cloud. By configuring a pipeline in Azure DevOps to deploy to the AWS cloud, you can easily use familiar Microsoft development tools to build great applications.
This blog post demonstrates how to create a simple Azure DevOps project, repository, and pipeline to deploy an ASP.NET Core web application to Amazon ECS using Azure DevOps. The following screenshot shows a high-level architecture diagram of the pipeline:
In this example, you perform the following steps:
Create an Azure DevOps Project, clone project repo, and push ASP.NET Core web application.
Create a pipeline in Azure DevOps
Build an Amazon ECS Cluster, Task and Service.
Kick-off deployment of the ASP.Net Core web application using the newly create Azure DevOps pipeline.
Ensure you have the following prerequisites set up:
An IAM user with permissions for Amazon ECR and Amazon ECS (the user will need an access key and secret access key)
Create an Azure DevOps Project, clone project repo, and push ASP.NET Core web application
Follow these steps to deploy a .NET Core app onto your Amazon ECS cluster using the Azure DevOps (ADO) repository and pipeline:
Login to dev.azure.com and navigate to the marketplace.
Go to Visual Studio, search for “AWS”, and add the AWS Tools for Microsoft Visual Studio Team Services.
Create a project in ADO: Provide a project name and choose Create.
On the Project Summary page, choose Project Settings.
In the Project Settings pane, navigate to the Service Connections page.
Choose Create service connection, select AWS, and choose Next.
Input an Access Key ID and Secret Access Key. (You’ll need an IAM user with permissions for Amazon ECR and Amazon ECS in order to deploy via the Azure DevOps pipeline.) Choose Save.
Choose Repos in the left pane, then Clone in Visual Studio under Clone to your computer.
Create a ASP.NET Core web application in Visual Studio, set the location to locally cloned repository, and check Enable Docker support.
Once you’ve created the new project, perform an initial commit and push to the repository in Azure DevOps.
Creating a pipeline in Azure DevOps
Now that you have synced the repository, create a pipeline in Azure DevOps.
Go to the pipeline page within Azure DevOps and choose Create Pipeline.
Choose Use the classic editor.
Select Azure Repos Git for the location of your code and select the repository you created earlier.
On the Choose a Template page, select Docker Container and choose Apply.
Remove the Push an image step.
Add an Amazon ECR Push task by choosing the + symbol next to Agent job 1. You can search for “AWS” in the Add tasks pane to filter for all AWS tasks.
Now, configure each task:
Choose the Build an image task and ensure that the action is set to Build an image. Additionally, you can modify the Image Name to your standards.
Choose the Push Image task and provide the following
Enter a name under Display Name.
Select the AWS Credentials that you created in Service Connections.
Select the AWS Region.
Provide the source image name, which you can find in the setting for the Build an image task.
Enter the name of the repository in Amazon ECR to which the image is pushed
Choose Save and queue.
Build Amazon ECS Cluster, Task, and Service
The goal here is to test up to building the Docker image and ensure it’s pushed to Amazon ECR. Once the Docker image is in Amazon ECR, you can create the Amazon ECS cluster, task definition, and service leveraging the newly created Docker image.
Add the last step by choosing the + symbol next to Agent job 1.
Search for “AWS CLI” in the search bar and add the task.
Choose AWS CLI and configure the task.
Enter a name under Display Name, such as Update ECS Service.
Select the AWS Credentials that you created in Service Connections.
Select the AWS Region.
Input the following command, which updates the Amazon ECS service after a new image is pushed to Amazon ECR. Replace <clustername> and <servicename> with your Amazon ECS cluster and service names.
Options and parameters: --cluster <clustername> --service <servicename> --force-new-deployment
Now choose the Triggers tab and select Enable continuous integration with the repository you created.
Choose Save and queue.
At this point, your build pipeline kicks off and builds a Docker image from the source code in the repository you created, pushes the image to Amazon ECR, and updates the Amazon ECS service with the new image.
You can verify by viewing the build. Choose Pipelines in Azure DevOps, selecting the entry for the latest run, and then the icon under the status column. Once it successfully completes, you can log in to the AWS console and view the updated image in Amazon ECR and the updated service in Amazon ECS.
Every time you commit and push your code through Visual Studio, this pipeline kicks off and builds and deploys your application to Amazon ECS.
At the end of this example, once you’ve completed all steps and are finished testing, follow these steps to disable or delete resources to avoid incurring costs:
Go to the Amazon ECS console within the AWS Console.
Navigate to the cluster you created, then choose the Tasks tab.
Choose Stop all to turn off the tasks.
This blog post reviewed how to create a CI/CD pipeline in Azure DevOps to deploy a Docker Image to Amazon ECR and container to Amazon ECS. It provided detailed steps on how to set up a basic CI/CD pipeline, leveraging tools with which .NET developers are familiar and the steps needed to integrate with Amazon ECR and Amazon ECS.
I hope this post was informative and has helped you learn the basics of how to integrate Amazon ECR and Amazon ECS with Azure DevOps to create a robust CI/CD pipeline.
About the Authors
John Formento is a Solution Architect at Amazon Web Services. He helps large enterprises achieve their goals by architecting secure and scalable solutions on the AWS Cloud.
The majority of companies have embraced open-source software (OSS) at an accelerated rate even when building proprietary applications. Some of the obvious benefits for this shift include transparency, cost, flexibility, and a faster time to market. Snyk’s unique combination of developer-first tooling and best in class security depth enables businesses to easily build security into their continuous development process.
Even for teams building proprietary code, use of open-source packages and libraries is a necessity. In reality, a developer’s own code is often a small core within the app, and the rest is open-source software. While relying on third-party elements has obvious benefits, it also presents numerous complexities. Inadvertently introducing vulnerabilities into your codebase through repositories that are maintained in a distributed fashion and with widely varying levels of security expertise can be common, and opens up applications to effective attacks downstream.
There are three common barriers to truly effective open-source security:
The security task remains in the realm of security and compliance, often perpetuating the siloed structure that DevOps strives to eliminate and slowing down release pace.
Current practice may offer automated scanning of repositories, but the remediation advice it provides is manual and often un-actionable.
The data generated often focuses solely on public sources, without unique and timely insights.
Developer-led application security
This blog post demonstrates techniques to improve your application security posture using Snyk tools to seamlessly integrate within the developer workflow using AWS services such as Amazon ECR, AWS Lambda, AWS CodePipeline, and AWS CodeBuild. Snyk is a SaaS offering that organizations use to find, fix, prevent, and monitor open source dependencies. Snyk is a developer-first platform that can be easily integrated into the Software Development Lifecycle (SDLC). The examples presented in this post enable you to actively scan code checked into source code management, container images, and serverless, creating a highly efficient and effective method of managing the risk inherent to open source dependencies.
The examples provided in this post assume that you already have an AWS account and that your account has the ability to create new IAM roles and scope other IAM permissions. You can use your integrated development environment (IDE) of choice. The examples reference AWS Cloud9 cloud-based IDE. An AWS Quick Start for Cloud9 is available to quickly deploy to either a new or existing Amazon VPC and offers expandable Amazon EBS volume size.
Sample code and AWS CloudFormation templates are available to simplify provisioning the various services you need to configure this integration. You can fork or clone those resources. You also need a working knowledge of git and how to fork or clone within your source provider to complete these tasks.
cd ~/environment && \
git clone https://github.com/aws-samples/aws-modernization-with-snyk.git modernization-workshop
git submodule init
git submodule update
Configure your CI/CD pipeline
The workflow for this example consists of a continuous integration and continuous delivery pipeline leveraging AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Amazon ECR, and AWS Fargate, as shown in the following screenshot.
A detailed step-by-step guide can be found in the self-paced workshop, but if you are familiar with AWS CloudFormation, you can launch the templates in three steps. From your Cloud9 IDE terminal, change directory to the location of the sample templates and complete the following three steps.
1) Launch basic services
aws cloudformation create-stack --stack-name WorkshopServices --template-body file://services.yaml \
--capabilities CAPABILITY_NAMED_IAM until [[ `aws cloudformation describe-stacks \
--stack-name "WorkshopServices" --query "Stacks.[StackStatus]" \
--output text` == "CREATE_COMPLETE" ]]; do echo "The stack is NOT in a state of CREATE_COMPLETE at `date`"; sleep 30; done &&; echo "The Stack is built at `date` - Please proceed"
2) Launch Fargate:
aws cloudformation create-stack --stack-name WorkshopECS --template-body file://ecs-fargate.yaml \
--capabilities CAPABILITY_NAMED_IAM until [[ `aws cloudformation describe-stacks \
--stack-name "WorkshopECS" --query "Stacks.[StackStatus]" \
--output text` == "CREATE_COMPLETE" ]]; do echo "The stack is NOT in a state of CREATE_COMPLETE at `date`"; sleep 30; done &&; echo "The Stack is built at `date` - Please proceed"
3) From your Cloud9 IDE terminal, change directory to the location of the sample templates and run the following command:
aws cloudformation create-stack --stack-name WorkshopPipeline --template-body file://pipeline.yaml \
--capabilities CAPABILITY_NAMED_IAM until [[ `aws cloudformation describe-stacks \
--stack-name "WorkshopPipeline" --query "Stacks.[StackStatus]" \
--output text` == "CREATE_COMPLETE" ]]; do echo "The stack is NOT in a state of CREATE_COMPLETE at `date`"; sleep 30; done &&; echo "The Stack is built at `date` - Please proceed"
Improving your security posture
You need to sign up for a free account with Snyk. You may use your Google, Bitbucket, or Github credentials to sign up. Snyk utilizes these services for authentication and does not store your password. Once signed up, navigate to your name and select Account Settings. Under API Token, choose Show, which will reveal the token to copy, and copy this value. It will be unique for each user.
Save your password to the session manager
Run the following command, replacing abc123 with your unique token. This places the token in the session parameter manager.
Next, you need to insert testing with Snyk after maven builds the application. The simplest method is to insert commands to download, authorize, and run the Snyk commands after maven has built the application/dependency tree.
The sample Dockerfile contains an environment variable from a value passed to the docker build command, which contains the token for Snyk. By using an environment variable, Snyk automatically detects the token when used.
# Declare Snyktoken as a build-arg ARG snyk_auth_token
# Set the SNYK_TOKEN environment variable ENV
Download Snyk, and run a test, looking for medium to high severity issues. If the build succeeds, post the results to Snyk for monitoring and reporting. If a new vulnerability is found, you are notified.
# package the application
RUN mvn package -Dmaven.test.skip=true
# download, configure and run snyk. Break build if vulns present, post results to `https://snyk.io/`
RUN curl -Lo ./snyk "https://github.com/snyk/snyk/releases/download/v1.210.0/snyk-linux"
RUN chmod -R +x ./snyk
#Auth set through environment variable
RUN ./snyk test --severity-threshold=medium
RUN ./snyk monitor
Set up docker scanning
Later in the build process, a docker image is created. Analyze it for vulnerabilities in buildspec.yml. First, pull the Snyk token snykAuthToken from the parameter store.
Next, in the build phase, pass the token to the docker compose command, where it is retrieved in the Dockerfile code you set up to test the application.
- echo Build started on `date`
- echo Building the Docker image...
- cd modules/containerize-application
- docker build --build-arg snyk_auth_token=$SNYK_AUTH_TOKEN -t $REPOSITORY_URI:latest.
You can further extend the build phase to authorize the Snyk instance for testing the Docker image that’s produced. If it passes, you can pass the results to Snyk for monitoring and reporting.
For reference, a sample buildspec.yaml configured with Snyk is available in the sample repo. You can either copy this file and overwrite your existing buildspec.yaml or open an editor and replace the contents.
Testing the application
Now that services have been provisioned and Snyk tools have been integrated into your CI/CD pipeline, any new git commit triggers a fresh build and application scanning with Snyk detects vulnerabilities in your code.
In the CodeBuild console, you can look at your build history to see why your build failed, identify security vulnerabilities, and pinpoint how to fix them.
✗ Medium severity vulnerability found in org.primefaces:primefaces
Description: Cross-site Scripting (XSS)
Introduced through: org.primefaces:[email protected]
From: org.primefaces:[email protected]
Upgrade direct dependency org.primefaces:[email protected] to org.primefaces:[email protected] (triggers upgrades to org.primefaces:[email protected])
✗ Medium severity vulnerability found in org.primefaces:primefaces
Description: Cross-site Scripting (XSS)
Introduced through: org.primefaces:[email protected]
From: org.primefaces:[email protected]
Upgrade direct dependency org.primefaces:[email protected] to org.primefaces:[email protected] (triggers upgrades to org.primefaces:[email protected])
Package manager: maven
Target file: pom.xml
Open source: no
Project path: /usr/src/app
Tested 37 dependencies for known vulnerabilities, found 2 vulnerabilities, 2 vulnerable paths.
The command '/bin/sh -c ./snyk test' returned a non-zero code: 1
[Container] 2020/02/14 03:46:22 Command did not exit successfully docker build --build-arg snyk_auth_token=$SNYK_AUTH_TOKEN -t $REPOSITORY_URI:latest . exit status 1
[Container] 2020/02/14 03:46:22 Phase complete: BUILD Success: false
[Container] 2020/02/14 03:46:22 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: docker build --build-arg snyk_auth_token=$SNYK_AUTH_TOKEN -t $REPOSITORY_URI:latest .. Reason: exit status 1
Once you remediate your vulnerabilities and check in your code, another build is triggered and an additional scan is performed by Snyk. This time, you should see the build pass with a status of Succeeded.
You can also drill down into the CodeBuild logs and see that Snyk successfully scanned the Docker Image and found no package dependency issues with your Docker container!
[Container] 2020/02/14 03:54:14 Running command $PWDUTILS/snyk test --docker $REPOSITORY_URI:latest
Package manager: rpm
Docker image: 300326902600.dkr.ecr.us-west-2.amazonaws.com/petstore_frontend:latest
✓ Tested 190 dependencies for known vulnerabilities, no vulnerable paths found.
Snyk provides detailed reports for your imported projects. You can navigate to Projects and choose View Report to set the frequency with which the project is checked for vulnerabilities. You can also choose View Report and then the Dependencies tab to see which libraries were used. Snyk offers a comprehensive database and remediation guidance for known vulnerabilities in their Vulnerability DB. Specifics on potential vulnerabilities that may exist in your code would be contingent on the particular open source dependencies used with your application.
Remember to delete any resources you may have created in order to avoid additional costs. If you used the AWS CloudFormation templates provided here, you can safely remove them by deleting those stacks from the AWS CloudFormation Console.
In this post, you learned how to leverage various AWS services to build a fully automated CI/CD pipeline and cloud IDE development environment. You also learned how to utilize Snyk to seamlessly integrate with AWS and secure your open-source dependencies and container images. If you are interested in learning more about DevSecOps with Snyk and AWS, then I invite you to check out this workshop and watch this video.
About the Author
Jay is a Senior Partner Solutions Architect at AWS bringing over 20 years of experience in various technical roles. He holds a Master of Science degree in Computer Information Systems and is a subject matter expert and thought leader for strategic initiatives that help customers embrace a DevOps culture.
Learn about AWS Services & Solutions – September AWS Online Tech Talks
Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!
Note – All sessions are free and in Pacific Time.
Tech talks this month:
September 23, 2019 | 11:00 AM – 12:00 PM PT – Build Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case.
October 1, 2019 | 11:00 AM – 12:00 PM PT – Robots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.
October 2, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.
This post is contributed by Tony Pujals | Senior Developer Advocate, AWS
AWS recently increased the number of elastic network interfaces available when you run tasks on Amazon ECS. Use the account setting called awsvpcTrunking. If you use the Amazon EC2 launch type and task networking (awsvpc network mode), you can now run more tasks on an instance—5 to 17 times as many—as you did before.
As more of you embrace microservices architectures, you deploy increasing numbers of smaller tasks. AWS now offers you the option of more efficient packing per instance, potentially resulting in smaller clusters and associated savings.
To manage your own cluster of EC2 instances, use the EC2 launch type. Use task networking to run ECS tasks using the same networking properties as if tasks were distinct EC2 instances.
Task networking offers several benefits. Every task launched with awsvpc network mode has its own attached network interface, a primary private IP address, and an internal DNS hostname. This simplifies container networking and gives you more control over how tasks communicate, both with each other and with other services within their virtual private clouds (VPCs).
Task networking also lets you take advantage of other EC2 networking features like VPC Flow Logs. This feature lets you monitor traffic to and from tasks. It also provides greater security control for containers, allowing you to use security groups and network monitoring tools at a more granular level within tasks. For more information, see Introducing Cloud Native Networking for Amazon ECS Containers.
However, if you run container tasks on EC2 instances with task networking, you can face a networking limit. This might surprise you, particularly when an instance has plenty of free CPU and memory. The limit reflects the number of network interfaces available to support awsvpc network mode per container instance.
Raise network interface density limits with trunking
The good news is that AWS raised network interface density limits by implementing a networking feature on ECS called “trunking.” This is a technique for multiplexing data over a shared communication link.
If you’re migrating to microservices using AWS App Mesh, you should optimize network interface density. App Mesh requires awsvpc networking to provide routing control and visibility over an ever-expanding array of running tasks. In this context, increased network interface density might save money.
By opting for network interface trunking, you should see a significant increase in capacity—from 5 to 17 times more than the previous limit. For more information on the new task limits per container instance, see Supported Amazon EC2 Instance Types.
Applications with tasks not hitting CPU or memory limits also benefit from this feature through the more cost-effective “bin packing” of container instances.
Trunking is an opt-in feature
AWS chose to make the trunking feature opt-in due to the following factors:
Instance registration: While normal instance registration is straightforward with trunking, this feature increases the number of asynchronous instance registration steps that can potentially fail. Any such failures might add extra seconds to launch time.
Available IP addresses: The “trunk” belongs to the same subnet in which the instance’s primary network interface originates. This effectively reduces the available IP addresses and potentially the ability to scale out on other EC2 instances sharing the same subnet. The trunk consumes an IP address. With a trunk attached, there are two assigned IP addresses per instance, one for the primary interface and one for the trunk.
Differing customer preferences and infrastructure: If you have high CPU or memory workloads, you might not benefit from trunking. Or, you may not want awsvpc networking.
Consequently, AWS leaves it to you to decide if you want to use this feature. AWS might revisit this decision in the future, based on customer feedback. For now, your account roles or users must opt in to the awsvpcTrunking account setting to gain the benefits of increased task density per container instance.
Enable the ECS elastic network interface trunking feature to increase the number of network interfaces that can be attached to supported EC2 container instance types. You must meet the following prerequisites before you can launch a container instance with the increased network interface limits:
Your account must have the AWSServiceRoleForECS service-linked role for ECS.
You must opt into the awsvpcTrunking account setting.
Make sure that a service-linked role exists for ECS
A service-linked role is a unique type of IAM role linked to an AWS service (such as ECS). This role lets you delegate the permissions necessary to call other AWS services on your behalf. Because ECS is a service that manages resources on your behalf, you need this role to proceed.
In most cases, you won’t have to create a service-linked role. If you created or updated an ECS cluster, ECS likely created the service-linked role for you.
You can confirm that your service-linked role exists using the AWS CLI, as shown in the following code example:
Your account, IAM user, or role must opt in to the awsvpcTrunking account setting. Select this setting using the AWS CLI or the ECS console. You can opt in for an account by making awsvpcTrunking its default setting. Or, you can enable this setting for the role associated with the instance profile with which the instance launches. For instructions, see Account Settings.
After completing the prerequisites described in the preceding sections, launch a new container instance with increased network interface limits using one of the supported EC2 instance types.
Keep the following in mind:
It’s available with the latest variant of the ECS-optimized AMI.
It only affects creation of new container instances after opting into awsvpcTrunking.
It only affects tasks created with awsvpc network mode and EC2 launch type. Tasks created with the AWS Fargate launch type always have a dedicated network interface, no matter how many you launch.
If you seek to optimize the usage of your EC2 container instances for clusters that you manage, enable the increased network interface density feature with awsvpcTrunking. By following the steps outlined in this post, you can launch tasks using significantly fewer EC2 instances. This is especially useful if you embrace a microservices architecture, with its increasing numbers of lighter tasks.
Hopefully, you found this post informative and the proposed solution intriguing. As always, AWS welcomes all feedback or comment.
This post is contributed by Tony Pujals | Senior Developer Advocate, AWS
AWS App Mesh is a service mesh, which provides a framework to control and monitor services spanning multiple AWS compute environments. My previous post provided a walkthrough to get you started. In it, I showed deploying a simple microservice application to Amazon ECS and configuring App Mesh to provide traffic control and observability.
In this post, I show more advanced techniques using AWS Fargate as an ECS launch type. I show you how to deploy a specific version of the colorteller service from the previous post. Finally, I move on and explore distributing traffic across other environments, such as Amazon EC2 and Amazon EKS.
I simplified this example for clarity, but in the real world, creating a service mesh that bridges different compute environments becomes useful. Fargate is a compute service for AWS that helps you run containerized tasks using the primitives (the tasks and services) of an ECS application. This lets you work without needing to directly configure and manage EC2 instances.
This post assumes that you already have a containerized application running on ECS, but want to shift your workloads to use Fargate.
You deploy a new version of the colorteller service with Fargate, and then begin shifting traffic to it. If all goes well, then you continue to shift more traffic to the new version until it serves 100% of all requests. Use the labels “blue” to represent the original version and “green” to represent the new version. The following diagram shows programmer model of the Color App.
You want to begin shifting traffic over from version 1 (represented by colorteller-blue in the following diagram) over to version 2 (represented by colorteller-green).
In App Mesh, every version of a service is ultimately backed by actual running code somewhere, in this case ECS/Fargate tasks. Each service has its own virtual node representation in the mesh that provides this conduit.
The following diagram shows the App Mesh configuration of the Color App.
After shifting the traffic, you must physically deploy the application to a compute environment. In this demo, colorteller-blue runs on ECS using the EC2 launch type and colorteller-green runs on ECS using the Fargate launch type. The goal is to test with a portion of traffic going to colorteller-green, ultimately increasing to 100% of traffic going to the new green version.
AWS compute model of the Color App.
Before following along, set up the resources and deploy the Color App as described in the previous walkthrough.
Deploy the Fargate app
To get started after you complete your Color App, configure it so that your traffic goes to colorteller-blue for now. The blue color represents version 1 of your colorteller service.
Log into the App Mesh console and navigate to Virtual routers for the mesh. Configure the HTTP route to send 100% of traffic to the colorteller-blue virtual node.
The following screenshot shows routes in the App Mesh console.
Test the service and confirm in AWS X-Ray that the traffic flows through the colorteller-blue as expected with no errors.
The following screenshot shows racing the colorgateway virtual node.
Deploy the new colorteller to Fargate
With your original app in place, deploy the send version on Fargate and begin slowly increasing the traffic that it handles rather than the original. The app colorteller-green represents version 2 of the colorteller service. Initially, only send 30% of your traffic to it.
If your monitoring indicates a healthy service, then increase it to 60%, then finally to 100%. In the real world, you might choose more granular increases with automated rollout (and rollback if issues arise), but this demonstration keeps things simple.
You pushed the gateway and colorteller images to ECR (see Deploy Images) in the previous post, and then launched ECS tasks with these images. For this post, launch an ECS task using the Fargate launch type with the same colorteller and envoy images. This sets up the running envoy container as a sidecar for the colorteller container.
You don’t have to manually configure the EC2 instances in a Fargate launch type. Fargate automatically colocates the sidecar on the same physical instance and lifecycle as the primary application container.
To begin deploying the Fargate instance and diverting traffic to it, follow these steps.
This updated mesh configuration adds a new virtual node (colorteller-green-vn). It updates the virtual router (colorteller-vr) for the colorteller virtual service so that it distributes traffic between the blue and green virtual nodes at a 2:1 ratio. That is, the green node receives one-third of the traffic.
$ ./appmesh-colorapp.sh ... Waiting for changeset to be created.. Waiting for stack create/update to complete ... Successfully created/updated stack - DEMO-appmesh-colorapp $
Step 2: Deploy the green task to Fargate
The fargate-colorteller.sh script creates parameterized template definitions before deploying the fargate-colorteller.yaml CloudFormation template. The change to launch a colorteller task as a Fargate task is in fargate-colorteller-task-def.json.
$ ./fargate-colorteller.sh ...
Waiting for changeset to be created.. Waiting for stack create/update to complete Successfully created/updated stack - DEMO-fargate-colorteller $
Verify the Fargate deployment
The ColorApp endpoint is one of the CloudFormation template’s outputs. You can view it in the stack output in the AWS CloudFormation console, or fetch it with the AWS CLI:
This reflects the expected result for a 2:1 ratio. Check everything on your AWS X-Ray console.
The following screenshot shows the X-Ray console map after the initial testing.
The results look good: 100% success, no errors.
You can now increase the rollout of the new (green) version of your service running on Fargate.
Using AWS CloudFormation to manage your stacks lets you keep your configuration under version control and simplifies the process of deploying resources. AWS CloudFormation also gives you the option to update the virtual route in appmesh-colorapp.yaml and deploy the updated mesh configuration by running appmesh-colorapp.sh.
For this post, use the App Mesh console to make the change. Choose Virtual routers for appmesh-mesh, and edit the colorteller-route. Update the HTTP route so colorteller-blue-vn handles 33.3% of the traffic and colorteller-green-vn now handles 66.7%.
Again, repeat your verification process in both the CLI and X-Ray to confirm that the new version of your service is running successfully.
In this walkthrough, I showed you how to roll out an update from version 1 (blue) of the colorteller service to version 2 (green). I demonstrated that App Mesh supports a mesh spanning ECS services that you ran as EC2 tasks and as Fargate tasks.
In my next walkthrough, I will demonstrate that App Mesh handles even uncontainerized services launched directly on EC2 instances. It provides a uniform and powerful way to control and monitor your distributed microservice applications on AWS.
If you have any questions or feedback, feel free to comment below.
Guest post by AWS Container Hero Tung Nguyen. Tung is the president and founder of BoltOps, a consulting company focused on cloud infrastructure and software on AWS. He also enjoys writing for the BoltOps Nuts and Bolts blog.
EC2 Spot Instances allow me to use spare compute capacity at a steep discount. Using Amazon ECS with Spot Instances is probably one of the best ways to run my workloads on AWS. By using Spot Instances, I can save 50–90% on Amazon EC2 instances. You would think that folks would jump at a huge opportunity like a black Friday sales special. However, most folks either seem to not know about Spot Instances or are hesitant. This may be due to some fallacies about Spot.
With the Spot model, AWS can remove instances at any time. It can be due to a maintenance upgrade; high demand for that instance type; older instance type; or for any reason whatsoever.
Hence the first fear and fallacy that people quickly point out with Spot:
What do you mean that the instance can be replaced at any time? Oh no, that must mean that within 20 minutes of launching the instance, it gets killed.
The average frequency of interruption across all Regions and instance types is less than 5%.
From my own usage, I have seen instances run for weeks. Need proof? Here’s a screenshot from an instance in one of our production clusters.
If you’re wondering how many days that is….
Yes, that is 228 continuous days. You might not get these same long uptimes, but it disproves the fallacy that Spot Instances are usually interrupted within 20 minutes from launch.
With Spot Instances, I place a single request for a specific instance in a specific Availability Zone. With Spot Fleets, instead of requesting a single instance type, I can ask for a variety of instance types that meet my requirements. For many workloads, as long as the CPU and RAM are close enough, many instance types do just fine.
So, I can spread my instance bets across instance types and multiple zones with Spot Fleets. Using Spot Fleets dramatically makes the system more robust on top of the already mentioned low interruption rate. Also, I can run an On-Demand cluster to provide additional safeguard capacity.
ECS and Spot Fleets: A Great Fit Together
This is one of my favorite ways to run workloads because it gives me a scalable system at a ridiculously low cost. The technologies are such a great fit together that one might think they were built for each other.
Docker provides a consistent, standard binary format to deploy. If it works in one Docker environment, then it works in another. Containers can be pulled down in seconds, making them an excellent fit for Spot Instances, where containers might move around during an interruption.
ECS provides a great ecosystem to run Docker containers. ECS supports a feature called connection instance draining that allows me to tell ECS to relocate the Docker containers to other EC2 instances.
Spot Instances fire off a two-minute warning signal letting me know when it’s about to terminate the instance.
These are the necessary pieces I need for building an ECS cluster on top of Spot Fleet. I use the two-minute warning to call ECS connection draining, and ECS automatically moves containers to another instance in the fleet.
Here’s a CloudFormation template that achieves this: ecs-ec2-spot-fleet. Because the focus is on understanding Spot Fleets, the VPC is designed to be simple.
The template specifies two instance types in the Spot Fleet: t3.small and t3.medium with 2 GB and 4 GB of RAM, respectively. The template weights the t3.medium twice as much as the t3.small. Essentially, the Spot Fleet TargetCapacity value equals the total RAM to provision for the ECS cluster. So if I specify 8, the Spot Fleet service might provision four t3.small instances or two t3.medium instances. The cluster adds up to at least 8 GB of RAM.
To launch the stack run, I run the following command:
The CloudFormation stack launches container instances and registers them to an ECS cluster named developmentby default. I can change this with the EcsCluster parameter. For more information on the parameters, see the README and the template source.
When I deploy the application, the deploy tool creates the ECS cluster itself. Here are the Spot Instances in the EC2 console.
Deploy the demo app
After the Spot cluster is up, I can deploy a demo app on it. I wrote a tool called Ufo that is useful for these tasks:
Build the Docker image.
Register the ECS task definition.
Register and deploy the ECS service.
Create the load balancer.
Docker should be installed as a prerequisite. First, I create an ECR repo and set some variables:
The application returns “42,” the meaning of life, successfully. That’s it! I now have an application running on ECS with Spot Fleet Instances.
One additional advantage of using Spot is that it encourages me to think about my architecture in a highly available manner. The Spot “constraints” ironically result in much better sleep at night as the system must be designed to be self-healing.
Hopefully, this post opens the world of running ECS on Spot Instances to you. It’s a core of part of the systems that BoltOps has been running on its own production system and for customers. I still get excited about the setup today. If you’re interested in Spot architectures, contact me at BoltOps.
One last note: Auto Scaling groups also support running multiple instance types and purchase options. Jeff mentions in his post that weight support is planned for a future release. That’s exciting, as it may streamline the usage of Spot with ECS even further.
The Amazon ECS-optimized AMI comes prepackaged with the ECS container agent, Docker agent, and the ecs-init upstart service. We recommend that you use the Amazon ECS-optimized AMI for your container instances unless your application requires any of the following:
A specific operating system
Custom security and monitoring agents installed
Root volumes encryption enabled
A Docker version that is not yet available in the Amazon ECS-optimized AMI
Regardless of the type of AMI that you choose, AWS recommends updating your ECS containers instance fleet with the latest AMI whenever possible. It’s easier than trying to patch existing instances in place.
In this solution, you deploy the ECS cluster and, specify cluster size, instance type, AMI ID, and other parameters. After the ECS cluster has been created and instances registered, you can update the ECS cluster with another AMI ID to trigger the following events:
A new launch configuration is created using the new AMI ID.
The Auto Scaling group adds one new instance using the new launch configuration. This executes the ‘Adding Instances’ process described below.
The adding instances process finishes for the single new node with the new AMI. Then, the removing instances process is started against the oldest instance with the old AMI ID.
After the removing nodes process is finished, steps 2 and 3 are repeated until all nodes in the cluster have been replaced.
If an error is encountered during the rollout, the new launch configuration is deleted, and the old one is put back in place.
Scaling a cluster out (adding instances)
Take a closer look at each step in scaling out a cluster:
A stack update changes the AMI ID parameter.
CloudFormation updates the launch configuration and tells the Auto Scaling group to add an instance.
Auto Scaling launches an instance using new AMI ID to join the ECS cluster.
Auto Scaling invokes the Launch Lambda function.
Lambda asks the ECS cluster if the newly launched instance has joined and is showing healthy.
Lambda tells Auto Scaling whether the launch succeeded or failed.
Auto Scaling tells CloudFormation whether the scale-up has succeeded.
The stack update succeeds, or rolls back.
Scaling a cluster in (removing instances)
Take a closer look at each step in scaling in a cluster:
CloudFormation tells the Auto Scaling group to remove an instance.
Auto Scaling chooses an instance to be terminated.
Auto Scaling invokes the Terminate Lambda function.
The Lambda function performs the following tasks:
Sets the instance to be terminated to DRAINING mode.
Confirms that all ECS tasks are drained from the instance marked for termination.
Confirms that the ECS cluster services and tasks are stable.
Lambda tells Auto Scaling to proceed with termination.
Auto Scaling tells CloudFormation whether the scale-in has succeeded.
The stack update succeeds, or rolls back.
Here are the technologies used for this solution, with more details.
AWS Auto Scaling
Amazon CloudWatch Events
AWS Systems Manager Parameter Store
AWS CloudFormation is used to deploy the stack, and should be used for lifecycle management. Do not directly edit Auto Scaling groups, Lambda functions, and so on. Instead, update the CloudFormation template.
This forces the resolution of the latest AMI, as well as providing an opportunity to change the size or instance type involved in the ECS cluster.
CloudFormation has rollback capabilities to return to the last known good state if errors are encountered. It is the recommended mechanism for management through the clusters lifecycle.
AWS Auto Scaling
For ECS, the primary scaling and rollout mechanism is AWS Auto Scaling. Auto Scaling allows you to define a desired state environment, and keep that desired state as necessary by launching and terminating instances.
When a new AMI has been selected, CloudFormation informs Auto Scaling that it should replace the existing fleet of instances. This is controlled by an Auto Scaling update policy.
This solution rolls a single instance out to the ECS cluster, then drain, and terminate a single instance in response. This cycle continues until all instances in the ECS cluster have been replaced.
Auto Scaling lifecycle hooks
Auto Scaling permits the use of a lifecycle hooks. This is code that executes when a scaling operation occurs. This solution uses a Lambda function that is informed when an instance is launched or terminated.
A lifecycle hook informs Auto Scaling whether it can proceed with the activity or if it should abandon it. In this case, the ECS cluster remains healthy and all tasks have been redistributed before allowing Auto Scaling to proceed.
Lifecycles also have a timeout. In this case, it is 3600 seconds (1 hour) before Auto Scaling gives up. In that case, the default activity is to abandon the operation.
Amazon CloudWatch Events
CloudWatch Events is a mechanism for watching calls made to the AWS APIs, and then activating functions in response. This is the mechanism used to launch the Lambda functions when a lifecycle event occurs. It’s also the mechanism used to re-launch the Lambda function when it times out (Lambda maximum execution time is 15 minutes).
In this solution, four CloudWatch Events are created. Two to pick up the initial scale-up event. Two more to pick up a continuation from the Lambda function.
This solution relies on the AMI IDs stored in Parameter Store. Given a naming standard of /ami/ecs/latest, this always resolves to the latest available AMI for ECS.
CloudFormation now supports using the values stored in Parameter Store as inputs to CloudFormation templates. The template can be simply passed a value—/ami/ecs/latest—and it resolve that to the latest AMI.
The Lambda functions are used to handle the Auto Scaling lifecycle hooks. They operate against the ECS cluster to assure it is healthy, and inform Auto Scaling that it can proceed, or to abandon its current operation.
The functions are invoked by CloudWatch Events in response to scaling operations so they are idle unless there are changes happening in the cluster.
They’re written in Python, and use the boto3 SDK to communicate with the ECS cluster, and Auto Scaling.
The launch Lambda function waits until the instance has fully joined the ECS cluster. This is shown by the instance being marked ‘ACTIVE’ by the ECS control plane, and it’s ECS agent status showing as connected. This means that the new instance is ready to run tasks for the cluster.
The terminate Lambda function waits until the instance has fully drained all running tasks. It also checks that all tasks, and services are in a stable state before allowing Auto Scaling to terminate an instance. This assures the instance is truly idle, and the cluster stable before an instance can be removed.
Before you begin deployment, you need the following:
An AWS account
You need an AWS account with enough room to accommodate the additional EC2 instances required by the ECS cluster.
(Optional) Linux system
Use AWS CLI and optionally JQ to deploy the solution. Although a Linux system is recommended, it’s not required.
You need an IAM admin user with permissions to create IAM policies and roles and create and update CloudFormation stacks. The user must also be able to deploy the ECS cluster, Lambda functions, Systems Manager parameters, and other resources.
Now, create a new LambdaECSScalingRole IAM role. For Trusted Entity, choose AWS Service, Lambda. Attach the following permissions policies:
LambdaECSScaling (created in the previous step)
ReadOnlyAccess (AWS managed policy)
AWSLambdaBasicExecutionRole (AWS managed policy)
ECS cluster instance profile
The ECS cluster nodes must have an instance profile attached that allows them to speak to the ECS service. This profile can also contain any other permissions that they would require (Systems Manager for management and executing commands for example).
These are all AWS managed policies so you only add the role.
Create a new IAM role called EcsInstanceRole, select AWS Service → EC2 as Trusted Entity. Attach the following AWS managed permissions policies:
The AWSLambdaBasicExecutionRole policy may look out of place, but this allows the instance to create new CloudWatch Logs groups. These permissions facilitate using CloudWatch Logs as the primary logging mechanism with ECS. This managed policy grants the required permissions without you needing to manage a custom role.
CloudFormation parameter file
We recommend using a parameter file for the CloudFormation template. This documents the desired parameters for the template. It is usually less error prone to do this versus using the console for inputting parameters.
There is a file called blank_parameter_file.json in the source code project. Copy this file to something new and with a more meaningful name (such as dev-cluster.json), then fill out the parameters.
EcsClusterName: The name of the ECS cluster to create.
EcsAmiParameterKey: The Systems Manager parameter that contains the AMI ID to be used. This defaults to /ami/ecs/latest.
IamRoleInstanceProfile: The name of the EC2 instance profile used by the ECS cluster members. Discussed in the prerequisite section.
EcsInstanceType: The instance type to use for the cluster. Use whatever is appropriate for your workloads.
EbsVolumeSize: The size of the Docker storage setup that is created using LVM. ECS typically defaults to 100 GB.
ClusterSize: The desired number of EC2 instances for the cluster.
ClusterMaxSize: This value should always be double the amount contained in ClusterSize. CloudFormation has no ‘math’ operators or we wouldn’t prompt for this. This allows rolling updates to be performed safely by doubling the cluster size, then contracting back.
KeyName: The name of the EC2 key pair to place on the ECS instance to support SSH.
SubnetIds: A comma-separated list of subnet IDs that the cluster should be allowed to launch instances into. These should map to at least two zones for a resilient cluster, for example subnet-a70508df,subnet-e009eb89.
SecurityGroupIds: A comma-separated list of security group IDs that are attached to each node, for example sg-bd9d1bd4,sg-ac9127dca (a single value is fine).
DeploymentS3Bucket: This is the bucket where the two Lambda functions for scale in/scale out lifecycle hooks can be found.
LifecycleLaunchFunctionZip: This is the full path within the DeploymentS3Bucket where the ecs-lifecycle-hook-launch.zip contents can be found.
LifecycleTerminateFunctionZip: The full path within the DeploymentS3Bucket where the ecs-lifecycle-hook-terminate.zip contents can be found.
LambdaFunctionRole: The name of the role that the Lambda functions use. Discussed in the prerequisite section.
A completed parameter file looks like the following:
Given the CloudFormation template and the parameter file, you can deploy the stack using the AWS CLI or the console.
Here’s an example deploying through the AWS CLI. This example uses a stack named ecs-dev and a parameter file named dev-cluster.json. It also uses the --profile argument to assure that the CLI assumes a role in the right account for deployment. Use the corresponding Region and profile from your local ~/.aws/config file.
This command outputs the stack ID as soon as it is executed, even though the other required resources are still being created.
After the CloudFormation stack has been created, go to the ECS console. and open the DevCluster cluster that you just created. There are no tasks running, although you should see two container instances registered with the cluster.
You also see a warning message indicating that the container instances are not running the latest version of Amazon ECS container agent. The reason is that you did not use the latest available version of the ECS-Optimized AMI.
Fix this issue by updating the container instances AMI.
Update the cluster instances AMI
Run the following commands to set the /ami/ecs/latest parameter to the latest AMI ID.
Make sure that the parameter value has been updated in the console.
To update your ECS cluster, run the update-stack command without changing any parameters. CloudFormation evaluates the value stored by /ami/ecs/latest. If it has changed, CloudFormation makes updates as appropriate.
We recommend supervising your updates to the ECS cluster while they are being deployed. This assures that the cluster remains stable. For the majority of situations, there is no manual intervention required.
Keep an eye on Auto Scaling activities. In the Auto Scaling groups section of the EC2 console, select the Auto Scaling group for a cluster and choose Activity History.
Keep an eye on the ECS instances to ensure that new instances are joining and draining instances are leaving. In the ECS console, choose Cluster, ECS Instances.
Lambda function logs help troubleshoot things that aren’t behaving as expected. In the Lambda console, select the LifeCycleLaunch or LifeCycleTerminate functions, and choose Monitoring, View logs in CloudWatch. Expand the logs for the latest executions and see what’s going on:
When you go back to the ECS cluster page, notice that the “Outdated Amazon ECS container agent” warning message has disappeared.
Select one of the cluster’s EC2 instance IDs and observe that the latest ECS optimized AMI is used.
In this post, you saw how to use CloudFormation, Lambda, CloudWatch Events, and Auto Scaling lifecycle hooks to update your ECS cluster instances with a new AMI.
The sample code is available on GitHub for you to use and extend. Contributions are always welcome!
This post is contributed by Brent Langston – Sr. Developer Advocate, Amazon Container Services
Last week, AWS announced enhanced Amazon Elastic Container Service (Amazon ECS) support for GPU-enabled EC2 instances. This means that now GPUs are first class resources that can be requested in your task definition, and scheduled on your cluster by ECS.
Previously, to schedule a GPU workload, you had to maintain your own custom configured AMI, with a custom configured Docker runtime. You also had to use custom vCPU logic as a stand-in for assigning your GPU workloads to GPU instances. Even when all that was in place, there was still no pinning of a GPU to a task. One task might consume more GPU resources than it should. This could cause other tasks to not have a GPU available.
Now, AWS maintains an ECS-optimized AMI that includes the correct NVIDIA drivers and Docker customizations. You can use this AMI to provision your GPU workloads. With this enhancement, GPUs can also be requested directly in the task definition. Like allocating CPU or RAM to a task, now you can explicitly request a number of GPUs to be allocated to your task. The scheduler looks for matching resources on the cluster to place those tasks. The GPUs are pinned to the task for as long as the task is running, and can’t be allocated to any other tasks.
I thought I’d see how easy it is to deploy GPU workloads to my ECS cluster. I’m working in the US-EAST-2 (Ohio) region, from my AWS Cloud9 IDE, so these commands work for Amazon Linux. Feel free to adapt to your environment as necessary.
If you’d like to run this example yourself, you can find all the code in this GitHub repo. If you run this example in your own account, be aware of the instance pricing, and clean up your resources when your experiment is complete.
While AWS CloudFormation is provisioning resources, examine the template used to build your infrastructure. Open `cluster-cpu-gpu.yml`, and you see that you are provisioning a test VPC with two c5.2xlarge instances, and two p3.2xlarge instances. This gives you one NVIDIA Tesla V100 GPU per instance, for a total of two GPUs to run training tasks.
I adapted the TensorFlow benchmark Docker container to create a training workload. I use this container to compare the GPU scheduling and runtime.
When the CloudFormation stack is deployed, register a task definition with the ECS service:
When you launch the task, the output shows the `guIds` values that are assigned to the task. This GPU is pinned to this task, and can’t be shared with any other tasks. If all GPUs are allocated, you can’t schedule additional GPU tasks until a running task with a GPU completes. That frees the GPU to be scheduled again.
When you look at the log output in Amazon CloudWatch Logs, you see that the container discovered one GPU: `/gpu0` and the training benchmark trained at a rate of 321.16 images/sec.
With your two p3.2xlarge nodes in the cluster, you are limited to two concurrent single GPU based workloads. To scale horizontally, you could add additional p3.2xlarge nodes. This would limit your workloads to a single GPU each. To scale vertically, you could bump up the instance type, which would allow you to assign multiple GPUs to a single task. Now, let’s see how fast your TensorFlow container can train when assigned multiple GPUs.
Launch a multiple-GPU training workload
To begin, replace the p3.2xlarge instances with p3.16xlarge instances. This gives your cluster two instances that each have eight GPUs, for a total of 16 GPUs that can be allocated.
With each task request, GPUs are allocated: four in the first request, and eight in the second request. Again, these GPUs are pinned to the task, and not usable by any other task until these tasks are complete.
Check the log output in CloudWatch Logs:
On the “devices” lines, you can see that the container discovered and used four (or eight) GPUs. Also, the total images/sec improved to 1297.41 with four GPUs, and 1707.23 with eight GPUs.
Because you can pin single or multiple GPUs to a task, running advanced GPU based training tasks on Amazon ECS is easier than ever!
To clean up your running resources, delete the CloudFormation stack:
This post is contributed by Nathan Peck – Developer Advocate, Amazon Container Services
AWS Fargate, Amazon ECS, and Amazon ECR now have support for AWS PrivateLink. AWS PrivateLink is a networking technology designed to enable access to AWS services in a highly available and scalable manner, while keeping all the network traffic within the AWS network. When you create AWS PrivateLink endpoints for ECR, ECS, and Fargate, these service endpoints appear as elastic network interfaces with a private IP address in your VPC.
Before AWS PrivateLink, your Amazon EC2 instances had to use an internet gateway to download Docker images stored in ECR or communicate to the ECS control plane. Instances in a public subnet with a public IP address used the internet gateway directly. Instances in a private subnet used a network address translation (NAT) gateway hosted in a public subnet. The NAT gateway would then use the internet gateway to talk to ECR and ECS.
Now that AWS PrivateLink support has been added, instances in both public and private subnets can use it to get private connectivity to download images from Amazon ECR. Instances can also communicate with the ECS control plane via AWS PrivateLink endpoints without needing an internet gateway or NAT gateway.
This networking architecture is considerably simpler. It enables enhanced security by allowing you to deny your private EC2 instances access to anything other than these AWS services. That’s assuming that you want to block all other outbound internet access for those instances. For this to work, you must create some AWS PrivateLink resources:
AWS PrivateLink endpoints for ECR. This allows instances in your VPC to communicate with ECR to download image manifests
AWS PrivateLink gateway for Amazon S3. This allows instances to download the image layers from the underlying private S3 buckets that host them.
AWS PrivateLink endpoints for ECS. These endpoints allow instances to communicate with the telemetry and agent services in the ECS control plane.
This post explains how to create these resources.
Create an AWS PrivateLink interface endpoint for ECR
Next, specify the VPC and subnets to which the AWS PrivateLink interface should be added. Make sure that you select the same VPC in which your ECS cluster is running. To be on the safe side, select every Availability Zone and subnet from the list. Each zone has a list of the subnets available. You can select all the subnets in each Availability Zone.
However, depending on your networking needs, you might also choose to only enable the AWS PrivateLink endpoint in your private subnets from each Availability Zone. Let instances running in a public subnet continue to communicate with ECR via the public subnet’s internet gateway.
Next, enable Private DNS Name. You are required to enable Private DNS Name for endpoint
A private hosted zone enables you to access the resources in your VPC using the Amazon ECR default DNS domain names instead of using private IPv4 address or private DNS hostnames provided by AWS VPC Endpoints. The Amazon ECR DNS hostname that AWS CLI and Amazon ECR SDKs use by default (https://api.ecr.region.amazonaws.com) resolves to your VPC endpoint.
If you enabled a private hosted zone for com.amazonaws.region.ecr.api and you are using an SDK released before January 24, 2019, you must specify the following endpoint when using SDK or AWS CLI. For example:
If you enabled a private hosted zone and you are using the SDK released on January 24, 2019 or later, this would be:
aws ecr describe-repositories
Lastly, specify a security group for the interface itself. This is going to control whether each host is able to talk to the interface. The security group should allow inbound connections on port 80 from the instances in your cluster.
You may have a security group that is applied to all the EC2 instances in the cluster, perhaps using an Auto Scaling group. You can create a rule that allows the VPC endpoint to be accessed by any instance in that security group.
Finally, choose Create endpoint. The new endpoint appears in the list.
Add an AWS PrivateLink gateway endpoint for S3
The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.
S3 uses a slightly different endpoint type called a gateway. Be careful about adding an S3 gateway to your VPC if your application is actively using S3. With gateway endpoints, your application’s existing connections to S3 may be briefly interrupted while the gateway is being added. You may have a busy cluster with many active ECS deployments, causing image layer downloads from S3. Or, your application itself may make heavy usage of S3. In that case, it’s best to create a fresh new VPC with an S3 gateway, then migrate your ECS cluster and its containers into that VPC.
To add the S3 gateway endpoint, select com.amazonaws.region.s3 on the list of AWS services and select the VPC hosting your ECS cluster. Gateway endpoints are added to the VPC route table for the subnets. Select each route table associated with the subnet in which the S3 gateway should be.
Instead of using a security group, the gateway endpoint uses an IAM policy document to limit access to the service. This policy is similar to an IAM policy but does not replace the default level of access that your applications have through their IAM role. It just further limits what portions of the service are available via the gateway. It’s okay to just use the default Full Access policy. Any restrictions you have put on your task IAM roles or other IAM user policies still apply on top of this policy.
Choose Create to add this gateway endpoint to your VPC. When you view the route tables in your VPC subnets, you see an S3 gateway that is used whenever ECR Docker image layers are being downloaded from S3.
Create an AWS PrivateLink interface endpoint for ECS
In addition to downloading Docker images from ECR, your EC2 instances must also communicate with the ECS control plane to receive orchestration instructions.
ECS requires three endpoints:
Create these three interface endpoints in the same way that you created the endpoint for ECR, by adding each endpoint and setting the subnets and security group for the endpoint.
After the endpoints are created and added to your VPC, there is one additional step. Restart any ECS agents that are currently running in the VPC. The ECS agent uses a persistent web socket connection to the ECS backend and VPC endpoints do not interrupt existing connections. The agent continues to use its existing connection instead of establishing a new connection through the new endpoint, unless you restart it.
To restart the agent with no disruption to your application containers, you can connect using SSH to each EC2 instance in the cluster and issue the following command:
sudo docker restart ecs-agent
This restarts the ECS agent without stopping any of the other application containers on the host. Your application may be stateless and safe to stop at any time, or you may not have or want SSH access to the underlying hosts. In that case, choose to just reboot each EC2 instance in the cluster one at a time. This restarts the agent on that host while also restarting any service launched tasks on that host on a different host.
If you are using AWS Fargate, you can issue an UpdateService API call to do a rolling restart of all your containers. Or, manually stop your running containers one by one and let them be automatically replaced. When they restart, they use an ECS agent that is communicating using the new ECS endpoints. The Docker image is downloaded using the ECR endpoint and S3 gateway.
In this post, I showed you how to add AWS PrivateLink endpoints to your VPC for ECS and ECR, including an S3 gateway for ECR layer downloads. These endpoints work whether you are running your containers on EC2 instances in a self-managed cluster in your VPC, or as Fargate containers running in your VPC.
Your ECS cluster or Fargate tasks communicate directly with the ECS control plane. They should be able to download Docker images directly without needing to make any connections outside of your VPC via an internet gateway or NAT gateway. All container orchestration traffic stays inside the VPC.
If you have questions or suggestions, please comment below.
This post is courtesy of Vidya Narasimhan, AWS Solutions Architect
Java Enterprise Edition has been an important server-side platform for over a decade for developing mission-critical & large-scale applications amongst enterprises. High-availability & fault tolerance for such applications is typically achieved through built-in JEE clustering provided by the platform.
JEE clustering represents a group of machines working together to transparently provide enterprise services such as JNDI, EJB, JMS, HTTPSession etc. that enable distribution, discovery, messaging, transaction, caching, replication & component failover. Implementation of clustering technology varies in JEE platforms provided by different vendors. Many of the clustering implementations involve proprietary communication protocols that use multicast for intra-cluster communications that is not supported in public cloud.
This article is relevant for JEE platforms & other products that use JGroups based clustering such as Wildfly. The solution described allows easy migration of applications developed on these platforms using native clustering to Amazon Elastic Container Service (Amazon ECS) which is a highly scalable, fast, container management service that makes it easy to orchestrate, run & scale Docker containers on a cluster. This solution is useful when the business objective is to migrate to cloud fast with minimum changes to the application. The approach recommends lift & shift to AWS wherein the initial focus is to migrate as-is with optimizations coming in later incrementally.
Whether the JEE application to be migrated is designed as a monolith or micro services, a legacy or green-field deployment, there are multiple reasons why organizations should opt for containerization of their application. This link explains well the benefits of containerization (see section Why Use Containers) https://aws.amazon.com/getting-started/projects/break-monolith-app-microservices-ecs-docker-ec2/module-one/
2. Wildfly Clustering on ECS
Here onwards, this article highlights how to migrate a standard clustered JEE app deployed on Wildfly Application Server to Amazon ECS. Wildfly supports clustering out of the box and supports two modes of clustering, standalone & domain mode. This article explores how to setup WildFly cluster in ECS with multiple Wildfly standalone nodes enabled for HA to form a cluster. The clustering is demonstrated through a web application that replicates session information across the cluster nodes and can withstand a failover without session data loss.
The important components of clustering that requires a mention right away are ECS Service Discovery, JGroups & Infinispan.
JGroups – Wildfly clustering is enabled by the popular open-source JGroups toolkit. The JGroups subsystem provides group communication support for HA services using a multicast transmission by default. It deals with all aspects of node discovery and providing reliable messaging between the nodes as follows-
Node-to-node messaging — By default is based on UDP/multicast that can be extended via TCP/unicast.
Node discovery — By default uses multicast ping MPING. Alternatives include TCPPING, S3_PING, JDBC_PING, DNS_PING and others.
This article focusses on DNS_PING for node discovery using TCP protocol.
ECS Service discovery – Amazon ECS service can optionally be configured to use Amazon ECS Service Discovery. Service discovery uses Amazon Route 53 auto naming API actions to manage DNS entries (A or SRV records) for service tasks, making them discoverable within your VPC. You can specify health check conditions in a service task definition and Amazon ECS will ensure that only healthy service endpoints are returned by a service lookup.
As your services scale up or down in response to load or container health, the Route 53 hosted zone is kept up to date.
Wildfly uses JGroups to discover the cluster nodes via DNS_PING discovery protocol that sends a DNS service endpoint query to the ECS service registry maintained in Route53.
Infinispan – Wildfly uses Infinispan subsystem to provides high-performance, clustered, transactional caching. In a clustered web application, Infinispan handles the replication of application data across the cluster by means of a replicated/distributed cache. Under the hood, it uses JGroups channel for data transmission within the cluster.
3. Implementation Instructions
Modify Wildfly standalone configuration file – Standalone-HA.xml. The HA suffix implies high availability configuration.
Modify the JGroup Subsystem – Add a TCP Stack with DNS_Ping as the discovery protocol & configure the DNS Query endpoint. It is important to note that the DNS_QUERY matches the ECS service endpoint when configuring the ECS service.
Change the JGroup default stack to point to the TCP Stack.
Configure a custom Infinispan replicated cache to be used by the web app or use the default cache.
Build the docker image & store it in Elastic Container Registry (ECR)
Package the JBoss/Wildfly image with JEE application & Wildfly platform on Docker. Create a Dockerfile & include the following:
Install the WildFly distribution & set permissions – This approach requires the latest Wildfly distribution 15.0.0.Final released recently.
Copy the modified Wildfly standalone-ha.xml to the container.
Deploy the JEE web application. This simple web app is configured as distributable and uses Infinispan to replicate session information across cluster nodes. It displays a page containing the container IP/hostname, Session ID & session data & helps demonstrate session replication.
Define a custom entrypoint, entrypoint.sh, to boot Wildfly with the specified bind IP addresses to its interfaces. The script gets the container metadata, extracts the container IP to bind it to Wildfly interfaces. This interface binding is an important step as it enables the application related network communication (web, messaging) between the containers.
Add the enrypoint.sh script to the image in the Dockerfile.
Build the container & push it to ECR repository. Amazon Elastic Container Registry (ECR) is a fully managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images.
The Wildly configuration files, the Dockerfile & the web app WAR file can be found at the Github link https://github.com/vidyann/Wildfly_ECS
Create ECS Service with service discovery
Create a Service using service discovery.
This link describe steps to set up a ECS task & service with service discovery https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service-discovery.html#create-service-discovery-taskdef. Though the example in the link creates a Fargate cluster, you can create an EC2 based cluster as well for this example.
While configuring the task choose the network mode as AWSVPC. The task networking features provided by the AWSVPC network mode give Amazon ECS tasks the same networking properties as Amazon EC2 instances. Benefits of task networking can be found here – https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html
Tasks can be flagged as compatible with EC2, Fargate, or both. Here is what the cluster & service looks like:
When setting up the container details in task, use 8080 as the port, which is the default Wildfly port. This can be changed through WIldfly configuration. Enable the cloudwatch logs which captures Wildfly logging.
While configuring the ECS service, ensure that the service name & namespace should combine to form service endpoint that exactly matches the DNS_Query endpoint configured in Wildfly configuration file. The container security group should allow inbound traffic to port 8080. Here is what the service endpoint looks like:
The route53 registry created by ECS is shown below. We see two DNS entries corresponding to the DNS endpoint myapp.sampleaws.com.
Finally view the Wildfly logs in the console by clicking a task instance. You can check if clustering is enabled by looking for a log entry as below:
Here we see that a Wildfly cluster was formed with two nodes(same as the pic in route 53).
Run the Web App in a browser
Spin up a windows instance in the VPC & open the web app in a browser. Below is a screenshot of the webapp:
Open in different browsers & tabs & verify the Container IP & session ID. Now force shutdown a node by resizing the ECS service task instances to one. Note that though the container IP in the webapp changes, the session ID does not change and the webapp is available and the HTTP Session is alive thus demonstrating the session replication & failover amongst the clustering nodes.
Our goal here is to migrate the enterprise JEE apps to Amazon ECS by tweaking a few configurations but gaining immediately the benefits of containerization & orchestration managed by ECS. By delegating the undifferentiated heavy lifting of container management, orchestration, scaling to ECS, you can focus on improvising/re-architecting your application to micro-services oriented architecture. Please note that all the deployment procedures in this article can be fully automated via the AWS CI/CD services.
Companies are increasingly building their applications as microservices (many separate services that each do a single job). Microservices often allow companies to iterate and deploy more quickly. Many of these microservice-based modern applications are built using various types of cloud resources and deployed on dynamically changing infrastructure. Previously you had to use configuration files to manage the location of your application resource. However, dependencies in a microservices-based application can quickly become too complex to easily manage through configuration files. Additionally, many applications are built using containers that scale dynamically, reacting on the changes in traffic load. That increases your application responsiveness, but poses a new class of problem – now your application components need to discover and connect to the upstream services at runtime. This problem of connectivity in dynamically changing infrastructures and microservices is commonly addressed by service discovery.
Introducing AWS Cloud Map
AWS Cloud Map keeps track of all your application components, their locations, attributes and health status. Now your applications can simply query AWS Cloud Map using AWS SDK, API or even DNS to discover the locations of its dependencies. That allows your applications to scale dynamically and connect to upstream services directly, increasing the responsiveness of your applications.
When you register your web services and cloud resources in AWS Cloud Map, you can describe them using custom attributes, such as deployment stage and version. Your applications then can make discovery calls specifying the required deployment stage and version. AWS Cloud Map will return the locations of resources that match the supplied parameters. It simplifies your deployments and reduces the operational complexity for your applications.
Integrated health checking for IP-based resources, registered with AWS Cloud Map, automatically stops routing traffic to unhealthy endpoints. Additionally, you have APIs to describe the health status of your services, so that you can learn about potential issues with your infrastructure. That increases the resilience of your applications.
AWS Cloud Map in Action Getting started with AWS Cloud Map is easy. You can use the AWS console or CLI to create a namespace, such as myapp.com . For this example, I’ll use the CLI. Let’s create a namespace:
At this point, you’ll need to decide whether your want your applications to discover resources only via the AWS SDK and API calls, or if you need optional discovery via DNS. When you enable DNS discovery for a namespace, you’ll need to provide IP addresses for all the resources that you register. If you plan to register other cloud resources, such as DynamoDB tables by ARN or the URLs of the APIs deployed on Amazon API Gateway, you need to select API discovery mode.
Once your namespace is created, it’s time to create services. A service represents your application components, such as users , auth, or payment and can be comprised of many dynamically changing resources. You can specify a friendly name for your service, then select the DNS discovery and health checking options. You can create a service like this:
And that’s it! Amazon Elastic Container Service (ECS) and AWS Fargate are tightly integrated with AWS Cloud Map. When you create your service and enable service discovery, all the task instances are automatically registered in AWS Cloud Map on scale up, and deregistered on scale down. ECS also ensures that only healthy task instances are returned on the discovery calls by publishing always up-to-date health information to AWS Cloud Map.
For Amazon Elastic Container Service for Kubernetes (EKS), you can automatically publish the external IPs of the services running in EKS in AWS Cloud Map. To do this, we’ve released an update to an open source project, ExternalDNS, to make Kubernetes resources discoverable via AWS Cloud Map. You can find out more details about Kubernetes External DNS here.
Now Generally Available You can start building your applications with AWS Cloud Map and enjoy the integration with Amazon ECS and EKS, rich and secure API query interface, ubiquitous DNS name resolution and integrated health checking support today. Want to try it out? Head to https://console.aws.amazon.com/cloudmap/home. To test out the integration with ECS, head to https://console.aws.amazon.com/ecs/home and enable Service Discovery to get started.
Many of today’s discussions around blockchain technology remind me of the classic Shimmer Floor Wax skit. According to Dan Aykroyd, Shimmer is a dessert topping. Gilda Radner claims that it is a floor wax, and Chevy Chase settles the debate and reveals that it actually is both! Some of the people that I talk to see blockchains as the foundation of a new monetary system and a way to facilitate international payments. Others see blockchains as a distributed ledger and immutable data source that can be applied to logistics, supply chain, land registration, crowdfunding, and other use cases. Either way, it is clear that there are a lot of intriguing possibilities and we are working to help our customers use this technology more effectively.
We are launching AWS Blockchain Templates today. These templates will let you launch an Ethereum (either public or private) or Hyperledger Fabric (private) network in a matter of minutes and with just a few clicks. The templates create and configure all of the AWS resources needed to get you going in a robust and scalable fashion.
Launching a Private Ethereum Network The Ethereum template offers two launch options. The ecs option creates an Amazon ECS cluster within a Virtual Private Cloud (VPC) and launches a set of Docker images in the cluster. The docker-local option also runs within a VPC, and launches the Docker images on EC2 instances. The template supports Ethereum mining, the EthStats and EthExplorer status pages, and a set of nodes that implement and respond to the Ethereum RPC protocol. Both options create and make use of a DynamoDB table for service discovery, along with Application Load Balancers for the status pages.
Here are the AWS Blockchain Templates for Ethereum:
With the advent of AWS PrivateLink, you can provide services to AWS customers directly in their Virtual Private Networks by offering cross-account SaaS solutions on private IP addresses rather than over the Internet.
Traffic that flows to the services you provide does so over private AWS networking rather than over the Internet, offering security and performance enhancements, as well as convenience. PrivateLink can tie in with the AWS Marketplace, facilitating billing and providing a straightforward consumption model to your customers.
The use cases are myriad, but, for this blog post, we’ll demonstrate a fictional order-processing resource. The resource accepts JSON data over a RESTful API, simulating an interface. This could easily be an existing application being considered for a PrivateLink-based consumption model. Consumers of this resource send JSON payloads representing new orders and the system responds with order IDs corresponding to newly-created orders in the system. In a real-world scenario, additional APIs, such as authentication, might also represent critical aspects of the system. This example will not demonstrate these additional APIs because they could be consumed over PrivateLink in a similar fashion to the API constructed in the example.
I’ll demonstrate how to expose the resource on a private IP address in a customer’s VPC. I’ll also explain an architecture leveraging PrivateLink and provide detailed instructions for how to set up such a service. Finally, I’ll provide an example of how a customer might consume such a service. I’ll focus not only on how to architect the solution, but also the considerations that drive architectural choices.
N.B.: Only two subnets and Availability Zones are shown per VPC for simplicity. Resources must cover all Availability Zones per Region, so that the application is available to all consumers in the region. The instructions in this post, which pertain to resources sitting in us-east-1 will detail the deployment of subnets in all six Availability Zones for this region.
This solution exposes an application’s HTTP-based API over PrivateLink in a provider’s AWS account. The application is a stateless web server running on Amazon Elastic Compute Cloud (EC2) instances. The provider places instances within a virtual private network (VPC) consisting of one private subnet per Availability Zone (AZ). Each AZ contains a subnet. Instances populate each subnet inside of Auto Scaling Groups (ASGs), maintaining a desired count per subnet. There is one ASG per subnet to ensure that the service is available in each AZ. An internal Network Load Balancer (NLB) sits in front of the entire fleet of application instances and an endpoint service is connected with the NLB.
In the customer’s AWS account, they create an endpoint that consumes the endpoint service from the provider’s account. The endpoint exposes an Elastic Network Interface (ENI) in each subnet the customer desires. Each ENI is assigned an IP address within the CIDR block associated with the subnet, for any number of subnets in any number of AZs within the region, for each customer.
PrivateLink facilitates cross-account access to services so the customer can use the provider’s service, feeding it data that exist within the customer’s account while using application logic and systems that run in the provider’s account. The routing between accounts is over private networking rather than over the Internet.
Though this example shows a simple, stateless service running on EC2 and sitting behind an NLB, many kinds of AWS services can be exposed through PrivateLink and can serve as pathways into a provider’s application, such as Amazon Kinesis Streams, Amazon EC2 Container Service, Amazon EC2 Systems Manager, and more.
Using PrivateLink to Establish a Service for Consumption
Building a service to be consumed through PrivateLink involves a few steps:
Build a VPC covering all AZs in region with private subnets
Create a NLB, listener, and target group for instances
Create a launch configuration and ASGs to manage the deployment of Amazon
EC2 instances in each subnet
Launch an endpoint service and connect it with the NLB
Tie endpoint-request approval with billing systems or the AWS Marketplace
Provide the endpoint service in multiple regions
Step 1: Build a VPC and private subnets
Start by determining the network you will need to serve the application. Keep in mind, that you will need to serve the application out of each AZ within any region you choose. Customers will expect to consume your service in multiple AZs because AWS recommends they architect their own applications to span across AZs for fault-tolerance purposes.
Additionally, anything less than full coverage across all AZs in a single region will not facilitate straightforward consumption of your service because AWS does not guarantee that a single AZ will carry the same name across accounts. In fact, AWS randomizes AZ names across accounts to ensure even distribution of independent workloads. Telling customers, for example, that you provide a service in us-east-1a may not give them sufficient information to connect with your service.
The solution is to serve your application in all AZs within a region because this guarantees that no matter what AZs a customer chooses for endpoint creation, that customer is guaranteed to find a running instance of your application with which to connect.
You can lay the foundations for doing this by creating a subnet in each AZ within the region of your choice. The subnets can be private because the service, exposed via PrivateLink, will not provide any publicly routable APIs.
This example uses the us-east-1 region. If you use a different region, the number of AZs may vary, which will change the number of subnets required, and thus the size of the IP address range for your VPC may require adjustments.
The example above creates a VPC with 128 IP addresses starting at 10.3.0.0. Each subnet will contain 16 IP addresses, using a total of 96 addresses in the space. Allocating a sufficient block of addresses requires some planning (though you can make adjustments later if needed). I’d suggest an equally-sized address space in each subnet because the provided service should embody the same performance, availability, and functionality regardless of which AZ your customers choose. Each subnet will need a sufficient address space to accommodate the number of instances you run within it. Additionally, you will need enough space to allow for one IP address per subnet to assign to that subnet’s NLB node’s Elastic Network Interface (ENI).
In this simple example, 16 IP addresses per subnet are enough because we will configure ASGs to maintain two instances each and the NLB requires one ENI. Each subnet reserves five IP addresses for internal purposes, for a total of eight IP addresses needed in each subnet to support the service.
Next, create the private subnets for each Availability Zone. The following demonstrates the creation of the first subnet, which sits in the us-east-1a AZ:
Repeat this step for each remaining AZ. If using the us-east-1 region, you will need to create private subnets in all AZs as follows:
For the purpose of this example, the subnets can leverage the default route table, as it contains a single rule for routing requests to private IP addresses in the VPC, as follows:
In a real-world case, additional routing may be required. For example, you may need additional routes to support VPC peering to access dependencies in other VPCs, connectivity to on-premises resources over DirectConnect or VPN, Internet-accessible dependencies via NAT, or other scenarios.
Security Group Creation
Instances will need to be placed in a security group that allows traffic from the NLB nodes that sit in each subnet.
All instances running the service should be in a security group accepting TCP traffic on the traffic port from any other IP address in the VPC. This will allow the NLB to forward traffic to those instances because the NLB nodes sit in the VPC and are assigned IP addresses in the subnets. In this example, the order processing server running on each instance exposes a service on port 3000, so the security group rule covers this port.
Create a security group for instances:
aws ec2 create-security-group \
--group-name "service-sg" \
--description "Security group for service instances" \
Step 2: Create a Network Load Balancer, Listener, and Target Group
The service integrates with PrivateLink using an internal NLB which sits in front of instances that run the service.
Step 3: Create a Launch Configuration and Auto Scaling Groups
Each private subnet in the VPC will require its own ASG in order to ensure that there is always a minimum number of instances in each subnet.
A single ASG spanning all subnets will not guarantee that every subnet contains the appropriate number of instances. For example, while a single ASG could be configured to work across six subnets and maintain twelve instances, there is no guarantee that each of the six subnets will contain two instances. To guarantee the appropriate number of instances on a per-subnet basis, each subnet must be configured with its own ASG.
New instances should be automatically created within each ASG based on a single launch configuration. The launch configuration should be set up to use an existing Amazon Machine Image (AMI).
This post presupposes you have an AMI that can be used to create new instances that serve the application. There are only a few basic assumptions to how this image is configured:
1. The image containes a web server that serves traffic (in this case, on port 3000) 2. The image is configured to automatically launch the web server as a daemon when the instance starts.
Repeat this process to create an ASG in each remaining subnet, using the same launch configuration and target group.
In this example, only two instances are created in each subnet. In a real-world scenario, additional instances would likely be recommended for both availability and scale. The ASGs use the provided launch configuration as a template for creating new instances.
When creating the ASGs, the ARN of the target group for the NLB is provided. This way, the ASGs automatically register newly-created instances with the target group so that the NLB can begin sending traffic to them.
Step 4: Launch an endpoint service and connect with NLB
Now, expose the service via PrivateLink with an endpoint service, providing the ARN of the NLB:
This endpoint service is configured to require acceptance. This means that new consumers who attempt to add endpoints that consume it will have to wait for the provider to allow access. This provides an opportunity to control access and integrate with billing systems that monetize the provided service.
Step 5: Tie endpoint request approval with billing system or the AWS Marketplace
If you’re maintaining your service as a private service, any account that is intended to have access must be whitelisted before it can find the endpoint service and create an endpoint to consume it.
For more information on listing a PrivateLink service in the AWS Marketplace, see How to List Your Product in AWS Marketplace (https://aws.amazon.com/blogs/apn/how-to-list-your-product-in-aws-marketplace/).
Most production-ready services offered through PrivateLink will require acceptance of Endpoint requests before customers can consume them. Typically, some level of automation around processing approvals is helpful. PrivateLink can publish on a Simple Notification Service (SNS) topic when customers request approval.
Setting this up requires two steps:
1. Create a new SNS topic 2. Create an endpoint connection notification that publishes to the SNS topic.
Each is discussed below.
Create an SNS Topic
First, create a new SNS Topic that can receive messages relating to endpoint service access requests:
A billing system may ultimately tie in with request approval. This can also be done manually, which may be less useful, but is illustrative. As an example, assume that a customer account has already requested an endpoint to consume the service. The customer can be accepted manually, as follows:
At this point, the consumer can begin consuming the service.
Step 6: Take the Service Across Regions
In distributing SaaS via PrivateLink, providers may have to have to think about how to make their services available in different regions because Endpoint Services are only available within the region where they are created. Customers who attempt to consume Endpoint Services will not be able to create Endpoints across regions.
Rather than saddling consumers with the responsibility of making the jump across regions, we recommend providers work to make services available where their customers consume. They are in a better position to adapt their architectures to multiple regions than customers who do not know the internals of how providers have designed their services.
There are several architectural options that can support multi-region adaptation. Selection among them will depend on a number of factors, including read-to-write ratio, latency requirements, budget, amenability to re-architecture, and preference for simplicity.
Generally, the challenge in providing multi-region SaaS is in instantiating stateful components in multiple regions because the data on which such components depend are hard to replicate, synchronize, and access with low latency over large geographical distances.
Of all stateful components, perhaps the most frequently encountered will be databases. Some solutions for overcoming this challenge with respect to databases are as follows:
1. Provide a master in a single region; provide read replicas in every region. 2. Provide a master in every region; assign each tenant to one master only. 3. Create a full multi-master architecture; replicate data efficiently. 4. Rely on a managed service for replicating data cross-regionally (e.g., DynamoDB Global Tables).
Stateless components can be provisioned in multiple regions more easily. In this example, you will have to re-create all of the VPC resources—including subnets, Routing Tables, Security Groups, and Endpoint Services—as well as all EC2 resources—including instances, NLBs, Listeners, Target Groups, ASGs, and Launch Configurations—in each additional region. Because of the complexity in doing so, in addition to the significant need to keep regional configurations in-sync, you may wish to explore an orchestration tool such as CloudFormation, rather than the command line.
Regardless of what orchestration tooling you choose, you will need to copy your AMI to each region in which you wish to deploy it. Once available, you can build out your service in that region much as you did in the first one.
The response will include an attribute called VpcEndpoint.DnsEntries. The service can be accessed at each of the DNS names in the output under any of the entries there. Before the consumer can access the endpoint service, the provider has to accept the Endpoint.
Access Endpoint Via Custom DNS Names
When creating a new Endpoint, the consumer will receive named endpoint addresses in each AZ where the Endpoint is created, plus a named endpoint that is AZ-agnostic. For example:
The consumer can use Route53 to provide a custom DNS name for the service. This not only allows for using cleaner service names, but also enables the consumer to leverage the traffic management features of Route53, such as fail-over routing.
First, the the consumer must enable DNS Hostnames and DNS Support on the VPC within which the Endpoint was created. The consumer should start by enabling DNS Hostnames:
After the VPC is properly configured to work with Route53, the consumer should either select an existing hosted zone or create a new one. Assuming one has not already been created, the consumer should create one as follows:
In the request, the consumer specifies the DNS name, VPC ID, region, and flags the hosted zone as private. Additionally, the consumer must provide a “caller reference” which is a unique ID of the request that can be used to identify it in subsequent actions if the request fails.
Next, the consumer should create a JSON file corresponding to a batch of record change requests. In this file, the consumer can specify the name of the endpoint, as well as a CNAME pointing to the AZ-agnostic DNS name of the Endpoint:
At this point, the Endpoint can be consumed at http://order-processor.endpoints.internal.
AWS PrivateLink is an exciting way to expose SaaS services to customers. This article demonstrated how to expose an existing application on EC2 via PrivateLink in a customer’s VPC, as well as recommended architecture. Finally, it walked through the steps that a customer would have to go through to consume the service.
Amazon ECS now includes integrated service discovery. This makes it possible for an ECS service to automatically register itself with a predictable and friendly DNS name in Amazon Route 53. As your services scale up or down in response to load or container health, the Route 53 hosted zone is kept up to date, allowing other services to lookup where they need to make connections based on the state of each service. You can see a demo of service discovery in an imaginary social networking app over at: https://servicediscovery.ranman.com/.
Part of the transition to microservices and modern architectures involves having dynamic, autoscaling, and robust services that can respond quickly to failures and changing loads. Your services probably have complex dependency graphs of services they rely on and services they provide. A modern architectural best practice is to loosely couple these services by allowing them to specify their own dependencies, but this can be complicated in dynamic environments as your individual services are forced to find their own connection points.
Traditional approaches to service discovery like consul, etcd, or zookeeper all solve this problem well, but they require provisioning and maintaining additional infrastructure or installation of agents in your containers or on your instances. Previously, to ensure that services were able to discover and connect with each other, you had to configure and run your own service discovery system or connect every service to a load balancer. Now, you can enable service discovery for your containerized services in the ECS console, AWS CLI, or using the ECS API.
Introducing Amazon Route 53 Service Registry and Auto Naming APIs
Amazon ECS Service Discovery works by communicating with the Amazon Route 53 Service Registry and Auto Naming APIs. Since we haven’t talked about it before on this blog, I want to briefly outline how these Route 53 APIs work. First, some vocabulary:
Namespaces – A namespace specifies a domain name you want to route traffic to (e.g. internal, local, corp). You can think of it as a logical boundary between which services should be able to discover each other. You can create a namespace with a call to the aws servicediscovery create-private-dns-namespace command or in the ECS console. Namespaces are roughly equivalent to hosted zones in Route 53. A namespace contains services, our next vocabulary word.
Service – A service is a specific application or set of applications in your namespace like “auth”, “timeline”, or “worker”. A service contains service instances.
Service Instance – A service instance contains information about how Route 53 should respond to DNS queries for a resource.
Route 53 provides APIs to create: namespaces, A records per task IP, and SRV records per task IP + port.
When we ask Route 53 for something like: worker.corp we should get back a set of possible IPs that could fulfill that request. If the application we’re connecting to exposes dynamic ports then the calling application can easily query the SRV record to get more information.
ECS service discovery is built on top of the Route 53 APIs and manages all of the underlying API calls for you. Now that we understand how the service registry, works lets take a look at the ECS side to see service discovery in action.
Amazon ECS Service Discovery
Let’s launch an application with service discovery! First, I’ll create two task definitions: “flask-backend” and “flask-worker”. Both are simple AWS Fargate tasks with a single container serving HTTP requests. I’ll have flask-backend ask worker.corp to do some work and I’ll return the response as well as the address Route 53 returned for worker. Something like the code below:
Now, with my containers and task definitions in place, I’ll create a service in the console.
As I move to step two in the service wizard I’ll fill out the service discovery section and have ECS create a new namespace for me.
I’ll also tell ECS to monitor the health of the tasks in my service and add or remove them from Route 53 as needed. Then I’ll set a TTL of 10 seconds on the A records we’ll use.
I’ll repeat those same steps for my “worker” service and after a minute or so most of my tasks should be up and running.
Over in the Route 53 console I can see all the records for my tasks!
We can use the Route 53 service discovery APIs to list all of our available services and tasks and programmatically reach out to each one. We could easily extend to any number of services past just backend and worker. I’ve created a simple demo of an imaginary social network with services like “auth”, “feed”, “timeline”, “worker”, “user” and more here: https://servicediscovery.ranman.com/. You can see the code used to run that page on github.
Available Now Amazon ECS service discovery is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). AWS Fargate is currently only available in US East (N. Virginia). When you use ECS service discovery, you pay for the Route 53 resources that you consume, including each namespace that you create, and for the lookup queries your services make. Container level health checks are provided at no cost. For more information on pricing check out the documentation.
Please let us know what you’ll be building or refactoring with service discovery either in the comments or on Twitter!
Cerberus Technologies, in their own words: Cerberus is a company founded in 2017 by a team of visionary iGaming veterans. Our mission is simple – to offer the best tech solutions through a data-driven and a customer-first approach, delivering innovative solutions that go against traditional forms of working and process. This mission is based on the solid foundations of reliability, flexibility and security, and we intend to fundamentally change the way iGaming and other industries interact with technology.
Over the years, I have developed and created a number of data warehouses from scratch. Recently, I built a data warehouse for the iGaming industry single-handedly. To do it, I used the power and flexibility of Amazon Redshift and the wider AWS data management ecosystem. In this post, I explain how I was able to build a robust and scalable data warehouse without the large team of experts typically needed.
In two of my recent projects, I ran into challenges when scaling our data warehouse using on-premises infrastructure. Data was growing at many tens of gigabytes per day, and query performance was suffering. Scaling required major capital investment for hardware and software licenses, and also significant operational costs for maintenance and technical staff to keep it running and performing well. Unfortunately, I couldn’t get the resources needed to scale the infrastructure with data growth, and these projects were abandoned. Thanks to cloud data warehousing, the bottleneck of infrastructure resources, capital expense, and operational costs have been significantly reduced or have totally gone away. There is no more excuse for allowing obstacles of the past to delay delivering timely insights to decision makers, no matter how much data you have.
With Amazon Redshift and AWS, I delivered a cloud data warehouse to the business very quickly, and with a small team: me. I didn’t have to order hardware or software, and I no longer needed to install, configure, tune, or keep up with patches and version updates. Instead, I easily set up a robust data processing pipeline and we were quickly ingesting and analyzing data. Now, my data warehouse team can be extremely lean, and focus more time on bringing in new data and delivering insights. In this post, I show you the AWS services and the architecture that I used.
Handling data feeds
I have several different data sources that provide everything needed to run the business. The data includes activity from our iGaming platform, social media posts, clickstream data, marketing and campaign performance, and customer support engagements.
To handle the diversity of data feeds, I developed abstract integration applications using Docker that run on Amazon EC2 Container Service (Amazon ECS) and feed data to Amazon Kinesis Data Streams. These data streams can be used for real time analytics. In my system, each record in Kinesis is preprocessed by an AWS Lambda function to cleanse and aggregate information. My system then routes it to be stored where I need on Amazon S3 by Amazon Kinesis Data Firehose. Suppose that you used an on-premises architecture to accomplish the same task. A team of data engineers would be required to maintain and monitor a Kafka cluster, develop applications to stream data, and maintain a Hadoop cluster and the infrastructure underneath it for data storage. With my stream processing architecture, there are no servers to manage, no disk drives to replace, and no service monitoring to write.
Setting up a Kinesis stream can be done with a few clicks, and the same for Kinesis Firehose. Firehose can be configured to automatically consume data from a Kinesis Data Stream, and then write compressed data every N minutes to Amazon S3. When I want to process a Kinesis data stream, it’s very easy to set up a Lambda function to be executed on each message received. I can just set a trigger from the AWS Lambda Management Console, as shown following.
Regardless of the format I receive the data from our partners, I can send it to Kinesis as JSON data using my own formatters. After Firehose writes this to Amazon S3, I have everything in nearly the same structure I received but compressed, encrypted, and optimized for reading.
This data is automatically crawled by AWS Glue and placed into the AWS Glue Data Catalog. This means that I can immediately query the data directly on S3 using Amazon Athena or through Amazon Redshift Spectrum. Previously, I used Amazon EMR and an Amazon RDS–based metastore in Apache Hive for catalog management. Now I can avoid the complexity of maintaining Hive Metastore catalogs. Glue takes care of high availability and the operations side so that I know that end users can always be productive.
Working with Amazon Athena and Amazon Redshift for analysis
I found Amazon Athena extremely useful out of the box for ad hoc analysis. Our engineers (me) use Athena to understand new datasets that we receive and to understand what transformations will be needed for long-term query efficiency.
For our data analysts and data scientists, we’ve selected Amazon Redshift. Amazon Redshift has proven to be the right tool for us over and over again. It easily processes 20+ million transactions per day, regardless of the footprint of the tables and the type of analytics required by the business. Latency is low and query performance expectations have been more than met. We use Redshift Spectrum for long-term data retention, which enables me to extend the analytic power of Amazon Redshift beyond local data to anything stored in S3, and without requiring me to load any data. Redshift Spectrum gives me the freedom to store data where I want, in the format I want, and have it available for processing when I need it.
To load data directly into Amazon Redshift, I use AWS Data Pipeline to orchestrate data workflows. I create Amazon EMR clusters on an intra-day basis, which I can easily adjust to run more or less frequently as needed throughout the day. EMR clusters are used together with Amazon RDS, Apache Spark 2.0, and S3 storage. The data pipeline application loads ETL configurations from Spring RESTful services hosted on AWS Elastic Beanstalk. The application then loads data from S3 into memory, aggregates and cleans the data, and then writes the final version of the data to Amazon Redshift. This data is then ready to use for analysis. Spark on EMR also helps with recommendations and personalization use cases for various business users, and I find this easy to set up and deliver what users want. Finally, business users use Amazon QuickSight for self-service BI to slice, dice, and visualize the data depending on their requirements.
Each AWS service in this architecture plays its part in saving precious time that’s crucial for delivery and getting different departments in the business on board. I found the services easy to set up and use, and all have proven to be highly reliable for our use as our production environments. When the architecture was in place, scaling out was either completely handled by the service, or a matter of a simple API call, and crucially doesn’t require me to change one line of code. Increasing shards for Kinesis can be done in a minute by editing a stream. Increasing capacity for Lambda functions can be accomplished by editing the megabytes allocated for processing, and concurrency is handled automatically. EMR cluster capacity can easily be increased by changing the master and slave node types in Data Pipeline, or by using Auto Scaling. Lastly, RDS and Amazon Redshift can be easily upgraded without any major tasks to be performed by our team (again, me).
In the end, using AWS services including Kinesis, Lambda, Data Pipeline, and Amazon Redshift allows me to keep my team lean and highly productive. I eliminated the cost and delays of capital infrastructure, as well as the late night and weekend calls for support. I can now give maximum value to the business while keeping operational costs down. My team pushed out an agile and highly responsive data warehouse solution in record time and we can handle changing business requirements rapidly, and quickly adapt to new data and new user requests.
Stephen Borg is the Head of Big Data and BI at Cerberus Technologies. He has a background in platform software engineering, and first became involved in data warehousing using the typical RDBMS, SQL, ETL, and BI tools. He quickly became passionate about providing insight to help others optimize the business and add personalization to products. He is now the Head of Big Data and BI at Cerberus Technologies.
AWS Fargate is a new technology that works with Amazon Elastic Container Service (ECS) to run containers without having to manage servers or clusters. What does this mean? With Fargate, you no longer need to provision or manage a single virtual machine; you can just create tasks and run them directly!
Fargate uses the same API actions as ECS, so you can use the ECS console, the AWS CLI, or the ECS CLI. I recommend running through the first-run experience for Fargate even if you’re familiar with ECS. It creates all of the one-time setup requirements, such as the necessary IAM roles. If you’re using a CLI, make sure to upgrade to the latest version.
In this blog, you will see how to migrate ECS containers from running on Amazon EC2 to Fargate.
Note: Anything with code blocks is a change in the task definition file. Screen captures are from the console. Additionally, Fargate is currently available in the us-east-1 (N. Virginia) region.
When you create tasks (grouping of containers) and clusters (grouping of tasks), you now have two launch type options: EC2 and Fargate. The default launch type, EC2, is ECS as you knew it before the announcement of Fargate. You need to specify Fargate as the launch type when running a Fargate task.
Even though Fargate abstracts away virtual machines, tasks still must be launched into a cluster. With Fargate, clusters are a logical infrastructure and permissions boundary that allow you to isolate and manage groups of tasks. ECS also supports heterogeneous clusters that are made up of tasks running on both EC2 and Fargate launch types.
The optional, new requiresCompatibilities parameter with FARGATE in the field ensures that your task definition only passes validation if you include Fargate-compatible parameters. Tasks can be flagged as compatible with EC2, Fargate, or both.
In November, we announced the addition of task networking with the network mode awsvpc. By default, ECS uses the bridge network mode. Fargate requires using the awsvpc network mode.
In bridge mode, all of your tasks running on the same instance share the instance’s elastic network interface, which is a virtual network interface, IP address, and security groups.
The awsvpc mode provides this networking support to your tasks natively. You now get the same VPC networking and security controls at the task level that were previously only available with EC2 instances. Each task gets its own elastic networking interface and IP address so that multiple applications or copies of a single application can run on the same port number without any conflicts.
The awsvpc mode also provides a separation of responsibility for tasks. You can get complete control of task placement within your own VPCs, subnets, and the security policies associated with them, even though the underlying infrastructure is managed by Fargate. Also, you can assign different security groups to each task, which gives you more fine-grained security. You can give an application only the permissions it needs.
What else has to change? First, you only specify a containerPort value, not a hostPort value, as there is no host to manage. Your container port is the port that you access on your elastic network interface IP address. Therefore, your container ports in a single task definition file need to be unique.
Additionally, links are not allowed as they are a property of the “bridge” network mode (and are now a legacy feature of Docker). Instead, containers share a network namespace and communicate with each other over the localhost interface. They can be referenced using the following:
When launching a task with the EC2 launch type, task performance is influenced by the instance types that you select for your cluster combined with your task definition. If you pick larger instances, your applications make use of the extra resources if there is no contention.
In Fargate, you needed a way to get additional resource information so we created task-level resources. Task-level resources define the maximum amount of memory and cpu that your task can consume.
memory can be defined in MB with just the number, or in GB, for example, “1024” or “1gb”.
cpu can be defined as the number or in vCPUs, for example, “256” or “.25vcpu”.
vCPUs are virtual CPUs. You can look at the memory and vCPUs for instance types to get an idea of what you may have used before.
The memory and CPU options available with Fargate are:
256 (.25 vCPU)
0.5GB, 1GB, 2GB
512 (.5 vCPU)
1GB, 2GB, 3GB, 4GB
1024 (1 vCPU)
2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB
2048 (2 vCPU)
Between 4GB and 16GB in 1GB increments
4096 (4 vCPU)
Between 8GB and 30GB in 1GB increments
Because Fargate uses awsvpc mode, you need an Amazon ECS service-linked IAM role named AWSServiceRoleForECS. It provides Fargate with the needed permissions, such as the permission to attach an elastic network interface to your task. After you create your service-linked IAM role, you can delete the remaining roles in your services.
With the EC2 launch type, an instance role gives the agent the ability to pull, publish, talk to ECS, and so on. With Fargate, the task execution IAM role is only needed if you’re pulling from Amazon ECR or publishing data to Amazon CloudWatch Logs.
AWS Fargate is a technology that allows you to focus on running your application without needing to provision, monitor, or manage the underlying compute infrastructure. You package your application into a Docker container that you can then launch using your container orchestration tool of choice.
Fargate allows you to use containers without being responsible for Amazon EC2 instances, similar to how EC2 allows you to run VMs without managing physical infrastructure. Currently, Fargate provides support for Amazon Elastic Container Service (Amazon ECS). Support for Amazon Elastic Container Service for Kubernetes (Amazon EKS) will be made available in the near future.
Despite offloading the responsibility for the underlying instances, Fargate still gives you deep control over configuration of network placement and policies. This includes the ability to use many networking fundamentals such as Amazon VPC and security groups.
This post covers how to take advantage of the different ways of networking your containers in Fargate when using ECS as your orchestration platform, with a focus on how to do networking securely.
The first step to running any application in Fargate is defining an ECS task for Fargate to launch. A task is a logical group of one or more Docker containers that are deployed with specified settings. When running a task in Fargate, there are two different forms of networking to consider:
Container (local) networking
Container networking is often used for tightly coupled application components. Perhaps your application has a web tier that is responsible for serving static content as well as generating some dynamic HTML pages. To generate these dynamic pages, it has to fetch information from another application component that has an HTTP API.
One potential architecture for such an application is to deploy the web tier and the API tier together as a pair and use local networking so the web tier can fetch information from the API tier.
If you are running these two components as two processes on a single EC2 instance, the web tier application process could communicate with the API process on the same machine by using the local loopback interface. The local loopback interface has a special IP address of 127.0.0.1 and hostname of localhost.
By making a networking request to this local interface, it bypasses the network interface hardware and instead the operating system just routes network calls from one process to the other directly. This gives the web tier a fast and efficient way to fetch information from the API tier with almost no networking latency.
In Fargate, when you launch multiple containers as part of a single task, they can also communicate with each other over the local loopback interface. Fargate uses a special container networking mode called awsvpc, which gives all the containers in a task a shared elastic network interface to use for communication.
If you specify a port mapping for each container in the task, then the containers can communicate with each other on that port. For example the following task definition could be used to deploy the web tier and the API tier:
ECS, with Fargate, is able to take this definition and launch two containers, each of which is bound to a specific static port on the elastic network interface for the task.
Because each Fargate task has its own isolated networking stack, there is no need for dynamic ports to avoid port conflicts between different tasks as in other networking modes. The static ports make it easy for containers to communicate with each other. For example, the web container makes a request to the API container using its well-known static port:
This sends a local network request, which goes directly from one container to the other over the local loopback interface without traversing the network. This deployment strategy allows for fast and efficient communication between two tightly coupled containers. But most application architectures require more than just internal local networking.
External networking is used for network communications that go outside the task to other servers that are not part of the task, or network communications that originate from other hosts on the internet and are directed to the task.
Configuring external networking for a task is done by modifying the settings of the VPC in which you launch your tasks. A VPC is a fundamental tool in AWS for controlling the networking capabilities of resources that you launch on your account.
When setting up a VPC, you create one or more subnets, which are logical groups that your resources can be placed into. Each subnet has an Availability Zone and its own route table, which defines rules about how network traffic operates for that subnet. There are two main types of subnets: public and private.
A public subnet is a subnet that has an associated internet gateway. Fargate tasks in that subnet are assigned both private and public IP addresses:
A browser or other client on the internet can send network traffic to the task via the internet gateway using its public IP address. The tasks can also send network traffic to other servers on the internet because the route table can route traffic out via the internet gateway.
If tasks want to communicate directly with each other, they can use each other’s private IP address to send traffic directly from one to the other so that it stays inside the subnet without going out to the internet gateway and back in.
A private subnet does not have direct internet access. The Fargate tasks inside the subnet don’t have public IP addresses, only private IP addresses. Instead of an internet gateway, a network address translation (NAT) gateway is attached to the subnet:
There is no way for another server or client on the internet to reach your tasks directly, because they don’t even have an address or a direct route to reach them. This is a great way to add another layer of protection for internal tasks that handle sensitive data. Those tasks are protected and can’t receive any inbound traffic at all.
In this configuration, the tasks can still communicate to other servers on the internet via the NAT gateway. They would appear to have the IP address of the NAT gateway to the recipient of the communication. If you run a Fargate task in a private subnet, you must add this NAT gateway. Otherwise, Fargate can’t make a network request to Amazon ECR to download the container image, or communicate with Amazon CloudWatch to store container metrics.
If you are running a container that is hosting internet content in a private subnet, you need a way for traffic from the public to reach the container. This is generally accomplished by using a load balancer such as an Application Load Balancer or a Network Load Balancer.
ECS integrates tightly with AWS load balancers by automatically configuring a service-linked load balancer to send network traffic to containers that are part of the service. When each task starts, the IP address of its elastic network interface is added to the load balancer’s configuration. When the task is being shut down, network traffic is safely drained from the task before removal from the load balancer.
To get internet traffic to containers using a load balancer, the load balancer is placed into a public subnet. ECS configures the load balancer to forward traffic to the container tasks in the private subnet:
This configuration allows your tasks in Fargate to be safely isolated from the rest of the internet. They can still initiate network communication with external resources via the NAT gateway, and still receive traffic from the public via the Application Load Balancer that is in the public subnet.
Another potential use case for a load balancer is for internal communication from one service to another service within the private subnet. This is typically used for a microservice deployment, in which one service such as an internet user account service needs to communicate with an internal service such as a password service. Obviously, it is undesirable for the password service to be directly accessible on the internet, so using an internet load balancer would be a major security vulnerability. Instead, this can be accomplished by hosting an internal load balancer within the private subnet:
With this approach, one container can distribute requests across an Auto Scaling group of other private containers via the internal load balancer, ensuring that the network traffic stays safely protected within the private subnet.
Best Practices for Fargate Networking
Determine whether you should use local task networking
Local task networking is ideal for communicating between containers that are tightly coupled and require maximum networking performance between them. However, when you deploy one or more containers as part of the same task they are always deployed together so it removes the ability to independently scale different types of workload up and down.
In the example of the application with a web tier and an API tier, it may be the case that powering the application requires only two web tier containers but 10 API tier containers. If local container networking is used between these two container types, then an extra eight unnecessary web tier containers would end up being run instead of allowing the two different services to scale independently.
A better approach would be to deploy the two containers as two different services, each with its own load balancer. This allows clients to communicate with the two web containers via the web service’s load balancer. The web service could distribute requests across the eight backend API containers via the API service’s load balancer.
Run internet tasks that require internet access in a public subnet
If you have tasks that require internet access and a lot of bandwidth for communication with other services, it is best to run them in a public subnet. Give them public IP addresses so that each task can communicate with other services directly.
If you run these tasks in a private subnet, then all their outbound traffic has to go through an NAT gateway. AWS NAT gateways support up to 10 Gbps of burst bandwidth. If your bandwidth requirements go over this, then all task networking starts to get throttled. To avoid this, you could distribute the tasks across multiple private subnets, each with their own NAT gateway. It can be easier to just place the tasks into a public subnet, if possible.
Avoid using a public subnet or public IP addresses for private, internal tasks
If you are running a service that handles private, internal information, you should not put it into a public subnet or use a public IP address. For example, imagine that you have one task, which is an API gateway for authentication and access control. You have another background worker task that handles sensitive information.
The intended access pattern is that requests from the public go to the API gateway, which then proxies request to the background task only if the request is from an authenticated user. If the background task is in a public subnet and has a public IP address, then it could be possible for an attacker to bypass the API gateway entirely. They could communicate directly to the background task using its public IP address, without being authenticated.
Fargate gives you a way to run containerized tasks directly without managing any EC2 instances, but you still have full control over how you want networking to work. You can set up containers to talk to each other over the local network interface for maximum speed and efficiency. For running workloads that require privacy and security, use a private subnet with public internet access locked down. Or, for simplicity with an internet workload, you can just use a public subnet and give your containers a public IP address.
To deploy one of these Fargate task networking approaches, check out some sample CloudFormation templates showing how to configure the VPC, subnets, and load balancers.
If you have questions or suggestions, please comment below.
So, what’s Amazon Elastic Container Service (ECS)? ECS is a managed service for running containers on AWS, designed to make it easy to run applications in the cloud without worrying about configuring the environment for your code to run in. Using ECS, you can easily deploy containers to host a simple website or run complex distributed microservices using thousands of containers.
Getting started with ECS isn’t too difficult. To fully understand how it works and how you can use it, it helps to understand the basic building blocks of ECS and how they fit together!
Let’s begin with an analogy
Imagine you’re in a virtual reality game with blocks and portals, in which your task is to build kingdoms.
In your spaceship, you pull up a holographic map of your upcoming destination: Nozama, a golden-orange planet. Looking at its various regions, you see that the nearest one is za-southwest-1 (SWNozama). You set your destination, and use your jump drive to jump to the outer atmosphere of za-southwest-1.
As you approach SW Nozama, you see three portals, 1a, 1b, and 1c. Each portal lets you transport directly to an isolated zone (Availability Zone), where you can start construction on your new kingdom (cluster), Royaume.
With your supply of blocks, you take the portal to 1b, and erect the surrounding walls of your first territory (instance)*.
Before you get ahead of yourself, there are some rules to keep in mind. For your territory to be a part of Royaume, the land ordinance requires construction of a building (container), specifically a castle, from which your territory’s lord (agent)* rules.
You can then create architectural plans (task definitions) to build your developments (tasks), consisting of up to 10 buildings per plan. A development can be built now within this or any territory, or multiple territories.
If you do decide to create more territories, you can either stay here in 1b or take a portal to another location in SW Nozama and start building there.
Amazon EC2 building blocks
We currently provide two launch types: EC2 and Fargate. With Fargate, the Amazon EC2 instances are abstracted away and managed for you. Instead of worrying about ECS container instances, you can just worry about tasks. In this post, the infrastructure components used by ECS that are handled by Fargate are marked with a *.
EC2 instances are good ol’ virtual machines (VMs). And yes, don’t worry, you can connect to them (via SSH). Because customers have varying needs in memory, storage, and computing power, many different instance types are offered. Just want to run a small application or try a free trial? Try t2.micro. Want to run memory-optimized workloads? R3 and X1 instances are a couple options. There are many more instance types as well, which cater to various use cases.
Sorry if you wanted to immediately march forward, but before you create your instance, you need to choose an AMI. An AMI stands for Amazon Machine Image. What does that mean? Basically, an AMI provides the information required to launch an instance: root volume, launch permissions, and volume-attachment specifications. You can find and choose a Linux or Windows AMI provided by AWS, the user community, the AWS Marketplace (for example, the Amazon ECS-Optimized AMI), or you can create your own.
AWS is divided into regions that are geographic areas around the world (for now it’s just Earth, but maybe someday…). These regions have semi-evocative names such as us-east-1 (N. Virginia), us-west-2 (Oregon), eu-central-1 (Frankfurt), ap-northeast-1 (Tokyo), etc.
Each region is designed to be completely isolated from the others, and consists of multiple, distinct data centers. This creates a “blast radius” for failure so that even if an entire region goes down, the others aren’t affected. Like many AWS services, to start using ECS, you first need to decide the region in which to operate. Typically, this is the region nearest to you or your users.
AWS regions are subdivided into Availability Zones. A region has at minimum two zones, and up to a handful. Zones are physically isolated from each other, spanning one or more different data centers, but are connected through low-latency, fiber-optic networking, and share some common facilities. EC2 is designed so that the most common failures only affect a single zone to prevent region-wide outages. This means you can achieve high availability in a region by spanning your services across multiple zones and distributing across hosts.
Are containers virtual machines? Nope! Virtual machines virtualize the hardware (benefits), while containers virtualize the operating system (even more benefits!). If you look inside a container, you would see that it is made by processes running on the host, and tied together by kernel constructs like namespaces, cgroups, etc. But you don’t need to bother about that level of detail, at least not in this post!
Why containers? Containers give you the ability to build, ship, and run your code anywhere!
Before the cloud, you needed to self-host and therefore had to buy machines in addition to setting up and configuring the operating system (OS), and running your code. In the cloud, with virtualization, you can just skip to setting up the OS and running your code. Containers make the process even easier—you can just run your code.
Additionally, all of the dependencies travel in a package with the code, which is called an image. This allows containers to be deployed on any host machine. From the outside, it looks like a host is just holding a bunch of containers. They all look the same, in the sense that they are generic enough to be deployed on any host.
With ECS, you can easily run your containerized code and applications across a managed cluster of EC2 instances.
Are containers a fairly new technology? The concept of containerization is not new. Its origins date back to 1979 with the creation of chroot. However, it wasn’t until the early 2000s that containers became a major technology. The most significant milestone to date was the release of Docker in 2013, which led to the popularization and widespread adoption of containers.
What does ECS use? While other container technologies exist (LXC, rkt, etc.), because of its massive adoption and use by our customers, ECS was designed first to work natively with Docker containers.
And as you probably guessed, in these instances, you are running containers.
These container instances can use any AMI as long as it has the following specifications: a modern Linux distribution with the agent and the Docker Daemon with any Docker runtime dependencies running on it.
Want it more simplified? Well, AWS created the Amazon ECS-Optimized AMI for just that. Not only does that AMI come preconfigured with all of the previously mentioned specifications, it’s tested and includes the recommended ecs-init upstart process to run and monitor the agent.
An ECS cluster is a grouping of (container) instances* (or tasks in Fargate) that lie within a single region, but can span multiple Availability Zones – it’s even a good idea for redundancy. When launching an instance (or tasks in Fargate), unless specified, it registers with the cluster named “default”. If “default” doesn’t exist, it is created. You can also scale and delete your clusters.
The Amazon ECS container agent is a Go program that runs in its own container within each EC2 instance that you use with ECS. (It’s also available open source on GitHub!) The agent is the intermediary component that takes care of the communication between the scheduler and your instances. Want to register your instance into a cluster? (Why wouldn’t you? A cluster is both a logical boundary and provider of pool of resources!) Then you need to run the agent on it.
When you want to start a container, it has to be part of a task. Therefore, you have to create a task first. Succinctly, tasks are a logical grouping of 1 to N containers that run together on the same instance, with N defined by you, up to 10. Let’s say you want to run a custom blog engine. You could put together a web server, an application server, and an in-memory cache, each in their own container. Together, they form a basic frontend unit.
Ah, but you cannot create a task directly. You have to create a task definition that tells ECS that “task definition X is composed of this container (and maybe that other container and that other container too!).” It’s kind of like an architectural plan for a city. Some other details it can include are how the containers interact, container CPU and memory constraints, and task permissions using IAM roles.
Then you can tell ECS, “start one task using task definition X.” It might sound like unnecessary planning at first. As soon as you start to deal with multiple tasks, scaling, upgrades, and other “real life” scenarios, you’ll be glad that you have task definitions to keep track of things!
So, the scheduler schedules… sorry, this should be more helpful, huh? The scheduler is part of the “hosted orchestration layer” provided by ECS. Wait a minute, what do I mean by “hosted orchestration”? Simply put, hosted means that it’s operated by ECS on your behalf, without you having to care about it. Your applications are deployed in containers running on your instances, but the managing of tasks is taken care of by ECS. One less thing to worry about!
Also, the scheduler is the component that decides what (which containers) gets to run where (on which instances), according to a number of constraints. Say that you have a custom blog engine to scale for high availability. You could create a service, which by default, spreads tasks across all zones in the chosen region. And if you want each task to be on a different instance, you can use the distinctInstancetask placement constraint. ECS makes sure that not only this happens, but if a task fails, it starts again.
To ensure that you always have your task running without managing it yourself, you can create a service based on the task that you defined and ECS ensures that it stays running. A service is a special construct that says, “at any given time, I want to make sure that N tasks using task definition X1 are running.” If N=1, it just means “make sure that this task is running, and restart it if needed!” And with N>1, you’re basically scaling your application until you hit N, while also ensuring each task is running.
So, what now?
Hopefully you, at the very least, learned a tiny something. All comments are very welcome!
Want to discuss ECS with others? Join the amazon-ecs slack group, which members of the community created and manage.
Also, if you’re interested in learning more about the core concepts of ECS and its relation to EC2, here are some resources:
This post contributed by Sundar Narasiman, Arun Kannan, and Thomas Fuller.
AWS recently announced the general availability of Windows container management for Amazon Elastic Container Service (Amazon ECS). Docker containers and Amazon ECS make it easy to run and scale applications on a virtual machine by abstracting the complex cluster management and setup needed.
Classic .NET applications are developed with .NET Framework 4.7.1 or older and can run only on a Windows platform. These include Windows Communication Foundation (WCF), ASP.NET Web Forms, and an ASP.NET MVC web app or web API.
Why classic ASP.NET?
ASP.NET MVC 4.6 and older versions of ASP.NET occupy a significant footprint in the enterprise web application space. As enterprises move towards microservices for new or existing applications, containers are one of the stepping stones for migrating from monolithic to microservices architectures. Additionally, the support for Windows containers in Windows 10, Windows Server 2016, and Visual Studio Tooling support for Docker simplifies the containerization of ASP.NET MVC apps.
In this post, you pick an ASP.NET 4.6.2 MVC application and get step-by-step instructions for migrating to ECS using Windows containers. The detailed steps, AWS CloudFormation template, Microsoft Visual Studio solution, ECS service definition, and ECS task definition are available in the aws-ecs-windows-aspnet GitHub repository.
To help you getting started running Windows containers, here is the reference architecture for Windows containers on GitHub: ecs-refarch-cloudformation-windows. This reference architecture is the layered CloudFormation stack, in that it calls the other stacks to create the environment. The CloudFormation YAML template in this reference architecture is referenced to create a single JSON CloudFormation stack, which is used in the steps for the migration.
Your development environment needs to have the latest version and updates for Visual Studio 2017, Windows 10, and Docker for Windows Stable.
Next, containerize the ASP.NET application and test it locally. The size of Windows container application images is generally larger compared to Linux containers. This is because the base image of the Windows container itself is large in size, typically greater than 9 GB.
After the application is containerized, the container image needs to be pushed to Amazon Elastic Container Registry (Amazon ECR). Images stored in ECR are compressed to improve pull times and reduce storage costs. In this case, you can see that ECR compresses the image to around 1 GB, for an optimization factor of 90%.
Create a CloudFormation stack using the template in the ‘CloudFormation template’ folder. This creates an ECS service, task definition (referring the containerized ASP.NET application), and other related components mentioned in the ECS reference architecture for Windows containers.
After the stack is created, verify the successful creation of the ECS service, ECS instances, running tasks (with the threshold mentioned in the task definition), and the Application Load Balancer’s successful health check against running containers.
Navigate to the Application Load Balancer URL and see the successful rendering of the containerized ASP.NET MVC app in the browser.
Generally, Windows container images occupy large amount of space (in the order of few GBs).
All the task definition parameters for Linux containers are not available for Windows containers. For more information, see Windows Task Definitions.
An Application Load Balancer can be configured to route requests to one or more ports on each container instance in a cluster. The dynamic port mapping allows you to have multiple tasks from a single service on the same container instance.
IAM roles for Windows tasks require extra configuration. For more information, see Windows IAM Roles for Tasks. For this post, configuration was handled by the CloudFormation template.
The ECS container agent log file can be accessed for troubleshooting Windows containers: C:\ProgramData\Amazon\ECS\log\ecs-agent.log
In this post, you migrated an ASP.NET MVC application to ECS using Windows containers.
The logical next step is to automate the activities for migration to ECS and build a fully automated continuous integration/continuous deployment (CI/CD) pipeline for Windows containers. This can be orchestrated by leveraging services such as AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Amazon ECR, and Amazon ECS. You can learn more about how this is done in the Set Up a Continuous Delivery Pipeline for Containers Using AWS CodePipeline and Amazon ECS post.
If you have questions or suggestions, please comment below.
This post contributed by AWS Senior Cloud Infrastructure Architect Anabell St Vincent.
Some systems or applications require Transport Layer Security (TLS) traffic from the client all the way through to the Docker container, without offloading or terminating certificates at a load balancer. Some highly time-sensitive services may require communication over TLS without any decryption and re-encryption in the communication path.
There are multiple options for this type of implementation on AWS. One option is to use a service discovery tool to implement the requirements, but that creates overhead from an implementation and management perspective. In this post, I examine the option of using Amazon Elastic Container Service (Amazon ECS) with a Network Load Balancer.
Amazon ECS is a highly scalable, high-performance service for running Docker containers on AWS. It is integrated with each of the Elastic Load Balancing load balancers offered by AWS:
The Classic Load Balancer supports application or network level traffic and can be used to pass through the TLS traffic when configured with TCP listeners. This approach limits the number of containers to be deployed by Amazon ECS to one per EC2 host. This approach is only available for the ECS launch type of EC2, not Fargate. This is due to Classic Load Balancer not supporting target groups, so dynamic port mapping can’t be leveraged for this type of implementation.
The Application Load Balancer functions at the application layer, the layer 7 of Open System Interconnection (OSI) and supports HTTP and HTTPS protocols. By operating at layer 7, the Application Load Balancer is able to route traffic based on the request path in the URL. It can also provide SSL offloading or termination of the SSL certificates for applications, by hosting the SSL certificate. The Application Load Balancer integrates well with ECS by providing the functionality of the target groups for the container instances, which allows for port mapping and targeting container groups. Port mappings allows the containers to run on different ports but still be represented by the one port configured on the ALB as part of the target group of the task.
The Network Load Balancer, which is a high performance, ultra-low latency load balancer that operates at layer 4. It is designed to handle tens of millions of requests per second while maintaining high throughput at ultra-low latency. Operating at layer 4 networking means that the traffic is passed onto the targets based on the IP addresses and ports as oppose to session information or cookies, as is the case with Application Load Balancers. The Network Load Balancer also supports long-lived TCP connections, which are ideal for WebSocket type of applications . Some applications communicate on TCP protocols rather than solely on HTTP and HTTPS. The Network Load Balancer provides the support to load balance between services that require protocols besides HTTP and HTTPS.
The Network Load Balancer is the best option for managing secure traffic as it provides support for TCP traffic pass through, without decrypting and then re-encrypting the traffic. Additionally, the Network Load Balancer supports target groups, which means port mapping can be used, as well as configuring the load balancer as part of the task definition for the containers.
The diagram below shows a high-level architecture with TLS traffic coming from the internet that passes through the VPC to the Network Load Balancer, then to a container without offloading or terminating the certificate in the path.
In this diagram, there are three containers running in an ECS cluster. The ECS cluster can be either the EC2 or Fargate launch type.
The following diagram shows an architecture that contains two different services deployed in two different ECS clusters.
In this architecture, the containers in each of the clusters can reference the other containers securely via the Network Load Balancer without terminating or offloading the certificates until it reaches the destination container.
This approach addresses the requirements where containers need to communicate securely whether they are deployed in one VPC, across VPCs, or in separate AWS accounts.
The following diagram below shows both TLS connections from the internet, as well as connections from within the same cluster.
The above approach is achieved by using the same Network Load Balancer for the cluster to serve two different services. Implement this by using two different target groups (one for each service) and associating them with the ECS task definition. The containers use the DNS name associated with the Network Load Balancer to access the containers in the other service.
The architectures in this post show how the Network Load Balancer integrates seamlessly with ECS and other AWS services, providing end-to-end TLS communication across services without offloading or terminating the certificates. This gives you the ability to use dynamic ports in ECS containers. It can also handle tens of millions of requests per second while maintaining high throughput at ultra-low latency for applications that require the TCP protocol.
If you have questions or suggestions, please comment below.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.