Tag Archives: Amazon EC2 Container Service

Optimizing Amazon ECS task density using awsvpc network mode

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/optimizing-amazon-ecs-task-density-using-awsvpc-network-mode/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS recently increased the number of elastic network interfaces available when you run tasks on Amazon ECS. Use the account setting called awsvpcTrunking. If you use the Amazon EC2 launch type and task networking (awsvpc network mode), you can now run more tasks on an instance—5 to 17 times as many—as you did before.

As more of you embrace microservices architectures, you deploy increasing numbers of smaller tasks. AWS now offers you the option of more efficient packing per instance, potentially resulting in smaller clusters and associated savings.

 

Overview

To manage your own cluster of EC2 instances, use the EC2 launch type. Use task networking to run ECS tasks using the same networking properties as if tasks were distinct EC2 instances.

Task networking offers several benefits. Every task launched with awsvpc network mode has its own attached network interface, a primary private IP address, and an internal DNS hostname. This simplifies container networking and gives you more control over how tasks communicate, both with each other and with other services within their virtual private clouds (VPCs).

Task networking also lets you take advantage of other EC2 networking features like VPC Flow Logs. This feature lets you monitor traffic to and from tasks. It also provides greater security control for containers, allowing you to use security groups and network monitoring tools at a more granular level within tasks. For more information, see Introducing Cloud Native Networking for Amazon ECS Containers.

However, if you run container tasks on EC2 instances with task networking, you can face a networking limit. This might surprise you, particularly when an instance has plenty of free CPU and memory. The limit reflects the number of network interfaces available to support awsvpc network mode per container instance.

 

Raise network interface density limits with trunking

The good news is that AWS raised network interface density limits by implementing a networking feature on ECS called “trunking.” This is a technique for multiplexing data over a shared communication link.

If you’re migrating to microservices using AWS App Mesh, you should optimize network interface density. App Mesh requires awsvpc networking to provide routing control and visibility over an ever-expanding array of running tasks. In this context, increased network interface density might save money.

By opting for network interface trunking, you should see a significant increase in capacity—from 5 to 17 times more than the previous limit. For more information on the new task limits per container instance, see Supported Amazon EC2 Instance Types.

Applications with tasks not hitting CPU or memory limits also benefit from this feature through the more cost-effective “bin packing” of container instances.

 

Trunking is an opt-in feature

AWS chose to make the trunking feature opt-in due to the following factors:

  • Instance registration: While normal instance registration is straightforward with trunking, this feature increases the number of asynchronous instance registration steps that can potentially fail. Any such failures might add extra seconds to launch time.
  • Available IP addresses: The “trunk” belongs to the same subnet in which the instance’s primary network interface originates. This effectively reduces the available IP addresses and potentially the ability to scale out on other EC2 instances sharing the same subnet. The trunk consumes an IP address. With a trunk attached, there are two assigned IP addresses per instance, one for the primary interface and one for the trunk.
  • Differing customer preferences and infrastructure: If you have high CPU or memory workloads, you might not benefit from trunking. Or, you may not want awsvpc networking.

Consequently, AWS leaves it to you to decide if you want to use this feature. AWS might revisit this decision in the future, based on customer feedback. For now, your account roles or users must opt in to the awsvpcTrunking account setting to gain the benefits of increased task density per container instance.

 

Enable trunking

Enable the ECS elastic network interface trunking feature to increase the number of network interfaces that can be attached to supported EC2 container instance types. You must meet the following prerequisites before you can launch a container instance with the increased network interface limits:

  • Your account must have the AWSServiceRoleForECS service-linked role for ECS.
  • You must opt into the awsvpcTrunking  account setting.

 

Make sure that a service-linked role exists for ECS

A service-linked role is a unique type of IAM role linked to an AWS service (such as ECS). This role lets you delegate the permissions necessary to call other AWS services on your behalf. Because ECS is a service that manages resources on your behalf, you need this role to proceed.

In most cases, you won’t have to create a service-linked role. If you created or updated an ECS cluster, ECS likely created the service-linked role for you.

You can confirm that your service-linked role exists using the AWS CLI, as shown in the following code example:

$ aws iam get-role --role-name AWSServiceRoleForECS
{
    "Role": {
        "Path": "/aws-service-role/ecs.amazonaws.com/",
        "RoleName": "AWSServiceRoleForECS",
        "RoleId": "AROAJRUPKI7I2FGUZMJJY",
        "Arn": "arn:aws:iam::226767807331:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS",
        "CreateDate": "2018-11-09T21:27:17Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "ecs.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        },
        "Description": "Role to enable Amazon ECS to manage your cluster.",
        "MaxSessionDuration": 3600
    }
}

If the service-linked role does not exist, create it manually with the following command:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

For more information, see Using Service-Linked Roles for Amazon ECS.

 

Opt in to the awsvpcTrunking account setting

Your account, IAM user, or role must opt in to the awsvpcTrunking account setting. Select this setting using the AWS CLI or the ECS console. You can opt in for an account by making awsvpcTrunking  its default setting. Or, you can enable this setting for the role associated with the instance profile with which the instance launches. For instructions, see Account Settings.

 

Other considerations

After completing the prerequisites described in the preceding sections, launch a new container instance with increased network interface limits using one of the supported EC2 instance types.

Keep the following in mind:

  • It’s available with the latest variant of the ECS-optimized AMI.
  • It only affects creation of new container instances after opting into awsvpcTrunking.
  • It only affects tasks created with awsvpc network mode and EC2 launch type. Tasks created with the AWS Fargate launch type always have a dedicated network interface, no matter how many you launch.

For details, see ENI Trunking Considerations.

 

Summary

If you seek to optimize the usage of your EC2 container instances for clusters that you manage, enable the increased network interface density feature with awsvpcTrunking. By following the steps outlined in this post, you can launch tasks using significantly fewer EC2 instances. This is especially useful if you embrace a microservices architecture, with its increasing numbers of lighter tasks.

Hopefully, you found this post informative and the proposed solution intriguing. As always, AWS welcomes all feedback or comment.

Using AWS App Mesh with Fargate

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/using-aws-app-mesh-with-fargate/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS App Mesh is a service mesh, which provides a framework to control and monitor services spanning multiple AWS compute environments. My previous post provided a walkthrough to get you started. In it, I showed deploying a simple microservice application to Amazon ECS and configuring App Mesh to provide traffic control and observability.

In this post, I show more advanced techniques using AWS Fargate as an ECS launch type. I show you how to deploy a specific version of the colorteller service from the previous post. Finally, I move on and explore distributing traffic across other environments, such as Amazon EC2 and Amazon EKS.

I simplified this example for clarity, but in the real world, creating a service mesh that bridges different compute environments becomes useful. Fargate is a compute service for AWS that helps you run containerized tasks using the primitives (the tasks and services) of an ECS application. This lets you work without needing to directly configure and manage EC2 instances.

 

Solution overview

This post assumes that you already have a containerized application running on ECS, but want to shift your workloads to use Fargate.

You deploy a new version of the colorteller service with Fargate, and then begin shifting traffic to it. If all goes well, then you continue to shift more traffic to the new version until it serves 100% of all requests. Use the labels “blue” to represent the original version and “green” to represent the new version. The following diagram shows programmer model of the Color App.

You want to begin shifting traffic over from version 1 (represented by colorteller-blue in the following diagram) over to version 2 (represented by colorteller-green).

In App Mesh, every version of a service is ultimately backed by actual running code somewhere, in this case ECS/Fargate tasks. Each service has its own virtual node representation in the mesh that provides this conduit.

The following diagram shows the App Mesh configuration of the Color App.

 

 

After shifting the traffic, you must physically deploy the application to a compute environment. In this demo, colorteller-blue runs on ECS using the EC2 launch type and colorteller-green runs on ECS using the Fargate launch type. The goal is to test with a portion of traffic going to colorteller-green, ultimately increasing to 100% of traffic going to the new green version.

 

AWS compute model of the Color App.

Prerequisites

Before following along, set up the resources and deploy the Color App as described in the previous walkthrough.

 

Deploy the Fargate app

To get started after you complete your Color App, configure it so that your traffic goes to colorteller-blue for now. The blue color represents version 1 of your colorteller service.

Log into the App Mesh console and navigate to Virtual routers for the mesh. Configure the HTTP route to send 100% of traffic to the colorteller-blue virtual node.

The following screenshot shows routes in the App Mesh console.

Test the service and confirm in AWS X-Ray that the traffic flows through the colorteller-blue as expected with no errors.

The following screenshot shows racing the colorgateway virtual node.

 

Deploy the new colorteller to Fargate

With your original app in place, deploy the send version on Fargate and begin slowly increasing the traffic that it handles rather than the original. The app colorteller-green represents version 2 of the colorteller service. Initially, only send 30% of your traffic to it.

If your monitoring indicates a healthy service, then increase it to 60%, then finally to 100%. In the real world, you might choose more granular increases with automated rollout (and rollback if issues arise), but this demonstration keeps things simple.

You pushed the gateway and colorteller images to ECR (see Deploy Images) in the previous post, and then launched ECS tasks with these images. For this post, launch an ECS task using the Fargate launch type with the same colorteller and envoy images. This sets up the running envoy container as a sidecar for the colorteller container.

You don’t have to manually configure the EC2 instances in a Fargate launch type. Fargate automatically colocates the sidecar on the same physical instance and lifecycle as the primary application container.

To begin deploying the Fargate instance and diverting traffic to it, follow these steps.

 

Step 1: Update the mesh configuration

You can download updated AWS CloudFormation templates located in the repo under walkthroughs/fargate.

This updated mesh configuration adds a new virtual node (colorteller-green-vn). It updates the virtual router (colorteller-vr) for the colorteller virtual service so that it distributes traffic between the blue and green virtual nodes at a 2:1 ratio. That is, the green node receives one-third of the traffic.

$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Step 2: Deploy the green task to Fargate

The fargate-colorteller.sh script creates parameterized template definitions before deploying the fargate-colorteller.yaml CloudFormation template. The change to launch a colorteller task as a Fargate task is in fargate-colorteller-task-def.json.

$ ./fargate-colorteller.sh
...

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-fargate-colorteller
$

 

Verify the Fargate deployment

The ColorApp endpoint is one of the CloudFormation template’s outputs. You can view it in the stack output in the AWS CloudFormation console, or fetch it with the AWS CLI:

$ colorapp=$(aws cloudformation describe-stacks --stack-name=$ENVIRONMENT_NAME-ecs-colorapp --query="Stacks[0
].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp> ].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp
http://DEMO-Publi-YGZIJQXL5U7S-471987363.us-west-2.elb.amazonaws.com

Assign the endpoint to the colorapp environment variable so you can use it for a few curl requests:

$ curl $colorapp/color
{"color":"blue", "stats": {"blue":1}}
$

The 2:1 weight of blue to green provides predictable results. Clear the histogram and run it a few times until you get a green result:

$ curl $colorapp/color/clear
cleared

$ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done

0: {"color":"blue", "stats": {"blue":1}}
1: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"blue", "stats": {"blue":0.67,"green":0.33}}
3: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
4: {"color":"blue", "stats": {"blue":0.6,"green":0.4}}
5: {"color":"gre
en", "stats": {"blue":0.5,"green":0.5}}
6: {"color":"blue", "stats": {"blue":0.57,"green":0.43}}
7: {"color":"blue", "stats": {"blue":0.63,"green":0.38}}
8: {"color":"green", "stats": {"blue":0.56,"green":0.44}}
...
199: {"color":"blue", "stats": {"blue":0.66,"green":0.34}}

This reflects the expected result for a 2:1 ratio. Check everything on your AWS X-Ray console.

The following screenshot shows the X-Ray console map after the initial testing.

The results look good: 100% success, no errors.

You can now increase the rollout of the new (green) version of your service running on Fargate.

Using AWS CloudFormation to manage your stacks lets you keep your configuration under version control and simplifies the process of deploying resources. AWS CloudFormation also gives you the option to update the virtual route in appmesh-colorapp.yaml and deploy the updated mesh configuration by running appmesh-colorapp.sh.

For this post, use the App Mesh console to make the change. Choose Virtual routers for appmesh-mesh, and edit the colorteller-route. Update the HTTP route so colorteller-blue-vn handles 33.3% of the traffic and colorteller-green-vn now handles 66.7%.

Run your simple verification test again:

$ curl $colorapp/color/clear
cleared
fargate $ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done
0: {"color":"green", "stats": {"green":1}}
1: {"color":"blue", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"green", "stats": {"blue":0.33,"green":0.67}}
3: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
4: {"color":"green", "stats": {"blue":0.2,"green":0.8}}
5: {"color":"green", "stats": {"blue":0.17,"green":0.83}}
6: {"color":"blue", "stats": {"blue":0.29,"green":0.71}}
7: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
...
199: {"color":"green", "stats": {"blue":0.32,"green":0.68}}
$

If your results look good, double-check your result in the X-Ray console.

Finally, shift 100% of your traffic over to the new colorteller version using the same App Mesh console. This time, modify the mesh configuration template and redeploy it:

appmesh-colorteller.yaml
  ColorTellerRoute:
    Type: AWS::AppMesh::Route
    DependsOn:
      - ColorTellerVirtualRouter
      - ColorTellerGreenVirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: colorteller-vr
      RouteName: colorteller-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: colorteller-green-vn
                Weight: 1
          Match:
            Prefix: "/"
$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Again, repeat your verification process in both the CLI and X-Ray to confirm that the new version of your service is running successfully.

 

Conclusion

In this walkthrough, I showed you how to roll out an update from version 1 (blue) of the colorteller service to version 2 (green). I demonstrated that App Mesh supports a mesh spanning ECS services that you ran as EC2 tasks and as Fargate tasks.

In my next walkthrough, I will demonstrate that App Mesh handles even uncontainerized services launched directly on EC2 instances. It provides a uniform and powerful way to control and monitor your distributed microservice applications on AWS.

If you have any questions or feedback, feel free to comment below.

Docker, Amazon ECS, and Spot Fleets: A Great Fit Together

Post Syndicated from Tung Nguyen original https://aws.amazon.com/blogs/aws/docker-amazon-ecs-and-spot-fleets-a-great-fit-together/

Guest post by AWS Container Hero Tung Nguyen. Tung is the president and founder of BoltOps, a consulting company focused on cloud infrastructure and software on AWS. He also enjoys writing for the BoltOps Nuts and Bolts blog.

EC2 Spot Instances allow me to use spare compute capacity at a steep discount. Using Amazon ECS with Spot Instances is probably one of the best ways to run my workloads on AWS. By using Spot Instances, I can save 50–90% on Amazon EC2 instances. You would think that folks would jump at a huge opportunity like a black Friday sales special. However, most folks either seem to not know about Spot Instances or are hesitant. This may be due to some fallacies about Spot.

Spot Fallacies

With the Spot model, AWS can remove instances at any time. It can be due to a maintenance upgrade; high demand for that instance type; older instance type; or for any reason whatsoever.

Hence the first fear and fallacy that people quickly point out with Spot:

What do you mean that the instance can be replaced at any time? Oh no, that must mean that within 20 minutes of launching the instance, it gets killed.

I felt the same way too initially. The actual Spot Instance Advisor website states:

The average frequency of interruption across all Regions and instance types is less than 5%.

From my own usage, I have seen instances run for weeks. Need proof? Here’s a screenshot from an instance in one of our production clusters.

If you’re wondering how many days that is….

Yes, that is 228 continuous days. You might not get these same long uptimes, but it disproves the fallacy that Spot Instances are usually interrupted within 20 minutes from launch.

Spot Fleets

With Spot Instances, I place a single request for a specific instance in a specific Availability Zone. With Spot Fleets, instead of requesting a single instance type, I can ask for a variety of instance types that meet my requirements. For many workloads, as long as the CPU and RAM are close enough, many instance types do just fine.

So, I can spread my instance bets across instance types and multiple zones with Spot Fleets. Using Spot Fleets dramatically makes the system more robust on top of the already mentioned low interruption rate. Also, I can run an On-Demand cluster to provide additional safeguard capacity.

ECS and Spot Fleets: A Great Fit Together

This is one of my favorite ways to run workloads because it gives me a scalable system at a ridiculously low cost. The technologies are such a great fit together that one might think they were built for each other.

  1. Docker provides a consistent, standard binary format to deploy. If it works in one Docker environment, then it works in another. Containers can be pulled down in seconds, making them an excellent fit for Spot Instances, where containers might move around during an interruption.
  2. ECS provides a great ecosystem to run Docker containers. ECS supports a feature called connection instance draining that allows me to tell ECS to relocate the Docker containers to other EC2 instances.
  3. Spot Instances fire off a two-minute warning signal letting me know when it’s about to terminate the instance.

These are the necessary pieces I need for building an ECS cluster on top of Spot Fleet. I use the two-minute warning to call ECS connection draining, and ECS automatically moves containers to another instance in the fleet.

Here’s a CloudFormation template that achieves this: ecs-ec2-spot-fleet. Because the focus is on understanding Spot Fleets, the VPC is designed to be simple.

The template specifies two instance types in the Spot Fleet: t3.small and t3.medium with 2 GB and 4 GB of RAM, respectively. The template weights the t3.medium twice as much as the t3.small. Essentially, the Spot Fleet TargetCapacity value equals the total RAM to provision for the ECS cluster. So if I specify 8, the Spot Fleet service might provision four t3.small instances or two t3.medium instances. The cluster adds up to at least 8 GB of RAM.

To launch the stack run, I run the following command:

aws cloudformation create-stack --stack-name ecs-spot-demo --template-body file://ecs-spot-demo.yml --capabilities CAPABILITY_IAM

The CloudFormation stack launches container instances and registers them to an ECS cluster named developmentby default. I can change this with the EcsCluster parameter. For more information on the parameters, see the README and the template source.

When I deploy the application, the deploy tool creates the ECS cluster itself. Here are the Spot Instances in the EC2 console.

Deploy the demo app

After the Spot cluster is up, I can deploy a demo app on it. I wrote a tool called Ufo that is useful for these tasks:

  1. Build the Docker image.
  2. Register the ECS task definition.
  3. Register and deploy the ECS service.
  4. Create the load balancer.

Docker should be installed as a prerequisite. First, I create an ECR repo and set some variables:

ECR_REPO=$(aws ecr create-repository --repository-name demo/sinatra | jq -r '.repository.repositoryUri')
VPC_ID=$(aws ec2 describe-vpcs --filters Name=tag:Name,Values="demo vpc" | jq -r '.Vpcs[].VpcId')

Now I’m ready to clone the demo repo and deploy a sample app to ECS with ufo.

git clone https://github.com/tongueroo/demo-ufo.git demo
cd demo
ufo init --image $ECR_REPO --vpc-id $VPC_ID
ufo current --service demo-web
ufo ship # deploys to ECS on the Spot Fleet cluster

Here’s the ECS service running:

I then grab the Elastic Load Balancing endpoint from the console or with ufo ps.

$ ufo ps
Elb: develop-Elb-12LHJWU4TH3Q8-597605736.us-west-2.elb.amazonaws.com
$

Now I test with curl:

$ curl develop-Elb-12LHJWU4TH3Q8-597605736.us-west-2.elb.amazonaws.com
42

The application returns “42,” the meaning of life, successfully. That’s it! I now have an application running on ECS with Spot Fleet Instances.

Parting thoughts

One additional advantage of using Spot is that it encourages me to think about my architecture in a highly available manner. The Spot “constraints” ironically result in much better sleep at night as the system must be designed to be self-healing.

Hopefully, this post opens the world of running ECS on Spot Instances to you. It’s a core of part of the systems that BoltOps has been running on its own production system and for customers. I still get excited about the setup today. If you’re interested in Spot architectures, contact me at BoltOps.

One last note: Auto Scaling groups also support running multiple instance types and purchase options. Jeff mentions in his post that weight support is planned for a future release. That’s exciting, as it may streamline the usage of Spot with ECS even further.

Automatically update instances in an Amazon ECS cluster using the AMI ID parameter

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/

This post is contributed by Adam McLean – Solutions Developer at AWS and Chirill Cucereavii – Application Architect at AWS 

In this post, we show you how to automatically refresh the container instances in an active Amazon Elastic Container Service (ECS) cluster with instances built from a newly released AMI.

The Amazon ECS-optimized AMI  comes prepackaged with the ECS container agent, Docker agent, and the ecs-init upstart service. We recommend that you use the Amazon ECS-optimized AMI for your container instances unless your application requires any of the following:

  • A specific operating system
  • Custom security and monitoring agents installed
  • Root volumes encryption enabled
  • A Docker version that is not yet available in the Amazon ECS-optimized AMI

Regardless of the type of AMI that you choose, AWS recommends updating your ECS containers instance fleet with the latest AMI whenever possible. It’s easier than trying to patch existing instances in place.

Solution overview

In this solution, you deploy the ECS cluster and, specify cluster size, instance type, AMI ID, and other parameters. After the ECS cluster has been created and instances registered, you can update the ECS cluster with another AMI ID to trigger the following events:

  1. A new launch configuration is created using the new AMI ID.
  2. The Auto Scaling group adds one new instance using the new launch configuration. This executes the ‘Adding Instances’ process described below.
  3. The adding instances process finishes for the single new node with the new AMI. Then, the removing instances process is started against the oldest instance with the old AMI ID.
  4. After the removing nodes process is finished, steps 2 and 3 are repeated until all nodes in the cluster have been replaced.
  5. If an error is encountered during the rollout, the new launch configuration is deleted, and the old one is put back in place.

Scaling a cluster out (adding instances)

Take a closer look at each step in scaling out a cluster:

  1. A stack update changes the AMI ID parameter.
  2. CloudFormation updates the launch configuration and tells the Auto Scaling group to add an instance.
  3. Auto Scaling launches an instance using new AMI ID to join the ECS cluster.
  4. Auto Scaling invokes the Launch Lambda function.
  5. Lambda asks the ECS cluster if the newly launched instance has joined and is showing healthy.
  6. Lambda tells Auto Scaling whether the launch succeeded or failed.
  7. Auto Scaling tells CloudFormation whether the scale-up has succeeded.
  8. The stack update succeeds, or rolls back.

Scaling a cluster in (removing instances)

Take a closer look at each step in scaling in a cluster:

  1. CloudFormation tells the Auto Scaling group to remove an instance.
  2. Auto Scaling chooses an instance to be terminated.
  3. Auto Scaling invokes the Terminate Lambda function.
  4. The Lambda function performs the following tasks:
    1. Sets the instance to be terminated to DRAINING mode.
    2. Confirms that all ECS tasks are drained from the instance marked for termination.
    3. Confirms that the ECS cluster services and tasks are stable.
  5. Lambda tells Auto Scaling to proceed with termination.
  6. Auto Scaling tells CloudFormation whether the scale-in has succeeded.
  7. The stack update succeeds, or rolls back.

Solution technologies

Here are the technologies used for this solution, with more details.

  • AWS CloudFormation
  • AWS Auto Scaling
  • Amazon CloudWatch Events
  • AWS Systems Manager Parameter Store
  • AWS Lambda

AWS CloudFormation

AWS CloudFormation is used to deploy the stack, and should be used for lifecycle management. Do not directly edit Auto Scaling groups, Lambda functions, and so on. Instead, update the CloudFormation template.

This forces the resolution of the latest AMI, as well as providing an opportunity to change the size or instance type involved in the ECS cluster.

CloudFormation has rollback capabilities to return to the last known good state if errors are encountered. It is the recommended mechanism for management through the clusters lifecycle.

AWS Auto Scaling

For ECS, the primary scaling and rollout mechanism is AWS Auto Scaling. Auto Scaling allows you to define a desired state environment, and keep that desired state as necessary by launching and terminating instances.

When a new AMI has been selected, CloudFormation informs Auto Scaling that it should replace the existing fleet of instances. This is controlled by an Auto Scaling update policy.

This solution rolls a single instance out to the ECS cluster, then drain, and terminate a single instance in response. This cycle continues until all instances in the ECS cluster have been replaced.

Auto Scaling lifecycle hooks

Auto Scaling permits the use of a lifecycle hooks. This is code that executes when a scaling operation occurs. This solution uses a Lambda function that is informed when an instance is launched or terminated.

A lifecycle hook informs Auto Scaling whether it can proceed with the activity or if it should abandon it. In this case, the ECS cluster remains healthy and all tasks have been redistributed before allowing Auto Scaling to proceed.

Lifecycles also have a timeout. In this case, it is 3600 seconds (1 hour) before Auto Scaling gives up. In that case, the default activity is to abandon the operation.

Amazon CloudWatch Events

CloudWatch Events is a mechanism for watching calls made to the AWS APIs, and then activating functions in response. This is the mechanism used to launch the Lambda functions when a lifecycle event occurs. It’s also the mechanism used to re-launch the Lambda function when it times out (Lambda maximum execution time is 15 minutes).

In this solution, four CloudWatch Events are created. Two to pick up the initial scale-up event. Two more to pick up a continuation from the Lambda function.

AWS Systems Manager Parameter Store

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management.

This solution relies on the AMI IDs stored in Parameter Store. Given a naming standard of /ami/ecs/latest, this always resolves to the latest available AMI for ECS.

CloudFormation now supports using the values stored in Parameter Store as inputs to CloudFormation templates. The template can be simply passed a value—/ami/ecs/latest—and it resolve that to the latest AMI.

AWS Lambda

The Lambda functions are used to handle the Auto Scaling lifecycle hooks. They operate against the ECS cluster to assure it is healthy, and inform Auto Scaling that it can proceed, or to abandon its current operation.

The functions are invoked by CloudWatch Events in response to scaling operations so they are idle unless there are changes happening in the cluster.

They’re written in Python, and use the boto3 SDK to communicate with the ECS cluster, and Auto Scaling.

The launch Lambda function waits until the instance has fully joined the ECS cluster. This is shown by the instance being marked ‘ACTIVE’ by the ECS control plane, and it’s ECS agent status showing as connected. This means that the new instance is ready to run tasks for the cluster.

The terminate Lambda function waits until the instance has fully drained all running tasks. It also checks that all tasks, and services are in a stable state before allowing Auto Scaling to terminate an instance. This assures the instance is truly idle, and the cluster stable before an instance can be removed.

Deployment

Before you begin deployment, you need the following:

  • An AWS account 

You need an AWS account with enough room to accommodate the additional EC2 instances required by the ECS cluster.

  • (Optional) Linux system 

Use AWS CLI and optionally JQ to deploy the solution. Although a Linux system is recommended, it’s not required.

  • IAM user

You need an IAM admin user with permissions to create IAM policies and roles and create and update CloudFormation stacks. The user must also be able to deploy the ECS cluster, Lambda functions, Systems Manager parameters, and other resources.

  • Download the code

Clone or download the project from https://github.com/awslabs/ecs-cluster-manager on GitHub:

git clone [email protected]:awslabs/ecs-cluster-manager.git

AMI ID parameter

Create an Systems Manager parameter where the desired AMI ID is stored.

The first run does not use the latest ECS optimized AMI. Later, you update the ECS cluster to the latest AMI.

Use the AMI released on 2017.09. Run the following commands to create /ami/ecs/latest parameter in Parameter Store with a corresponding AMI value.

AMI_ID=$(aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux/amzn-ami-2017.09.l-amazon-ecs-optimized --region us-east-1 --query "Parameters[].Value" --output text | jq -r .image_id)
aws ssm put-parameter \
  --overwrite \
  --name "/ami/ecs/latest" \
  --type "String" \
  --value $AMI_ID \
  --region us-east-1 \
  --profile devAdmin

Substitute us-east-1 with your desired Region.

In the AWS Management Console, choose AWS Systems Manager, Parameter Store.

You should see the /ami/ecs/latest parameter that you just created.

Select the /ami/ecs/latest parameter and make sure that the AMI ID is present in parameter value. If you are using the us-east-1 Region, you should see the following value:

ami-aff65ad2

Upload the Lambda function code to Amazon S3

The Lambda functions are too large to embed in the CloudFormation template. Therefore, they must be loaded into an S3 bucket before CloudFormation stack is created.

Assuming you’re using an S3 bucket called ecs-deployment,  copy each Lambda function zip file as follows:

cd ./ecs-cluster-manager
aws s3 cp lambda/ecs-lifecycle-hook-launch.zip s3://ecs-deployment
aws s3 cp lambda/ecs-lifecycle-hook-terminate.zip s3://ecs-deployment

Refer to these when running your CloudFormation template later so that CloudFormation knows where to find the Lambda files.

Lambda function role

The Lambda functions that execute require read permissions to EC2, write permissions to ECS, and permissions to submit a result or heartbeat to Auto Scaling.

Create a new LambdaECSScaling IAM policy in your AWS account. Use the following JSON as the policy body:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:CompleteLifecycleAction",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:RecordLifecycleActionHeartbeat",
                "ecs:UpdateContainerInstancesState",
                "ecs:Describe*",
                "ecs:List*"
            ],
            "Resource": "*"
        }
    ]
}

 

Now, create a new LambdaECSScalingRole IAM role. For Trusted Entity, choose AWS Service, Lambda. Attach the following permissions policies:

  • LambdaECSScaling (created in the previous step)
  • ReadOnlyAccess (AWS managed policy)
  • AWSLambdaBasicExecutionRole (AWS managed policy)

ECS cluster instance profile

The ECS cluster nodes must have an instance profile attached that allows them to speak to the ECS service. This profile can also contain any other permissions that they would require (Systems Manager for management and executing commands for example).

These are all AWS managed policies so you only add the role.

Create a new IAM role called EcsInstanceRole, select AWS Service → EC2 as Trusted Entity. Attach the following AWS managed permissions policies:

  • AmazonEC2RoleforSSM
  • AmazonEC2ContainerServiceforEC2Role
  • AWSLambdaBasicExecutionRole

The AWSLambdaBasicExecutionRole policy may look out of place, but this allows the instance to create new CloudWatch Logs groups. These permissions facilitate using CloudWatch Logs as the primary logging mechanism with ECS. This managed policy grants the required permissions without you needing to manage a custom role.

CloudFormation parameter file

We recommend using a parameter file for the CloudFormation template. This documents the desired parameters for the template. It is usually less error prone to do this versus using the console for inputting parameters.

There is a file called blank_parameter_file.json in the source code project. Copy this file to something new and with a more meaningful name (such as dev-cluster.json), then fill out the parameters.

The file looks like this:

[
  {
    "ParameterKey": "EcsClusterName",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "EcsAmiParameterKey",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "IamRoleInstanceProfile",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "EcsInstanceType",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "EbsVolumeSize",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "ClusterSize",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "ClusterMaxSize",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "KeyName",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "SubnetIds",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "SecurityGroupIds",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "DeploymentS3Bucket",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LifecycleLaunchFunctionZip",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LifecycleTerminateFunctionZip",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LambdaFunctionRole",
    "ParameterValue": ""
  }
]

Here are the details for each parameter:

  • EcsClusterName:  The name of the ECS cluster to create.
  • EcsAmiParameterKey:  The Systems Manager parameter that contains the AMI ID to be used. This defaults to /ami/ecs/latest.
  • IamRoleInstanceProfile:  The name of the EC2 instance profile used by the ECS cluster members. Discussed in the prerequisite section.
  • EcsInstanceType:  The instance type to use for the cluster. Use whatever is appropriate for your workloads.
  • EbsVolumeSize:  The size of the Docker storage setup that is created using LVM. ECS typically defaults to 100 GB.
  • ClusterSize:  The desired number of EC2 instances for the cluster.
  • ClusterMaxSize:  This value should always be double the amount contained in ClusterSize. CloudFormation has no ‘math’ operators or we wouldn’t prompt for this. This allows rolling updates to be performed safely by doubling the cluster size, then contracting back.
  • KeyName:  The name of the EC2 key pair to place on the ECS instance to support SSH.
  • SubnetIds: A comma-separated list of subnet IDs that the cluster should be allowed to launch instances into. These should map to at least two zones for a resilient cluster, for example subnet-a70508df,subnet-e009eb89.
  • SecurityGroupIds:  A comma-separated list of security group IDs that are attached to each node, for example sg-bd9d1bd4,sg-ac9127dca (a single value is fine).
  • DeploymentS3Bucket: This is the bucket where the two Lambda functions for scale in/scale out lifecycle hooks can be found.
  • LifecycleLaunchFunctionZip: This is the full path within the DeploymentS3Bucket where the ecs-lifecycle-hook-launch.zip contents can be found.
  • LifecycleTerminateFunctionZip:  The full path within the DeploymentS3Bucket where the ecs-lifecycle-hook-terminate.zip contents can be found.
  • LambdaFunctionRole:  The name of the role that the Lambda functions use. Discussed in the prerequisite section.

A completed parameter file looks like the following:

[
  {
    "ParameterKey": "EcsClusterName",
    "ParameterValue": "DevCluster"
  }, 
  {
    "ParameterKey": "EcsAmiParameterKey",
    "ParameterValue": "/ami/ecs/latest"
  },
  {
    "ParameterKey": "IamRoleInstanceProfile",
    "ParameterValue": "EcsInstanceRole"
  }, 
  {
    "ParameterKey": "EcsInstanceType",
    "ParameterValue": "m4.large"
  },
  {
    "ParameterKey": "EbsVolumeSize",
    "ParameterValue": "100"
  }, 
  {
    "ParameterKey": "ClusterSize",
    "ParameterValue": "2"
  },
  {
    "ParameterKey": "ClusterMaxSize",
    "ParameterValue": "4"
  },
  {
    "ParameterKey": "KeyName",
    "ParameterValue": "dev-cluster"
  },
  {
    "ParameterKey": "SubnetIds",
    "ParameterValue": "subnet-a70508df,subnet-e009eb89"
  },
  {
    "ParameterKey": "SecurityGroupIds",
    "ParameterValue": "sg-bd9d1bd4"
  },
  {
    "ParameterKey": "DeploymentS3Bucket",
    "ParameterValue": "ecs-deployment"
  },
  {
    "ParameterKey": "LifecycleLaunchFunctionZip",
    "ParameterValue": "ecs-lifecycle-hook-launch.zip"
  },
  {
    "ParameterKey": "LifecycleTerminateFunctionZip",
    "ParameterValue": "ecs-lifecycle-hook-terminate.zip"
  },
  {
    "ParameterKey": "LambdaFunctionRole",
    "ParameterValue": "LambdaECSScalingRole"
  }
]

Deployment

Given the CloudFormation template and the parameter file, you can deploy the stack using the AWS CLI or the console.

Here’s an example deploying through the AWS CLI. This example uses a stack named ecs-dev and a parameter file named dev-cluster.json. It also uses the --profile argument to assure that the CLI assumes a role in the right account for deployment. Use the corresponding Region and profile from your local ~/.aws/config file.

This command outputs the stack ID as soon as it is executed, even though the other required resources are still being created.

aws cloudformation create-stack \
  --stack-name ecs-dev \
  --template-body file://./ecs-cluster.yaml \
  --parameters file://./dev-cluster.json \
  --region us-east-1 \
  --profile devAdmin

Use the AWS Management Console to check whether the stack is done creating. Or, run the following command:

aws cloudformation wait stack-create-complete \
  --stack-name ecs-dev \
  --region us-east-1 \
  --profile devAdmin

Use the AWS Management Console to check whether the stack is done creating. Or, run the following command:

aws cloudformation wait stack-create-complete \
  --stack-name ecs-dev \
  --region us-east-1 \
  --profile devAdmin

After the CloudFormation stack has been created, go to the ECS console. and open the DevCluster cluster that you just created. There are no tasks running, although you should see two container instances registered with the cluster.

You also see a warning message indicating that the container instances are not running the latest version of Amazon ECS container agent. The reason is that you did not use the latest available version of the ECS-Optimized AMI.

Fix this issue by updating the container instances AMI.

Update the cluster instances AMI

Run the following commands to set the /ami/ecs/latest parameter to the latest AMI ID.

AMI_ID=$(aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux/recommended --region us-east-1 --query "Parameters[].Value" --output text | jq -r .image_id)

 aws ssm put-parameter \
   --overwrite \
   --name "/ami/ecs/latest" \
   --type "String" \
   --value $AMI_ID
   --region us-east-1 \
   --profile devAdmin

Make sure that the parameter value has been updated in the console.

To update your ECS cluster, run the update-stack command without changing any parameters. CloudFormation evaluates the value stored by /ami/ecs/latest. If it has changed, CloudFormation makes updates as appropriate.

aws cloudformation update-stack \
  --stack-name ecs-dev \
  --template-body file://./ecs-cluster.yaml \
  --parameters file://./dev-cluster.json \
  --region us-east-1 \
  --profile devAdmin

Supervising updates

We recommend supervising your updates to the ECS cluster while they are being deployed. This assures that the cluster remains stable. For the majority of situations, there is no manual intervention required.

  • Keep an eye on Auto Scaling activities. In the Auto Scaling groups section of the EC2 console, select the Auto Scaling group for a cluster and choose Activity History.
  • Keep an eye on the ECS instances to ensure that new instances are joining and draining instances are leaving. In the ECS console, choose Cluster, ECS Instances.
  • Lambda function logs help troubleshoot things that aren’t behaving as expected. In the Lambda console, select the LifeCycleLaunch or LifeCycleTerminate functions, and choose Monitoring, View logs in CloudWatch. Expand the logs for the latest executions and see what’s going on:

When you go back to the ECS cluster page, notice that the “Outdated Amazon ECS container agent” warning message has disappeared.

Select one of the cluster’s EC2 instance IDs and observe that the latest ECS optimized AMI is used.

Summary

In this post, you saw how to use CloudFormation, Lambda, CloudWatch Events, and Auto Scaling lifecycle hooks to update your ECS cluster instances with a new AMI.

The sample code is available on GitHub for you to use and extend. Contributions are always welcome!

 

Scheduling GPUs for deep learning tasks on Amazon ECS

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/scheduling-gpus-for-deep-learning-tasks-on-amazon-ecs/

This post is contributed by Brent Langston – Sr. Developer Advocate, Amazon Container Services

Last week, AWS announced enhanced Amazon Elastic Container Service (Amazon ECS) support for GPU-enabled EC2 instances. This means that now GPUs are first class resources that can be requested in your task definition, and scheduled on your cluster by ECS.

Previously, to schedule a GPU workload, you had to maintain your own custom configured AMI, with a custom configured Docker runtime. You also had to use custom vCPU logic as a stand-in for assigning your GPU workloads to GPU instances. Even when all that was in place, there was still no pinning of a GPU to a task. One task might consume more GPU resources than it should. This could cause other tasks to not have a GPU available.

Now, AWS maintains an ECS-optimized AMI that includes the correct NVIDIA drivers and Docker customizations. You can use this AMI to provision your GPU workloads. With this enhancement, GPUs can also be requested directly in the task definition. Like allocating CPU or RAM to a task, now you can explicitly request a number of GPUs to be allocated to your task. The scheduler looks for matching resources on the cluster to place those tasks. The GPUs are pinned to the task for as long as the task is running, and can’t be allocated to any other tasks.

I thought I’d see how easy it is to deploy GPU workloads to my ECS cluster. I’m working in the US-EAST-2 (Ohio) region, from my AWS Cloud9 IDE, so these commands work for Amazon Linux. Feel free to adapt to your environment as necessary.

If you’d like to run this example yourself, you can find all the code in this GitHub repo. If you run this example in your own account, be aware of the instance pricing, and clean up your resources when your experiment is complete.

Clone the repo using the following command:

git clone https://github.com/brentley/tensorflow-container.git

Setup

You need the latest version of the AWS CLI (for this post, I used 1.16.98):

echo “export PATH=$HOME/.local/bin:$HOME/bin:$PATH” >> ~/.bash_profile
source ~/.bash_profile
pip install --user -U awscli

Provision an ECS cluster, with two C5 instances, and two P3 instances:

aws cloudformation deploy --stack-name tensorflow-test --template-file cluster-cpu-gpu.yml --capabilities CAPABILITY_IAM                            

While AWS CloudFormation is provisioning resources, examine the template used to build your infrastructure. Open `cluster-cpu-gpu.yml`, and you see that you are provisioning a test VPC with two c5.2xlarge instances, and two p3.2xlarge instances. This gives you one NVIDIA Tesla V100 GPU per instance, for a total of two GPUs to run training tasks.

I adapted the TensorFlow benchmark Docker container to create a training workload. I use this container to compare the GPU scheduling and runtime.

When the CloudFormation stack is deployed, register a task definition with the ECS service:

aws ecs register-task-definition --cli-input-json file://gpu-1-taskdef.json

To request GPU resources in the task definition, the only change needed is to include a GPU resource requirement in the container definition:

            "resourceRequirements": [
                {
                    "type": "GPU",
                    "value": "1"
                }
            ],

Including this resource requirement ensures that the ECS scheduler allocates the task to an instance with a free GPU resource.

Launch a single-GPU training workload

Now you’re ready to launch the first GPU workload.

export cluster=$(aws cloudformation describe-stacks --stack-name tensorflow-test --query 
'Stacks[0].Outputs[?OutputKey==`ClusterName`].OutputValue' --output text) 
echo $cluster
aws ecs run-task --cluster $cluster --task-definition tensorflow-1-gpu

When you launch the task, the output shows the `guIds` values that are assigned to the task. This GPU is pinned to this task, and can’t be shared with any other tasks. If all GPUs are allocated, you can’t schedule additional GPU tasks until a running task with a GPU completes. That frees the GPU to be scheduled again.

When you look at the log output in Amazon CloudWatch Logs, you see that the container discovered one GPU: `/gpu0` and the training benchmark trained at a rate of 321.16 images/sec.

With your two p3.2xlarge nodes in the cluster, you are limited to two concurrent single GPU based workloads. To scale horizontally, you could add additional p3.2xlarge nodes. This would limit your workloads to a single GPU each.  To scale vertically, you could bump up the instance type,  which would allow you to assign multiple GPUs to a single task.  Now, let’s see how fast your TensorFlow container can train when assigned multiple GPUs.

Launch a multiple-GPU training workload

To begin, replace the p3.2xlarge instances with p3.16xlarge instances. This gives your cluster two instances that each have eight GPUs, for a total of 16 GPUs that can be allocated.

aws cloudformation deploy --stack-name tensorflow-test --template-file cluster-cpu-gpu.yml --parameter-overrides GPUInstanceType=p3.16xlarge --capabilities CAPABILITY_IAM

When the CloudFormation deploy is complete, register two more task definitions to launch your benchmark container requesting more GPUs:

aws ecs register-task-definition --cli-input-json file://gpu-4-taskdef.json  
aws ecs register-task-definition --cli-input-json file://gpu-8-taskdef.json 

Next, launch two TensorFlow benchmark containers, one requesting four GPUs, and one requesting eight GPUs:

aws ecs run-task --cluster $cluster --task-definition tensorflow-4-gpu
aws ecs run-task --cluster $cluster --task-definition tensorflow-8-gpu

With each task request, GPUs are allocated: four in the first request, and eight in the second request. Again, these GPUs are pinned to the task, and not usable by any other task until these tasks are complete.

Check the log output in CloudWatch Logs:

On the “devices” lines, you can see that the container discovered and used four (or eight) GPUs. Also, the total images/sec improved to 1297.41 with four GPUs, and 1707.23 with eight GPUs.

Because you can pin single or multiple GPUs to a task, running advanced GPU based training tasks on Amazon ECS is easier than ever!

Cleanup

To clean up your running resources, delete the CloudFormation stack:

aws cloudformation delete-stack --stack-name tensorflow-test

Conclusion

For more information, see Working with GPUs on Amazon ECS.

If you want to keep up on the latest container info from AWS, please follow me on Twitter and tweet any questions! @brentContained

Setting up AWS PrivateLink for AWS Fargate, Amazon ECS, and Amazon ECR

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/setting-up-aws-privatelink-for-aws-fargate-amazon-ecs-and-amazon-ecr/

This post is contributed by Nathan Peck – Developer Advocate, Amazon Container Services

AWS Fargate, Amazon ECS, and Amazon ECR now have support for AWS PrivateLink. AWS PrivateLink is a networking technology designed to enable access to AWS services in a highly available and scalable manner, while keeping all the network traffic within the AWS network. When you create AWS PrivateLink endpoints for ECR, ECS, and Fargate, these service endpoints appear as elastic network interfaces with a private IP address in your VPC.

Before AWS PrivateLink, your Amazon EC2 instances had to use an internet gateway to download Docker images stored in ECR or communicate to the ECS control plane. Instances in a public subnet with a public IP address used the internet gateway directly. Instances in a private subnet used a network address translation (NAT) gateway hosted in a public subnet. The NAT gateway would then use the internet gateway to talk to ECR and ECS.

Now that AWS PrivateLink support has been added, instances in both public and private subnets can use it to get private connectivity to download images from Amazon ECR. Instances can also communicate with the ECS control plane via AWS PrivateLink endpoints without needing an internet gateway or NAT gateway.

This networking architecture is considerably simpler. It enables enhanced security by allowing you to deny your private EC2 instances access to anything other than these AWS services. That’s assuming that you want to block all other outbound internet access for those instances. For this to work, you must create some AWS PrivateLink resources:

  • AWS PrivateLink endpoints for ECR. This allows instances in your VPC to communicate with ECR to download image manifests
  • AWS PrivateLink gateway for Amazon S3. This allows instances to download the image layers from the underlying private S3 buckets that host them.
  • AWS PrivateLink endpoints for ECS. These endpoints allow instances to communicate with the telemetry and agent services in the ECS control plane.

This post explains how to create these resources.

Create an AWS PrivateLink interface endpoint for ECR

ECR requires 2 interface endpoints:

  • com.amazonaws.region.ecr.api
  • com.amazonaws.region.ecr.dkr

First, create the interface VPC endpoints for ECR using the endpoint creation wizard in the VPC dashboard separately. Select AWS services and select an endpoint. Substitute your region of choice.

Next, specify the VPC and subnets to which the AWS PrivateLink interface should be added. Make sure that you select the same VPC in which your ECS cluster is running. To be on the safe side, select every Availability Zone and subnet from the list. Each zone has a list of the subnets available. You can select all the subnets in each Availability Zone.

However, depending on your networking needs, you might also choose to only enable the AWS PrivateLink endpoint in your private subnets from each Availability Zone. Let instances running in a public subnet continue to communicate with ECR via the public subnet’s internet gateway.

Next, enable Private DNS Name. You are required to enable Private DNS Name for endpoint

com.amazonaws.region.ecr.dkr.

A private hosted zone enables you to access the resources in your VPC using the Amazon ECR default DNS domain names instead of using private IPv4 address or private DNS hostnames provided by AWS VPC Endpoints. The Amazon ECR DNS hostname that AWS CLI and Amazon ECR SDKs use by default (https://api.ecr.region.amazonaws.com) resolves to your VPC endpoint.

If you enabled a private hosted zone for com.amazonaws.region.ecr.api and you are using an SDK released before January 24, 2019, you must specify the following endpoint when using SDK or AWS CLI. For example:

aws --endpoint-url https://api.ecr.region.amazonaws.com

If you don’t enable a private hosted zone, this would be:

aws --endpoint-url https://VPC_Endpoint_ID.api.ecr.region.vpce.amazonaws.com ecr describe-repositories

If you enabled a private hosted zone and you are using the SDK released on January 24, 2019 or later, this would be:

aws ecr describe-repositories

Lastly, specify a security group for the interface itself. This is going to control whether each host is able to talk to the interface. The security group should allow inbound connections on port 80 from the instances in your cluster.

You may have a security group that is applied to all the EC2 instances in the cluster, perhaps using an Auto Scaling group. You can create a rule that allows the VPC endpoint to be accessed by any instance in that security group.

Finally, choose Create endpoint. The new endpoint appears in the list.

Add an AWS PrivateLink gateway endpoint for S3

The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.

S3 uses a slightly different endpoint type called a gateway. Be careful about adding an S3 gateway to your VPC if your application is actively using S3. With gateway endpoints, your application’s existing connections to S3 may be briefly interrupted while the gateway is being added. You may have a busy cluster with many active ECS deployments, causing image layer downloads from S3. Or, your application itself may make heavy usage of S3. In that case, it’s best to create a fresh new VPC with an S3 gateway, then migrate your ECS cluster and its containers into that VPC.

To add the S3 gateway endpoint, select com.amazonaws.region.s3 on the list of AWS services and select the VPC hosting your ECS cluster. Gateway endpoints are added to the VPC route table for the subnets. Select each route table associated with the subnet in which the S3 gateway should be.

Instead of using a security group, the gateway endpoint uses an IAM policy document to limit access to the service. This policy is similar to an IAM policy but does not replace the default level of access that your applications have through their IAM role. It just further limits what portions of the service are available via the gateway. It’s okay to just use the default Full Access policy. Any restrictions you have put on your task IAM roles or other IAM user policies still apply on top of this policy.

Choose Create to add this gateway endpoint to your VPC. When you view the route tables in your VPC subnets, you see an S3 gateway that is used whenever ECR Docker image layers are being downloaded from S3.

Create an AWS PrivateLink interface endpoint for ECS

In addition to downloading Docker images from ECR, your EC2 instances must also communicate with the ECS control plane to receive orchestration instructions.

ECS requires three endpoints:

  • com.amazonaws.region.ecs-agent
  • com.amazonaws.region.ecs-telemetry
  • com.amazonaws.region.ecs

Create these three interface endpoints in the same way that you created the endpoint for ECR, by adding each endpoint and setting the subnets and security group for the endpoint.

After the endpoints are created and added to your VPC, there is one additional step. Restart any ECS agents that are currently running in the VPC. The ECS agent uses a persistent web socket connection to the ECS backend and VPC endpoints do not interrupt existing connections. The agent continues to use its existing connection instead of establishing a new connection through the new endpoint, unless you restart it.

To restart the agent with no disruption to your application containers, you can connect using SSH to each EC2 instance in the cluster and issue the following command:

sudo docker restart ecs-agent

This restarts the ECS agent without stopping any of the other application containers on the host. Your application may be stateless and safe to stop at any time, or you may not have or want SSH access to the underlying hosts. In that case, choose to just reboot each EC2 instance in the cluster one at a time. This restarts the agent on that host while also restarting any service launched tasks on that host on a different host.

If you are using AWS Fargate, you can issue an UpdateService API call to do a rolling restart of all your containers. Or, manually stop your running containers one by one and let them be automatically replaced. When they restart, they use an ECS agent that is communicating using the new ECS endpoints. The Docker image is downloaded using the ECR endpoint and S3 gateway.

Conclusion

In this post, I showed you how to add AWS PrivateLink endpoints to your VPC for ECS and ECR, including an S3 gateway for ECR layer downloads. These endpoints work whether you are running your containers on EC2 instances in a self-managed cluster in your VPC, or as Fargate containers running in your VPC.

Your ECS cluster or Fargate tasks communicate directly with the ECS control plane. They should be able to download Docker images directly without needing to make any connections outside of your VPC via an internet gateway or NAT gateway. All container orchestration traffic stays inside the VPC.

If you have questions or suggestions, please comment below.

Migrate Wildfly Cluster to Amazon ECS using Service Discovery

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/migrate-wildfly-cluster-to-ecs-using-service-discovery/

This post is courtesy of Vidya Narasimhan, AWS Solutions Architect

1. Overview

Java Enterprise Edition has been an important server-side platform for over a decade for developing mission-critical & large-scale applications amongst enterprises. High-availability & fault tolerance for such applications is typically achieved through built-in JEE clustering provided by the platform.

JEE clustering represents a group of machines working together to transparently provide enterprise services such as JNDI, EJB, JMS, HTTPSession etc. that enable distribution, discovery, messaging, transaction, caching, replication & component failover.  Implementation of clustering technology varies in JEE platforms provided by different vendors. Many of the clustering implementations involve proprietary communication protocols that use multicast for intra-cluster communications that is not supported in public cloud.

This article is relevant for JEE platforms & other products that use JGroups based clustering such as Wildfly. The solution described allows easy migration of applications developed on these platforms using native clustering to Amazon Elastic Container Service (Amazon ECS) which is a highly scalable, fast, container management service that makes it easy to orchestrate, run & scale Docker containers on a cluster. This solution is useful when the business objective is to migrate to cloud fast with minimum changes to the application. The approach recommends lift & shift to AWS wherein the initial focus is to migrate as-is with optimizations coming in later incrementally.

Whether the JEE application to be migrated is designed as a monolith or micro services, a legacy or green-field deployment, there are multiple reasons why organizations should opt for containerization of their application. This link explains well the benefits of containerization (see section Why Use Containers) https://aws.amazon.com/getting-started/projects/break-monolith-app-microservices-ecs-docker-ec2/module-one/

2. Wildfly Clustering on ECS

Here onwards, this article highlights how to migrate a standard clustered JEE app deployed on Wildfly Application Server to Amazon ECS. Wildfly supports clustering out of the box and supports two modes of clustering, standalone & domain mode. This article explores how to setup WildFly cluster in ECS with multiple Wildfly standalone nodes enabled for HA to form a cluster. The clustering is demonstrated through a web application that replicates session information across the cluster nodes and can withstand a failover without session data loss.

The important components of clustering that requires a mention right away are ECS Service Discovery, JGroups & Infinispan.

  • JGroups – Wildfly clustering is enabled by the popular open-source JGroups toolkit. The JGroups subsystem provides group communication support for HA services using a multicast transmission by default. It deals with all aspects of node discovery and providing reliable messaging between the nodes as follows-
    • Node-to-node messaging — By default is based on UDP/multicast that can be extended via TCP/unicast.
    • Node discovery — By default uses multicast ping MPING. Alternatives include TCPPING, S3_PING, JDBC_PING, DNS_PING and others.

This article focusses on DNS_PING for node discovery using TCP protocol.

ECS Service discovery – Amazon ECS service can optionally be configured to use Amazon ECS Service Discovery. Service discovery uses Amazon Route 53 auto naming API actions to manage DNS entries (A or SRV records) for service tasks, making them discoverable within your VPC. You can specify health check conditions in a service task definition and Amazon ECS will ensure that only healthy service endpoints are returned by a service lookup.

As your services scale up or down in response to load or container health, the Route 53 hosted zone is kept up to date.

Wildfly uses JGroups to discover the cluster nodes via DNS_PING discovery protocol that sends a DNS service endpoint query to the ECS service registry maintained in Route53.

  • Infinispan – Wildfly uses Infinispan subsystem to provides high-performance, clustered, transactional caching. In a clustered web application, Infinispan handles the replication of application data across the cluster by means of a replicated/distributed cache. Under the hood, it uses JGroups channel for data transmission within the cluster.

3. Implementation Instructions

Configure Wildfly

  • Modify Wildfly standalone configuration file – Standalone-HA.xml. The HA suffix implies high availability configuration.
  1.  Modify the JGroup Subsystem – Add a TCP Stack with DNS_Ping as the discovery protocol & configure the DNS Query endpoint. It is important to note that the DNS_QUERY matches the ECS  service endpoint when configuring the ECS service.  
  2. Change the JGroup default stack to point to the TCP Stack.                           
  3. Configure a custom Infinispan replicated cache to be used by the web app or use the default cache.      

Build the docker image & store it in Elastic Container Registry (ECR)

  1. Package the JBoss/Wildfly image with JEE application & Wildfly platform on Docker. Create a Dockerfile & include the following:
    1. Install the WildFly distribution & set permissions – This approach requires the latest Wildfly distribution 15.0.0.Final released recently.          
    2. Copy the modified Wildfly standalone-ha.xml to the container.
    3. Deploy the JEE web application. This simple web app is configured as distributable and uses Infinispan to replicate session information across cluster nodes. It displays a page containing the container IP/hostname, Session ID & session data & helps demonstrate session replication.                   
    4. Define a custom entrypoint, entrypoint.sh, to boot Wildfly with the specified bind IP addresses to its interfaces. The script gets the container metadata, extracts the container IP to bind it to Wildfly interfaces. This interface binding is an important step as it enables the application related network communication (web, messaging) between the containers.    
    5. Add the enrypoint.sh script to the image in the Dockerfile.                                        
    6. Build the container & push it to ECR repository. Amazon Elastic Container Registry (ECR) is a fully managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images.

The Wildly configuration files, the Dockerfile & the web app WAR file can be found at the Github link https://github.com/vidyann/Wildfly_ECS

Create ECS Service with service discovery

  • Create a Service using service discovery.
    • This link describe steps to set up a ECS task & service with service discovery https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service-discovery.html#create-service-discovery-taskdef. Though the example in the link creates a Fargate cluster, you can create an EC2 based cluster as well for this example.
    • While configuring the task choose the network mode as AWSVPC. The task networking features provided by the AWSVPC network mode give Amazon ECS tasks the same networking properties as Amazon EC2 instances. Benefits of task networking can be found here – https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html
    • Tasks can be flagged as compatible with EC2, Fargate, or both. Here is what the cluster & service looks like:                             
    • When setting up the container details in task, use 8080 as the port, which is the default Wildfly port. This can be changed through WIldfly configuration. Enable the cloudwatch logs which captures Wildfly logging.
    • While configuring the ECS service, ensure that the service name & namespace should combine to form service endpoint that exactly matches the DNS_Query endpoint configured in Wildfly configuration file. The container security group should allow inbound traffic to port 8080. Here is what the service endpoint looks like:     
    • The route53 registry created by ECS is shown below. We see two DNS entries corresponding to the DNS endpoint myapp.sampleaws.com.              
    • Finally view the Wildfly logs in the console by clicking a task instance. You can check if clustering is enabled by looking for a log entry as below:            

Here we see that a Wildfly cluster was formed with two nodes(same as the pic in route 53).

Run the Web App in a browser

  • Spin up a windows instance in the VPC & open the web app in a browser. Below is a screenshot of the webapp:                                         
  • Open in different browsers & tabs & verify the Container IP & session ID. Now force shutdown a node by resizing the ECS service task instances to one. Note that though the container IP in the webapp changes, the session ID does not change and the webapp is available and the HTTP Session is alive thus demonstrating the session replication & failover amongst the clustering nodes.

4. Summary

Our goal here is to migrate the enterprise JEE apps to Amazon ECS by tweaking a few configurations but gaining immediately the benefits of containerization & orchestration managed by ECS. By delegating the undifferentiated heavy lifting of container management, orchestration, scaling to ECS, you can focus on improvising/re-architecting your application to micro-services oriented architecture. Please note that all the deployment procedures in this article can be fully automated via the AWS CI/CD services.

AWS Cloud Map: Easily create and maintain custom maps of your applications

Post Syndicated from Abby Fuller original https://aws.amazon.com/blogs/aws/aws-cloud-map-easily-create-and-maintain-custom-maps-of-your-applications/

Companies are increasingly building their applications as microservices (many separate services that each do a single job). Microservices often allow companies to iterate and deploy more quickly. Many of these microservice-based modern applications are built using various types of cloud resources and deployed on dynamically changing infrastructure. Previously you had to use configuration files to manage the location of your application resource. However, dependencies in a microservices-based application can quickly become too complex to easily manage through configuration files. Additionally, many applications are built using containers that scale dynamically, reacting on the changes in traffic load. That increases your application responsiveness, but poses a new class of problem – now your application components need to discover and connect to the upstream services at runtime. This problem of connectivity in dynamically changing infrastructures and microservices is commonly addressed by service discovery.

Introducing AWS Cloud Map

 

AWS Cloud Map keeps track of all your application components, their locations, attributes and health status. Now your applications can simply query AWS Cloud Map using AWS SDK, API or even DNS to discover the locations of its dependencies. That allows your applications to scale dynamically and connect to upstream services directly, increasing the responsiveness of your applications.

When you register your web services and cloud resources in AWS Cloud Map, you can describe them using custom attributes, such as deployment stage and version. Your applications then can make discovery calls specifying the required deployment stage and version. AWS Cloud Map will return the locations of resources that match the supplied parameters. It simplifies your deployments and reduces the operational complexity for your applications.

Integrated health checking for IP-based resources, registered with AWS Cloud Map, automatically stops routing traffic to unhealthy endpoints. Additionally, you have APIs to describe the health status of your services, so that you can learn about potential issues with your infrastructure. That increases the resilience of your applications.

AWS Cloud Map in Action
Getting started with AWS Cloud Map is easy. You can use the AWS console or CLI to create a namespace, such as myapp.com . For this example, I’ll use the CLI. Let’s create a namespace:

aws servicediscovery create-public-dns-namespace --name myapp.com (http://myapp.com/)

At this point, you’ll need to decide whether your want your applications to discover resources only via the AWS SDK and API calls, or if you need optional discovery via DNS. When you enable DNS discovery for a namespace, you’ll need to provide IP addresses for all the resources that you register. If you plan to register other cloud resources, such as DynamoDB tables by ARN or the URLs of the APIs deployed on Amazon API Gateway, you need to select API discovery mode.

Once your namespace is created, it’s time to create services. A service represents your application components, such as users , auth, or payment and can be comprised of many dynamically changing resources. You can specify a friendly name for your service, then select the DNS discovery and health checking options. You can create a service like this:

aws servicediscovery create-service --name frontend --namespace-id %namespace_id%”

After you create a service, you can register service instances with custom attributes:

aws servicediscovery register-instance --service-id %service_id% --instance-id %id%
--attributes AWS_INSTANCE_IPV4=54.20.10.1,stage=beta,version=1.0,active=yes

aws servicediscovery register-instance --service-id %service_id% --instance-id %id%
--attributes AWS_INSTANCE_IPV4=54.20.10.2,stage=beta,version=2.0,active=no

Now, your applications can make API calls to discover the service instances, optionally providing query parameters to filter the results:

aws servicediscovery discover-instances --namespace-name myapp.com --service-name frontend --query-parameters version=1.0,active=yes
-->
{
"Instances": [
{
"InstanceId": "1",
"NamespaceName": "myapp.com",
"ServiceName": "users",
"HealthStatus": "HEALTHY",
"Attributes": {
"version":"1.0",
"active":"yes",
"stage":"beta",
"AWS_INSTANCE_IPV4": "54.20.10.2" }
}
]
}

And that’s it! Amazon Elastic Container Service (ECS) and AWS Fargate are tightly integrated with AWS Cloud Map. When you create your service and enable service discovery, all the task instances are automatically registered in AWS Cloud Map on scale up, and deregistered on scale down. ECS also ensures that only healthy task instances are returned on the discovery calls by publishing always up-to-date health information to AWS Cloud Map.

For Amazon Elastic Container Service for Kubernetes (EKS), you can automatically publish the external IPs of the services running in EKS in AWS Cloud Map. To do this, we’ve released an update to an open source project, ExternalDNS, to make Kubernetes resources discoverable via AWS Cloud Map. You can find out more details about Kubernetes External DNS here.

 

Now Generally Available
You can start building your applications with AWS Cloud Map and enjoy the integration with Amazon ECS and EKS, rich and secure API query interface, ubiquitous DNS name resolution and integrated health checking support today. Want to try it out? Head to https://console.aws.amazon.com/cloudmap/home.  To test out the integration with ECS, head to https://console.aws.amazon.com/ecs/home and enable Service Discovery to get started.

Get Started with Blockchain Using the new AWS Blockchain Templates

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/get-started-with-blockchain-using-the-new-aws-blockchain-templates/

Many of today’s discussions around blockchain technology remind me of the classic Shimmer Floor Wax skit. According to Dan Aykroyd, Shimmer is a dessert topping. Gilda Radner claims that it is a floor wax, and Chevy Chase settles the debate and reveals that it actually is both! Some of the people that I talk to see blockchains as the foundation of a new monetary system and a way to facilitate international payments. Others see blockchains as a distributed ledger and immutable data source that can be applied to logistics, supply chain, land registration, crowdfunding, and other use cases. Either way, it is clear that there are a lot of intriguing possibilities and we are working to help our customers use this technology more effectively.

We are launching AWS Blockchain Templates today. These templates will let you launch an Ethereum (either public or private) or Hyperledger Fabric (private) network in a matter of minutes and with just a few clicks. The templates create and configure all of the AWS resources needed to get you going in a robust and scalable fashion.

Launching a Private Ethereum Network
The Ethereum template offers two launch options. The ecs option creates an Amazon ECS cluster within a Virtual Private Cloud (VPC) and launches a set of Docker images in the cluster. The docker-local option also runs within a VPC, and launches the Docker images on EC2 instances. The template supports Ethereum mining, the EthStats and EthExplorer status pages, and a set of nodes that implement and respond to the Ethereum RPC protocol. Both options create and make use of a DynamoDB table for service discovery, along with Application Load Balancers for the status pages.

Here are the AWS Blockchain Templates for Ethereum:

I start by opening the CloudFormation Console in the desired region and clicking Create Stack:

I select Specify an Amazon S3 template URL, enter the URL of the template for the region, and click Next:

I give my stack a name:

Next, I enter the first set of parameters, including the network ID for the genesis block. I’ll stick with the default values for now:

I will also use the default values for the remaining network parameters:

Moving right along, I choose the container orchestration platform (ecs or docker-local, as I explained earlier) and the EC2 instance type for the container nodes:

Next, I choose my VPC and the subnets for the Ethereum network and the Application Load Balancer:

I configure my keypair, EC2 security group, IAM role, and instance profile ARN (full information on the required permissions can be found in the documentation):

The Instance Profile ARN can be found on the summary page for the role:

I confirm that I want to deploy EthStats and EthExplorer, choose the tag and version for the nested CloudFormation templates that are used by this one, and click Next to proceed:

On the next page I specify a tag for the resources that the stack will create, leave the other options as-is, and click Next:

I review all of the parameters and options, acknowledge that the stack might create IAM resources, and click Create to build my network:

The template makes use of three nested templates:

After all of the stacks have been created (mine took about 5 minutes), I can select JeffNet and click the Outputs tab to discover the links to EthStats and EthExplorer:

Here’s my EthStats:

And my EthExplorer:

If I am writing apps that make use of my private network to store and process smart contracts, I would use the EthJsonRpcUrl.

Stay Tuned
My colleagues are eager to get your feedback on these new templates and plan to add new versions of the frameworks as they become available.

Jeff;

 

Amazon ECS Service Discovery

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-ecs-service-discovery/

Amazon ECS now includes integrated service discovery. This makes it possible for an ECS service to automatically register itself with a predictable and friendly DNS name in Amazon Route 53. As your services scale up or down in response to load or container health, the Route 53 hosted zone is kept up to date, allowing other services to lookup where they need to make connections based on the state of each service. You can see a demo of service discovery in an imaginary social networking app over at: https://servicediscovery.ranman.com/.

Service Discovery


Part of the transition to microservices and modern architectures involves having dynamic, autoscaling, and robust services that can respond quickly to failures and changing loads. Your services probably have complex dependency graphs of services they rely on and services they provide. A modern architectural best practice is to loosely couple these services by allowing them to specify their own dependencies, but this can be complicated in dynamic environments as your individual services are forced to find their own connection points.

Traditional approaches to service discovery like consul, etcd, or zookeeper all solve this problem well, but they require provisioning and maintaining additional infrastructure or installation of agents in your containers or on your instances. Previously, to ensure that services were able to discover and connect with each other, you had to configure and run your own service discovery system or connect every service to a load balancer. Now, you can enable service discovery for your containerized services in the ECS console, AWS CLI, or using the ECS API.

Introducing Amazon Route 53 Service Registry and Auto Naming APIs

Amazon ECS Service Discovery works by communicating with the Amazon Route 53 Service Registry and Auto Naming APIs. Since we haven’t talked about it before on this blog, I want to briefly outline how these Route 53 APIs work. First, some vocabulary:

  • Namespaces – A namespace specifies a domain name you want to route traffic to (e.g. internal, local, corp). You can think of it as a logical boundary between which services should be able to discover each other. You can create a namespace with a call to the aws servicediscovery create-private-dns-namespace command or in the ECS console. Namespaces are roughly equivalent to hosted zones in Route 53. A namespace contains services, our next vocabulary word.
  • Service – A service is a specific application or set of applications in your namespace like “auth”, “timeline”, or “worker”. A service contains service instances.
  • Service Instance – A service instance contains information about how Route 53 should respond to DNS queries for a resource.

Route 53 provides APIs to create: namespaces, A records per task IP, and SRV records per task IP + port.

When we ask Route 53 for something like: worker.corp we should get back a set of possible IPs that could fulfill that request. If the application we’re connecting to exposes dynamic ports then the calling application can easily query the SRV record to get more information.

ECS service discovery is built on top of the Route 53 APIs and manages all of the underlying API calls for you. Now that we understand how the service registry, works lets take a look at the ECS side to see service discovery in action.

Amazon ECS Service Discovery

Let’s launch an application with service discovery! First, I’ll create two task definitions: “flask-backend” and “flask-worker”. Both are simple AWS Fargate tasks with a single container serving HTTP requests. I’ll have flask-backend ask worker.corp to do some work and I’ll return the response as well as the address Route 53 returned for worker. Something like the code below:

@app.route("/")
namespace = os.getenv("namespace")
worker_host = "worker" + namespace
def backend():
    r = requests.get("http://"+worker_host)
    worker = socket.gethostbyname(worker_host)
    return "Worker Message: {]\nFrom: {}".format(r.content, worker)

 

Now, with my containers and task definitions in place, I’ll create a service in the console.

As I move to step two in the service wizard I’ll fill out the service discovery section and have ECS create a new namespace for me.

I’ll also tell ECS to monitor the health of the tasks in my service and add or remove them from Route 53 as needed. Then I’ll set a TTL of 10 seconds on the A records we’ll use.

I’ll repeat those same steps for my “worker” service and after a minute or so most of my tasks should be up and running.

Over in the Route 53 console I can see all the records for my tasks!

We can use the Route 53 service discovery APIs to list all of our available services and tasks and programmatically reach out to each one. We could easily extend to any number of services past just backend and worker. I’ve created a simple demo of an imaginary social network with services like “auth”, “feed”, “timeline”, “worker”, “user” and more here: https://servicediscovery.ranman.com/. You can see the code used to run that page on github.

Available Now
Amazon ECS service discovery is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). AWS Fargate is currently only available in US East (N. Virginia). When you use ECS service discovery, you pay for the Route 53 resources that you consume, including each namespace that you create, and for the lookup queries your services make. Container level health checks are provided at no cost. For more information on pricing check out the documentation.

Please let us know what you’ll be building or refactoring with service discovery either in the comments or on Twitter!

Randall

 

P.S. Every blog post I write is made with a tremendous amount of help from numerous AWS colleagues. To everyone that helped build service discovery across all of our teams – thank you :)!

How I built a data warehouse using Amazon Redshift and AWS services in record time

Post Syndicated from Stephen Borg original https://aws.amazon.com/blogs/big-data/how-i-built-a-data-warehouse-using-amazon-redshift-and-aws-services-in-record-time/

This is a customer post by Stephen Borg, the Head of Big Data and BI at Cerberus Technologies.

Cerberus Technologies, in their own words: Cerberus is a company founded in 2017 by a team of visionary iGaming veterans. Our mission is simple – to offer the best tech solutions through a data-driven and a customer-first approach, delivering innovative solutions that go against traditional forms of working and process. This mission is based on the solid foundations of reliability, flexibility and security, and we intend to fundamentally change the way iGaming and other industries interact with technology.

Over the years, I have developed and created a number of data warehouses from scratch. Recently, I built a data warehouse for the iGaming industry single-handedly. To do it, I used the power and flexibility of Amazon Redshift and the wider AWS data management ecosystem. In this post, I explain how I was able to build a robust and scalable data warehouse without the large team of experts typically needed.

In two of my recent projects, I ran into challenges when scaling our data warehouse using on-premises infrastructure. Data was growing at many tens of gigabytes per day, and query performance was suffering. Scaling required major capital investment for hardware and software licenses, and also significant operational costs for maintenance and technical staff to keep it running and performing well. Unfortunately, I couldn’t get the resources needed to scale the infrastructure with data growth, and these projects were abandoned. Thanks to cloud data warehousing, the bottleneck of infrastructure resources, capital expense, and operational costs have been significantly reduced or have totally gone away. There is no more excuse for allowing obstacles of the past to delay delivering timely insights to decision makers, no matter how much data you have.

With Amazon Redshift and AWS, I delivered a cloud data warehouse to the business very quickly, and with a small team: me. I didn’t have to order hardware or software, and I no longer needed to install, configure, tune, or keep up with patches and version updates. Instead, I easily set up a robust data processing pipeline and we were quickly ingesting and analyzing data. Now, my data warehouse team can be extremely lean, and focus more time on bringing in new data and delivering insights. In this post, I show you the AWS services and the architecture that I used.

Handling data feeds

I have several different data sources that provide everything needed to run the business. The data includes activity from our iGaming platform, social media posts, clickstream data, marketing and campaign performance, and customer support engagements.

To handle the diversity of data feeds, I developed abstract integration applications using Docker that run on Amazon EC2 Container Service (Amazon ECS) and feed data to Amazon Kinesis Data Streams. These data streams can be used for real time analytics. In my system, each record in Kinesis is preprocessed by an AWS Lambda function to cleanse and aggregate information. My system then routes it to be stored where I need on Amazon S3 by Amazon Kinesis Data Firehose. Suppose that you used an on-premises architecture to accomplish the same task. A team of data engineers would be required to maintain and monitor a Kafka cluster, develop applications to stream data, and maintain a Hadoop cluster and the infrastructure underneath it for data storage. With my stream processing architecture, there are no servers to manage, no disk drives to replace, and no service monitoring to write.

Setting up a Kinesis stream can be done with a few clicks, and the same for Kinesis Firehose. Firehose can be configured to automatically consume data from a Kinesis Data Stream, and then write compressed data every N minutes to Amazon S3. When I want to process a Kinesis data stream, it’s very easy to set up a Lambda function to be executed on each message received. I can just set a trigger from the AWS Lambda Management Console, as shown following.

I also monitor the duration of function execution using Amazon CloudWatch and AWS X-Ray.

Regardless of the format I receive the data from our partners, I can send it to Kinesis as JSON data using my own formatters. After Firehose writes this to Amazon S3, I have everything in nearly the same structure I received but compressed, encrypted, and optimized for reading.

This data is automatically crawled by AWS Glue and placed into the AWS Glue Data Catalog. This means that I can immediately query the data directly on S3 using Amazon Athena or through Amazon Redshift Spectrum. Previously, I used Amazon EMR and an Amazon RDS–based metastore in Apache Hive for catalog management. Now I can avoid the complexity of maintaining Hive Metastore catalogs. Glue takes care of high availability and the operations side so that I know that end users can always be productive.

Working with Amazon Athena and Amazon Redshift for analysis

I found Amazon Athena extremely useful out of the box for ad hoc analysis. Our engineers (me) use Athena to understand new datasets that we receive and to understand what transformations will be needed for long-term query efficiency.

For our data analysts and data scientists, we’ve selected Amazon Redshift. Amazon Redshift has proven to be the right tool for us over and over again. It easily processes 20+ million transactions per day, regardless of the footprint of the tables and the type of analytics required by the business. Latency is low and query performance expectations have been more than met. We use Redshift Spectrum for long-term data retention, which enables me to extend the analytic power of Amazon Redshift beyond local data to anything stored in S3, and without requiring me to load any data. Redshift Spectrum gives me the freedom to store data where I want, in the format I want, and have it available for processing when I need it.

To load data directly into Amazon Redshift, I use AWS Data Pipeline to orchestrate data workflows. I create Amazon EMR clusters on an intra-day basis, which I can easily adjust to run more or less frequently as needed throughout the day. EMR clusters are used together with Amazon RDS, Apache Spark 2.0, and S3 storage. The data pipeline application loads ETL configurations from Spring RESTful services hosted on AWS Elastic Beanstalk. The application then loads data from S3 into memory, aggregates and cleans the data, and then writes the final version of the data to Amazon Redshift. This data is then ready to use for analysis. Spark on EMR also helps with recommendations and personalization use cases for various business users, and I find this easy to set up and deliver what users want. Finally, business users use Amazon QuickSight for self-service BI to slice, dice, and visualize the data depending on their requirements.

Each AWS service in this architecture plays its part in saving precious time that’s crucial for delivery and getting different departments in the business on board. I found the services easy to set up and use, and all have proven to be highly reliable for our use as our production environments. When the architecture was in place, scaling out was either completely handled by the service, or a matter of a simple API call, and crucially doesn’t require me to change one line of code. Increasing shards for Kinesis can be done in a minute by editing a stream. Increasing capacity for Lambda functions can be accomplished by editing the megabytes allocated for processing, and concurrency is handled automatically. EMR cluster capacity can easily be increased by changing the master and slave node types in Data Pipeline, or by using Auto Scaling. Lastly, RDS and Amazon Redshift can be easily upgraded without any major tasks to be performed by our team (again, me).

In the end, using AWS services including Kinesis, Lambda, Data Pipeline, and Amazon Redshift allows me to keep my team lean and highly productive. I eliminated the cost and delays of capital infrastructure, as well as the late night and weekend calls for support. I can now give maximum value to the business while keeping operational costs down. My team pushed out an agile and highly responsive data warehouse solution in record time and we can handle changing business requirements rapidly, and quickly adapt to new data and new user requests.


Additional Reading

If you found this post useful, be sure to check out Deploy a Data Warehouse Quickly with Amazon Redshift, Amazon RDS for PostgreSQL and Tableau Server and Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift.


About the Author

Stephen Borg is the Head of Big Data and BI at Cerberus Technologies. He has a background in platform software engineering, and first became involved in data warehousing using the typical RDBMS, SQL, ETL, and BI tools. He quickly became passionate about providing insight to help others optimize the business and add personalization to products. He is now the Head of Big Data and BI at Cerberus Technologies.

 

 

 

Migrating Your Amazon ECS Containers to AWS Fargate

Post Syndicated from Tiffany Jernigan original https://aws.amazon.com/blogs/compute/migrating-your-amazon-ecs-containers-to-aws-fargate/

AWS Fargate is a new technology that works with Amazon Elastic Container Service (ECS) to run containers without having to manage servers or clusters. What does this mean? With Fargate, you no longer need to provision or manage a single virtual machine; you can just create tasks and run them directly!

Fargate uses the same API actions as ECS, so you can use the ECS console, the AWS CLI, or the ECS CLI. I recommend running through the first-run experience for Fargate even if you’re familiar with ECS. It creates all of the one-time setup requirements, such as the necessary IAM roles. If you’re using a CLI, make sure to upgrade to the latest version

In this blog, you will see how to migrate ECS containers from running on Amazon EC2 to Fargate.

Getting started

Note: Anything with code blocks is a change in the task definition file. Screen captures are from the console. Additionally, Fargate is currently available in the us-east-1 (N. Virginia) region.

Launch type

When you create tasks (grouping of containers) and clusters (grouping of tasks), you now have two launch type options: EC2 and Fargate. The default launch type, EC2, is ECS as you knew it before the announcement of Fargate. You need to specify Fargate as the launch type when running a Fargate task.

Even though Fargate abstracts away virtual machines, tasks still must be launched into a cluster. With Fargate, clusters are a logical infrastructure and permissions boundary that allow you to isolate and manage groups of tasks. ECS also supports heterogeneous clusters that are made up of tasks running on both EC2 and Fargate launch types.

The optional, new requiresCompatibilities parameter with FARGATE in the field ensures that your task definition only passes validation if you include Fargate-compatible parameters. Tasks can be flagged as compatible with EC2, Fargate, or both.

"requiresCompatibilities": [
    "FARGATE"
]

Networking

"networkMode": "awsvpc"

In November, we announced the addition of task networking with the network mode awsvpc. By default, ECS uses the bridge network mode. Fargate requires using the awsvpc network mode.

In bridge mode, all of your tasks running on the same instance share the instance’s elastic network interface, which is a virtual network interface, IP address, and security groups.

The awsvpc mode provides this networking support to your tasks natively. You now get the same VPC networking and security controls at the task level that were previously only available with EC2 instances. Each task gets its own elastic networking interface and IP address so that multiple applications or copies of a single application can run on the same port number without any conflicts.

The awsvpc mode also provides a separation of responsibility for tasks. You can get complete control of task placement within your own VPCs, subnets, and the security policies associated with them, even though the underlying infrastructure is managed by Fargate. Also, you can assign different security groups to each task, which gives you more fine-grained security. You can give an application only the permissions it needs.

"portMappings": [
    {
        "containerPort": "3000"
    }
 ]

What else has to change? First, you only specify a containerPort value, not a hostPort value, as there is no host to manage. Your container port is the port that you access on your elastic network interface IP address. Therefore, your container ports in a single task definition file need to be unique.

"environment": [
    {
        "name": "WORDPRESS_DB_HOST",
        "value": "127.0.0.1:3306"
    }
 ]

Additionally, links are not allowed as they are a property of the “bridge” network mode (and are now a legacy feature of Docker). Instead, containers share a network namespace and communicate with each other over the localhost interface. They can be referenced using the following:

localhost/127.0.0.1:<some_port_number>

CPU and memory

"memory": "1024",
 "cpu": "256"

"memory": "1gb",
 "cpu": ".25vcpu"

When launching a task with the EC2 launch type, task performance is influenced by the instance types that you select for your cluster combined with your task definition. If you pick larger instances, your applications make use of the extra resources if there is no contention.

In Fargate, you needed a way to get additional resource information so we created task-level resources. Task-level resources define the maximum amount of memory and cpu that your task can consume.

  • memory can be defined in MB with just the number, or in GB, for example, “1024” or “1gb”.
  • cpu can be defined as the number or in vCPUs, for example, “256” or “.25vcpu”.
    • vCPUs are virtual CPUs. You can look at the memory and vCPUs for instance types to get an idea of what you may have used before.

The memory and CPU options available with Fargate are:

CPUMemory
256 (.25 vCPU)0.5GB, 1GB, 2GB
512 (.5 vCPU)1GB, 2GB, 3GB, 4GB
1024 (1 vCPU)2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB
2048 (2 vCPU)Between 4GB and 16GB in 1GB increments
4096 (4 vCPU)Between 8GB and 30GB in 1GB increments

IAM roles

Because Fargate uses awsvpc mode, you need an Amazon ECS service-linked IAM role named AWSServiceRoleForECS. It provides Fargate with the needed permissions, such as the permission to attach an elastic network interface to your task. After you create your service-linked IAM role, you can delete the remaining roles in your services.

"executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole"

With the EC2 launch type, an instance role gives the agent the ability to pull, publish, talk to ECS, and so on. With Fargate, the task execution IAM role is only needed if you’re pulling from Amazon ECR or publishing data to Amazon CloudWatch Logs.

The Fargate first-run experience tutorial in the console automatically creates these roles for you.

Volumes

Fargate currently supports non-persistent, empty data volumes for containers. When you define your container, you no longer use the host field and only specify a name.

Load balancers

For awsvpc mode, and therefore for Fargate, use the IP target type instead of the instance target type. You define this in the Amazon EC2 service when creating a load balancer.

If you’re using a Classic Load Balancer, change it to an Application Load Balancer or a Network Load Balancer.

Tip: If you are using an Application Load Balancer, make sure that your tasks are launched in the same VPC and Availability Zones as your load balancer.

Let’s migrate a task definition!

Here is an example NGINX task definition. This type of task definition is what you’re used to if you created one before Fargate was announced. It’s what you would run now with the EC2 launch type.

{
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "hostPort": "80",
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "nginx-ec2"
}

OK, so now what do you need to do to change it to run with the Fargate launch type?

  • Add FARGATE for requiredCompatibilities (not required, but a good safety check for your task definition).
  • Use awsvpc as the network mode.
  • Just specify the containerPort (the hostPortvalue is the same).
  • Add a task executionRoleARN value to allow logging to CloudWatch.
  • Provide cpu and memory limits for the task.
{
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "networkMode": "awsvpc",
    "executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole",
    "family": "nginx-fargate",
    "memory": "512",
    "cpu": "256"
}

Are there more examples?

Yep! Head to the AWS Samples GitHub repo. We have several sample task definitions you can try for both the EC2 and Fargate launch types. Contributions are very welcome too :).

 

tiffany jernigan
@tiffanyfayj

Task Networking in AWS Fargate

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/task-networking-in-aws-fargate/

AWS Fargate is a technology that allows you to focus on running your application without needing to provision, monitor, or manage the underlying compute infrastructure. You package your application into a Docker container that you can then launch using your container orchestration tool of choice.

Fargate allows you to use containers without being responsible for Amazon EC2 instances, similar to how EC2 allows you to run VMs without managing physical infrastructure. Currently, Fargate provides support for Amazon Elastic Container Service (Amazon ECS). Support for Amazon Elastic Container Service for Kubernetes (Amazon EKS) will be made available in the near future.

Despite offloading the responsibility for the underlying instances, Fargate still gives you deep control over configuration of network placement and policies. This includes the ability to use many networking fundamentals such as Amazon VPC and security groups.

This post covers how to take advantage of the different ways of networking your containers in Fargate when using ECS as your orchestration platform, with a focus on how to do networking securely.

The first step to running any application in Fargate is defining an ECS task for Fargate to launch. A task is a logical group of one or more Docker containers that are deployed with specified settings. When running a task in Fargate, there are two different forms of networking to consider:

  • Container (local) networking
  • External networking

Container Networking

Container networking is often used for tightly coupled application components. Perhaps your application has a web tier that is responsible for serving static content as well as generating some dynamic HTML pages. To generate these dynamic pages, it has to fetch information from another application component that has an HTTP API.

One potential architecture for such an application is to deploy the web tier and the API tier together as a pair and use local networking so the web tier can fetch information from the API tier.

If you are running these two components as two processes on a single EC2 instance, the web tier application process could communicate with the API process on the same machine by using the local loopback interface. The local loopback interface has a special IP address of 127.0.0.1 and hostname of localhost.

By making a networking request to this local interface, it bypasses the network interface hardware and instead the operating system just routes network calls from one process to the other directly. This gives the web tier a fast and efficient way to fetch information from the API tier with almost no networking latency.

In Fargate, when you launch multiple containers as part of a single task, they can also communicate with each other over the local loopback interface. Fargate uses a special container networking mode called awsvpc, which gives all the containers in a task a shared elastic network interface to use for communication.

If you specify a port mapping for each container in the task, then the containers can communicate with each other on that port. For example the following task definition could be used to deploy the web tier and the API tier:

{
  "family": "myapp"
  "containerDefinitions": [
    {
      "name": "web",
      "image": "my web image url",
      "portMappings": [
        {
          "containerPort": 80
        }
      ],
      "memory": 500,
      "cpu": 10,
      "esssential": true
    },
    {
      "name": "api",
      "image": "my api image url",
      "portMappings": [
        {
          "containerPort": 8080
        }
      ],
      "cpu": 10,
      "memory": 500,
      "essential": true
    }
  ]
}

ECS, with Fargate, is able to take this definition and launch two containers, each of which is bound to a specific static port on the elastic network interface for the task.

Because each Fargate task has its own isolated networking stack, there is no need for dynamic ports to avoid port conflicts between different tasks as in other networking modes. The static ports make it easy for containers to communicate with each other. For example, the web container makes a request to the API container using its well-known static port:

curl 127.0.0.1:8080/my-endpoint

This sends a local network request, which goes directly from one container to the other over the local loopback interface without traversing the network. This deployment strategy allows for fast and efficient communication between two tightly coupled containers. But most application architectures require more than just internal local networking.

External Networking

External networking is used for network communications that go outside the task to other servers that are not part of the task, or network communications that originate from other hosts on the internet and are directed to the task.

Configuring external networking for a task is done by modifying the settings of the VPC in which you launch your tasks. A VPC is a fundamental tool in AWS for controlling the networking capabilities of resources that you launch on your account.

When setting up a VPC, you create one or more subnets, which are logical groups that your resources can be placed into. Each subnet has an Availability Zone and its own route table, which defines rules about how network traffic operates for that subnet. There are two main types of subnets: public and private.

Public subnets

A public subnet is a subnet that has an associated internet gateway. Fargate tasks in that subnet are assigned both private and public IP addresses:


A browser or other client on the internet can send network traffic to the task via the internet gateway using its public IP address. The tasks can also send network traffic to other servers on the internet because the route table can route traffic out via the internet gateway.

If tasks want to communicate directly with each other, they can use each other’s private IP address to send traffic directly from one to the other so that it stays inside the subnet without going out to the internet gateway and back in.

Private subnets

A private subnet does not have direct internet access. The Fargate tasks inside the subnet don’t have public IP addresses, only private IP addresses. Instead of an internet gateway, a network address translation (NAT) gateway is attached to the subnet:

 

There is no way for another server or client on the internet to reach your tasks directly, because they don’t even have an address or a direct route to reach them. This is a great way to add another layer of protection for internal tasks that handle sensitive data. Those tasks are protected and can’t receive any inbound traffic at all.

In this configuration, the tasks can still communicate to other servers on the internet via the NAT gateway. They would appear to have the IP address of the NAT gateway to the recipient of the communication. If you run a Fargate task in a private subnet, you must add this NAT gateway. Otherwise, Fargate can’t make a network request to Amazon ECR to download the container image, or communicate with Amazon CloudWatch to store container metrics.

Load balancers

If you are running a container that is hosting internet content in a private subnet, you need a way for traffic from the public to reach the container. This is generally accomplished by using a load balancer such as an Application Load Balancer or a Network Load Balancer.

ECS integrates tightly with AWS load balancers by automatically configuring a service-linked load balancer to send network traffic to containers that are part of the service. When each task starts, the IP address of its elastic network interface is added to the load balancer’s configuration. When the task is being shut down, network traffic is safely drained from the task before removal from the load balancer.

To get internet traffic to containers using a load balancer, the load balancer is placed into a public subnet. ECS configures the load balancer to forward traffic to the container tasks in the private subnet:

This configuration allows your tasks in Fargate to be safely isolated from the rest of the internet. They can still initiate network communication with external resources via the NAT gateway, and still receive traffic from the public via the Application Load Balancer that is in the public subnet.

Another potential use case for a load balancer is for internal communication from one service to another service within the private subnet. This is typically used for a microservice deployment, in which one service such as an internet user account service needs to communicate with an internal service such as a password service. Obviously, it is undesirable for the password service to be directly accessible on the internet, so using an internet load balancer would be a major security vulnerability. Instead, this can be accomplished by hosting an internal load balancer within the private subnet:

With this approach, one container can distribute requests across an Auto Scaling group of other private containers via the internal load balancer, ensuring that the network traffic stays safely protected within the private subnet.

Best Practices for Fargate Networking

Determine whether you should use local task networking

Local task networking is ideal for communicating between containers that are tightly coupled and require maximum networking performance between them. However, when you deploy one or more containers as part of the same task they are always deployed together so it removes the ability to independently scale different types of workload up and down.

In the example of the application with a web tier and an API tier, it may be the case that powering the application requires only two web tier containers but 10 API tier containers. If local container networking is used between these two container types, then an extra eight unnecessary web tier containers would end up being run instead of allowing the two different services to scale independently.

A better approach would be to deploy the two containers as two different services, each with its own load balancer. This allows clients to communicate with the two web containers via the web service’s load balancer. The web service could distribute requests across the eight backend API containers via the API service’s load balancer.

Run internet tasks that require internet access in a public subnet

If you have tasks that require internet access and a lot of bandwidth for communication with other services, it is best to run them in a public subnet. Give them public IP addresses so that each task can communicate with other services directly.

If you run these tasks in a private subnet, then all their outbound traffic has to go through an NAT gateway. AWS NAT gateways support up to 10 Gbps of burst bandwidth. If your bandwidth requirements go over this, then all task networking starts to get throttled. To avoid this, you could distribute the tasks across multiple private subnets, each with their own NAT gateway. It can be easier to just place the tasks into a public subnet, if possible.

Avoid using a public subnet or public IP addresses for private, internal tasks

If you are running a service that handles private, internal information, you should not put it into a public subnet or use a public IP address. For example, imagine that you have one task, which is an API gateway for authentication and access control. You have another background worker task that handles sensitive information.

The intended access pattern is that requests from the public go to the API gateway, which then proxies request to the background task only if the request is from an authenticated user. If the background task is in a public subnet and has a public IP address, then it could be possible for an attacker to bypass the API gateway entirely. They could communicate directly to the background task using its public IP address, without being authenticated.

Conclusion

Fargate gives you a way to run containerized tasks directly without managing any EC2 instances, but you still have full control over how you want networking to work. You can set up containers to talk to each other over the local network interface for maximum speed and efficiency. For running workloads that require privacy and security, use a private subnet with public internet access locked down. Or, for simplicity with an internet workload, you can just use a public subnet and give your containers a public IP address.

To deploy one of these Fargate task networking approaches, check out some sample CloudFormation templates showing how to configure the VPC, subnets, and load balancers.

If you have questions or suggestions, please comment below.

Building Blocks of Amazon ECS

Post Syndicated from Tiffany Jernigan original https://aws.amazon.com/blogs/compute/building-blocks-of-amazon-ecs/

So, what’s Amazon Elastic Container Service (ECS)? ECS is a managed service for running containers on AWS, designed to make it easy to run applications in the cloud without worrying about configuring the environment for your code to run in. Using ECS, you can easily deploy containers to host a simple website or run complex distributed microservices using thousands of containers.

Getting started with ECS isn’t too difficult. To fully understand how it works and how you can use it, it helps to understand the basic building blocks of ECS and how they fit together!

Let’s begin with an analogy

Imagine you’re in a virtual reality game with blocks and portals, in which your task is to build kingdoms.

In your spaceship, you pull up a holographic map of your upcoming destination: Nozama, a golden-orange planet. Looking at its various regions, you see that the nearest one is za-southwest-1 (SW Nozama). You set your destination, and use your jump drive to jump to the outer atmosphere of za-southwest-1.

As you approach SW Nozama, you see three portals, 1a, 1b, and 1c. Each portal lets you transport directly to an isolated zone (Availability Zone), where you can start construction on your new kingdom (cluster), Royaume.

With your supply of blocks, you take the portal to 1b, and erect the surrounding walls of your first territory (instance)*.

Before you get ahead of yourself, there are some rules to keep in mind. For your territory to be a part of Royaume, the land ordinance requires construction of a building (container), specifically a castle, from which your territory’s lord (agent)* rules.

You can then create architectural plans (task definitions) to build your developments (tasks), consisting of up to 10 buildings per plan. A development can be built now within this or any territory, or multiple territories.

If you do decide to create more territories, you can either stay here in 1b or take a portal to another location in SW Nozama and start building there.

Amazon EC2 building blocks

We currently provide two launch types: EC2 and Fargate. With Fargate, the Amazon EC2 instances are abstracted away and managed for you. Instead of worrying about ECS container instances, you can just worry about tasks. In this post, the infrastructure components used by ECS that are handled by Fargate are marked with a *.

Instance*

EC2 instances are good ol’ virtual machines (VMs). And yes, don’t worry, you can connect to them (via SSH). Because customers have varying needs in memory, storage, and computing power, many different instance types are offered. Just want to run a small application or try a free trial? Try t2.micro. Want to run memory-optimized workloads? R3 and X1 instances are a couple options. There are many more instance types as well, which cater to various use cases.

AMI*

Sorry if you wanted to immediately march forward, but before you create your instance, you need to choose an AMI. An AMI stands for Amazon Machine Image. What does that mean? Basically, an AMI provides the information required to launch an instance: root volume, launch permissions, and volume-attachment specifications. You can find and choose a Linux or Windows AMI provided by AWS, the user community, the AWS Marketplace (for example, the Amazon ECS-Optimized AMI), or you can create your own.

Region

AWS is divided into regions that are geographic areas around the world (for now it’s just Earth, but maybe someday…). These regions have semi-evocative names such as us-east-1 (N. Virginia), us-west-2 (Oregon), eu-central-1 (Frankfurt), ap-northeast-1 (Tokyo), etc.

Each region is designed to be completely isolated from the others, and consists of multiple, distinct data centers. This creates a “blast radius” for failure so that even if an entire region goes down, the others aren’t affected. Like many AWS services, to start using ECS, you first need to decide the region in which to operate. Typically, this is the region nearest to you or your users.

Availability Zone

AWS regions are subdivided into Availability Zones. A region has at minimum two zones, and up to a handful. Zones are physically isolated from each other, spanning one or more different data centers, but are connected through low-latency, fiber-optic networking, and share some common facilities. EC2 is designed so that the most common failures only affect a single zone to prevent region-wide outages. This means you can achieve high availability in a region by spanning your services across multiple zones and distributing across hosts.

Amazon ECS building blocks

Container

Well, without containers, ECS wouldn’t exist!

Are containers virtual machines?
Nope! Virtual machines virtualize the hardware (benefits), while containers virtualize the operating system (even more benefits!). If you look inside a container, you would see that it is made by processes running on the host, and tied together by kernel constructs like namespaces, cgroups, etc. But you don’t need to bother about that level of detail, at least not in this post!

Why containers?
Containers give you the ability to build, ship, and run your code anywhere!

Before the cloud, you needed to self-host and therefore had to buy machines in addition to setting up and configuring the operating system (OS), and running your code. In the cloud, with virtualization, you can just skip to setting up the OS and running your code. Containers make the process even easier—you can just run your code.

Additionally, all of the dependencies travel in a package with the code, which is called an image. This allows containers to be deployed on any host machine. From the outside, it looks like a host is just holding a bunch of containers. They all look the same, in the sense that they are generic enough to be deployed on any host.

With ECS, you can easily run your containerized code and applications across a managed cluster of EC2 instances.

Are containers a fairly new technology?
The concept of containerization is not new. Its origins date back to 1979 with the creation of chroot. However, it wasn’t until the early 2000s that containers became a major technology. The most significant milestone to date was the release of Docker in 2013, which led to the popularization and widespread adoption of containers.

What does ECS use?
While other container technologies exist (LXC, rkt, etc.), because of its massive adoption and use by our customers, ECS was designed first to work natively with Docker containers.

Container instance*

Yep, you are back to instances. An instance is just slightly more complex in the ECS realm though. Here, it is an ECS container instance that is an EC2 instance running the agent, has a specifically defined IAM policy and role, and has been registered into your cluster.

And as you probably guessed, in these instances, you are running containers. 

AMI*

These container instances can use any AMI as long as it has the following specifications: a modern Linux distribution with the agent and the Docker Daemon with any Docker runtime dependencies running on it.

Want it more simplified? Well, AWS created the Amazon ECS-Optimized AMI for just that. Not only does that AMI come preconfigured with all of the previously mentioned specifications, it’s tested and includes the recommended ecs-init upstart process to run and monitor the agent.

Cluster

An ECS cluster is a grouping of (container) instances* (or tasks in Fargate) that lie within a single region, but can span multiple Availability Zones – it’s even a good idea for redundancy. When launching an instance (or tasks in Fargate), unless specified, it registers with the cluster named “default”. If “default” doesn’t exist, it is created. You can also scale and delete your clusters.

Agent*

The Amazon ECS container agent is a Go program that runs in its own container within each EC2 instance that you use with ECS. (It’s also available open source on GitHub!) The agent is the intermediary component that takes care of the communication between the scheduler and your instances. Want to register your instance into a cluster? (Why wouldn’t you? A cluster is both a logical boundary and provider of pool of resources!) Then you need to run the agent on it.

Task

When you want to start a container, it has to be part of a task. Therefore, you have to create a task first. Succinctly, tasks are a logical grouping of 1 to N containers that run together on the same instance, with N defined by you, up to 10. Let’s say you want to run a custom blog engine. You could put together a web server, an application server, and an in-memory cache, each in their own container. Together, they form a basic frontend unit.

Task definition

Ah, but you cannot create a task directly. You have to create a task definition that tells ECS that “task definition X is composed of this container (and maybe that other container and that other container too!).” It’s kind of like an architectural plan for a city. Some other details it can include are how the containers interact, container CPU and memory constraints, and task permissions using IAM roles.

Then you can tell ECS, “start one task using task definition X.” It might sound like unnecessary planning at first. As soon as you start to deal with multiple tasks, scaling, upgrades, and other “real life” scenarios, you’ll be glad that you have task definitions to keep track of things!

Scheduler*

So, the scheduler schedules… sorry, this should be more helpful, huh? The scheduler is part of the “hosted orchestration layer” provided by ECS. Wait a minute, what do I mean by “hosted orchestration”? Simply put, hosted means that it’s operated by ECS on your behalf, without you having to care about it. Your applications are deployed in containers running on your instances, but the managing of tasks is taken care of by ECS. One less thing to worry about!

Also, the scheduler is the component that decides what (which containers) gets to run where (on which instances), according to a number of constraints. Say that you have a custom blog engine to scale for high availability. You could create a service, which by default, spreads tasks across all zones in the chosen region. And if you want each task to be on a different instance, you can use the distinctInstance task placement constraint. ECS makes sure that not only this happens, but if a task fails, it starts again.

Service

To ensure that you always have your task running without managing it yourself, you can create a service based on the task that you defined and ECS ensures that it stays running. A service is a special construct that says, “at any given time, I want to make sure that N tasks using task definition X1 are running.” If N=1, it just means “make sure that this task is running, and restart it if needed!” And with N>1, you’re basically scaling your application until you hit N, while also ensuring each task is running.

So, what now?

Hopefully you, at the very least, learned a tiny something. All comments are very welcome!

Want to discuss ECS with others? Join the amazon-ecs slack group, which members of the community created and manage.

Also, if you’re interested in learning more about the core concepts of ECS and its relation to EC2, here are some resources:

Pages
Amazon ECS landing page
AWS Fargate landing page
Amazon ECS Getting Started
Nathan Peck’s AWSome ECS

Docs
Amazon EC2
Amazon ECS

Blogs
AWS Compute Blog
AWS Blog

GitHub code
Amazon ECS container agent
Amazon ECS CLI

AWS videos
Learn Amazon ECS
AWS videos
AWS webinars

 

— tiffany

 @tiffanyfayj

 

Migrating .NET Classic Applications to Amazon ECS Using Windows Containers

Post Syndicated from Sundar Narasiman original https://aws.amazon.com/blogs/compute/migrating-net-classic-applications-to-amazon-ecs-using-windows-containers/

This post contributed by Sundar Narasiman, Arun Kannan, and Thomas Fuller.

AWS recently announced the general availability of Windows container management for Amazon Elastic Container Service (Amazon ECS). Docker containers and Amazon ECS make it easy to run and scale applications on a virtual machine by abstracting the complex cluster management and setup needed.

Classic .NET applications are developed with .NET Framework 4.7.1 or older and can run only on a Windows platform. These include Windows Communication Foundation (WCF), ASP.NET Web Forms, and an ASP.NET MVC web app or web API.

Why classic ASP.NET?

ASP.NET MVC 4.6 and older versions of ASP.NET occupy a significant footprint in the enterprise web application space. As enterprises move towards microservices for new or existing applications, containers are one of the stepping stones for migrating from monolithic to microservices architectures. Additionally, the support for Windows containers in Windows 10, Windows Server 2016, and Visual Studio Tooling support for Docker simplifies the containerization of ASP.NET MVC apps.

Getting started

In this post, you pick an ASP.NET 4.6.2 MVC application and get step-by-step instructions for migrating to ECS using Windows containers. The detailed steps, AWS CloudFormation template, Microsoft Visual Studio solution, ECS service definition, and ECS task definition are available in the aws-ecs-windows-aspnet GitHub repository.

To help you getting started running Windows containers, here is the reference architecture for Windows containers on GitHub: ecs-refarch-cloudformation-windows. This reference architecture is the layered CloudFormation stack, in that it calls the other stacks to create the environment. The CloudFormation YAML template in this reference architecture is referenced to create a single JSON CloudFormation stack, which is used in the steps for the migration.

Steps for Migration

The code and templates to implement this migration can be found on GitHub: https://github.com/aws-samples/aws-ecs-windows-aspnet.

  1. Your development environment needs to have the latest version and updates for Visual Studio 2017, Windows 10, and Docker for Windows Stable.
  2. Next, containerize the ASP.NET application and test it locally. The size of Windows container application images is generally larger compared to Linux containers. This is because the base image of the Windows container itself is large in size, typically greater than 9 GB.
  3. After the application is containerized, the container image needs to be pushed to Amazon Elastic Container Registry (Amazon ECR). Images stored in ECR are compressed to improve pull times and reduce storage costs. In this case, you can see that ECR compresses the image to around 1 GB, for an optimization factor of 90%.
  4. Create a CloudFormation stack using the template in the ‘CloudFormation template’ folder. This creates an ECS service, task definition (referring the containerized ASP.NET application), and other related components mentioned in the ECS reference architecture for Windows containers.
  5. After the stack is created, verify the successful creation of the ECS service, ECS instances, running tasks (with the threshold mentioned in the task definition), and the Application Load Balancer’s successful health check against running containers.
  6. Navigate to the Application Load Balancer URL and see the successful rendering of the containerized ASP.NET MVC app in the browser.

Key Notes

  • Generally, Windows container images occupy large amount of space (in the order of few GBs).
  • All the task definition parameters for Linux containers are not available for Windows containers. For more information, see Windows Task Definitions.
  • An Application Load Balancer can be configured to route requests to one or more ports on each container instance in a cluster. The dynamic port mapping allows you to have multiple tasks from a single service on the same container instance.
  • IAM roles for Windows tasks require extra configuration. For more information, see Windows IAM Roles for Tasks. For this post, configuration was handled by the CloudFormation template.
  • The ECS container agent log file can be accessed for troubleshooting Windows containers: C:\ProgramData\Amazon\ECS\log\ecs-agent.log

Summary

In this post, you migrated an ASP.NET MVC application to ECS using Windows containers.

The logical next step is to automate the activities for migration to ECS and build a fully automated continuous integration/continuous deployment (CI/CD) pipeline for Windows containers. This can be orchestrated by leveraging services such as AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Amazon ECR, and Amazon ECS. You can learn more about how this is done in the Set Up a Continuous Delivery Pipeline for Containers Using AWS CodePipeline and Amazon ECS post.

If you have questions or suggestions, please comment below.

New AWS Auto Scaling – Unified Scaling For Your Cloud Applications

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-auto-scaling-unified-scaling-for-your-cloud-applications/

I’ve been talking about scalability for servers and other cloud resources for a very long time! Back in 2006, I wrote “This is the new world of scalable, on-demand web services. Pay for what you need and use, and not a byte more.” Shortly after we launched Amazon Elastic Compute Cloud (EC2), we made it easy for you to do this with the simultaneous launch of Elastic Load Balancing, EC2 Auto Scaling, and Amazon CloudWatch. Since then we have added Auto Scaling to other AWS services including ECS, Spot Fleets, DynamoDB, Aurora, AppStream 2.0, and EMR. We have also added features such as target tracking to make it easier for you to scale based on the metric that is most appropriate for your application.

Introducing AWS Auto Scaling
Today we are making it easier for you to use the Auto Scaling features of multiple AWS services from a single user interface with the introduction of AWS Auto Scaling. This new service unifies and builds on our existing, service-specific, scaling features. It operates on any desired EC2 Auto Scaling groups, EC2 Spot Fleets, ECS tasks, DynamoDB tables, DynamoDB Global Secondary Indexes, and Aurora Replicas that are part of your application, as described by an AWS CloudFormation stack or in AWS Elastic Beanstalk (we’re also exploring some other ways to flag a set of resources as an application for use with AWS Auto Scaling).

You no longer need to set up alarms and scaling actions for each resource and each service. Instead, you simply point AWS Auto Scaling at your application and select the services and resources of interest. Then you select the desired scaling option for each one, and AWS Auto Scaling will do the rest, helping you to discover the scalable resources and then creating a scaling plan that addresses the resources of interest.

If you have tried to use any of our Auto Scaling options in the past, you undoubtedly understand the trade-offs involved in choosing scaling thresholds. AWS Auto Scaling gives you a variety of scaling options: You can optimize for availability, keeping plenty of resources in reserve in order to meet sudden spikes in demand. You can optimize for costs, running close to the line and accepting the possibility that you will tax your resources if that spike arrives. Alternatively, you can aim for the middle, with a generous but not excessive level of spare capacity. In addition to optimizing for availability, cost, or a blend of both, you can also set a custom scaling threshold. In each case, AWS Auto Scaling will create scaling policies on your behalf, including appropriate upper and lower bounds for each resource.

AWS Auto Scaling in Action
I will use AWS Auto Scaling on a simple CloudFormation stack consisting of an Auto Scaling group of EC2 instances and a pair of DynamoDB tables. I start by removing the existing Scaling Policies from my Auto Scaling group:

Then I open up the new Auto Scaling Console and selecting the stack:

Behind the scenes, Elastic Beanstalk applications are always launched via a CloudFormation stack. In the screen shot above, awseb-e-sdwttqizbp-stack is an Elastic Beanstalk application that I launched.

I can click on any stack to learn more about it before proceeding:

I select the desired stack and click on Next to proceed. Then I enter a name for my scaling plan and choose the resources that I’d like it to include:

I choose the scaling strategy for each type of resource:

After I have selected the desired strategies, I click Next to proceed. Then I review the proposed scaling plan, and click Create scaling plan to move ahead:

The scaling plan is created and in effect within a few minutes:

I can click on the plan to learn more:

I can also inspect each scaling policy:

I tested my new policy by applying a load to the initial EC2 instance, and watched the scale out activity take place:

I also took a look at the CloudWatch metrics for the EC2 Auto Scaling group:

Available Now
We are launching AWS Auto Scaling today in the US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Singapore) Regions today, with more to follow. There’s no charge for AWS Auto Scaling; you pay only for the CloudWatch Alarms that it creates and any AWS resources that you consume.

As is often the case with our new services, this is just the first step on what we hope to be a long and interesting journey! We have a long roadmap, and we’ll be adding new features and options throughout 2018 in response to your feedback.

Jeff;

Set Up a Continuous Delivery Pipeline for Containers Using AWS CodePipeline and Amazon ECS

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/set-up-a-continuous-delivery-pipeline-for-containers-using-aws-codepipeline-and-amazon-ecs/

This post contributed by Abby FullerAWS Senior Technical Evangelist

Last week, AWS announced support for Amazon Elastic Container Service (ECS) targets (including AWS Fargate) in AWS CodePipeline. This support makes it easier to create a continuous delivery pipeline for container-based applications and microservices.

Building and deploying containerized services manually is slow and prone to errors. Continuous delivery with automated build and test mechanisms helps detect errors early, saves time, and reduces failures, making this a popular model for application deployments. Previously, to automate your container workflows with ECS, you had to build your own solution using AWS CloudFormation. Now, you can integrate CodePipeline and CodeBuild with ECS to automate your workflows in just a few steps.

A typical continuous delivery workflow with CodePipeline, CodeBuild, and ECS might look something like the following:

  • Choosing your source
  • Building your project
  • Deploying your code

We also have a continuous deployment reference architecture on GitHub for this workflow.

Getting Started

First, create a new project with CodePipeline and give the project a name, such as “demo”.

Next, choose a source location where the code is stored. This could be AWS CodeCommit, GitHub, or Amazon S3. For this example, enter GitHub and then give CodePipeline access to the repository.

Next, add a build step. You can import an existing build, such as a Jenkins server URL or CodeBuild project, or create a new step with CodeBuild. If you don’t have an existing build project in CodeBuild, create one from within CodePipeline:

  • Build provider: AWS CodeBuild
  • Configure your project: Create a new build project
  • Environment image: Use an image managed by AWS CodeBuild
  • Operating system: Ubuntu
  • Runtime: Docker
  • Version: aws/codebuild/docker:1.12.1
  • Build specification: Use the buildspec.yml in the source code root directory

Now that you’ve created the CodeBuild step, you can use it as an existing project in CodePipeline.

Next, add a deployment provider. This is where your built code is placed. It can be a number of different options, such as AWS CodeDeploy, AWS Elastic Beanstalk, AWS CloudFormation, or Amazon ECS. For this example, connect to Amazon ECS.

For CodeBuild to deploy to ECS, you must create an image definition JSON file. This requires adding some instructions to the pre-build, build, and post-build phases of the CodeBuild build process in your buildspec.yml file. For help with creating the image definition file, see Step 1 of the Tutorial: Continuous Deployment with AWS CodePipeline.

  • Deployment provider: Amazon ECS
  • Cluster name: enter your project name from the build step
  • Service name: web
  • Image filename: enter your image definition filename (“web.json”).

You are almost done!

You can now choose an existing IAM service role that CodePipeline can use to access resources in your account, or let CodePipeline create one. For this example, use the wizard, and go with the role that it creates (AWS-CodePipeline-Service).

Finally, review all of your changes, and choose Create pipeline.

After the pipeline is created, you’ll have a model of your entire pipeline where you can view your executions, add different tests, add manual approvals, or release a change.

You can learn more in the AWS CodePipeline User Guide.

Happy automating!

Amazon Linux 2 – Modern, Stable, and Enterprise-Friendly

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-linux-2-modern-stable-and-enterprise-friendly/

I’m getting ready to wrap up my work for the year, cleaning up my inbox and catching up on a few recent AWS launches that happened at and shortly after AWS re:Invent.

Last week we launched Amazon Linux 2. This is modern version of Linux, designed to meet the security, stability, and productivity needs of enterprise environments while giving you timely access to new tools and features. It also includes all of the things that made the Amazon Linux AMI popular, including AWS integration, cloud-init, a secure default configuration, regular security updates, and AWS Support. From that base, we have added many new features including:

Long-Term Support – You can use Amazon Linux 2 in situations where you want to stick with a single major version of Linux for an extended period of time, perhaps to avoid re-qualifying your applications too frequently. This build (2017.12) is a candidate for LTS status; the final determination will be made based on feedback in the Amazon Linux Discussion Forum. Long-term support for the Amazon Linux 2 LTS build will include security updates, bug fixes, user-space Application Binary Interface (ABI), and user-space Application Programming Interface (API) compatibility for 5 years.

Extras Library – You can now get fast access to fresh, new functionality while keeping your base OS image stable and lightweight. The Amazon Linux Extras Library eliminates the age-old tradeoff between OS stability and access to fresh software. It contains open source databases, languages, and more, each packaged together with any needed dependencies.

Tuned Kernel – You have access to the latest 4.9 LTS kernel, with support for the latest EC2 features and tuned to run efficiently in AWS and other virtualized environments.

SystemdAmazon Linux 2 includes the systemd init system, designed to provide better boot performance and increased control over individual services and groups of interdependent services. For example, you can indicate that Service B must be started only after Service A is fully started, or that Service C should start on a change in network connection status.

Wide AvailabiltyAmazon Linux 2 is available in all AWS Regions in AMI and Docker image form. Virtual machine images for Hyper-V, KVM, VirtualBox, and VMware are also available. You can build and test your applications on your laptop or in your own data center and then deploy them to AWS.

Launching an Instance
You can launch an instance in all of the usual ways – AWS Management Console, AWS Command Line Interface (CLI), AWS Tools for Windows PowerShell, RunInstances, and via a AWS CloudFormation template. I’ll use the Console:

I’m interested in the Extras Library; here’s how I see which topics (lists of packages) are available:

As you can see, the library includes languages, editors, and web tools that receive frequent updates. Each topic contains all of dependencies that are needed to install the package on Amazon Linux 2. For example, the Rust topic includes the cmake build system for Rust, cargo for Rust package maintenance, and the LLVM-based compiler toolchain for Rust.

Here’s how I install a topic (Emacs 25.3):

SNS Updates
Many AWS customers use the Amazon Linux AMIs as a starting point for their own AMIs. If you do this and would like to kick off your build process whenever a new AMI is released, you can subscribe to an SNS topic:

You can be notified by email, invoke a AWS Lambda function, and so forth.

Available Now
Amazon Linux 2 is available now and you can start using it in the cloud and on-premises today! To learn more, read the Amazon Linux 2 LTS Candidate (2017.12) Release Notes.

Jeff;