
Automatically update instances in an Amazon ECS cluster using the AMI ID parameter

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/

This post is contributed by Adam McLean – Solutions Developer at AWS and Chirill Cucereavii – Application Architect at AWS 

In this post, we show you how to automatically refresh the container instances in an active Amazon Elastic Container Service (ECS) cluster with instances built from a newly released AMI.

The Amazon ECS-optimized AMI comes prepackaged with the ECS container agent, Docker agent, and the ecs-init upstart service. We recommend that you use the Amazon ECS-optimized AMI for your container instances unless your application requires any of the following:

  • A specific operating system
  • Custom security and monitoring agents installed
  • Root volume encryption enabled
  • A Docker version that is not yet available in the Amazon ECS-optimized AMI

Regardless of the type of AMI that you choose, AWS recommends updating your ECS container instance fleet with the latest AMI whenever possible. It’s easier than trying to patch existing instances in place.

Solution overview

In this solution, you deploy the ECS cluster and specify the cluster size, instance type, AMI ID, and other parameters. After the ECS cluster has been created and instances registered, you can update the ECS cluster with another AMI ID to trigger the following events:

  1. A new launch configuration is created using the new AMI ID.
  2. The Auto Scaling group adds one new instance using the new launch configuration. This executes the ‘Adding Instances’ process described below.
  3. The adding instances process finishes for the single new node with the new AMI. Then, the removing instances process is started against the oldest instance with the old AMI ID.
  4. After the removing nodes process is finished, steps 2 and 3 are repeated until all nodes in the cluster have been replaced.
  5. If an error is encountered during the rollout, the new launch configuration is deleted, and the old one is put back in place.

Scaling a cluster out (adding instances)

Take a closer look at each step in scaling out a cluster:

  1. A stack update changes the AMI ID parameter.
  2. CloudFormation updates the launch configuration and tells the Auto Scaling group to add an instance.
  3. Auto Scaling launches an instance using the new AMI ID to join the ECS cluster.
  4. Auto Scaling invokes the Launch Lambda function.
  5. Lambda asks the ECS cluster if the newly launched instance has joined and is showing healthy.
  6. Lambda tells Auto Scaling whether the launch succeeded or failed.
  7. Auto Scaling tells CloudFormation whether the scale-up has succeeded.
  8. The stack update succeeds, or rolls back.

Scaling a cluster in (removing instances)

Take a closer look at each step in scaling in a cluster:

  1. CloudFormation tells the Auto Scaling group to remove an instance.
  2. Auto Scaling chooses an instance to be terminated.
  3. Auto Scaling invokes the Terminate Lambda function.
  4. The Lambda function performs the following tasks:
    1. Sets the instance to be terminated to DRAINING mode.
    2. Confirms that all ECS tasks are drained from the instance marked for termination.
    3. Confirms that the ECS cluster services and tasks are stable.
  5. Lambda tells Auto Scaling to proceed with termination.
  6. Auto Scaling tells CloudFormation whether the scale-in has succeeded.
  7. The stack update succeeds, or rolls back.

Solution technologies

Here are the technologies used in this solution, with more details on each below.

  • AWS CloudFormation
  • AWS Auto Scaling
  • Amazon CloudWatch Events
  • AWS Systems Manager Parameter Store
  • AWS Lambda

AWS CloudFormation

AWS CloudFormation is used to deploy the stack, and should be used for lifecycle management. Do not directly edit Auto Scaling groups, Lambda functions, and so on. Instead, update the CloudFormation template.

This forces the resolution of the latest AMI, as well as providing an opportunity to change the size or instance type involved in the ECS cluster.

CloudFormation has rollback capabilities to return to the last known good state if errors are encountered. It is the recommended mechanism for management throughout the cluster’s lifecycle.

AWS Auto Scaling

For ECS, the primary scaling and rollout mechanism is AWS Auto Scaling. Auto Scaling allows you to define a desired state environment, and keep that desired state as necessary by launching and terminating instances.

When a new AMI has been selected, CloudFormation informs Auto Scaling that it should replace the existing fleet of instances. This is controlled by an Auto Scaling update policy.

This solution rolls a single new instance out to the ECS cluster, then drains and terminates a single old instance in response. This cycle continues until all instances in the ECS cluster have been replaced.

Auto Scaling lifecycle hooks

Auto Scaling permits the use of lifecycle hooks: code that executes when a scaling operation occurs. This solution uses a Lambda function that is informed when an instance is launched or terminated.

A lifecycle hook informs Auto Scaling whether it can proceed with the activity or if it should abandon it. In this case, the ECS cluster remains healthy and all tasks have been redistributed before allowing Auto Scaling to proceed.

Lifecycle hooks also have a timeout; in this solution, it is 3600 seconds (1 hour). If the hook has not completed by then, the default action is to abandon the operation.
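To make the hook mechanics concrete, here is a minimal Python (boto3) sketch of how a handler could report a result back to a lifecycle hook. This is illustrative only and not the repository’s actual Lambda code; the function and variable names are hypothetical.

import boto3

autoscaling = boto3.client("autoscaling")

def report_result(asg_name, hook_name, instance_id, healthy):
    # While checks are still in progress, the handler could instead call
    # record_lifecycle_action_heartbeat() to extend the hook's timeout.
    autoscaling.complete_lifecycle_action(
        AutoScalingGroupName=asg_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE" if healthy else "ABANDON",
    )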

Amazon CloudWatch Events

CloudWatch Events is a mechanism for watching calls made to the AWS APIs, and then activating functions in response. This is the mechanism used to launch the Lambda functions when a lifecycle event occurs. It’s also the mechanism used to re-launch the Lambda function when it times out (Lambda maximum execution time is 15 minutes).

In this solution, four CloudWatch Events rules are created: two to pick up the initial scaling events, and two more to pick up a continuation from the Lambda functions.
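As a rough illustration of how such rules can be wired up outside of CloudFormation, the following boto3 sketch creates a single rule that matches Auto Scaling lifecycle actions and targets one Lambda function. The rule name, Auto Scaling group name, and function ARN are placeholders; the solution’s CloudFormation template defines the real rules and targets.

import json
import boto3

events = boto3.client("events")

# Match launch and terminate lifecycle actions for a specific Auto Scaling group.
pattern = {
    "source": ["aws.autoscaling"],
    "detail-type": [
        "EC2 Instance-launch Lifecycle Action",
        "EC2 Instance-terminate Lifecycle Action",
    ],
    "detail": {"AutoScalingGroupName": ["my-ecs-cluster-asg"]},
}

events.put_rule(Name="ecs-lifecycle-hooks", EventPattern=json.dumps(pattern))
events.put_targets(
    Rule="ecs-lifecycle-hooks",
    Targets=[{
        "Id": "launch-function",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:LifeCycleLaunch",
    }],
)

In practice, you would also grant CloudWatch Events permission to invoke the function (for example, with lambda add-permission).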

AWS Systems Manager Parameter Store

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management.

This solution relies on the AMI IDs stored in Parameter Store. Given a naming standard of /ami/ecs/latest, this always resolves to the latest available AMI for ECS.

CloudFormation now supports using values stored in Parameter Store as inputs to CloudFormation templates. The template can simply be passed the value /ami/ecs/latest, and CloudFormation resolves it to the AMI ID stored in that parameter.
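If you want to double-check what the parameter currently resolves to before a stack update, a quick boto3 call (equivalent to aws ssm get-parameter) works. This is only a convenience check and is not part of the solution itself.

import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Print the AMI ID currently stored in the /ami/ecs/latest parameter.
ami_id = ssm.get_parameter(Name="/ami/ecs/latest")["Parameter"]["Value"]
print(ami_id)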

AWS Lambda

The Lambda functions are used to handle the Auto Scaling lifecycle hooks. They operate against the ECS cluster to ensure that it is healthy, and inform Auto Scaling whether it can proceed or should abandon its current operation.

The functions are invoked by CloudWatch Events in response to scaling operations, so they are idle unless there are changes happening in the cluster.

They’re written in Python, and use the boto3 SDK to communicate with the ECS cluster and Auto Scaling.

The launch Lambda function waits until the instance has fully joined the ECS cluster. This is shown by the instance being marked ‘ACTIVE’ by the ECS control plane, and its ECS agent status showing as connected. This means that the new instance is ready to run tasks for the cluster.

The terminate Lambda function waits until the instance has fully drained all running tasks. It also checks that all tasks and services are in a stable state before allowing Auto Scaling to terminate an instance. This ensures that the instance is truly idle, and the cluster stable, before an instance is removed.
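The following is a simplified sketch of that draining logic, written as a synchronous poll for clarity. The real Lambda function also verifies service stability and records lifecycle heartbeats rather than blocking until the instance is empty.

import time
import boto3

ecs = boto3.client("ecs")

def drain_and_wait(cluster, container_instance_arn, poll_seconds=15):
    # Put the instance into DRAINING so that ECS reschedules its tasks elsewhere.
    ecs.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
    # Poll until no tasks remain on the instance.
    while True:
        resp = ecs.describe_container_instances(
            cluster=cluster, containerInstances=[container_instance_arn]
        )
        if resp["containerInstances"][0]["runningTasksCount"] == 0:
            return
        time.sleep(poll_seconds)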

Deployment

Before you begin deployment, you need the following:

  • An AWS account 

You need an AWS account with enough room to accommodate the additional EC2 instances required by the ECS cluster.

  • (Optional) Linux system 

Use the AWS CLI and, optionally, jq to deploy the solution. Although a Linux system is recommended, it’s not required.

  • IAM user

You need an IAM admin user with permissions to create IAM policies and roles and create and update CloudFormation stacks. The user must also be able to deploy the ECS cluster, Lambda functions, Systems Manager parameters, and other resources.

  • Download the code

Clone or download the project from https://github.com/awslabs/ecs-cluster-manager on GitHub:

git clone git@github.com:awslabs/ecs-cluster-manager.git

AMI ID parameter

Create a Systems Manager parameter in which the desired AMI ID is stored.

The first run does not use the latest ECS optimized AMI. Later, you update the ECS cluster to the latest AMI.

Use the AMI released in 2017.09. Run the following commands to create the /ami/ecs/latest parameter in Parameter Store with the corresponding AMI ID as its value.

AMI_ID=$(aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux/amzn-ami-2017.09.l-amazon-ecs-optimized --region us-east-1 --query "Parameters[].Value" --output text | jq -r .image_id)
aws ssm put-parameter \
  --overwrite \
  --name "/ami/ecs/latest" \
  --type "String" \
  --value $AMI_ID \
  --region us-east-1 \
  --profile devAdmin

Substitute us-east-1 with your desired Region.

In the AWS Management Console, choose AWS Systems Manager, Parameter Store.

You should see the /ami/ecs/latest parameter that you just created.

Select the /ami/ecs/latest parameter and make sure that the AMI ID is present in the parameter value. If you are using the us-east-1 Region, you should see the following value:

ami-aff65ad2

Upload the Lambda function code to Amazon S3

The Lambda functions are too large to embed in the CloudFormation template. Therefore, they must be uploaded to an S3 bucket before the CloudFormation stack is created.

Assuming you’re using an S3 bucket called ecs-deployment, copy each Lambda function zip file as follows:

cd ./ecs-cluster-manager
aws s3 cp lambda/ecs-lifecycle-hook-launch.zip s3://ecs-deployment
aws s3 cp lambda/ecs-lifecycle-hook-terminate.zip s3://ecs-deployment

Refer to these when running your CloudFormation template later so that CloudFormation knows where to find the Lambda files.

Lambda function role

The Lambda functions require read permissions to EC2, write permissions to ECS, and permission to submit a result or heartbeat to Auto Scaling.

Create a new LambdaECSScaling IAM policy in your AWS account. Use the following JSON as the policy body:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:CompleteLifecycleAction",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:RecordLifecycleActionHeartbeat",
                "ecs:UpdateContainerInstancesState",
                "ecs:Describe*",
                "ecs:List*"
            ],
            "Resource": "*"
        }
    ]
}

 

Now, create a new LambdaECSScalingRole IAM role. For Trusted Entity, choose AWS Service, Lambda. Attach the following permissions policies:

  • LambdaECSScaling (created in the previous step)
  • ReadOnlyAccess (AWS managed policy)
  • AWSLambdaBasicExecutionRole (AWS managed policy)

ECS cluster instance profile

The ECS cluster nodes must have an instance profile attached that allows them to speak to the ECS service. This profile can also contain any other permissions that they require (for example, Systems Manager for management and command execution).

The required policies are all AWS managed, so you only need to create the role and attach them.

Create a new IAM role called EcsInstanceRole, select AWS Service → EC2 as Trusted Entity. Attach the following AWS managed permissions policies:

  • AmazonEC2RoleforSSM
  • AmazonEC2ContainerServiceforEC2Role
  • AWSLambdaBasicExecutionRole

The AWSLambdaBasicExecutionRole policy may look out of place, but this allows the instance to create new CloudWatch Logs groups. These permissions facilitate using CloudWatch Logs as the primary logging mechanism with ECS. This managed policy grants the required permissions without you needing to manage a custom role.

CloudFormation parameter file

We recommend using a parameter file for the CloudFormation template. This documents the desired parameters for the template, and is usually less error-prone than entering parameters in the console.

There is a file called blank_parameter_file.json in the source code project. Copy this file to a new file with a more meaningful name (such as dev-cluster.json), then fill out the parameters.

The file looks like this:

[
  {
    "ParameterKey": "EcsClusterName",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "EcsAmiParameterKey",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "IamRoleInstanceProfile",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "EcsInstanceType",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "EbsVolumeSize",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "ClusterSize",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "ClusterMaxSize",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "KeyName",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "SubnetIds",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "SecurityGroupIds",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "DeploymentS3Bucket",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LifecycleLaunchFunctionZip",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LifecycleTerminateFunctionZip",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LambdaFunctionRole",
    "ParameterValue": ""
  }
]

Here are the details for each parameter:

  • EcsClusterName:  The name of the ECS cluster to create.
  • EcsAmiParameterKey:  The Systems Manager parameter that contains the AMI ID to be used. This defaults to /ami/ecs/latest.
  • IamRoleInstanceProfile:  The name of the EC2 instance profile used by the ECS cluster members. Discussed in the prerequisite section.
  • EcsInstanceType:  The instance type to use for the cluster. Use whatever is appropriate for your workloads.
  • EbsVolumeSize:  The size of the Docker storage setup that is created using LVM. ECS typically defaults to 100 GB.
  • ClusterSize:  The desired number of EC2 instances for the cluster.
  • ClusterMaxSize:  This value should always be double the amount contained in ClusterSize. CloudFormation has no ‘math’ operators or we wouldn’t prompt for this. This allows rolling updates to be performed safely by doubling the cluster size, then contracting back.
  • KeyName:  The name of the EC2 key pair to place on the ECS instance to support SSH.
  • SubnetIds: A comma-separated list of subnet IDs that the cluster should be allowed to launch instances into. These should map to at least two zones for a resilient cluster, for example subnet-a70508df,subnet-e009eb89.
  • SecurityGroupIds:  A comma-separated list of security group IDs that are attached to each node, for example sg-bd9d1bd4,sg-ac9127dca (a single value is fine).
  • DeploymentS3Bucket: This is the bucket where the two Lambda functions for scale in/scale out lifecycle hooks can be found.
  • LifecycleLaunchFunctionZip: This is the full path within the DeploymentS3Bucket where the ecs-lifecycle-hook-launch.zip contents can be found.
  • LifecycleTerminateFunctionZip:  The full path within the DeploymentS3Bucket where the ecs-lifecycle-hook-terminate.zip contents can be found.
  • LambdaFunctionRole:  The name of the role that the Lambda functions use. Discussed in the prerequisite section.

A completed parameter file looks like the following:

[
  {
    "ParameterKey": "EcsClusterName",
    "ParameterValue": "DevCluster"
  }, 
  {
    "ParameterKey": "EcsAmiParameterKey",
    "ParameterValue": "/ami/ecs/latest"
  },
  {
    "ParameterKey": "IamRoleInstanceProfile",
    "ParameterValue": "EcsInstanceRole"
  }, 
  {
    "ParameterKey": "EcsInstanceType",
    "ParameterValue": "m4.large"
  },
  {
    "ParameterKey": "EbsVolumeSize",
    "ParameterValue": "100"
  }, 
  {
    "ParameterKey": "ClusterSize",
    "ParameterValue": "2"
  },
  {
    "ParameterKey": "ClusterMaxSize",
    "ParameterValue": "4"
  },
  {
    "ParameterKey": "KeyName",
    "ParameterValue": "dev-cluster"
  },
  {
    "ParameterKey": "SubnetIds",
    "ParameterValue": "subnet-a70508df,subnet-e009eb89"
  },
  {
    "ParameterKey": "SecurityGroupIds",
    "ParameterValue": "sg-bd9d1bd4"
  },
  {
    "ParameterKey": "DeploymentS3Bucket",
    "ParameterValue": "ecs-deployment"
  },
  {
    "ParameterKey": "LifecycleLaunchFunctionZip",
    "ParameterValue": "ecs-lifecycle-hook-launch.zip"
  },
  {
    "ParameterKey": "LifecycleTerminateFunctionZip",
    "ParameterValue": "ecs-lifecycle-hook-terminate.zip"
  },
  {
    "ParameterKey": "LambdaFunctionRole",
    "ParameterValue": "LambdaECSScalingRole"
  }
]

Deployment

Given the CloudFormation template and the parameter file, you can deploy the stack using the AWS CLI or the console.

Here’s an example deploying through the AWS CLI. This example uses a stack named ecs-dev and a parameter file named dev-cluster.json. It also uses the --profile argument to assure that the CLI assumes a role in the right account for deployment. Use the corresponding Region and profile from your local ~/.aws/config file.

This command outputs the stack ID as soon as it is executed, even though the other required resources are still being created.

aws cloudformation create-stack \
  --stack-name ecs-dev \
  --template-body file://./ecs-cluster.yaml \
  --parameters file://./dev-cluster.json \
  --region us-east-1 \
  --profile devAdmin

Use the AWS Management Console to check whether the stack is done creating. Or, run the following command:

aws cloudformation wait stack-create-complete \
  --stack-name ecs-dev \
  --region us-east-1 \
  --profile devAdmin

After the CloudFormation stack has been created, go to the ECS console and open the DevCluster cluster that you just created. There are no tasks running, although you should see two container instances registered with the cluster.

You also see a warning message indicating that the container instances are not running the latest version of the Amazon ECS container agent. The reason is that you did not use the latest available version of the ECS-optimized AMI.

Fix this issue by updating the container instances’ AMI.

Update the cluster instances AMI

Run the following commands to set the /ami/ecs/latest parameter to the latest AMI ID.

AMI_ID=$(aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux/recommended --region us-east-1 --query "Parameters[].Value" --output text | jq -r .image_id)

aws ssm put-parameter \
  --overwrite \
  --name "/ami/ecs/latest" \
  --type "String" \
  --value $AMI_ID \
  --region us-east-1 \
  --profile devAdmin

Make sure that the parameter value has been updated in the console.

To update your ECS cluster, run the update-stack command without changing any parameters. CloudFormation evaluates the value stored by /ami/ecs/latest. If it has changed, CloudFormation makes updates as appropriate.

aws cloudformation update-stack \
  --stack-name ecs-dev \
  --template-body file://./ecs-cluster.yaml \
  --parameters file://./dev-cluster.json \
  --region us-east-1 \
  --profile devAdmin

Supervising updates

We recommend supervising your updates to the ECS cluster while they are being deployed. This assures that the cluster remains stable. For the majority of situations, there is no manual intervention required.

  • Keep an eye on Auto Scaling activities. In the Auto Scaling groups section of the EC2 console, select the Auto Scaling group for a cluster and choose Activity History. You can also pull the same information from a script, as shown in the sketch after this list.
  • Keep an eye on the ECS instances to ensure that new instances are joining and draining instances are leaving. In the ECS console, choose Cluster, ECS Instances.
  • Lambda function logs help troubleshoot things that aren’t behaving as expected. In the Lambda console, select the LifeCycleLaunch or LifeCycleTerminate functions, and choose Monitoring, View logs in CloudWatch. Expand the logs for the latest executions to see what’s going on.
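If you prefer to watch the rollout from a script rather than the console, a short boto3 sketch like the following lists recent Auto Scaling activities. The group name shown is a placeholder for the one created by your stack.

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Print the most recent scaling activities for the cluster's Auto Scaling group.
resp = autoscaling.describe_scaling_activities(
    AutoScalingGroupName="ecs-dev-EcsInstanceAsg", MaxRecords=10
)
for activity in resp["Activities"]:
    print(activity["StartTime"], activity["StatusCode"], activity["Description"])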

When you go back to the ECS cluster page, notice that the “Outdated Amazon ECS container agent” warning message has disappeared.

Select one of the cluster’s EC2 instance IDs and observe that the latest ECS optimized AMI is used.

Summary

In this post, you saw how to use CloudFormation, Lambda, CloudWatch Events, and Auto Scaling lifecycle hooks to update your ECS cluster instances with a new AMI.

The sample code is available on GitHub for you to use and extend. Contributions are always welcome!

 

Scheduling GPUs for deep learning tasks on Amazon ECS

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/scheduling-gpus-for-deep-learning-tasks-on-amazon-ecs/

This post is contributed by Brent Langston – Sr. Developer Advocate, Amazon Container Services

Last week, AWS announced enhanced Amazon Elastic Container Service (Amazon ECS) support for GPU-enabled EC2 instances. This means that now GPUs are first class resources that can be requested in your task definition, and scheduled on your cluster by ECS.

Previously, to schedule a GPU workload, you had to maintain your own custom configured AMI, with a custom configured Docker runtime. You also had to use custom vCPU logic as a stand-in for assigning your GPU workloads to GPU instances. Even when all that was in place, there was still no pinning of a GPU to a task. One task might consume more GPU resources than it should. This could cause other tasks to not have a GPU available.

Now, AWS maintains an ECS-optimized AMI that includes the correct NVIDIA drivers and Docker customizations. You can use this AMI to provision your GPU workloads. With this enhancement, GPUs can also be requested directly in the task definition. Like allocating CPU or RAM to a task, now you can explicitly request a number of GPUs to be allocated to your task. The scheduler looks for matching resources on the cluster to place those tasks. The GPUs are pinned to the task for as long as the task is running, and can’t be allocated to any other tasks.

I thought I’d see how easy it is to deploy GPU workloads to my ECS cluster. I’m working in the US-EAST-2 (Ohio) region, from my AWS Cloud9 IDE, so these commands work for Amazon Linux. Feel free to adapt to your environment as necessary.

If you’d like to run this example yourself, you can find all the code in this GitHub repo. If you run this example in your own account, be aware of the instance pricing, and clean up your resources when your experiment is complete.

Clone the repo using the following command:

git clone https://github.com/brentley/tensorflow-container.git

Setup

You need the latest version of the AWS CLI (for this post, I used 1.16.98):

echo "export PATH=$HOME/.local/bin:$HOME/bin:$PATH" >> ~/.bash_profile
source ~/.bash_profile
pip install --user -U awscli

Provision an ECS cluster, with two C5 instances, and two P3 instances:

aws cloudformation deploy --stack-name tensorflow-test --template-file cluster-cpu-gpu.yml --capabilities CAPABILITY_IAM                            

While AWS CloudFormation is provisioning resources, examine the template used to build your infrastructure. Open `cluster-cpu-gpu.yml`, and you see that you are provisioning a test VPC with two c5.2xlarge instances, and two p3.2xlarge instances. This gives you one NVIDIA Tesla V100 GPU per instance, for a total of two GPUs to run training tasks.

I adapted the TensorFlow benchmark Docker container to create a training workload. I use this container to compare the GPU scheduling and runtime.

When the CloudFormation stack is deployed, register a task definition with the ECS service:

aws ecs register-task-definition --cli-input-json file://gpu-1-taskdef.json

To request GPU resources in the task definition, the only change needed is to include a GPU resource requirement in the container definition:

            "resourceRequirements": [
                {
                    "type": "GPU",
                    "value": "1"
                }
            ],

Including this resource requirement ensures that the ECS scheduler allocates the task to an instance with a free GPU resource.
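For reference, here is a hedged boto3 equivalent of registering a task definition with a GPU requirement. The family name, CPU and memory sizes, and image URI are illustrative; the gpu-1-taskdef.json file in the repository remains the authoritative definition.

import boto3

ecs = boto3.client("ecs")

# Register a single-container task definition that requests one GPU.
ecs.register_task_definition(
    family="tensorflow-1-gpu",
    requiresCompatibilities=["EC2"],
    cpu="4096",
    memory="8192",
    containerDefinitions=[
        {
            "name": "tensorflow-benchmark",
            "image": "111122223333.dkr.ecr.us-east-2.amazonaws.com/tensorflow-benchmark:latest",
            "essential": True,
            "resourceRequirements": [{"type": "GPU", "value": "1"}],
        }
    ],
)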

Launch a single-GPU training workload

Now you’re ready to launch the first GPU workload.

export cluster=$(aws cloudformation describe-stacks --stack-name tensorflow-test \
  --query 'Stacks[0].Outputs[?OutputKey==`ClusterName`].OutputValue' --output text)
echo $cluster
aws ecs run-task --cluster $cluster --task-definition tensorflow-1-gpu

When you launch the task, the output shows the `gpuIds` values that are assigned to the task. This GPU is pinned to this task, and can’t be shared with any other tasks. If all GPUs are allocated, you can’t schedule additional GPU tasks until a running task with a GPU completes. That frees the GPU to be scheduled again.
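If you want to confirm that pinning programmatically, a small boto3 sketch can list the cluster’s tasks and print the GPU IDs attached to each container. The cluster name below is a placeholder for the ClusterName stack output.

import boto3

ecs = boto3.client("ecs")
cluster = "tensorflow-test-cluster"  # substitute the ClusterName stack output

# Show which GPU IDs each running container currently holds.
task_arns = ecs.list_tasks(cluster=cluster)["taskArns"]
if task_arns:
    for task in ecs.describe_tasks(cluster=cluster, tasks=task_arns)["tasks"]:
        for container in task["containers"]:
            print(task["taskDefinitionArn"], container.get("gpuIds", []))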

When you look at the log output in Amazon CloudWatch Logs, you see that the container discovered one GPU: `/gpu0` and the training benchmark trained at a rate of 321.16 images/sec.

With your two p3.2xlarge nodes in the cluster, you are limited to two concurrent single-GPU workloads. To scale horizontally, you could add more p3.2xlarge nodes, but each workload would still be limited to a single GPU. To scale vertically, you could bump up the instance type, which would allow you to assign multiple GPUs to a single task. Now, let’s see how fast your TensorFlow container can train when assigned multiple GPUs.

Launch a multiple-GPU training workload

To begin, replace the p3.2xlarge instances with p3.16xlarge instances. This gives your cluster two instances that each have eight GPUs, for a total of 16 GPUs that can be allocated.

aws cloudformation deploy --stack-name tensorflow-test --template-file cluster-cpu-gpu.yml --parameter-overrides GPUInstanceType=p3.16xlarge --capabilities CAPABILITY_IAM

When the CloudFormation deploy is complete, register two more task definitions to launch your benchmark container requesting more GPUs:

aws ecs register-task-definition --cli-input-json file://gpu-4-taskdef.json  
aws ecs register-task-definition --cli-input-json file://gpu-8-taskdef.json 

Next, launch two TensorFlow benchmark containers, one requesting four GPUs, and one requesting eight GPUs:

aws ecs run-task --cluster $cluster --task-definition tensorflow-4-gpu
aws ecs run-task --cluster $cluster --task-definition tensorflow-8-gpu

With each task request, GPUs are allocated: four in the first request, and eight in the second request. Again, these GPUs are pinned to the task, and not usable by any other task until these tasks are complete.

Check the log output in CloudWatch Logs:

On the “devices” lines, you can see that the container discovered and used four (or eight) GPUs. Also, the total images/sec improved to 1297.41 with four GPUs, and 1707.23 with eight GPUs.

Because you can pin single or multiple GPUs to a task, running advanced GPU based training tasks on Amazon ECS is easier than ever!

Cleanup

To clean up your running resources, delete the CloudFormation stack:

aws cloudformation delete-stack --stack-name tensorflow-test

Conclusion

For more information, see Working with GPUs on Amazon ECS.

If you want to keep up on the latest container info from AWS, please follow me on Twitter and tweet any questions! @brentContained

Setting up AWS PrivateLink for Amazon ECS, and Amazon ECR

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/setting-up-aws-privatelink-for-amazon-ecs-and-amazon-ecr/

Amazon ECS and Amazon ECR now have support for AWS PrivateLink. AWS PrivateLink is a networking technology designed to enable access to AWS services in a highly available and scalable manner. It keeps all the network traffic within the AWS network. When you create AWS PrivateLink endpoints for ECR and ECS, these service endpoints appear as elastic network interfaces with a private IP address in your VPC.

Before AWS PrivateLink, your Amazon EC2 instances had to use an internet gateway to download Docker images stored in ECR or communicate to the ECS control plane. Instances in a public subnet with a public IP address used the internet gateway directly. Instances in a private subnet used a network address translation (NAT) gateway hosted in a public subnet. The NAT gateway would then use the internet gateway to talk to ECR and ECS.

Now that AWS PrivateLink support has been added, instances in both public and private subnets can use it to get private connectivity to download images from Amazon ECR. Instances can also communicate with the ECS control plane via AWS PrivateLink endpoints without needing an internet gateway or NAT gateway.

 

This networking architecture is considerably simpler. It enables enhanced security by allowing you to deny your private EC2 instances access to anything other than these AWS services. That’s assuming that you want to block all other outbound internet access for those instances. For this to work, you must create some AWS PrivateLink resources:

  • AWS PrivateLink endpoints for ECR. These allow instances in your VPC to communicate with ECR to download image manifests.
  • Gateway VPC endpoint for Amazon S3. This allows instances to download the image layers from the underlying private Amazon S3 buckets that host them.
  • AWS PrivateLink endpoints for ECS. These endpoints allow instances to communicate with the telemetry and agent services in the ECS control plane.

This post explains how to create these resources.

Create an AWS PrivateLink interface endpoint for ECR

ECR requires two interface endpoints:

  • com.amazonaws.region.ecr.api
  • com.amazonaws.region.ecr.dkr

In the VPC console, create the interface VPC endpoints for ECR using the endpoint creation wizard. Choose AWS services and select an endpoint. Substitute your AWS Region of choice.

Next, specify the VPC and subnets to which the AWS PrivateLink interface should be added. Make sure that you select the same VPC in which your ECS cluster is running. To be on the safe side, select every Availability Zone and subnet from the list. Each zone has a list of the subnets available. You can select all the subnets in each Availability Zone.

However, depending on your networking needs, you might also choose to only enable the AWS PrivateLink endpoint in your private subnets from each Availability Zone. Let instances running in a public subnet continue to communicate with ECR via the public subnet’s internet gateway.

Next, enable Private DNS Name, which is required for the com.amazonaws.region.ecr.dkr endpoint.

A private hosted zone enables you to access the resources in your VPC using the Amazon ECR default DNS domain names. You don’t need to use the private IPv4 address or the private DNS hostnames provided by Amazon VPC endpoints. The Amazon ECR DNS hostname that the AWS CLI and Amazon ECR SDKs use by default (https://api.ecr.region.amazonaws.com) resolves to your VPC endpoint.

If you enabled a private hosted zone for com.amazonaws.region.ecr.api and you are using an SDK released before January 24, 2019, you must specify the following endpoint when using an SDK or the AWS CLI. Use the following command:

aws --endpoint-url https://api.ecr.region.amazonaws.com ecr describe-repositories

If you don’t enable a private hosted zone, use the following command:

aws --endpoint-url https://VPC_Endpoint_ID.api.ecr.region.vpce.amazonaws.com ecr describe-repositories

If you enabled a private hosted zone and you are using the SDK released on January 24, 2019 or later, use the following command:

aws ecr describe-repositories

Lastly, specify a security group for the interface itself. This is going to control whether each host is able to talk to the interface. The security group should allow inbound connections on port 80 from the instances in your cluster.

You may have a security group that is applied to all the EC2 instances in the cluster, perhaps using an Auto Scaling group. You can create a rule that allows the VPC endpoint to be accessed by any instance in that security group.

Finally, choose Create endpoint. The new endpoint appears in the list.
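If you prefer to script the endpoint creation instead of using the console wizard, the following boto3 sketch creates both ECR interface endpoints. The VPC, subnet, and security group IDs are placeholders for your own values.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the two ECR interface endpoints with private DNS enabled.
for service in ("com.amazonaws.us-east-1.ecr.api", "com.amazonaws.us-east-1.ecr.dkr"):
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId="vpc-0123456789abcdef0",
        ServiceName=service,
        SubnetIds=["subnet-0a1b2c3d", "subnet-4e5f6a7b"],
        SecurityGroupIds=["sg-0123456789abcdef0"],
        PrivateDnsEnabled=True,
    )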

Add a gateway VPC endpoint for S3

The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.

S3 uses a slightly different endpoint type called a gateway. Be careful about adding an S3 gateway to your VPC if your application is actively using S3. With gateway endpoints, your application’s existing connections to S3 may be briefly interrupted while the gateway is being added. You may have a busy cluster with many active ECS deployments, causing image layer downloads from S3. Or, your application itself may make heavy usage of S3. In that case, it’s best to create a fresh new VPC with an S3 gateway, then migrate your ECS cluster and its containers into that VPC.

To add the S3 gateway endpoint, select com.amazonaws.region.s3 on the list of AWS services and select the VPC hosting your ECS cluster. Gateway endpoints are added to the VPC route table for the subnets. Select each route table associated with the subnet in which the S3 gateway should be.

Instead of using a security group, the gateway endpoint uses an IAM policy document to limit access to the service. This policy is similar to an IAM policy but does not replace the default level of access that your applications have through their IAM role. It just further limits what portions of the service are available via the gateway.

It’s okay to just use the default Full Access policy. Any restrictions you have put on your task IAM roles or other IAM user policies still apply on top of this policy. For information about a minimal access policy, see the Minimum Amazon S3 Bucket Permissions for Amazon ECR.

Choose Create to add this gateway endpoint to your VPC. When you view the route tables in your VPC subnets, you see an S3 gateway that is used whenever ECR Docker image layers are being downloaded from S3.
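The gateway endpoint can be scripted the same way. Note that it attaches to route tables rather than to subnets and security groups; the IDs below are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the S3 gateway endpoint and associate it with the subnet route tables.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)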

Create an AWS PrivateLink interface endpoint for ECS

In addition to downloading Docker images from ECR, your EC2 instances must also communicate with the ECS control plane to receive orchestration instructions.

ECS requires three endpoints:

  • com.amazonaws.region.ecs-agent
  • com.amazonaws.region.ecs-telemetry
  • com.amazonaws.region.ecs

Create these three interface endpoints in the same way that you created the endpoint for ECR, by adding each endpoint and setting the subnets and security group for the endpoint.

After the endpoints are created and added to your VPC, there is one additional step. Make sure that your ECS agent is upgraded to version 1.25.1 or higher. For more information, see the instructions for upgrading the ECS agent.

If you are already running the right version of the ECS agent, restart any ECS agents that are currently running in the VPC. The ECS agent uses a persistent web socket connection to the ECS backend and VPC endpoints do not interrupt existing connections. The agent continues to use its existing connection instead of establishing a new connection through the new endpoint, unless you restart it.

To restart the agent with no disruption to your application containers, you can connect using SSH to each EC2 instance in the cluster and issue the following command:

sudo docker restart ecs-agent

This restarts the ECS agent without stopping any of the other application containers on the host. Your application may be stateless and safe to stop at any time, or you may not have or want SSH access to the underlying hosts. In that case, you can choose to reboot each EC2 instance in the cluster, one at a time. This restarts the agent on that host while also relaunching any service-launched tasks from that host onto a different host.

Conclusion

In this post, I showed you how to add AWS PrivateLink endpoints to your VPC for ECS and ECR, including an S3 gateway for ECR layer downloads.

The instances in your ECS cluster can communicate directly with the ECS control plane. They should be able to download Docker images directly without needing to make any connections outside of your VPC using an internet gateway or NAT gateway. All container orchestration traffic stays inside the VPC.

If you have questions or suggestions, please comment below.

Migrate Wildfly Cluster to Amazon ECS using Service Discovery

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/migrate-wildfly-cluster-to-ecs-using-service-discovery/

This post is courtesy of Vidya Narasimhan, AWS Solutions Architect

1. Overview

Java Enterprise Edition (JEE) has been an important server-side platform for over a decade for developing mission-critical & large-scale enterprise applications. High availability & fault tolerance for such applications are typically achieved through built-in JEE clustering provided by the platform.

JEE clustering represents a group of machines working together to transparently provide enterprise services such as JNDI, EJB, JMS, HTTPSession, etc. that enable distribution, discovery, messaging, transactions, caching, replication & component failover. The implementation of clustering technology varies across JEE platforms provided by different vendors. Many of the clustering implementations involve proprietary communication protocols that use multicast for intra-cluster communication, which is not supported in the public cloud.

This article is relevant for JEE platforms & other products that use JGroups-based clustering, such as Wildfly. The solution described allows easy migration of applications developed on these platforms using native clustering to Amazon Elastic Container Service (Amazon ECS), a highly scalable, fast container management service that makes it easy to orchestrate, run & scale Docker containers on a cluster. This solution is useful when the business objective is to migrate to the cloud fast with minimum changes to the application. The approach recommends lift & shift to AWS, wherein the initial focus is to migrate as-is, with optimizations coming in later incrementally.

Whether the JEE application to be migrated is designed as a monolith or as microservices, and whether it is a legacy or green-field deployment, there are multiple reasons why organizations should opt for containerization of their application. This link explains the benefits of containerization well (see the section Why Use Containers): https://aws.amazon.com/getting-started/projects/break-monolith-app-microservices-ecs-docker-ec2/module-one/

2. Wildfly Clustering on ECS

Here onwards, this article highlights how to migrate a standard clustered JEE app deployed on the Wildfly Application Server to Amazon ECS. Wildfly supports clustering out of the box, in two modes: standalone & domain mode. This article explores how to set up a Wildfly cluster in ECS, with multiple Wildfly standalone nodes enabled for HA to form a cluster. The clustering is demonstrated through a web application that replicates session information across the cluster nodes and can withstand a failover without session data loss.

The important components of clustering that require a mention right away are ECS Service Discovery, JGroups & Infinispan.

  • JGroups – Wildfly clustering is enabled by the popular open-source JGroups toolkit. The JGroups subsystem provides group communication support for HA services, using multicast transmission by default. It deals with all aspects of node discovery and provides reliable messaging between the nodes as follows:
    • Node-to-node messaging – By default based on UDP/multicast, which can be replaced with TCP/unicast.
    • Node discovery – By default uses multicast ping (MPING). Alternatives include TCPPING, S3_PING, JDBC_PING, DNS_PING and others.

This article focuses on DNS_PING for node discovery over the TCP protocol.

  • ECS Service Discovery – An Amazon ECS service can optionally be configured to use Amazon ECS Service Discovery. Service discovery uses Amazon Route 53 auto naming API actions to manage DNS entries (A or SRV records) for service tasks, making them discoverable within your VPC. You can specify health check conditions in a service task definition, and Amazon ECS ensures that only healthy service endpoints are returned by a service lookup.

As your services scale up or down in response to load or container health, the Route 53 hosted zone is kept up to date.

Wildfly uses JGroups to discover the cluster nodes via DNS_PING discovery protocol that sends a DNS service endpoint query to the ECS service registry maintained in Route53.

  • Infinispan – Wildfly uses the Infinispan subsystem to provide high-performance, clustered, transactional caching. In a clustered web application, Infinispan handles the replication of application data across the cluster by means of a replicated/distributed cache. Under the hood, it uses a JGroups channel for data transmission within the cluster.
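To get a feel for what DNS_PING observes, you can resolve the service discovery name from a host inside the VPC. This small Python sketch assumes the example endpoint myapp.sampleaws.com and port 8080 used later in this article; each healthy task registered by ECS service discovery shows up as a separate address.

import socket

# Resolve the private service discovery name; one A record per registered task.
records = socket.getaddrinfo("myapp.sampleaws.com", 8080, proto=socket.IPPROTO_TCP)
for family, socktype, proto, canonname, sockaddr in records:
    print(sockaddr[0])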

3. Implementation Instructions

Configure Wildfly

  • Modify the Wildfly standalone configuration file, standalone-ha.xml. The HA suffix implies a high availability configuration.
  1.  Modify the JGroups subsystem – Add a TCP stack with DNS_PING as the discovery protocol & configure the DNS query endpoint. It is important to note that the DNS query must match the ECS service endpoint configured later when creating the ECS service.
  2. Change the JGroups default stack to point to the TCP stack.
  3. Configure a custom Infinispan replicated cache to be used by the web app or use the default cache.      

Build the docker image & store it in Elastic Container Registry (ECR)

  1. Package the JBoss/Wildfly image with JEE application & Wildfly platform on Docker. Create a Dockerfile & include the following:
    1. Install the WildFly distribution & set permissions – This approach requires the latest Wildfly distribution 15.0.0.Final released recently.          
    2. Copy the modified Wildfly standalone-ha.xml to the container.
    3. Deploy the JEE web application. This simple web app is configured as distributable and uses Infinispan to replicate session information across cluster nodes. It displays a page containing the container IP/hostname, Session ID & session data & helps demonstrate session replication.                   
    4. Define a custom entrypoint, entrypoint.sh, to boot Wildfly with the specified bind IP addresses to its interfaces. The script gets the container metadata, extracts the container IP to bind it to Wildfly interfaces. This interface binding is an important step as it enables the application related network communication (web, messaging) between the containers.    
    5. Add the entrypoint.sh script to the image in the Dockerfile.
    6. Build the container image & push it to an ECR repository. Amazon Elastic Container Registry (ECR) is a fully managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images.

The Wildfly configuration files, the Dockerfile & the web app WAR file can be found at the GitHub link https://github.com/vidyann/Wildfly_ECS

Create ECS Service with service discovery

  • Create a Service using service discovery.
    • This link describes the steps to set up an ECS task & service with service discovery: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service-discovery.html#create-service-discovery-taskdef. Though the example in the link creates a Fargate cluster, you can create an EC2-based cluster as well for this example.
    • While configuring the task, choose awsvpc as the network mode. The task networking features provided by the awsvpc network mode give Amazon ECS tasks the same networking properties as Amazon EC2 instances. The benefits of task networking can be found here – https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html
    • Tasks can be flagged as compatible with EC2, Fargate, or both. Here is what the cluster & service looks like:                             
    • When setting up the container details in task, use 8080 as the port, which is the default Wildfly port. This can be changed through WIldfly configuration. Enable the cloudwatch logs which captures Wildfly logging.
    • While configuring the ECS service, ensure that the service name & namespace combine to form a service endpoint that exactly matches the DNS query endpoint configured in the Wildfly configuration file. The container security group should allow inbound traffic to port 8080. Here is what the service endpoint looks like:
    • The Route 53 registry created by ECS is shown below. We see two DNS entries corresponding to the DNS endpoint myapp.sampleaws.com.
    • Finally, view the Wildfly logs in the console by clicking a task instance. You can check whether clustering is enabled by looking for a log entry like the one below:

Here we see that a Wildfly cluster was formed with two nodes (matching the two DNS entries shown in Route 53).

Run the Web App in a browser

  • Spin up a Windows instance in the VPC & open the web app in a browser. Below is a screenshot of the webapp:
  • Open the app in different browsers & tabs & verify the container IP & session ID. Now force a node shutdown by resizing the ECS service task instances to one. Note that though the container IP in the webapp changes, the session ID does not change, the webapp remains available, and the HTTP session is alive, thus demonstrating session replication & failover amongst the cluster nodes.

4. Summary

Our goal here is to migrate enterprise JEE apps to Amazon ECS by tweaking a few configurations, while immediately gaining the benefits of containerization & orchestration managed by ECS. By delegating the undifferentiated heavy lifting of container management, orchestration & scaling to ECS, you can focus on improving or re-architecting your application toward a microservices-oriented architecture. Please note that all the deployment procedures in this article can be fully automated via the AWS CI/CD services.

AWS Fargate Price Reduction – Up to 50%

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/aws-fargate-price-reduction-up-to-50/

AWS Fargate is a compute engine that uses containers as its fundamental compute primitive. AWS Fargate runs your application containers for you on demand. You no longer need to provision a pool of instances or manage a Docker daemon or orchestration agent. Because the infrastructure that runs your containers is invisible, you don’t have to worry about whether you have provisioned enough instances to run your containerized workload. You also don’t have to worry about whether you’re using those instances efficiently to avoid paying for resources that you don’t use. You no longer need to do undifferentiated heavy lifting to maintain the infrastructure that runs your containers. AWS Fargate automatically updates and patches underlying resources to keep you safe from vulnerabilities in the underlying operating system and software. AWS Fargate uses an on-demand pricing model that charges per vCPU and per GB of memory reserved per second, with a 1-minute minimum.

At re:Invent 2018 we announced Firecracker, an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant containers and functions-based services. Firecracker enables you to deploy workloads in lightweight virtual machines called microVMs. These microVMs can initiate code faster, with less overhead. Innovations such as these allow us to improve the efficiency of Fargate and help us pass on cost savings to customers.

Effective January 7th, 2019 Fargate pricing per vCPU per second is being reduced by 20%, and pricing per GB of memory per second is being reduced by 65%. Depending on the ratio of CPU to memory that you’re allocating for your containers, you could see an overall price reduction of anywhere from 35% to 50%.

The following table shows the price reduction for each built-in launch configuration.

vCPU GB Memory Effective Price Cut
0.25 0.5 -35.00%
0.25 1 -42.50%
0.25 2 -50.00%
0.5 1 -35.00%
0.5 2 -42.50%
0.5 3 -47.00%
0.5 4 -50.00%
1 2 -35.00%
1 3 -39.30%
1 4 -42.50%
1 5 -45.00%
1 6 -47.00%
1 7 -48.60%
1 8 -50.00%
2 4 -35.00%
2 5 -37.30%
2 6 -39.30%
2 7 -41.00%
2 8 -42.50%
2 9 -43.80%
2 10 -45.00%
2 11 -46.10%
2 12 -47.00%
2 13 -47.90%
2 14 -48.60%
2 15 -49.30%
2 16 -50.00%
4 8 -35.00%
4 9 -36.20%
4 10 -37.30%
4 11 -38.30%
4 12 -39.30%
4 13 -40.20%
4 14 -41.00%
4 15 -41.80%
4 16 -42.50%
4 17 -43.20%
4 18 -43.80%
4 19 -44.40%
4 20 -45.00%
4 21 -45.50%
4 22 -46.10%
4 23 -46.50%
4 24 -47.00%
4 25 -47.40%
4 26 -47.90%
4 27 -48.30%
4 28 -48.60%
4 29 -49.00%
4 30 -49.30%

Many engineering organizations such as Turner Broadcasting System, Veritone, and Catalytic have already been using AWS Fargate to achieve significant infrastructure cost savings for batch jobs, cron jobs, and other on-and-off workloads. Running a cluster of instances at all times to run your containers constantly incurs cost, but AWS Fargate stops charging when your containers stop.

With these new price reductions, AWS Fargate also enables significant savings for containerized web servers, API services, and background queue consumers run by organizations like KPMG, CBS, and Product Hunt. If your application is currently running on large EC2 instances that peak at 10-20% CPU utilization, consider migrating to containers in AWS Fargate. Containers give you more granularity to provision the exact amount of CPU and memory that your application needs. You no longer pay for instance resources that your application doesn’t use. If a sudden spike of traffic causes your application to require more resources you still have the ability to rapidly scale your application out by adding more containers, or scale your application up by launching larger containers.

AWS Fargate lets you focus on building your containerized application without worrying about the infrastructure. This encompasses not just the infrastructure capacity provisioning, monitoring, and maintenance but also the infrastructure price. Implementing Firecracker in AWS Fargate is just part of our journey to keep making AWS Fargate faster, more powerful, and more efficient. Running your containers in AWS Fargate allows you to benefit from these improvements without any manual intervention required on your part.

AWS Fargate has achieved SOC, PCI, HIPAA BAA, ISO, MTCS, C5, and ENS High compliance certification, and has a 99.99% SLA. You can get started with AWS Fargate in 13 AWS Regions around the world.

Amazon ECS Task Placement

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/amazon-ecs-task-placement/

Intro

Amazon Elastic Container Service (Amazon ECS) is a highly scalable, high-performance container orchestration service that allows you to easily run and scale containerized applications on AWS. This post covers how Amazon ECS runs containers in a cluster. Topics include why AWS built the task placement engine, the different strategies and constraints available to decide where and how containers are run, and things to consider when picking placement strategies.

If you are not familiar with the relationship between ECS and Amazon EC2 or its components, see the Building Blocks of Amazon ECS post.

Task Placement

When a task is launched in a cluster, a decision has to be made to choose which container instance should run that task. Conversely, when scaling down a service, a decision has to be made to choose the specific task to be terminated.


By default, ECS uses the following placement strategies:

  • When you run tasks with the RunTask API action, tasks are placed randomly in a cluster.
  • When you launch and terminate tasks with the CreateService API action, the service scheduler spreads the tasks across the Availability Zones (and the instances within the zones) in a cluster.

Before December 2016, tasks could only be placed using these default placement strategies. Custom task placement meant making the decision yourself, such as writing your own scheduler and calling the StartTask API action. When you manually constrained the placement of your grouping of containers, you could only place based on CPU, memory, and ports. Additionally, while creating your own scheduler can be powerful, there’s a tradeoff with complexity.

AWS built the task placement engine, which removes the need for you to build, run, and manage your own scheduling and placement services. There are several new features that provide you with more control over how applications run across clusters through custom attributes.

You can think of this flow as a funnel with filters for your instances. Constraints must be obeyed. If an instance doesn’t fit, it isn’t used. Strategies are then used to sort the rest of the instances by preference to determine which are the “best.”

For every instantiation of your task, it runs through every step. Calling run-task with a count of n is effectively calling run-task n times (create-service also works the same way).

Cluster Constraints, Placement Constraints, Placement Strategies

Example

Here’s how to use these placement features. In this example, you use the AWS CLI run-task command. For the last couple of filters, I show how to use them with placement flags, but you can just as easily include them in your task definition file instead. This can all be done in the console as well. Start with the cluster shown earlier:


aws ecs run-task --task-definition nouvelleApp \
--placement-constraints type="memberOf",expression="attribute:ecs.instance-type == t2.small" \
--placement-strategy type="binpack",field="memory" \
--count 8

Cluster constraints

In the first step, eliminate all the instances that don’t have the required resources, based on what you defined either in the JSON task definition or in the overrides provided to RunTask.

Not enough CPU? Not enough memory? A port is needed, but it is already in use on that instance? Then the instance is eliminated from the set of valid candidates.


aws ecs run-task --task-definition nouvelleApp

Placement constraints

In the second step, keep only the instances that satisfy the attribute or task group constraints. Yes, this means that you can indicate what instance to use for a task (for example, to make sure that CPU-intensive jobs are scheduled on the right type of instance, or in which Availability Zone).

You can also create any custom tags of your choosing. The green tasks on the green instances, the blue tasks on the blue instances! You can also use the Cluster Query Language to write expressions to check for multiple attributes. In the next section, I cover how to write and use the attributes and expressions.


--placement-constraints type="memberOf",expression="attribute:ecs.instance-type == t2.micro"

Placement strategies

In the third step, filter on the following supported task placement strategies:

  • random
  • binpack
  • spread

By default, tasks are randomly placed with RunTask or spread across Availability Zones with CreateService. Spread is typically used to achieve high availability by making sure that multiple copies of a task are scheduled across multiple instances based on attributes such as Availability Zones.

Conversely, binpack places tasks together to be as cost-efficient as possible. Later in this post, you’ll see how these placement strategies work, as well as how to chain them together and why you may want to do so.


--placement-strategy type="binpack",field="memory"

Task copies

This isn’t part of the filter, but instead, the count flag is used to indicate how many copies (n) of a given task to run. Effectively, it tells ECS to re-run this workflow n times. By default, the count is set to 1, so run-task is executed one time. For services, the desired-count flag is used.

--count 8

Attributes, task groups, and expressions

For task placement, you can use instance fields, such as attributes, as well as task groups. These can be used in expressions for task placement constraints, or instance fields can be used standalone for task placement strategies. Here’s a quick overview of attributes, task groups, and expressions before you go any further.

Instance: Fields

Because these fields are used with respect to instances in task placement, the instance: prefix is optional. A field name or an attribute can be written either of the following ways:

instance:<field>
<field>

Field names

The currently supported field names are as follows:

ec2InstanceId
agentConnected
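
Expressions and filters are covered in more detail below, but as a quick preview, a field name can be used directly in an expression. A minimal sketch, assuming a cluster with registered container instances:

aws ecs list-container-instances \
--filter "agentConnected == true"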

Attributes

There are also instance attributes, which are prefaced with attribute. Again, instance: is optional:

attribute:<attribute-name>

Built-in attributes

The following are some of the provided attributes:

ecs.ami-id
ecs.availability-zone
ecs.instance-type
ecs.os-type
ecs.subnet-id
ecs.vpc-id

Custom attributes

Well, what if you don’t see an attribute that you want? This is where custom attributes come in handy! Want to differentiate between test and prod? What about blue versus green?

aws ecs put-attributes \
--attributes name=color,value=blue,targetId=<your-container-instance-arn>
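
To double-check that the attribute was applied, you can list the attributes on your container instances. A minimal sketch, assuming the default cluster:

aws ecs list-attributes --cluster default \
--target-type container-instance \
--attribute-name color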

Task groups

In addition to placing tasks based on attributes, you can use task groups. Every task is assigned a group ID that you can reference in placement. For both tasks and services, a default ID is given, or you can choose your own. Perhaps you want to run version 2 of a service, but only on instances that are already running version 1.

task:group
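
For example, you can choose your own group name when starting tasks by passing the group flag (the group name below is illustrative):

aws ecs run-task --task-definition nouvelleApp --group version-2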

Expressions

Alright, so you have some attributes and task groups… now what? Well, AWS created the Cluster Query Language to make it easy to create expressions for task placement constraints. These attributes and task groups are used with the available comparison operators, which may look familiar if you’ve used Boolean operators before. Some of these operators can be written in multiple ways, such as “!” or “not”.

For instance, to create an expression using a single attribute to select only t2.micro instances, use the ecs.instance-type attribute and the string equality comparator as follows:

attribute:ecs.instance-type == t2.micro

For t2.micro and t2.nano instances, you have a few options. You could use the same syntax as earlier with the or comparator:

attribute:ecs.instance-type == t2.micro or attribute:ecs.instance-type == t2.nano

Another way is to use the in comparator with an argument list:

attribute:ecs.instance-type in [t2.micro, t2.nano]

To include all t2 instances, use a wildcard and the pattern match operator instead of listing out each one:

attribute:ecs.instance-type =~ t2.*

Task group comparisons work the same way. The following snippet selects any instance upon which the task group “database” is running:

task:group == database

To select only task groups that are not “database,” combine expressions:

not(task:group == database)

You can use these expressions to filter your instances:

aws ecs list-container-instances \
--filter "attribute:ecs.instance-type != t2.micro"
aws ecs list-container-instances \
--filter "attribute:color == blue"
aws ecs list-container-instances \
--filter "task:group == database"

These expressions and attributes, respectively, are also used for task placement constraints and strategies, which I cover in the next few sections.

Constraints

Now look at placement constraints. When determining task placement, there may be certain EC2 instances to include or exclude from running containers. For example, you may want to place tasks only on GPU-enabled instance types.

Task placement constraints let you define where your containers should run across your cluster. ECS currently supports two types of placement constraints: distinctInstance and memberOf. By default, ECS spreads tasks across Availability Zones and instances.

  "placementConstraints": [ 
      { 
         "expression": "string",
         "type": "string"
      }
   ],

Distinct Instance

The distinctInstance constraint makes it possible to ensure that every task is started on a unique instance in your cluster. The distinctInstance constraint never places multiple copies of a task on a single instance, even if you request more running tasks than available instances.

For example, if you decide to place five copies of a task, each time the scheduler filters out the instances that are already running the task.

aws ecs run-task --task-definition nouvelleApp \
--count 5 --placement-constraints type="distinctInstance"

Member of

The memberOf constraint describes a set of instances on which your tasks should run. It applies to anything you can define as an attribute or task group, and it takes an expression written in the Cluster Query Language.

For example, if you have a small application and just want it to run on t2.micro instances:

aws ecs run-task --task-definition nouvelleApp \
--count 5 \
--placement-constraints type="memberOf",expression="attribute:ecs.instance-type == t2.micro"

You can create expressions using the Cluster Query Language to check for multiple attributes. Here’s how you can weed out all instances in the us-west-2c Availability Zone as well as instances that aren’t of type t2.nano or t2.micro:

aws ecs run-task --task-definition nouvelleApp \
--count 5 \
--placement-constraints type="memberOf",expression="attribute:ecs.availability-zone != us-west-2c and (attribute:ecs.instance-type == t2.nano or attribute:ecs.instance-type == t2.micro)"

Member of affinity

You can also use constraints to place all tasks with the same task group on the same instance (affinity):

aws ecs run-task --task-definition nouvelleApp \
--count 5 --group webserver \
--placement-constraints type=memberOf,expression="task:group == webserver"

Or you can ensure that instances never have more than one task in the same group (anti-affinity):

aws ecs run-task --task-definition nouvelleApp \
--count 5 --group webserver \
--placement-constraints type="memberOf",expression="not(task:group == webserver)"

Strategies

Now look at placement strategies. Placement strategies are used to choose among the instances that satisfied your constraints. ECS supports three task placement strategies:

  • random
  • binpack
  • spread

Random is how RunTask places tasks by default and is fairly straightforward (it doesn’t require further parameters). The two other strategies, binpack and spread, take opposite actions. Binpack places tasks on as few instances as possible, helping to optimize resource utilization, while spread places tasks evenly across your cluster to help maximize availability. By default, ECS uses spread with the ecs.availability-zone attribute to place tasks.

   "placementStrategy": [ 
      { 
         "field": "string",
         "type": "string"
      }
   ],

Random

Placement Random

 Random places tasks on instances at random. This still honors the other constraints that you specified, implicitly or explicitly. Specifically, it still makes sure that tasks are scheduled on instances with enough resources to run them.

aws ecs run-task --task-definition nouvelleApp \
--count 5 \
--placement-strategy type="random"

Bin packing

Placement Binpack

The binpack strategy tries to fit your workloads in as few instances as possible. It gets its name from the bin packing problem, where the goal is to fit objects of various sizes in the smallest number of bins. It is well suited to minimizing the number of instances in your cluster, perhaps for cost savings, and lends itself well to automatic scaling for elastic workloads, because instances that are not in use can be shut down.

When you use the binpack strategy, you must also indicate if you are trying to make optimal use of your instances’ CPU or memory. This is done by passing an extra field parameter, which tells the task placement engine which parameter to use to evaluate how “full” your “bins” are. It then chooses the instance with the least available CPU or memory (depending on which you pick). If there are multiple instances with this CPU or memory remaining, it chooses randomly.

aws ecs run-task --task-definition nouvelleApp \
--count 8 --placement-strategy type="binpack",field="cpu"

aws ecs run-task --task-definition nouvelleApp \
--count 8 --placement-strategy type="binpack",field="memory"

Spread

Placement Spread

The spread strategy, contrary to the binpack strategy, tries to put your tasks on as many different instances as possible. It is typically used to achieve high availability and mitigate risks, by making sure that you don’t put all your task-eggs in the same instance-baskets. Spread across Availability Zones, therefore, is the default placement strategy used for services.

When using the spread strategy, you must also indicate a field parameter. It is used to indicate the “bins” that you are considering. The accepted values are instanceId (or its alias host) to balance tasks across all instances, or attribute key-value pairs such as attribute:ecs.availability-zone to balance tasks across zones. There are several AWS-provided attributes that start with the ecs prefix, but you can be creative and create your own attributes.

aws ecs run-task --task-definition nouvelleApp \
--count 8 \
--placement-strategy type="spread",field="attribute:ecs.availability-zone"
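
Custom attributes work here as well. As a sketch, assuming the color attribute registered earlier, you could spread tasks evenly across the blue and green instances:

aws ecs run-task --task-definition nouvelleApp \
--count 8 \
--placement-strategy type="spread",field="attribute:color"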

Chaining placement strategies

Placement binpack spread

Now that you’ve seen how to use task placement strategies, you can also chain multiple task placement strategies with their respective attributes together. You can have up to five strategy rules per service. Perhaps you want to spread tasks across Availability Zones and binpack:

aws ecs run-task --task-definition nouvelleApp \
--count 8 \
--placement-strategy type="spread",field="attribute:ecs.availability-zone" type="binpack",field="memory"
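
The same chained strategies can also be attached to a service, so that they apply to every task the service schedules. A hedged sketch (the service name is illustrative):

aws ecs create-service --service-name nouvelleApp-svc \
--task-definition nouvelleApp \
--desired-count 8 \
--placement-strategy type="spread",field="attribute:ecs.availability-zone" type="binpack",field="memory"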

Use cases

Here are some use cases for task placement so you can see how they can be solved by combining attributes, expressions, constraints, and strategies.

Task creation

Mariya is fairly new to using containers and especially container orchestrators. She wants to try ECS and has a simple application that she first wants to get running on a single node. (Solution: Use the RunTask API.)

aws ecs run-task --task-definition nouvelleApp

Scaling

After trying this, Mariya wants to scale her application to run 10 containers across any available nodes in her cluster. (Solution: This means she needs to run a task using either random or spread placement strategies.)

aws ecs run-task --task-definition nouvelleApp \
--count 10 \
--placement-strategy type="random"

Availability

Mariya then realizes that if she wants her tasks to automatically restart themselves if they fail, or if she wants more than 10 instantiations of her task running, she needs to create a service. (Solution: Create a service.)

aws ecs create-service --service-name nouvelleApp \
--task-definition nouvelleApp \
--desired-count 300 --placement-strategy type="random"

Christopher wants to achieve high availability by distributing his tasks amongst all the instances in his cluster so he minimizes impact if any one host goes down. (Solution: To do this he uses spread placement over host name.)

aws ecs run-task --task-definition nouvelleApp \
--count 9 \
--placement-strategy type="spread",field="host"

Ming-ya wants to run a monitoring container on each instance in her cluster. To help her do this, she creates a service with a high desired count and a distinctInstance placement constraint. The ECS service scheduler ensures that each instance in the cluster runs this task (up to the desired count).

aws ecs create-service --service-name monitoring \
--task-definition monitor \
--desired-count 500 \
--placement-constraints type="distinctInstance"

Availability and Task Groups

Alex wants to run a fleet of webservers. For performance reasons, they want each webserver to have local access to a caching process that was written by another team. They define their webserver as one task and the caching server as a second task. When they launch the webserver task, they use a placement constraint so that the tasks are only placed on instances that are already hosting the cache task. (Solution: Use placement constraints with a task group.)

aws ecs run-task --task-definition cache \
--group caching --count 9 \
--placement-constraints type="distinctInstance"

aws ecs run-task --task-definition webserver \
--count 9 \
--placement-constraints type="distinctInstance" type="memberOf",expression="task:group == caching"

Availability and resource optimization

Jake wants to achieve high availability, but he has a limited budget and needs to optimize all the resources he uses. (Solution: Take a balanced approach of spreading over Availability Zones and binpacking on memory within a zone.)

aws ecs run-task --task-definition nouvelleApp \
--count 9 \
--placement-strategy type="spread",field="attribute:ecs.availability-zone" type="binpack",field="memory"

Instance type selection

Aditya has a GPU workload to run in containers on ECS and needs to ensure that only GPU-enabled instances are used for this workload. (Solution: Create a service with a placement constraint on GPU-enabled instance types, such as g2 or p2, or whatever other GPU-enabled instance types are in the cluster.)

aws ecs create-service --service-name workload \
--task-definition GPU --desired-count 30 \
--placement-constraints type="memberOf",expression="attribute:ecs.instance-type =~ g2* or attribute:ecs.instance-type =~ p2*"

Conclusion

You’ve now looked at task placement at a high level, as well as:

  • Attributes, task groups, and expressions
  • Constraints
  • Strategies
  • Example use cases

To dive deeper into any of these aspects, check out Task Placement. Also, feel free to ask any questions!

@tiffanyfayj

 

Getting started with the AWS Cloud Development Kit for Amazon ECS

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/getting-started-with-the-aws-cloud-development-kit-for-amazon-ecs/

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to define cloud infrastructure in code and provision it through AWS CloudFormation. The AWS CDK integrates fully with AWS services and offers a higher-level object-oriented abstraction to define AWS resources imperatively.

Using the AWS CDK library of infrastructure constructs, you can easily encapsulate AWS best practices in your infrastructure definition and share it without worrying about boilerplate logic. The AWS CDK improves the end-to-end development experience because you get to use the power of modern programming languages to define your AWS infrastructure in a predictable and efficient manner. The AWS CDK is currently available for TypeScript, JavaScript, Java, and .NET.

The AWS CDK now includes constructs for ECS resources, allowing you to deploy a fully functioning containerized application environment on AWS with just a few lines of simple, readable code. Here’s how it works.

Install the AWS CDK

The first step is to install the AWS CDK on your development machine:

mkdir greeter-cdk
cd greeter-cdk
npm init -y
npm install @aws-cdk/cdk
npm install -g aws-cdk

Next, write some JavaScript code that imports the AWS CDK library and uses it to define a skeleton that you can place all your resources in:

index.js

const cdk = require('@aws-cdk/cdk');

class GreetingStack extends cdk.Stack {
  constructor(parent, id, props) {
    super(parent, id, props);
  }
}

class GreetingApp extends cdk.App {
  constructor(argv) {
    super(argv);
    new GreetingStack(this, 'greeting-stack');
  }
}

new GreetingApp().run();

Next, write a small configuration file telling the AWS CDK CLI that index.js is the code that defines your application stack:

cdk.json

{
  "app": "node index.js"
}

Now if you run cdk ls -l, you can see that the AWS CDK has found your stack, and has automatically interpolated some details about it from your development machine’s environment, such as your AWS account ID and default Region:

$ cdk ls -l
- name: greeting-stack
  environment:
    name: 209640446841/us-east-1
    account: '209640446841'
    region: us-east-1

Add ECS constructs

It’s time to add some ECS constructs to your stack. To do this, first install the ECS construct library. Also, install a couple of other constructs to help you set up resources linked to your containers:

npm install @aws-cdk/aws-ecs
npm install @aws-cdk/aws-ec2
npm install @aws-cdk/aws-elasticloadbalancingv2

Now it’s time to use the ECS constructs to set up your application environment:

const cdk = require('@aws-cdk/cdk');
const ecs = require('@aws-cdk/aws-ecs');
const ec2 = require('@aws-cdk/aws-ec2');
// Needed for the load balancers defined later in this stack
const elbv2 = require('@aws-cdk/aws-elasticloadbalancingv2');

class GreetingStack extends cdk.Stack {
  constructor(parent, id, props) {
    super(parent, id, props);

    const vpc = new ec2.VpcNetwork(this, 'GreetingVpc', { maxAZs: 2 });

    // Create an ECS cluster
    const cluster = new ecs.Cluster(this, 'Cluster', { vpc });

    // Add capacity to it
    cluster.addDefaultAutoScalingGroupCapacity({
      instanceType: new ec2.InstanceType('t3.xlarge'),
      instanceCount: 3
    });
  }
}

With just three calls, you can create a VPC to hold all your application resources, and an ECS cluster with three t3.xlarge instances. All it takes is one command to tell the AWS CDK to automatically deploy this stack on your account:

cdk deploy

Behind the scenes, the AWS CDK synthesizes your JavaScript calls into a CloudFormation template. It asks CloudFormation to deploy the resources described in the synthesized template. You can see a live log of each resource that is being created and what the status is. As you can see from the numbers on the left side of the message stream, those three simple commands added to the AWS CDK stack automatically expanded into 32 lower-level, primitive resources to be created on your AWS account.

After the AWS CDK deployment finishes, you have a fresh ECS cluster ready to run your services. Next you will deploy a simple microservices stack onto this cluster. Your application will be a simple greeting server. The frontend greeter service fetches a random greeting and name from two backend services. There are two tiers to this application: the frontend and backend. The network will look like the following diagram:

There are two load balancers, one of them allows anyone on the internet to talk to your greeter service. The other is internal and designed to allow the greeter service to talk to the other greeting and name services.

In total, you need to add five more high-level constructs to your AWS CDK application: two load balancers and three services.

Add a new ECS service

Adding a new ECS service to the application stack is easy. Define a task definition, add a container to it, and tell the AWS CDK to turn the task definition into a service:

// Name service
const nameTaskDefinition = new ecs.Ec2TaskDefinition(this, 'name-task-definition', {});

const nameContainer = nameTaskDefinition.addContainer('name', {
   image: ecs.ContainerImage.fromDockerHub('nathanpeck/name'),
   memoryLimitMiB: 128
});

nameContainer.addPortMappings({
   containerPort: 3000
});

const nameService = new ecs.Ec2Service(this, 'name-service', {
   cluster: cluster,
   desiredCount: 2,
   taskDefinition: nameTaskDefinition
});

// Greeting service
const greetingTaskDefinition = new ecs.Ec2TaskDefinition(this, 'greeting-task-definition', {});

const greetingContainer = greetingTaskDefinition.addContainer('greeting', {
   image: ecs.ContainerImage.fromDockerHub('nathanpeck/greeting'),
   memoryLimitMiB: 128
});

greetingContainer.addPortMappings({
   containerPort: 3000
});

const greetingService = new ecs.Ec2Service(this, 'greeting-service', {
   cluster: cluster,
   desiredCount: 2,
   taskDefinition: greetingTaskDefinition
});

Just like that, you’ve defined two different ECS services that run by loading a public image from Docker Hub. The next step is to create a load balancer and add the services to it:

// Internal load balancer for the backend services
    const internalLB = new elbv2.ApplicationLoadBalancer(this, 'internal', {
      vpc: vpc,
      internetFacing: false
    });

    const internalListener = internalLB.addListener('PublicListener', { port: 80, open: true });

    internalListener.addTargetGroups('default', {
      targetGroups: [new elbv2.ApplicationTargetGroup(this, 'default', {
        vpc: vpc,
        protocol: 'HTTP',
        port: 80
      })]
    });

    internalListener.addTargets('name', {
      port: 80,
      pathPattern: '/name*',
      priority: 1,
      targets: [nameService]
    });

    internalListener.addTargets('greeting', {
      port: 80,
      pathPattern: '/greeting*',
      priority: 2,
      targets: [greetingService]
    });

For this configuration, the code defines a single load balancer with a single listener on port 80, but adds two different services behind it. If the path of the request looks like /name, it sends the request to your name service. If it looks like /greeting, it sends the request to the greeting service.

Finally, add the frontend greeter service, which constructs a random greeting phrase by fetching a random name from the name service and a random greeting from the greeting service. To do this, configure the greeter service to know how to make requests to the other two backend services:

    // Greeter service
    const greeterTaskDefinition = new ecs.Ec2TaskDefinition(this, 'greeter-task-definition', {});

    const greeterContainer = greeterTaskDefinition.addContainer('greeter', {
      image: ecs.ContainerImage.fromDockerHub('nathanpeck/greeter'),
      memoryLimitMiB: 128,
      environment: {
        GREETING_URL: 'http://' + internalLB.dnsName + '/greeting',
        NAME_URL: 'http://' + internalLB.dnsName + '/name'
      }
    });

    greeterContainer.addPortMappings({
      containerPort: 3000
    });

    const greeterService = new ecs.Ec2Service(this, 'greeter-service', {
      cluster: cluster,
      desiredCount: 2,
      taskDefinition: greeterTaskDefinition
    });

The AWS CDK has a powerful capability to resolve expressions that you enter in your JavaScript and turn them into a CloudFormation template that resolves the correct values.

In this example, you create a reference to the DNS name of the load balancer, and indicate that you want to assign the following:

  • Environment variable = 'NAME_URL'
  • Value = 'http://' + internalLB.dnsName + '/name'

If you run cdk synth, you can see that the AWS CDK generates a CloudFormation template that dynamically inserts the proper DNS name of the load balancer at deployment time:

Type: 'AWS::ECS::TaskDefinition'
    Properties:
      ContainerDefinitions:
        - Environment:
            - Name: GREETING_URL
              Value:
                'Fn::Join':
                  - ''
                  - - 'http://'
                    - 'Fn::GetAtt':
                        - internal505AC855
                        - DNSName
                    - /greeting
            - Name: NAME_URL
              Value:
                'Fn::Join':
                  - ''
                  - - 'http://'
                    - 'Fn::GetAtt':
                        - internal505AC855
                        - DNSName
                    - /name

One final thing to add to your AWS CDK stack is an output. This gives you the DNS name of your service so you can send traffic to it. Here, externalLB is the internet-facing Application Load Balancer in front of the greeter service, created the same way as the internal load balancer but with internetFacing set to true:

new cdk.Output(this, 'ExternalDNS', { value: externalLB.dnsName });

Now see what is added when you deploy. Type the following command to see a preview of new or modified resources without actually doing the deployment:

cdk diff

The list of new resources being added looks good, so run cdk deploy again. Again, the AWS CDK synthesizes the CloudFormation template, and initializes its deployment on your AWS account. This time, however, it creates a total of 66 resources, and gives you a URL output where the application is hosted:

To verify that your application is up and accepting traffic at that URL, load the ExternalDNS URL in your browser. The web application talks to the two backend services to get a greeting and a name.
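
You can also check from the command line. A minimal sketch, assuming you substitute the ExternalDNS value from the stack output:

curl http://<your-external-dns-name>/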

Conclusion

If you’d like to try deploying this microservice stack yourself or using it as the basis for building your own AWS CDK stack, you can find the full AWS CDK example code on GitHub. Be sure to check out the AWS CDK documentation and the official AWS CDK construct for Amazon ECS on NPM.

AWS Cloud Map: Easily create and maintain custom maps of your applications

Post Syndicated from Abby Fuller original https://aws.amazon.com/blogs/aws/aws-cloud-map-easily-create-and-maintain-custom-maps-of-your-applications/

Companies are increasingly building their applications as microservices (many separate services that each do a single job). Microservices often allow companies to iterate and deploy more quickly. Many of these microservice-based modern applications are built using various types of cloud resources and deployed on dynamically changing infrastructure. Previously, you had to use configuration files to manage the location of your application resources. However, dependencies in a microservices-based application can quickly become too complex to easily manage through configuration files. Additionally, many applications are built using containers that scale dynamically, reacting to changes in traffic load. That increases your application responsiveness, but poses a new class of problem: now your application components need to discover and connect to upstream services at runtime. This problem of connectivity in dynamically changing infrastructures and microservices is commonly addressed by service discovery.

Introducing AWS Cloud Map

 

AWS Cloud Map keeps track of all your application components, their locations, attributes, and health status. Now your applications can simply query AWS Cloud Map using the AWS SDK, API, or even DNS to discover the locations of their dependencies. That allows your applications to scale dynamically and connect to upstream services directly, increasing the responsiveness of your applications.

When you register your web services and cloud resources in AWS Cloud Map, you can describe them using custom attributes, such as deployment stage and version. Your applications then can make discovery calls specifying the required deployment stage and version. AWS Cloud Map will return the locations of resources that match the supplied parameters. It simplifies your deployments and reduces the operational complexity for your applications.

Integrated health checking for IP-based resources, registered with AWS Cloud Map, automatically stops routing traffic to unhealthy endpoints. Additionally, you have APIs to describe the health status of your services, so that you can learn about potential issues with your infrastructure. That increases the resilience of your applications.
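
For example, a quick sketch of checking instance health through the API (the service ID placeholder is illustrative):

aws servicediscovery get-instances-health-status --service-id %service_id%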

AWS Cloud Map in Action
Getting started with AWS Cloud Map is easy. You can use the AWS console or CLI to create a namespace, such as myapp.com . For this example, I’ll use the CLI. Let’s create a namespace:

aws servicediscovery create-public-dns-namespace --name myapp.com

At this point, you’ll need to decide whether you want your applications to discover resources only via the AWS SDK and API calls, or if you need optional discovery via DNS. When you enable DNS discovery for a namespace, you’ll need to provide IP addresses for all the resources that you register. If you plan to register other cloud resources, such as DynamoDB tables by ARN or the URLs of APIs deployed on Amazon API Gateway, you need to select the API discovery mode.
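
If you only need API discovery, you can create an HTTP namespace instead of a public DNS namespace. A minimal sketch (the namespace name is illustrative):

aws servicediscovery create-http-namespace --name myapp.local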

Once your namespace is created, it’s time to create services. A service represents your application components, such as users, auth, or payment, and can comprise many dynamically changing resources. You can specify a friendly name for your service, then select the DNS discovery and health checking options. You can create a service like this:

aws servicediscovery create-service --name frontend --namespace-id %namespace_id%

After you create a service, you can register service instances with custom attributes:

aws servicediscovery register-instance --service-id %service_id% --instance-id %id% \
--attributes AWS_INSTANCE_IPV4=54.20.10.1,stage=beta,version=1.0,active=yes

aws servicediscovery register-instance --service-id %service_id% --instance-id %id% \
--attributes AWS_INSTANCE_IPV4=54.20.10.2,stage=beta,version=2.0,active=no

Now, your applications can make API calls to discover the service instances, optionally providing query parameters to filter the results:

aws servicediscovery discover-instances --namespace-name myapp.com --service-name frontend --query-parameters version=1.0,active=yes
-->
{
    "Instances": [
        {
            "InstanceId": "1",
            "NamespaceName": "myapp.com",
            "ServiceName": "frontend",
            "HealthStatus": "HEALTHY",
            "Attributes": {
                "version": "1.0",
                "active": "yes",
                "stage": "beta",
                "AWS_INSTANCE_IPV4": "54.20.10.1"
            }
        }
    ]
}

And that’s it! Amazon Elastic Container Service (ECS) and AWS Fargate are tightly integrated with AWS Cloud Map. When you create your service and enable service discovery, all the task instances are automatically registered in AWS Cloud Map on scale up, and deregistered on scale down. ECS also ensures that only healthy task instances are returned on discovery calls by always publishing up-to-date health information to AWS Cloud Map.
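
For example, when creating an ECS service from the CLI, you can point it at a Cloud Map service registry. The sketch below assumes that you have already created the Cloud Map service and know its ARN:

aws ecs create-service --cluster default --service-name frontend \
--task-definition frontend --desired-count 2 \
--service-registries "registryArn=<your-cloud-map-service-arn>"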

For Amazon Elastic Container Service for Kubernetes (EKS), you can automatically publish the external IPs of the services running in EKS in AWS Cloud Map. To do this, we’ve released an update to an open source project, ExternalDNS, to make Kubernetes resources discoverable via AWS Cloud Map. You can find out more details about Kubernetes External DNS here.

 

Now Generally Available
You can start building your applications with AWS Cloud Map and enjoy the integration with Amazon ECS and EKS, rich and secure API query interface, ubiquitous DNS name resolution and integrated health checking support today. Want to try it out? Head to https://console.aws.amazon.com/cloudmap/home.  To test out the integration with ECS, head to https://console.aws.amazon.com/ecs/home and enable Service Discovery to get started.

Use AWS CodeDeploy to Implement Blue/Green Deployments for AWS Fargate and Amazon ECS

Post Syndicated from Curtis Rissi original https://aws.amazon.com/blogs/devops/use-aws-codedeploy-to-implement-blue-green-deployments-for-aws-fargate-and-amazon-ecs/

We are pleased to announce support for blue/green deployments for services hosted using AWS Fargate and Amazon Elastic Container Service (Amazon ECS).

In AWS CodeDeploy, blue/green deployments help you minimize downtime during application updates. They allow you to launch a new version of your application alongside the old version and test the new version before you reroute traffic to it. You can also monitor the deployment process and, if there is an issue, quickly roll back.

With this new capability, you can create a new service in AWS Fargate or Amazon ECS  that uses CodeDeploy to manage the deployments, testing, and traffic cutover for you. When you make updates to your service, CodeDeploy triggers a deployment. This deployment, in coordination with Amazon ECS, deploys the new version of your service to the green target group, updates the listeners on your load balancer to allow you to test this new version, and performs the cutover if the health checks pass.

In this post, I show you how to configure blue/green deployments for AWS Fargate and Amazon ECS using AWS CodeDeploy. For information about how to automate this end-to-end using a continuous delivery pipeline in AWS CodePipeline and Amazon ECR, read Build a Continuous Delivery Pipeline for Your Container Images with Amazon ECR as Source.

Let’s dive in!

Prerequisites

To follow along, you must have these resources in place:

  • A Docker image repository that contains an image you have built from your Dockerfile and application source. This walkthrough uses Amazon ECR. For more information, see Creating a Repository and Pushing an Image in the Amazon Elastic Container Registry User Guide.
  • An Amazon ECS cluster. You can use the default cluster created for you when you first use Amazon ECS or, on the Clusters page of the Amazon ECS console, you can choose a Networking only cluster. For more information, see Creating a Cluster in the Amazon Elastic Container Service User Guide.

Note: The image repository and cluster must be created in the same AWS Region.

Set up IAM service roles

Because you will be using AWS CodeDeploy to handle the deployments of your application to Amazon ECS, AWS CodeDeploy needs permissions to call Amazon ECS APIs, modify your load balancers, invoke Lambda functions, and describe CloudWatch alarms. Before you create an Amazon ECS service that uses the blue/green deployment type, you must create the AWS CodeDeploy IAM role (ecsCodeDeployRole). For instructions, see Amazon ECS CodeDeploy IAM Role in the Amazon ECS Developer Guide.

Create an Application Load Balancer

To allow AWS CodeDeploy and Amazon ECS to control the flow of traffic to multiple versions of your Amazon ECS service, you must create an Application Load Balancer.

Follow the steps in Creating an Application Load Balancer and make the following modifications:

  1. For step 6a in the Define Your Load Balancer section, name your load balancer sample-website-alb.
  2. For step 2 in the Configure Security Groups section:
    1. For Security group name, enter sample-website-sg.
    2. Add an additional rule to allow TCP port 8080 from anywhere (0.0.0.0/0).
  3. In the Configure Routing section:
    1. For Name, enter sample-website-tg-1.
    2. For Target type, choose to register your targets with an IP address.
  4. Skip the steps in the Create a Security Group Rule for Your Container Instances section.

Create an Amazon ECS task definition

Create an Amazon ECS task definition that references the Docker image hosted in your image repository. For the sake of this walkthrough, we use the Fargate launch type and the following task definition.

{
  "executionRoleArn": "arn:aws:iam::account_ID:role/ecsTaskExecutionRole",
  "containerDefinitions": [{
    "name": "sample-website",
    "image": "<YOUR ECR REPOSITORY URI>",
    "essential": true,
    "portMappings": [{
      "hostPort": 80,
      "protocol": "tcp",
      "containerPort": 80
    }]
  }],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "family": "sample-website"
}

Note: Be sure to change the value for “image” to the Amazon ECR repository URI for the image you created and uploaded to Amazon ECR in Prerequisites.
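
If you prefer the CLI for this step, you could save the JSON above to a file and register it with a single command. A sketch, assuming the file is named task-definition.json:

aws ecs register-task-definition --cli-input-json file://task-definition.json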

Creating an Amazon ECS service with blue/green deployments

Now that you have completed the prerequisites and setup steps, you are ready to create an Amazon ECS service with blue/green deployment support from AWS CodeDeploy.

Create an Amazon ECS service

  1. Open the Amazon ECS console at https://console.aws.amazon.com/ecs/.
  2. From the list of clusters, choose the Amazon ECS cluster you created to run your tasks.
  3. On the Services tab, choose Create.

This opens the Configure service wizard. From here you are able to configure everything required to deploy, run, and update your application using AWS Fargate and AWS CodeDeploy.

  1. Under Configure service:
    1. For the Launch type, choose FARGATE.
    2. For Task Definition, choose the sample-website task definition that you created earlier.
    3. Choose the cluster where you want to run your application’s tasks.
    4. For Service Name, enter Sample-Website.
    5. For Number of tasks, specify the number of tasks that you want your service to run.
  2. Under Deployments:
    1. For Deployment type, choose Blue/green deployment (powered by AWS CodeDeploy). This creates a CodeDeploy application and deployment group using the default settings. You can see and edit these settings in the CodeDeploy console later.
    2. For the service role, choose the CodeDeploy service role you created earlier.
  3. Choose Next step.
  4. Under VPC and security groups:
    1. From Subnets, choose the subnets that you want to use for your service.
    2. For Security groups, choose Edit.
      1. For Assigned security groups, choose Select existing security group.
      2. Under Existing security groups, choose the sample-website-sg group that you created earlier.
      3. Choose Save.
  5. Under Load Balancing:
    1. Choose Application Load Balancer.
    2. For Load balancer name, choose sample-website-alb.
  6. Under Container to load balance:
    1. Choose Add to load balancer.
    2. For Production listener port, choose 80:HTTP from the first drop-down list.
    3. For Test listener port, in Enter a listener port, enter 8080.
  7. Under Additional configuration:
    1. For Target group 1 name, choose sample-website-tg-1.
    2. For Target group 2 name, enter sample-website-tg-2.
  8. Under Service discovery (optional), clear Enable service discovery integration, and then choose Next step.
  9. Do not configure Auto Scaling. Choose Next step.
  10. Review your service for accuracy, and then choose Create service.
  11. If everything is created successfully, choose View service.

You should now see your newly created service, with at least one task running.

When you choose the Events tab, you should see that Amazon ECS has deployed the tasks to your sample-website-tg-1 target group. When you refresh, you should see your service reach a steady state.

In the AWS CodeDeploy console, you will see that the Amazon ECS Configure service wizard has created a CodeDeploy application for you. Click into the application to see other details, including the deployment group that was created for you.

If you click the deployment group name, you can view other details about your deployment.  Under Deployment type, you’ll see Blue/green. Under Deployment configuration, you’ll see CodeDeployDefault.ECSAllAtOnce. This indicates that after the health checks are passed, CodeDeploy updates the listeners on the Application Load Balancer to send 100% of the traffic over to the green environment.

Under Load Balancing, you can see details about your target groups and your production and test listener ARNs.

Let’s apply an update to your service to see the CodeDeploy deployment in action.

Trigger a CodeDeploy blue/green deployment

Create a revised task definition

To test the deployment, create a revision to your task definition for your application.

  1. Open the Amazon ECS console at https://console.aws.amazon.com/ecs/.
  2. From the navigation pane, choose Task Definitions.
  3. Choose your sample-website task definition, and then choose Create new revision.
  4. Under Tags:
    1. In Add key, enter Name.
    2. In Add value, enter Sample Website.
  5. Choose Create.

Update ECS service

You now need to update your Amazon ECS service to use the latest revision of your task definition.

  1. Open the Amazon ECS console at https://console.aws.amazon.com/ecs/.
  2. Choose the Amazon ECS cluster where you’ve deployed your Amazon ECS service.
  3. Select the check box next to your sample-website service.
  4. Choose Update to open the Update Service wizard.
    1. Under Configure service, for Task Definition, choose 2 (latest) from the Revision drop-down list.
  5. Choose Next step.
  6. Skip Configure deployments. Choose Next step.
  7. Skip Configure network. Choose Next step.
  8. Skip Set Auto Scaling (optional). Choose Next step.
  9. Review the changes, and then choose Update Service.
  10. Choose View Service.

You are now taken to the Deployments tab of your service, where you can see details about your blue/green deployment.

You can click the deployment ID to go to the details view for the CodeDeploy deployment.

From there you can see the deployment’s status:

You can also see the progress of the traffic shifting:

If you notice issues, you can stop and roll back the deployment. This shifts traffic back to the original (blue) task set and stops the deployment.

By default, CodeDeploy waits one hour after a successful deployment before it terminates the original task set. You can use the AWS CodeDeploy console to shorten this interval. After the task set is terminated, CodeDeploy marks the deployment complete.

Conclusion

In this post, I showed you how to create an AWS Fargate-based Amazon ECS service with blue/green deployments powered by AWS CodeDeploy. I showed you how to configure the required and prerequisite components, such as an Application Load Balancer and associated target groups, all from the AWS Management Console. I hope that the information in this post helps you get started implementing this for your own applications!

Build a Continuous Delivery Pipeline for Your Container Images with Amazon ECR as Source

Post Syndicated from Daniele Stroppa original https://aws.amazon.com/blogs/devops/build-a-continuous-delivery-pipeline-for-your-container-images-with-amazon-ecr-as-source/

Today, we are launching support for Amazon Elastic Container Registry (Amazon ECR) as a source provider in AWS CodePipeline. You can now initiate an AWS CodePipeline pipeline update by uploading a new image to Amazon ECR. This makes it easier to set up a continuous delivery pipeline and use the AWS Developer Tools for CI/CD.

You can use Amazon ECR as a source if you’re implementing a blue/green deployment with AWS CodeDeploy from the AWS CodePipeline console. For more information about using the Amazon Elastic Container Service (Amazon ECS) console to implement a blue/green deployment without CodePipeline, see Implement Blue/Green Deployments for AWS Fargate and Amazon ECS Powered by AWS CodeDeploy.

This post shows you how to create a complete, end-to-end continuous deployment (CD) pipeline with Amazon ECR and AWS CodePipeline. It walks you through setting up a pipeline to build your images when the upstream base image is updated.

Prerequisites

To follow along, you must have these resources in place:

  • A source control repository with your base image Dockerfile and a Docker image repository to store your image. In this walkthrough, we use a simple Dockerfile for the base image:
    FROM alpine:3.8

    RUN apk update

    RUN apk add nodejs
  • A source control repository with your application Dockerfile and source code and a Docker image repository to store your image. For the application Dockerfile, we use our base image and then add our application code:
    FROM 012345678910.dkr.ecr.us-east-1.amazonaws.com/base-image

    ENV PORT=80

    EXPOSE $PORT

    COPY app.js /app/

    CMD ["node", "/app/app.js"]

This walkthrough uses AWS CodeCommit for the source control repositories and Amazon ECR  for the Docker image repositories. For more information, see Create an AWS CodeCommit Repository in the AWS CodeCommit User Guide and Creating a Repository in the Amazon Elastic Container Registry User Guide.

Note: The source control repositories and image repositories must be created in the same AWS Region.

Set up IAM service roles

In this walkthrough you use AWS CodeBuild and AWS CodePipeline to build your Docker images and push them to Amazon ECR. Both services use Identity and Access Management (IAM) service roles to make calls to Amazon ECR API operations. The service roles must have a policy that provides permissions to make these Amazon ECR calls. The following procedure helps you attach the required permissions to the CodeBuild service role.

To create the CodeBuild service role

  1. Follow these steps to use the IAM console to create a CodeBuild service role.
  2. On step 10, make sure to also add the AmazonEC2ContainerRegistryPowerUser policy to your role.
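
If you manage the role from the CLI instead, attaching the managed policy might look like the following (the role name is a placeholder for the CodeBuild service role you created):

aws iam attach-role-policy \
--role-name <your-codebuild-service-role> \
--policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser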

CodeBuild service role policies

Create a build specification file for your base image

A build specification file (or build spec) is a collection of build commands and related settings, in YAML format, that AWS CodeBuild uses to run a build. Add a buildspec.yml file to your source code repository to tell CodeBuild how to build your base image. The example build specification used here does the following:

  • Pre-build stage:
    • Sign in to Amazon ECR.
    • Set the repository URI to your ECR image and add an image tag with the first seven characters of the Git commit ID of the source.
  • Build stage:
    • Build the Docker image and tag the image with latest and the Git commit ID.
  • Post-build stage:
    • Push the image with both tags to your Amazon ECR repository.
version: 0.2

phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - aws --version
      - $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - REPOSITORY_URI=012345678910.dkr.ecr.us-east-1.amazonaws.com/base-image
      - COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
      - IMAGE_TAG=${COMMIT_HASH:=latest}
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $REPOSITORY_URI:latest .
      - docker tag $REPOSITORY_URI:latest $REPOSITORY_URI:$IMAGE_TAG
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $REPOSITORY_URI:latest
      - docker push $REPOSITORY_URI:$IMAGE_TAG

To add a buildspec.yml file to your source repository

  1. Open a text editor and then copy and paste the build specification above into a new file.
  2. Replace the REPOSITORY_URI value (012345678910.dkr.ecr.us-east-1.amazonaws.com/base-image) with your Amazon ECR repository URI (without any image tag) for your Docker image. Replace base-image with the name for your base Docker image.
  3. Commit and push your buildspec.yml file to your source repository.
    git add .
    git commit -m "Adding build specification."
    git push

Create a build specification file for your application

Add a buildspec.yml file to your source code repository to tell CodeBuild how to build your source code and your application image. The example build specification used here does the following:

  • Pre-build stage:
    • Sign in to Amazon ECR.
    • Set the repository URI to your ECR image and add an image tag derived from the CodeBuild build ID.
  • Build stage:
    • Build the Docker image and tag the image with latest and the build ID-based tag.
  • Post-build stage:
    • Push the image with both tags to your ECR repository.
version: 0.2

phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - aws --version
      - $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - REPOSITORY_URI=012345678910.dkr.ecr.us-east-1.amazonaws.com/hello-world
      - COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
      - IMAGE_TAG=build-$(echo $CODEBUILD_BUILD_ID | awk -F":" '{print $2}')
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $REPOSITORY_URI:latest .
      - docker tag $REPOSITORY_URI:latest $REPOSITORY_URI:$IMAGE_TAG
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $REPOSITORY_URI:latest
      - docker push $REPOSITORY_URI:$IMAGE_TAG
artifacts:
  files:
    - imageDetail.json

To add a buildspec.yml file to your source repository

  1. Open a text editor and then copy and paste the build specification above into a new file.
  2. Replace the REPOSITORY_URI value (012345678910.dkr.ecr.us-east-1.amazonaws.com/hello-world) with your Amazon ECR repository URI (without any image tag) for your Docker image. Replace hello-world with the container name in your service’s task definition that references your Docker image.
  3. Commit and push your buildspec.yml file to your source repository.
    git add .
    git commit -m "Adding build specification."
    git push

Create a continuous deployment pipeline for your base image

Use the AWS CodePipeline wizard to create your pipeline stages:

  1. Open the AWS CodePipeline console at https://console.aws.amazon.com/codepipeline/.
  2. On the Welcome page, choose Create pipeline.
    If this is your first time using AWS CodePipeline, an introductory page appears instead of Welcome. Choose Get Started Now.
  3. On the Step 1: Name page, for Pipeline name, type the name for your pipeline and choose Next step. For this walkthrough, the pipeline name is base-image.
  4. On the Step 2: Source page, for Source provider, choose AWS CodeCommit.
    1. For Repository name, choose the name of the AWS CodeCommit repository to use as the source location for your pipeline.
    2. For Branch name, choose the branch to use, and then choose Next step.
  5. On the Step 3: Build page, choose AWS CodeBuild, and then choose Create project.
    1. For Project name, choose a unique name for your build project. For this walkthrough, the project name is base-image.
    2. For Operating system, choose Ubuntu.
    3. For Runtime, choose Docker.
    4. For Version, choose aws/codebuild/docker:17.09.0.
    5. For Service role, choose Existing service role, choose the CodeBuild service role you’ve created earlier, and then clear the Allow AWS CodeBuild to modify this service role so it can be used with this build project box.
    6. Choose Continue to CodePipeline.
    7. Choose Next.
  6. On the Step 4: Deploy page, choose Skip and acknowledge the pop-up warning.
  7. On the Step 5: Review page, review your pipeline configuration, and then choose Create pipeline.

Base image pipeline

Create a continuous deployment pipeline for your application image

The execution of the application image pipeline is triggered by changes to the application source code and changes to the upstream base image. You first create a pipeline, and then edit it to add a second source stage.

    1. Open the AWS CodePipeline console at https://console.aws.amazon.com/codepipeline/.
    2. On the Welcome page, choose Create pipeline.
    3. On the Step 1: Name page, for Pipeline name, type the name for your pipeline, and then choose Next step. For this walkthrough, the pipeline name is hello-world.
    4. For Service role, choose Existing service role, and then choose the CodePipeline service role you modified earlier.
    5. On the Step 2: Source page, for Source provider, choose Amazon ECR.
      1. For Repository name, choose the name of the Amazon ECR repository to use as the source location for your pipeline. For this walkthrough, the repository name is base-image.

Amazon ECR source configuration

  1. On the Step 3: Build page, choose AWS CodeBuild, and then choose Create project.
    1. For Project name, choose a unique name for your build project. For this walkthrough, the project name is hello-world.
    2. For Operating system, choose Ubuntu.
    3. For Runtime, choose Docker.
    4. For Version, choose aws/codebuild/docker:17.09.0.
    5. For Service role, choose Existing service role, choose the CodeBuild service role you’ve created earlier, and then clear the Allow AWS CodeBuild to modify this service role so it can be used with this build project box.
    6. Choose Continue to CodePipeline.
    7. Choose Next.
  2. On the Step 4: Deploy page, choose Skip and acknowledge the pop-up warning.
  3. On the Step 5: Review page, review your pipeline configuration, and then choose Create pipeline.

The pipeline will fail, because it is missing the application source code. Next, you edit the pipeline to add an additional action to the source stage.

  1. Open the AWS CodePipeline console at https://console.aws.amazon.com/codepipeline/.
  2. On the Welcome page, choose your pipeline from the list. For this walkthrough, the pipeline name is hello-world.
  3. On the pipeline page, choose Edit.
  4. On the Editing: hello-world page, in Edit: Source, choose Edit stage.
  5. Choose the existing source action, and choose the edit icon.
    1. Change Output artifacts to BaseImage, and then choose Save.
  6. Choose Add action, and then enter a name for the action (for example, Code).
    1. For Action provider, choose AWS CodeCommit.
    2. For Repository name, choose the name of the AWS CodeCommit repository for your application source code.
    3. For Branch name, choose the branch.
    4. For Output artifacts, specify SourceArtifact, and then choose Save.
  7. On the Editing: hello-world page, choose Save and acknowledge the pop-up warning.

Application image pipeline

Test your end-to-end pipeline

Your pipeline should have everything for running an end-to-end native AWS continuous deployment. Now, test its functionality by pushing a code change to your base image repository.

  1. Make a change to your configured source repository, and then commit and push the change.
  2. Open the AWS CodePipeline console at https://console.aws.amazon.com/codepipeline/.
  3. Choose your pipeline from the list.
  4. Watch the pipeline progress through its stages. As the base image is built and pushed to Amazon ECR, see how the second pipeline is triggered, too. When the execution of your pipeline is complete, your application image is pushed to Amazon ECR, and you are now ready to deploy your application. For more information about continuously deploying your application, see Create a Pipeline with an Amazon ECR Source and ECS-to-CodeDeploy Deployment in the AWS CodePipeline User Guide.

Conclusion

In this post, we showed you how to create a complete, end-to-end continuous deployment (CD) pipeline with Amazon ECR and AWS CodePipeline. You saw how to initiate an AWS CodePipeline pipeline update by uploading a new image to Amazon ECR. Support for Amazon ECR in AWS CodePipeline makes it easier to set up a continuous delivery pipeline and use the AWS Developer Tools for CI/CD.

Scanning Docker Images for Vulnerabilities using Clair, Amazon ECS, ECR, and AWS CodePipeline

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/scanning-docker-images-for-vulnerabilities-using-clair-amazon-ecs-ecr-aws-codepipeline/

Post by Vikrama Adethyaa, Solution Architect and Tiffany Jernigan, Developer Advocate

 

Containers are an increasingly important way for you to package and deploy your applications. They are lightweight and provide a consistent, portable software environment for applications to easily run and scale anywhere.

A container is launched from a container image, an executable package that includes everything needed to run an application: the application code, configuration files, runtime (for example, Java, Python, etc.), libraries, and environment variables.

A container image is built up from a series of layers. For a Docker image, each layer in the image represents an instruction in the image’s Dockerfile. A parent image is the image on which your image is built. It refers to the contents of the FROM directive in the Dockerfile. Most Dockerfiles start from a parent image, and often the parent image was downloaded from a public registry.

It is incredibly difficult and time-consuming to manually track all the files, packages, libraries, and so on, included in an image along with the vulnerabilities that they may possess. Having a security breach is one of the costliest things an organization can endure. It takes years to build up a reputation and only seconds to tear it down.

One way to prevent breaches is to regularly scan your images and compare the dependencies to a known list of common vulnerabilities and exposures (CVEs). Public CVE lists contain an identification number, description, and at least one public reference for known cybersecurity vulnerabilities. The automatic detection of vulnerabilities helps increase awareness and best security practices across developer and operations teams. It encourages action to patch and address the vulnerabilities.

This post walks you through the process of setting up an automated vulnerability scanning pipeline. You use AWS CodePipeline to scan your container images for known security vulnerabilities and deploy the container only if the vulnerabilities are within the defined threshold.

This solution uses CoreOS Clair for static analysis of vulnerabilities in container images. Clair is an API-driven analysis engine that inspects containers layer-by-layer for known security flaws. Clair scans each container layer and provides a notification of vulnerabilities that may be a threat, based on the CVE database and similar data feeds from Red Hat, Ubuntu, and Debian.

Deploying Clair

Here’s how to install Clair on AWS. The following diagram shows the high-level architecture of Clair.

Clair uses PostgreSQL, so use Aurora PostgreSQL to host the Clair database. You deploy Clair as an ECS service with the Fargate launch type behind an Application Load Balancer. The Clair container is deployed in a private subnet behind the Application Load Balancer that is hosted in the public subnets. The private subnets must have a route to the internet using the NAT gateway, as Clair fetches the latest vulnerability information from multiple online sources.

Prerequisites

Ensure that the following are installed or configured on your workstation before you deploy Clair:

  • Docker
  • Git
  • AWS CLI
  • AWS CLI configured with your access key ID, secret access key, and us-east-1 as the default Region

Download the AWS CloudFormation template for deploying Clair

To help you quickly deploy Clair on AWS and set up CodePipeline with automatic vulnerability detection, use AWS CloudFormation templates that can be downloaded from the aws-codepipeline-docker-vulnerability-scan GitHub repository. The repository also includes a simple, containerized NGINX website for testing your pipeline.

# Clone the GitHub repository
git clone https://github.com/aws-samples/aws-codepipeline-docker-vulnerability-scan.git

cd aws-codepipeline-docker-vulnerability-scan

VPC requirements

We recommend a VPC with the following specification for deploying CoreOS Clair:

  • Two public subnets
  • Two private subnets
  • NAT gateways to allow internet access for services in private subnets

You can create such a VPC using the AWS CloudFormation template networking-template.yaml that is included in the sample code you cloned from GitHub.

# Create the VPC
aws cloudformation create-stack \
--stack-name coreos-clair-vpc-stack \
--template-body file://networking-template.yaml

# Verify that stack creation is complete
aws cloudformation wait stack-create-complete \
--stack-name coreos-clair-vpc-stack

# Get stack outputs
aws cloudformation describe-stacks \
--stack-name coreos-clair-vpc-stack \
--query 'Stacks[].Outputs[]'

Build the Clair Docker image

First, create an Amazon Elastic Container Registry (Amazon ECR) repository to host your Clair Docker image. Then, build the Clair Docker image on your workstation and push it to the ECR repository that you created.

# Create the ECR repository
# Note the URI and ARN of the ECR Repository
aws ecr create-repository --repository-name coreos-clair

# Build the Docker image
docker build -t <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/coreos-clair:latest ./coreos-clair

# Push the Docker image to ECR
aws ecr get-login --no-include-email | bash
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/coreos-clair:latest

Deploy Clair using AWS CloudFormation

Now that the Clair Docker image has been built and pushed to ECR, deploy Clair as an ECS service with the Fargate launch type. The following AWS CloudFormation stack creates an ECS cluster named clair-demo-cluster and deploys the Clair service.

# Create the AWS CloudFormation stack
# <ECRRepositoryUri> - CoreOS Clair ECR repository URI without an image tag
# Example - <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/coreos-clair

aws cloudformation create-stack \
--stack-name coreos-clair-stack \
--template-body file://coreos-clair/clair-template.yaml \
--capabilities CAPABILITY_IAM \
--parameters \
ParameterKey="VpcId",ParameterValue="<VpcId>" \
ParameterKey="PublicSubnets",ParameterValue=\"<PublicSubnet01-ID>,<PublicSubnet02-ID>\" \
ParameterKey="PrivateSubnets",ParameterValue=\"<PrivateSubnet01-ID>,<PrivateSubnet02-ID>\" \
ParameterKey="ECRRepositoryUri",ParameterValue="<ECRRepositoryUri>"

# Verify that stack creation is complete
aws cloudformation wait stack-create-complete \
--stack-name coreos-clair-stack

# Get stack outputs
# Note the ClairAlbDnsName
aws cloudformation describe-stacks \
--stack-name coreos-clair-stack \
--query 'Stacks[].Outputs[]'

Deploying the sample website

Deploy a simple static website running on NGINX as a container. An AWS CloudFormation template is included in the sample code that you cloned from GitHub.

Create a CodeCommit repository for the NGINX website

You create an AWS CodeCommit repository to host the sample NGINX website code. This repository is the source of the pipeline that you create later. Before you proceed with the following steps, make sure that you have set up SSH authentication to CodeCommit.

# Create the CodeCommit repository
# Note the cloneUrlSsh value
aws codecommit create-repository --repository-name my-nginx-website
 
# Clone the empty CodeCommit repository
cd ../
git clone <cloneUrlSsh>

# Copy the contents of nginx-website to my-nginx-website
cp -R aws-codepipeline-docker-vulnerability-scan/nginx-website/ my-nginx-website/

# Commit the changes
cd my-nginx-website/
git add *
git commit -m "Initial commit"
git push

Build the NGINX Docker image

Create an ECR repository to host your NGINX website Docker image. Build the image on your workstation using the file Dockerfile-amznlinux, where Amazon Linux is the parent image. After the image is built, push it to the ECR repository that you created.

# Create an ECR repository
# Note the URI and ARN of the ECR repository
aws ecr create-repository --repository-name nginx-website

# Build the Docker image
docker build -f Dockerfile-amznlinux -t <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/nginx-website:latest .

# Push the Docker image to ECR
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/nginx-website:latest

Deploy the NGINX website using AWS CloudFormation

Now deploy the NGINX website. The following stack deploys the NGINX website onto the same ECS cluster (clair-demo-cluster) as Clair.

# Create the AWS CloudFormation stack
# <ECRRepositoryUri> - Nginx-Website ECR Repository URI without Image tag
# Example: <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/nginx-website

cd ../aws-codepipeline-docker-vulnerability-scan/

aws cloudformation create-stack \
--stack-name nginx-website-stack \
--template-body file://nginx-website/nginx-website-template.yaml \
--capabilities CAPABILITY_IAM \
--parameters \
ParameterKey="VpcId",ParameterValue="<VpcId>" \
ParameterKey="PublicSubnets",ParameterValue=\"<PublicSubnet01-ID>,<PublicSubnet02-ID>\" \
ParameterKey="PrivateSubnets",ParameterValue=\"<PrivateSubnet01-ID>,<PrivateSubnet02-ID>\" \
ParameterKey="ECRRepositoryUri",ParameterValue="<ECRRepositoryUri>"

# Verify that stack creation is complete
aws cloudformation wait stack-create-complete \
--stack-name nginx-website-stack

# Get stack outputs
aws cloudformation describe-stacks \
--stack-name nginx-website-stack \
--query 'Stacks[].Outputs[]'

Note the AWS CloudFormation stack outputs. The stack output contains the Application Load Balancer URL for the NGINX website and the ECS service name of the NGINX website. You need the ECS service name for the pipeline.

Building the pipeline

In this section, you build a pipeline to automate vulnerability scanning for the nginx-website Docker image builds. Every time that a code change is made, the Docker image is rebuilt and scanned for vulnerabilities. The container is deployed onto ECS only if the vulnerabilities are within the defined threshold. For more information, see Tutorial: Continuous Deployment with AWS CodePipeline.

The sample code includes an AWS CloudFormation template to create the pipeline. The buildspec.yml file is used by AWS CodeBuild to build the nginx-website Docker image and scan the image using Clair.

CodeBuild build spec

A build spec is a collection of build commands and related settings, in YAML format, that AWS CodeBuild uses to run a build. You can include a build spec in the root directory of your application source code, or you can define a build spec when you create a build project.

In this sample app, you include the build spec in the root directory of your sample application source code. The buildspec.yml file is located in the /aws-codepipeline-docker-vulnerability-scan/nginx-website folder.

This solution uses Klar, a simple tool for analyzing images stored in a private or public Docker registry for security vulnerabilities using Clair. Klar serves as a client that coordinates the image checks between ECR and Clair.

In the buildspec.yml file, you set the variable CLAIR_OUTPUT=Critical. CLAIR_OUTPUT defines the severity threshold: vulnerabilities with a severity level at or above this threshold are reported. The supported levels are:

  • Unknown
  • Negligible
  • Low
  • Medium
  • High
  • Critical
  • Defcon1

You can configure Klar to your requirements by setting the variables as defined in https://github.com/optiopay/klar.
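For example, assuming the Clair service from the earlier stack is reachable at its ALB DNS name, you could run Klar locally against the website image with something like the following sketch (placeholder values in angle brackets):

# Sketch: run Klar locally against an image in ECR (mirrors the buildspec below)
ECR_LOGIN=$(aws ecr get-login --region us-east-1 --no-include-email)
$ECR_LOGIN
PASSWORD=$(echo $ECR_LOGIN | cut -d' ' -f6)
DOCKER_USER=AWS DOCKER_PASSWORD=$PASSWORD CLAIR_ADDR=<ClairAlbDnsName> \
CLAIR_OUTPUT=High ./klar <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/nginx-website:latest

In the pipeline, CodeBuild runs the equivalent check as part of the buildspec.yml shown next.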

# Set the following variables as CodeBuild project environment variables
# ECR_REPOSITORY_URI
# CLAIR_URL

version: 0.2
phases:
  pre_build:
    commands:
      - echo Fetching ECR Login
      - ECR_LOGIN=$(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - echo Logging in to Amazon ECR...
      - $ECR_LOGIN
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
      - echo Downloading Clair client Klar-2.1.1
      - wget https://github.com/optiopay/klar/releases/download/v2.1.1/klar-2.1.1-linux-amd64
      - mv ./klar-2.1.1-linux-amd64 ./klar
      - chmod +x ./klar
      - PASSWORD=`echo $ECR_LOGIN | cut -d' ' -f6`
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $ECR_REPOSITORY_URI:latest .
      - docker tag $ECR_REPOSITORY_URI:latest $ECR_REPOSITORY_URI:$IMAGE_TAG
  post_build:
    commands:
      - bash -c "if [ /"$CODEBUILD_BUILD_SUCCEEDING/" == /"0/" ]; then exit 1; fi"
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $ECR_REPOSITORY_URI:latest
      - docker push $ECR_REPOSITORY_URI:$IMAGE_TAG
      - echo Running Clair scan on the Docker Image
      - DOCKER_USER=AWS DOCKER_PASSWORD=${PASSWORD} CLAIR_ADDR=$CLAIR_URL CLAIR_OUTPUT=Critical ./klar $ECR_REPOSITORY_URI
      - echo Writing image definitions file...
      - printf '[{"name":"MyWebsite","imageUri":"%s"}]' $ECR_REPOSITORY_URI:$IMAGE_TAG > imagedefinitions.json
artifacts:
  files: imagedefinitions.json

The build spec does the following:

Pre-build stage:

  • Log in to ECR.
  • Download the Clair client Klar.

Build stage:

  • Build the Docker image and tag it as latest and with the Git commit ID.

Post-build stage:

  • Push the image to your ECR repository with both tags.
  • Trigger Klar to scan the image that you pushed to ECR for security vulnerabilities using Clair.
  • Write a file called imagedefinitions.json in the build root that has your Amazon ECS service’s container name and the image and tag. The deployment stage of your CD pipeline uses this information to create a new revision of your service’s task definition. It then updates the service to use the new task definition. The imagedefinitions.json file is required for the AWS CodeDeploy ECS job worker.

Deploy the pipeline

Deploy the pipeline using the AWS CloudFormation template provided with the sample code. The following template creates the CodeBuild project, CodePipeline pipeline, Amazon CloudWatch Events rule, and necessary IAM permissions.

# Deploy the pipeline
 
# Replace the following variables 
# WebsiteECRRepositoryARN – NGINX website ECR repository ARN
# WebsiteECRRepositoryURI – NGINX website ECR repository URI
# ClairAlbDnsName - Output variable from coreos-clair-stack
# EcsServiceName – Output variable from nginx-website-stack

aws cloudformation create-stack \
--stack-name nginx-website-codepipeline-stack \
--template-body file://clair-codepipeline-template.yaml \
--capabilities CAPABILITY_IAM \
--disable-rollback \
--parameters \
ParameterKey="EcrRepositoryArn",ParameterValue="<WebsiteECRRepositoryARN>" \
ParameterKey="EcrRepositoryUri",ParameterValue="<WebsiteECRRepositoryURI>" \
ParameterKey="ClairAlbDnsName",ParameterValue="<ClairAlbDnsName>" \
ParameterKey="EcsServiceName",ParameterValue="<WebsiteECSServiceName>"

# Verify that stack creation is complete
aws cloudformation wait stack-create-complete \
--stack-name nginx-website-codepipeline-stack

The pipeline is triggered after the AWS CloudFormation stack creation is complete. You can log in to the AWS Management Console to monitor the status of the pipeline. The vulnerability scan information is available in CloudWatch Logs.

You can also modify the CLAIR_OUTPUT value from Critical to High in the buildspec.yml file (in the nginx-website folder that you copied into your CodeCommit repository), push the change, and then check the status of the resulting build.

Summary

I’ve described how to deploy Clair on AWS and set up a release pipeline for the automated vulnerability scanning of container images. The Clair instance can be used as a centralized Docker image vulnerability scanner and used by other CodeBuild projects. To meet your organization’s security requirements, define your vulnerability threshold in Klar by setting the variables, as defined in https://github.com/optiopay/klar.

Re-affirming Long-Term Support for Java in Amazon Linux

Post Syndicated from Deepak Singh original https://aws.amazon.com/blogs/compute/re-affirming-long-term-support-for-java-in-amazon-linux/

In light of Oracle’s recent announcement indicating an end to free long-term support for OpenJDK after January 2019, we re-affirm that the OpenJDK 8 and OpenJDK 11 Java runtimes in Amazon Linux 2 will continue to receive free long-term support from Amazon until at least June 30, 2023. We are collaborating and contributing in the OpenJDK community to provide our customers with a free long-term supported Java runtime.

In addition, Amazon Linux AMI 2018.03, the last major release of Amazon Linux AMI, will receive support for the OpenJDK 8 runtime at least until June 30, 2020, to facilitate migration to Amazon Linux 2. Java runtimes provided by AWS Services such as AWS Lambda, AWS Elastic Map Reduce (EMR), and AWS Elastic Beanstalk will also use the AWS supported OpenJDK builds.

Amazon Linux users will not need to make any changes to get support for OpenJDK 8. OpenJDK 11 will be made available through the Amazon Linux 2 repositories at a future date. The Amazon Linux OpenJDK support posture will also apply to the on-premises virtual machine images and Docker base image of Amazon Linux 2.
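For example, on an Amazon Linux 2 instance (or in the amazonlinux:2 Docker image), you could install and verify the OpenJDK 8 runtime along these lines (a sketch; the package name java-1.8.0-openjdk is assumed):

# Sketch: install OpenJDK 8 from the Amazon Linux 2 repositories and check the runtime version
sudo yum install -y java-1.8.0-openjdk
java -version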

Amazon Linux 2 provides a secure, stable, and high-performance execution environment. Amazon Linux AMI and Amazon Linux 2 include a Java runtime based on OpenJDK 8 and are available in all public AWS regions at no additional cost beyond the pricing for Amazon EC2 instance usage.

Amazon ECS and Docker volume drivers, part 2: Amazon EFS

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/amazon-ecs-and-docker-volume-drivers-amazon-efs/

← Introduction and Part 1: Amazon EBS

 

Post by: Tiffany Jernigan and Jeremy Cowan

Introduction

This is the second post in a series showing how to use Docker volumes with Amazon ECS. If you are unfamiliar with Docker volumes or REX-Ray, or want to know how to use a volume plugin with ECS and Amazon Elastic Block Store (Amazon EBS), see Part 1.

In this post, you use the REX-Ray EFS plugin with Amazon Elastic File System (Amazon EFS) to persist and share data among multiple ECS tasks. To help you get started, we have created an AWS CloudFormation template that builds a two-instance ECS cluster across two Availability Zones.

The template bootstraps the REX-Ray EFS plugin onto each node. Each instance has the REX-Ray EFS plugin installed, is assigned an IAM role with an inline policy with permissions for REX-Ray to issue the necessary AWS API calls, and a security group to open port 2049 for EFS. The template also creates a Network Load Balancer that is used to expose an ECS service to the internet.
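After the stack is created, a quick way to confirm the plugin on a container instance (once you connect over SSH) is the following check:

# Quick check: the rexray/efs plugin should be listed and enabled on each container instance
docker plugin ls | grep rexray/efs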

Set up the environment

First, create a folder to hold all of the files for this walkthrough and change into it. Next, set the full path of the EC2 key pair that you need later to connect to your instance using SSH.

#example path /Users/tiffany/.aws/ec2-keypair.pem
export KeyPairPath=<your-keypair>

Step 1: Instantiate the CloudFormation template

Next, create a CloudFormation stack with the following S3 template:
rexray-demo-efs.yaml

KeyPairName=$(echo $KeyPairPath | cut -d / -f5 | sed 's/.pem//')
Region=$(aws configure get region) #You can also replace this
CloudFormationStack=$(aws cloudformation create-stack \
--region $Region \
--stack-name rexray-demo-efs \
--capabilities CAPABILITY_NAMED_IAM \
--template-url http://s3.amazonaws.com/ecs-refarch-volume-plugins/rexray-demo-efs.yaml \
--parameters ParameterKey=KeyName,ParameterValue=$KeyPairName \
| jq -r .StackId)

The ECS container instances are bootstrapped with a user data script that installs the rexray/efs Docker plugin using:

docker plugin install rexray/efs REXRAY_PREEMPT=true \
EFS_REGION=${AWS::Region} \
EFS_SECURITYGROUPS=${EFSSecurityGroup} \
--grant-all-permissions

Step 2: Export output parameters as environment variables

The following shell script exports the output parameters from the CloudFormation template so that they can be imported as OS environment variables. Later, you use these variables to create task and service definitions.

cat > get-outputs.sh << 'EOF'
#!/bin/bash
function usage {
  echo "usage: source <(./get-outputs.sh  )"
  echo "stack name or ID must be provided or exported as the CloudFormationStack environment variable"
  echo "region must be provided or set with aws configure"
}

function main {
    #Get stack
    if [ -z "$1" ]; then
        if [ -z "$CloudFormationStack" ]; then
            echo "please provide stack name or ID"
            usage
            exit 1
        fi
    else
        CloudFormationStack="$1"
    fi
    #Get region
    if [ -z "$2" ]; then
        region=$(aws configure get region)
        if [ -z $region ]; then
            echo "please provide region"
            usage
            exit 1
        fi
    else
        region="$2"
    fi
    
    echo "#Region: $region"
    echo "#Stack: $CloudFormationStack"
    echo "#---"
    
    echo "#Checking if stack exists..."
    aws cloudformation wait stack-exists \
    --region $region \
    --stack-name $CloudFormationStack
    
    echo "#Checking if stack creation is complete..."
    aws cloudformation wait stack-create-complete \
    --region $region \
    --stack-name $CloudFormationStack
     
    echo "#Getting output keys and values..."
    echo "#---"
    aws cloudformation describe-stacks \
    --region $region \
    --stack-name $CloudFormationStack \
    --query 'Stacks[].Outputs[].[OutputKey, OutputValue]' \
    --output text | awk '{print "export", $1"="$2}'
}
main "[email protected]"
EOF
#Add executable permissions
chmod +x get-outputs.sh

Now run the script:

./get-outputs.sh && source <(./get-outputs.sh)

Step 3: Create a task definition

In this step, you create a task definition for an Apache web service, Space, which is an example website using Apache2 on Ubuntu. The scheduler and the REX-Ray EFS plugin ensure that each copy of the task establishes a connection with EFS.

cat > space-taskdef-efs.json << EOF 
{
    "containerDefinitions": [
        {
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "${CWLogGroupName}",
                    "awslogs-region": "${AWSRegion}",
                    "awslogs-stream-prefix": "ecs"
                }
            },
            "portMappings": [
               {
                    "containerPort": 80,
                    "protocol": "tcp"
                }
            ],
            "mountPoints": [
                {
                    "containerPath": "/var/www/",
                    "sourceVolume": "rexray-efs-vol"
                }
            ],
            "image": "tiffanyfay/space:apache",
            "essential": true,
            "name": "space"
        }
    ],
    "memory": "512",
    "family": "rexray-efs",
    "networkMode": "awsvpc",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "512",
    "volumes": [
        {
            "name": "rexray-efs-vol",
            "dockerVolumeConfiguration": {
                "autoprovision": true,
                "scope": "shared",
                "driver": "rexray/efs"
            }
        }
    ]
}
EOF

Because autoprovision is set to true, the Docker volume driver, rexray/efs, creates a new file system for you. And because scope is shared, the file system can be used across multiple tasks.
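If you would rather manage the volume yourself, a sketch of the manual alternative is to set autoprovision to false in the task definition and pre-create the volume on a container instance:

# Sketch: pre-create the shared EFS-backed volume (only needed when autoprovision is false)
docker volume create --driver rexray/efs --name rexray-efs-vol
docker volume inspect rexray-efs-vol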

Register the task definition and extract the task definition ARN from the result:

TaskDefinitionArn=$(aws ecs register-task-definition \
--region $AWSRegion \
--cli-input-json 'file://space-taskdef-efs.json' \
| jq -r .taskDefinition.taskDefinitionArn)

Step 4: Create a service definition

In this step, you create a service definition for the rexray-efs task definition. An ECS service is a long-running task that is monitored by the service scheduler. If the task dies or becomes unhealthy, the scheduler automatically attempts to restart the task.

The web service is fronted by a Network Load Balancer that is configured to forward traffic on port 80 to the tasks registered with a specific target group. The desired count is the number of task copies to run. The minimum and maximum healthy percent parameters tell the scheduler to run exactly the desired number of copies of this task at a time: the scheduler does not start a new task until an existing task has stopped.

cat > space-svcdef-efs.json << EOF 
{
    "cluster": "${ECSClusterName}",
    "serviceName": "space-svc",
    "taskDefinition": "${TaskDefinitionArn}",
    "loadBalancers": [
        {
            "targetGroupArn": "${WebTargetGroupArn}",
            "containerName": "space",
            "containerPort": 80
        }
    ],
    "desiredCount": 4,
    "launchType": "EC2",
    "healthCheckGracePeriodSeconds": 60, 
    "deploymentConfiguration": {
        "maximumPercent": 100,
        "minimumHealthyPercent": 0
    },
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [
                "${SubnetIds}"
            ],
            "securityGroups": [
                "${EFSSecurityGroupId}",
                "${InstanceSecurityGroupId}"
            ]
        }
    }
}
EOF

Create the Apache service:

SvcDefinitionArn=$(aws ecs create-service \
--region $AWSRegion \
--cli-input-json file://space-svcdef-efs.json \
| jq -r .service.serviceArn)

Wait for the service to come up, with the last status of its tasks showing RUNNING, using either the CLI or the console:

aws ecs wait services-stable \
--region $AWSRegion \
--cluster $ECSClusterName \
--services $SvcDefinitionArn

Next, look at your file system and see two mount points—one for each Availability Zone:

FileSystemId=$(aws efs describe-file-systems \
--region $AWSRegion \
--query 'FileSystems[?Name==`/rexray-efs-vol`].FileSystemId' \
--output text)
aws efs describe-mount-targets \
--region $AWSRegion \
--file-system-id $FileSystemId 

Step 5: View the webpage

Now, open a browser and paste the NLBDNSName value as the URL.

echo $NLBDNSName

If you refresh the page, you can see that the task ID and EC2 instance ID change as the traffic is being load balanced.

Get the DNS info for an instance so that you can connect to it using SSH and modify index.shtml:

InstanceDns=$(aws ec2 describe-instances \
--region $AWSRegion \
--filter Name="tag:aws:cloudformation:stack-id",Values="$CloudFormationStack" \
--query 'Reservations[1].Instances[].PublicDnsName' \
--output text)
ssh -i $KeyPairPath ec2-user@$InstanceDns

Now, get one of the Docker container IDs and use docker exec to change the image being displayed:

ContainerId=$(docker ps --filter volume="rexray-efs-vol" \
--format "{{.ID}}" --latest)
docker exec -it $ContainerId sed -i "s/ecsship/cruiser/" /var/www/index.shtml

To see the update, refresh the load balancer webpage.

Step 6: Clean up

To clean up the resources that you created in this post, take the following steps.

Delete the mount targets and file system.

FileSystemId=$(aws efs describe-file-systems \
--region $AWSRegion \
--query 'FileSystems[?Name==`/rexray-efs-vol`].FileSystemId' \
--output text)
MountTargetIds=($(aws efs describe-mount-targets \
--region $AWSRegion \
--file-system-id $FileSystemId \
--query 'MountTargets[].MountTargetId' --output text))
aws efs delete-mount-target --region $AWSRegion \
--mount-target-id ${MountTargetIds[0]}
aws efs delete-mount-target --region $AWSRegion \
--mount-target-id ${MountTargetIds[1]}
aws efs delete-file-system --region $AWSRegion \
--file-system-id $FileSystemId 

Delete the service.

aws ecs update-service \
--region $AWSRegion \
--cluster $ECSClusterName \
--service $SvcDefinitionArn \
--desired-count 0
aws ecs delete-service \
--region $AWSRegion \
--cluster $ECSClusterName \
--service $SvcDefinitionArn

Delete the CloudFormation template. This removes the rest of the environment that was pre-created for this exercise.

aws cloudformation delete-stack --region $AWSRegion \
--stack-name $CloudFormationStack

Summary

Congratulations on getting your service up and running with Docker volume plugins and EFS!

You have created a CloudFormation stack that includes two instances running the REX-Ray EFS plugin across two subnets, a Network Load Balancer, and an ECS cluster. You also created a task definition and a service that used the plugin to create an Amazon EFS file system.

We look forward to hearing about how you use Docker Volume Plugins with ECS.

Tiffany and Jeremy

Amazon ECS and Docker volume drivers, part 1: Amazon EBS

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/amazon-ecs-and-docker-volume-drivers-amazon-ebs/

→ Part 2: Amazon EFS

 

Post by: Jeremy Cowan, Ronnie Eichler, and Tiffany Jernigan

Introduction

Containers are emerging as the default compute primitive for building cloud-native applications. They facilitate the adoption of continuous delivery and help increase infrastructure utilization.

However, deploying stateful applications as containers has been challenging because containers have short life spans, get redeployed frequently, are scaled up and down dynamically, and often share the same host with other containers. All of these factors make it challenging for you to appropriately align the lifecycles of storage volumes and containers.

Before Docker volume driver support was added to Amazon ECS, you had to manage storage volumes manually using custom tooling such as bash scripts, Lambda functions, or manual configuration of Docker volumes. Now, you can take full advantage of the Docker plugin ecosystem by using popular plugins such as REX-Ray or Portworx.

ECS support for Docker volumes means that you can now deploy stateful and storage-intensive use cases. These include:

  • Machine learning and data processing workloads
  • Applications such as GitLab or Jenkins that share a filesystem across multiple tasks
  • Databases such as Cassandra or RocksDB
  • Streaming tools such as Kafka
  • Additional scratch space added to containers that process large workloads and are storage-intensive

To support this broad array of use cases, ECS offers you the flexibility to configure the lifecycle of the Docker volume. For example, you can specify whether it is a scratch space volume specific to a single instantiation of a task, or a persistent volume that persists beyond the lifecycle of a unique instantiation of the task. You can also choose to use a Docker volume that you’ve created before launching your task.

In addition to managing the Docker volume configuration and lifecycle, the ECS scheduler is now plugin-aware. ECS takes the availability of the requested driver into account in its placement decisions, so that tasks that require a certain driver are only placed on container instances that have the driver installed.

Docker and Docker volumes

Docker volumes are a way to persist data outside of the lifecycle of a container. Containers themselves are made up of multiple immutable layers of storage with an ephemeral layer, which is read/write. If your application writes files to the ephemeral layer, these changes are lost when the container stops.

Volumes are managed outside of the container lifecycle—stopping or removing the container does not remove the volume. Docker also supports volume drivers that allow you to use volumes as an abstraction between containers and persistent storage such as Amazon EBS or Amazon EFS. By default, Docker provides a driver called ‘local’ that provides local storage volumes to containers. With Docker plugins, you can now add volume drivers to provision and manage EBS and EFS storage, such as REX-Ray, Portworx, and NetShare.
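As a quick illustration (arbitrary names, using the default local driver), data written to a named volume outlives the container that wrote it:

# Sketch: a named volume created with the default 'local' driver persists across containers
docker volume create mydata
docker run --rm -v mydata:/data alpine sh -c 'echo hello > /data/greeting'
docker run --rm -v mydata:/data alpine cat /data/greeting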

To deploy a stateful application such as Cassandra, MongoDB, Zookeeper, or Kafka, you likely need high-performance persistent storage like EBS. Docker volumes allow you to present an EBS volume to your application as a Docker volume.

There are other applications such as Jenkins and GitLab, where multiple copies of the application need access to the same data. With volume drivers and EFS, you can present EFS as a shared volume to multiple instances of your container so that you can scale your application yet still retain and persist shared data on EFS.

Another overlooked use case involves applications that need scratch space. When you define a task in ECS and your application writes to the filesystem inside of the container (not on a Docker volume), the task consumes space on the underlying EC2 instance that is shared by all other running tasks. This can lead to issues of ‘noisy neighbors’ if a task were to write a bunch of data to /tmp on its local filesystem.

Now with Docker volume support in ECS, you can map an EBS volume to /tmp (or whatever your scratch space directory you prefer). You can ensure good performance while limiting the size of the underlying EBS volume using arguments in your ECS task to the volume driver.
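At the Docker level, such a size-limited scratch volume could be created with options like the following sketch (option names mirror the driverOpts used in the task definition later in this post):

# Sketch: a 5 GiB gp2 EBS-backed volume created through the rexray/ebs driver
docker volume create --driver rexray/ebs \
  --opt size=5 --opt volumetype=gp2 \
  --name scratch-vol
docker run --rm -v scratch-vol:/tmp alpine df -h /tmp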

What is REX-Ray?

REX-Ray is just one example of a Docker volume driver plugin that provides an abstraction between Docker volumes and the underlying storage. Built on top of the libStorage framework, REX-Ray’s simplified architecture consists of a single binary. It runs as a stateless service on every host, using a configuration file to orchestrate multiple storage platforms. REX-Ray supports multiple storage backends. For this post, we focus on EBS as a storage backend. Part two of this series focuses on EFS.

Using a plugin such as REX-Ray, your Docker container is able to persist data outside of the lifespan of a running container. You don’t have to worry about the underlying storage. Instead, you simply reference a Docker volume in your task definition and let REX-Ray provide the abstraction. While this post is specific to REX-Ray, ECS is designed to be open and pass through the volume driver arguments from your task definition to Docker. You can use any volume driver (such as Portworx) that is supported by Docker.

Putting it all together

Before you can get started using Docker volumes with ECS, there are a few things you need to do.

First, you need a suitable volume driver plugin, such as REX-Ray, to provide an abstraction between the Docker volume and the underlying storage, for example, EBS or EFS. Docker designed volumes and the associated driver mechanism to be pluggable to support a variety of storage backends. Although we’ve chosen to highlight REX-Ray for this post, there are several others to choose from, including Portworx and NetShare.

Because the volume plugin interacts with the AWS storage services on your behalf, an IAM role has to be assigned to the ECS container instances. This allows REX-Ray to issue the appropriate AWS API calls and perform actions such as attaching and detaching EBS volumes, and so on.
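An abbreviated sketch of such an inline policy for the rexray/ebs plugin might look like the following; the exact action list varies by plugin, so treat this as illustrative rather than complete:

# Abbreviated, illustrative policy for rexray/ebs (not the complete action list)
cat > rexray-ebs-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateVolume",
        "ec2:AttachVolume",
        "ec2:DetachVolume",
        "ec2:DeleteVolume",
        "ec2:CreateTags",
        "ec2:DescribeVolumes",
        "ec2:DescribeInstances",
        "ec2:DescribeAvailabilityZones"
      ],
      "Resource": "*"
    }
  ]
}
EOF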

Using REX-Ray with Amazon EBS

To help you get started, we’ve created an AWS CloudFormation template that builds a two-node ECS cluster.  The template bootstraps the rexray/ebs volume driver onto each node and assigns them an IAM role with an inline policy that allows them to call the API actions that REX-Ray needs.  The template also creates a Network Load Balancer, which is used to expose an ECS service to the internet.

Finally, you create a task definition for a stateful service, MySQL, that uses the rexray/ebs driver. Observe how the volume where MySQL stores its data moves with the MySQL task when it is scheduled on another instance in the cluster.

Set up the environment

Here’s how to set up the environment for this walkthrough.

Step 1: Instantiate the AWS CloudFormation template

aws cloudformation create-stack --stack-name rexray-demo \
--capabilities CAPABILITY_NAMED_IAM \
--template-url http://s3.amazonaws.com/ecs-refarch-volume-plugins/rexray-demo.json \
--parameters ParameterKey=KeyName,ParameterValue=<keypair-name>

The ECS container instances are bootstrapped using the following script, which is provided as user data in rexray-demo.json.

#open file descriptor for stderr
exec 2>>/var/log/ecs/ecs-agent-install.log
set -x
#verify that the agent is running
until curl -s http://localhost:51678/v1/metadata
do
	sleep 1
done
#install the Docker volume plugin
docker plugin install rexray/ebs REXRAY_PREEMPT=true EBS_REGION=<AWS_REGION> --grant-all-permissions
#restart the ECS agent
stop ecs 
start ecs

Step 2: Export output parameters as environment variables

This shell script exports the output parameters from the CloudFormation template and imports them as OS environment variables.  You use these variables later to create task and service definitions.

cat > get-outputs.sh << 'EOF'
#!/bin/bash
function usage {
  echo "usage: source <(./get-outputs.sh <stackname-or-stackid> <region>)"
  echo "stack name or ID must be provided or exported as the CloudFormationStack environment variable"
  echo "region must be provided or set with aws configure"
}

function main {
    #Get stack
    if [ -z "$1" ]; then
        if [ -z "$CloudFormationStack" ]; then
            echo "please provide stack name or ID"
            usage
            exit 1
        fi
    else
        CloudFormationStack="$1"
    fi
    #Get region
    if [ -z "$2" ]; then
        region=$(aws configure get region)
        if [ -z $region ]; then
            echo "please provide region"
            usage
            exit 1
        fi
    else
        region="$2"
    fi
    
    echo "#Region: $region"
    echo "#Stack: $CloudFormationStack"
    echo "#---"
    
    echo "#Checking if stack exists..."
    aws cloudformation wait stack-exists \
    --region $region \
    --stack-name $CloudFormationStack
    
    echo "#Checking if stack creation is complete..."
    aws cloudformation wait stack-create-complete \
    --region $region \
    --stack-name $CloudFormationStack
     
    echo "#Getting output keys and values..."
    echo "#---"
    aws cloudformation describe-stacks \
    --region $region \
    --stack-name $CloudFormationStack \
    --query 'Stacks[].Outputs[].[OutputKey, OutputValue]' \
    --output text | awk '{print "export", $1"="$2}'
}
main "[email protected]"
EOF

#Add executable permissions
chmod +x get-outputs.sh

Export the output parameters. The region parameter is only needed if your Region configuration is not us-west-2, as defined in the CloudFormation template.

./get-outputs.sh && source <(./get-outputs.sh)

Step 3: Create the task definition

In this step, you create a task definition for MySQL. MySQL is considered a stateful service because the data stored in the database has to persist beyond the life of the task.

When the MySQL task is restarted on another instance in the cluster, the scheduler and the rexray/ebs plugin ensure that the task is launched on an instance that can re-establish a connection to the EBS volume where the database is stored.

The placement constraint in the task definition tells the ECS service scheduler to launch the task in a specific Availability Zone: the Availability Zone where the EBS volume was originally created. Such a constraint is necessary because instances cannot attach EBS volumes that are in a different Availability Zone.

cat > mysql-taskdef.json << EOF 
{
    "containerDefinitions": [
        {
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "${CWLogGroupName}",
                    "awslogs-region": "${AWSRegion}",
                    "awslogs-stream-prefix": "ecs"
                }
            },
            "portMappings": [
                {
                    "containerPort": 3306,
                    "protocol": "tcp"
                }
            ],
            "environment": [
                {
                    "name": "MYSQL_ROOT_PASSWORD",
                    "value": "my-secret-pw"
                }
            ],
            "mountPoints": [
                {
                    "containerPath": "/var/lib/mysql",
                    "sourceVolume": "rexray-vol"
                }
            ],
            "image": "mysql",
            "essential": true,
            "name": "mysql"
        }
    ],
    "placementConstraints": [
        {
            "type": "memberOf",
            "expression": "attribute:ecs.availability-zone==${AvailabilityZone}"
        }
    ],
    "memory": "512",
    "family": "mysql",
    "networkMode": "awsvpc",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "512",
    "volumes": [
        {
            "name": "rexray-vol",
            "dockerVolumeConfiguration": {
                "autoprovision": true,
                "scope": "shared",
                "driver": "rexray/ebs",
                "driverOpts": {
                    "volumetype": "gp2",
                    "size": "5"
                }
            }
        }
    ]
}
EOF

Docker volume support adds several new parameters to the ECS task definition. These include the volume type, scope, drivers, and Docker options and labels. A volume can either be scoped to a single, specific task or be shared among multiple tasks.

When a volume is scoped to a task, it is not meant to be shared across different running tasks.  In contrast, a shared volume is for use cases where the volume lifecycle is independent of the ECS task. The volume can be used by different tasks concurrently or at different times. It is primarily intended for use cases such as single-task applications where the volume persists after the task dies and is re-used when the task starts again. Another use case is when multiple tasks on the same EC2 container instance access the volume concurrently.

The autoprovision parameter is used to specify whether ECS manages the lifecycle of the volume.  When this is set to true, ECS automatically provisions the volume for you, which is what you are doing in the above example.  When it’s set to false, ECS assumes that the volume already exists.  For this example, you could instead set autoprovision to false and run the following command to create a volume:

aws ec2 create-volume --size 1 --volume-type gp2 \
--availability-zone $AvailabilityZone \
--tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=rexray-vol}]'

The driver options are used to configure the type of EBS storage to use (for example, gp2, standard, or io1), the size of the volume to provision, IOPS, and encryption. The specific options vary depending on the volume plugin that you are using.

Register the task definition and extract the task definition ARN from the result:

TaskDefinitionArn=$(aws ecs register-task-definition \
--cli-input-json 'file://mysql-taskdef.json' \
| jq -r .taskDefinition.taskDefinitionArn)

Step 4: Create a service definition

In this step, you create a service definition for MySQL.  An ECS service is a long running task that is monitored by the service scheduler.  If the task dies or becomes unhealthy, the scheduler automatically attempts to restart the task.

The MySQL service is fronted by a Network Load Balancer that is configured to forward traffic on port 3306 to the tasks registered with a specific target group. The desired count is the number of task copies to run. The minimum and maximum healthy percent parameters tell the scheduler to run exactly the desired number of copies of this task at a time: the scheduler does not start a new task until an existing task has stopped.

cat > mysql-svcdef.json << EOF 
{
    "cluster": "${ECSClusterName}",
    "serviceName": "mysql-svc",
    "taskDefinition": "${TaskDefinitionArn}",
    "loadBalancers": [
        {
            "targetGroupArn": "${MySQLTargetGroupArn}",
            "containerName": "mysql",
            "containerPort": 3306
        }
    ],
    "desiredCount": 1,
    "launchType": "EC2",
    "healthCheckGracePeriodSeconds": 60, 
    "deploymentConfiguration": {
        "maximumPercent": 100,
        "minimumHealthyPercent": 0
    },
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [
                "${SubnetId}"
            ],
            "securityGroups": [
                "${SecurityGroupId}"
            ],
            "assignPublicIp": "DISABLED"
        }
    }
}
EOF

Create the MySQL service:

SvcDefinitionArn=$(aws ecs create-service \
--cli-input-json file://mysql-svcdef.json \
| jq -r .service.serviceArn)

Step 5: Connect to the MySQL service

After the service is running, configure a MySQL client, such as MySQL Workbench, to connect to the service:

  1. For Connection Name, type “rexray-demo”.
  2. For Hostname, copy and paste the DNS name of the Network Load Balancer.
  3. For Password, type the default password found in the mysql-taskdef.json file.
  4. Choose Test Connection, Close.
  5. Under MySQL Connections, open the rexray-demo connection.

MySQL Workbench

In the Query window, paste the following:

CREATE DATABASE rexraydb;
USE rexraydb;
CREATE TABLE pets (name VARCHAR(20), breed VARCHAR(20));
SHOW TABLES;
DESCRIBE pets;
INSERT INTO pets VALUES ('Fluffy', 'Poodle');
SELECT * FROM pets;

You can execute each line separately by placing the cursor on a line and clicking the execute statement button.

Execute MySQL commands

Step 6: Drain the instance

Now that you have a running MySQL database server running under a container and persisting its data, make sure that it will survive a container replacement.

Docker containers by their nature are designed to be ephemeral. If you upgrade the underlying host operating system, you must drain the tasks off of the instance and let them be re-scheduled onto another ECS host. Below, I show the behavior of persisting the MySQL instance’s data to an EBS volume and allowing the task to be re-scheduled.

The following script identifies the instance that is currently running the task and puts it in a draining state.  This forces the task to be rescheduled onto the other EC2 container instance in the cluster.

cat > drain-instance.sh << 'EOF'

echo "Region [$AWSRegion]"
echo "Cluster [$ECSClusterName]"
echo "Task Definition [$TaskDefinitionArn]"

TaskArns=$(aws ecs list-tasks --region $AWSRegion \
--cluster $ECSClusterName --query taskArns --output text)
echo "Task ARNs [$TaskArns]"

ContainerInstanceArns=$(aws ecs describe-tasks \
--region $AWSRegion --cluster $ECSClusterName \
--tasks $TaskArns \
--query 'tasks[?taskDefinitionArn==`'$TaskDefinitionArn'`].containerInstanceArn' \
--output text)
echo "Container Instance ARNs [$ContainerInstanceArns]"

echo "DRAINING Instances"
aws ecs update-container-instances-state --region $AWSRegion \
--cluster $ECSClusterName --container-instances $ContainerInstanceArns \
--status "DRAINING"

EOF

In the ECS console, if you click on the cluster and then the tab for the cluster’s tasks, you see the container instance ID for the MySQL task:

Clicking the link of the container instance ID takes you to another page that shows the EC2 instance ID of the instance where the MySQL task is running:

Now run the script:

chmod +x drain-instance.sh
./drain-instance.sh

When you run the script, the tasks on the draining instance are stopped. Because you have an ECS service definition for MySQL, ECS launches new tasks on other ECS instances in the cluster that meet the placement constraints. In this example, you placed a constraint on the Availability Zone of the EBS volume as it’s not possible to detach and re-attach volumes across Availability Zones. Because the volume already exists, REX-Ray attaches the existing volume to the new task. When MySQL starts, it sees this as its data volume and you have access to the recently stored data.

Step 7: Re-connect to the MySQL service

After you see that a new task has been provisioned on the ECS cluster, you can return to MySQL Workbench and attempt to run the following query:

USE rexraydb;
SELECT * FROM pets;

You may get an error message stating “The MySQL server has gone away.” This usually means that the new ECS task has not completed starting or hasn’t been registered yet as a healthy target behind the Network Load Balancer. If you wait a little longer and try again, you should see the same results in the query grid as before.

This environment is meant as a demonstration of how to use Docker volume plugins with ECS to support persistent workloads. For an actual production implementation, I recommend scoping the VPC and security groups to allow network access only from trusted resources. This post creates a MySQL server that is accessible from the internet. In addition, you should implement your own strong MySQL root password, among other things.

To clean up this demo, take the following steps.

Delete the service.

aws ecs update-service --cluster $ECSClusterName \
--service $SvcDefinitionArn \
--desired-count 0
aws ecs delete-service --cluster $ECSClusterName \
--service $SvcDefinitionArn

Delete the volume.

Even though you deleted the task and the service, you still need to clean up the EBS volume that you created. You created this volume and referenced it in the ECS task definition. ECS passed this information along to Docker running on the host, which in turn handed it to REX-Ray (your volume driver), which knew how to attach the EBS volume and map it to the container.

The easiest way to delete this volume is from the EC2 console. In the list of volumes, you should see a volume named rexray-vol that is unattached (state=available). Delete this volume as it is no longer needed.

 

REX-Ray Volume

Otherwise, you can run the following command, which grabs the volume ID and deletes it:

rexrayVolumeID=$(aws ec2 describe-volumes --filter Name="tag:Name",Values=rexray-vol \
--query "Volumes[].VolumeId" --output text)
aws ec2 delete-volume --volume-id $rexrayVolumeID

Delete the CloudFormation template.

Lastly, delete the CloudFormation template. This removes the rest of the environment that was pre-created for this exercise.

aws cloudformation delete-stack --stack-name rexray-demo

Summary

While it was possible to use Docker volume plugins with ECS previously, doing so required you to create volumes out of band, that is, outside of ECS, and create placement constraints to restrict where tasks could be run. With native support for Docker volumes, volumes can now be provisioned simply by adding a handful of parameters to an ECS task definition.

Moreover, the ECS scheduler is now volume plugin aware.  Instances that have a volume driver installed on them automatically get annotated with attributes that inform the scheduler where to place tasks that use a particular driver.  Together, these features help you to run stateful, storage intensive applications such as databases, machine learning, and data processing applications, streaming applications like Kafka, as well as applications that need additional scratch space.  We look forward to hearing about the use cases that this new feature enables.

– Jeremy, Ronnie, and Tiffany

Automating rollback of failed Amazon ECS deployments

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/automating-rollback-of-failed-amazon-ecs-deployments/

Contributed by Vinay Nadig, Associate Solutions Architect, AWS.

With more and more organizations moving toward Agile development, it’s not uncommon to deploy code to production multiple times a day. With the increased speed of deployments, it’s imperative to have a mechanism in place where you can detect errors and roll back problematic deployments early. In this blog post, we look at a solution that automates the process of monitoring Amazon Elastic Container Service (Amazon ECS) deployments and rolling back the deployment if the container health checks fail repeatedly.

The normal flow for a service deployment on Amazon ECS is to create a new task definition revision and update an Amazon ECS service with the new task definition. Based on the values of minimumHealthyPercent and maximumPercent, Amazon ECS replaces existing containers in batches to complete the deployment. After the deployment is complete, you typically monitor the service health for errors and decide whether to roll back the deployment.
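In CLI terms, that flow is typically just two calls (a sketch; taskdef.json is a placeholder for your task definition, and the cluster, service, and family names are the ones created later in this post):

# Sketch: register a new task definition revision, then roll it out to the service
aws ecs register-task-definition --cli-input-json file://taskdef.json
aws ecs update-service \
  --cluster AutoRollbackTestCluster \
  --service Nginx-Web-Service \
  --task-definition Web-Service-Definition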

In March 2018, AWS announced support for native Docker health checks on Amazon ECS. Amazon ECS also supports Application Load Balancer health checks for services that are integrated with a load balancer. Leveraging these two features, we can build a solution that automatically rolls back Amazon ECS deployments if health checks fail.
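In a task definition, a native Docker health check is expressed as a healthCheck block on the container definition; the following is a sketch mirroring the values configured in the console steps later in this post:

# Sketch: container healthCheck fragment matching the console configuration below
cat > healthcheck-fragment.json << 'EOF'
{
  "healthCheck": {
    "command": ["CMD-SHELL", "wget http://localhost/ && rm index.html || exit 1"],
    "interval": 10,
    "timeout": 30,
    "retries": 2,
    "startPeriod": 10
  }
}
EOF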

Solution overview

The solution consists of the following components:

  • An Amazon CloudWatch event to listen for the UpdateService event of an Amazon ECS cluster
  • An AWS Lambda function that listens for the Amazon ECS events generated from the cluster after the service update
  • A Lambda function that calculates the failure percentage based on the events in the Amazon ECS event stream
  • A Lambda function that triggers rollback of the deployment if there are high error rates in the events
  • An AWS Step Functions state machine to orchestrate the entire flow

 

The following diagram shows the solution’s components and workflow.

Assumptions

The following assumptions are important to understand before you implement the solution:

  • The solution assumes that with every revision of the task definition, you use a new Docker tag instead of the default "latest" tag. As a best practice, we advise that you do every release with a different Docker image tag and a new revision of the task definition.
  • If health check failures continue even after a deployment is automatically rolled back by this setup, the failures trigger yet another rollback. This can introduce a runaway rollback loop. Use the solution only where you know that a one-step rollback brings the Amazon ECS service back into a stable state.
  • This blog post assumes deployment to the US West (Oregon) us-west-2 Region. If you want to deploy the solution to other Regions, you need to make minor modifications to the Lambda code.
  • The Amazon ECS cluster launches in a new VPC. Make sure that your VPC service limit allows for a new VPC.

Prerequisites

You need the following permissions in AWS Identity and Access Management (IAM) to implement the solution:

  • Create IAM Roles
  • Create ECS Cluster
  • Create CloudWatch Rule
  • Create Lambda Functions
  • Create Step Functions

Creating the Amazon ECS cluster

First, we create an Amazon ECS cluster using the AWS Management Console.

1. Sign in to the AWS Management Console and open the Amazon ECS console.

2. For Step 1: Select cluster template, choose EC2 Linux + Networking and then choose Next step.

3. For Step 2: Configure cluster, under Configure cluster, enter the Amazon ECS cluster name as AutoRollbackTestCluster.

 

4. Under Instance configuration, for EC2 instance type, choose t2.micro.

5. Keep the default values for the rest of the settings and choose Create.

 

This provisions an Amazon ECS cluster with a single Amazon ECS container instance.

Creating the task definition

Next, we create a new task definition using the Nginx Alpine image.

1. On the Amazon ECS console, choose Task Definitions in the navigation pane and then choose Create new Task Definition.

2. For Step 1: Select launch type compatibility, choose EC2 and then choose Next step.

3. For Task Definition Name, enter Web-Service-Definition.

4. Under Task size, under Container Definitions, choose Add Container.

5.  On the Add container pane, under Standard, enter Web-Service-Container for Container name.

6.  For Image, enter nginx:alpine. This pulls the nginx:alpine Docker image from Docker Hub.

7.  For Memory Limits (MiB), choose Hard limit and enter 128.

8.  Under Advanced container configuration, enter the following information for Healthcheck:

 

  • Command: CMD-SHELL, wget http://localhost/ && rm index.html || exit 1
  • Interval: 10
  • Timeout: 30
  • Start period: 10
  • Retries: 2

9.  Keep the default values for the rest of the settings on this pane and choose Add.

10. Choose Create.

Creating the Amazon ECS service

Next, we create an Amazon ECS service that uses this task definition.

1.  On the Amazon ECS console, choose Clusters in the navigation pane and then choose AutoRollbackTestCluster.

2.  On the Services view, choose Create.

3.  For Step 1: Configure service, use the following settings:

  • Launch type: EC2.
  • Task Definition Family: Web-Service-Definition. This automatically selects the latest revision of the task definition.
  • Cluster: AutoRollbackTestCluster.
  • Service name: Nginx-Web-Service.
  • Number of tasks: 3.

4.  Keep the default values for the rest of the settings and choose Next Step.

5.  For Step 2: Configure network, keep the default value for Load balancer type and choose Next Step.

6. For Step 3: Set Auto Scaling (optional), keep the default value for Service Auto Scaling and choose Next Step.

7. For Step 4: Review, review the settings and choose Create Service.

After creating the service, you should have three tasks running in the cluster. You can verify this on the Tasks view in the service, as shown in the following image.

Implementing the solution

With the Amazon ECS cluster set up, we can move on to implementing the solution.

Creating the IAM role

First, we create an IAM role for reading the event stream of the Amazon ECS service and rolling back any faulty deployments.

 

1.  Open the IAM console and choose Policies in the navigation pane.

2.  Choose Create policy.

3.  On the Visual editor view, for Service, choose EC2 Container Service.

4.  For Actions, under Access Level, select DescribeServices for Read and UpdateServices for Write.

5.  Choose Review policy.

6.  For Name, enter ECSRollbackPolicy.

7.  For Description, enter an appropriate description.

8.  Choose Create policy.

Creating the Lambda service role

Next, we create a Lambda service role that uses the previously created IAM policy. The Lambda function to roll back faulty deployments uses this role.

 

1.  On the IAM console, choose Roles in the navigation pane and then choose Create role.

2.  For the type of trusted entity, choose AWS service.

3.  For the service that will use this role, choose Lambda.

4.  Choose Next: Permissions.

5.  Under Attach permissions policies, select the ECSRollbackPolicy policy that you created.

6. Choose Next: Review.

7.  For Role name, enter ECSRollbackLambdaRole and choose Create role.

Creating the Lambda function for the Step Functions workflow and Amazon ECS event stream

The next step is to create the Lambda function that will collect Amazon ECS events from the Amazon ECS event stream. This Lambda function will be part of the Step Functions state machine.

 

1.  Open the Lambda console and choose Create function.

2.  For Name, enter ECSEventCollector.

3.  For Runtime, choose Python 3.6.

4.  For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.

5. Choose Create function.

6.  On the Configuration view, under Function code, enter the following code.

import time
import boto3
from datetime import datetime

ecs = boto3.client('ecs', region_name='us-west-2')


def lambda_handler(event, context):
    # The input event is the ECS UpdateService API call recorded by CloudTrail
    service_name = event['detail']['requestParameters']['service']
    cluster_name = event['detail']['requestParameters']['cluster']
    _update_time = event['detail']['eventTime']
    _update_time = datetime.strptime(_update_time, "%Y-%m-%dT%H:%M:%SZ")
    start_time = _update_time.strftime("%s")
    # Elapsed time since the deployment started; used by the IntervalCheck state
    seconds_from_start = time.time() - int(start_time)
    event.update({'seconds_from_start': seconds_from_start})

    # Collect the service events that occurred after the deployment started
    _services = ecs.describe_services(
        cluster=cluster_name, services=[service_name])
    service = _services['services'][0]
    service_events = service['events']
    events_since_update = [e for e in service_events
                           if int(e['createdAt'].strftime("%s")) > int(start_time)]
    # Drop the datetime objects so the result is JSON serializable for Step Functions
    [e.pop('createdAt') for e in events_since_update]
    event.update({"events": events_since_update})
    return event

 

7. Under Basic Settings, set Timeout to 30 seconds.

 

8.  Choose Save.

Creating the Lambda function to calculate failure percentage

Next, we create a Lambda function that calculates the failure percentage based on the number of failed container health checks derived from the event stream.

 

1.     On the Lambda console, choose Create function.

2.     For Name, enter ECSFailureCalculator.

3.     For Runtime, choose Python 3.6.

4.     For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.

5.     Choose Create function.

6.     On the Configuration view, under Function code, enter the following code.

 

import re

# Matches load balancer (target group) health check failure events
lb_hc_regex = re.compile(r"\(service (.*)?\) \(instance (i-[a-z0-9]{7,17})\) \(port ([0-9]{4,5})\) is unhealthy in \(target-group (.*)?\) due to \((.*)?: \[(.*)\]\)")
# Matches Docker container health check failure events
docker_hc_regex = re.compile(r"\(service (.*)?\) \(task ([a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})\) failed container health checks\.")
# Matches the "has started N tasks" events used to count task registrations
task_registration_formats = [r"\(service (.*)?\) has started ([0-9]{1,9}) tasks: (.*)\."]


def lambda_handler(event, context):
    cluster_name = event['detail']['requestParameters']['cluster']
    service_name = event['detail']['requestParameters']['service']

    messages = [m['message'] for m in event['events']]
    failures = get_failure_messages(messages)
    registrations = get_registration_messages(messages)
    failure_percentage = get_failure_percentage(failures, registrations)
    print("Failure Percentage = {}".format(failure_percentage))
    return {"failure_percentage": failure_percentage, "service_name": service_name, "cluster_name": cluster_name}


def get_failure_percentage(failures, registrations):
    no_of_failures = len(failures)
    no_of_registrations = sum([float(x[0][1]) for x in registrations])
    return no_of_failures / no_of_registrations * 100 if no_of_registrations > 0 else 0


def get_failure_messages(messages):
    failures = []
    for message in messages:
        failures.append(lb_hc_regex.findall(message)) if lb_hc_regex.findall(message) else None
        failures.append(docker_hc_regex.findall(message)) if docker_hc_regex.findall(message) else None
    return failures


def get_registration_messages(messages):
    registrations = []
    for message in messages:
        for registration_format in task_registration_formats:
            if re.findall(registration_format, message):
                registrations.append(re.findall(registration_format, message))
    return registrations

7.     Under Basic Settings, set Timeout to 30 seconds.

8.     Choose Save.

Creating the Lambda function to roll back a deployment

Next, we create a Lambda function to roll back an Amazon ECS deployment.

 

1.     On the Lambda console, choose Create function.

2.     For Name, enter ECSRollbackfunction.

3.     For Runtime, choose Python 3.6.

4.     For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.

5.     Choose Create function.

6.     On the Configuration view, under Function code, enter the following code.

 

import boto3

ecs = boto3.client('ecs', region_name='us-west-2')

def lambda_handler(event, context):
    service_name = event['service_name']
    cluster_name = event['cluster_name']

    _services = ecs.describe_services(cluster=cluster_name, services=[service_name])
    task_definition = _services['services'][0][u'taskDefinition']
    previous_task_definition = get_previous_task_definition(task_definition)

    ecs.update_service(cluster=cluster_name, service=service_name, taskDefinition=previous_task_definition)
    print("Rollback Complete")
    return {"Rollback": True}

def get_previous_task_definition(task_definition):
    # A task definition ARN ends with ":<revision>"; decrement it to get the previous revision
    previous_version_number = str(int(task_definition.split(':')[-1]) - 1)
    previous_task_definition = ':'.join(task_definition.split(':')[:-1]) + ':' + previous_version_number
    return previous_task_definition


7.     Under Basic Settings, set Timeout to 30 seconds.

8.     Choose Save.

Creating the Step Functions state machine

Next, we create a Step Functions state machine that performs the following steps:

 

1.     Collect events of a specified service for a specified duration from the event stream of the Amazon ECS cluster.

2.     Calculate the percentage of failures after the deployment.

3.     If the failure percentage is greater than a specified threshold, roll back the service to the previous task definition.

 

To create the state machine:

1.     Open the Step Functions console and choose Create state machine.

2.     For Name, enter ECSAutoRollback.

For IAM role, keep the default selection of Create a role for me and select the check box. This will create a new IAM role with necessary permissions for the execution of the state machine.

Note
If you have already created a Step Functions state machine, IAM Role is populated.

3.     For State machine definition, enter the following code, replacing the Amazon Resource Name (ARN) placeholders with the ARNs of the three Lambda functions that you created.

{
    "StartAt": "VerifyClusterAndService",
    "States":
    {
        "VerifyClusterAndService":
        {
            "Type": "Choice",
            "Choices": [
            {
                "And": [
                {
                    "Variable": "$.detail.requestParameters.cluster",
                    "StringEquals": "AutoRollbackTestCluster"
                },
                {
                    "Variable": "$.detail.requestParameters.service",
                    "StringEquals": "Nginx-Web-Service"
                }],
                "Next": "GetTasksStatus"
            },
            {
                "Not":
                {
                    "And": [
                    {
                        "Variable": "$.detail.requestParameters.cluster",
                        "StringEquals": "AutoRollbackTestCluster"
                    },
                    {
                        "Variable": "$.detail.requestParameters.service",
                        "StringEquals": "Nginx-Web-Service"
                    }]
                },
                "Next": "EndState"
            }]
        },
        "GetTasksStatus":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSEventCollector-Lambda-Function>",
            "Next": "WaitForInterval"
        },
        "WaitForInterval":
        {
            "Type": "Wait",
            "Seconds": 5,
            "Next": "IntervalCheck"
        },
        "IntervalCheck":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.seconds_from_start",
                "NumericGreaterThan": 300,
                "Next": "FailureCalculator"
            },
            {
                "Variable": "$.seconds_from_start",
                "NumericLessThan": 300,
                "Next": "GetTasksStatus"
            }]
        },
        "FailureCalculator":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSFailureCalculator-Lambda-Function-here>",
            "Next": "RollbackDecider"
        },
        "RollbackDecider":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.failure_percentage",
                "NumericGreaterThan": 10,
                "Next": "RollBackDeployment"
            },
            {
                "Variable": "$.failure_percentage",
                "NumericLessThan": 10,
                "Next": "EndState"
            }]
        },
        "RollBackDeployment":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSRollbackFunction-Lambda-Function-here>",
            "Next": "EndState"
        },
        "EndState":
        {
            "Type": "Succeed"
        }
    }
}

4.     Choose Create state machine.

 

We now have a mechanism that rolls back a deployment to a specific Amazon ECS service if the percentage of errors after the deployment exceeds a configurable threshold.

(Optional) Monitoring and rolling back all services in the Amazon ECS cluster

The state machine definition hard-codes the Amazon ECS cluster and service names, so only a specific service in the cluster is monitored. The following image shows these lines in the state machine’s definition.

If you want to monitor all services and automatically roll back any Amazon ECS deployment in the cluster based on failures, modify the state machine definition to verify only the cluster name and not the service name. To do this, remove the service name check in the definition, as shown in the following image.

The following code verifies only the cluster name. It monitors any Amazon ECS service and performs a rollback if there are errors.

{
    "StartAt": "VerifyClusterAndService",
    "States":
    {
        "VerifyClusterAndService":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.detail.requestParameters.cluster",
                "StringEquals": "AutoRollbackTestCluster",
                "Next": "GetTasksStatus"
            },
            {
                "Not":
                {
                    "Variable": "$.detail.requestParameters.cluster",
                    "StringEquals": "AutoRollbackTestCluster"
                },
                "Next": "EndState"
            }]
        },
        "GetTasksStatus":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSEventCollector-Lambda-Function>",
            "Next": "WaitForInterval"
        },
        "WaitForInterval":
        {
            "Type": "Wait",
            "Seconds": 5,
            "Next": "IntervalCheck"
        },
        "IntervalCheck":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.seconds_from_start",
                "NumericGreaterThan": 300,
                "Next": "FailureCalculator"
            },
            {
                "Variable": "$.seconds_from_start",
                "NumericLessThan": 300,
                "Next": "GetTasksStatus"
            }]
        },
        "FailureCalculator":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSFailureCalculator-Lambda-Function-here>",
            "Next": "RollbackDecider"
        },
        "RollbackDecider":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.failure_percentage",
                "NumericGreaterThan": 10,
                "Next": "RollBackDeployment"
            },
            {
                "Variable": "$.failure_percentage",
                "NumericLessThan": 10,
                "Next": "EndState"
            }]
        },
        "RollBackDeployment":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSRollbackFunction-Lambda-Function-here>",
            "Next": "EndState"
        },
        "EndState":
        {
            "Type": "Succeed"
        }
    }
}

 

Configuring the state machine to execute automatically upon Amazon ECS deployment

Next, we configure a trigger for the state machine so that its execution automatically starts when there is an Amazon ECS deployment. We use Amazon CloudWatch to configure the trigger.

 

1.     Open the CloudWatch console and choose Rules in the navigation pane.

2.     Choose Create rule and use the following settings:

·      Event Source

o   Service Name: EC2 Container Service (ECS)

o   Event Type: AWS API Call via CloudTrail

o   Operations: choose Specific Operations and enter UpdateService

·      Targets

o   Step Functions state machine

o   State machine: ECSAutoRollback

3.     Choose Configure details.

4.     For Name, enter ECSServiceUpdateRule.

5.     For Description, enter an appropriate description.

6.     For State, make sure that Enabled is selected.

7.     Choose Create rule.
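If you prefer to configure the rule with code instead of the console, the following boto3 sketch is roughly equivalent. The state machine ARN and the role ARN are placeholders; the role must allow CloudWatch Events to call states:StartExecution (the console creates such a role for you when you pick a Step Functions target), and the Region is assumed to be us-west-2.

import json
import boto3

events = boto3.client('events', region_name='us-west-2')  # Region assumed

# Match ECS UpdateService API calls recorded by CloudTrail
events.put_rule(
    Name='ECSServiceUpdateRule',
    EventPattern=json.dumps({
        "source": ["aws.ecs"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["ecs.amazonaws.com"],
            "eventName": ["UpdateService"]
        }
    }),
    State='ENABLED',
    Description='Starts the ECSAutoRollback state machine on ECS service updates'
)

# Send matching events to the Step Functions state machine
events.put_targets(
    Rule='ECSServiceUpdateRule',
    Targets=[{
        'Id': '1',
        'Arn': '<ARN-of-ECSAutoRollback-state-machine>',
        'RoleArn': '<ARN-of-role-that-allows-states-StartExecution>'
    }]
)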

 

Setting up the CloudWatch trigger is the last step in linking the Amazon ECS UpdateService events to the Step Functions state machine that we set up. With this step complete, we can move on to testing the solution.

Testing the solution

Let’s update the task definition and force a failure of the container health checks so that we can confirm that the deployment rollback occurs as expected.

 

To test the solution:

 

1.     Open the Amazon ECS console and choose Task Definitions in the navigation pane.

2.     Select the check box next to Web-Service-Definition and choose Create new revision.

3.     Under Container Definitions, choose Web-Service-Container.

4.     On the Edit container pane, under Healthcheck, update Command to

CMD-SHELL, wget http://localhost/does-not-exist.html && rm index.html || exit 1 

and choose Update.

5.     Choose Create. This creates the task definition revision.

6.     Open the Nginx-Web-Service page of the Amazon ECS console and choose Update.

7.     For Task Definition, select the latest revision.

8.    Keep the default values for the rest of the settings by choosing Next Step until you reach Review.

9.     Choose Update Service. This creates a new Amazon ECS deployment.

This service update triggers the CloudWatch rule, which in turn triggers the state machine. The state machine collects the Amazon ECS events for 300 seconds. If the percentage of errors due to health check failures is more than 10%, the deployment is automatically rolled back. You can verify this on the Step Functions console. On the Executions view, you should see a new execution triggered by the deployment, as shown in the following image.

Choose the execution to see the workflow in progress. After the workflow is complete, you can check the outcome of the workflow by choosing EndState in Visual Workflow. The output should show {"Rollback": true}.
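You can also check this programmatically. The following boto3 sketch lists the most recent executions of the state machine and prints the output of the latest one; the state machine ARN is a placeholder (copy it from the Step Functions console), and the Region is assumed.

import boto3

sfn = boto3.client('stepfunctions', region_name='us-west-2')  # Region assumed

state_machine_arn = '<ARN-of-ECSAutoRollback-state-machine>'

# The most recent execution is listed first
executions = sfn.list_executions(stateMachineArn=state_machine_arn, maxResults=5)['executions']
latest = executions[0]

result = sfn.describe_execution(executionArn=latest['executionArn'])
# 'output' is present once the execution has finished; after a rollback it contains {"Rollback": true}
print(latest['status'], result.get('output'))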

You can also verify in the service details that the service has been updated with the previous version of the task definition.

Conclusion

With this solution, you can detect issues with Amazon ECS deployments early on and automate failure responses. You can also integrate the solution into your existing systems by triggering an Amazon SNS notification to send email or SMS instead of rolling back the deployment automatically. Though this blog uses Amazon ECS, you can follow similar steps to have automatic rollback for AWS Fargate.

If you want to customize the duration for monitoring your deployments before deciding to roll back, or the error percentage threshold beyond which a rollback is triggered, modify the 300-second thresholds on $.seconds_from_start and the 10 percent thresholds on $.failure_percentage in the state machine definition, as highlighted in the following image.

Measuring service chargeback in Amazon ECS

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/measuring-service-chargeback-in-amazon-ecs/

Contributed by Subhrangshu Kumar Sarkar, Sr. Technical Account Manager, and Shiva Kumar Subramanian, Sr. Technical Account Manager

Amazon Elastic Container Service (ECS) users have been asking us for a way to allocate cost to the deployed services in a shared Amazon ECS cluster. This blog post can help customers think through different techniques to allocate costs incurred by running Amazon ECS services to owners who include specific teams or individual users. The post dives in to one technique that gives customers a granular way to allocate costs to Amazon ECS service owners.

Amazon ECS pricing models

Amazon ECS has two pricing models.  In the Amazon EC2 launch type model, you pay for the AWS resources (e.g., Amazon EC2 instances or Amazon EBS volumes) that you create to store and run your application. Right now, it’s difficult to calculate the aggregate cost of an Amazon ECS service that consists of multiple tasks. In the AWS Fargate launch type model, you pay for vCPU and memory resources that your containerized application requests. Although the user knows the cost that the tasks incur, there is no out-of-box way to associate that cost to a service.

Possible solutions

There are two possible solutions to this problem.

A. Billing based on the usage of container instances in a partitioned cluster.

One solution for service chargeback is to associate specific container instances with respective teams or customers. Then use task placement constraints to restrict the services that they deploy to only those container instances. The following image shows how this solution works.

Here, user A is allowed to deploy services only on the blue container instances, and user B only on the green ones. Both users can be charged based on the AWS resources that they use, for example, the EC2 instances and the ALB.

This solution is useful when you don’t want to host services from different teams or users on the same set of container instances. However, even though the Amazon ECS cluster is shared, the end users are still charged for the Amazon EC2 instances and other AWS resources that they use rather than for the exact vCPU and memory resources that their services consume. The disadvantage to this approach is that you could provision excess capacity for your users and end up wasting resources. You also need to use placement constraints in all of your task definitions.

B. Billing based on resource usage at the task level.

Another solution could be to develop a mechanism to let the Amazon ECS cluster owners calculate the aggregate cost of an Amazon ECS service that consists of multiple tasks. The solution would have a metering mechanism and a chargeback measurement. When deployed for Amazon EC2 launch type tasks, the metering mechanism tracks the vCPU and memory that Amazon ECS reserves in the tasks’ lifetime. Then, with the chargeback measurement, the cluster owner can associate a cost with these tasks based on the cost incurred by the container instances that they’re running on. The following image shows how this solution works.

Here, unlike the previous solution, both users can use all the container instances of the ECS cluster.

With this solution, customers can start using a shared Amazon ECS cluster to deploy their tasks on any of the container instances. After the solution has been deployed, the cost for a service can be calculated at any point in time, using the cluster and the service name as input parameters.

With Fargate tasks, the vCPU and memory usage details are already available in vCPU-hours and GB-hours, respectively. The chargeback measurement in the solution aggregates the CPU and memory reservation of all the tasks that ever ran as part of a service. It associates a cost to this aggregated CPU and memory reservation by multiplying it with Fargate’s per vCPU per hour and per GB per hour cost, respectively.
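As a rough illustration of that calculation, the following Python sketch aggregates per-task reservations into a service cost. The prices are placeholders only; look up the current Fargate per-vCPU-hour and per-GB-hour prices for your Region.

# Placeholder prices (USD); substitute the current Fargate prices for your Region
FARGATE_PRICE_PER_VCPU_HOUR = 0.05
FARGATE_PRICE_PER_GB_HOUR = 0.01


def fargate_task_cost(vcpu_reserved, memory_gb_reserved, runtime_hours):
    # Cost of a single task, based on its vCPU and memory reservation and lifetime
    vcpu_cost = vcpu_reserved * runtime_hours * FARGATE_PRICE_PER_VCPU_HOUR
    memory_cost = memory_gb_reserved * runtime_hours * FARGATE_PRICE_PER_GB_HOUR
    return vcpu_cost + memory_cost


def fargate_service_cost(tasks):
    # Aggregate over every task that ever ran as part of the service
    return sum(fargate_task_cost(t['vcpu'], t['memory_gb'], t['hours']) for t in tasks)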

This solution has the following considerations:

  • Amazon EC2 pricing: For the base price of the container instance, we’re considering the On-Demand price.
  • Platform costs: Common costs for the cluster (the Amazon EBS volume that the containers are launched from, Amazon ECR, etc.) are treated as the platform cost for all of the services running on the cluster.
  • Networking cost: When you’re using bridge or host networking, there is no mechanism to divide costs among different tasks that are launched on the container instance.
  • Elastic Load Balancing or Application Load Balancer costs: If services sit behind multiple target groups of an Application Load Balancer, there is no direct way of dividing costs per target group.

Solution components

The solution has two components: a metering mechanism and a chargeback measurement.

The metering mechanism consists of the following parts:

  • Amazon CloudWatch Events rule
  • AWS Lambda function
  • Amazon DynamoDB table

The chargeback measurement consists of the following parts:

  • Python script
  • AWS Price List Service API

Metering mechanism

The following image shows the architecture of the solution’s metering mechanism.

The metering mechanism is set up and works as follows.

  1. The user creates a CloudWatch Events rule to trigger a Lambda function on an Amazon ECS task state change event. Typically, a task state change event is generated by a call to the StartTask, RunTask, or StopTask API operations, or when the Amazon ECS service scheduler starts or stops a task.
  2. The user creates a DynamoDB table that the Lambda function can update.
  3. Every time the Lambda function is invoked, it updates the DynamoDB table with the details of the Amazon ECS task.

With the first run of the metering mechanism, it takes stock of all running Amazon ECS tasks across all services across all clusters. This data resides in DynamoDB from then on, and the solution’s chargeback measurement uses it.

Chargeback measurement

The following image shows the architecture of the chargeback measurement.

When you need to find the cost associated with a service, run the ecs-chargeback Python script with the cluster and service names as parameters. This script performs the following actions.

  1. Find all the tasks that have ever run or are currently running as part of the service.
  2. For each task, calculate the up time.
  3. For each task, find the container instance type (for Amazon EC2 type tasks).
  4. Find what percentage of the host’s compute or memory resources the task has reserved. If there is no task-level CPU reservation for Amazon EC2 launch type tasks, a CPU reservation of 128 CPU shares (0.125 vCPUs) is assumed. In Amazon EC2 launch type tasks, you have to specify memory reservation at the task or container level during creation of the task definition.
  5. Associate that percentage with a cost.
  6. (Optional) Use the following parameters:
    • Duration: By default, the script shows the service cost for its complete uptime. You can use the duration parameter to get the cost for a particular month, the month to date, or the last n days.
    • Weight: This parameter is a weighted fraction that you can use to disproportionately divide the instance cost between vCPU and memory. By default, this value is 0.5.

The vCPU and memory costs are calculated using the following formulas:

  • Task vCPU cost = (task vCPU reservation/total vCPUs in the instance) * (cost of the instance) * (vCPU/memory weight) * task run time in seconds
  • Task memory cost = (task memory reservation/total memory in the instance) * (cost of the instance) * (1- vCPU/memory weight) * task run time in seconds
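A minimal Python sketch of these formulas follows. It assumes that “cost of the instance” means the instance’s On-Demand hourly price prorated to a per-second rate; the example instance size and price are purely illustrative.

def ec2_task_cost(task_vcpu, instance_vcpus, task_memory_mib, instance_memory_mib,
                  instance_hourly_price, runtime_seconds, weight=0.5):
    # Prorate the instance's hourly On-Demand price to a per-second rate
    instance_cost_per_second = instance_hourly_price / 3600.0
    vcpu_cost = (task_vcpu / instance_vcpus) * instance_cost_per_second \
        * weight * runtime_seconds
    memory_cost = (task_memory_mib / instance_memory_mib) * instance_cost_per_second \
        * (1 - weight) * runtime_seconds
    return vcpu_cost, memory_cost


# Example: a task reserving 0.25 vCPU and 512 MiB, on an instance with 2 vCPUs and
# 8 GiB of memory, running for one day at an illustrative price of 0.10 USD per hour
print(ec2_task_cost(0.25, 2, 512, 8192, 0.10, 24 * 3600))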

Solution deployment and cost measurement

Here are the steps to deploy the solution in your AWS account and then calculate the service chargeback.

Metering mechanics

1. Create a DynamoDB table named ECSTaskStatus to capture details of an ECS task state change CloudWatch event.

Primary partition key: taskArn. Type: string.

Provision RCUs or WCUs depending on your Amazon ECS usage.

For the rest, keep the default values.

aws dynamodb create-table --table-name ECSTaskStatus \
--attribute-definitions AttributeName=taskArn,AttributeType=S \
--key-schema AttributeName=taskArn,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=20

2. Create an IAM policy named LambdaECSTaskStatusPolicy that allows the Lambda function to make the following API calls. Create a local copy of the policy document LambdaECSTaskStatusPolicy.JSON from GitHub.

o   ecs:DescribeContainerInstances

o   dynamodb:BatchGetItem, BatchWriteItem, PutItem, GetItem, and UpdateItem

o   logs:CreateLogGroup, CreateLogStream, and PutLogEvents

aws iam create-policy --policy-name LambdaECSTaskStatusPolicy \
--policy-document file://LambdaECSTaskStatusPolicy.JSON

3. Create an IAM role named LambdaECSTaskStatusRole and attach the policy to the role. Replace <Policy ARN> with the Amazon Resource Name (ARN) of the IAM policy.

aws iam create-role --role-name LambdaECSTaskStatusRole \
--assume-role-policy-document \
'{ "Version": "2012-10-17", "Statement": { "Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}}'

aws iam attach-role-policy --policy-arn <Policy ARN> --role-name LambdaECSTaskStatusRole

4. Create a Lambda function named ecsTaskStatus that puts or updates the Amazon ECS task details in the ECSTaskStatus DynamoDB table; a minimal sketch of the handler follows the settings below. This function has the following details:

o   Runtime: Python 3.6.

o   Memory setting: 128 MB.

o   Timeout: 3 seconds.

o   Execution role: LambdaECSTaskStatusRole.

o   Code: ecsTaskStatus.py. Use the inline code editor on the Lambda console to author the function.
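The full implementation is in ecsTaskStatus.py in the GitHub repository. As a rough sketch of what the handler does (attribute names other than taskArn are illustrative; the real function records additional details, such as the resources reserved by the task), it looks something like this:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ECSTaskStatus')


def lambda_handler(event, context):
    # 'detail' carries the ECS task state change payload
    detail = event['detail']
    table.put_item(Item={
        'taskArn': detail['taskArn'],        # partition key of the ECSTaskStatus table
        'clusterArn': detail['clusterArn'],
        'lastStatus': detail['lastStatus'],  # RUNNING or STOPPED, per the event rule
        'eventTime': event['time'],
    })
    return detail['taskArn']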

 

5. Create a CloudWatch Events rule for Amazon ECS task state change events and configure the Lambda function as the target. The function puts or updates items in the ECSTaskStatus DynamoDB table with every Amazon ECS task’s details.

a.     Create the CloudWatch Events rule.

aws events put-rule --name ECSTaskStatusRule \
--event-pattern '{"source": ["aws.ecs"], "detail-type": ["ECS Task State Change"], "detail": {"lastStatus": ["RUNNING", "STOPPED"]}}'

b.     Add the Lambda function as a target to the CloudWatch Events rule. Replace <Lambda ARN> with the ARN of the Lambda function that you created in step 4.

aws events put-targets --rule ECSTaskStatusRule --targets "Id"="1","Arn"="<Lambda ARN>"

c.     Add permissions for CloudWatch Events to invoke Lambda. Replace <CW Events Rule ARN> with the ARN of the CloudWatch Events rule that you created in step 5a.

aws lambda add-permission --function-name ecsTaskStatus \
--action 'lambda:InvokeFunction' --statement-id "LambdaAddPermission" \
--principal events.amazonaws.com --source-arn <CW Events Rule ARN>

The solution invokes the Lambda function only when an Amazon ECS task state change event occurs. Therefore, when the solution is deployed, no event is raised for current running tasks, and task details aren’t populated into the DynamoDB table. If you want to meter current running tasks, you can run the script ecsTaskStatus-FirstRun.py after creation of the DynamoDB table. This populates all running tasks’ details into the DynamoDB table. The script is idempotent.

ecsTaskStatus-FirstRun.py --region eu-west-1

Chargeback measurement

To find the cost for running a service, run the Python script ecs-chargeback, which has the following usage and arguments.

./ecs-chargeback -h
usage: ecs-chargeback [-h] --region REGION --cluster CLUSTER --service SERVICE
                      [--weight WEIGHT] [-v]
                      [--month MONTH | --days DAYS | --hours HOURS]

optional arguments:
  -h, --help            show this help message and exit
  --region REGION, -r REGION
                        AWS Region in which Amazon ECS service is running.
  --cluster CLUSTER, -c CLUSTER
                        ClusterARN in which Amazon ECS service is running.
  --service SERVICE, -s SERVICE
                        Name of the AWS ECS service for which cost has to be
                        calculated.
  --weight WEIGHT, -w WEIGHT
                        Floating point value that defines CPU:Memory Cost
                        Ratio to be used for dividing EC2 pricing
  -v, --verbose
  --month MONTH, -M MONTH
                        Show charges for a service for a particular month
  --days DAYS, -D DAYS  Show charges for a service for last N days
  --hours HOURS, -H HOURS
                        Show charges for a service for last N hours

 

To calculate the cost that a service incurs with Amazon EC2 launch type tasks, run the script as follows.

./ecs-chargeback -r eu-west-1 -c ecs-chargeback -s nginxsvc

The following is sample output of running this script.

# ECS Region  : eu-west-1, ECS Service Name: nginxsvc
# ECS Cluster : arn:aws:ecs:eu-west-1:675410410211:cluster/ecs-chargeback
#
# Amazon ECS Service Cost           : 26.547270 USD
#             (Launch Type : EC2)
#         EC2 vCPU Usage Cost       : 21.237816 USD
#         EC2 Memory Usage Cost     : 5.309454 USD

To get the chargeback for Fargate launch type tasks, run the script as follows.

./ecs-chargeback -r eu-west-1 -c ecs-chargeback -s fargatesvc

The following is sample output of this script.


# ECS Region  : eu-west-1, ECS Service Name: fargatesvc
# ECS Cluster : arn:aws:ecs:eu-west-1:675410410211:cluster/ecs-chargeback
#
# Amazon ECS Service Cost           : 118.653359 USD
#             (Launch Type : FARGATE)
#         Fargate vCPU Usage Cost   : 78.998157 USD
#         Fargate Memory Usage Cost : 39.655201 USD

Conclusion

This solution can help Amazon ECS users track and allocate costs for their deployed workloads. It might also help them save some costs by letting them share an Amazon ECS cluster among multiple users or teams. We welcome your comments and questions below. Please reach out to us if you would like to contribute to the solution.

Introducing private registry authentication support for AWS Fargate

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/introducing-private-registry-authentication-support-for-aws-fargate/

Private registry authentication support for Amazon Elastic Container Service (Amazon ECS) is now available with the AWS Fargate launch type! Now, in addition to Amazon Elastic Container Registry (Amazon ECR), you can use any private registry or repository of your choice for both EC2 and Fargate launch types.

For ECS to pull from a private repository, it needs a secret in AWS Secrets Manager with your registry credentials, an ECS task execution IAM role in AWS Identity and Access Management (IAM) with a policy granting access to the secret, and a task definition that contains the secret and task execution IAM role ARNs.

Diagram of ECS Private Registry Authentication Architecture

Here’s how to use ECS with a private repository on Docker Hub via the AWS Management Console.

Registry

If you don’t already have a private repository (or account), you can create a free repo now. To follow along, run the following commands in a terminal to pull an image, get the image ID, and push it to your new repository:

docker pull tiffanyfay/space
docker images tiffanyfay/space --format {{.ID}}
docker tag <image-id> <your-username/repository-name>:latest
docker login
docker push <your-username/repository-name>

Secrets Manager

In the Secrets Manager console, store a new secret with your Docker Hub credentials, which is used to access your private repository.

By default, Secrets Manager creates an encryption key, DefaultEncryptionKey, on your behalf. You can instead use an existing key or add a new one with AWS Key Management Service (AWS KMS), if you would prefer.

Choose Other type of secrets and add secret keys and values for username and password.

Next, create a name, such as dockerhub, and description for your secret.

Because the keys correspond to your Docker Hub credentials, leave rotation disabled.

On the next page, you can review your settings and store your secret. Open your new secret to see the details. Write down the Secret ARN value and keep it handy, as it is used in the next step and later, in your task definition.

IAM

Now that you have a secret, you need to provide Fargate permissions to read it. This is done via a task execution IAM role.

In the IAM console, choose Policies, Create policy. Grant read access for the secretsmanager:GetSecretValue action, with your secret’s ARN as the resource.

Name your policy dockerhubsecret.
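If you prefer scripting this step, the following boto3 sketch creates an equivalent policy; the secret ARN is a placeholder for the value you noted earlier.

import json
import boto3

iam = boto3.client('iam')

secret_arn = '<secret-ARN>'  # placeholder: the ARN you noted in Secrets Manager

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": secret_arn
    }]
}

iam.create_policy(
    PolicyName='dockerhubsecret',
    PolicyDocument=json.dumps(policy_document),
    Description='Allows the ECS task execution role to read the Docker Hub secret'
)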

If you chose to use your own encryption key, you also need to create a policy with kms:Decrypt permissions for KMS.

Next, choose Roles, Create role to create the IAM role that is used as your task execution role. Choose AWS service, Elastic Container Service, and Elastic Container Service Task.

Search for your dockerhubsecret policy and attach it to the role.

Lastly, give the role a name, such as ecsExecutionRoleDockerHub, and create it. Copy the role ARN value. Depending on how you create your task definition, you may need it.

ECS

While the mechanism to authenticate private registries is supported on both EC2 and Fargate launch types, for this example we will be launching a task on Fargate.

Before you can create a task, you need an ECS cluster, VPC, and subnets. If you don’t already have them, in the ECS console, choose Clusters, Get Started. Keep track of the cluster name, VPC ID, and subnet IDs, as you use them soon.

It’s time to create your task definition, which is used to create your task (grouping of up to ten containers that run on the same host). This is where you need your Secrets Manager ARN and IAM role name.

Choose Task Definitions, Create new Task Definition, and select the Fargate launch type. You can then configure your task definition via the wizard or scroll down, choose Configure via JSON, and paste the following task definition after replacing the fields in angle brackets. This task definition also works with the EC2 launch type.

{
    "family": "space-td",
    "containerDefinitions": [
        {
            "name": "space",
            "image": "<your-username/repository-name>",
            "portMappings": [
                {
                    "protocol": "tcp",
                    "containerPort": 80
                }
            ],
            "cpu": 0,
            "repositoryCredentials": {
                "credentialsParameter": "<secret-ARN>"
            }
        }
    ],
    "memory": "512",
    "cpu": "256",
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "networkMode": "awsvpc",
    "executionRoleArn": "<execution-role-ARN>"
}

If you use the wizard, give your task a name, such as space-td, and specify your task execution IAM role (ecsExecutionRoleDockerHub), a task size of 0.5 GB of memory, and 0.25 vCPU.

Next, choose Container Definitions, Add container. Give the container a name, specify your image <your-username/repository-name>, check the box for private registry authentication, and add your secrets manager ARN and a container port 80. Choose Add.

After you create your task definition, choose Actions, Run Task, and specify the Fargate launch type, your cluster, the cluster VPC, your subnets, and a security group with inbound permissions for your container ports (the default one allows access on port 80). Enable auto-assignment of a public IP address.

Open the task from its ID to see the details:

When the Last status field is RUNNING, under Network, copy the public IP address and paste it in a browser.
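You can also look up the public IP address programmatically. The following boto3 sketch reads the task’s elastic network interface and prints its public IP; the cluster name, task ARN, and Region are placeholders.

import boto3

region = '<your-region>'
ecs = boto3.client('ecs', region_name=region)
ec2 = boto3.client('ec2', region_name=region)

task = ecs.describe_tasks(cluster='<your-cluster>', tasks=['<your-task-arn>'])['tasks'][0]

# awsvpc-mode tasks expose their ENI ID in the ElasticNetworkInterface attachment
attachment = next(a for a in task['attachments'] if a['type'] == 'ElasticNetworkInterface')
eni_id = next(d['value'] for d in attachment['details'] if d['name'] == 'networkInterfaceId')

eni = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])['NetworkInterfaces'][0]
print(task['lastStatus'], eni['Association']['PublicIp'])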

If you pushed tiffanyfay/space to your repository, you should see the following:

I hope this post has helped you. If you have any questions, feel free to reach out!

-tiffany

Special thanks to Yuling Zhou, Deepak Dayama, Derek Petersen, Varun Iyer, Adnan Khan and several others for their insights in this blog.

tiffany jernigan

@tiffanyfayj
Tiffany is a developer advocate at Amazon for containers on AWS. Previously she worked at Docker and Intel in software engineering and as a hardware engineer after graduating from Georgia Tech in Electrical Engineering. In the majority of her free time she dabbles in photography and spends time with family and friends. You can find her on twitter/ig as tiffanyfayj.

Compute Abstractions on AWS: A Visual Story

Post Syndicated from Massimo Re Ferre original https://aws.amazon.com/blogs/architecture/compute-abstractions-on-aws-a-visual-story/

When I joined AWS last year, I wanted to find a way to explain, in the easiest way possible, all the options it offers to users from a compute perspective. There are many ways to peel this onion, but I want to share a “visual story” that I have created.

I define the compute domain as “anything that has CPU and Memory capacity that allows you to run an arbitrary piece of code written in a specific programming language.” Your mileage may vary in how you define it, but this is broad enough that it should cover a lot of different interpretations.

A key part of my story is around the introduction of different levels of compute abstractions this industry has witnessed in the last 20 or so years.

Separation of duties

The start of my story is a line. In a cloud environment, this line defines the perimeter between the consumer role and the provider role. In the cloud, there are things that AWS will do and things that the consumer will do. The perimeter of these responsibilities varies depending on the services you opt to use. If you want to understand more about this concept, read the AWS Shared Responsibility Model documentation.

The different abstraction levels

The reason why the line above is oblique is because it needs to intercept different compute abstraction levels. If you think about what happened in the last 20 years of IT, we have seen a surge of different compute abstractions that changed the way people consume CPU and Memory resources. It all started with physical (x86) servers back in the 80s, and then we have seen the industry adding abstraction layers over the years (for example, hypervisors, containers, functions).

The higher you go in the abstraction levels, the more the cloud provider can add value and can offload the consumer from non-strategic activities. A lot of these activities tend to be “undifferentiated heavy lifting.” We define this as something that AWS customers have to do but that don’t necessarily differentiate them from their competitors (because those activities are table-stakes in that particular industry).

What we found is that supporting millions of customers on AWS requires a certain degree of flexibility in the services we offer because there are many different patterns, use cases, and requirements to satisfy. Giving our customers choices is something AWS always strives for.

A couple of final notes before we dig deeper. The way this story builds up through the blog post is aligned to the progression of the launch dates of the various services, with a few noted exceptions. Also, the services mentioned are all generally available and production-grade. For full transparency, the integration among some of them may still be work-in-progress, which I’ll call out explicitly as we go.

The instance (or virtual machine) abstraction

This is the very first abstraction we introduced on AWS back in 2006. Amazon Elastic Compute Cloud (Amazon EC2) is the service that allows AWS customers to launch instances in the cloud. When customers intercept us at this level, they retain responsibility of the guest operating system and above (middleware, applications, etc.) and their lifecycle. AWS has the responsibility for managing the hardware and the hypervisor including their lifecycle.

At the very same level of the stack there is also Amazon Lightsail, which “is the easiest way to get started with AWS for developers, small businesses, students, and other users who need a simple virtual private server (VPS) solution. Lightsail provides developers compute, storage, and networking capacity and capabilities to deploy and manage websites and web applications in the cloud.”

And this is how these two services appear in our story:

The container abstraction

With the rise of microservices, a new abstraction took the industry by storm in the last few years: containers. Containers are not a new technology, but the rise of Docker a few years ago democratized access. You can think of a container as a self-contained environment with soft boundaries that includes both your own application as well as the software dependencies to run it. Whereas an instance (or VM) virtualizes a piece of hardware so that you can run dedicated operating systems, a container technology virtualizes an operating system so that you can run separated applications with different (and often incompatible) software dependencies.

And now the tricky part. Modern containers-based solutions are usually implemented in two main logical pieces:

  • A containers control plane that is responsible for exposing the API and interfaces to define, deploy, and lifecycle containers. This is also sometimes referred to as the container orchestration layer.
  • A containers data plane that is responsible for providing capacity (as in CPU/Memory/Network/Storage) so that those containers can actually run and connect to a network. From a practical perspective this is typically a Linux host or less often a Windows host where the containers get started and wired to the network.

Arguably, in a specific compute abstraction discussion, the data plane is key, but it is as important to understand what’s happening for the control plane piece.

In 2014, Amazon launched a production-grade containers control plane called Amazon Elastic Container Service (ECS), which “is a highly scalable, high performance container management service that supports Docker … Amazon ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure.”

In 2017, Amazon also announced the intention to release a new service called Amazon Elastic Container Service for Kubernetes (EKS) based on Kubernetes, a successful open source containers control plane technology. Amazon EKS was made generally available in early June 2018.

Just like for ECS, the aim for this service is to free AWS customers from having to manage a containers control plane. In the past, AWS customers would spin up EC2 instances and deploy/manage their own Kubernetes masters (masters is the name of the Kubernetes hosts running the control plane) on top of an EC2 abstraction. However, we believe many AWS customers will leave the burden of managing this layer to AWS by consuming either ECS or EKS, depending on their use cases. A comparison between ECS and EKS is beyond the scope of this blog post.

You may have noticed that what we have discussed so far is about the container control plane. How about the containers data plane? This is typically a fleet of EC2 instances managed by the customer. In this particular setup, the containers control plane is managed by AWS while the containers data plane is managed by the customer. One could argue that, with ECS and EKS, we have raised the abstraction level for the control plane, but we have not yet really raised the abstraction level for the data plane as the data plane is still comprised of regular EC2 instances that the customer has responsibility for.

There is more on that later on but, for now, this is how the containers control plane and the containers data plane services appear:

The function abstraction

At re:Invent 2014, AWS introduced another abstraction layer: AWS Lambda. Lambda is an execution environment that allows an AWS customer to run a single function. So instead of having to manage and run a full-blown OS instance to run your code, or having to track all software dependencies in a user-built container to run your code, Lambda allows you to upload your code and let AWS figure out how to run it at scale.

What makes Lambda so special is its event-driven model. Not only can you invoke Lambda directly (for example, via the Amazon API Gateway), but you can trigger a Lambda function upon an event in another AWS service (for example, an upload to Amazon S3 or a change in an Amazon DynamoDB table).

The key point about Lambda is that you don’t have to manage the infrastructure underneath the function you are running. No need to track the status of the physical hosts, no need to track the capacity of the fleet, no need to patch the OS where the function will be running. In a nutshell, no need to spend time and money on the undifferentiated heavy lifting.

And this is how the Lambda service appears:

The bare metal abstraction

Also known as the “no abstraction.”

As recently as re:Invent 2017, we announced (the preview of) the Amazon EC2 bare metal instances. We made this service generally available to the public in May 2018.

This announcement is part of Amazon’s strategy to provide choice to our customers. In this case, we are giving customers direct access to hardware. To quote from Jeff Barr’s post:

“…. (AWS customers) wanted access to the physical resources for applications that take advantage of low-level hardware features such as performance counters and Intel® VT that are not always available or fully supported in virtualized environments, and also for applications intended to run directly on the hardware or licensed and supported for use in non-virtualized environments.”

This is how the bare metal Amazon EC2 i3.metal instance appears:

As a side note, and also as alluded to by Jeff, i3.metal is the foundational EC2 instance type on top of which VMware created their own VMware Cloud on AWS service. We now offer any AWS user the ability to provision bare metal instances. This doesn’t necessarily mean you can load your hypervisor of choice out of the box, but you can certainly do things you wouldn’t be able to do with a traditional EC2 instance (note: this was just a Saturday afternoon hack).

More seriously, a question I get often asked is whether users could install ESXi on i3.metal on their own. Today this cannot be done, but I’d be interested in hearing your use case for this.

The full container abstraction (for lack of a better term)

Now that we covered all the abstractions, it is time to go back and see if there are other optimizations we can provide for AWS customers. When we discussed the container abstraction, we called out that while there are two different fully managed containers control planes (ECS and EKS), there wasn’t a managed option for the data plane.

Some customers were (and still are) happy about being in full control of said instances. Others have been very vocal that they wanted to get out of the (undifferentiated heavy-lifting) business of managing the lifecycle of that piece of infrastructure.

Enter AWS Fargate, a production-grade service that provides compute capacity to AWS containers control planes. Practically speaking, Fargate is making the containers data plane fall into the “Provider space” responsibility. This means the compute unit exposed to the user is the container abstraction, while AWS will manage transparently the data plane abstractions underneath.

This is how the Fargate service appears:

Now ECS has two “launch types”: one called “EC2” (where your tasks get deployed on a customer-managed fleet of EC2 instances), and the other one called “Fargate” (where your tasks get deployed on an AWS-managed fleet of EC2 instances).

For EKS, the strategy will be very similar, but as of this writing it was not yet available. If you’re interested in some of the exploration being done to make this happen, this is a good read.

Conclusions

We covered the spectrum of abstraction levels available on AWS and how AWS customers can intercept them depending on their use cases and where they sit on their cloud maturity journey. Customers with a “lift & shift” approach may be more inclined to consume services on the left-hand side of the slide, whereas customers with a more mature cloud native approach may be more interested in consuming services on the right-hand side of the slide.

In general, customers tend to use higher-level services to get out of the business of managing non-differentiating activities. For example, I recently talked to a customer interested in using Fargate. The trigger there was the fact that Fargate is ISO, PCI, SOC and HIPAA compliant, which was a huge time and money saver for them because it’s easier to point to an AWS document during an audit than having to architect and document for compliance the configuration of a DIY containers data plane.

As a recap, here’s our visual story with all the abstractions available:

I hope you found it useful. Any feedback is greatly appreciated.

About the author

Massimo is a Principal Solutions Architect at AWS. For about 25 years, he specialized on the x86 ecosystem starting with operating systems and virtualization technologies, and lately he has been head down learning about cloud and how application architectures are evolving in that space. Massimo has a blog at www.it20.info and his Twitter handle is @mreferre.

Hosting ASP.NET Core applications in Amazon ECS using AWS Fargate

Post Syndicated from Sundar Narasiman original https://aws.amazon.com/blogs/compute/hosting-asp-net-core-applications-in-amazon-ecs-using-aws-fargate/

There is an increasing amount of customer interest in hosting microservices-based applications using Amazon Elastic Container Service (ECS), largely due to the benefits offered by AWS Fargate.

AWS Fargate is a compute engine for containers that allows you to run containers without needing to provision, manage, or scale any Amazon EC2 compute infrastructure. Fargate works with Amazon ECS and can run microservices developed in many programming languages or application frameworks. This includes Java, .NET Core, Python, Node.js, Go, or Ruby on Rails. Nowadays, enterprises that build microservices applications using .NET choose .NET Core because of its cross-platform support (the ability to run on Linux).

In this post, I cover how to host a cross-platform ASP.NET core application using AWS Fargate.

Reference architecture

A good reference architecture for AWS Fargate application deployment should cover the VPC, Subnets, Load Balancer, Internet Gateway, Elastic Network Interface (ENI), AWS Fargate Task, Network ACLs, and Security Groups. The architectural choices for VPC Networking, Load Balancing, and Container Networking are also important.

There are a couple of networking approaches for deploying containers in Amazon ECS:

  • Deploy containers in the public VPC Subnet with direct Internet access
  • Deploy containers in the private VPC Subnet without direct Internet access

Because the ASP.NET Core application is going to serve traffic from the Internet, we will deploy containers in the Public VPC Subnet with direct Internet access.

When it comes to sending traffic to containers through the Load Balancer, the following options are available:

  • A public Load Balancer that accepts traffic from the Internet and routes it to the container through the AWS Fargate Task’s Elastic Network Interface (ENI).
  • A private, Internal Load Balancer that only accepts traffic from other containers in the cluster

Because the ASP.NET Core application container lives in the web tier, go with a public Load Balancer. The public Load Balancer accepts traffic from the Internet and routes it to the container through the AWS Fargate Task’s Elastic Network Interface (ENI).

Based on these considerations, the reference architecture for deploying to AWS Fargate should look like this diagram:

This solution deploys containers in a public Subnet (inside a VPC). The AWS Fargate Task and the two containers are hosted with direct access to the internet. They are also accessible to clients, using the public Load Balancer.

Walkthrough

To implement this architecture, we will do the following:

  1. Containerize the ASP.NET core application.
  2. Configure the reverse-proxy server.
  3. Containerize the NGINX reverse-proxy server.
  4. Create the Docker Compose file.
  5. Push container images to Amazon ECR.
  6. Create the ECS cluster.
  7. Create an Application Load Balancer.
  8. Create an AWS Fargate Task definition.
  9. Create the Amazon ECS service.

Code examples

The code examples, Dockerfile definition, Docker Compose file, and ECS task definition for this solution are available in the amazon-ecs-fargate-aspnetcore GitHub repository.

Pre-requisites

The development environment needs to have the following prerequisites:

  • Mac OS (latest version), Windows 10 with the latest updates, or Ubuntu 16.04 or higher
  • .NET Core 2.0 or higher
  • Docker (latest version)
  • AWS CLI
  • Amazon ECS CLI (ecs-cli)

Containerize the ASP.NET Core application

The first step in this journey is to containerize the ASP.NET Core application.

If you are using Visual Studio 2017 or later with the latest updates in Windows, you can add container support to the solution. Open the context (right-click) menu for the existing project and add Docker support.

If you are developing in Linux or Mac OS, you must explicitly add a Dockerfile.

The Dockerfile definition should look like the following, irrespective of the operating system used for development.

FROM microsoft/aspnetcore:2.0
WORKDIR /mymvcweb
COPY bin/Release/netcoreapp2.0/publish . 
ENV ASPNETCORE_URLS http://+:5000
EXPOSE 5000
ENTRYPOINT ["dotnet", "mymvcweb.dll"]

This Dockerfile definition creates an application container based on the microsoft/aspnetcore:2.0 base image. It publishes the contents of the bin/Release folder to a specified work directory, starts the default Kestrel web server and listens on port 5000 to serve web traffic.

By default, ASP.NET core uses Kestrel as the web server. Kestrel is a lightweight HTTP server and is great for serving dynamic content from ASP.NET core. However, for capabilities such as serving static content, caching requests, compressing requests, and terminating SSL from the HTTP server, a dedicated reverse-proxy server like NGINX is required.

Configure the reverse-proxy server

NGINX can act as both the HTTP and reverse-proxy server. NGINX is highly adopted because of its asynchronous, event-driven architecture that allows it to serve thousands of concurrent requests with a low-memory footprint.

In this solution, deploy a NGINX (reverse-proxy server) container in front of the application (ASP.NET core) container, defined in the AWS Fargate Task.

The reverse-proxy configuration file nginx.conf should be defined as follows:

worker_processes 4;
 
events { worker_connections 1024; }
 
http {
    sendfile on;
 
    upstream app_servers {
        server 127.0.0.1:5000;
    }
 
    server {
        listen 80;
 
        location / {
            proxy_pass         http://app_servers;
            proxy_redirect     off;
            proxy_set_header   Host $host;
            proxy_set_header   X-Real-IP $remote_addr;
            proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header   X-Forwarded-Host $server_name;
        }
    }
}

The NGINX container is set to listen on port 80 and it is configured to forward requests to the application container listening on port 5000. The server entry in the upstream app_servers block of the nginx.conf file must be set to mymvcweb:5000 in the local development environment.

Containerize the NGINX reverse-proxy server

Create a Dockerfile definition like the following to containerize the NGINX reverse-proxy server. It should look like the following:

FROM nginx
COPY nginx.conf /etc/nginx/nginx.conf

Create the Docker Compose file

Next, use docker-compose to define these two containers as a microservices in the local development environment. The Docker Compose file should look like the following:

version: '2'
services:
  mymvcweb:
    build:
      context: ./mymvcweb
      dockerfile: Dockerfile
    expose:
      - "5000"
  reverseproxy:
    build:
      context: ./reverseproxy
      dockerfile: Dockerfile
    ports:
      - "80:80"
    links :
      - mymvcweb

These two containers can be built and tested by issuing the following docker-compose commands:

docker-compose build
docker-compose up

Open http://localhost:80 in the browser and it should render the default view of index.cshtml. Whenever there is a change to the application code or container definition, the docker-compose cache should be cleaned to pick up the latest changes. To do this, run the following commands:

docker-compose stop
docker-compose rm
docker rmi <container-image-id>

Push container images to Amazon ECR

Next, push the container images from the local environment to Amazon Elastic Container Registry (ECR) so that the container images are available in Amazon ECR before the creation of AWS Fargate cluster.

Before you deploy this application to ECS, the server entry in the upstream app_servers block of the nginx.conf file must be set to 127.0.0.1:5000. Containers in the same Fargate task share a network namespace, so the reverse proxy can reach the upstream application container listening on port 5000 over localhost.

The first step to push the container images to ECR is to fetch the docker login command with the required security tokens. Run the following command:

aws ecr get-login --no-include-email --region us-east-1

It should return you a Docker login command with a security token. Copy the command and tokens and run it.

The second step is to tag the local container image with the remote ECR repository. Run the following command:

docker tag aspnetcorefargate_mymvcweb:latest <yourawsaccountnumber>.dkr.ecr.us-east-1.amazonaws.com/mymvcweb:latest

The third step is to push the tagged image to the remote ECR registry. Run the following command:

docker push <yourawsaccountnumber>.dkr.ecr.us-east-1.amazonaws.com/mymvcweb:latest

The above steps are repeated for the NGINX container as well. Now you have the container images available in ECR.

Create the Amazon ECS cluster

The Amazon ECS cluster is a logical grouping for AWS Fargate and Amazon ECS tasks. The cluster remains an administrative boundary for running every application.

In the AWS Management Console, navigate to Create Cluster and select Networking only.
Because the service is launched with the AWS Fargate launch type, the ECS cluster is purely a logical boundary: you do not create or manage any ECS container instances. You only need to create the cluster along with the required networking constructs, such as a VPC and subnets.

Name the cluster and select the option to create a new VPC for this cluster.

Leave the rest of the fields as their default values. You now have a VPC with two public subnets.

Create an Application Load Balancer

Next, create an Application Load Balancer, as defined in the reference architecture. The Application Load Balancer is required to load balance across multiple AWS Fargate tasks.

In the EC2 console, navigate to Create Load Balancer. Name your Load Balancer as aspnetcorefargatealb.

For Scheme, select internet-facing. For IP address type, choose ipv4. The Load Balancer listens on port 80 (HTTP). The Load Balancer’s Security Group should also allow traffic on port 80 (HTTP) from the internet.

While configuring the routing for the Load Balancer, for Target type, choose ip. For Protocol, choose HTTP. For Path, enter / (forward slash).

For more information, see Creating an Application Load Balancer.
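
If you prefer the AWS CLI, the following is a minimal sketch of the same setup. The subnet, security group, VPC, and ARN values are placeholders to replace with the resources from the cluster VPC created earlier, and the target group name aspnetcorefargatetg is an assumption:

# Create the internet-facing Application Load Balancer in the two public subnets.
aws elbv2 create-load-balancer --name aspnetcorefargatealb \
  --scheme internet-facing --type application --ip-address-type ipv4 \
  --subnets <subnet-1> <subnet-2> --security-groups <alb-security-group>

# Create a target group with target type "ip", as required for Fargate tasks.
aws elbv2 create-target-group --name aspnetcorefargatetg \
  --protocol HTTP --port 80 --target-type ip --vpc-id <vpc-id>

# Forward HTTP traffic on port 80 to the target group.
aws elbv2 create-listener --load-balancer-arn <load-balancer-arn> \
  --protocol HTTP --port 80 \
  --default-actions Type=forward,TargetGroupArn=<target-group-arn>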

Create an AWS Fargate Task definition

The AWS Fargate task definition is an important resource that acts as a blueprint for the AWS Fargate task. It defines parameters such as:

  • Container image URL
  • CPU
  • Memory
  • IAM execution role
  • Host port
  • Container port
  • Log configurations
  • Container networking mode
  • Task type
  • Mount point
  • Volume

A Fargate task is a running instance of a task definition. Each task represents a microservice, and tasks can be managed and independently scaled through a service, which is explained in the upcoming sections.

In the console, choose Task Definitions, Create new Task Definition. For more information, see Creating a Task Definition.

Use the following AWS Fargate task definition, which is based on the reference architecture defined for this walkthrough. Replace <awsaccount> with your own account ID.

{
  "executionRoleArn": "arn:aws:iam::<awsaccount>:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/aspnetcorefargatetask",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 0,
      "environment": [],
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": 1024,
      "volumesFrom": [],
      "image": "<awsaccount>.dkr.ecr.us-east-1.amazonaws.com/reverseproxy: latest",
      "disableNetworking": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "privileged": null,
      "name": "reverseproxy"
    },
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/aspnetcorefargatetask",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 5000,
          "protocol": "tcp",
          "containerPort": 5000
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 0,
      "environment": [],
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": 1024,
      "volumesFrom": [],
      "image": "<awsaccount>.dkr.ecr.us-east-1.amazonaws.com/mymvcweb:latest",
      "disableNetworking": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "privileged": null,
      "name": "mymvcweb"
    }
  ],
  "placementConstraints": [],
  "memory": "2048",
  "taskRoleArn": "arn:aws:iam::<awsaccount>:role/aspnetecstaskroles",
  "compatibilities": [
    "EC2",
    "FARGATE"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:<awsaccount>:task-definition/aspnetcorefargatetask:1",
  "family": "aspnetcorefargatetask",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-ecr-pull"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.task-eni"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.ecr-auth"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "revision": 1,
  "status": "ACTIVE",
  "volumes": []
}

The above task definition contains two containers: the ASP.NET Core application and the NGINX reverse-proxy server. Currently, awsvpc is the only networking mode supported for AWS Fargate tasks. When a Fargate task is launched, the ECS container network plugin assigns a dedicated Elastic Network Interface (ENI) to the task. This ENI does not share the global default network namespace with ECS instances.

You also specify the subnets in which the tasks are placed. The security group that you choose is applied to each task's ENI, which enables communication between AWS Fargate tasks and with other resources within the VPC. Because of the awsvpc network mode, traffic from AWS Fargate tasks does not go through the Docker bridge network.
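
If you would rather register the task definition from the CLI than through the console, a sketch looks like the following. It assumes the JSON above is saved as aspnetcorefargatetask.json with the read-only fields (taskDefinitionArn, revision, status, compatibilities, and requiresAttributes) removed, because register-task-definition does not accept them:

# Register the task definition from a local JSON file.
aws ecs register-task-definition --cli-input-json file://aspnetcorefargatetask.json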

Create the Amazon ECS service

An Amazon ECS service runs and maintains your AWS Fargate tasks. The desired state of the application, such as the number of tasks to keep running, is defined through the service. For more information, see Create a service.

In the console, choose Task Definitions and select the task definition that you just created.

On the Task Definition [name] page, select the revision of the task definition from which to create your service.

Review the task definition, and choose Actions, Create Service. For Launch type, choose FARGATE. Enter values for the rest of the fields:

  • Platform version: LATEST
  • Cluster: aspcorefargatecluster (or the cluster name you chose)
  • Service name: aspcorefargatesvc (or another name of your choice)
  • Number of tasks: 2
  • Minimum healthy percent: 50
  • Maximum percent: 200

On the Configure networking page, select the required VPC and subnets required for running the tasks.

Register the Application Load Balancer (ALB) that you created, using the reverseproxy container on port 80 as the load-balanced container. The ECS service scheduler integrates seamlessly with the ALB and registers each task's IP address as a target.
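
For reference, here is a CLI sketch of creating the same service. The subnet, security group, and target group ARN values are placeholders, and the container name and port must match the reverseproxy container definition above:

# Create the Fargate service behind the ALB target group.
aws ecs create-service \
  --cluster aspcorefargatecluster \
  --service-name aspcorefargatesvc \
  --task-definition aspnetcorefargatetask:1 \
  --launch-type FARGATE \
  --platform-version LATEST \
  --desired-count 2 \
  --deployment-configuration "maximumPercent=200,minimumHealthyPercent=50" \
  --network-configuration "awsvpcConfiguration={subnets=[<subnet-1>,<subnet-2>],securityGroups=[<task-security-group>],assignPublicIp=ENABLED}" \
  --load-balancers "targetGroupArn=<target-group-arn>,containerName=reverseproxy,containerPort=80"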

Then, configure Service Auto Scaling. Even though this is an optional feature, I recommend enabling service-level scaling. It addresses the key tenets of how a microservice should behave at runtime. For more information, see (Optional) Configuring Your Service to Use Service Auto Scaling.

I’m defining the minimum number of tasks as 2, the desired number of tasks as 2, and the maximum number of tasks as 3, as in the CLI sketch below.
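
Here is a CLI sketch of that scaling configuration, using Application Auto Scaling with a target-tracking policy on average CPU utilization; the 75% target value is an arbitrary assumption:

# Register the service as a scalable target with a minimum of 2 and maximum of 3 tasks.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/aspcorefargatecluster/aspcorefargatesvc \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 --max-capacity 3

# Attach a target-tracking policy that scales the service on average CPU utilization.
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/aspcorefargatecluster/aspcorefargatesvc \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{"TargetValue":75.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"}}'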

Complete the Amazon ECS Service creation.

When the Amazon ECS Task gets placed, the ECS scheduler registers the Task as a target for the Load Balancer.

When the Task is healthy and passes the Load Balancer health checks, it is reflected in the healthy host count.

Open the DNS name of the Load Balancer in the browser. The ASP.NET Core application should render successfully.

Conclusion

In this post, we took an existing ASP.NET Core application, containerized it, and hosted it in Amazon ECS as a microservice using the AWS Fargate compute engine. AWS Fargate gives you a way to run containers directly without managing any EC2 instances, while still giving you full control over how the task is defined, including task networking and resources.

If you have questions or suggestions, please comment below.

Sundararajan Narasiman is an AWS Partner Solutions Architect

Refreshing an Amazon ECS Container Instance Cluster With a New AMI

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/refreshing-an-amazon-ecs-container-instance-cluster-with-a-new-ami/

This post contributed by Subhrangshu Kumar Sarkar, Sr. Technical Account Manager at AWS

The Amazon ECS–optimized Amazon Machine Image (AMI) comes prepackaged with the Amazon Elastic Container Service (ECS) container agent, Docker, and the ecs-init service. When updates to these components are released, try to integrate them as quickly as possible. Doing so helps you maintain a safe, secure, and reliable environment for running your containers.

Each release of the ECS–optimized AMI includes bug fixes and feature updates. AWS recommends refreshing your container instance fleet with the latest AMI whenever possible, rather than trying to patch instances in-place. Periodic replacement of your ECS instances aligns with the immutable infrastructure paradigm, which is less prone to human error. It’s also less susceptible to configuration drift because infrastructure is managed through code.

In this post, I show you how to manually refresh the container instances in an active ECS cluster with new container instances built from a newly released AMI. You also see how to refresh the ECS instance fleet when it is part of an Auto Scaling group, and when it is not.

Solution Overview

The following flow chart shows the strategy to be used in refreshing the cluster.

Prerequisites

  • An AWS account with enough EC2 capacity to launch as many additional instances as there are container instances in the ECS cluster, on top of the instances you already run, for the duration of the refresh. For example, if you have a total of 10 t2.medium instances in an AWS Region where an ECS cluster with four container instances is running, you should be able to launch four more t2.medium instances. Your instance count comes back down to 10 after the old instances are deregistered and terminated at the end of the refresh period.
  • An existing ECS cluster (preferably with one or more container instances built with an old AMI), with or without a service running on it.
  • A Linux system with the AWS CLI and JQ installed. This allows you to try the programmatic method of refreshing the cluster. You can SSH into an EC2 virtual machine if you do not have local access to a Linux system.
  • An IAM user with permissions to view ECS resources, deregister and terminate the ECS instances, revise a task definition, and update a service.
  • A specified AWS Region. In this post, the cluster is in us-east-1 and that is the region for all AWS CLI commands mentioned.

Use the following steps to test if you have all the resources and permissions to proceed.

Using the AWS CLI

Run the following command:

# aws ecs list-clusters

Sample output:

{
    "clusterArns": [
        "arn:aws:ecs:us-east-1:012345678910:cluster/workshop-app-cluster"
    ]
}

Choose the cluster to refresh. In my case, the cluster name is workshop-app-cluster, with a service named “workshop-service” running on this cluster.

# aws ecs describe-clusters --clusters <cluster name>

Sample output:

{
    "clusters": [
    {
        "status": "ACTIVE",
        "statistics": [],
        "clusterName": "workshop-app-cluster",
        "registeredContainerInstancesCount": 7,
        "pendingTasksCount": 0,
        "runningTasksCount": 3,
        "activeServicesCount": 1,
        "clusterArn": "arn:aws:ecs:us-east-1:012345678910:cluster/workshop-app-cluster"
    }
    ],
    "failures": []
}

Using the AWS Console

  1. Open the Amazon ECS console.
  2. On the clusters page, select the cluster to refresh.

You should be able to see the details of the services, tasks, and the container instance on the respective tabs.

1. Retrieve the latest ECS–optimized AMI metadata

Previously, to make sure that you were using the latest ECS–optimized AMI, you had to either consult the ECS documentation or subscribe to the ECS AMI Amazon SNS topic.

Now, you can query the AWS Systems Manager Parameter Store API to get the latest AMI version ID or a list of available AMI IDs and their corresponding Docker runtime and ECS agent versions. You can query the Parameter Store API using the AWS CLI or any of the AWS SDKs. In fact, you can now use a Systems Manager parameter in AWS CloudFormation to launch EC2 instances with the latest ECS-optimized AMI.

Run the following command:

aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux/recommended --query "Parameters[].Value" --output text | jq .

Sample output:

{
    "schema_version": 1,
    "image_name": "amzn-ami-2017.09.l-amazon-ecs-optimized",
    "image_id": "ami-aff65ad2",
    "os": "Amazon Linux",
    "ecs_runtime_version": "Docker version 17.12.1-ce",
    "ecs_agent_version": "1.17.3"
}

The image_id is the image ID for the latest ECS–optimized AMI in the Region in which you are operating.

Note: At the time of publication, querying Parameter Store is not possible through the console.
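
To capture just the AMI ID for reuse in the commands that follow, a small sketch (the shell variable name is arbitrary):

# Store the latest ECS-optimized AMI ID in a shell variable.
LATEST_ECS_AMI=$(aws ssm get-parameters \
  --names /aws/service/ecs/optimized-ami/amazon-linux/recommended \
  --query "Parameters[0].Value" --output text | jq -r .image_id)
echo "$LATEST_ECS_AMI"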

2. Find all outdated container instances

Use the following steps to find all container instances not built with the latest ECS–optimized AMI, which should be refreshed.

Using the AWS CLI

Run the following command on your ECS cluster with the image_id value that you got from the ssm get-parameters command:

aws ecs list-container-instances --cluster <cluster name> --filter "attribute:ecs.ami-id != <image_id>"

Sample output:

{
    "containerInstanceArns": [
    "arn:aws:ecs:us-east-1:012345678910:container-instance/2db66342-5f69-4782-89a3-f9b707f979ab",
    "arn:aws:ecs:us-east-1:012345678910:container-instance/4649d3ab-7f44-40a2-affb-670637c86aad"
    ]
}

Now, find the corresponding EC2 instance IDs for these container instances. The IDs are then used to find the corresponding Auto Scaling group from which to detach the instances.

aws ecs list-container-instances --cluster <cluster name> --filter "attribute:ecs.ami-id != <image_id>"| \
jq -c '.containerInstanceArns[]' | \
xargs aws ecs describe-container-instances --cluster <cluster name> --container-instances | \
jq '[.containerInstances[]|{(.containerInstanceArn) : .ec2InstanceId}]'

Sample output:

[
    {
        "arn:aws:ecs:us-east-1:012345678910:container-instance/2db66342-5f69-4782-89a3-f9b707f979ab": "i-08e8cfc073db135a9"
    },
    {
        "arn:aws:ecs:us-east-1:012345678910:container-instance/4649d3ab-7f44-40a2-affb-670637c86aad": "i-02dd87a0b28e8575b"
    }
]

An ECS container instance is an EC2 instance that is running the ECS container agent and has been registered into a cluster. In the above sample output:

  • 2db66342-5f69-4782-89a3-f9b707f979ab is the container instance ID
  • i-08e8cfc073db135a9 is an EC2 instance ID
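
If you stored the latest AMI ID in a variable earlier, the following sketch collects only the outdated EC2 instance IDs, which are used with the Auto Scaling and EC2 commands later in this walkthrough:

# Gather the EC2 instance IDs of all container instances not built from the latest AMI.
OLD_EC2_IDS=$(aws ecs list-container-instances --cluster <cluster name> \
  --filter "attribute:ecs.ami-id != $LATEST_ECS_AMI" \
  --query "containerInstanceArns[]" --output text | \
  xargs aws ecs describe-container-instances --cluster <cluster name> --container-instances | \
  jq -r '.containerInstances[].ec2InstanceId')
echo "$OLD_EC2_IDS"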

Using the AWS Console

  1. In the ECS console, choose Clusters, select the cluster, and choose ECS Instances.
  2. Select Filter by attributes and choose ecs:ami-id as the attribute on which to filter.
  3. Select an AMI ID that is not same as the latest AMI ID, in this case ami-aff65ad2.

For all resulting ECS instances, the container instance ID and the EC2 instance IDs are both visible.

3. List the instances that are part of an Auto Scaling group

If your cluster was created with the console first-run experience after November 24, 2015, then the Auto Scaling group associated with the AWS CloudFormation stack created for your cluster can be scaled up or down to add or remove container instances. You can perform this scaling operation from within the ECS console.

Use the following steps to list the outdated ECS instances that are part of an Auto Scaling group.

Using the AWS CLI

Run the following command:

aws autoscaling describe-auto-scaling-instances --instance-ids <instance id #1> <instance id #2>

Sample output:

{
    "AutoScalingInstances": [
    {
        "ProtectedFromScaleIn": false,
        "AvailabilityZone": "us-east-1b",
        "InstanceId": "i-02dd87a0b28e8575b",
        "AutoScalingGroupName": "EC2ContainerService-workshop-app-cluster-EcsInstanceAsg-1IVVUK4CR81X1",
        "HealthStatus": "HEALTHY",
        "LifecycleState": "InService"
    },
    {
        "ProtectedFromScaleIn": false,
        "AvailabilityZone": "us-east-1a",
        "InstanceId": "i-08e8cfc073db135a9",
        "AutoScalingGroupName": "EC2ContainerService-workshop-app-cluster-EcsInstanceAsg-1IVVUK4CR81X1",
        "HealthStatus": "HEALTHY",
        "LifecycleState": "InService"
    }
    ]
}

The response shows that the instances are part of the EC2ContainerService-workshop-app-cluster-EcsInstanceAsg-1IVVUK4CR81X1 Auto Scaling group.

Using the AWS Console

If the ECS cluster was created from the console, you likely have an associated CloudFormation stack. By default, the stack name is EC2ContainerService-cluster_name.

  1. In the CloudFormation console, select the cluster, choose Outputs, and note the corresponding stack for your cluster.
  2. In the EC2 console, choose Auto Scaling groups.
  3. Select the group and check that the EC2 instance IDs for the ECS instance are registered.

4. Create a new Auto Scaling group

If the container instances are not part of any Auto Scaling group, create a new group from one of the existing container instances and then add all other container instances to it. A launch configuration is automatically created for the new Auto Scaling group.

Using the AWS CLI

Run the following command to create an Auto Scaling group using the EC2 instance ID for an existing container instance:

aws autoscaling create-auto-scaling-group --auto-scaling-group-name <auto-scaling-group-name> --instance-id <instance-id> --min-size 0 --max-size 3

Set the min-size parameter to 0 and the max-size parameter to a value greater than the number of instances that you are going to add to this Auto Scaling group.

At this point, your Auto Scaling group does not contain any instances, and it only has the subnet and Availability Zone of the instance from which you created it. To add all of the old instances (including the one from which the Auto Scaling group was created) to this Auto Scaling group, first find the subnets and Availability Zones to which they are attached.

Run the following commands:

aws ec2 describe-instances --instance-ids <instance-id> --query "Reservations[].Instances[].NetworkInterfaces[].SubnetId" --output text

aws ec2 describe-instances --instance-ids <instance-id> --query "Reservations[].Instances[].Placement.AvailabilityZone" --output text

After you have all the Availability Zones and subnets to be added to the Auto Scaling group, run the following command to update the Auto Scaling group:

aws autoscaling update-auto-scaling-group --vpc-zone-identifier <subnet-1>,<subnet-2> --auto-scaling-group-name <auto-scaling-group-name> --availability-zones <availability-zone1> <availability-zone2>

You are now ready to add all the old instances to this Auto Scaling group. Run the following command:

aws autoscaling attach-instances --instance-ids <instance-id 1> <instance-id 2> --auto-scaling-group-name <auto-scaling-group-name>

Now, all existing container instances are part of an Auto Scaling group, which is attached to a launch configuration capable of launching instances with the old AMI.

When you attach instances, Auto Scaling increases the desired capacity of the group by the number of instances being attached.

Using the AWS Console

To create an Auto Scaling group from an existing container instance, do the following steps:

  1. In the ECS console, on the EC2 Instances tab, open the EC2 instance ID for the container instance.
  2. Select the instance and choose Actions, Instance Settings, and Attach to Auto Scaling Group.
  3. On the Attach to Auto Scaling Group page, select a new Auto Scaling group, enter a name for the group, and then choose Attach.

The new Auto Scaling group is created using a new launch configuration with the same name that you specified for the Auto Scaling group. The launch configuration gets its settings (for example, security group and IAM role) from the instance that you attached. The Auto Scaling group also gets settings (for example, Availability Zone and subnet) from the instance that you attached, and has a desired capacity and maximum size of 1.

Now that you have an Auto Scaling group and launch configuration ready, set the max value for the Auto Scaling group to the total number of existing container instances in the ECS cluster.

To add other container instances of the ECS cluster to this Auto Scaling group:

  1. On the navigation pane, under Auto Scaling, choose Auto Scaling Groups, select the new Auto Scaling group, and choose Edit.
  2. Add subnets for other instances to the Subnet(s) section and save the configuration.
  3. For each of the other container instances of the cluster, open the EC2 instance ID, select the instance, and then choose Actions, Instance Settings, and Attach to Auto Scaling Group.
  4. On the Attach to Auto Scaling Group page, select an existing Auto Scaling group, select the Auto Scaling group that you just created, and then choose Attach.
  5. If the instance doesn’t meet the criteria (for example, if it’s not in the same Availability Zone as the Auto Scaling group), you get an error message with the details. Choose Close and try again with an instance that meets the criteria.

5. Create a new launch configuration

Create a new launch configuration for the Auto Scaling group. This launch configuration should launch instances with the new ECS–optimized AMI, and it should include the user data that allows new instances to join the ECS cluster when they are created.

Using the AWS CLI

First, run the following command to get the launch configuration for the Auto Scaling group:

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names <Auto Scaling group name> --query AutoScalingGroups[].LaunchConfigurationName --output text

Sample output:

EC2ContainerService-workshop-app-cluster-EcsInstanceLc-1LEL4X28KY4X

Now, create a new launch configuration with the new image ID from this existing launch configuration. Create a launch configuration called New-AMI-launch. Substitute the existing launch configuration name for launch-configuration-name and the image ID corresponding to the new AMI for image_id.

aws autoscaling describe-launch-configurations --launch-configuration-names \
<launch-configuration-name> --query "LaunchConfigurations[0]" | \
jq 'del(.LaunchConfigurationARN)' | jq 'del(.CreatedTime)' | \
jq 'del(.KernelId)' | jq 'del(.RamdiskId)' | \
jq '. += {"LaunchConfigurationName": "New-AMI-launch"}' | \
jq '. += {"ImageId": "<image_id>"}' > new-launch-config.json

aws autoscaling create-launch-configuration --cli-input-json file://new-launch-config.json

At this point, the New-AMI-launch launch configuration is ready. Update the Auto Scaling group with the new launch configuration:

aws autoscaling update-auto-scaling-group --auto-scaling-group-name <auto-scaling-group-name> --launch-configuration-name New-AMI-launch

To add block devices to the launch configuration, you can always override the block device mapping for the new launch configuration.
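
For example, here is a hedged sketch of overriding the mapping while building new-launch-config.json in the previous step. The /dev/xvdcz device name and 22 GiB volume size are assumptions based on the default Docker data volume of the Amazon Linux ECS-optimized AMI:

# Override the Docker data volume in the generated launch configuration JSON,
# then create the launch configuration from the modified file.
# (If New-AMI-launch already exists, change LaunchConfigurationName in the JSON first.)
jq '.BlockDeviceMappings = [{"DeviceName": "/dev/xvdcz", "Ebs": {"VolumeSize": 22, "VolumeType": "gp2", "DeleteOnTermination": true}}]' \
  new-launch-config.json > new-launch-config-bdm.json

aws autoscaling create-launch-configuration --cli-input-json file://new-launch-config-bdm.json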

Using the AWS Console

  1. On the Auto Scaling groups page, choose Details in the bottom pane and note the launch configuration for your Auto Scaling group.
  2. On the Launch configurations page, select the launch configuration and choose Copy launch configuration.
  3. On the AMI details page, choose Edit AMI.
  4. In the search box, enter the latest AMI image ID (in this case, ami-aff65ad2) and choose Select.
  5. On the Configure details page, enter a new name for the launch configuration.
  6. Keep everything else the same and choose Create.
  7. On the Auto Scaling groups page, choose Edit.
  8. Select the newly created launch configuration and choose Save.

6. Detach the old ECS instances from the Auto Scaling group

Now that you have a new launch configuration with the Auto Scaling group, detach the old instances from the group.

For every old instance detached, add a new instance through the new launch configuration. This keeps the desired count for the Auto Scaling group unchanged.

Using the AWS CLI

Run the following command:

aws autoscaling detach-instances --instance-ids <instance id #1> <instance id #2> --auto-scaling-group-name <auto-scaling-group-name> --no-should-decrement-desired-capacity

When this is done, the following command should show a blank result:

aws autoscaling describe-auto-scaling-instances --instance-ids <instance id #1> <instance id #2>

The following command should show the new ECS instances, for every old instance detached from the Auto Scaling group:

aws ecs list-container-instances --cluster <cluster name>

The old container instances have been detached from the Auto Scaling group but they are still registered in the ECS cluster.

Using the AWS Console

  1. On the Auto Scaling groups page, select the group.
  2. On the instance tab, select the old container instances.
  3. In the bottom pane, choose Actions, Detach.
  4. In the Detach Instances dialog box, select the check box for Add new instances to the Auto Scaling group to balance the load and choose Detach instances.

7. Revise the task definition and update the service

Now revise the task definition in use to impose a constraint. Subsequent tasks spawned from this task definition are hosted only on ECS instances built with the new AMI.

Using the AWS CLI

Run the following command to get the task definition for the service running on the cluster:

aws ecs describe-services --cluster <cluster name> \
--services <service arn> \
--query "services[].deployments[].taskDefinition" --output text

Sample output

arn:aws:ecs:us-east-1:012345678910:task-definition/workshop-task:9

Here, workshop-task is the family and 9 is the revision. Now, update the task definition with the constraint. Use the built-in attribute, ecs.ami-id, to impose the constraint. Replace the image_id value in the following command with the value found by querying Parameter Store.
aws ecs describe-task-definition --task-definition <task definition family:revision> --query taskDefinition | \
jq '. + {placementConstraints: [{"expression": "attribute:ecs.ami-id == <image_id>", "type": "memberOf"}]}' | \
jq 'del(.status)'| jq 'del(.revision)' | jq 'del(.requiresAttributes)' | \
jq '. + {containerDefinitions:[.containerDefinitions[] + {"memory":256, "memoryReservation": 128}]}'| \
jq 'del(.compatibilities)' | jq 'del(.taskDefinitionArn)' > new-task-def.json

Even if your original container definition doesn’t have a memory or memoryReservation key, you must provide one of those values while updating the task definition. For this post, I have used the task-level memory allocation value (256) and an arbitrary value (128) for those keys, respectively.

aws ecs register-task-definition --cli-input-json file://new-task-def.json

You should now have a new revised version of the task definition. In this example, it’s workshop-task:10.

8. Update the service with the revised task definition

Use the following steps to add the revised task definition to the service.

Using the AWS CLI

Run the following command to update the service with the revised task definition:

aws ecs update-service --cluster <cluster name> --service <service name> --task-definition <task definition family:revised version>

After the service is updated with the revised task definition, the new tasks constituting the service should come up on the new ECS instances, thanks to the constraint in the new task definition.

Run the following command against each old container instance until there are no task ARNs in the output:

aws ecs list-tasks --cluster <cluster name> --container-instance <container-instance id>
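
A small sketch that checks each old container instance in turn:

# Poll each old container instance; an empty taskArns list means it no longer hosts service tasks.
for ci in <container-instance id #1> <container-instance id #2>; do
  aws ecs list-tasks --cluster <cluster name> --container-instance "$ci"
done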

Using the AWS Console

  1. In the ECS console, on the Task definitions page, select your task definition and choose Create new revision.
  2. On the Create new revision of task definition page, choose Add constraint.
  3. For Expression, add attribute:ecs.ami-id == <AMI ID for new ECS optimized AMI> and choose Create. You see a new revision of the task definition being created. In this case, workshop-task:10 got created.
  4. To update the service, on the Clusters page, select the service corresponding to the revised task definition.
  5. On the Configure service page, for Task definition, select the appropriate task definition version and choose Next step.
  6. Keep the remaining default values. On the Review page, choose Update service.

On the service page, on the Events tab, you see events corresponding to the old tasks being stopped and new tasks being started on the new ECS instances.

Wait until no tasks are running on the old ECS instances and you see all tasks starting on the new ECS instances.

9. Deregister and terminate the old ECS instances

Using the AWS CLI

For each of the old container instances, run the following command:

aws ecs deregister-container-instance --cluster <cluster name> --container-instance <container instance id> --query containerInstance.ec2InstanceId

Sample output:

"i-02dd87a0b28e8575b"

Record the EC2 instance ID and then terminate the instance:

aws ec2 terminate-instances --instance-ids <instance-id>
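
To process several instances in one pass, the two commands can be combined into a small loop (a sketch; replace the placeholders as before):

# Deregister each old container instance, capture its EC2 instance ID, and terminate it.
for ci in <container instance id #1> <container instance id #2>; do
  ec2_id=$(aws ecs deregister-container-instance --cluster <cluster name> \
    --container-instance "$ci" --query containerInstance.ec2InstanceId --output text)
  aws ec2 terminate-instances --instance-ids "$ec2_id"
done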

Using the AWS Console

  1. In the ECS console, choose Clusters, ECS instances.
  2. Note the EC2 instance ID displayed in the EC2 Instance column and keep the EC2 instance detail page open.
  3. Open the container instance ID for the ECS instance to deregister.
  4. On the container instance page, choose Deregister.

After the container instance is deregistered, terminate the EC2 instance from the instance detail page that you kept open.

At this point, your ECS cluster has been refreshed with the EC2 instances built with the new ECS–optimized AMI.

Conclusion

In this post, I demonstrated how to refresh the container instances in an active ECS cluster with instances built from a newly released ECS–optimized AMI. You can refresh your ECS cluster either through the AWS Management Console or programmatically, in a few quick steps.

AWS Fargate is a service that’s designed to remove the need to do these types of operations by running and managing all the EC2 infrastructure necessary to support your containers for you. With Fargate, your containers are always started with the latest ECS agent and Docker version.

I welcome your comments and questions below.