Securing credentials using AWS Secrets Manager with AWS Fargate

This post is contributed by Massimo Re Ferre – Principal Developer Advocate, AWS Container Services.

Cloud security at AWS is the highest priority, and the work that the Containers team is doing is a testament to that. A month ago, the team introduced an integration of AWS Fargate tasks with AWS Secrets Manager and AWS Systems Manager Parameter Store. Fargate customers can now consume secrets and parameters from their own task definitions easily, securely, and transparently.

In this post, I show you an example of how to use Secrets Manager and Fargate integration to ensure that your secrets are never exposed in the wild.

Overview

AWS has engineered Fargate to be highly secure, with multiple, important security measures. One of these measures is ensuring that each Fargate task has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with other tasks.

Another area of security focus is the Amazon VPC networking integration, which ensures that tasks can be protected the way that an Amazon EC2 instance can be protected from a networking perspective.

This specific announcement, however, is important in the context of our shared responsibility model. For example, DevOps teams building and running solutions on the AWS platform require proper tooling and functionalities to securely manage secrets, passwords, and sensitive parameters at runtime in their application code. Our job is to empower them with platform capabilities to do exactly that and make it as easy as possible.

Sometimes, in a rush to get things out the door quickly, we have seen some users trade off security for agility, from embedding AWS credentials in source code pushed to public repositories all the way to embedding passwords in clear text in privately stored configuration files. We have solved this problem for developers consuming various AWS services by letting them assign IAM roles to Fargate tasks so that their AWS credentials are transparently handled.

This was useful for consuming native AWS services, but what about accessing services and applications that are outside of the scope of IAM roles and IAM policies? Often, the burden of having to deal with these credentials is pushed onto the developers and AWS users in general. It doesn’t have to be this way. Enter the Secrets Manager and Fargate integration!

With Fargate platform version 1.3.0 and later, you can instruct Fargate tasks to securely pull secrets from Secrets Manager so that these secrets are never exposed in the wild, not even in private configuration files.

In addition, this frees you from the burden of having to implement the undifferentiated heavy lifting of securing these secrets. As a bonus, because Secrets Manager supports secrets rotation, you also gain an additional level of security with no additional effort.

Twitter matcher example

In this example, you create a Fargate task that reads a stream of data from Twitter, matches a particular pattern in the messages, and records some information about the tweet in a DynamoDB table.

To do this, use a Python Twitter library called Tweepy to read the stream from Twitter and the AWS Boto 3 Python library to write to Amazon DynamoDB.

The following diagram shows the high-level flow:

The objective of this example is to show a simple use case where you could use IAM roles assigned to tasks to consume AWS services (such as DynamoDB). It also includes consuming external services (such as Twitter), for which explicit non-AWS credentials need to be stored securely.

This is what happens when you launch the Fargate task:

  • The task starts and inherits the task execution role (1) and the task role (2) from IAM.
  • It queries Secrets Manager (3) using the credentials of the task execution role to retrieve the Twitter credentials, which are passed to the task as environment variables.
  • It reads the stream from Twitter (4) using the credentials stored in Secrets Manager.
  • It matches the stream against a configurable pattern, writes to the DynamoDB table (5), and logs to CloudWatch (6) using the credentials of the task role.

As a side note, while for this specific example I use Twitter as an external service that requires sensitive credentials, any external service that has some form of authentication using passwords or keys is acceptable. Modify the Python script as needed to capture relevant data from your own service to write to the DynamoDB table.

Here are the solution steps:

  • Create the Python script
  • Create the Dockerfile
  • Build the container image
  • Create the image repository
  • Create the DynamoDB table
  • Store the credentials securely
  • Create the IAM roles and IAM policies for the Fargate task
  • Create the Fargate task
  • Clean up

Prerequisites

To be able to execute this exercise, you need an environment configured with the following dependencies, at a minimum:

  • An AWS account, with the AWS CLI installed and configured
  • Docker

You can also skip this configuration part and launch an AWS Cloud9 instance, which comes with these tools preinstalled.

For the purpose of this example, I am working with the AWS CLI, configured to work with the us-west-2 Region. You can opt to work in a different Region. Make sure that the code examples in this post are modified accordingly.

In addition to the list of AWS prerequisites, you need a Twitter developer account. From there, create an application and take note of the credentials that allow you to connect to the Twitter APIs. We use them later in this post, when we add them to AWS Secrets Manager.

Note: many of the commands suggested in this blog post use $REGION and $AWSACCOUNT. You can either set environment variables that point to the Region you want to deploy to and to your own account, or replace those in the commands themselves with the actual Region and account number. Some configuration files (JSON) use the same pattern; for those, the easiest option is to replace the $REGION and $AWSACCOUNT placeholders with the actual values.

Create the Python script

This script is based on the Tweepy streaming example. I modified the script to include the Boto 3 library and instructions that write data to a DynamoDB table. In addition, the script prints the same data to standard output (to be captured in the container log).

This is the Python script:

from __future__ import absolute_import, print_function
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
import boto3
import os

# DynamoDB table name and Region
dynamoDBTable = os.environ['DYNAMODBTABLE']
region_name = os.environ['AWSREGION']
# Filter variable (the word for which to filter in your stream)
filter = os.environ['FILTER']
# Go to http://apps.twitter.com and create an app.
# The consumer key and secret are generated for you after
consumer_key = os.environ['CONSUMERKEY']
consumer_secret = os.environ['CONSUMERSECRETKEY']
# After the step above, you are redirected to your app page.
# Create an access token under the "Your access token" section
access_token = os.environ['ACCESSTOKEN']
access_token_secret = os.environ['ACCESSTOKENSECRET']


class StdOutListener(StreamListener):
    """ A listener handles tweets that are received from the stream.
    This is a basic listener that prints received tweets to stdout.
    """
    def on_data(self, data):
        j = json.loads(data)
        tweetuser = j['user']['screen_name']
        tweetdate = j['created_at']
        tweettext = j['text'].encode('ascii', 'ignore').decode('ascii')
        print(tweetuser)
        print(tweetdate)
        print(tweettext)
        dynamodb = boto3.client('dynamodb', region_name)
        dynamodb.put_item(TableName=dynamoDBTable,
                          Item={'user': {'S': tweetuser},
                                'date': {'S': tweetdate},
                                'text': {'S': tweettext}})
        return True

    def on_error(self, status):
        print(status)


if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=[filter])

Save this file in a directory and call it twitterstream.py.

The script requires seven parameters, which are clearly visible at the beginning as environment variables:

  • The name of the DynamoDB table
  • The Region where you are operating
  • The word or pattern for which to filter
  • The four keys to use to connect to the Twitter API services. Later, I explore how to pass these variables to the container, keeping in mind that some are more sensitive than others.

Create the Dockerfile

Now onto building the actual Docker image. To do that, create a Dockerfile that contains these instructions:

FROM amazonlinux:2
# shadow-utils provides the groupadd and useradd commands used below
RUN yum install shadow-utils.x86_64 -y
# Bootstrap pip, then install the two Python dependencies of the script
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python get-pip.py
RUN pip install tweepy
RUN pip install boto3
COPY twitterstream.py .
# Run the script as an unprivileged user
RUN groupadd -r twitterstream && useradd -r -g twitterstream twitterstream
USER twitterstream
# The -u flag disables output buffering so that logs reach CloudWatch promptly
CMD ["python", "-u", "twitterstream.py"]

Save it as Dockerfile in the same directory with the twitterstream.py file.

Build the container image

Next, create the container image that you later instantiate as a Fargate task. Build the container image running the following command in the same directory:

docker build -t twitterstream:latest .

Don’t overlook the period (.) at the end of the command: it tells Docker to use the current directory as the build context, which is where it finds the Dockerfile.

You now have a local Docker image that, after being properly parameterized, can eventually read from the Twitter APIs and save data in a DynamoDB table.

Create the image repository

Now, store this image in a proper container registry. Create an Amazon ECR repository with the following command:

aws ecr create-repository --repository-name twitterstream --region $REGION

You should see something like the following code example as a result:

{
    "repository": {
        "registryId": "012345678910",
        "repositoryName": "twitterstream",
        "repositoryArn": "arn:aws:ecr:us-west-2:012345678910:repository/twitterstream",
        "createdAt": 1554473020.0,
        "repositoryUri": "012345678910.dkr.ecr.us-west-2.amazonaws.com/twitterstream"
    }
}

Tag the local image with the following command:

docker tag twitterstream:latest $AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest

Make sure that you refer to the proper repository by using your AWS account ID and the Region to which you are deploying.

Grab an authorization token from Amazon ECR and log in your Docker client with it:

$(aws ecr get-login --no-include-email --region $REGION)

Now, push the local image to the ECR repository that you just created:

docker push $AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest

You should see something similar to the following result:

The push refers to repository [012345678910.dkr.ecr.us-west-2.amazonaws.com/twitterstream]
435b608431c6: Pushed
86ced7241182: Pushed
e76351c39944: Pushed
e29c13e097a8: Pushed
e55573178275: Pushed
1c729a602f80: Pushed
latest: digest: sha256:010c2446dc40ef2deaedb3f344f12cd916ba0e96877f59029d047417d6cb1f95 size: 1582

Now the image is safely stored in its ECR repository.

Create the DynamoDB table

Now turn to the backend DynamoDB table. This is where you store the extract of the Twitter stream being generated. Specifically, you store the user that published the Tweet, the date when the Tweet was published, and the text of the Tweet.

For the purpose of this example, create a table called twitterStream. This can be customized as one of the parameters that you have to pass to the Fargate task.

Run this command to create the table:

aws dynamodb create-table --region $REGION --table-name twitterStream \
                          --attribute-definitions AttributeName=user,AttributeType=S AttributeName=date,AttributeType=S \
                          --key-schema AttributeName=user,KeyType=HASH AttributeName=date,KeyType=RANGE \
                          --billing-mode PAY_PER_REQUEST
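
If you want to confirm that the table is ready before moving on, the check can also be done in code. This is an optional sketch with boto3; the table name and Region match the ones used in this walkthrough:

import boto3

# Use the same Region as the rest of the walkthrough
dynamodb = boto3.client('dynamodb', region_name='us-west-2')

# describe_table returns the table metadata, including its status
table = dynamodb.describe_table(TableName='twitterStream')
print(table['Table']['TableStatus'])  # Prints ACTIVE once the table is ready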

Store the credentials securely

As I hinted earlier, the Python script requires the Fargate task to pass some information as variables. You pass the table name, the Region, and the text to filter as standard task variables. Because this is not sensitive information, it can be shared without raising any concern.

However, other configurations are sensitive and should not be passed around in plaintext, like the Twitter API keys. For this reason, use Secrets Manager to store that sensitive information and then read it securely from within the Fargate task. This is what the newly announced integration between Fargate and Secrets Manager allows you to accomplish.

You can use the Secrets Manager console or the CLI to store sensitive data.

If you opt to use the console, choose Other type of secrets. Under Plaintext, enter your consumer key. Under Select the encryption key, choose DefaultEncryptionKey, as shown in the following screenshot. For more information, see Creating a Basic Secret.

For this example, however, it is easier to use the AWS CLI to create the four secrets required. Run the following commands, but customize them with your own Twitter credentials:

aws secretsmanager create-secret --region $REGION --name CONSUMERKEY \
    --description "Twitter API Consumer Key" \
    --secret-string <your consumer key here> 
aws secretsmanager create-secret --region $REGION --name CONSUMERSECRETKEY \
    --description "Twitter API Consumer Secret Key" \
    --secret-string <your consumer secret key here> 
aws secretsmanager create-secret --region $REGION --name ACCESSTOKEN \
    --description "Twitter API Access Token" \
    --secret-string <your access token here> 
aws secretsmanager create-secret --region $REGION --name ACCESSTOKENSECRET \
    --description "Twitter API Access Token Secret" \
    --secret-string <your access token secret here> 

Each of those commands reports a message confirming that the secret has been created:

{
    "VersionId": "7d950825-7aea-42c5-83bb-0c9b36555dbb",
    "Name": "CONSUMERSECRETKEY",
    "ARN": "arn:aws:secretsmanager:us-west-2:01234567890:secret:CONSUMERSECRETKEY-5D0YUM"
}

From now on, these four API keys no longer appear in any configuration.

The following screenshot shows the console after the commands have been executed:

Create the IAM roles and IAM policies for the Fargate task

To run the Python code properly, your Fargate task must have some specific capabilities. The Fargate task must be able to do the following:

  1. Pull the twitterstream container image (created earlier) from ECR.
  2. Retrieve the Twitter credentials (securely stored earlier) from Secrets Manager.
  3. Write logs to a specific Amazon CloudWatch log group (logging is optional but a best practice).
  4. Write to the DynamoDB table (created earlier).

The first three capabilities should be attached to the ECS task execution role. The fourth should be attached to the ECS task role. For more information, see Amazon ECS Task Execution IAM Role.

In other words, the capabilities that are associated with the ECS agent and container instance need to be configured in the ECS task execution role. Capabilities that must be available from within the task itself are configured in the ECS task role.

First, create the two IAM roles that are eventually attached to the Fargate task.

Create a file called ecs-task-role-trust-policy.json with the following content (this trust policy is the same for both roles):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now, run the following commands to create the twitterstream-task-role role, as well as the twitterstream-task-execution-role:

aws iam create-role --region $REGION --role-name twitterstream-task-role --assume-role-policy-document file://ecs-task-role-trust-policy.json

aws iam create-role --region $REGION --role-name twitterstream-task-execution-role --assume-role-policy-document file://ecs-task-role-trust-policy.json

Next, create a JSON file that codifies the capabilities required for the ECS task role (twitterstream-task-role). Make sure that you replace the $REGION and $AWSACCOUNT placeholders:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:$REGION:$AWSACCOUNT:table/twitterStream"
            ]
        }
    ]
}

Save the file as twitterstream-iam-policy-task-role.json.

Now, create a JSON file that codifies the capabilities required for the ECS task execution role (twitterstream-task-execution-role). Replace the $REGION and $AWSACCOUNT placeholders, as well as the XXXXXX suffixes, with the ARNs of your own secrets:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERSECRETKEY-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKEN-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKENSECRET-XXXXXX"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Save the file as twitterstream-iam-policy-task-execution-role.json.

The following two commands create IAM policy documents and associate them with the IAM roles that you created earlier:

aws iam put-role-policy --region $REGION --role-name twitterstream-task-role --policy-name twitterstream-iam-policy-task-role --policy-document file://twitterstream-iam-policy-task-role.json

aws iam put-role-policy --region $REGION --role-name twitterstream-task-execution-role --policy-name twitterstream-iam-policy-task-execution-role --policy-document file://twitterstream-iam-policy-task-execution-role.json

Create the Fargate task

Now it’s time to tie everything together. As a recap, so far you have:

  • Created the container image that contains your Python code.
  • Created the DynamoDB table where the code is going to save the extract from the Twitter stream.
  • Securely stored the Twitter API credentials in Secrets Manager.
  • Created IAM roles with specific IAM policies that can write to DynamoDB and read from Secrets Manager (among other things).

Now you can tie everything together by creating a Fargate task that executes the container image. To do so, create a file called twitterstream-task.json and populate it with the following configuration:

{
    "family": "twitterstream", 
    "networkMode": "awsvpc", 
    "executionRoleArn": "arn:aws:iam::$AWSACCOUNT:role/twitterstream-task-execution-role",
    "taskRoleArn": "arn:aws:iam::$AWSACCOUNT:role/twitterstream-task-role",
    "containerDefinitions": [
        {
            "name": "twitterstream", 
            "image": "$AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest", 
            "essential": true,
            "environment": [
                {
                    "name": "DYNAMODBTABLE",
                    "value": "twitterStream"
                },
                {
                    "name": "AWSREGION",
                    "value": "$REGION"
                },                
                {
                    "name": "FILTER",
                    "value": "Cloud Computing"
                }
            ],    
            "secrets": [
                {
                    "name": "CONSUMERKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX"
                },
                {
                    "name": "CONSUMERSECRETKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERSECRETKEY-XXXXXX"
                },
                {
                    "name": "ACCESSTOKEN",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKEN-XXXXXX"
                },
                {
                    "name": "ACCESSTOKENSECRET",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKENSECRET-XXXXXX"
                }
            ],
            "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                            "awslogs-group": "twitterstream",
                            "awslogs-region": "$REGION",
                            "awslogs-stream-prefix": "twitterstream"
                    }
            }
        }
    ], 
    "requiresCompatibilities": [
        "FARGATE"
    ], 
    "cpu": "256", 
    "memory": "512"
}

To tweak the search string, change the value of the FILTER variable (currently set to “Cloud Computing”).

The Twitter API credentials are never exposed in clear text in these configuration files. There are only references to the Amazon Resource Names (ARNs) of the secrets. For example, this is the system variable CONSUMERKEY in the Fargate task configuration:

"secrets": [
                {
                    "name": "CONSUMERKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX"
                }

This directive asks the ECS agent running on the Fargate instance (that has assumed the specified IAM execution role) to do the following:

  • Connect to Secrets Manager.
  • Get the secret securely.
  • Assign its value to the CONSUMERKEY system variable to be made available to the Fargate task.
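
Under the hood, the retrieval step maps to a single Secrets Manager API call (GetSecretValue), which is why the task execution role needs that permission. If you want to see what the agent does on your behalf, here is a minimal boto3 sketch, assuming your local credentials are allowed to read the secret:

import boto3

secretsmanager = boto3.client('secretsmanager', region_name='us-west-2')

# This is the same API action the ECS agent invokes with the task
# execution role before injecting the value as an environment variable
response = secretsmanager.get_secret_value(SecretId='CONSUMERKEY')
print(response['SecretString'])  # The plaintext secret value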

Register this task by running the following command:

aws ecs register-task-definition --region $REGION --cli-input-json file://twitterstream-task.json

In preparation for running the task, create the CloudWatch log group with the following command:

aws logs create-log-group --log-group-name twitterstream --region $REGION

If you don’t create the log group upfront, the task fails to start.
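
If you are scripting the setup, the equivalent boto3 call can be made idempotent by ignoring the error raised when the group already exists. A small sketch:

import boto3

logs = boto3.client('logs', region_name='us-west-2')

try:
    logs.create_log_group(logGroupName='twitterstream')
except logs.exceptions.ResourceAlreadyExistsException:
    pass  # The log group is already there, which is fine for our purposes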

Create the ECS cluster

The last step before launching the Fargate task is creating an ECS cluster. An ECS cluster has two distinct dimensions:

  • The EC2 dimension, where the compute capacity is managed by the customer as ECS container instances
  • The Fargate dimension, where the compute capacity is managed transparently by AWS.

For this example, you use the Fargate dimension, so you are essentially using the ECS cluster as a logical namespace.

Run the following command to create a cluster called twitterstream_cluster (change the name as needed). If you have a default cluster already created in your Region of choice, you can use that, too.

aws ecs create-cluster --cluster-name "twitterstream_cluster" --region $REGION

Now launch the task in the ECS cluster just created (in the us-west-2 Region) with a Fargate launch type. Run the following command:

aws ecs run-task --region $REGION \
  --cluster "twitterstream_cluster" \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=["subnet-6a88e013","subnet-6a88e014"],securityGroups=["sg-7b45660a"],assignPublicIp=ENABLED}" \
  --task-definition twitterstream:1

A few things to pay attention to with this command:

  • If you created more than one revision of the task (by re-running the aws ecs register-task-definition command), make sure to run the aws ecs run-task command with the proper revision number at the end.
  • Customize the network section of the command for your own environment:
    • Use the default security group in your VPC, as the Fargate task only needs outbound connectivity.
    • Use two public subnets in which to start the Fargate task.

The Fargate task comes up in a few seconds and you can see it from the ECS console, as shown in the following screenshot:

Similarly, the DynamoDB table starts being populated with the information collected by the script running in the task, as shown in the following screenshot:

Finally, the Fargate task logs all the activities in the CloudWatch Log group, as shown in the following screenshot:

The log may take a few minutes to populate and be consolidated in CloudWatch.
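
You can also tail the log group programmatically instead of using the console. A hedged sketch with boto3 (the log group name matches the task definition above):

import boto3

logs = boto3.client('logs', region_name='us-west-2')

# filter_log_events reads across all streams in the group, so you don't
# need to know the exact stream name generated by the awslogs driver
events = logs.filter_log_events(logGroupName='twitterstream', limit=20)
for event in events['events']:
    print(event['message'])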

Clean up

Now that you have completed the walkthrough, you can tear down all the resources that you created to avoid incurring future charges.

First, stop the ECS task that you started:

aws ecs stop-task --cluster twitterstream_cluster --region $REGION --task 4553111a-748e-4f6f-beb5-f95242235fb5

Your task number is different. You can grab it either from the ECS console or from the AWS CLI. This is how you read it from the AWS CLI:

aws ecs list-tasks --cluster twitterstream_cluster --family twitterstream --region $REGION
{
    "taskArns": [
        "arn:aws:ecs:us-west-2:693935722839:task/4553111a-748e-4f6f-beb5-f95242235fb5"
    ]
}
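
If you would rather not copy task IDs around, the lookup and the stop can be combined in a few lines of boto3. A small helper sketch:

import boto3

ecs = boto3.client('ecs', region_name='us-west-2')

# List all tasks of the twitterstream family in the cluster and stop each one
task_arns = ecs.list_tasks(cluster='twitterstream_cluster',
                           family='twitterstream')['taskArns']
for arn in task_arns:
    ecs.stop_task(cluster='twitterstream_cluster', task=arn,
                  reason='cleaning up the walkthrough')
    print('stopped', arn)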

Then, delete the ECS cluster that you created:

aws ecs delete-cluster --cluster "twitterstream_cluster" --region $REGION

Next, delete the CloudWatch log group:

aws logs delete-log-group --log-group-name twitterstream --region $REGION

The console provides a fast workflow to delete the IAM roles. In the IAM console, choose Roles and filter your search for twitter. You should see the two roles that you created:

Select the two roles and choose Delete role.

Cleaning up the secrets created is straightforward. Run a delete-secret command for each one:

aws secretsmanager delete-secret --region $REGION --secret-id CONSUMERKEY
aws secretsmanager delete-secret --region $REGION --secret-id CONSUMERSECRETKEY
aws secretsmanager delete-secret --region $REGION --secret-id ACCESSTOKEN
aws secretsmanager delete-secret --region $REGION --secret-id ACCESSTOKENSECRET

The next step is to delete the DynamoDB table:

aws dynamodb delete-table --table-name twitterStream --region $REGION

The last step is to delete the ECR repository. By default, you cannot delete a repository that still has container images in it. To address that, add the --force flag:

aws ecr delete-repository --region $REGION --repository-name twitterstream --force

Finally, you can deregister the twitterstream task definition in the ECS console. Deregistered task definitions remain visible in the system as inactive.

With this, you have deleted all the resources that you created.

Conclusion

In this post, I demonstrated how Fargate can interact with Secrets Manager to retrieve sensitive data (for example, Twitter API credentials). You can securely make the sensitive data available to the code running in the container inside the Fargate task.

I also demonstrated how a Fargate task with a specific IAM role can access other AWS services (for example, DynamoDB).

 

Enabling DNS resolution for Amazon EKS cluster endpoints

This post is contributed by Jeremy Cowan – Sr. Container Specialist Solution Architect, AWS

By default, when you create an Amazon EKS cluster, the Kubernetes cluster endpoint is public. While it is accessible from the internet, access to the Kubernetes cluster endpoint is restricted by AWS Identity and Access Management (IAM) and Kubernetes role-based access control (RBAC) policies.

At some point, you may need to configure the Kubernetes cluster endpoint to be private.  Changing your Kubernetes cluster endpoint access from public to private completely disables public access such that it can no longer be accessed from the internet.

In fact, a cluster that has been configured to only allow private access can only be accessed from the following:

  • The VPC where the worker nodes reside
  • Networks that have been peered with that VPC
  • A network that has been connected to AWS through AWS Direct Connect (DX) or a virtual private network (VPN)

However, the name of the Kubernetes cluster endpoint is only resolvable from the worker node VPC, for the following reasons:

  • The Amazon Route 53 private hosted zone that is created for the endpoint is only associated with the worker node VPC.
  • The private hosted zone is created in a separate AWS managed account and cannot be altered.

For more information, see Working with Private Hosted Zones.

This post explains how to use Route 53 inbound and outbound endpoints to resolve the name of the cluster endpoints when a request originates outside the worker node VPC.

Route 53 inbound and outbound endpoints

Route 53 inbound and outbound endpoints allow you to simplify the configuration of hybrid DNS.  DNS queries for AWS resources are resolved by Route 53 resolvers and DNS queries for on-premises resources are forwarded to an on-premises DNS resolver. However, you can also use these Route 53 endpoints to resolve the names of endpoints that are only resolvable from within a specific VPC, like the EKS cluster endpoint.

The following diagrams show how the solution works:

  • A Route 53 inbound endpoint is created in each worker node VPC and associated with a security group that allows inbound DNS requests from external subnets/CIDR ranges.
  • If the requests for the Kubernetes cluster endpoint originate from a peered VPC, those requests must be routed through a Route 53 outbound endpoint.
  • The outbound endpoint, like the inbound endpoint, is associated with a security group that allows inbound requests that originate from the peered VPC or from other VPCs in the Region.
  • A forwarding rule is created for each Kubernetes cluster endpoint.  This rule routes the request through the outbound endpoint to the IP addresses of the inbound endpoints in the worker node VPC, where it is resolved by Route 53.
  • The results of the DNS query for the Kubernetes cluster endpoint are then returned to the requestor.

If the request originates from an on-premises environment, you forego creating the outbound endpoints. Instead, you create a forwarding rule to forward requests for the Kubernetes cluster endpoint to the IP address of the Route 53 inbound endpoints in the worker node VPC.

Solution overview

For this solution, follow these steps:

  • Create an inbound endpoint in the worker node VPC.
  • Create an outbound endpoint in a peered VPC.
  • Create a forwarding rule for the outbound endpoint that sends requests to the Route 53 resolver for the worker node VPC.
  • Create a security group rule to allow inbound traffic from a peered network.
  • (Optional) Create a forwarding rule in your on-premises DNS for the Kubernetes cluster endpoint.

Prerequisites

EKS requires that you enable DNS hostnames and DNS resolution in each worker node VPC when you change the cluster endpoint access from public to private.  It is also a prerequisite for this solution and for all solutions that use Route 53 private hosted zones.

In addition, you need a route that connects your on-premises network or VPC with the worker node VPC.  In a multi-VPC environment, this can be accomplished by creating a peering connection between two or more VPCs and updating the route table in those VPCs. If you’re connecting from an on-premises environment across a DX or an IPsec VPN, you need a route to the worker node VPC.

Configuring the inbound endpoint

When you provision an EKS cluster, EKS automatically provisions two or more cross-account elastic network interfaces onto two different subnets in your worker node VPC.  These network interfaces are primarily used when the control plane must initiate a connection with your worker nodes, for example, when you use kubectl exec or kubectl proxy. However, they can also be used by the workers to communicate with the Kubernetes API server.

When you change the EKS endpoint access to private, EKS associates a Route 53 private hosted zone with your worker node VPC.  Within this private hosted zone, EKS creates resource records for the cluster endpoint. These records correspond to the IP addresses of the two cross-account elastic network interfaces that were created in your VPC when you provisioned your cluster.

When the IP addresses of these cross-account elastic network interfaces change, for example, when EKS replaces unhealthy control plane nodes, the resource records for the cluster endpoint are automatically updated. This allows your worker nodes to continue communicating with the cluster endpoint when you switch to private access.  If you update the cluster to enable public access and disable private access, your worker nodes revert to using the public Kubernetes cluster endpoint.

Creating a Route 53 inbound endpoint in the worker node VPC gives DNS queries that originate outside the VPC a path to that VPC's DNS resolver, which is capable of resolving the cluster endpoint.

Create an inbound endpoint in the worker node VPC

  1. In the Route 53 console, choose Inbound endpoints, Create Inbound endpoint.
  2. For Endpoint Name, enter a value such as <cluster_name>InboundEndpoint.
  3. For VPC in the Region, choose the VPC ID of the worker node VPC.
  4. For Security group for this endpoint, choose a security group that allows clients or applications from other networks to access this endpoint. For an example, see the Route 53 resolver diagram shown earlier in the post.
  5. Under the IP addresses section, choose an Availability Zone that corresponds to a subnet in your VPC.
  6. For IP address, choose Use an IP address that is selected automatically.
  7. Repeat steps 5 and 6 for the second IP address.
  8. Choose Submit.

Or, run the following AWS CLI command:

export DATE=$(date +%s)
export INBOUND_RESOLVER_ID=$(aws route53resolver create-resolver-endpoint --name \
<name> --direction INBOUND --creator-request-id $DATE --security-group-ids <sgs> \
--ip-addresses SubnetId=<subnetId>,Ip=<IP address> SubnetId=<subnetId>,Ip=<IP address> \
| jq -r .ResolverEndpoint.Id)
aws route53resolver list-resolver-endpoint-ip-addresses --resolver-endpoint-id \
$INBOUND_RESOLVER_ID | jq .IpAddresses[].Ip

This outputs the IP addresses assigned to the inbound endpoint.

When you are done creating the inbound endpoint, select the endpoint from the console and choose View details.  This shows you a summary of the configuration for the endpoint.  Record the two IP addresses that were assigned to the inbound endpoint, as you need them later when configuring the forwarding rule.
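
To confirm that the inbound endpoint answers queries before wiring up the rest, you can point a resolver directly at those two IP addresses from a network that its security group allows. A sketch using the dnspython package; the IP addresses and endpoint name are placeholders for your own values:

import dns.resolver  # pip install dnspython

resolver = dns.resolver.Resolver(configure=False)
# The two IP addresses recorded from the inbound endpoint (placeholders)
resolver.nameservers = ['10.0.1.10', '10.0.2.10']

# Replace with your own cluster endpoint name
answers = resolver.resolve('<cluster endpoint FQDN>', 'A')
for record in answers:
    print(record.address)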

Connecting from a peered VPC

An outbound endpoint is used to send DNS requests that cannot be resolved “locally” to an external resolver based on a set of rules.

If you are connecting to the EKS cluster from a peered VPC, create an outbound endpoint and forwarding rule in that VPC or expose an outbound endpoint from another VPC. For more information, see Forwarding Outbound DNS Queries to Your Network.

Create an outbound endpoint

  1. In the Route 53 console, choose Outbound endpoints, Create outbound endpoint.
  2. For Endpoint name, enter a value such as <cluster_name>OutboundEndpoint.
  3. For VPC in the Region, select the VPC ID of the VPC where you want to create the outbound endpoint, for example the peered VPC.
  4. For Security group for this endpoint, choose a security group that allows clients and applications from this or other network VPCs to access this endpoint. For an example, see the Route 53 resolver diagram shown earlier in the post.
  5. Under the IP addresses section, choose an Availability Zone that corresponds to a subnet in the peered VPC.
  6. For IP address, choose Use an IP address that is selected automatically.
  7. Repeat steps 5 and 6 for the second IP address.
  8. Choose Submit.

Or, run the following AWS CLI command:

export DATE=$(date +%s)
export OUTBOUND_RESOLVER_ID=$(aws route53resolver create-resolver-endpoint --name \
<name> --direction OUTBOUND --creator-request-id $DATE --security-group-ids <sgs> \
--ip-addresses SubnetId=<subnetId>,Ip=<IP address> SubnetId=<subnetId>,Ip=<IP address> \
| jq -r .ResolverEndpoint.Id)
aws route53resolver list-resolver-endpoint-ip-addresses --resolver-endpoint-id \
$OUTBOUND_RESOLVER_ID | jq .IpAddresses[].Ip

This outputs the IP addresses that get assigned to the outbound endpoint.

Create a forwarding rule for the cluster endpoint

A forwarding rule is used to send DNS requests that cannot be resolved by the local resolver to another DNS resolver.  For this solution to work, create a forwarding rule for each cluster endpoint to resolve through the outbound endpoint. For more information, see Values That You Specify When You Create or Edit Rules.

  1. In the Route 53 console, choose Rules, Create rule.
  2. Give your rule a name, such as <cluster_name>Rule.
  3. For Rule type, choose Forward.
  4. For Domain name, type the name of the cluster endpoint for your EKS cluster.
  5. For VPCs that use this rule, select all of the VPCs to which this rule should apply.  If you have multiple VPCs that must access the cluster endpoint, include them in the list of VPCs.
  6. For Outbound endpoint, select the outbound endpoint to use to send DNS requests to the inbound endpoint of the worker node VPC.
  7. Under the Target IP addresses section, enter the IP addresses of the inbound endpoint that corresponds to the EKS endpoint that you entered in the Domain name field.
  8. Choose Submit.

Or, run the following AWS CLI command:

export DATE=$(date +%s)
aws route53resolver create-resolver-rule --name <name> --rule-type FORWARD \
--creator-request-id $DATE --domain-name <cluster_endpoint> --target-ips \
Ip=<IP of inbound endpoint>,Port=53 --resolver-endpoint-id <Id of outbound endpoint>
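
One detail to watch with the CLI path: unlike the console flow (step 5 above), create-resolver-rule does not associate the new rule with any VPC, so queries from the peered VPC won't use it until that association exists. A boto3 sketch of the extra step, with placeholder IDs:

import boto3

route53resolver = boto3.client('route53resolver', region_name='us-east-1')

# Associate the forwarding rule with the peered VPC so that DNS queries
# originating there start using it; both IDs are placeholders
route53resolver.associate_resolver_rule(
    ResolverRuleId='rslvr-rr-exampleruleid111',
    VPCId='vpc-0example123456789')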

Accessing the cluster endpoint

After creating the inbound and outbound endpoints and the DNS forwarding rule, you should be able to resolve the name of the cluster endpoints from the peered VPC.

$ dig 9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com 

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.58.amzn1 <<>> 9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7168
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com. IN A
;; ANSWER SECTION:
9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com. 60 IN A 192.168.109.77
9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com. 60 IN A 192.168.74.42
;; Query time: 12 msec
;; SERVER: 172.16.0.2#53(172.16.0.2)
;; WHEN: Mon Apr 8 22:39:05 2019
;; MSG SIZE rcvd: 114

Before you can access the cluster endpoint, you must add the IP address range of the peered VPCs to the EKS control plane security group. For more information, see Tutorial: Creating a VPC with Public and Private Subnets for Your Amazon EKS Cluster.

Add a rule to the EKS cluster control plane security group

  1. In the EC2 console, choose Security Groups.
  2. Find the security group associated with the EKS cluster control plane.  If you used eksctl to provision your cluster, the security group is named as follows: eksctl-<cluster_name>-cluster/ControlPlaneSecurityGroup.
  3. Add a rule that allows port 443 inbound from the CIDR range of the peered VPC.
  4. Choose Save.

Run kubectl

With the proper security group rule in place, you should now be able to issue kubectl commands from a machine in the peered VPC against the cluster endpoint.

$ kubectl get nodes
NAME                             STATUS    ROLES     AGE       VERSION
ip-192-168-18-187.ec2.internal   Ready     <none>    22d       v1.11.5
ip-192-168-61-233.ec2.internal   Ready     <none>    22d       v1.11.5

Connecting from an on-premises environment

To manage your EKS cluster from your on-premises environment, configure a forwarding rule in your on-premises DNS to forward DNS queries to the inbound endpoint of the worker node VPCs. I’ve provided brief descriptions for how to do this for BIND, dnsmasq, and Windows DNS below.

Create a forwarding zone in BIND for the cluster endpoint

Add the following to the BIND configuration file:

zone "<cluster endpoint FQDN>" {
    type forward;
    forwarders { <inbound endpoint IP #1>; <inbound endpoint IP #2>; };
};

Create a forwarding zone in dnsmasq for the cluster endpoint

If you’re using dnsmasq, add the --server=/<cluster endpoint FQDN>/<inbound endpoint IP> flag to the startup options.

Create a forwarding zone in Windows DNS for the cluster endpoint

If you’re using Windows DNS, create a conditional forwarder.  Use the cluster endpoint FQDN for the DNS domain and the IPs of the inbound endpoints for the IP addresses of the servers to which to forward the requests.

Add a security group rule to the cluster control plane

Follow the steps in Add a rule to the EKS cluster control plane security group. This time, use the CIDR of your on-premises network instead of the peered VPC.

Conclusion

When you configure the EKS cluster endpoint to be private only, its name can only be resolved from the worker node VPC. To manage the cluster from another VPC or your on-premises network, you can use the solution outlined in this post to create an inbound resolver for the worker node VPC.

This inbound endpoint is a feature that allows your DNS resolvers to easily resolve domain names for AWS resources. That includes the private hosted zone that gets associated with your VPC when you make the EKS cluster endpoint private. For more information, see Resolving DNS Queries Between VPCs and Your Network.  As always, I welcome your feedback about this solution.

Anatomy of CVE-2019-5736: A runc container escape!

This post is courtesy of Samuel Karp, Senior Software Development Engineer — Amazon Container Services.

On Monday, February 11, CVE-2019-5736 was disclosed.  This vulnerability is a flaw in runc, which can be exploited to escape Linux containers launched with Docker, containerd, CRI-O, or any other user of runc.  But how does it work?  Dive in!

This concern has already been addressed for AWS, and no customer action is required. For more information, see the security bulletin.

A review of Linux process management

Before I explain the vulnerability, here’s a review of some Linux basics.

  • Processes and syscalls
  • What is /proc?
  • Dynamic linking

Processes and syscalls

Processes form the core unit of running programs on Linux. Every launched program is represented by one or more processes.  Processes contain a variety of data about the running program, including a process ID (pid), a table tracking in-use memory, a pointer to the currently executing instruction, a set of descriptors for open files, and so forth.

Processes interact with the operating system to perform a variety of operations (for example, reading and writing files, taking input, communicating on the network, etc.) via system calls, or syscalls.  Syscalls can perform a variety of actions. The ones I’m interested in today involve creating other processes (typically through fork(2) or clone(2)) and changing the currently running program into something else (execve(2)).

File descriptors are how a process interacts with files, as managed by the Linux kernel.  File descriptors are short identifiers (numbers) that are passed to the appropriate syscalls for interacting with files: read(2), write(2), close(2), and so forth.

Sometimes a process wants to spawn another process.  That might be a shell running a program you typed at the terminal, a daemon that needs a helper, or even concurrent processing without threads.  When this happens, the process typically uses the fork(2) or clone(2) syscalls.

These syscalls have some differences, but they both operate by creating another copy of the currently executing process and sharing some state.  That state can include things like the memory structures (either shared memory segments or copies of the memory) and file descriptors.

After the new process is started, it’s the responsibility of both processes to figure out which one they are (am I the parent? Am I the child?). Then, they take the appropriate action.  In many cases, the appropriate action is for the child to do some setup, and then execute the execve(2) syscall.

The following example shows the use of fork(2), in pseudocode:

func main() {
    child_pid = fork();
    if (child_pid > 0) {
        // This is the parent process, since child_pid is the pid of the child
        // process.
    } else if (child_pid == 0) {
        // This is the child process. It can retrieve its own pid via getpid(2),
        // if desired.  This child process still sees all the variables in memory
        // and all the open file descriptors.
    }
}

The execve(2) syscall instructs the Linux kernel to replace the currently executing program with another program, in-place.  When called, the Linux kernel loads the new executable as specified and passes the specified arguments.  Because this is done in place, the pid is preserved and a variety of other contextual information is carried over, including environment variables, the current working directory, and any open files.

func main() {
    // execl(3) is a wrapper around the execve(2) syscall that accepts the
    // arguments for the executed program as a list.
    // The first argument to execl(3) is the path of the executable to
    // execute, which in this case is the pwd(1) utility for printing out
    // the working directory.
    // The next argument to execl(3) is the first argument passed through
    // to the new program (in a C program, this would be the first element
    // of the argc array, or argc[0]).  By convention, this is the same as
    // the path of the executable.
    // The remaining arguments to execl(3) are the other arguments visible
    // in the new program's argc array, terminated by NULL.  As you're
    // not passing any additional arguments, just pass NULL here.
    execl("/bin/pwd", "/bin/pwd", NULL);
    // Nothing after this point executes, since the running process has been
    // replaced by the new pwd(1) program.
}

Wait…open files?  By default, open files are passed across the execve(2) boundary.  This is useful in cases where the new program can’t open the file, for example if there’s a new mount covering the existing path.  This is also the mechanism by which the standard I/O streams (stdin, stdout, and stderr) are made available to the new program.

While convenient in some use cases, it’s not always desired to preserve open file descriptors in the new program. This behavior can be changed by passing the O_CLOEXEC flag to open(2) when opening the file or by setting the FD_CLOEXEC flag with fcntl(2).  Using O_CLOEXEC or FD_CLOEXEC (which are both short for close-on-exec) prevents the new program from having access to the file descriptor.

func main() {
    // open(2) opens a file.  The first argument to open is the path of the file
    // and the second argument is a bitmask of flags that describe options
    // applied to the file that's opened.  open(2) then returns a file
    // descriptor, which can be used in subsequent syscalls to represent this
    // file.
    // For this example, open /dev/urandom, which is a file containing random
    // bytes.  Pass two flags: O_RDONLY and O_CLOEXEC; O_RDONLY indicates that
    // the file should be open for reading but not writing, and O_CLOEXEC
    // indicates that the file descriptor should not pass through the execve(2)
    // boundary.
    fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    // All valid file descriptors are positive integers, so a returned value < 0
    // indicates that an error occurred.
    if (fd < 0) {
        // perror(3) is a function to print out the last error that occurred.
        error("could not open /dev/urandom");
        // exit(3) causes a process to exit with a given exit code. Return 1
        // here to indicate that an error occurred.
        exit(1);
    }
}

What is /proc?

/proc (or proc(5)) is a pseudo-filesystem that provides access to a number of Linux kernel data structures.  Every process in Linux has a directory available for it called /proc/[pid].  This directory stores a bunch of information about the process, including the arguments it was given when the program started, the environment variables visible to it, and the open file descriptors.

The special files inside /proc/[pid]/fd describe the file descriptors that the process has open.  They look like symbolic links (symlinks), and you can see the original path of the file, but they aren’t exactly symlinks. You can pass them to open(2) even if the original path is inaccessible and get another working file descriptor.

Another file inside /proc/[pid] is called exe. This file is like the ones in /proc/[pid]/fd except that it points to the binary program that is executing inside that process.

/proc/[pid] also has a companion directory, /proc/self.  This directory is always the same as /proc/[pid] of the process that is accessing it. That is, you can always read your own /proc data from /proc/self without knowing your pid.
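
You can poke at /proc/self from any language. A short Python sketch that prints the current executable and what each open file descriptor points to:

import os

# /proc/self/exe points at the binary for the current process; for this
# script, that's the Python interpreter itself
print(os.readlink('/proc/self/exe'))

# Each entry in /proc/self/fd is an open file descriptor; resolving the
# links shows what each descriptor refers to (terminal, files, sockets...)
for fd in os.listdir('/proc/self/fd'):
    try:
        print(fd, '->', os.readlink('/proc/self/fd/' + fd))
    except OSError:
        pass  # The descriptor opened by listdir itself may be gone already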

Dynamic linking

When writing programs, software developers typically use libraries—collections of previously written code intended to be reused.  Libraries can cover all sorts of things, from high-level concerns like machine learning to lower-level concerns like basic data structures or interfaces with the operating system.

In the code example above, you can see the use of a library through a call to a function defined in a library (fork).

Libraries are made available to programs through linking: a mechanism for resolving symbols (types, functions, variables, etc.) to their definition.  On Linux, programs can be statically linked, in which case all the linking is done at compile time and all symbols are fully resolved. Or they can be dynamically linked, in which case at least some symbols are unresolved until a runtime linker makes them available.

Dynamic linking makes it possible to replace some parts of the resulting code without recompiling the whole application. This is typically used for upgrading libraries to fix bugs, enhance performance, or to address security concerns.  In contrast, static linking requires re-compiling and re-linking each program that uses a given library to affect the same change.

On Linux, runtime linking is typically performed by ld-linux.so(8), which is provided by the GNU project toolchain.  Dynamically linked libraries are specified by a name embedded into the compiled binary.  This dynamic linker reads those names and then performs a search across a standard set of paths to find the associated library file (a shared object file, or .so).

The dynamic linker’s search path can be influenced by the LD_LIBRARY_PATH environment variable.  The LD_PRELOAD environment variable can tell the linker to load additional, user-specified libraries before all others. This is useful in debugging scenarios to allow selective overriding of symbols without having to rebuild a library entirely.
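
Because LD_PRELOAD is just an environment variable, any process that can control the environment of a child can influence what the dynamic linker loads into that child. A minimal sketch of the mechanic (the .so path is hypothetical; you would build such a library yourself):

import os
import subprocess

env = dict(os.environ)
# Hypothetical library: the dynamic linker loads it before all others,
# and any constructor it defines runs as soon as the child starts
env['LD_PRELOAD'] = '/tmp/libexample.so'

subprocess.run(['/bin/ls'], env=env)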

The vulnerability

Now that the cast of characters is set (fork(2), execve(2), open(2), proc(5), file descriptors, and linking), I can start talking about the vulnerability in runc.

runc is a container runtime.  Like a shell, its primary purpose is to launch other programs. However, it does so after manipulating Linux resources like cgroups, namespaces, mounts, seccomp, and capabilities to make what is referred to as a “container.”

The primary mechanism for setting up some of these resources, like namespaces, is through flags to the clone(2) syscall that take effect in the new process.  The target of the final execve(2) call is the program the user requested; with a container, that target can be specified in the container image or through explicit arguments.

The CVE announcement states:

The vulnerability allows a malicious container to […] overwrite the host runc binary […].  The level of user interaction is being able to run any command […] as root within a container [when creating] a new container using an attacker-controlled image.

The operative parts of this are: being able to overwrite the host runc binary (that seems bad) by running a command (that’s…what runc is supposed to do…).  Note too that the vulnerability is as simple as running a command and does not require running a container with elevated privileges or running in a non-default configuration.

Don’t containers protect against this?

Containers are, in many ways, intended to isolate the host from a given workload or to isolate a given workload from the host.  One of the main mechanisms for doing this is through a separate view of the filesystem.  With a separate view, the container shouldn’t be able to access the host’s files and should only be able to see its own.  runc accomplishes this using a mount namespace and mounting the container image’s root filesystem as /.  This effectively hides the host’s filesystem.

Even with techniques like this, things can pass through the mount namespace.  For example, the /proc/cmdline file contains the running Linux kernel’s command-line parameters.  One of those parameters typically indicates the host’s root filesystem, and a container with enough access (like CAP_SYS_ADMIN) can remount the host’s root filesystem within the container’s mount namespace.

That’s not what I’m talking about today, as that requires non-default privileges to run.  The interesting thing today is that the /proc filesystem exposes a path to the original program’s file, even if that file is not located in the current mount namespace.

What makes this troublesome is that interacting with Linux primitives like namespaces typically requires you to run as root, somewhere.  In most installations involving runc (including the default configuration in Docker, Kubernetes, containerd, and CRI-O), the whole setup runs as root.

runc must be able to perform a number of operations that require elevated privileges, even if your container is limited to a much smaller set of privileges. For example, namespace creation and mounting both require the elevated capability CAP_SYS_ADMIN, and configuring the network requires the elevated capability CAP_NET_ADMIN. You might see a pattern here.

An alternative to running as root is to leverage a user namespace. User namespaces map a set of UIDs and GIDs inside the namespace (including ones that appear to be root) to a different set of UIDs and GIDs outside the namespace.  Kernel operations that are user-namespace-aware can delineate privileged actions occurring inside the user namespace from those that occur outside.

However, user namespaces are not yet widely employed and are not enabled by default. The set of kernel operations that are user-namespace-aware is still growing, and not everyone runs the newest kernel or user-space software.

So, /proc exposes a path to the original program’s file, and the process that starts the container runs as root.  What if that original program is something important that you knew would run again… like runc?

Exploiting it!

runc’s job is to run commands that you specify.  What if you specified /proc/self/exe?  It would cause runc to spawn a copy of itself, but running inside the context of the container, with the container’s namespaces, root filesystem, and so on.  For example, you could run the following command:

docker run --rm amazonlinux:2 /proc/self/exe

This, by itself, doesn’t get you far—runc doesn’t hurt itself.

Generally, runc is dynamically linked against some libraries that provide implementations for seccomp(2), SELinux, or AppArmor.  If you remember from earlier, ld-linux.so(8) searches a standard set of file paths to provide these implementations at runtime.  If you start runc again inside the container’s context, with its separate filesystem, you have the opportunity to provide other files in place of the expected library files. These can run your own code instead of the standard library code.

There’s an easier way, though.  Instead of having to make something that looks like (for example) libseccomp, you can take advantage of a different feature of the dynamic linker: LD_PRELOAD.  And because runc lets you specify environment variables along with the path of the executable to run, you can specify this environment variable, too.

With LD_PRELOAD, you can specify your own libraries to load first, ahead of the other libraries that get loaded.  Because the original libraries still get loaded, you don’t actually have to have a full implementation of their interface.  Instead, you can selectively override some common functions that you might want and omit others that you don’t.

So now you can inject code through LD_PRELOAD and you have a target to inject it into: runc, by way of /proc/self/exe.  For your code to get run, something must call it.  You could search for a target function to override, but that means inspecting runc’s code to figure out what could get called and how.  Again, there’s an easier way.  Dynamic libraries can specify a “constructor” that is run immediately when the library is loaded.

Using the “constructor” along with LD_PRELOAD and specifying the command as /proc/self/exe, you now have a way to inject code and get it to run.  That’s it, right?  You can now write to /proc/self/exe and overwrite runc!

Not so fast.

The Linux kernel does have a bit of a protection mechanism to prevent you from overwriting the currently running executable.  If you open /proc/self/exe for writing, you get -ETXTBSY.  This error code indicates that the file is busy, where “TXT” refers to the text (code) section of the binary.

You know from earlier that execve(2) is a mechanism to replace the currently running executable with another, which means that the original executable isn’t in use anymore.  So instead of just having a single library that you load with LD_PRELOAD, you also must have another executable that can do the dirty work for you, which you can execve(2).

Normally, doing this would still be unsuccessful due to file permissions. Executables are typically not world-writable.  But because runc runs as root and does not change users, the new runc process that you started through /proc/self/exe and the helper program that you executed are also run as root.

After you gain write access to the runc file descriptor and you’ve replaced the currently executing program with execve(2), you can replace runc’s content with your own.  The other software on the system continues to start runc as part of its normal operation (for example, creating new containers, stopping containers, or performing exec operations inside containers). Your code has the chance to operate instead of runc.  When it gets run this way, your code runs as root, in the host’s context instead of in the container’s context.

Now you’re done!  You’ve successfully escaped the container and have full root access.

Putting that all together, you get something like the following pseudocode:

Preload pseudocode for preload.so

func constructor(){
    // /proc/self/exe is a virtual file pointing to the currently running
    // executable.  Open it here so that it can be passed to the next
    // process invoked.  It must be opened read-only, or the kernel will fail
    // the open syscall with ETXTBSY.  You cannot gain write access to the text
    // portion of a running executable.
    fd = open("/proc/self/exe", O_RDONLY);
    if (fd < 0) {
        error("could not open /proc/self/exe");
        exit(1);
    }
    // /proc/self/fd/%d is a virtual file representing the open file descriptor
    // to /proc/self/exe, which you opened earlier.
    filename = sprintf("/proc/self/fd/%d", fd);
    // execl is a call that executes a new executable, replacing the
    // currently running process and preserving aspects like the process ID.
    // Execute the "rewrite" binary, passing it arguments representing the
    // path of the open file descriptor. Because you did not pass O_CLOEXEC when
    // opening the file, the file descriptor remains open in the replacement
    // program and retains the same descriptor.
    execl("/rewrite", "/rewrite", filename, NULL);
    // execl never returns, except on an error
    error("couldn't execl");
}

Pseudocode for the rewrite program

// rewrite is your cooperating malicious program that takes an argument
// representing a file descriptor path, reopens it as read-write, and
// replaces the contents.  rewrite expects that it is unable to open
// the file on the first try, as the kernel has not closed it yet
func main(argc, *argv[]) {
    fd = 0;
    printf("Running\n");
    for(tries = 0; tries < 10000; tries++) {
        // argv[1] is the argument that contains the path to the virtual file
        // of the read-only file descriptor
        fd = open(argv[1], O_RDWR|O_TRUNC);
        if( fd >= 0 ) {
            printf("open succeeded\n");
            break;
        } else {
            if(errno != ETXTBSY) {
                // You expect a lot of ETXTBSY, so only print when you get something else
                error("open");
            }
        }
    }
    if (fd < 0) {
        error("exhausted all open attempts");
        exit(1);
    }
    dprintf(fd, "CVE-2019-5736\n");
    printf("wrote over runc!\n");
    fflush(stdout);
}

The above code was written by Noah Meyerhans, iliana weller, and Samuel Karp.

How does the patch work?

If you try the same approach with a patched runc, you instead see that opening the file with O_RDWR is denied.  This means that the patch is working!

The runc patch operates by taking advantage of some Linux kernel features introduced in kernel 3.17, specifically a syscall called memfd_create(2).  This syscall creates a temporary memory-backed file and a file descriptor that can be used to access the file.  This file descriptor has some special semantics:  It is automatically removed when the last reference to it is dropped. It’s in memory, so that just equates to freeing the memory.  It supports another useful feature: file sealing.  File sealing allows the file to be made immutable, even to processes that are running as root.

The runc patch changes the behavior of runc so that it creates a copy of itself in one of these temporary file descriptors, and then seals it.  The next time a process launches (via fork(2)) or a process is replaced (via execve(2)), /proc/self/exe will be this sealed, memory-backed file descriptor.  When your rewrite program attempts to modify it, the Linux kernel prevents it as it’s a sealed file.
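
The core of that mechanism can be sketched in a few lines of C (a minimal illustration, not the actual patch code; the real patch copies the full runc binary into the descriptor):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    // Create an anonymous, memory-backed file that permits sealing.
    int fd = memfd_create("runc-clone", MFD_ALLOW_SEALING);
    if (fd < 0) { perror("memfd_create"); return 1; }

    // The real patch copies the contents of /proc/self/exe here.
    const char copy[] = "copy of the runc binary\n";
    write(fd, copy, strlen(copy));

    // Seal the file against any further modification, even by root.
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0) {
        perror("fcntl(F_ADD_SEALS)");
        return 1;
    }

    // Any later attempt to write fails with EPERM.
    if (write(fd, "x", 1) < 0)
        perror("write after sealing");  // prints: Operation not permitted
    return 0;
}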

Could I have avoided being vulnerable?

Yes, a few different mechanisms were available before the patch that provided mitigation for this vulnerability.  The one that I mentioned earlier is user namespaces. Mapping to a different user namespace inside the container would mean that normal Linux file permissions would effectively prevent runc from becoming writable because the compromised process inside the container is not running as the real root user.
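
With Docker, for example, you can opt in to this today by enabling user namespace remapping on the daemon. A minimal sketch of /etc/docker/daemon.json (the daemon must be restarted afterward, and some features are incompatible with remapping):

{
    "userns-remap": "default"
}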

Another mechanism, which is used by Google Container-Optimized OS, is to have the host’s root filesystem mounted as read-only.  A read-only mount of the runc binary itself would also prevent the runc binary from becoming writable.

SELinux, when correctly configured, may also prevent this vulnerability.

A different approach to preventing this vulnerability is to treat the Linux kernel as belonging to a single tenant, and spend your effort securing the kernel through another layer of separation.  This is typically accomplished using a hypervisor or a virtual machine.

Amazon invests heavily in this type of boundary. Amazon EC2 instances, AWS Lambda functions, and AWS Fargate tasks are secured from each other using techniques like these.  Amazon EC2 bare-metal instances leverage the next-generation Nitro platform that allows AWS to offer secure, bare-metal compute with a hardware root-of-trust.  Along with traditional hypervisors, the Firecracker virtual machine manager can be used to implement this technique with function- and container-like workloads.

Further reading

The original researchers who discovered this vulnerability have published their own post, CVE-2019-5736: Escape from Docker and Kubernetes containers to root on host. They describe how they discovered the vulnerability, as well as several other attempts.

Acknowledgements

I’d like to thank the original researchers who discovered the vulnerability for reporting responsibly and the OCI maintainers (and Aleksa Sarai specifically) who coordinated the disclosure. Thanks to Linux distribution maintainers and cloud providers who made updated packages available quickly.  I’d also like to thank the Amazonians who made it possible for AWS customers to be protected:

  • AWS Security who ran the whole process
  • Clare Liguori, Onur Filiz, Noah Meyerhans, iliana weller, and Tom Kirchner who performed analysis and validation of the patch
  • The Amazon Linux team (and iliana weller specifically) who backported the patch and built new Docker RPMs
  • The Amazon ECS, Amazon EKS, and AWS Fargate teams for making patched infrastructure available quickly
  • And all of the other AWS teams who put in extra effort to protect AWS customers

A Guide to Locally Testing Containers with Amazon ECS Local Endpoints and Docker Compose

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/a-guide-to-locally-testing-containers-with-amazon-ecs-local-endpoints-and-docker-compose/

This post is contributed by Wesley Pettit, Software Engineer at AWS.

As more companies adopt containers, developers need easy, powerful ways to test their containerized applications locally, before they deploy to AWS. Today, the containers team is releasing the first tool dedicated to this: Amazon ECS Local Container Endpoints. This is part of an ongoing open source project designed to improve the local development process for Amazon Elastic Container Service (ECS) and AWS Fargate.  This first step allows you to locally simulate the ECS Task Metadata V2 and V3 endpoints and IAM Roles for Tasks.

In this post, I will walk you through the following testing scenarios enabled by Amazon ECS Local Endpoints and Docker Compose:

  • Testing a container that needs credentials to interact with AWS Services
  • Testing a container that uses Task Metadata
  • Testing a multi-container app that uses the awsvpc or host network mode on Docker for Mac and Docker for Windows
  • Testing multiple containerized applications using local service discovery

Setup

Your local testing toolkit consists of Docker, Docker Compose, and awslabs/amazon-ecs-local-container-endpoints.  To follow along with the scenarios in this post, you will need to have locally installed the Docker Daemon, the Docker Command Line, and Docker Compose.

Once you have the dependencies installed, create a Docker Compose file called docker-compose.yml. The Compose file defines the settings needed to run your application. If you have never used Docker Compose before, check out Docker’s Getting Started with Compose tutorial. This example file defines a web application:

version: "2"
services:
  app:
    build:
      # Build an image from the Dockerfile in the current directory
      context: .
    ports:
      - 8080:80
    environment:
      PORT: "80"

Make sure to save your docker-compose.yml file: it will be needed for the rest of the scenarios.

Our First Scenario: Testing a container which needs credentials to interact with AWS Services

Say I have a container which I want to test locally that needs AWS credentials. I could accomplish this by providing credentials as environment variables on the container, but that would be a bad practice. Instead, I can use Amazon ECS Local Endpoints to safely vend credentials to a local container.

The following Docker Compose override file template defines a single container that will use credentials. This should be used along with the docker-compose.yml file you created in the setup section. Name this file docker-compose.override.yml (Docker Compose automatically uses both files).

Your docker-compose.override.yml file should look like this:

version: "2"
networks:
    # This special network is configured so that the local metadata
    # service can bind to the specific IP address that ECS uses
    # in production
    credentials_network:
        driver: bridge
        ipam:
            config:
                - subnet: "169.254.170.0/24"
                  gateway: 169.254.170.1
services:
    # This container vends credentials to your containers
    ecs-local-endpoints:
        # The Amazon ECS Local Container Endpoints Docker Image
        image: amazon/amazon-ecs-local-container-endpoints
        volumes:
          # Mount /var/run so we can access docker.sock and talk to Docker
          - /var/run:/var/run
          # Mount the shared configuration directory, used by the AWS CLI and AWS SDKs
          # On Windows, this directory can be found at "%UserProfile%\.aws"
          - $HOME/.aws/:/home/.aws/
        environment:
          # define the home folder; credentials will be read from $HOME/.aws
          HOME: "/home"
          # You can change which AWS CLI Profile is used
          AWS_PROFILE: "default"
        networks:
            credentials_network:
                # This special IP address is recognized by the AWS SDKs and AWS CLI 
                ipv4_address: "169.254.170.2"
                
    # Here we reference the application container that we are testing
    # You can test multiple containers at a time, simply duplicate this section
    # and customize it for each container, and give it a unique IP in 'credentials_network'.
    app:
        depends_on:
            - ecs-local-endpoints
        networks:
            credentials_network:
                ipv4_address: "169.254.170.3"
        environment:
          AWS_DEFAULT_REGION: "us-east-1"
          AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: "/creds"

To test your container locally, run:

docker-compose up

Your container will now be running and will be using temporary credentials obtained from your default AWS Command Line Interface Profile.

NOTE: You should not use your production credentials locally. If you provide the ecs-local-endpoints with an AWS Profile that has access to your production account, then your application will be able to access/modify production resources from your local testing environment. We recommend creating separate development and production accounts.

How does this work?

In this example, we have created a User Defined Docker Bridge Network which allows the Local Container Endpoints to listen at the IP Address 169.254.170.2. We have also defined the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI on our application container. The AWS SDKs and AWS CLI are all designed to retrieve credentials by making HTTP requests to http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. When containers run in production on ECS, the ECS Agent vends credentials to containers via this endpoint; this is how IAM Roles for Tasks is implemented.

Amazon ECS Local Container Endpoints vends credentials to containers the same way as the ECS Agent does in production. In this case, it vends temporary credentials obtained from your default AWS CLI Profile. It can do that because it mounts your .aws folder, which contains credentials and configuration for the AWS CLI.
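
You can verify that credentials are flowing by querying the endpoint directly from inside the application container (a quick check that assumes curl is available in your application image; the /creds path matches the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI value set earlier):

docker-compose exec app curl -s http://169.254.170.2/creds

The response is a JSON document of temporary credentials with fields such as AccessKeyId, SecretAccessKey, Token, and Expiration, which is the same shape that the AWS SDKs expect.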

Gotchas: Things to Keep in Mind when using ECS Local Container Endpoints and Docker Compose

  • Make sure every container in the credentials_network has a unique IP Address. If you don’t do this, Docker Compose can incorrectly try to assign 169.254.170.2 (the ecs-local-endpoints container IP) to one of the application containers. This will cause your Compose project to fail to start.
  • On Windows, replace $HOME/.aws/ in the volumes declaration for the endpoints container with the correct location of the AWS CLI configuration directory, as explained in the documentation.
  • Notice that the application container is named ‘app’ in both of the example file templates. You must make sure the container names match between your docker-compose.yml and docker-compose.override.yml. When you run docker-compose up, the files will be merged. The settings in each file for each container will be merged, so it’s important to use consistent container names between the two files.

Scenario Two: Testing using Task IAM Role credentials

The endpoints container image can also vend credentials from an IAM Role; this allows you to test your application locally using a Task IAM Role.

NOTE: You should not use your production Task IAM Role locally. Instead, create a separate testing role, with equivalent permissions scoped to testing resources. Modifying the trust boundary of a production role will expand its scope.

In order to use a Task IAM Role locally, you must modify its trust policy. First, get the ARN of the IAM user defined by your default AWS CLI Profile (replace default with a different Profile name if needed):

aws --profile default sts get-caller-identity

Then modify your Task IAM Role so that its trust policy includes the following statement. You can find instructions for modifying IAM Roles in the IAM Documentation.

    {
      "Effect": "Allow",
      "Principal": {
        "AWS": <ARN of the user found with get-caller-identity>
      },
      "Action": "sts:AssumeRole"
    }

To use your Task IAM Role in your docker compose file for local testing, simply change the value of the AWS container credentials relative URI environment variable on your application container:

AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: "/role/<name of your role>"

For example, if your role is named ecs_task_role, then the environment variable should be set to "/role/ecs_task_role". That is all that is required; the ecs-local-endpoints container will now vend credentials obtained from assuming the task role. You can use this to validate that the permissions set on your Task IAM Role are sufficient to run your application.

Scenario Three: Testing a Container that uses Task Metadata endpoints

The Task Metadata endpoints are useful; they allow a container running on ECS to obtain information about itself at runtime. This enables many use cases; my favorite is that it allows you to obtain container resource usage metrics, as shown by this project.

With Amazon ECS Local Container Endpoints, you can locally test applications that use the Task Metadata V2 or V3 endpoints. If you want to use the V2 endpoint, the Docker Compose template shown at the beginning of this post is sufficient. If you want to use V3, simply add another environment variable to each of your application containers:

ECS_CONTAINER_METADATA_URI: "http://169.254.170.2/v3"

This is the environment variable defined by the V3 metadata spec.
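
Once the containers are up, you can inspect the simulated metadata from inside the application container (assuming curl is available in the image; the /task suffix is defined by the V3 spec):

docker-compose exec app sh -c 'curl -s $ECS_CONTAINER_METADATA_URI'        # container metadata
docker-compose exec app sh -c 'curl -s $ECS_CONTAINER_METADATA_URI/task'   # task metadata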

Scenario Four: Testing an Application that uses the AWSVPC network mode

Thus far, all of our examples have involved testing containers in a bridge network. But what if you have an application that uses the awsvpc network mode? Can you test these applications locally?

Your local development machine will not have Elastic Network Interfaces. If your ECS Task consists of a single container, then the bridge network used in previous examples will suffice. However, if your application consists of multiple containers that need to communicate, then awsvpc differs significantly from bridge. As noted in the AWS Documentation:

“containers that belong to the same task can communicate over the localhost interface.”

This is one of the benefits of awsvpc; it makes inter-container communication easy. To simulate this locally, a different approach is needed.

If your local development machine is running Linux, then you are in luck. You can test your containers using the host network mode, which will allow them to all communicate over localhost. Instructions for how to set up iptables rules to allow your containers to receive credentials and metadata are documented in the ECS Local Container Endpoints Project README.

If you are like me, and do most of your development on Windows or Mac machines, then this option will not work. Docker only supports host mode on Linux. Luckily, this section describes a workaround that will allow you to locally simulate awsvpc on Docker for Mac or Docker for Windows. This also partly serves as a simulation of the host network mode, in the sense that all of your containers will be able to communicate over localhost. (From a local testing standpoint, host and awsvpc are functionally the same; the key requirement is that all containers share a single network interface.)

In ECS, awsvpc is implemented by first launching a single container, which we call the ‘pause container’. This container is attached to the Elastic Network Interface, and then all of the containers in your task are launched into the pause container’s network namespace. For the local simulation of awsvpc, a similar approach will be used.

First, create a Dockerfile with the following contents for the ‘local’ pause container.

FROM amazonlinux:latest
RUN yum install -y iptables

CMD iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679 \
 && iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679 \
 && iptables-save \
 && /bin/bash -c 'while true; do sleep 30; done;'

This Dockerfile defines a container image which sets some iptables rules and then sleeps forever. The routing rules will allow requests to the credentials and metadata service to be forwarded from 169.254.170.2:80 to localhost:51679, which is the port ECS Local Container Endpoints will listen at in this setup.

Build the image:

docker build -t local-pause:latest .

Now, edit your docker-compose.override.yml file so that it looks like the following:

version: "2"
services:
    ecs-local-endpoints:
        image: amazon/amazon-ecs-local-container-endpoints
        volumes:
          - /var/run:/var/run
          - $HOME/.aws/:/home/.aws/
        environment:
          ECS_LOCAL_METADATA_PORT: "51679"
          HOME: "/home"
        network_mode: container:local-pause

    app:
        depends_on:
            - ecs-local-endpoints
        network_mode: container:local-pause
        environment:
          ECS_CONTAINER_METADATA_URI: "http://169.254.170.2/v3/containers/app"
          AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: "/creds"

Several important things to note:

  • ECS_LOCAL_METADATA_PORT is set to 51679; this is the port that was used in the iptables rules.
  • network_mode is set to container:local-pause for all the containers, which means that they will use the networking stack of a container named local-pause.
  • ECS_CONTAINER_METADATA_URI is set to http://169.254.170.2/v3/containers/app. This is important. In bridge mode, the local endpoints container can determine which container a request came from using the IP Address in the request. In simulated awsvpc, this will not work, since all of the containers share the same IP Address. Thus, the endpoints container supports using the container name in the request URI so that it can identify which container the request came from. In this case, the container is named app, so app is appended to the value of the environment variable. If you copy the app container configuration to add more containers to this compose file, make sure you update the value of ECS_CONTAINER_METADATA_URI for each new container.
  • Remove any port declarations from your docker-compose.yml file. These are not valid with the network_mode settings that you will be using. The text below explains how to expose ports in this simulated awsvpc network mode.

Before you run the compose file, you must launch the local-pause container. This container cannot be defined in the Docker Compose file, because in Compose there is no way to define that one container must be running before all the others. You might think that the depends_on setting would work, but this setting only determines the order in which containers are started. It is not a robust solution for this case.

One key thing to note: any ports used by your application containers must be defined on the local-pause container. You cannot define ports directly on your application containers because their network mode is set to container:local-pause. This is a limitation imposed by Docker.

Assuming that your application containers need to expose ports 8080 and 3306 (replace these with the actual ports used by your applications), run the local pause container with this command:

docker run -d -p 8080:8080 -p 3306:3306 --name local-pause --cap-add=NET_ADMIN local-pause

Then, simply run the docker compose files, and you will have containers which share a single network interface and have access to credentials and metadata!

Scenario Five: Testing multiple applications with local Service Discovery

Thus far, all of the examples have focused on running a single containerized application locally. But what if you want to test multiple applications which run as separate Tasks in production?

Docker Compose allows you to set up DNS aliases for your containers. This allows them to talk to each other using a hostname.

For this example, return to the compose override file with a bridge network shown in scenarios one through three. Here is a docker-compose.override.yml file which implements a simple scenario. There are two applications, frontend and backend. Frontend needs to make requests to backend.

version: "2"
networks:
    credentials_network:
        driver: bridge
        ipam:
            config:
                - subnet: "169.254.170.0/24"
                  gateway: 169.254.170.1
services:
    # This container vends credentials to your containers
    ecs-local-endpoints:
        # The Amazon ECS Local Container Endpoints Docker Image
        image: amazon/amazon-ecs-local-container-endpoints
        volumes:
          - /var/run:/var/run
          - $HOME/.aws/:/home/.aws/
        environment:
          HOME: "/home"
          AWS_PROFILE: "default"
        networks:
            credentials_network:
                ipv4_address: "169.254.170.2"
                aliases:
                    - endpoints
    # settings for the containers which you are testing
    frontend:
        image: amazonlinux:latest
        command: /bin/bash -c 'while true; do sleep 30; done;'
        depends_on:
            - ecs-local-endpoints
        networks:
            credentials_network:
                ipv4_address: "169.254.170.3"
        environment:
          AWS_DEFAULT_REGION: "us-east-1"
          AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: "/creds"
    backend:
        image: nginx
        networks:
            credentials_network:
                # define an alias for service discovery
                aliases:
                    - backend
                ipv4_address: "169.254.170.4"

With these settings, the frontend container can find the backend container by making requests to http://backend.
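
You can confirm this from the frontend container (assuming curl is available in the amazonlinux image; backend runs nginx, so the response is the nginx welcome page):

docker-compose exec frontend curl -s http://backend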

Conclusion

In this tutorial, you have seen how to use Docker Compose and awslabs/amazon-ecs-local-container-endpoints to test your Amazon ECS and AWS Fargate applications locally before you deploy.

You have learned how to:

  • Construct docker-compose.yml and docker-compose.override.yml files.
  • Test a container locally with temporary credentials from a local AWS CLI Profile.
  • Test a container locally using credentials from an ECS Task IAM Role.
  • Test a container locally that uses the Task Metadata Endpoints.
  • Locally simulate the awsvpc network mode.
  • Use Docker Compose service discovery to locally test multiple dependent applications.

To follow along with new developments to the local development project, you can head to the public AWS Containers Roadmap on GitHub. If you have questions, comments, or feedback, you can let the team know there!

 

Building a scalable log solution aggregator with AWS Fargate, Fluentd, and Amazon Kinesis Data Firehose

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/building-a-scalable-log-solution-aggregator-with-aws-fargate-fluentd-and-amazon-kinesis-data-firehose/

This post is contributed by Wesley Pettit, Software Dev Engineer, and a maintainer of the Amazon ECS CLI.

Modern distributed applications can produce gigabytes of log data every day. Analysis and storage is the easy part. From Amazon S3 to Elasticsearch, many solutions are available. The hard piece is reliably aggregating and shipping logs to their final destinations.

In this post, I show you how to build a log aggregator using AWS Fargate, Amazon Kinesis Data Firehose, and Fluentd. This post is unrelated to the AWS effort to support Fluentd to stream container logs from Fargate tasks. Follow the progress of that effort on the AWS container product roadmap.

Solution overview

Fluentd forms the core of my log aggregation solution. It is an open source project that aims to provide a unified logging layer by handling log collection, filtering, buffering, and routing. Fluentd is widely used across cloud platforms and was adopted by the Cloud Native Computing Foundation (CNCF) in 2016.

AWS Fargate provides a straightforward compute environment for the Fluentd aggregator. Kinesis Data Firehose streams the logs to their destinations. It batches, compresses, transforms, and encrypts the data before loading it, which minimizes the amount of storage used at the destination and increases security.

The log aggregator that I detail in this post is generic and can be used with any type of application. However, for simplicity, I focus on how to use the aggregator with Amazon Elastic Container Service (ECS) tasks and services.

Building a Fluentd log aggregator on Fargate that streams to Kinesis Data Firehose

 

The diagram describes the architecture that you are going to implement. A Fluentd aggregator runs as a service on Fargate behind a Network Load Balancer. The service uses Application Auto Scaling to dynamically adjust to changes in load.

Because the load balancer DNS can only be resolved in the VPC, the aggregator is a private log collector that can accept logs from any application in the VPC. Fluentd streams the logs to Kinesis Data Firehose, which dumps them in S3 and Amazon Elasticsearch Service (Amazon ES).

Not all logs are of equal importance. Some require real time analytics, others simply need long-term storage so that they can be analyzed if needed. In this post, applications that log to Fluentd are split up into frontend and backend.

Frontend applications are user-facing and need rich functionality to query and analyze the data present in their logs to obtain insights about users. Therefore, frontend application logs are sent to Amazon ES.

In contrast, backend services do not need the same level of analytics, so their logs are sent to S3. These logs can be queried using Amazon Athena, or they can be downloaded and ingested into other analytics tools as needed.

Each application tags its logs, and Fluentd sends the logs to different destinations based on the tag. Thus, the aggregator can determine whether a log message is from a backend or frontend application. Each log message gets sent to one of two Kinesis Data Firehose streams:

  • One streams to S3
  • One streams to an Amazon ES cluster

Running the aggregator on Fargate makes maintaining this service easy, and you don’t have to worry about provisioning or managing instances. This makes scaling the aggregator simple. You do not have to manage an additional Auto Scaling group for instances.

Aggregator performance and throughput

Before I walk you through how to deploy the Fluentd aggregator in your own VPC, you should know more about its performance.

I performed extensive real-world testing with this aggregator setup to test its limits. Each task in the aggregator service can handle at least 9 MB/s of log traffic, and at least 10,000 log messages per second. These are comfortable lower bounds for the aggregator’s performance. I recommend using these numbers to provision your aggregator service based upon your expected throughput for log traffic.

While this aggregator setup includes dynamic scaling, you must carefully choose the minimum size of the service. This is because dynamic scaling with Fluentd is complicated.

The Fluentd aggregator accepts logs via TCP connections, which are balanced across the instances of the service by the Network Load Balancer. However, these TCP connections are long-lived, and the load balancer only distributes new TCP connections. Thus, when the aggregator scales up in response to increased load, the new Fargate tasks cannot help with any of the existing load. The new tasks can only take new TCP connections. This also means that older tasks in the aggregator tend to accumulate connections with time. This is an important limitation to keep in mind.

For the Docker logging driver for Fluentd (which can be used by ECS tasks to send logs to the aggregator), a single TCP connection is made when each container starts. This connection is held open as long as possible. A TCP connection can remain open as long as data is still being sent over it, and there are no network connectivity issues. The only way to guarantee that there are new TCP connections is to launch new containers.

Dynamic scaling can only help in cases where there are spikes in log traffic and new TCP connections. If you are using the aggregator with ECS tasks, dynamic scaling is only useful if spikes in log traffic come from launching new containers. On the other hand, if spikes in log traffic come from existing containers that periodically increase their log output, then dynamic scaling can’t help.

Therefore, configure the minimum number of tasks for the aggregator based upon the maximum throughput that you expect from a stable population of containers. For example, if you expect 45 MB/s of log traffic, then I recommend setting the minimum size of the aggregator service to five tasks, so that each one gets 9 MB/s of traffic.

For reference, here is the resource utilization that I saw for a single aggregator task under a variety of loads. The aggregator is configured with four vCPU and 8 GB of memory as its task size. As you can see, CPU usage scales linearly with load, so dynamic scaling is configured based on CPU usage.

Performance of a single aggregator task

Keep in mind that this data does not represent a guarantee, as your performance may differ. I recommend performing real-world testing using logs from your applications so that you can tune Fluentd to your specific needs.

As a warning, one thing to watch out for is messages in the Fluentd logs that mention retry counts:

2018-10-24 19:26:54 +0000 [warn]: #0 [output_kinesis_frontend] Retrying to request batch. Retry count: 1, Retry records: 250, Wait seconds 0.35
2018-10-24 19:26:54 +0000 [warn]: #0 [output_kinesis_frontend] Retrying to request batch. Retry count: 2, Retry records: 125, Wait seconds 0.27
2018-10-24 19:26:57 +0000 [warn]: #0 [output_kinesis_frontend] Retrying to request batch. Retry count: 1, Retry records: 250, Wait seconds 0.30

In my experience, these warnings always came up whenever I was hitting Kinesis Data Firehose API limits. Fluentd can accept high volumes of log traffic, but if it runs into Kinesis Data Firehose limits, then the data is buffered in memory.

If this state persists for a long time, data is eventually lost when the buffer reaches its max size. To prevent this problem, either increase the number of Kinesis Data Firehose delivery streams in use or request a Kinesis Data Firehose limit increase.

Aggregator reliability

In normal use, I didn’t see any dropped or duplicated log messages. A small amount of log data loss occurred when tasks in the service were stopped. This happened when the service was scaling down, and during deployments to update the Fluentd configuration.

When a task is stopped, it is sent SIGTERM, and then after a 30-second timeout, SIGKILL. When Fluentd receives the SIGTERM, it makes a single attempt to send all logs held in its in-memory buffer to their destinations. If this single attempt fails, the logs are lost. Therefore, log loss can be minimized by over-provisioning the aggregator, which reduces the amount of data buffered by each aggregator task.

Also, it is important to stay well within your Kinesis Data Firehose API limits. That way, Fluentd has the best chance of sending all the data to Kinesis Data Firehose during that single attempt.

To test the reliability of the aggregator, I used applications hosted on ECS that created several megabytes per second of log traffic. These applications inserted special ‘tracer’ messages into their normal log output. By querying for these tracer messages at the log storage destination, I was able to determine how many messages were lost or duplicated.

These tracer logs were produced at a rate of 18 messages per second. During a deployment to the aggregator service (which stops all the existing tasks after starting new ones), 2.67 tracer messages were lost on average, and 11.7 messages were duplicated.

There are multiple ways to think about this data. If I ran one deployment during an hour, then 0.004% of my log data would be lost during that period, making the aggregator 99.996% reliable. In my experience, stopping a task only causes log loss during a short time slice of about 10 seconds.

Here’s another way to look at this. Every time that a task in my service was stopped (either due to a deployment or the service scaling in), only 1.5% of the logs received by that task in the 10-second period were lost on average.

As you can see, the aggregator is not perfect, but it is fairly reliable. Remember that logs were only dropped when aggregator tasks were stopped. In all other cases, I never saw any log loss. Thus, the aggregator provides a sufficient reliability guarantee that it can be trusted to handle the logs of many production workloads.

Deploying the aggregator

Here’s how to deploy the log aggregator in your own VPC.

1.     Create the Kinesis Data Firehose delivery streams.

2.     Create a VPC and network resources.

3.     Configure Fluentd.

4.     Build the Fluentd Docker image.

5.     Deploy the Fluentd aggregator on Fargate.

6.     Configure ECS tasks to send logs to the aggregator.

Create the Kinesis Data Firehose delivery streams

For the purposes of this post, assume that you have already created an Elasticsearch domain and S3 bucket that can be used as destinations.

Create a delivery stream that sends to Amazon ES, with the following options:

  • For Delivery stream name, type “elasticsearch-delivery-stream.”
  • For Source, choose Direct Put or other sources.
  • For Record transformation and Record format conversion, enable them to change the format of your log data before it is sent to Amazon ES.
  • For Destination, choose Amazon Elasticsearch Service.
  • If needed, enable S3 backup of records.
  • For IAM Role, choose Create new or navigate to the Kinesis Data Firehose IAM role creation wizard.
  • For the IAM policy, remove any statements that do not apply to your delivery stream.

All records sent through this stream are indexed under the same Elasticsearch type, so it’s important that all of the log records be in the same format. Fortunately, Fluentd makes this easy. For more information, see the Configure Fluentd section in this post.

Follow the same steps to create the delivery stream that sends to S3. Call this stream “s3-delivery-stream,” and select your S3 bucket as the destination.

Create a VPC and network resources

Download the ecs-refarch-cloudformation/infrastructure/vpc.yaml AWS CloudFormation template from GitHub. This template specifies a VPC with two public and two private subnets spread across two Availability Zones. The Fluentd aggregator runs in the private subnets, along with any other services that should not be accessible outside the VPC. Your backend services would likely run here as well.

The template configures a NAT gateway that allows services in the private subnets to make calls to endpoints on the internet. It allows one-way communication out of the VPC, but blocks incoming traffic. This is important. While the aggregator service should only be accessible in your VPC, it does need to make calls to the Kinesis Data Firehose API endpoint, which lives outside of your VPC.

Deploy the template with the following command:

aws cloudformation deploy --template-file vpc.yaml \
--stack-name vpc-with-nat \
--parameter-overrides EnvironmentName=aggregator-service-infrastructure

Configure Fluentd

The Fluentd aggregator collects logs from other services in your VPC. Assuming that all these services are running in Docker containers that use the Fluentd docker log driver, each log event collected by the aggregator is in the format of the following example:

{
"source": "stdout",
"log": "122.116.50.70 - Corwin8644 264 [2018-10-31T21:31:59Z] \"POST /activate\" 200 19886",
"container_id": "6d33ade920a96179205e01c3a17d6e7f3eb98f0d5bb2b494383250220e7f443c",
"container_name": "/ecs-service-2-apache-d492a08f9480c2fcca01"
}

This log event is from an Apache server running in a container on ECS. The line that the server logged is captured in the log field, while source, container_id, and container_name are metadata added by the Fluentd Docker logging driver.

As I mentioned earlier, all log events sent to Amazon ES from the delivery stream must be in the same format. Furthermore, the log events must be JSON-formatted so that they can be converted into an Elasticsearch type. The Fluentd Docker logging driver satisfies both of these requirements.

If you have applications that emit Fluentd logs in different formats, then you could use a Lambda function in the delivery stream to transform all of the log records into a common format.

Alternatively, you could have a different delivery stream for each application type and log format and each log format could correspond to a different type in the Amazon ES domain. For simplicity, this post assumes that all of the frontend and backend services run on ECS and use the Fluentd Docker logging driver.

Now create the Fluentd configuration file, fluent.conf:

<system>
workers 4
</system>

<source>
@type  forward
@id    input1
@label @mainstream
port  24224
</source>

# Used for docker health check
<source>
@type http
port 8888
bind 0.0.0.0
</source>

# records sent for health checking won't be forwarded anywhere
<match health*>
@type null
</match>

<label @mainstream>
<match frontend*>
@type kinesis_firehose
@id   output_kinesis_frontend
region us-west-2
delivery_stream_name elasticsearch-delivery-stream
<buffer>
flush_interval 1
chunk_limit_size 1m
flush_thread_interval 0.1
flush_thread_burst_interval 0.01
flush_thread_count 15
total_limit_size 2GB
</buffer>
</match>
<match backend*>
@type kinesis_firehose
@id   output_kinesis_backend
region us-west-2
delivery_stream_name s3-delivery-stream
<buffer>
flush_interval 1
chunk_limit_size 1m
flush_thread_interval 0.1
flush_thread_burst_interval 0.01
flush_thread_count 15
total_limit_size 2GB
</buffer>
</match>
</label>

This file can also be found here, along with all of the code for this post. The first three lines tell Fluentd to use four workers, which means it can use up to 4 CPU cores. You later configure each Fargate task to have four vCPU. The rest of the configuration defines sources and destinations for logs processed by the aggregator.

The first source listed is the main source. All the applications in the VPC forward logs to Fluentd using this source definition. The source tells Fluentd to listen for logs on port 24224. Logs are streamed over TCP connections on this port.

The second source is the http Fluentd plugin, listening on port 8888. This plugin accepts logs over http; however, this is only used for container health checks. Because Fluentd lacks a built-in health check, I’ve created a container health check that sends log messages via curl to the http plugin. The rationale is that if Fluentd can accept log messages, it must be healthy. Here is the command used for the container health check:

curl http://localhost:8888/healthcheck?json=%7B%22log%22%3A+%22health+check%22%7D || exit 1

The query parameter in the URL defines a URL-encoded JSON object that looks like this:

{"log": "health check"}

The container health check inputs a log message of “health check”. While the query parameter in the URL defines the log message, the path, which is /healthcheck, sets the tag for the log message. In Fluentd, log messages are tagged, which allows them to be routed to different destinations.

In this case, the tag is healthcheck. As you can see in the configuration file, the first <match> definition handles logs that have a tag that matches the pattern health*. Each <match> element defines a tag pattern and defines a destination for logs with tags that match that pattern. For the health check logs, the destination is null, because you do not want to store these dummy logs anywhere.

The Configure ECS tasks to send logs to the aggregator section of this post explains how log tags are defined with the Fluentd docker logging driver. The other <match> elements process logs for the applications in the VPC and send them to Kinesis Data Firehose.

One of them matches any log tag that begins with “frontend”; the other matches any tag that starts with “backend”. Each sends to a different delivery stream. Your frontend and backend services can tag their logs and have them sent to different destinations.

Fluentd lacks built-in support for Kinesis Data Firehose, so use an open source plugin maintained by AWS: awslabs/aws-fluent-plugin-kinesis.

Finally, each of the Kinesis Data Firehose <match> tags define buffer settings with the <buffer> element. The Kinesis output plugin buffers data in memory if needed. These settings have been tuned to increase throughput and minimize the chance of data loss, though you should modify them as needed based upon your own testing. For a discussion on the throughput and performance of this setup, see the Aggregator performance and throughput section in this post.

Build the Fluentd Docker image

Download the Dockerfile, and place it in the same directory as the fluent.conf file discussed earlier. The Dockerfile starts with the latest Fluentd image based on Alpine, and installs awslabs/aws-fluent-plugin-kinesis and curl (for the container health check discussed earlier).

In the same directory, create an empty directory named /plugins. This directory is left empty but is needed when building the Fluentd Docker image. For more information about building custom Fluentd Docker images, see the Fluentd page on DockerHub.

Build and tag the image:

docker build -t custom-fluentd:latest .

Push the image to an ECR repository so that it can be used in the Fargate service.
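
For example (a sketch with a placeholder account ID; aws ecr get-login was the ECR login mechanism current at the time of writing):

aws ecr create-repository --repository-name custom-fluentd
$(aws ecr get-login --no-include-email --region us-west-2)
docker tag custom-fluentd:latest <your account ID>.dkr.ecr.us-west-2.amazonaws.com/custom-fluentd:latest
docker push <your account ID>.dkr.ecr.us-west-2.amazonaws.com/custom-fluentd:latest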

Deploy the Fluentd aggregator on Fargate

Download the CloudFormation template, which defines all the resources needed for the aggregator service. This includes the Network Load Balancer, target group, security group, and task definition.

First, create a cluster:

aws ecs create-cluster --cluster-name fargate-fluentd-tutorial

Second, create an Amazon ECS task execution IAM role. This allows the Fargate task to pull the Fluentd container image from ECR. It also allows the Fluentd Aggregator Tasks to log to Amazon CloudWatch. This is important: Fluentd can’t manage its own logs because that would be a circular dependency.
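
If you already have a role with the standard ECS tasks trust policy, attaching the AWS managed policy is sufficient (the role name here is an assumption):

aws iam attach-role-policy \
  --role-name ecsTaskExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy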

The template can then be launched into the VPC and private subnets created earlier. Add the required information as parameters:

aws cloudformation deploy --template-file template.yml --stack-name aggregator-service \
--parameter-overrides EnvironmentName=fluentd-aggregator-service \
DockerImage=<Repository URI used for the image built in Step 4> \
VPC=<your VPC ID> \
Subnets=<private subnet 1, private subnet 2> \
Cluster=fargate-fluentd-tutorial \
ExecutionRoleArn=<your task execution IAM role ARN> \
MinTasks=2 \
MaxTasks=4 \
--capabilities CAPABILITY_NAMED_IAM

MinTasks is set to 2, and MaxTasks is set to 4, which means that the aggregator always has at least two tasks.

When load increases, it can dynamically scale up to four tasks. Recall the discussion in Aggregator performance and throughput and set these values based upon your own expected log throughput.

Configure ECS tasks to send logs to the aggregator

First, get the DNS name of the load balancer created by the CloudFormation template. In the EC2 console, choose Load Balancers. The load balancer’s name matches the EnvironmentName parameter; in this case, it is fluentd-aggregator-service.
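
You can also fetch the DNS name from the command line (this assumes the load balancer name matches the EnvironmentName parameter, as in the template used above):

aws elbv2 describe-load-balancers --names fluentd-aggregator-service \
  --query "LoadBalancers[0].DNSName" --output text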

Create a container definition for a container that logs to the Fluentd aggregator by adding the appropriate values for logConfiguration. In the following example, replace the fluentd-address value with the DNS name for your own load balancer. Ensure that you add :24224 after the DNS name; the aggregator listens on TCP port 24224.

"logConfiguration": {
    "logDriver": "fluentd",
    "options": {
        "fluentd-address": "fluentd-aggregator-service-cfe858972373a176.elb.us-west-2.amazonaws.com:24224",
        "tag": "frontend-apache"
    }
}

Notice the tag value, frontend-apache. This is how the tag discussed earlier is set. This tag matches the pattern frontend*, so the Fluentd aggregator sends it to the delivery stream for “frontend” logs.
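
A backend service would use the same structure with a tag that matches the backend* pattern instead; for example (the tag name here is hypothetical):

"logConfiguration": {
    "logDriver": "fluentd",
    "options": {
        "fluentd-address": "fluentd-aggregator-service-cfe858972373a176.elb.us-west-2.amazonaws.com:24224",
        "tag": "backend-worker"
    }
}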

Finally, your container instances need the following user data to enable the Fluentd log driver in the ECS agent:

#!/bin/bash
echo "ECS_AVAILABLE_LOGGING_DRIVERS=[\"awslogs\",\"fluentd\"]" >> /etc/ecs/ecs.config

Conclusion

In this post, I showed you how to build a log aggregator using AWS Fargate, Amazon Kinesis Data Firehose, and Fluentd.

To learn how to use the aggregator with applications that do not run on ECS, I recommend reading All Data Are Belong to AWS: Streaming upload via Fluentd from Kiyoto Tamura, a maintainer of Fluentd.

Automatically update instances in an Amazon ECS cluster using the AMI ID parameter

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/

This post is contributed by Adam McLean – Solutions Developer at AWS and Chirill Cucereavii – Application Architect at AWS 

In this post, we show you how to automatically refresh the container instances in an active Amazon Elastic Container Service (ECS) cluster with instances built from a newly released AMI.

The Amazon ECS-optimized AMI comes prepackaged with the ECS container agent, Docker agent, and the ecs-init upstart service. We recommend that you use the Amazon ECS-optimized AMI for your container instances unless your application requires any of the following:

  • A specific operating system
  • Custom security and monitoring agents installed
  • Root volume encryption enabled
  • A Docker version that is not yet available in the Amazon ECS-optimized AMI

Regardless of the type of AMI that you choose, AWS recommends updating your ECS container instance fleet with the latest AMI whenever possible. It’s easier than trying to patch existing instances in place.

Solution overview

In this solution, you deploy the ECS cluster and specify the cluster size, instance type, AMI ID, and other parameters. After the ECS cluster has been created and instances registered, you can update the ECS cluster with another AMI ID to trigger the following events:

  1. A new launch configuration is created using the new AMI ID.
  2. The Auto Scaling group adds one new instance using the new launch configuration. This executes the ‘Adding Instances’ process described below.
  3. The adding instances process finishes for the single new node with the new AMI. Then, the removing instances process is started against the oldest instance with the old AMI ID.
  4. After the removing nodes process is finished, steps 2 and 3 are repeated until all nodes in the cluster have been replaced.
  5. If an error is encountered during the rollout, the new launch configuration is deleted, and the old one is put back in place.

Scaling a cluster out (adding instances)

Take a closer look at each step in scaling out a cluster:

  1. A stack update changes the AMI ID parameter.
  2. CloudFormation updates the launch configuration and tells the Auto Scaling group to add an instance.
  3. Auto Scaling launches an instance using the new AMI ID to join the ECS cluster.
  4. Auto Scaling invokes the Launch Lambda function.
  5. Lambda asks the ECS cluster if the newly launched instance has joined and is showing healthy.
  6. Lambda tells Auto Scaling whether the launch succeeded or failed.
  7. Auto Scaling tells CloudFormation whether the scale-up has succeeded.
  8. The stack update succeeds, or rolls back.

Scaling a cluster in (removing instances)

Take a closer look at each step in scaling in a cluster:

  1. CloudFormation tells the Auto Scaling group to remove an instance.
  2. Auto Scaling chooses an instance to be terminated.
  3. Auto Scaling invokes the Terminate Lambda function.
  4. The Lambda function performs the following tasks:
    1. Sets the instance to be terminated to DRAINING mode.
    2. Confirms that all ECS tasks are drained from the instance marked for termination.
    3. Confirms that the ECS cluster services and tasks are stable.
  5. Lambda tells Auto Scaling to proceed with termination.
  6. Auto Scaling tells CloudFormation whether the scale-in has succeeded.
  7. The stack update succeeds, or rolls back.

Solution technologies

Here are the technologies used for this solution, with more details.

  • AWS CloudFormation
  • AWS Auto Scaling
  • Amazon CloudWatch Events
  • AWS Systems Manager Parameter Store
  • AWS Lambda

AWS CloudFormation

AWS CloudFormation is used to deploy the stack, and should be used for lifecycle management. Do not directly edit Auto Scaling groups, Lambda functions, and so on. Instead, update the CloudFormation template.

This forces the resolution of the latest AMI, as well as providing an opportunity to change the size or instance type involved in the ECS cluster.

CloudFormation has rollback capabilities to return to the last known good state if errors are encountered. It is the recommended mechanism for management throughout the cluster’s lifecycle.

AWS Auto Scaling

For ECS, the primary scaling and rollout mechanism is AWS Auto Scaling. Auto Scaling allows you to define a desired state environment, and keep that desired state as necessary by launching and terminating instances.

When a new AMI has been selected, CloudFormation informs Auto Scaling that it should replace the existing fleet of instances. This is controlled by an Auto Scaling update policy.

This solution rolls a single new instance out to the ECS cluster, then drains and terminates a single old instance in response. This cycle continues until all instances in the ECS cluster have been replaced.

Auto Scaling lifecycle hooks

Auto Scaling permits the use of lifecycle hooks: code that executes when a scaling operation occurs. This solution uses a Lambda function that is informed when an instance is launched or terminated.

A lifecycle hook informs Auto Scaling whether it can proceed with the activity or if it should abandon it. In this case, the ECS cluster remains healthy and all tasks have been redistributed before allowing Auto Scaling to proceed.

Lifecycle hooks also have a timeout; in this case, 3600 seconds (1 hour). If Auto Scaling receives no response by then, it gives up, and the default action is to abandon the operation.
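
In a CloudFormation template, the termination hook looks roughly like the following (a sketch; the resource and Auto Scaling group names are illustrative):

TerminationHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref ECSAutoScalingGroup
    LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
    HeartbeatTimeout: 3600
    DefaultResult: ABANDON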

Amazon CloudWatch Events

CloudWatch Events is a mechanism for watching calls made to the AWS APIs, and then activating functions in response. This is the mechanism used to launch the Lambda functions when a lifecycle event occurs. It’s also the mechanism used to re-launch the Lambda function when it times out (Lambda maximum execution time is 15 minutes).

In this solution, four CloudWatch Events rules are created: two to pick up the initial scaling events, and two more to pick up a continuation from the Lambda functions.

AWS Systems Manager Parameter Store

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management.

This solution relies on the AMI IDs stored in Parameter Store. Given a naming standard of /ami/ecs/latest, this always resolves to the latest available AMI for ECS.

CloudFormation now supports using the values stored in Parameter Store as inputs to CloudFormation templates. The template can simply be passed a value, /ami/ecs/latest, and CloudFormation resolves it to the latest AMI.
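
In a template, that looks like the following parameter declaration (a sketch; the logical name is illustrative, and the default must match the parameter name you stored):

Parameters:
  ECSAMI:
    Description: AMI ID for the ECS container instances
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /ami/ecs/latest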

AWS Lambda

The Lambda functions are used to handle the Auto Scaling lifecycle hooks. They operate against the ECS cluster to assure it is healthy, and inform Auto Scaling that it can proceed, or to abandon its current operation.

The functions are invoked by CloudWatch Events in response to scaling operations so they are idle unless there are changes happening in the cluster.

They’re written in Python, and use the boto3 SDK to communicate with ECS and Auto Scaling.

The launch Lambda function waits until the instance has fully joined the ECS cluster. This is shown by the instance being marked ‘ACTIVE’ by the ECS control plane, and its ECS agent status showing as connected. This means that the new instance is ready to run tasks for the cluster.

The terminate Lambda function waits until the instance has fully drained all running tasks. It also checks that all tasks and services are in a stable state before allowing Auto Scaling to terminate an instance. This assures that the instance is truly idle and the cluster is stable before the instance is removed.
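
A minimal sketch of the checks the two functions might perform with boto3 (illustrative only; the actual functions in the project are more thorough):

import boto3

ecs = boto3.client('ecs')

def instance_ready(cluster, container_instance_arn):
    # Launch hook: the instance must be ACTIVE with a connected agent
    ci = ecs.describe_container_instances(
        cluster=cluster,
        containerInstances=[container_instance_arn])['containerInstances'][0]
    return ci['status'] == 'ACTIVE' and ci['agentConnected']

def instance_drained(cluster, container_instance_arn):
    # Terminate hook: set the instance to DRAINING, then wait for zero running tasks
    ecs.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status='DRAINING')
    ci = ecs.describe_container_instances(
        cluster=cluster,
        containerInstances=[container_instance_arn])['containerInstances'][0]
    return ci['runningTasksCount'] == 0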

Deployment

Before you begin deployment, you need the following:

  • An AWS account 

You need an AWS account with enough room to accommodate the additional EC2 instances required by the ECS cluster.

  • (Optional) Linux system 

Use the AWS CLI and, optionally, jq to deploy the solution. Although a Linux system is recommended, it’s not required.

  • IAM user

You need an IAM admin user with permissions to create IAM policies and roles and create and update CloudFormation stacks. The user must also be able to deploy the ECS cluster, Lambda functions, Systems Manager parameters, and other resources.

  • Download the code

Clone or download the project from https://github.com/awslabs/ecs-cluster-manager on GitHub:

git clone git@github.com:awslabs/ecs-cluster-manager.git

AMI ID parameter

Create a Systems Manager parameter to store the desired AMI ID.

The first run does not use the latest ECS optimized AMI. Later, you update the ECS cluster to the latest AMI.

Use the 2017.09 AMI release. Run the following commands to create the /ami/ecs/latest parameter in Parameter Store with the corresponding AMI ID as its value.

AMI_ID=$(aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux/amzn-ami-2017.09.l-amazon-ecs-optimized --region us-east-1 --query "Parameters[].Value" --output text | jq -r .image_id)
aws ssm put-parameter \
  --overwrite \
  --name "/ami/ecs/latest" \
  --type "String" \
  --value $AMI_ID \
  --region us-east-1 \
  --profile devAdmin

Substitute us-east-1 with your desired Region.

In the AWS Management Console, choose AWS Systems Manager, Parameter Store.

You should see the /ami/ecs/latest parameter that you just created.

Select the /ami/ecs/latest parameter and make sure that the AMI ID is present in parameter value. If you are using the us-east-1 Region, you should see the following value:

ami-aff65ad2

Upload the Lambda function code to Amazon S3

The Lambda functions are too large to embed in the CloudFormation template. Therefore, they must be loaded into an S3 bucket before the CloudFormation stack is created.

Assuming you’re using an S3 bucket called ecs-deployment, copy each Lambda function zip file as follows:

cd ./ecs-cluster-manager
aws s3 cp lambda/ecs-lifecycle-hook-launch.zip s3://ecs-deployment
aws s3 cp lambda/ecs-lifecycle-hook-terminate.zip s3://ecs-deployment

Refer to these when running your CloudFormation template later so that CloudFormation knows where to find the Lambda files.

Lambda function role

The Lambda functions require read permissions on EC2, write permissions on ECS, and permission to submit a result or heartbeat to Auto Scaling.

Create a new LambdaECSScaling IAM policy in your AWS account. Use the following JSON as the policy body:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:CompleteLifecycleAction",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:RecordLifecycleActionHeartbeat",
                "ecs:UpdateContainerInstancesState",
                "ecs:Describe*",
                "ecs:List*"
            ],
            "Resource": "*"
        }
    ]
}

Now, create a new LambdaECSScalingRole IAM role. For Trusted Entity, choose AWS Service, Lambda. Attach the following permissions policies:

  • LambdaECSScaling (created in the previous step)
  • ReadOnlyAccess (AWS managed policy)
  • AWSLambdaBasicExecutionRole (AWS managed policy)

ECS cluster instance profile

The ECS cluster nodes must have an instance profile attached that allows them to speak to the ECS service. This profile can also contain any other permissions that they would require (for example, Systems Manager for management and command execution).

The policies required here are all AWS managed policies, so you only need to create the role and attach them.

Create a new IAM role called EcsInstanceRole. For Trusted Entity, choose AWS Service, EC2. Attach the following AWS managed permissions policies:

  • AmazonEC2RoleforSSM
  • AmazonEC2ContainerServiceforEC2Role
  • AWSLambdaBasicExecutionRole

The AWSLambdaBasicExecutionRole policy may look out of place, but it allows the instance to create new CloudWatch Logs groups. These permissions facilitate using CloudWatch Logs as the primary logging mechanism with ECS. This managed policy grants the required permissions without you needing to manage a custom policy.

CloudFormation parameter file

We recommend using a parameter file for the CloudFormation template. This documents the desired parameters for the template and is usually less error prone than entering parameters in the console.

There is a file called blank_parameter_file.json in the source code project. Copy this file to a new file with a more meaningful name (such as dev-cluster.json), then fill out the parameters.

The file looks like this:

[
  {
    "ParameterKey": "EcsClusterName",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "EcsAmiParameterKey",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "IamRoleInstanceProfile",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "EcsInstanceType",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "EbsVolumeSize",
    "ParameterValue": ""
  }, 
  {
    "ParameterKey": "ClusterSize",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "ClusterMaxSize",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "KeyName",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "SubnetIds",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "SecurityGroupIds",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "DeploymentS3Bucket",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LifecycleLaunchFunctionZip",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LifecycleTerminateFunctionZip",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "LambdaFunctionRole",
    "ParameterValue": ""
  }
]

Here are the details for each parameter:

  • EcsClusterName:  The name of the ECS cluster to create.
  • EcsAmiParameterKey:  The Systems Manager parameter that contains the AMI ID to be used. This defaults to /ami/ecs/latest.
  • IamRoleInstanceProfile:  The name of the EC2 instance profile used by the ECS cluster members. Discussed in the prerequisite section.
  • EcsInstanceType:  The instance type to use for the cluster. Use whatever is appropriate for your workloads.
  • EbsVolumeSize:  The size of the Docker storage setup that is created using LVM. ECS typically defaults to 100 GB.
  • ClusterSize:  The desired number of EC2 instances for the cluster.
  • ClusterMaxSize:  This value should always be double the amount contained in ClusterSize. CloudFormation has no ‘math’ operators or we wouldn’t prompt for this. This allows rolling updates to be performed safely by doubling the cluster size, then contracting back.
  • KeyName:  The name of the EC2 key pair to place on the ECS instance to support SSH.
  • SubnetIds: A comma-separated list of subnet IDs that the cluster should be allowed to launch instances into. These should map to at least two zones for a resilient cluster, for example subnet-a70508df,subnet-e009eb89.
  • SecurityGroupIds:  A comma-separated list of security group IDs that are attached to each node, for example sg-bd9d1bd4,sg-ac9127dca (a single value is fine).
  • DeploymentS3Bucket: This is the bucket where the two Lambda functions for scale in/scale out lifecycle hooks can be found.
  • LifecycleLaunchFunctionZip: This is the full path within the DeploymentS3Bucket where the ecs-lifecycle-hook-launch.zip contents can be found.
  • LifecycleTerminateFunctionZip:  The full path within the DeploymentS3Bucket where the ecs-lifecycle-hook-terminate.zip contents can be found.
  • LambdaFunctionRole:  The name of the role that the Lambda functions use. Discussed in the prerequisite section.

A completed parameter file looks like the following:

[
  {
    "ParameterKey": "EcsClusterName",
    "ParameterValue": "DevCluster"
  }, 
  {
    "ParameterKey": "EcsAmiParameterKey",
    "ParameterValue": "/ami/ecs/latest"
  },
  {
    "ParameterKey": "IamRoleInstanceProfile",
    "ParameterValue": "EcsInstanceRole"
  }, 
  {
    "ParameterKey": "EcsInstanceType",
    "ParameterValue": "m4.large"
  },
  {
    "ParameterKey": "EbsVolumeSize",
    "ParameterValue": "100"
  }, 
  {
    "ParameterKey": "ClusterSize",
    "ParameterValue": "2"
  },
  {
    "ParameterKey": "ClusterMaxSize",
    "ParameterValue": "4"
  },
  {
    "ParameterKey": "KeyName",
    "ParameterValue": "dev-cluster"
  },
  {
    "ParameterKey": "SubnetIds",
    "ParameterValue": "subnet-a70508df,subnet-e009eb89"
  },
  {
    "ParameterKey": "SecurityGroupIds",
    "ParameterValue": "sg-bd9d1bd4"
  },
  {
    "ParameterKey": "DeploymentS3Bucket",
    "ParameterValue": "ecs-deployment"
  },
  {
    "ParameterKey": "LifecycleLaunchFunctionZip",
    "ParameterValue": "ecs-lifecycle-hook-launch.zip"
  },
  {
    "ParameterKey": "LifecycleTerminateFunctionZip",
    "ParameterValue": "ecs-lifecycle-hook-terminate.zip"
  },
  {
    "ParameterKey": "LambdaFunctionRole",
    "ParameterValue": "LambdaECSScalingRole"
  }
]

Deployment

Given the CloudFormation template and the parameter file, you can deploy the stack using the AWS CLI or the console.

Here’s an example deploying through the AWS CLI. This example uses a stack named ecs-dev and a parameter file named dev-cluster.json. It also uses the --profile argument to assure that the CLI assumes a role in the right account for deployment. Use the corresponding Region and profile from your local ~/.aws/config file.

This command outputs the stack ID as soon as it is executed, even though the other required resources are still being created.

aws cloudformation create-stack \
  --stack-name ecs-dev \
  --template-body file://./ecs-cluster.yaml \
  --parameters file://./dev-cluster.json \
  --region us-east-1 \
  --profile devAdmin

Use the AWS Management Console to check whether the stack is done creating. Or, run the following command:

aws cloudformation wait stack-create-complete \
  --stack-name ecs-dev \
  --region us-east-1 \
  --profile devAdmin

After the CloudFormation stack has been created, go to the ECS console and open the DevCluster cluster that you just created. There are no tasks running, although you should see two container instances registered with the cluster.

You also see a warning message indicating that the container instances are not running the latest version of Amazon ECS container agent. The reason is that you did not use the latest available version of the ECS-Optimized AMI.

Fix this issue by updating the container instance AMI.

Update the cluster instances AMI

Run the following commands to set the /ami/ecs/latest parameter to the latest AMI ID.

AMI_ID=$(aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux/recommended --region us-east-1 --query "Parameters[].Value" --output text | jq -r .image_id)

aws ssm put-parameter \
  --overwrite \
  --name "/ami/ecs/latest" \
  --type "String" \
  --value $AMI_ID \
  --region us-east-1 \
  --profile devAdmin

Make sure that the parameter value has been updated in the console.

To update your ECS cluster, run the update-stack command without changing any parameters. CloudFormation evaluates the value stored by /ami/ecs/latest. If it has changed, CloudFormation makes updates as appropriate.

aws cloudformation update-stack \
  --stack-name ecs-dev \
  --template-body file://./ecs-cluster.yaml \
  --parameters file://./dev-cluster.json \
  --region us-east-1 \
  --profile devAdmin

Supervising updates

We recommend supervising your updates to the ECS cluster while they are being deployed. This assures that the cluster remains stable. For the majority of situations, there is no manual intervention required.

  • Keep an eye on Auto Scaling activities. In the Auto Scaling groups section of the EC2 console, select the Auto Scaling group for a cluster and choose Activity History.
  • Keep an eye on the ECS instances to ensure that new instances are joining and draining instances are leaving. In the ECS console, choose Cluster, ECS Instances.
  • Lambda function logs help troubleshoot things that aren’t behaving as expected. In the Lambda console, select the LifeCycleLaunch or LifeCycleTerminate functions, and choose Monitoring, View logs in CloudWatch. Expand the logs for the latest executions to see what’s going on.

When you go back to the ECS cluster page, notice that the “Outdated Amazon ECS container agent” warning message has disappeared.

Select one of the cluster’s EC2 instance IDs and observe that the latest ECS optimized AMI is used.

Summary

In this post, you saw how to use CloudFormation, Lambda, CloudWatch Events, and Auto Scaling lifecycle hooks to update your ECS cluster instances with a new AMI.

The sample code is available on GitHub for you to use and extend. Contributions are always welcome!

Scheduling GPUs for deep learning tasks on Amazon ECS

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/scheduling-gpus-for-deep-learning-tasks-on-amazon-ecs/

This post is contributed by Brent Langston – Sr. Developer Advocate, Amazon Container Services

Last week, AWS announced enhanced Amazon Elastic Container Service (Amazon ECS) support for GPU-enabled EC2 instances. This means that GPUs are now first-class resources that can be requested in your task definition and scheduled on your cluster by ECS.

Previously, to schedule a GPU workload, you had to maintain your own custom configured AMI, with a custom configured Docker runtime. You also had to use custom vCPU logic as a stand-in for assigning your GPU workloads to GPU instances. Even when all that was in place, there was still no pinning of a GPU to a task. One task might consume more GPU resources than it should. This could cause other tasks to not have a GPU available.

Now, AWS maintains an ECS-optimized AMI that includes the correct NVIDIA drivers and Docker customizations. You can use this AMI to provision your GPU workloads. With this enhancement, GPUs can also be requested directly in the task definition. Like allocating CPU or RAM to a task, now you can explicitly request a number of GPUs to be allocated to your task. The scheduler looks for matching resources on the cluster to place those tasks. The GPUs are pinned to the task for as long as the task is running, and can’t be allocated to any other tasks.

I thought I’d see how easy it is to deploy GPU workloads to my ECS cluster. I’m working in the US East (Ohio) us-east-2 Region, from my AWS Cloud9 IDE, so these commands work for Amazon Linux. Feel free to adapt to your environment as necessary.

If you’d like to run this example yourself, you can find all the code in this GitHub repo. If you run this example in your own account, be aware of the instance pricing, and clean up your resources when your experiment is complete.

Clone the repo using the following command:

git clone https://github.com/brentley/tensorflow-container.git

Setup

You need the latest version of the AWS CLI (for this post, I used 1.16.98):

echo "export PATH=$HOME/.local/bin:$HOME/bin:$PATH" >> ~/.bash_profile
source ~/.bash_profile
pip install --user -U awscli

Provision an ECS cluster, with two C5 instances, and two P3 instances:

aws cloudformation deploy --stack-name tensorflow-test --template-file cluster-cpu-gpu.yml --capabilities CAPABILITY_IAM                            

While AWS CloudFormation is provisioning resources, examine the template used to build your infrastructure. Open `cluster-cpu-gpu.yml`, and you see that you are provisioning a test VPC with two c5.2xlarge instances and two p3.2xlarge instances. This gives you one NVIDIA Tesla V100 GPU per p3.2xlarge instance, for a total of two GPUs to run training tasks.

I adapted the TensorFlow benchmark Docker container to create a training workload. I use this container to compare the GPU scheduling and runtime.

When the CloudFormation stack is deployed, register a task definition with the ECS service:

aws ecs register-task-definition --cli-input-json file://gpu-1-taskdef.json

To request GPU resources in the task definition, the only change needed is to include a GPU resource requirement in the container definition:

            "resourceRequirements": [
                {
                    "type": "GPU",
                    "value": "1"
                }
            ],

Including this resource requirement ensures that the ECS scheduler allocates the task to an instance with a free GPU resource.

Launch a single-GPU training workload

Now you’re ready to launch the first GPU workload.

export cluster=$(aws cloudformation describe-stacks --stack-name tensorflow-test \
  --query 'Stacks[0].Outputs[?OutputKey==`ClusterName`].OutputValue' --output text)
echo $cluster
aws ecs run-task --cluster $cluster --task-definition tensorflow-1-gpu

When you launch the task, the output shows the `gpuIds` values that are assigned to the task. This GPU is pinned to this task, and can’t be shared with any other tasks. If all GPUs are allocated, you can’t schedule additional GPU tasks until a running task with a GPU completes. That frees the GPU to be scheduled again.
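
One way to confirm the assignment afterward is to query the task directly; for example (the task ARN here is a placeholder):

aws ecs describe-tasks --cluster $cluster --tasks <task-arn> \
  --query 'tasks[0].containers[*].gpuIds'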

When you look at the log output in Amazon CloudWatch Logs, you see that the container discovered one GPU: `/gpu0` and the training benchmark trained at a rate of 321.16 images/sec.

With your two p3.2xlarge nodes in the cluster, you are limited to two concurrent single-GPU workloads. To scale horizontally, you could add additional p3.2xlarge nodes. This would limit your workloads to a single GPU each. To scale vertically, you could bump up the instance type, which would allow you to assign multiple GPUs to a single task. Now, let’s see how fast your TensorFlow container can train when assigned multiple GPUs.

Launch a multiple-GPU training workload

To begin, replace the p3.2xlarge instances with p3.16xlarge instances. This gives your cluster two instances that each have eight GPUs, for a total of 16 GPUs that can be allocated.

aws cloudformation deploy --stack-name tensorflow-test --template-file cluster-cpu-gpu.yml --parameter-overrides GPUInstanceType=p3.16xlarge --capabilities CAPABILITY_IAM

When the CloudFormation deploy is complete, register two more task definitions to launch your benchmark container requesting more GPUs:

aws ecs register-task-definition --cli-input-json file://gpu-4-taskdef.json  
aws ecs register-task-definition --cli-input-json file://gpu-8-taskdef.json 

Next, launch two TensorFlow benchmark containers, one requesting four GPUs, and one requesting eight GPUs:

aws ecs run-task --cluster $cluster --task-definition tensorflow-4-gpu
aws ecs run-task --cluster $cluster --task-definition tensorflow-8-gpu

With each task request, GPUs are allocated: four in the first request, and eight in the second request. Again, these GPUs are pinned to the task, and not usable by any other task until these tasks are complete.

Check the log output in CloudWatch Logs:

On the “devices” lines, you can see that the container discovered and used four (or eight) GPUs. Also, the total images/sec improved to 1297.41 with four GPUs, and 1707.23 with eight GPUs.

Because you can pin single or multiple GPUs to a task, running advanced GPU based training tasks on Amazon ECS is easier than ever!

Cleanup

To clean up your running resources, delete the CloudFormation stack:

aws cloudformation delete-stack --stack-name tensorflow-test

Conclusion

For more information, see Working with GPUs on Amazon ECS.

If you want to keep up on the latest container info from AWS, please follow me on Twitter and tweet any questions! @brentContained

Setting up AWS PrivateLink for AWS Fargate, Amazon ECS, and Amazon ECR

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/setting-up-aws-privatelink-for-aws-fargate-amazon-ecs-and-amazon-ecr/

This post is contributed by Nathan Peck – Developer Advocate, Amazon Container Services

AWS Fargate, Amazon ECS, and Amazon ECR now have support for AWS PrivateLink. AWS PrivateLink is a networking technology designed to enable access to AWS services in a highly available and scalable manner, while keeping all the network traffic within the AWS network. When you create AWS PrivateLink endpoints for ECR, ECS, and Fargate, these service endpoints appear as elastic network interfaces with a private IP address in your VPC.

Before AWS PrivateLink, your Amazon EC2 instances had to use an internet gateway to download Docker images stored in ECR or communicate to the ECS control plane. Instances in a public subnet with a public IP address used the internet gateway directly. Instances in a private subnet used a network address translation (NAT) gateway hosted in a public subnet. The NAT gateway would then use the internet gateway to talk to ECR and ECS.

Now that AWS PrivateLink support has been added, instances in both public and private subnets can use it to get private connectivity to download images from Amazon ECR. Instances can also communicate with the ECS control plane via AWS PrivateLink endpoints without needing an internet gateway or NAT gateway.

This networking architecture is considerably simpler. It enables enhanced security by allowing you to deny your private EC2 instances access to anything other than these AWS services. That’s assuming that you want to block all other outbound internet access for those instances. For this to work, you must create some AWS PrivateLink resources:

  • AWS PrivateLink endpoints for ECR. This allows instances in your VPC to communicate with ECR to download image manifests.
  • AWS PrivateLink gateway for Amazon S3. This allows instances to download the image layers from the underlying private S3 buckets that host them.
  • AWS PrivateLink endpoints for ECS. These endpoints allow instances to communicate with the telemetry and agent services in the ECS control plane.

This post explains how to create these resources.

Create an AWS PrivateLink interface endpoint for ECR

ECR requires two interface endpoints:

  • com.amazonaws.region.ecr.api
  • com.amazonaws.region.ecr.dkr

First, create each of the two interface VPC endpoints for ECR using the endpoint creation wizard in the VPC dashboard. For the service category, select AWS services, and then select the endpoint, substituting your Region of choice.

Next, specify the VPC and subnets to which the AWS PrivateLink interface should be added. Make sure that you select the same VPC in which your ECS cluster is running. To be on the safe side, select every Availability Zone and all of the subnets available in each.

However, depending on your networking needs, you might also choose to only enable the AWS PrivateLink endpoint in your private subnets from each Availability Zone. Let instances running in a public subnet continue to communicate with ECR via the public subnet’s internet gateway.

Next, enable Private DNS Name. You are required to enable Private DNS Name for the com.amazonaws.region.ecr.dkr endpoint.

A private hosted zone enables you to access the resources in your VPC using the Amazon ECR default DNS domain names, instead of the private IPv4 addresses or private DNS hostnames provided by the VPC endpoints. The Amazon ECR DNS hostname that the AWS CLI and Amazon ECR SDKs use by default (https://api.ecr.region.amazonaws.com) resolves to your VPC endpoint.

If you enabled a private hosted zone for com.amazonaws.region.ecr.api and you are using an SDK released before January 24, 2019, you must specify the endpoint explicitly when using the SDK or AWS CLI. For example:

aws --endpoint-url https://api.ecr.region.amazonaws.com ecr describe-repositories

If you don’t enable a private hosted zone, this would be:

aws --endpoint-url https://VPC_Endpoint_ID.api.ecr.region.vpce.amazonaws.com ecr describe-repositories

If you enabled a private hosted zone and you are using the SDK released on January 24, 2019 or later, this would be:

aws ecr describe-repositories

Lastly, specify a security group for the interface itself. This controls which hosts are able to talk to the interface. The security group should allow inbound connections on port 443 (HTTPS) from the instances in your cluster.

You may have a security group that is applied to all the EC2 instances in the cluster, perhaps using an Auto Scaling group. You can create a rule that allows the VPC endpoint to be accessed by any instance in that security group.

Finally, choose Create endpoint. The new endpoint appears in the list.
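
If you prefer to script this step, the console flow corresponds roughly to the following CLI sketch (all IDs here are placeholders, us-east-1 is assumed, and you would repeat the call for the second ECR service name):

aws ec2 create-vpc-endpoint \
  --vpc-endpoint-type Interface \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.us-east-1.ecr.dkr \
  --subnet-ids subnet-11111111 subnet-22222222 \
  --security-group-ids sg-33333333 \
  --private-dns-enabled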

Add an AWS PrivateLink gateway endpoint for S3

The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.

S3 uses a slightly different endpoint type called a gateway. Be careful about adding an S3 gateway to your VPC if your application is actively using S3. With gateway endpoints, your application’s existing connections to S3 may be briefly interrupted while the gateway is being added. You may have a busy cluster with many active ECS deployments, causing image layer downloads from S3. Or, your application itself may make heavy usage of S3. In that case, it’s best to create a fresh new VPC with an S3 gateway, then migrate your ECS cluster and its containers into that VPC.

To add the S3 gateway endpoint, select com.amazonaws.region.s3 on the list of AWS services and select the VPC hosting your ECS cluster. Gateway endpoints are added to the VPC route table for the subnets. Select each route table associated with the subnet in which the S3 gateway should be.

Instead of using a security group, the gateway endpoint uses an IAM policy document to limit access to the service. This policy is similar to an IAM policy but does not replace the default level of access that your applications have through their IAM role. It just further limits what portions of the service are available via the gateway. It’s okay to just use the default Full Access policy. Any restrictions you have put on your task IAM roles or other IAM user policies still apply on top of this policy.

Choose Create to add this gateway endpoint to your VPC. When you view the route tables in your VPC subnets, you see an S3 gateway that is used whenever ECR Docker image layers are being downloaded from S3.
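
The gateway endpoint also has a CLI equivalent; note that it takes route table IDs rather than subnets and security groups (the IDs here are placeholders):

aws ec2 create-vpc-endpoint \
  --vpc-endpoint-type Gateway \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-44444444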

Create an AWS PrivateLink interface endpoint for ECS

In addition to downloading Docker images from ECR, your EC2 instances must also communicate with the ECS control plane to receive orchestration instructions.

ECS requires three endpoints:

  • com.amazonaws.region.ecs-agent
  • com.amazonaws.region.ecs-telemetry
  • com.amazonaws.region.ecs

Create these three interface endpoints in the same way that you created the endpoint for ECR, by adding each endpoint and setting the subnets and security group for the endpoint.

After the endpoints are created and added to your VPC, there is one additional step. Restart any ECS agents that are currently running in the VPC. The ECS agent uses a persistent web socket connection to the ECS backend, and VPC endpoints do not interrupt existing connections. The agent continues to use its existing connection instead of establishing a new connection through the new endpoint, unless you restart it.

To restart the agent with no disruption to your application containers, you can connect using SSH to each EC2 instance in the cluster and issue the following command:

sudo docker restart ecs-agent

This restarts the ECS agent without stopping any of the other application containers on the host. Your application may be stateless and safe to stop at any time, or you may not have or want SSH access to the underlying hosts. In that case, you can instead reboot each EC2 instance in the cluster one at a time. This restarts the agent on that host, while any service-launched tasks from that host are restarted on other hosts.

If you are using AWS Fargate, you can issue an UpdateService API call to do a rolling restart of all your containers. Or, manually stop your running containers one by one and let them be automatically replaced. When they restart, they use an ECS agent that is communicating using the new ECS endpoints. The Docker image is downloaded using the ECR endpoint and S3 gateway.

Conclusion

In this post, I showed you how to add AWS PrivateLink endpoints to your VPC for ECS and ECR, including an S3 gateway for ECR layer downloads. These endpoints work whether you are running your containers on EC2 instances in a self-managed cluster in your VPC, or as Fargate containers running in your VPC.

Your ECS cluster or Fargate tasks communicate directly with the ECS control plane. They should be able to download Docker images directly without needing to make any connections outside of your VPC via an internet gateway or NAT gateway. All container orchestration traffic stays inside the VPC.

If you have questions or suggestions, please comment below.

Migrate Wildfly Cluster to Amazon ECS using Service Discovery

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/migrate-wildfly-cluster-to-ecs-using-service-discovery/

This post is courtesy of Vidya Narasimhan, AWS Solutions Architect

1. Overview

Java Enterprise Edition has been an important server-side platform for over a decade for developing mission-critical & large-scale applications amongst enterprises. High-availability & fault tolerance for such applications is typically achieved through built-in JEE clustering provided by the platform.

JEE clustering represents a group of machines working together to transparently provide enterprise services such as JNDI, EJB, JMS, HTTPSession etc. that enable distribution, discovery, messaging, transaction, caching, replication & component failover. Implementation of clustering technology varies in JEE platforms provided by different vendors. Many of the clustering implementations involve proprietary communication protocols that use multicast for intra-cluster communications, which is not supported in the public cloud.

This article is relevant for JEE platforms & other products that use JGroups-based clustering, such as Wildfly. The solution described allows easy migration of applications developed on these platforms using native clustering to Amazon Elastic Container Service (Amazon ECS), which is a highly scalable, fast container management service that makes it easy to orchestrate, run & scale Docker containers on a cluster. This solution is useful when the business objective is to migrate to the cloud fast with minimum changes to the application. The approach recommends lift & shift to AWS, wherein the initial focus is to migrate as-is, with optimizations coming in incrementally later.

Whether the JEE application to be migrated is designed as a monolith or microservices, a legacy or green-field deployment, there are multiple reasons why organizations should opt for containerization of their application. This link explains the benefits of containerization well (see the section Why Use Containers): https://aws.amazon.com/getting-started/projects/break-monolith-app-microservices-ecs-docker-ec2/module-one/

2. Wildfly Clustering on ECS

Here onwards, this article highlights how to migrate a standard clustered JEE app deployed on Wildfly Application Server to Amazon ECS. Wildfly supports clustering out of the box, in two modes: standalone & domain. This article explores how to set up a Wildfly cluster in ECS with multiple Wildfly standalone nodes enabled for HA to form a cluster. The clustering is demonstrated through a web application that replicates session information across the cluster nodes and can withstand a failover without session data loss.

The important components of clustering that require a mention right away are ECS Service Discovery, JGroups & Infinispan.

  • JGroups – Wildfly clustering is enabled by the popular open-source JGroups toolkit. The JGroups subsystem provides group communication support for HA services, using multicast transmission by default. It deals with all aspects of node discovery and provides reliable messaging between the nodes, as follows:
    • Node-to-node messaging — By default is based on UDP/multicast that can be extended via TCP/unicast.
    • Node discovery — By default uses multicast ping MPING. Alternatives include TCPPING, S3_PING, JDBC_PING, DNS_PING and others.

This article focuses on DNS_PING for node discovery over TCP.
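
For orientation, the relevant change in standalone-ha.xml ends up looking roughly like the fragment below. This is an illustrative sketch only: protocol ordering and the remaining protocols are omitted, and myapp.sampleaws.com is the example service endpoint used later in this post.

<stack name="tcp">
    <transport type="TCP" socket-binding="jgroups-tcp"/>
    <protocol type="dns.DNS_PING">
        <!-- must match the ECS service discovery endpoint in Route 53 -->
        <property name="dns_query">myapp.sampleaws.com</property>
    </protocol>
    <!-- remaining protocols (MERGE3, FD_SOCK, pbcast.NAKACK2, ...) omitted -->
</stack>

The default channel’s stack attribute then points at this tcp stack instead of the default udp stack.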

  • ECS Service discovery – An Amazon ECS service can optionally be configured to use Amazon ECS Service Discovery. Service discovery uses Amazon Route 53 auto naming API actions to manage DNS entries (A or SRV records) for service tasks, making them discoverable within your VPC. You can specify health check conditions in a service task definition and Amazon ECS will ensure that only healthy service endpoints are returned by a service lookup.

As your services scale up or down in response to load or container health, the Route 53 hosted zone is kept up to date.

Wildfly uses JGroups to discover the cluster nodes via DNS_PING discovery protocol that sends a DNS service endpoint query to the ECS service registry maintained in Route53.

  • Infinispan – Wildfly uses the Infinispan subsystem to provide high-performance, clustered, transactional caching. In a clustered web application, Infinispan handles the replication of application data across the cluster by means of a replicated/distributed cache. Under the hood, it uses a JGroups channel for data transmission within the cluster.

3. Implementation Instructions

Configure Wildfly

  • Modify the Wildfly standalone configuration file, standalone-ha.xml. The HA suffix implies a high-availability configuration.
  1. Modify the JGroups subsystem – Add a TCP stack with DNS_PING as the discovery protocol & configure the DNS query endpoint. It is important that the DNS_QUERY value matches the ECS service discovery endpoint configured later when creating the ECS service.
  2. Change the JGroups default stack to point to the TCP stack.
  3. Configure a custom Infinispan replicated cache to be used by the web app, or use the default cache.

Build the Docker image & store it in Elastic Container Registry (ECR)

  1. Package the JBoss/Wildfly image with JEE application & Wildfly platform on Docker. Create a Dockerfile & include the following:
    1. Install the WildFly distribution & set permissions – This approach requires the latest Wildfly distribution 15.0.0.Final released recently.          
    2. Copy the modified Wildfly standalone-ha.xml to the container.
    3. Deploy the JEE web application. This simple web app is configured as distributable and uses Infinispan to replicate session information across cluster nodes. It displays a page containing the container IP/hostname, Session ID & session data & helps demonstrate session replication.                   
    4. Define a custom entrypoint, entrypoint.sh, to boot Wildfly with the container IP bound to its interfaces. The script gets the container metadata and extracts the container IP to bind to the Wildfly interfaces (a sketch of such a script follows this list). This interface binding is an important step, as it enables the application-related network communication (web, messaging) between the containers.
    5. Add the entrypoint.sh script to the image in the Dockerfile.
    6. Build the container & push it to an ECR repository. Amazon Elastic Container Registry (ECR) is a fully managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images.
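
A hedged sketch of such an entrypoint script, assuming awsvpc networking and the version 3 task metadata endpoint (the metadata paths and Wildfly install location are assumptions; see the GitHub repo below for the actual script):

#!/bin/sh
# Query the task metadata endpoint for this container's private IP
CONTAINER_IP=$(curl -s ${ECS_CONTAINER_METADATA_URI}/task \
  | jq -r '.Containers[0].Networks[0].IPv4Addresses[0]')
# Bind Wildfly's public and private (JGroups) interfaces to the container IP
exec /opt/jboss/wildfly/bin/standalone.sh -c standalone-ha.xml \
  -b ${CONTAINER_IP} -bprivate ${CONTAINER_IP}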

The Wildfly configuration files, the Dockerfile & the web app WAR file can be found at the GitHub link https://github.com/vidyann/Wildfly_ECS

Create ECS Service with service discovery

  • Create a Service using service discovery.
    • This link describes the steps to set up an ECS task & service with service discovery: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service-discovery.html#create-service-discovery-taskdef. Though the example in the link creates a Fargate cluster, you can create an EC2-based cluster as well for this example.
    • While configuring the task, choose awsvpc as the network mode. The task networking features provided by the awsvpc network mode give Amazon ECS tasks the same networking properties as Amazon EC2 instances. Benefits of task networking can be found here – https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html
    • Tasks can be flagged as compatible with EC2, Fargate, or both. Here is what the cluster & service look like:
    • When setting up the container details in the task, use 8080 as the port, which is the default Wildfly port. This can be changed through Wildfly configuration. Enable CloudWatch logs, which capture the Wildfly logging.
    • While configuring the ECS service, ensure that the service name & namespace combine to form a service endpoint that exactly matches the DNS_QUERY endpoint configured in the Wildfly configuration file. The container security group should allow inbound traffic to port 8080. Here is what the service endpoint looks like:
    • The Route 53 registry created by ECS is shown below. We see two DNS entries corresponding to the DNS endpoint myapp.sampleaws.com.
    • Finally, view the Wildfly logs in the console by clicking a task instance. You can check whether clustering is enabled by looking for a log entry like the one below:

Here we see that a Wildfly cluster was formed with two nodes (matching the two entries in Route 53).

Run the Web App in a browser

  • Spin up a Windows instance in the VPC & open the web app in a browser. Below is a screenshot of the webapp:
  • Open the app in different browsers & tabs & verify the container IP & session ID. Now force a node shutdown by resizing the ECS service to one task. Note that though the container IP in the webapp changes, the session ID does not: the webapp remains available and the HTTP session stays alive, demonstrating session replication & failover amongst the cluster nodes.

4. Summary

Our goal here is to migrate enterprise JEE apps to Amazon ECS by tweaking a few configurations, immediately gaining the benefits of containerization & orchestration managed by ECS. By delegating the undifferentiated heavy lifting of container management, orchestration, and scaling to ECS, you can focus on improving or re-architecting your application toward a microservices-oriented architecture. Please note that all the deployment procedures in this article can be fully automated via the AWS CI/CD services.

Automating rollback of failed Amazon ECS deployments

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/automating-rollback-of-failed-amazon-ecs-deployments/

Contributed by Vinay Nadig, Associate Solutions Architect, AWS.

With more and more organizations moving toward Agile development, it’s not uncommon to deploy code to production multiple times a day. With the increased speed of deployments, it’s imperative to have a mechanism in place where you can detect errors and roll back problematic deployments early. In this blog post, we look at a solution that automates the process of monitoring Amazon Elastic Container Service (Amazon ECS) deployments and rolling back the deployment if the container health checks fail repeatedly.

The normal flow for a service deployment on Amazon ECS is to create a new task definition revision and update an Amazon ECS service with the new task definition. Based on the values of minimumHealthyPercent and maximumPercent in the service’s deployment configuration, Amazon ECS replaces existing containers in batches to complete the deployment. After the deployment is complete, you typically monitor the service health for errors and make a call on rolling back the deployment.
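
For reference, these two values are set on the service itself; a snippet of the deployment configuration might look like the following (the percentages are illustrative):

"deploymentConfiguration": {
    "minimumHealthyPercent": 50,
    "maximumPercent": 200
}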

In March 2018, AWS announced support for native Docker health checks on Amazon ECS. Amazon ECS also supports Application Load Balancer health checks for services that are integrated with a load balancer. Leveraging these two features, we can build a solution that automatically rolls back Amazon ECS deployments if health checks fail.

Solution overview

The solution consists of the following components:

  • An Amazon CloudWatch event to listen for the UpdateService event of an Amazon ECS cluster
  • An AWS Lambda function that listens for the Amazon ECS events generated from the cluster after the service update
  • A Lambda function that calculates the failure percentage based on the events in the Amazon ECS event stream
  • A Lambda function that triggers rollback of the deployment if there are high error rates in the event
  • An AWS Step Functions state machine to orchestrate the entire flow

The following diagram shows the solution’s components and workflow.

Assumptions

The following assumptions are important to understand before you implement the solution:

  • The solution assumes that with every revision of the task definition, you use a new Docker tag instead of using the default “latest” tag. As a best practice, we advise that you do every release with a different Docker image tag and a revision of the task definition.
  • If there are continuous health check failures even after the deployment is automatically rolled back using this setup, another rollback is triggered due to the health check failures. This might introduce a runaway deployment rollback loop. Make sure that you use the solution where you know that a one-step rollback will bring the Amazon ECS service into a stable state.
  • This blog post assumes deployment to the US West (Oregon) us-west-2 Region. If you want to deploy the solution to other Regions, you need to make minor modifications to the Lambda code.
  • The Amazon ECS cluster launches in a new VPC. Make sure that your VPC service limit allows for a new VPC.

Prerequisites

You need the following permissions in AWS Identity and Access Management (IAM) to implement the solution:

  • Create IAM Roles
  • Create ECS Cluster
  • Create CloudWatch Rule
  • Create Lambda Functions
  • Create Step Functions

Creating the Amazon ECS cluster

First, we create an Amazon ECS cluster using the AWS Management Console.

1. Sign in to the AWS Management Console and open the Amazon ECS console.
2. For Step 1: Select cluster template, choose EC2 Linux + Networking and then choose Next step.
3. For Step 2: Configure cluster, under Configure cluster, enter the Amazon ECS cluster name as AutoRollbackTestCluster.
4. Under Instance configuration, for EC2 instance type, choose t2.micro.
5. Keep the default values for the rest of the settings and choose Create.

This provisions an Amazon ECS cluster with a single Amazon ECS container instance.

Creating the task definition

Next, we create a new task definition using the Nginx Alpine image.

1. On the Amazon ECS console, choose Task Definitions in the navigation pane and then choose Create new Task Definition.
2. For Step 1: Select launch type compatibility, choose EC2 and then choose Next step.
3. For Task Definition Name, enter Web-Service-Definition.
4. Under Task size, under Container Definitions, choose Add Container.
5. On the Add container pane, under Standard, enter Web-Service-Container for Container name.
6. For Image, enter nginx:alpine. This pulls the nginx:alpine Docker image from Docker Hub.
7. For Memory Limits (MiB), choose Hard limit and enter 128.
8. Under Advanced container configuration, enter the following information for Healthcheck:
  • Command: CMD-SHELL, wget http://localhost/ && rm index.html || exit 1
  • Interval: 10
  • Timeout: 30
  • Start period: 10
  • Retries: 2
9. Keep the default values for the rest of the settings on this pane and choose Add.
10. Choose Create.
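
For reference, the health check settings in step 8 map to the healthCheck object of the container definition if you register the task definition as JSON instead; a sketch of the equivalent snippet:

"healthCheck": {
    "command": ["CMD-SHELL", "wget http://localhost/ && rm index.html || exit 1"],
    "interval": 10,
    "timeout": 30,
    "retries": 2,
    "startPeriod": 10
}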

Creating the Amazon ECS service

Next, we create an Amazon ECS service that uses this task definition.

1. On the Amazon ECS console, choose Clusters in the navigation pane and then choose AutoRollbackTestCluster.
2. On the Services view, choose Create.
3. For Step 1: Configure service, use the following settings:
  • Launch type: EC2
  • Task Definition Family: Web-Service-Definition. This automatically selects the latest revision of the task definition.
  • Cluster: AutoRollbackTestCluster
  • Service name: Nginx-Web-Service
  • Number of tasks: 3
4. Keep the default values for the rest of the settings and choose Next Step.
5. For Step 2: Configure network, keep the default value for Load balancer type and choose Next Step.
6. For Step 3: Set Auto Scaling (optional), keep the default value for Service Auto Scaling and choose Next Step.
7. For Step 4: Review, review the settings and choose Create Service.

After creating the service, you should have three tasks running in the cluster. You can verify this on the Tasks view in the service, as shown in the following image.

Implementing the solution

With the Amazon ECS cluster set up, we can move on to implementing the solution.

Creating the IAM role

First, we create an IAM role for reading the event stream of the Amazon ECS service and rolling back any faulty deployments.

1. Open the IAM console and choose Policies in the navigation pane.
2. Choose Create policy.
3. On the Visual editor view, for Service, choose EC2 Container Service.
4. For Actions, under Access Level, select DescribeServices for Read and UpdateService for Write.
5. Choose Review policy.
6. For Name, enter ECSRollbackPolicy.
7. For Description, enter an appropriate description.
8. Choose Create policy.

Creating the Lambda service role

Next, we create a Lambda service role that uses the previously created IAM policy. The Lambda function to roll back faulty deployments uses this role.

1. On the IAM console, choose Roles in the navigation pane and then choose Create role.
2. For the type of trusted entity, choose AWS service.
3. For the service that will use this role, choose Lambda.
4. Choose Next: Permissions.
5. Under Attach permissions policies, select the ECSRollbackPolicy policy that you created.
6. Choose Next: Review.
7. For Role name, enter ECSRollbackLambdaRole and choose Create role.

Creating the Lambda function for the Step Functions workflow and Amazon ECS event stream

The next step is to create the Lambda function that will collect Amazon ECS events from the Amazon ECS event stream. This Lambda function will be part of the Step Functions state machine.

1. Open the Lambda console and choose Create function.
2. For Name, enter ECSEventCollector.
3. For Runtime, choose Python 3.6.
4. For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.
5. Choose Create function.
6. On the Configuration view, under Function code, enter the following code.

import time
import boto3
from datetime import datetime

ecs = boto3.client('ecs', region_name='us-west-2')


def lambda_handler(event, context):
    # The incoming event is the CloudTrail record of the UpdateService call
    service_name = event['detail']['requestParameters']['service']
    cluster_name = event['detail']['requestParameters']['cluster']
    _update_time = event['detail']['eventTime']
    _update_time = datetime.strptime(_update_time, "%Y-%m-%dT%H:%M:%SZ")
    start_time = _update_time.strftime("%s")
    # Track how long the deployment has been running; the state machine
    # uses this value to decide when to stop polling
    seconds_from_start = time.time() - int(start_time)
    event.update({'seconds_from_start': seconds_from_start})

    # Collect only the service events generated after the deployment started
    _services = ecs.describe_services(
        cluster=cluster_name, services=[service_name])
    service = _services['services'][0]
    service_events = service['events']
    events_since_update = [event for event in service_events if int(
        (event['createdAt']).strftime("%s")) > int(start_time)]
    [event.pop('createdAt') for event in events_since_update]
    event.update({"events": events_since_update})
    return event

7. Under Basic Settings, set Timeout to 30 seconds.
8. Choose Save.

Creating the Lambda function to calculate failure percentage

Next, we create a Lambda function that calculates the failure percentage based on the number of failed container health checks derived from the event stream.

1. On the Lambda console, choose Create function.
2. For Name, enter ECSFailureCalculator.
3. For Runtime, choose Python 3.6.
4. For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.
5. Choose Create function.
6. On the Configuration view, under Function code, enter the following code.

import re

# Matches Application Load Balancer health check failure events
lb_hc_regex = re.compile("\(service (.*)?\) \(instance (i-[a-z0-9]{7,17})\) \(port ([0-9]{4,5})\) is unhealthy in \(target-group (.*)?\) due to \((.*)?: \[(.*)\]\)")
# Matches Docker container health check failure events
docker_hc_regex = re.compile("\(service (.*)?\) \(task ([a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})\) failed container health checks\.")
# Matches task start events; the captured count is the number of tasks started
task_registration_formats = ["\(service (.*)?\) has started ([0-9]{1,9}) tasks: (.*)\."]


def lambda_handler(event, context):
    cluster_name = event['detail']['requestParameters']['cluster']
    service_name = event['detail']['requestParameters']['service']

    messages = [m['message'] for m in event['events']]
    failures = get_failure_messages(messages)
    registrations = get_registration_messages(messages)
    failure_percentage = get_failure_percentage(failures, registrations)
    print("Failure Percentage = {}".format(failure_percentage))
    return {"failure_percentage": failure_percentage, "service_name": service_name, "cluster_name": cluster_name}


def get_failure_percentage(failures, registrations):
    no_of_failures = len(failures)
    no_of_registrations = sum([float(x[0][1]) for x in registrations])
    return no_of_failures / no_of_registrations * 100 if no_of_registrations > 0 else 0


def get_failure_messages(messages):
    failures = []
    for message in messages:
        failures.append(lb_hc_regex.findall(message)) if lb_hc_regex.findall(message) else None
        failures.append(docker_hc_regex.findall(message)) if docker_hc_regex.findall(message) else None
    return failures


def get_registration_messages(messages):
    registrations = []
    for message in messages:
        for registration_format in task_registration_formats:
            if re.findall(registration_format, message):
                registrations.append(re.findall(registration_format, message))
    return registrations

7. Under Basic Settings, set Timeout to 30 seconds.
8. Choose Save.

Creating the Lambda function to roll back a deployment

Next, we create a Lambda function to roll back an Amazon ECS deployment.

1. On the Lambda console, choose Create function.
2. For Name, enter ECSRollbackfunction.
3. For Runtime, choose Python 3.6.
4. For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.
5. Choose Create function.
6. On the Configuration view, under Function code, enter the following code.

import boto3

ecs = boto3.client('ecs', region_name='us-west-2')

def lambda_handler(event, context):
    service_name = event['service_name']
    cluster_name = event['cluster_name']

    _services = ecs.describe_services(cluster=cluster_name, services=[service_name])
    task_definition = _services['services'][0][u'taskDefinition']
    previous_task_definition = get_previous_task_definition(task_definition)

    ecs.update_service(cluster=cluster_name, service=service_name, taskDefinition=previous_task_definition)
    print("Rollback Complete")
    return {"Rollback": True}

def get_previous_task_definition(task_definition):
    previous_version_number = str(int(task_definition.split(':')[-1])-1)
    previous_task_definition = ':'.join(task_definition.split(':')[:-1]) + ':' + previous_version_number
    return previous_task_definition


7. Under Basic Settings, set Timeout to 30 seconds.
8. Choose Save.

Creating the Step Functions state machine

Next, we create a Step Functions state machine that performs the following steps:

1. Collect events of a specified service for a specified duration from the event stream of the Amazon ECS cluster.
2. Calculate the percentage of failures after the deployment.
3. If the failure percentage is greater than a specified threshold, roll back the service to the previous task definition.

To create the state machine:

1. Open the Step Functions console and choose Create state machine.
2. For Name, enter ECSAutoRollback.
3. For IAM role, keep the default selection of Create a role for me and select the check box. This creates a new IAM role with the necessary permissions for the execution of the state machine. (If you have already created a Step Functions state machine, IAM role is populated.)
4. For State machine definition, enter the following code, replacing the Amazon Resource Name (ARN) placeholders with the ARNs of the three Lambda functions that you created.

{
    "StartAt": "VerifyClusterAndService",
    "States":
    {
        "VerifyClusterAndService":
        {
            "Type": "Choice",
            "Choices": [
            {
                "And": [
                {
                    "Variable": "$.detail.requestParameters.cluster",
                    "StringEquals": "AutoRollbackTestCluster"
                },
                {
                    "Variable": "$.detail.requestParameters.service",
                    "StringEquals": "Nginx-Web-Service"
                }],
                "Next": "GetTasksStatus"
            },
            {
                "Not":
                {
                    "And": [
                    {
                        "Variable": "$.detail.requestParameters.cluster",
                        "StringEquals": "AutoRollbackTestCluster"
                    },
                    {
                        "Variable": "$.detail.requestParameters.service",
                        "StringEquals": "Nginx-Web-Service"
                    }]
                },
                "Next": "EndState"
            }]
        },
        "GetTasksStatus":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSEventCollector-Lambda-Function>",
            "Next": "WaitForInterval"
        },
        "WaitForInterval":
        {
            "Type": "Wait",
            "Seconds": 5,
            "Next": "IntervalCheck"
        },
        "IntervalCheck":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.seconds_from_start",
                "NumericGreaterThan": 300,
                "Next": "FailureCalculator"
            },
            {
                "Variable": "$.seconds_from_start",
                "NumericLessThan": 300,
                "Next": "GetTasksStatus"
            }]
        },
        "FailureCalculator":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSFailureCalculator-Lambda-Function-here>",
            "Next": "RollbackDecider"
        },
        "RollbackDecider":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.failure_percentage",
                "NumericGreaterThan": 10,
                "Next": "RollBackDeployment"
            },
            {
                "Variable": "$.failure_percentage",
                "NumericLessThan": 10,
                "Next": "EndState"
            }]
        },
        "RollBackDeployment":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSRollbackFunction-Lambda-Function-here>",
            "Next": "EndState"
        },
        "EndState":
        {
            "Type": "Succeed"
        }
    }
}

5.     Choose Create state machine.

 

We now have a mechanism that rolls back a deployment to a specific Amazon ECS service when the post-deployment error percentage exceeds a configurable threshold.

(Optional) Monitoring and rolling back all services in the Amazon ECS cluster

The state machine definition hard-codes the Amazon ECS service name, so the state machine monitors only that specific service in the cluster. You can see this in the Nginx-Web-Service checks of the VerifyClusterAndService state in the definition above.

If you want to monitor all services and automatically roll back any Amazon ECS deployment in the cluster based on failures, modify the state machine definition so that it verifies only the cluster name, not the service name. To do this, remove the service name check from the definition.

The following code verifies only the cluster name. It monitors any Amazon ECS service and performs a rollback if there are errors.

{
    "StartAt": "VerifyClusterAndService",
    "States":
    {
        "VerifyClusterAndService":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.detail.requestParameters.cluster",
                "StringEquals": "AutoRollbackTestCluster",
                "Next": "GetTasksStatus"
            },
            {
                "Not":
                {
                    "Variable": "$.detail.requestParameters.cluster",
                    "StringEquals": "AutoRollbackTestCluster"
                },
                "Next": "EndState"
            }]
        },
        "GetTasksStatus":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSEventCollector-Lambda-Function>",
            "Next": "WaitForInterval"
        },
        "WaitForInterval":
        {
            "Type": "Wait",
            "Seconds": 5,
            "Next": "IntervalCheck"
        },
        "IntervalCheck":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.seconds_from_start",
                "NumericGreaterThan": 300,
                "Next": "FailureCalculator"
            },
            {
                "Variable": "$.seconds_from_start",
                "NumericLessThan": 300,
                "Next": "GetTasksStatus"
            }]
        },
        "FailureCalculator":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSFailureCalculator-Lambda-Function-here>",
            "Next": "RollbackDecider"
        },
        "RollbackDecider":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.failure_percentage",
                "NumericGreaterThan": 10,
                "Next": "RollBackDeployment"
            },
            {
                "Variable": "$.failure_percentage",
                "NumericLessThan": 10,
                "Next": "EndState"
            }]
        },
        "RollBackDeployment":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSRollbackFunction-Lambda-Function-here>",
            "Next": "EndState"
        },
        "EndState":
        {
            "Type": "Succeed"
        }
    }
}

 

Configuring the state machine to execute automatically upon Amazon ECS deployment

Next, we configure a trigger for the state machine so that its execution automatically starts when there is an Amazon ECS deployment. We use Amazon CloudWatch to configure the trigger.

 

1.     Open the CloudWatch console and choose Rules in the navigation pane.

2.     Choose Create rule and use the following settings:

·      Event Source

o   Service Name: EC2 Container Service (ECS)

o   Event Type: AWS API Call via CloudTrail

o   Operations: choose Specific operation(s) and enter UpdateService

·      Targets

o   Step Functions state machine

o   State machine: ECSAutoRollback

3.     Choose Configure details.

4.     For Name, enter ECSServiceUpdateRule.

5.     For Description, enter an appropriate description.

6.     For State, make sure that Enabled is selected.

7.     Choose Create rule.
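
If you prefer to script this step, the following boto3 sketch creates an equivalent rule and target. The ARNs are placeholders: the rule targets your ECSAutoRollback state machine, and CloudWatch Events needs an IAM role that is allowed to call states:StartExecution on it.

import json
import boto3

# Sketch of the console steps above; the two ARNs are placeholders.
STATE_MACHINE_ARN = "<ARN-of-ECSAutoRollback-state-machine>"
EVENTS_ROLE_ARN = "<ARN-of-CloudWatch-Events-invocation-role>"

events = boto3.client("events")

# Match every Amazon ECS UpdateService API call recorded by CloudTrail.
events.put_rule(
    Name="ECSServiceUpdateRule",
    EventPattern=json.dumps({
        "source": ["aws.ecs"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["ecs.amazonaws.com"],
            "eventName": ["UpdateService"],
        },
    }),
    State="ENABLED",
    Description="Start ECSAutoRollback on every Amazon ECS UpdateService call",
)

# Point the rule at the state machine.
events.put_targets(
    Rule="ECSServiceUpdateRule",
    Targets=[{"Id": "1", "Arn": STATE_MACHINE_ARN, "RoleArn": EVENTS_ROLE_ARN}],
)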

 

Setting up the CloudWatch trigger is the last step in linking the Amazon ECS UpdateService events to the Step Functions state machine that we set up. With this step complete, we can move on to testing the solution.

Testing the solution

Let’s update the task definition and force a failure of the container health checks so that we can confirm that the deployment rollback occurs as expected.

 

To test the solution:

 

1.     Open the Amazon ECS console and choose Task Definitions in the navigation pane.

2.     Select the check box next to Web-Service-Definition and choose Create new revision.

3.     Under Container Definitions, choose Web-Service-Container.

4.     On the Edit container pane, under Healthcheck, update Command to

CMD-SHELL, wget http://localhost/does-not-exist.html && rm index.html || exit 1 

and choose Update. Because the page doesn't exist, wget fails and the health check exits with status 1, so the containers of the new deployment are marked as unhealthy.

5.     Choose Create. This creates the task definition revision.

6.     Open the Nginx-Web-Service page of the Amazon ECS console and choose Update.

7.     For Task Definition, select the latest revision.

8.    Keep the default values for the rest of the settings by choosing Next Step until you reach Review.

9.     Choose Update Service. This creates a new Amazon ECS deployment.

This service update triggers the CloudWatch rule, which in turn triggers the state machine. The state machine collects the Amazon ECS events for 300 seconds. If the percentage of errors due to health check failures is more than 10%, the deployment is automatically rolled back. You can verify this on the Step Functions console: on the Executions view, you should see a new execution triggered by the deployment.

Choose the execution to see the workflow in progress. After the workflow is complete, you can check its outcome by choosing EndState in Visual Workflow. The output should show {"Rollback": true}.

You can also verify in the service details that the service has been updated with the previous version of the task definition.
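
If you prefer to check this from a script, a quick boto3 sketch like the following (using the cluster and service names from this walkthrough) prints the task definition that the service is currently running:

import boto3

# Sketch: confirm the service is back on the previous task definition
# revision after the automatic rollback. Uses your default AWS Region.
ecs = boto3.client("ecs")
service = ecs.describe_services(
    cluster="AutoRollbackTestCluster",
    services=["Nginx-Web-Service"],
)["services"][0]
print(service["taskDefinition"])  # should point at the pre-deployment revision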

Conclusion

With this solution, you can detect issues with Amazon ECS deployments early on and automate failure responses. You can also integrate the solution into your existing systems by triggering an Amazon SNS notification to send email or SMS instead of rolling back the deployment automatically. Though this post uses the Amazon EC2 launch type, you can follow similar steps to implement automatic rollback for AWS Fargate.

If you want to customize the duration for monitoring your deployments before deciding to roll back, or the error percentage threshold beyond which a rollback should be triggered, modify the corresponding values in the state machine definition: the 300-second window in the IntervalCheck state and the 10 percent threshold in the RollbackDecider state.

Measuring service chargeback in Amazon ECS

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/measuring-service-chargeback-in-amazon-ecs/

Contributed by Subhrangshu Kumar Sarkar, Sr. Technical Account Manager, and Shiva Kumar Subramanian, Sr. Technical Account Manager

Amazon Elastic Container Service (ECS) users have been asking us for a way to allocate cost to the deployed services in a shared Amazon ECS cluster. This post can help customers think through different techniques for allocating the costs incurred by running Amazon ECS services to their owners, such as specific teams or individual users. It then dives into one technique that gives customers a granular way to allocate costs to Amazon ECS service owners.

Amazon ECS pricing models

Amazon ECS has two pricing models.  In the Amazon EC2 launch type model, you pay for the AWS resources (e.g., Amazon EC2 instances or Amazon EBS volumes) that you create to store and run your application. Right now, it’s difficult to calculate the aggregate cost of an Amazon ECS service that consists of multiple tasks. In the AWS Fargate launch type model, you pay for vCPU and memory resources that your containerized application requests. Although the user knows the cost that the tasks incur, there is no out-of-box way to associate that cost to a service.

Possible solutions

There are two possible solutions to this problem.

A. Billing based on the usage of container instances in a partitioned cluster.

One solution for service chargeback is to associate specific container instances with respective teams or customers. Then use task placement constraints to restrict the services that they deploy to only those container instances. The following image shows how this solution works.

Here, user A is allowed to deploy services only on the blue container instances, and user B only on the green ones. Both users can be charged based on the AWS resources they use, for example, the Amazon EC2 instances and the Application Load Balancer.

This solution is useful when you don't want to host services from different teams or users on the same set of container instances. However, although the Amazon ECS cluster is shared, end users are still charged for the Amazon EC2 instances and other AWS resources they use, rather than for the exact vCPU and memory resources that their services consume. The disadvantage of this approach is that you could provision excess capacity for your users and end up wasting resources. You also need to use placement constraints in all of your task definitions, as in the sketch below.
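
As an illustration, here is a minimal boto3 sketch of such a task definition. The custom instance attribute name (owner) and its value are hypothetical; you would first set that attribute on the container instances assigned to each user.

import boto3

# Sketch, assuming user A's container instances carry a custom attribute
# such as owner=user-a. The memberOf placement constraint keeps user A's
# tasks on those instances only.
ecs = boto3.client("ecs")
ecs.register_task_definition(
    family="user-a-web",
    containerDefinitions=[
        {"name": "web", "image": "nginx:latest", "memory": 128}
    ],
    placementConstraints=[
        {"type": "memberOf", "expression": "attribute:owner == user-a"}
    ],
)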

B. Billing based on resource usage at the task level.

Another solution could be to develop a mechanism to let the Amazon ECS cluster owners calculate the aggregate cost of an Amazon ECS service that consists of multiple tasks. The solution would have a metering mechanism and a chargeback measurement. When deployed for Amazon EC2 launch type tasks, the metering mechanism tracks the vCPU and memory that Amazon ECS reserves in the tasks’ lifetime. Then, with the chargeback measurement, the cluster owner can associate a cost with these tasks based on the cost incurred by the container instances that they’re running on. The following image shows how this solution works.

Here, unlike the previous solution, both users can use all the container instances of the ECS cluster.

With this solution, customers can start using a shared Amazon ECS cluster to deploy their tasks on any of the container instances. After the solution has been deployed, the cost for a service can be calculated at any point in time, using the cluster and the service name as input parameters.

With Fargate tasks, the vCPU and memory usage details are already available in vCPU-hours and GB-hours, respectively. The chargeback measurement in the solution aggregates the CPU and memory reservation of all the tasks that ever ran as part of a service. It associates a cost with this aggregated CPU and memory reservation by multiplying it with Fargate's per vCPU per hour and per GB per hour cost, respectively.
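
As a back-of-the-envelope illustration of that aggregation, the following sketch prices a single Fargate task. The per-unit prices are hypothetical placeholders; the solution itself pulls real values from the AWS Price List Service API.

# Hypothetical per-unit Fargate prices (USD); real values vary by Region
# and come from the AWS Price List Service API.
PRICE_PER_VCPU_HOUR = 0.04048
PRICE_PER_GB_HOUR = 0.004445

def fargate_task_cost(vcpu, memory_gb, runtime_hours):
    # Fargate bills the vCPU and memory requested by the task for its lifetime.
    return (vcpu * PRICE_PER_VCPU_HOUR + memory_gb * PRICE_PER_GB_HOUR) * runtime_hours

# A task requesting 0.25 vCPU and 0.5 GB that ran for 30 days:
print(round(fargate_task_cost(0.25, 0.5, 30 * 24), 4))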

This solution has the following considerations:

  • Amazon EC2 pricing: For the base price of the container instance, we’re considering the On-Demand price.
  • Platform costs: Common costs for the cluster (the Amazon EBS volume that the containers are launched from, Amazon ECR, etc.) are treated as the platform cost for all of the services running on the cluster.
  • Networking cost: When you’re using bridge or host networking, there is no mechanism to divide costs among different tasks that are launched on the container instance.
  • Elastic Load Balancing or Application Load Balancer costs: If services sit behind multiple target groups of an Application Load Balancer, there is no direct way of dividing costs per target group.

Solution components

The solution has two components: a metering mechanism and a chargeback measurement.

The metering mechanism consists of the following parts:

  • Amazon CloudWatch Events rule
  • AWS Lambda function
  • Amazon DynamoDB table

The chargeback measurement consists of the following parts:

  • Python script
  • AWS Price List Service API

Metering mechanism

The following image shows the architecture of the solution’s metering mechanism.

To deploy the metering mechanism, the user needs to do the following.

  1. Create a CloudWatch Events rule to trigger a Lambda function on an Amazon ECS task state change event. Typically, a task state change event is generated with a call to the StartTask, RunTask, or StopTask API operations or when an Amazon ECS service scheduler starts or stops a task.
  2. Create a DynamoDB table, which the Lambda function can update.
  3. Every time the Lambda function is invoked, it updates the DynamoDB table with the details of the Amazon ECS task.

With the first run of the metering mechanism, it takes stock of all running Amazon ECS tasks across all services across all clusters. This data resides in DynamoDB from then on, and the solution’s chargeback measurement uses it.

Chargeback measurement

The following image shows the architecture of the chargeback measurement.

When you need to find the cost associated with a service, run the ecs-chargeback Python script with the cluster and service names as parameters. This script performs the following actions.

  1. Find all the tasks that have ever run or are currently running as part of the service.
  2. For each task, calculate the uptime.
  3. For each task, find the container instance type (for Amazon EC2 type tasks).
  4. Find what percentage of the host’s compute or memory resources the task has reserved. If there is no task-level CPU reservation for Amazon EC2 launch type tasks, a CPU reservation of 128 CPU shares (0.125 vCPUs) is assumed. In Amazon EC2 launch type tasks, you have to specify memory reservation at the task or container level during creation of the task definition.
  5. Associate that percentage with a cost.
  6. (Optional) Use the following parameters:
    • Duration: By default, the script shows the service cost for its complete uptime. You can use the duration parameter to get the cost for a particular month, the month to date, or the last n days.
    • Weight: This parameter is a weighted fraction that you can use to disproportionately divide the instance cost between vCPU and memory. By default, this value is 0.5.

The vCPU and memory costs are calculated using the following formulas:

  • Task vCPU cost = (task vCPU reservation/total vCPUs in the instance) * (cost of the instance) * (vCPU/memory weight) * task run time in seconds
  • Task memory cost = (task memory reservation/total memory in the instance) * (cost of the instance) * (1- vCPU/memory weight) * task run time in seconds
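
For instance, a minimal Python sketch of these formulas might look like the following. It assumes the instance cost is the hourly On-Demand price converted to a per-second rate; the instance size and price in the example are hypothetical.

def ec2_task_cost(task_vcpu, task_mem_gib, instance_vcpus, instance_mem_gib,
                  instance_hourly_price, weight, runtime_seconds):
    # Convert the hourly On-Demand price to a per-second rate.
    per_second = instance_hourly_price / 3600.0
    vcpu_cost = (task_vcpu / instance_vcpus) * per_second * weight * runtime_seconds
    mem_cost = (task_mem_gib / instance_mem_gib) * per_second * (1 - weight) * runtime_seconds
    return vcpu_cost, mem_cost

# Example: a task reserving 0.5 vCPU and 1 GiB on a 2 vCPU / 8 GiB instance
# priced at 0.096 USD/hour, with the default 0.5 weight, for 24 hours:
vcpu_cost, mem_cost = ec2_task_cost(0.5, 1, 2, 8, 0.096, 0.5, 24 * 3600)
print(f"vCPU: {vcpu_cost:.4f} USD, memory: {mem_cost:.4f} USD")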

Solution deployment and cost measurement

Here are the steps to deploy the solution in your AWS account and then calculate the service chargeback.

Metering mechanism

1. Create a DynamoDB table named ECSTaskStatus to capture details of an ECS task state change CloudWatch event.

Primary partition key: taskArn. Type: string.

Provision RCUs or WCUs depending on your Amazon ECS usage.

For the rest, keep the default values.

aws dynamodb create-table --table-name ECSTaskStatus \
--attribute-definitions AttributeName=taskArn,AttributeType=S \
--key-schema AttributeName=taskArn,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=20

2. Create an IAM policy named LambdaECSTaskStatusPolicy that allows the Lambda function to make the following API calls. Create a local copy of the policy document LambdaECSTaskStatusPolicy.JSON from GitHub.

o   ecs: DescribeContainerInstances

o   dynamodb: BatchGetItem, BatchWriteItem, PutItem, GetItem, and UpdateItem

o   logs: CreateLogGroup, CreateLogStream, and PutLogEvents

aws iam create-policy --policy-name LambdaECSTaskStatusPolicy \
--policy-document file://LambdaECSTaskStatusPolicy.JSON

3. Create an IAM role named LambdaECSTaskStatusRole and attach the policy to the role. Replace <Policy ARN> with the Amazon Resource Name (ARN) of the IAM policy.

aws iam create-role --role-name LambdaECSTaskStatusRole \
--assume-role-policy-document \
'{ "Version": "2012-10-17", "Statement": { "Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}}'

aws iam attach-role-policy --policy-arn <Policy ARN> --role-name LambdaECSTaskStatusRole

4. Create a Lambda function named ecsTaskStatus that puts or updates the Amazon ECS task details to the ECSTaskStatus DynamoDB table. This function has the following details:

o   Runtime: Python 3.6.

o   Memory setting: 128 MB.

o   Timeout: 3 seconds.

o   Execution role: LambdaECSTaskStatusRole.

o   Code: ecsTaskStatus.py. Use the inline code editor on the Lambda console to author the function.
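
The GitHub function is the reference implementation. As a rough, simplified sketch of its shape, a handler along these lines writes selected fields of the task state change event into the table (the exact fields stored here are illustrative):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ECSTaskStatus")

def lambda_handler(event, context):
    # "detail" carries the full task description from the
    # Amazon ECS task state change event.
    detail = event["detail"]
    table.put_item(Item={
        "taskArn": detail["taskArn"],
        "clusterArn": detail["clusterArn"],
        "group": detail.get("group", ""),        # e.g. "service:nginxsvc"
        "launchType": detail.get("launchType", "EC2"),
        "lastStatus": detail["lastStatus"],      # RUNNING or STOPPED
        "cpu": detail.get("cpu", ""),
        "memory": detail.get("memory", ""),
        "startedAt": detail.get("startedAt", ""),
        "stoppedAt": detail.get("stoppedAt", ""),
    })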

 

5. Create a CloudWatch Events rule for Amazon ECS task state change events and configure the Lambda function as the target. The function puts or updates items in the ECSTaskStatus DynamoDB table with every Amazon ECS task’s details.

a.     Create the CloudWatch Events rule.

aws events put-rule --name ECSTaskStatusRule \
--event-pattern '{"source": ["aws.ecs"], "detail-type": ["ECS Task State Change"], "detail": {"lastStatus": ["RUNNING", "STOPPED"]}}'

b.     Add the Lambda function as a target to the CloudWatch Events rule. Replace <Lambda ARN> with the ARN of the Lambda function that you created in step 4.

aws events put-targets --rule ECSTaskStatusRule --targets "Id"="1","Arn"="<Lambda ARN>"

c.     Add permissions for CloudWatch Events to invoke Lambda. Replace <CW Events Rule ARN> with the ARN of the CloudWatch Events rule that you created in step 5a.

aws lambda add-permission --function-name ecsTaskStatus \
--action 'lambda:InvokeFunction' --statement-id "LambdaAddPermission" \
--principal events.amazonaws.com --source-arn <CW Events Rule ARN>

The solution invokes the Lambda function only when an Amazon ECS task state change event occurs. Therefore, when the solution is deployed, no event is raised for currently running tasks, and their details aren't populated into the DynamoDB table. If you want to meter currently running tasks, you can run the script ecsTaskStatus-FirstRun.py after creating the DynamoDB table. This populates the details of all running tasks into the DynamoDB table. The script is idempotent.

ecsTaskStatus-FirstRun.py --region eu-west-1

Chargeback measurement

To find the cost for running a service, run the Python script ecs-chargeback, which has the following usage and arguments.

./ecs-chargeback -h
usage: ecs-chargeback [-h] --region REGION --cluster CLUSTER --service SERVICE
                      [--weight WEIGHT] [-v]
                      [--month MONTH | --days DAYS | --hours HOURS]

optional arguments:
  -h, --help            show this help message and exit
  --region REGION, -r REGION
                        AWS Region in which Amazon ECS service is running.
  --cluster CLUSTER, -c CLUSTER
                        ClusterARN in which Amazon ECS service is running.
  --service SERVICE, -s SERVICE
                        Name of the AWS ECS service for which cost has to be
                        calculated.
  --weight WEIGHT, -w WEIGHT
                        Floating point value that defines CPU:Memory Cost
                        Ratio to be used for dividing EC2 pricing
  -v, --verbose
  --month MONTH, -M MONTH
                        Show charges for a service for a particular month
  --days DAYS, -D DAYS  Show charges for a service for last N days
  --hours HOURS, -H HOURS
                        Show charges for a service for last N hours

 

To calculate the cost that a service incurs with Amazon EC2 launch type tasks, run the script as follows.

./ecs-chargeback -r eu-west-1 -c ecs-chargeback -s nginxsvc

The following is sample output of running this script.

# ECS Region  : eu-west-1, ECS Service Name: nginxsvc
# ECS Cluster : arn:aws:ecs:eu-west-1:675410410211:cluster/ecs-chargeback
#
# Amazon ECS Service Cost           : 26.547270 USD
#             (Launch Type : EC2)
#         EC2 vCPU Usage Cost       : 21.237816 USD
#         EC2 Memory Usage Cost     : 5.309454 USD

To get the chargeback for Fargate launch type tasks, run the script as follows.

./ecs-chargeback -r eu-west-1 -c ecs-chargeback -s fargatesvc

The following is sample output of this script.


# ECS Region  : eu-west-1, ECS Service Name: fargatesvc
# ECS Cluster : arn:aws:ecs:eu-west-1:675410410211:cluster/ecs-chargeback
#
# Amazon ECS Service Cost           : 118.653359 USD
#             (Launch Type : FARGATE)
#         Fargate vCPU Usage Cost   : 78.998157 USD
#         Fargate Memory Usage Cost : 39.655201 USD

Conclusion

This solution can help Amazon ECS users track and allocate costs for their deployed workloads. It might also help them save some costs by letting them share an Amazon ECS cluster among multiple users or teams. We welcome your comments and questions below. Please reach out to us if you would like to contribute to the solution.