Tag Archives: Containers

Improve Productivity and Reduce Overhead Expenses with Red Hat OpenShift Dedicated on AWS

Post Syndicated from Ryan Niksch original https://aws.amazon.com/blogs/architecture/improve-productivity-and-reduce-overhead-expenses-with-red-hat-openshift-dedicated-on-aws/

Red Hat OpenShift on AWS helps you develop, deploy, and manage container-based applications across on-premises and cloud environments. A recent case study from Cathay Pacific Airways proved that the use of the Red Hat OpenShift application platform can significantly improve developer productivity and reduce operational overhead by automating infrastructure, application deployment, and scaling. In this post, I explore how the architectural implementation and customization options of Red Hat OpenShift dedicated on AWS can cater to a variety of customer needs.

Red Hat OpenShift is a turn key solution providing  a container runtime, Kubernetes orchestration, container image repositories, pipeline, build process, monitoring, logging, role-based access control, granular policy-based control, and abstractions to simplify functions. Deploying a single turnkey solution, instead of building and integrating a collection of independent solutions or services, allows you to invest more time and effort in building meaningful applications for your business.

In the past, Red Hat OpenShift deployed on Amazon EC2 using an automated provisioning process with an open source solution, like the Red Hat OpenShift on AWS Quick Start. The Red Hat OpenShift Quick Start is an infrastructure as code solution which accelerates customer provisioning of Red Hat OpenShift on AWS. The OpenShift Quick Start adheres to the reference architecture to deploy Red Hat OpenShift on AWS in a resilient, scalable, well-architected manner. This reference architecture sees the control plane as a collection of load balanced master nodes for traffic routing, session state, scheduling, and monitoring. It also contains the application nodes where the customer’s containerized workloads run. This solution allowed customers to get up and running within three hours; however, it did not reduce management overhead because customers were required to monitor and maintain the infrastructure of the Red Hat OpenShift cluster.

Red Hat and AWS listened to customer feedback and created Red Hat OpenShift dedicated, a fully managed OpenShift implementation running exclusively on AWS. This implementation monitors the layers and functions, scales the layers to cater to consumption needs, and addresses operational concerns.

Customers now have access to a platform that helps manage control planes for business-critical solutions, like their developer and operational platforms.

Red Hat OpenShift Dedicated Infrastructure on AWS

You can purchase Red Hat OpenShift dedicated through the Red Hat account team. Red Hat OpenShift dedicated comes in two varieties: the Standard edition and the Cloud Choice edition (bring your own cloud).

redhar-openshit options

Figure 1: Red Hat OpenShift architecture illustrating master and infrastructure nodes spread over three Availability Zones and placed behind elastic load balancers.

Red Hat OpenShift dedicated adheres to the reference architecture defined by AWS and Red Hat. Master and infrastructure layers are spread across three AWS availability zones providing resilience within the OpenShift solution, as well as the underlying infrastructure.

Red Hat OpenShift Dedicated Standard Edition

In the Red Hat OpenShift dedicated standard edition, Red Hat deploys the OpenShift cluster into an AWS account owned and managed by Red Hat. Red Hat provides an aggregated bill for the OpenShift subscription fees, management fees, and AWS billing. This edition is ideal for customers who want everything to be managed for them. The Red Hat site reliability engineering  team (SRE) will monitor and manage healing, scaling, and patching of the cluster.

Red Hat OpenShift Cloud Choice Edition

The cloud choice edition allows customers to create their own AWS account, and then have the Red Hat OpenShift dedicated infrastructure provisioned into their existing account. The Red Hat SRE team provisions the Red Hat OpenShift cluster into the customer owned AWS account and manages the solution via IAM roles.

Figure 2: Red Hat OpenShift Cloud Choice IAM role separation

Red Hat provides billing for the Red Hat OpenShift Cloud Choice subscription and management fees, and AWS provides billing for the AWS resources. Keeping the Red Hat OpenShift infrastructure within your AWS account allows better cost controls.

Red Hat OpenShift Cloud Choice provides visibility into the resources running in your account; which is desirable if you have regulatory and auditing concerns. You can inspect, monitor, and audit resources within the AWS account — taking advantage of the rich AWS service set (AWS CloudTrail, AWS config, AWS CloudWatch, and AWS cost explorer).

You can also take advantage of cost management solutions like AWS organizations and consolidated billing. Customers with multiple business units using AWS can combine the usage across their accounts to share the volume pricing discounts resulting in cost savings for projects, departments, and companies.

Red Hat OpenShift Cloud Choice dedicated cannot be deployed into an account currently hosting other applications and resources. In order to maintain separation of control with the managed service, Red Hat OpenShift Cloud Choice dedicated requires an AWS account dedicated to the managed Red Hat OpenShift solution.

You can take advantage of cost reductions of up to 70% using Reserved Instances, which match the pervasive running instances. This is ideal for the master and infrastructure nodes of the Red Hat OpenShift solutions running in your account. The reference architecture for Red Hat OpenShift on AWS recommends spanning  nodes over three availability zones, which translates to three master instances. The master and infrastructure nodes scale differently; so, there will be three additional instances for the infrastructure nodes. Purchasing reserved instances to offset the costs of the master nodes and the infrastructure nodes can free up funds for your next project.

Interactions

DevOps teams using either edition of Red Hat OpenShift dedicated have a rich console experience providing control over networking between application workloads, storage, and monitoring. Granular drill down consoles enable operations teams to focus on what is most critical to their organization.

Each interface is controlled through granular role-based access control. Teams have visibility of high-level cluster overviews where they are able to see visualizations of the overall health of the cluster; and they have access to more granular overviews of views of hosts, nodes, and containers. Application owners, key stake holders, and operations teams have access to a customizable dashboard displaying the running state. Teams can drill down to the underlying nodes, and further into the PODs and containers, should they wish to explore the status or overall health of the containerized micro services. The cluster-wide event stream provides the same drill down experience to logging events.

The drill down console menu options are illustrated in the screenshots below:

In summary, the partnership of Red Hat and AWS created a fully managed solution which directly answers customer feedback requests for a fully managed application platform running on the availability, scalability, and cost benefits of AWS. The solution allows visibility and control whenever and wherever you need it.

About the author

Ryan Niksch

Ryan Niksch is a Partner Solutions Architect focusing on application platforms, hybrid application solutions, and modernization. Ryan has worn many hats in his life and has a passion for tinkering and a desire to leave everything he touches a little better than when he found it.

Optimizing Amazon ECS task density using awsvpc network mode

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/optimizing-amazon-ecs-task-density-using-awsvpc-network-mode/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS recently increased the number of elastic network interfaces available when you run tasks on Amazon ECS. Use the account setting called awsvpcTrunking. If you use the Amazon EC2 launch type and task networking (awsvpc network mode), you can now run more tasks on an instance—5 to 17 times as many—as you did before.

As more of you embrace microservices architectures, you deploy increasing numbers of smaller tasks. AWS now offers you the option of more efficient packing per instance, potentially resulting in smaller clusters and associated savings.

 

Overview

To manage your own cluster of EC2 instances, use the EC2 launch type. Use task networking to run ECS tasks using the same networking properties as if tasks were distinct EC2 instances.

Task networking offers several benefits. Every task launched with awsvpc network mode has its own attached network interface, a primary private IP address, and an internal DNS hostname. This simplifies container networking and gives you more control over how tasks communicate, both with each other and with other services within their virtual private clouds (VPCs).

Task networking also lets you take advantage of other EC2 networking features like VPC Flow Logs. This feature lets you monitor traffic to and from tasks. It also provides greater security control for containers, allowing you to use security groups and network monitoring tools at a more granular level within tasks. For more information, see Introducing Cloud Native Networking for Amazon ECS Containers.

However, if you run container tasks on EC2 instances with task networking, you can face a networking limit. This might surprise you, particularly when an instance has plenty of free CPU and memory. The limit reflects the number of network interfaces available to support awsvpc network mode per container instance.

 

Raise network interface density limits with trunking

The good news is that AWS raised network interface density limits by implementing a networking feature on ECS called “trunking.” This is a technique for multiplexing data over a shared communication link.

If you’re migrating to microservices using AWS App Mesh, you should optimize network interface density. App Mesh requires awsvpc networking to provide routing control and visibility over an ever-expanding array of running tasks. In this context, increased network interface density might save money.

By opting for network interface trunking, you should see a significant increase in capacity—from 5 to 17 times more than the previous limit. For more information on the new task limits per container instance, see Supported Amazon EC2 Instance Types.

Applications with tasks not hitting CPU or memory limits also benefit from this feature through the more cost-effective “bin packing” of container instances.

 

Trunking is an opt-in feature

AWS chose to make the trunking feature opt-in due to the following factors:

  • Instance registration: While normal instance registration is straightforward with trunking, this feature increases the number of asynchronous instance registration steps that can potentially fail. Any such failures might add extra seconds to launch time.
  • Available IP addresses: The “trunk” belongs to the same subnet in which the instance’s primary network interface originates. This effectively reduces the available IP addresses and potentially the ability to scale out on other EC2 instances sharing the same subnet. The trunk consumes an IP address. With a trunk attached, there are two assigned IP addresses per instance, one for the primary interface and one for the trunk.
  • Differing customer preferences and infrastructure: If you have high CPU or memory workloads, you might not benefit from trunking. Or, you may not want awsvpc networking.

Consequently, AWS leaves it to you to decide if you want to use this feature. AWS might revisit this decision in the future, based on customer feedback. For now, your account roles or users must opt in to the awsvpcTrunking account setting to gain the benefits of increased task density per container instance.

 

Enable trunking

Enable the ECS elastic network interface trunking feature to increase the number of network interfaces that can be attached to supported EC2 container instance types. You must meet the following prerequisites before you can launch a container instance with the increased network interface limits:

  • Your account must have the AWSServiceRoleForECS service-linked role for ECS.
  • You must opt into the awsvpcTrunking  account setting.

 

Make sure that a service-linked role exists for ECS

A service-linked role is a unique type of IAM role linked to an AWS service (such as ECS). This role lets you delegate the permissions necessary to call other AWS services on your behalf. Because ECS is a service that manages resources on your behalf, you need this role to proceed.

In most cases, you won’t have to create a service-linked role. If you created or updated an ECS cluster, ECS likely created the service-linked role for you.

You can confirm that your service-linked role exists using the AWS CLI, as shown in the following code example:

$ aws iam get-role --role-name AWSServiceRoleForECS
{
    "Role": {
        "Path": "/aws-service-role/ecs.amazonaws.com/",
        "RoleName": "AWSServiceRoleForECS",
        "RoleId": "AROAJRUPKI7I2FGUZMJJY",
        "Arn": "arn:aws:iam::226767807331:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS",
        "CreateDate": "2018-11-09T21:27:17Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "ecs.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        },
        "Description": "Role to enable Amazon ECS to manage your cluster.",
        "MaxSessionDuration": 3600
    }
}

If the service-linked role does not exist, create it manually with the following command:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

For more information, see Using Service-Linked Roles for Amazon ECS.

 

Opt in to the awsvpcTrunking account setting

Your account, IAM user, or role must opt in to the awsvpcTrunking account setting. Select this setting using the AWS CLI or the ECS console. You can opt in for an account by making awsvpcTrunking  its default setting. Or, you can enable this setting for the role associated with the instance profile with which the instance launches. For instructions, see Account Settings.

 

Other considerations

After completing the prerequisites described in the preceding sections, launch a new container instance with increased network interface limits using one of the supported EC2 instance types.

Keep the following in mind:

  • It’s available with the latest variant of the ECS-optimized AMI.
  • It only affects creation of new container instances after opting into awsvpcTrunking.
  • It only affects tasks created with awsvpc network mode and EC2 launch type. Tasks created with the AWS Fargate launch type always have a dedicated network interface, no matter how many you launch.

For details, see ENI Trunking Considerations.

 

Summary

If you seek to optimize the usage of your EC2 container instances for clusters that you manage, enable the increased network interface density feature with awsvpcTrunking. By following the steps outlined in this post, you can launch tasks using significantly fewer EC2 instances. This is especially useful if you embrace a microservices architecture, with its increasing numbers of lighter tasks.

Hopefully, you found this post informative and the proposed solution intriguing. As always, AWS welcomes all feedback or comment.

Using AWS App Mesh with Fargate

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/using-aws-app-mesh-with-fargate/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS App Mesh is a service mesh, which provides a framework to control and monitor services spanning multiple AWS compute environments. My previous post provided a walkthrough to get you started. In it, I showed deploying a simple microservice application to Amazon ECS and configuring App Mesh to provide traffic control and observability.

In this post, I show more advanced techniques using AWS Fargate as an ECS launch type. I show you how to deploy a specific version of the colorteller service from the previous post. Finally, I move on and explore distributing traffic across other environments, such as Amazon EC2 and Amazon EKS.

I simplified this example for clarity, but in the real world, creating a service mesh that bridges different compute environments becomes useful. Fargate is a compute service for AWS that helps you run containerized tasks using the primitives (the tasks and services) of an ECS application. This lets you work without needing to directly configure and manage EC2 instances.

 

Solution overview

This post assumes that you already have a containerized application running on ECS, but want to shift your workloads to use Fargate.

You deploy a new version of the colorteller service with Fargate, and then begin shifting traffic to it. If all goes well, then you continue to shift more traffic to the new version until it serves 100% of all requests. Use the labels “blue” to represent the original version and “green” to represent the new version. The following diagram shows programmer model of the Color App.

You want to begin shifting traffic over from version 1 (represented by colorteller-blue in the following diagram) over to version 2 (represented by colorteller-green).

In App Mesh, every version of a service is ultimately backed by actual running code somewhere, in this case ECS/Fargate tasks. Each service has its own virtual node representation in the mesh that provides this conduit.

The following diagram shows the App Mesh configuration of the Color App.

 

 

After shifting the traffic, you must physically deploy the application to a compute environment. In this demo, colorteller-blue runs on ECS using the EC2 launch type and colorteller-green runs on ECS using the Fargate launch type. The goal is to test with a portion of traffic going to colorteller-green, ultimately increasing to 100% of traffic going to the new green version.

 

AWS compute model of the Color App.

Prerequisites

Before following along, set up the resources and deploy the Color App as described in the previous walkthrough.

 

Deploy the Fargate app

To get started after you complete your Color App, configure it so that your traffic goes to colorteller-blue for now. The blue color represents version 1 of your colorteller service.

Log into the App Mesh console and navigate to Virtual routers for the mesh. Configure the HTTP route to send 100% of traffic to the colorteller-blue virtual node.

The following screenshot shows routes in the App Mesh console.

Test the service and confirm in AWS X-Ray that the traffic flows through the colorteller-blue as expected with no errors.

The following screenshot shows racing the colorgateway virtual node.

 

Deploy the new colorteller to Fargate

With your original app in place, deploy the send version on Fargate and begin slowly increasing the traffic that it handles rather than the original. The app colorteller-green represents version 2 of the colorteller service. Initially, only send 30% of your traffic to it.

If your monitoring indicates a healthy service, then increase it to 60%, then finally to 100%. In the real world, you might choose more granular increases with automated rollout (and rollback if issues arise), but this demonstration keeps things simple.

You pushed the gateway and colorteller images to ECR (see Deploy Images) in the previous post, and then launched ECS tasks with these images. For this post, launch an ECS task using the Fargate launch type with the same colorteller and envoy images. This sets up the running envoy container as a sidecar for the colorteller container.

You don’t have to manually configure the EC2 instances in a Fargate launch type. Fargate automatically colocates the sidecar on the same physical instance and lifecycle as the primary application container.

To begin deploying the Fargate instance and diverting traffic to it, follow these steps.

 

Step 1: Update the mesh configuration

You can download updated AWS CloudFormation templates located in the repo under walkthroughs/fargate.

This updated mesh configuration adds a new virtual node (colorteller-green-vn). It updates the virtual router (colorteller-vr) for the colorteller virtual service so that it distributes traffic between the blue and green virtual nodes at a 2:1 ratio. That is, the green node receives one-third of the traffic.

$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Step 2: Deploy the green task to Fargate

The fargate-colorteller.sh script creates parameterized template definitions before deploying the fargate-colorteller.yaml CloudFormation template. The change to launch a colorteller task as a Fargate task is in fargate-colorteller-task-def.json.

$ ./fargate-colorteller.sh
...

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-fargate-colorteller
$

 

Verify the Fargate deployment

The ColorApp endpoint is one of the CloudFormation template’s outputs. You can view it in the stack output in the AWS CloudFormation console, or fetch it with the AWS CLI:

$ colorapp=$(aws cloudformation describe-stacks --stack-name=$ENVIRONMENT_NAME-ecs-colorapp --query="Stacks[0
].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp> ].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp
http://DEMO-Publi-YGZIJQXL5U7S-471987363.us-west-2.elb.amazonaws.com

Assign the endpoint to the colorapp environment variable so you can use it for a few curl requests:

$ curl $colorapp/color
{"color":"blue", "stats": {"blue":1}}
$

The 2:1 weight of blue to green provides predictable results. Clear the histogram and run it a few times until you get a green result:

$ curl $colorapp/color/clear
cleared

$ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done

0: {"color":"blue", "stats": {"blue":1}}
1: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"blue", "stats": {"blue":0.67,"green":0.33}}
3: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
4: {"color":"blue", "stats": {"blue":0.6,"green":0.4}}
5: {"color":"gre
en", "stats": {"blue":0.5,"green":0.5}}
6: {"color":"blue", "stats": {"blue":0.57,"green":0.43}}
7: {"color":"blue", "stats": {"blue":0.63,"green":0.38}}
8: {"color":"green", "stats": {"blue":0.56,"green":0.44}}
...
199: {"color":"blue", "stats": {"blue":0.66,"green":0.34}}

This reflects the expected result for a 2:1 ratio. Check everything on your AWS X-Ray console.

The following screenshot shows the X-Ray console map after the initial testing.

The results look good: 100% success, no errors.

You can now increase the rollout of the new (green) version of your service running on Fargate.

Using AWS CloudFormation to manage your stacks lets you keep your configuration under version control and simplifies the process of deploying resources. AWS CloudFormation also gives you the option to update the virtual route in appmesh-colorapp.yaml and deploy the updated mesh configuration by running appmesh-colorapp.sh.

For this post, use the App Mesh console to make the change. Choose Virtual routers for appmesh-mesh, and edit the colorteller-route. Update the HTTP route so colorteller-blue-vn handles 33.3% of the traffic and colorteller-green-vn now handles 66.7%.

Run your simple verification test again:

$ curl $colorapp/color/clear
cleared
fargate $ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done
0: {"color":"green", "stats": {"green":1}}
1: {"color":"blue", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"green", "stats": {"blue":0.33,"green":0.67}}
3: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
4: {"color":"green", "stats": {"blue":0.2,"green":0.8}}
5: {"color":"green", "stats": {"blue":0.17,"green":0.83}}
6: {"color":"blue", "stats": {"blue":0.29,"green":0.71}}
7: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
...
199: {"color":"green", "stats": {"blue":0.32,"green":0.68}}
$

If your results look good, double-check your result in the X-Ray console.

Finally, shift 100% of your traffic over to the new colorteller version using the same App Mesh console. This time, modify the mesh configuration template and redeploy it:

appmesh-colorteller.yaml
  ColorTellerRoute:
    Type: AWS::AppMesh::Route
    DependsOn:
      - ColorTellerVirtualRouter
      - ColorTellerGreenVirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: colorteller-vr
      RouteName: colorteller-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: colorteller-green-vn
                Weight: 1
          Match:
            Prefix: "/"
$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Again, repeat your verification process in both the CLI and X-Ray to confirm that the new version of your service is running successfully.

 

Conclusion

In this walkthrough, I showed you how to roll out an update from version 1 (blue) of the colorteller service to version 2 (green). I demonstrated that App Mesh supports a mesh spanning ECS services that you ran as EC2 tasks and as Fargate tasks.

In my next walkthrough, I will demonstrate that App Mesh handles even uncontainerized services launched directly on EC2 instances. It provides a uniform and powerful way to control and monitor your distributed microservice applications on AWS.

If you have any questions or feedback, feel free to comment below.

Scaling Kubernetes deployments with Amazon CloudWatch metrics

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/

This post is contributed by Kwunhok Chan | Solutions Architect, AWS

 

In an earlier post, AWS introduced Horizontal Pod Autoscaler and Kubernetes Metrics Server support for Amazon Elastic Kubernetes Service. These tools make it easy to scale your Kubernetes workloads managed by EKS in response to built-in metrics like CPU and memory.

However, one common use case for applications running on EKS is the integration with AWS services. For example, you administer an application that processes messages published to an Amazon SQS queue. You want the application to scale according to the number of messages in that queue. The Amazon CloudWatch Metrics Adapter for Kubernetes (k8s-cloudwatch-adapter) helps.

 

Amazon CloudWatch Metrics Adapter for Kubernetes

The k8s-cloudwatch-adapter is an implementation of the Kubernetes Custom Metrics API and External Metrics API with integration for CloudWatch metrics. It allows you to scale your Kubernetes deployment using the Horizontal Pod Autoscaler (HPA) with CloudWatch metrics.

 

Prerequisites

Before starting, you need the following:

 

Getting started

Before using the k8s-cloudwatch-adapter, set up a way to manage IAM credentials to Kubernetes pods. The CloudWatch Metrics Adapter requires the following permissions to access metric data from CloudWatch:

cloudwatch:GetMetricData

Create an IAM policy with the following template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

For demo purposes, I’m granting admin permissions to my Kubernetes worker nodes. Don’t do this in your production environment. To associate IAM roles to your Kubernetes pods, you may want to look at kube2iam or kiam.

If you’re using an EKS cluster, you most likely provisioned it with AWS CloudFormation. The following command uses AWS CloudFormation stacks to update the proper instance policy with the correct permissions:

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
--role-name $(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[0].Parameters[?ParameterKey==`NodeInstanceRoleName`].ParameterValue' | jq -r ".[0]")

 

Make sure to replace ${STACK_NAME} with the nodegroup stack name from the AWS CloudFormation console .

 

You can now deploy the k8s-cloudwatch-adapter to your Kubernetes cluster.

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml

 

This deployment creates a new namespace—custom-metrics—and deploys the necessary ClusterRole, Service Account, and Role Binding values, along with the deployment of the adapter. Use the created custom resource definition (CRD) to define the configuration for the external metrics to retrieve from CloudWatch. The adapter reads the configuration defined in ExternalMetric CRDs and loads its external metrics. That allows you to use HPA to autoscale your Kubernetes pods.

 

Verifying the deployment

Next, query the metrics APIs to see if the adapter is deployed correctly. Run the following command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq.
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
  ]
}

There are no resources from the response because you haven’t registered any metric resources yet.

 

Deploying an Amazon SQS application

Next, deploy a sample SQS application to test out k8s-cloudwatch-adapter. The SQS producer and consumer are provided, together with the YAML files for deploying the consumer, metric configuration, and HPA.

Both the producer and consumer use an SQS queue named helloworld. If it doesn’t exist already, the producer creates this queue.

Deploy the consumer with the following command:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/consumer-deployment.yaml

 

You can verify that the consumer is running with the following command:

$ kubectl get deploy sqs-consumer
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sqs-consumer   1         1         1            0           5s

 

Set up Amazon CloudWatch metric and HPA

Next, create an ExternalMetric resource for the CloudWatch metric. Take note of the Kind value for this resource. This CRD resource tells the adapter how to retrieve metric data from CloudWatch.

You define the query parameters used to retrieve the ApproximateNumberOfMessagesVisible for an SQS queue named helloworld. For details about how metric data queries work, see CloudWatch GetMetricData API.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric:
  metadata:
    name: hello-queue-length
  spec:
    name: hello-queue-length
    resource:
      resource: "deployment"
      queries:
        - id: sqs_helloworld
          metricStat:
            metric:
              namespace: "AWS/SQS"
              metricName: "ApproximateNumberOfMessagesVisible"
              dimensions:
                - name: QueueName
                  value: "helloworld"
            period: 300
            stat: Average
            unit: Count
          returnData: true

 

Create the ExternalMetric resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/externalmetric.yaml

 

Then, set up the HPA for your consumer. Here is the configuration to use:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: hello-queue-length
      targetValue: 30

 

This HPA rule starts scaling out when the number of messages visible in your SQS queue exceeds 30, and scales in when there are fewer than 30 messages in the queue.

Create the HPA resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/hpa.yaml

 

Generate load using a producer

Finally, you can start generating messages to the queue:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/producer-deployment.yaml

On a separate terminal, you can now watch your HPA retrieving the queue length and start scaling the replicas. SQS metrics generate at five-minute intervals, so give the process a few minutes:

$ kubectl get hpa sqs-consumer-scaler -w

 

Clean up

After you complete this experiment, you can delete the Kubernetes deployment and respective resources.

Run the following commands to remove the consumer, external metric, HPA, and SQS queue:

$ kubectl delete deploy sqs-producer
$ kubectl delete hpa sqs-consumer-scaler
$ kubectl delete externalmetric sqs-helloworld-length
$ kubectl delete deploy sqs-consumer

$ aws sqs delete-queue helloworld

 

Other CloudWatch integrations

AWS recently announced the preview for Amazon CloudWatch Container Insights, which monitors, isolates, and diagnoses containerized applications running on EKS and Kubernetes clusters. To get started, see Using Container Insights.

 

Get involved

This project is currently under development. AWS welcomes issues and pull requests, and would love to hear your feedback.

How could this adapter be best implemented to work in your environment? Visit the Amazon CloudWatch Metrics Adapter for Kubernetes project on GitHub and let AWS know what you think.

Access Private applications on AWS Fargate using Amazon API Gateway PrivateLink

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/access-private-applications-on-aws-fargate-using-amazon-api-gateway-privatelink/

This post is contributed by Mani Chandrasekaran | Solutions Architect, AWS

 

Customers would like to run container-based applications in a private subnet inside a virtual private cloud (VPC), where there is no direct connectivity from the outside world to these applications. This is a very secure way of running applications which do not want to be directly exposed to the internet.

AWS Fargate is a compute engine for Amazon ECS that enables you to run containers without having to manage servers or clusters. With AWS Fargate with Amazon ECS, you don’t have to provision, configure, and scale clusters of virtual machines to run containers.

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. The API Gateway private integration makes it simple to expose your HTTP and HTTPS resources behind a virtual private cloud (VPC) with Amazon VPC private endpoints. This allows access by clients outside of the VPC without exposing the resources to the internet.

This post shows how API Gateway can be used to expose an application running on Fargate in a private subnet in a VPC using API Gateway private integration through AWS PrivateLink. With the API Gateway private integration, you can enable access to HTTP and HTTPS resources in a VPC without detailed knowledge of private network configurations or technology-specific appliances.

 

Architecture

You deploy a simple NGINX application running on Fargate within a private subnet as a first step, and then expose this NGINX application to the internet using the API.

As shown in the architecture in the following diagram, you create a VPC with two private subnets and two public subnets. To enable the Fargate tasks to download Docker images from Amazon ECR, you deploy two network address translation (NAT) gateways in the public subnets.

You also deploy a container application, NGINX, as an ECS service with one or more Fargate tasks running inside the private subnets. You provision an internal Network Load Balancer in the VPC private subnets and target the ECS service running as Fargate tasks. This is provisioned using an AWS CloudFormation template (link provided later in this post).

The integration between API Gateway and the Network Load Balancer inside the private subnet uses an API Gateway VpcLink resource. The VpcLink encapsulates connections between the API and targeted VPC resources when the application is hosted on Fargate. You set up an API with the private integration by creating a VpcLink that targets the Network Load Balancer and then uses the VpcLink as an integration endpoint .

 

 

Deployment

Here are the steps to deploy this solution:

  1. Deploy an application on Fargate.
  2. Set up an API Gateway private integration.
  3. Deploy and test the API.
  4. Clean up resources to avoid incurring future charges.

 

Step 1 — Deploy an application on AWS Fargate
I’ve created an AWS CloudFormation template to make it easier for you to get started.

  1. Get the AWS CloudFormation template.
  2. In the AWS Management Console, deploy the CloudFormation template in an AWS Region where Fargate and API Gateway are available.
  3. On the Create stack page, specify the parameters specific to your environment. Or, use the default parameters, which deploy an NGINX Docker image as a Fargate task in an ECS cluster across two Availability Zones.

When the process is finished, the status changes to CREATE_COMPLETE and the details of the Network Load Balancer, VPC, subnets, and ECS cluster name appear on the Outputs tab.

 

Step 2 — Set up an API Gateway Private Integration
Next, set up an API Gateway API with private integrations using the AWS CLI and specify the AWS Region in all the AWS CLI commands.

1. Create a VPCLink in API Gateway with the ARN of the Network Load Balancer that you provisioned. Make sure that you specify the correct endpoint URL and Region based on the AWS Region that you selected for the CloudFormation template. Run the following command:

aws apigateway create-vpc-link \
--name fargate-nlb-private-link \
--target-arns arn:aws:elasticloadbalancing:ap-south-1:xxx:loadbalancer/net/Farga-Netwo-XX/xx \
--endpoint-url https://apigateway.ap-south-1.amazonaws.com \
--region ap-south-1

The command immediately returns the following response, acknowledges the receipt of the request, and shows the PENDING status for the new VpcLink:

{
    "id": "alnXXYY",
    "name": "fargate-nlb-private-link",
    "targetArns": [
        " arn:aws:elasticloadbalancing:ap-south-1:xxx:loadbalancer/net/Farga-Netwo-XX/xx"
    ],
    "status": "PENDING"
}

It takes 2–4 minutes for API Gateway to create the VpcLink. When the operation finishes successfully, the status changes to AVAILABLE.

 

2. To verify that the VpcLink was successfully created, run the following command:

aws apigateway get-vpc-link --vpc-link-id alnXXYY --region ap-south-1

When the VpcLink status is AVAILABLE, you can create the API and integrate it with the VPC resource through the VpcLink.

 

3. To set up an API, run the following command to create an API Gateway RestApi resource

aws apigateway create-rest-api --name 'API Gateway VPC Link NLB Fargate Test' --region ap-south-1

{
    "id": "qc83xxxx",
    "name": "API Gateway VPC Link NLB Fargate Test",
    "createdDate": 1547703133,
    "apiKeySource": "HEADER",
    "endpointConfiguration": {
        "types": [
            "EDGE"
        ]
    }
}

Find the ID value of the RestApi in the returned result. In this example, it is qc83xxxx. Use this ID to finish the operations on the API, including methods and integrations setup.

 

4. In this example, you create an API with only a GET method on the root resource (/) and integrate the method with the VpcLink.

Set up the GET / method. First, get the identifier of the root resource (/):

aws apigateway get-resources --rest-api-id qc83xxxx --region ap-south-1

In the output, find the ID value of the / path. In this example, it is mq165xxxx.

 

5. Set up the method request for the API method of GET /:

aws apigateway put-method \
       --rest-api-id qc83xxxx \
       --resource-id mq165xxxx \
       --http-method GET \
       --authorization-type "NONE" --region ap-south-1

6. Set up the private integration of the HTTP_PROXY type and call the put-integration command:

aws apigateway put-integration \
--rest-api-id qc83xxxx \
--resource-id mq165xxxx \
--uri 'http://myApi.example.com' \
--http-method GET \
--type HTTP_PROXY \
--integration-http-method GET \
--connection-type VPC_LINK \
--connection-id alnXXYY --region ap-south-1

For a private integration, you must set connection-type to VPC_LINK and set connection-id to the VpcLink identifier, alnXXYY in this example. The URI parameter is not used to route requests to your endpoint, but is used to set the host header and for certificate validation.

 

Step 3 — Deploy and test the API

To test the API, run the following command to deploy the API:

aws apigateway create-deployment \
--rest-api-id qc83xxxx \
--stage-name test \
--variables vpcLinkId= alnXXYY --region ap-south-1

Test the APIs with tools such as Postman or the curl command. To call a deployed API, you must submit requests to the URL for the API Gateway component service for API execution, known as execute-api.

The base URL for REST APIs is in this format:

https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/

Replace {restapi_id} with the API identifier, {region} with the Region, and {stage_name} with the stage name of the API deployment.

To test the API with curl, run the following command:

curl -X GET https://qc83xxxx.execute-api.ap-south-1.amazonaws.com/test/

The curl response should be the NGINX home page.

To test the API with Postman, place the Invoke URL into Postman and choose GET as the method. Choose Send.

The returned result (the NGINX home page) appears.

For more information, see Use Postman to Call a REST API.

 

Step 4 — Clean up resources

After you finish your deployment test, make sure to delete the following resources to avoid incurring future charges.

1. Delete the REST API created in the API Gateway and Amazon VPC endpoint services using the console.
Or, in the AWS CLI, run the following command:

aws apigateway delete-rest-api --rest-api-id qc83xxxx --region ap-south-1

aws apigateway delete-vpc-link --vpc-link-id alnXXYY --region ap-south-1

2. To delete the Fargate-related resources created in CloudFormation, in the console, choose Delete Stack.

 

Conclusion

API Gateway private endpoints enable use cases for building private API–based services running on Fargate inside your own VPCs. You can take advantage of advanced features of API Gateway, such as custom authorizers, Amazon Cognito User Pools integration, usage tiers, throttling, deployment canaries, and API keys. At the same time, you can make sure the APIs or applications running in Fargate are not exposed to the internet.

Learning AWS App Mesh

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/learning-aws-app-mesh/

This post is contributed by Geremy Cohen | Solutions Architect, Strategic Accounts, AWS

At re:Invent 2018, AWS announced AWS App Mesh, a service mesh that provides application-level networking. App Mesh makes it easy for your services to communicate with each other across multiple types of compute infrastructure, including:

App Mesh standardizes how your services communicate, giving you end-to-end visibility and ensuring high availability for your applications. Service meshes like App Mesh help you run and monitor HTTP and TCP services at scale.

Using the open source Envoy proxy, App Mesh gives you access to a wide range of tools from AWS partners and the open source community. Because all traffic in and out of each service goes through the Envoy proxy, all traffic can be routed, shaped, measured, and logged. This extra level of indirection lets you build your services in any language desired without having to use a common set of communication libraries.

In this six-part series of the post, I walk you through setup and configuration of App Mesh for popular platforms and use cases, beginning with EKS. Here’s the list of the parts:

  1. Part 1: Introducing service meshes.
  2. Part 2: Prerequisites for running on EKS.
  3. Part 3: Creating example microservices on Amazon EKS.
  4. Part 4: Installing the sidecar injector and CRDs.
  5. Part 5: Configuring existing microservices.
  6. Part 6: Deploying with the canary technique.

Overview

Throughout the post series, I use diagrams to help describe what’s being built. In the following diagram:

  • The circle represents the container in which your app (microservice) code runs.
  • The dome alongside the circle represents the App Mesh (Envoy) proxy running as a sidecar container. When there is no dome present, no service mesh functionality is implemented for the pod.
  • The arrows show communications traffic between the application container and the proxy, as well as between the proxy and other pods.

PART 1: Introducing service meshes

Life without a service mesh

Best practices call for implementing observability, analytics, and routing capabilities across your microservice infrastructure in a consistent manner.

Between any two interacting services, it’s critical to implement logging, tracing, and metrics gathering—not to mention dynamic routing and load balancing—with minimal impact to your actual application code.

Traditionally, to provide these capabilities, you would compile each service with one or more SDKs that provided this logic. This is known as the “in-process design pattern,” because this logic runs in the same process as the service code.

When you only run a small number of services, running multiple SDKs alongside your application code may not be a huge undertaking. If you can find SDKs that provide the required functionality on the platforms and languages on which you are developing, compiling it into your service code is relatively straightforward.

As your application matures, the in-process design pattern becomes increasingly complex:

  • The number of engineers writing code grows, so each engineer must learn the in-process SDKs in use. They must also spend time integrating the SDKs with their own service logic and the service logic of others.
  • In shops where polyglot development is prevalent, as the number of engineers grow, so may the number of coding languages in use. In these scenarios, you’ll need to make sure that your SDKs are supported on these new languages.
  • The platforms that your engineering teams deploy services to may also increase and become disparate. You may have begun with Node.js containers on Kubernetes, but now, new microservices are being deployed with AWS Lambda, EC2, and other managed services. You’ll need to make sure that the SDK solution that you’ve chosen is compatible with these common platforms.
  • If you’re fortunate to have platform and language support for the SDKs you’re using, inconsistencies across the various SDK languages may creep in. This is especially true when you find a gap in language or platform support and implement custom operational logic for a language or platform that is unsupported.
  • Assuming you’ve accommodated for all the previous caveats, by using SDKs compiled into your service logic, you’re tightly coupling your business logic with your operations logic.

 

Enter the service mesh

Considering the increasing complexity as your application matures, the true value of service meshes becomes clear. With a service mesh, you can decouple your microservices’ observability, analytics, and routing logic from the underlying infrastructure and application layers.

The following diagram combines the previous two. Instead of incorporating these features at the code level (in-process), an out-of-process “sidecar proxy” container (represented by the pink dome) runs alongside your application code’s container in each pod.

 

In this model, consistent and decoupled analytics, logging, tracing, and routing logic capabilities are running alongside each microservice in your infrastructure as a sidecar proxy. Each sidecar proxy is configured by a unique configuration ruleset, based on the services it’s responsible for proxying. With 100% of the communications between pods and services proxied, 100% of the traffic is now observable and actionable.

 

App Mesh as the service mesh

App Mesh implements this sidecar proxy via the production-proven Envoy proxy. Envoy is arguably the most popular open-source service proxy. Created at Lyft in 2016, Envoy is a stable OSS project with wide community support. It’s defined as a “Graduated Project” by the Cloud Native Computing Foundation (CNCF). Envoy is a popular proxy solution due to its lightweight C++-based design, scalable architecture, and successful deployment record.

In the following diagram, a sidecar runs alongside each container in your application to provide its proxying logic, syncing each of their unique configurations from the App Mesh control plane.

Each one of these proxies must have its own unique configuration ruleset pushed to it to operate correctly. To achieve this, DevOps teams can push their intended ruleset configuration to the App Mesh API. From there, the App Mesh control plane reliably keeps all proxy instances up-to-date with their desired configurations. App Mesh dynamically scales to hundreds of thousands of pods, tasks, EC2 instances, and Lambda functions, adjusting configuration changes accordingly as instances scale up, down, and restart.

 

App Mesh components

App Mesh is made up of the following components:

  • Service mesh: A logical boundary for network traffic between the services that reside within it.
  • Virtual nodes: A logical pointer to a Kubernetes service, or an App Mesh virtual service.
  • Virtual routers: Handles traffic for one or more virtual services within your mesh.
  • Routes: Associated with a virtual router, it directs traffic that matches a service name prefix to one or more virtual nodes.
  • Virtual services: An abstraction of a real service that is either provided by a virtual node directly, or indirectly by means of a virtual router.
  • App Mesh sidecar: The App Mesh sidecar container configures your pods to use the App Mesh service mesh traffic rules set up for your virtual routers and virtual nodes.
  • App Mesh injector: Makes it easy to auto-inject the App Mesh sidecars into your pods.
  • App Mesh custom resource definitions: (CRD) Provided to implement App Mesh CRUD and configuration operations directly from the kubectl CLI.  Alternatively, you may use the latest version of the AWS CLI.

 

In the following parts, I walk you through the setup and configuration of each of these components.

 

Conclusion of Part 1

In this first part, I discussed in detail the advantages that service meshes provide, and the specific components that make up the App Mesh service mesh. I hope the information provided helps you to understand the benefit of all services meshes, regardless of vendor.

If you’re intrigued by what you’ve learned so far, don’t stop now!

For even more background on the components of AWS App Mesh, check out the official AWS App Mesh documentation, and when you’re ready, check out part 2 in this post where I guide you through completing the prerequisite steps to run App Mesh in your own environment.

 

 

PART 2: Setting up AWS App Mesh on Amazon EKS

 

In part 1 of this series, I discussed the functionality of service meshes like AWS App Mesh provided on Kubernetes and other services. In this post, I walk you through completing the prerequisites required to install and run App Mesh in your own Amazon EKS-based Kubernetes environment.

When you have the environment set up, be sure to leave it intact if you plan on experimenting in the future with App Mesh on your own (or throughout this series of posts).

 

Prerequisites

To run App Mesh, your environment must meet the following requirements.

  • An AWS account
  • The AWS CLI installed and configured
    • The minimal version supported is 1.16.133. You should have a Region set via the aws configure command. For this tutorial, it should work against all Regions where App Mesh and Amazon EKS are supported. Use us-west-2 if you don’t have a preference or are in doubt:
      aws configure set region us-west-2
  • The jq utility
    • The utility is required by scripts executed in this series. Make sure that you have it installed on the machine from which to run the tutorial steps.
  • Kubernetes and kubectl
    • The minimal Kubernetes and kubectl versions supported are 1.11. You need a Kubernetes cluster deployed on Amazon Elastic Compute Cloud (Amazon EC2) or on an Amazon EKS cluster. Although the steps in this tutorial demonstrate using App Mesh on Amazon EKS, the instructions also work on upstream k8s running on Amazon EC2.

Amazon EKS makes it easy to run Kubernetes on AWS. Start by creating an EKS cluster using eksctl.  For more information about how to use eksctl to spin up an EKS cluster for this exercise, see eksworkshop.com. That site has a great tutorial for getting up and running quickly with an account, as well as an EKS cluster.

 

Clone the tutorial repository

Clone the tutorial’s repository by issuing the following command in a directory of your choice:

git clone https://github.com/aws/aws-app-mesh-examples

Next, navigate to the repo’s /djapp examples directory:

cd aws-app-mesh-examples/examples/apps/djapp/

All the steps in this tutorial are executed out of this directory.

 

IAM permissions for the user and k8s worker nodes

Both k8s worker nodes and any principals (including yourself) running App Mesh AWS CLI commands must have the proper permissions to access the App Mesh service, as shown in the following code example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "appmesh:DescribeMesh",
                "appmesh:DescribeVirtualNode",
                "appmesh:DescribeVirtualService",
                "appmesh:DescribeVirtualRouter",
                "appmesh:DescribeRoute",
                "appmesh:CreateMesh",
                "appmesh:CreateVirtualNode",
                "appmesh:CreateVirtualService",
                "appmesh:CreateVirtualRouter",
                "appmesh:CreateRoute",
                "appmesh:UpdateMesh",
                "appmesh:UpdateVirtualNode",
                "appmesh:UpdateVirtualService",
                "appmesh:UpdateVirtualRouter",
                "appmesh:UpdateRoute",
                "appmesh:ListMeshes",
                "appmesh:ListVirtualNodes",
                "appmesh:ListVirtualServices",
                "appmesh:ListVirtualRouters",
                "appmesh:ListRoutes",
                "appmesh:DeleteMesh",
                "appmesh:DeleteVirtualNode",
                "appmesh:DeleteVirtualService",
                "appmesh:DeleteVirtualRouter",
                "appmesh:DeleteRoute"
            ],
            "Resource": "*"
        }
    ]
}

To provide users with the correct permissions, add the previous policy to the user’s role or group, or create it as an inline policy.

To verify as a user that you have the correct permissions set for App Mesh, issue the following command:

aws appmesh list-meshes

If you have the proper permissions and haven’t yet created a mesh, you should get back an empty response like the following. If you did have a mesh created, you get a slightly more verbose response.

{
"meshes": []
}

If you do not have the proper permissions, you’ll see a response similar to the following:

An error occurred (AccessDeniedException) when calling the ListMeshes operation: User: arn:aws:iam::123abc:user/foo is not authorized to perform: appmesh:ListMeshes on resource: *

As a user, these permissions (or even the Administrator Access role) enable you to complete this tutorial, but it’s critical to implement least-privileged access for production or internet-facing deployments.

 

Adding the permissions for EKS worker nodes

If you’re using an Amazon EKS-based cluster to follow this tutorial (suggested), you can easily add the previous permissions to your k8s worker nodes with the following steps.

First, get the role under which your k8s workers are running:

INSTANCE_PROFILE_NAME=$(aws iam list-instance-profiles | jq -r '.InstanceProfiles[].InstanceProfileName' | grep nodegroup)
ROLE_NAME=$(aws iam get-instance-profile --instance-profile-name $INSTANCE_PROFILE_NAME | jq -r '.InstanceProfile.Roles[] | .RoleName')
echo $ROLE_NAME

Upon running that command, the $ROLE_NAME environment variable should be output similar to:

eksctl-blog-nodegroup-ng-1234-NodeInstanceRole-abc123

Copy and paste the following code to add the permissions as an inline policy to your worker node instances:

cat << EoF > k8s-appmesh-worker-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "appmesh:DescribeMesh",
        "appmesh:DescribeVirtualNode",
        "appmesh:DescribeVirtualService",
        "appmesh:DescribeVirtualRouter",
        "appmesh:DescribeRoute",
        "appmesh:CreateMesh",
        "appmesh:CreateVirtualNode",
        "appmesh:CreateVirtualService",
        "appmesh:CreateVirtualRouter",
        "appmesh:CreateRoute",
        "appmesh:UpdateMesh",
        "appmesh:UpdateVirtualNode",
        "appmesh:UpdateVirtualService",
        "appmesh:UpdateVirtualRouter",
        "appmesh:UpdateRoute",
        "appmesh:ListMeshes",
        "appmesh:ListVirtualNodes",
        "appmesh:ListVirtualServices",
        "appmesh:ListVirtualRouters",
        "appmesh:ListRoutes",
        "appmesh:DeleteMesh",
        "appmesh:DeleteVirtualNode",
        "appmesh:DeleteVirtualService",
        "appmesh:DeleteVirtualRouter",
        "appmesh:DeleteRoute"
  ],
      "Resource": "*"
    }
  ]
}
EoF

aws iam put-role-policy --role-name $ROLE_NAME --policy-name AppMesh-Policy-For-Worker --policy-document file://k8s-appmesh-worker-policy.json

To verify that the policy was attached to the role, run the following command:

aws iam get-role-policy --role-name $ROLE_NAME --policy-name AppMesh-Policy-For-Worker

To test that your worker nodes are able to use these permissions correctly, run the following job from the project’s directory.

NOTE: The following YAML is configured for the us-west-2 Region. If you are running your cluster and App Mesh out of a different Region, modify the –region value found in the command attribute (not in the image attribute) in the YAML before proceeding, as shown below:

command: ["aws","appmesh","list-meshes","—region","us-west-2"]

Execute the job by running the following command:

kubectl apply -f awscli.yaml

Make sure that the job is completed by issuing the command:

kubectl get jobs

You should see that the desired and successful values are both one:

NAME     DESIRED   SUCCESSFUL   AGE
awscli   1         1            1m

Inspect the output of the job:

kubectl logs jobs/awscli

Similar to the list-meshes call, the output of this command shows whether your nodes can make App Mesh API calls successfully.

This output shows that the workers have proper access:

{
"meshes": []
}

While this output shows that they don’t:

An error occurred (AccessDeniedException) when calling the ListMeshes operation: User: arn:aws:iam::123abc:user/foo is not authorized to perform: appmesh:ListMeshes on resource: *

If you have to troubleshoot further, you must first delete the job before you run it again to test it:

kubectl delete jobs/awscli

After you’ve verified that you have the proper permissions set, you are ready to move forward and understand more about the demo application you’re going to build on top of App Mesh.

 

Cleaning up

When you’re done experimenting and want to delete all the resources created during this series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ App and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own or throughout this series of posts.

 

Conclusion of Part 2

In this second part of the series, I walked you through the prerequisites required to install and run App Mesh in an Amazon EKS-based Kubernetes environment. In part 3 , I show you how to create a simple microservice that can be implemented on an App Mesh service mesh.

 

 

PART 3: Creating example microservices on Amazon EKS

 

In part 2 of this series, I walked you through completing the setup steps needed to configure your environment to run AWS App Mesh. In this post, I walk you through creating three Amazon EKS-based microservices. These microservices work together to form an app called DJ App, which you use later to demonstrate App Mesh functionality.

 

Prerequisites

Make sure that you’ve completed parts 1 and 2 of this series before running through the steps in this post.

 

Overview of DJ App

I’ll now walk you through creating an example app on App Mesh called DJ App, which is used for a cloud-based music service. This application is composed of the following three microservices:

  • dj
  • metal-v1
  • jazz-v1

The dj service makes requests to either the jazz or metal backends for artist lists. If the dj service requests from the jazz backend, then musical artists such as Miles Davis or Astrud Gilberto are returned. Requests made to the metal backend return artists such as Judas Priest or Megadeth.

Today, the dj service is hardwired to make requests to the metal-v1 service for metal requests and to the jazz-v1 service for jazz requests. Each time there is a new metal or jazz release, a new version of dj must also be rolled out to point to its new upstream endpoints. Although it works for now, it’s not an optimal configuration to maintain for the long term.

App Mesh can be used to simplify this architecture. By virtualizing the metal and jazz service via kubectl or the AWS CLI, routing changes can be made dynamically to the endpoints and versions of your choosing. That minimizes the need for the complete re-deployment of DJ App each time there is a new metal or jazz service release.

 

Create the initial architecture

To begin, I’ll walk you through creating the initial application architecture. As the following diagram depicts, in the initial architecture, there are three k8s services:

  • The dj service, which serves as the DJ App entrypoint
  • The metal-v1 service backend
  • The jazz-v1 service backend

As depicted by the arrows, the dj service will make requests to either the metal-v1, or jazz-v1 backends.

First, deploy the k8s components that make up this initial architecture. To keep things organized, create a namespace for the app called prod, and deploy all of the DJ App components into that namespace. To create the prod namespace, issue the following command:

kubectl apply -f 1_create_the_initial_architecture/1_prod_ns.yaml

The output should be similar to the following:

namespace/prod created

Now that you’ve created the prod namespace, deploy the DJ App (the dj, metal, and jazz microservices) into it. Create the DJ App deployment in the prod namespace by issuing the following command:

kubectl apply -nprod -f 1_create_the_initial_architecture/1_initial_architecture_deployment.yaml

The output should be similar to:

deployment.apps "dj" created
deployment.apps "metal-v1" created
deployment.apps "jazz-v1" created

Create the services that front these deployments by issuing the following command:

kubectl apply -nprod -f 1_create_the_initial_architecture/1_initial_architecture_services.yaml

The output should be similar to:

service "dj" created
service "metal-v1" created
service "jazz-v1" created

Now, verify that everything has been set up correctly by getting all resources from the prod namespace. Issue this command:

kubectl get all -nprod

The output should display the dj, jazz, and metal pods, and the services, deployments, and replica sets, similar to the following:

NAME                            READY   STATUS    RESTARTS   AGE
pod/dj-5b445fbdf4-qf8sv         1/1     Running   0          1m
pod/jazz-v1-644856f4b4-mshnr    1/1     Running   0          1m
pod/metal-v1-84bffcc887-97qzw   1/1     Running   0          1m

NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/dj         ClusterIP   10.100.247.180   <none>        9080/TCP   15s
service/jazz-v1    ClusterIP   10.100.157.174   <none>        9080/TCP   15s
service/metal-v1   ClusterIP   10.100.187.186   <none>        9080/TCP   15s

NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dj         1         1         1            1           1m
deployment.apps/jazz-v1    1         1         1            1           1m
deployment.apps/metal-v1   1         1         1            1           1m

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/dj-5b445fbdf4         1         1         1       1m
replicaset.apps/jazz-v1-644856f4b4    1         1         1       1m
replicaset.apps/metal-v1-84bffcc887   1         1         1       1m

When you’ve verified that all resources have been created correctly in the prod namespace, test out this initial version of DJ App. To do that, exec into the DJ pod, and issue a curl request out to the jazz-v1 and metal-v1 backends. Get the name of the DJ pod by listing all the pods with the dj app selector:

kubectl get pods -nprod -l app=dj

The output should be similar to:

NAME                  READY     STATUS    RESTARTS   AGE
dj-5b445fbdf4-8xkwp   1/1       Running   0          32s

Next, exec into the DJ pod:

kubectl exec -nprod -it <your-dj-pod-name> bash

The output should be similar to:

[email protected]:/usr/src/app#

Now that you have a root prompt into the DJ pod, issue a curl request to the jazz-v1 backend service:

curl jazz-v1.prod.svc.cluster.local:9080;echo

The output should be similar to:

["Astrud Gilberto","Miles Davis"]

Try it again, but this time issue the command to the metal-v1.prod.svc.cluster.local backend on port 9080:

curl metal-v1.prod.svc.cluster.local:9080;echo

You should get a list of heavy metal bands:

["Megadeth","Judas Priest"]

When you’re done exploring this vast world of music, press CTRL-D, or type exit to exit the container’s shell:

[email protected]:/usr/src/app# exit
command terminated with exit code 1
$

Congratulations on deploying the initial DJ App architecture!

 

Cleaning up

When you’re done experimenting and want to delete all the resources created during this series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ app and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own or throughout this series of posts.

 

Conclusion of Part 3

In this third part of the series, I demonstrated how to create three simple Kubernetes-based microservices, which working together, form an app called DJ App. This app is later used to demonstrate App Mesh functionality.

In part 4, I show you how to install the App Mesh sidecar injector and CRDs, which make defining and configuring App Mesh components easy.

 

 

PART 4: Installing the sidecar injector and CRDs

 

In part 3 of this series, I walked you through setting up a basic microservices-based application called DJ App on Kubernetes with Amazon EKS. In this post, I demonstrate how to set up and configure the AWS App Mesh sidecar injector and custom resource definitions (CRDs).  As you will see later, the sidecar injector and CRD components make defining and configuring DJ App’s service mesh more convenient.

 

Prerequisites

Make sure that you’ve completed parts 1–3 of this series before running through the steps in this post.

 

Installing the App Mesh sidecar

As decoupled logic, an App Mesh sidecar container must run alongside each pod in the DJ App deployment. This can be set up in few different ways:

  1. Before installing the deployment, you could modify the DJ App deployment’s container specs to include App Mesh sidecar containers. When the app is deployed, it would run the sidecar.
  2. After installing the deployment, you could patch the deployment to include the sidecar container specs. Upon applying this patch, the old pods are torn down, and the new pods come up with the sidecar.
  3. You can implement the App Mesh injector controller, which watches for new pods to be created and automatically adds the sidecar data to the pods as they are deployed.

For this tutorial, I walk you through the App Mesh injector controller option, as it enables subsequent pod deployments to automatically come up with the App Mesh sidecar. This is not only quicker in the long run, but it also reduces the chances of typos that manual editing may introduce.

 

Creating the injector controller

To create the injector controller, run a script that creates a namespace, generates certificates, and then installs the injector deployment.

From the base repository directory, change to the injector directory:

cd 2_create_injector

Next, run the create.sh script:

./create.sh

The output should look similar to the following:

namespace/appmesh-inject created
creating certs in tmpdir /var/folders/02/qfw6pbm501xbw4scnk20w80h0_xvht/T/tmp.LFO95khQ
Generating RSA private key, 2048 bit long modulus
.........+++
..............................+++
e is 65537 (0x10001)
certificatesigningrequest.certificates.k8s.io/aws-app-mesh-inject.appmesh-inject created
NAME                                 AGE   REQUESTOR          CONDITION
aws-app-mesh-inject.appmesh-inject   0s    kubernetes-admin   Pending
certificatesigningrequest.certificates.k8s.io/aws-app-mesh-inject.appmesh-inject approved
secret/aws-app-mesh-inject created

processing templates
Created injector manifest at:/2_create_injector/inject.yaml

serviceaccount/aws-app-mesh-inject-sa created
clusterrole.rbac.authorization.k8s.io/aws-app-mesh-inject-cr unchanged
clusterrolebinding.rbac.authorization.k8s.io/aws-app-mesh-inject-binding configured
service/aws-app-mesh-inject created
deployment.apps/aws-app-mesh-inject created
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-app-mesh-inject unchanged

Waiting for pods to come up...

App Inject Pods and Services After Install:

NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
aws-app-mesh-inject   ClusterIP   10.100.165.254   <none>        443/TCP   16s
NAME                                   READY   STATUS    RESTARTS   AGE
aws-app-mesh-inject-5d84d8c96f-gc6bl   1/1     Running   0          16s

If you’re seeing this output, the injector controller has been installed correctly. By default, the injector doesn’t act on any pods—you must give it the criteria on what to act on. For the purpose of this tutorial, you’ll next configure it to inject the App Mesh sidecar into any new pods created in the prod namespace.

Return to the repo’s base directory:

cd ..

Run the following command to label the prod namespace:

kubectl label namespace prod appmesh.k8s.aws/sidecarInjectorWebhook=enabled

The output should be similar to the following:

namespace/prod labeled

Next, verify that the injector controller is running:

kubectl get pods -nappmesh-inject

You should see output similar to the following:

NAME                                   READY   STATUS    RESTARTS   AGE
aws-app-mesh-inject-78c59cc699-9jrb4   1/1     Running   0          1h

With the injector portion of the setup complete, I’ll now show you how to create the App Mesh components.

 

Choosing a way to create the App Mesh components

There are two ways to create the components of the App Mesh service mesh:

For this tutorial, I show you how to use kubectl to define the App Mesh components.  To do this, add the CRDs and the App Mesh controller logic that syncs your Kubernetes cluster’s CRD state with the AWS Cloud App Mesh control plane.

 

Adding the CRDs and App Mesh controller

To add the CRDs, run the following commands from the repository base directory:

kubectl apply -f 3_add_crds/mesh-definition.yaml
kubectl apply -f 3_add_crds/virtual-node-definition.yaml
kubectl apply -f 3_add_crds/virtual-service-definition.yaml

The output should be similar to the following:

customresourcedefinition.apiextensions.k8s.io/meshes.appmesh.k8s.aws created
customresourcedefinition.apiextensions.k8s.io/virtualnodes.appmesh.k8s.aws created
customresourcedefinition.apiextensions.k8s.io/virtualservices.appmesh.k8s.aws created

Next, add the controller by executing the following command:

kubectl apply -f 3_add_crds/controller-deployment.yaml

The output should be similar to the following:

namespace/appmesh-system created
deployment.apps/app-mesh-controller created
serviceaccount/app-mesh-sa created
clusterrole.rbac.authorization.k8s.io/app-mesh-controller created
clusterrolebinding.rbac.authorization.k8s.io/app-mesh-controller-binding created

Run the following command to verify that the App Mesh controller is running:

kubectl get pods -nappmesh-system

You should see output similar to the following:

NAME                                   READY   STATUS    RESTARTS   AGE
app-mesh-controller-85f9d4b48f-j9vz4   1/1     Running   0          7m

NOTE: The CRD and injector are AWS-supported open source projects. If you plan to deploy the CRD or injector for production projects, always build them from the latest AWS GitHub repos and deploy them from your own container registry. That way, you stay up-to-date on the latest features and bug fixes.

 

Cleaning up

When you’re done experimenting and want to delete all the resources created during this series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ app and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own or throughout this series of posts.

 

Conclusion of Part 4

In this fourth part of the series, I walked you through setting up the App Mesh sidecar injector and CRD components. In part 5, I show you how to define the App Mesh components required to run DJ App on a service mesh.

 

 

PART 5: Configuring existing microservices

 

In part 4 of this series, I demonstrated how to set up the AWS App Mesh Sidecar Injector and CRDs. In this post, I’ll show how to configure the DJ App microservices to run on top of App Mesh by creating the required App Mesh components.

 

Prerequisites

Make sure that you’ve completed parts 1–4 of this series before running through the steps in this post.

 

DJ App revisited

As shown in the following diagram, the dj service is hardwired to make requests to either the metal-v1 or jazz-v1 backends.

The service mesh-enabled version functionally does exactly what the current version does. The only difference is that you use App Mesh to create two new virtual services called metal and jazz. The dj service now makes a request to these metal or jazz virtual services, which route to their metal-v1 and jazz-v1 counterparts accordingly, based on the virtual services’ routing rules. The following diagram depicts this process.

By virtualizing the metal and jazz services, you can dynamically configure routing rules to the versioned backends of your choosing. That eliminates the need to re-deploy the entire DJ App each time there’s a new metal or jazz service version release.

 

Now that you have a better idea of what you’re building, I’ll show you how to create the mesh.

 

Creating the mesh

The mesh component, which serves as the App Mesh foundation, must be created first. Call the mesh dj-app, and define it in the prod namespace by executing the following command from the repository’s base directory:

kubectl create -f 4_create_initial_mesh_components/mesh.yaml

You should see output similar to the following:

mesh.appmesh.k8s.aws/dj-app created

Because an App Mesh mesh is a custom resource, kubectl can be used to view it using the get command. Run the following command:

kubectl get meshes -nprod

This yields the following:

NAME     AGE
dj-app   1h

As is the case for any of the custom resources you interact with in this tutorial, you can also view App Mesh resources using the AWS CLI:

aws appmesh list-meshes

{
    "meshes": [
        {
            "meshName": "dj-app",
            "arn": "arn:aws:appmesh:us-west-2:123586676:mesh/dj-app"
        }
    ]
}

aws appmesh describe-mesh --mesh-name dj-app

{
    "mesh": {
        "status": {
            "status": "ACTIVE"
        },
        "meshName": "dj-app",
        "metadata": {
            "version": 1,
            "lastUpdatedAt": 1553233281.819,
            "createdAt": 1553233281.819,
            "arn": "arn:aws:appmesh:us-west-2:123586676:mesh/dj-app",
            "uid": "10d86ae0-ece7-4b1d-bc2d-08064d9b55e1"
        }
    }
}

NOTE: If you do not see dj-app returned from the previous list-meshes command, then your user account (as well as your worker nodes) may not have the correct IAM permissions to access App Mesh resources. Verify that you and your worker nodes have the correct permissions per part 2 of this series.

 

Creating the virtual nodes and virtual services

With the foundational mesh component created, continue onward to define the App Mesh virtual node and virtual service components. All physical Kubernetes services that interact with each other in App Mesh must first be defined as virtual node objects.

Abstracting out services as virtual nodes helps App Mesh build rulesets around inter-service communication. In addition, as you define virtual service objects, virtual nodes may be referenced as inputs and target endpoints for those virtual services. Because of this, it makes sense to define the virtual nodes first.

Based on the first App Mesh-enabled architecture, the physical service dj makes requests to two new virtual services—metal and jazz. These services route requests respectively to the physical services metal-v1 and jazz-v1, as shown in the following diagram.

Because there are three physical services involved in this configuration, you’ll need to define three virtual nodes. To do that, enter the following:

kubectl create -nprod -f 4_create_initial_mesh_components/nodes_representing_physical_services.yaml

The output should be similar to:

virtualnode.appmesh.k8s.aws/dj created
virtualnode.appmesh.k8s.aws/jazz-v1 created
virtualnode.appmesh.k8s.aws/metal-v1 created

If you open up the YAML in your favorite editor, you may notice a few things about these virtual nodes.

They’re both similar, but for the purposes of this tutorial, examine just the metal-v1.prod.svc.cluster.local VirtualNode:

apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualNode
metadata:
  name: metal-v1
  namespace: prod
spec:
  meshName: dj-app
  listeners:
    - portMapping:
        port: 9080
        protocol: http
  serviceDiscovery:
    dns:
      hostName: metal-v1.prod.svc.cluster.local

...

According to this YAML, this virtual node points to a service (spec.serviceDiscovery.dns.hostName: metal-v1.prod.svc.cluster.local) that listens on a given port for requests (spec.listeners.portMapping.port: 9080).

You may notice that jazz-v1 and metal-v1 are similar to the dj virtual node, with one key difference; the dj virtual node contains a backend attribute:

apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualNode
metadata:
  name: dj
  namespace: prod
spec:
  meshName: dj-app
  listeners:
    - portMapping:
        port: 9080
        protocol: http
  serviceDiscovery:
    dns:
      hostName: dj.prod.svc.cluster.local
  backends:
    - virtualService:
        virtualServiceName: jazz.prod.svc.cluster.local
    - virtualService:
        virtualServiceName: metal.prod.svc.cluster.local

The backend attribute specifies that dj is allowed to make requests to the jazz and metal virtual services only.

At this point, you’ve created three virtual nodes:

kubectl get virtualnodes -nprod

NAME            AGE
dj              6m
jazz-v1         6m
metal-v1        6m

The last step is to create the two App Mesh virtual services that intercept and route requests made to jazz and metal. To do this, run the following command:

kubectl apply -nprod -f 4_create_initial_mesh_components/virtual-services.yaml

The output should be similar to:

virtualservice.appmesh.k8s.aws/jazz.prod.svc.cluster.local created
virtualservice.appmesh.k8s.aws/metal.prod.svc.cluster.local created

If you inspect the YAML, you may notice that it created two virtual service resources. Requests made to jazz.prod.svc.cluster.local are intercepted by App Mesh and routed to the virtual node jazz-v1.

Similarly, requests made to metal.prod.svc.cluster.local are routed to the virtual node metal-v1:

apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualService
metadata:
  name: jazz.prod.svc.cluster.local
  namespace: prod
spec:
  meshName: dj-app
  virtualRouter:
    name: jazz-router
  routes:
    - name: jazz-route
      http:
        match:
          prefix: /
        action:
          weightedTargets:
            - virtualNodeName: jazz-v1
              weight: 100

---
apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualService
metadata:
  name: metal.prod.svc.cluster.local
  namespace: prod
spec:
  meshName: dj-app
  virtualRouter:
    name: metal-router
  routes:
    - name: metal-route
      http:
        match:
          prefix: /
        action:
          weightedTargets:
            - virtualNodeName: metal-v1
              weight: 100

NOTE: Remember to use fully qualified DNS names for the virtual service’s metadata.name field to prevent the chance of name collisions when using App Mesh cross-cluster.

With these virtual services defined, to access them by name, clients (in this case, the dj container) first perform a DNS lookup to jazz.prod.svc.cluster.local or metal.prod.svc.cluster.local before making the HTTP request.

If the dj container (or any other client) cannot resolve that name to an IP, the subsequent HTTP request fails with a name lookup error.

The existing physical services (jazz-v1, metal-v1, dj) are defined as physical Kubernetes services, and therefore have resolvable names:

kubectl get svc -nprod

NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
dj         ClusterIP   10.100.247.180   <none>        9080/TCP   16h
jazz-v1    ClusterIP   10.100.157.174   <none>        9080/TCP   16h
metal-v1   ClusterIP   10.100.187.186   <none>        9080/TCP   16h

However, the new jazz and metal virtual services we just created don’t (yet) have resolvable names.

To provide the jazz and metal virtual services with resolvable IP addresses and hostnames, define them as Kubernetes services that do not map to any deployments or pods. Do this by creating them as k8s services without defining selectors for them. Because App Mesh is intercepting and routing requests made for them, they don’t have to map to any pods or deployments on the k8s-side.

To register the placeholder names and IP addresses for these virtual services, run the following command:

kubectl create -nprod -f 4_create_initial_mesh_components/metal_and_jazz_placeholder_services.yaml

The output should be similar to:

service/jazz created
service/metal created

You can now use kubectl to get the registered metal and jazz virtual services:

kubectl get -nprod virtualservices

NAME                           AGE
jazz.prod.svc.cluster.local    10m
metal.prod.svc.cluster.local   10m

You can also get the virtual service placeholder IP addresses and physical service IP addresses:

kubectl get svc -nprod

NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
dj         ClusterIP   10.100.247.180   <none>        9080/TCP   17h
jazz       ClusterIP   10.100.220.118   <none>        9080/TCP   27s
jazz-v1    ClusterIP   10.100.157.174   <none>        9080/TCP   17h
metal      ClusterIP   10.100.122.192   <none>        9080/TCP   27s
metal-v1   ClusterIP   10.100.187.186   <none>        9080/TCP   17h

As such, when name lookup requests are made to your virtual services alongside their physical service counterparts, they resolve.

Currently, if you describe any of the pods running in the prod namespace, they are running with just one container (the same one with which you initially deployed it):

kubectl get pods -nprod

NAME                        READY   STATUS    RESTARTS   AGE
dj-5b445fbdf4-qf8sv         1/1     Running   0          3h
jazz-v1-644856f4b4-mshnr    1/1     Running   0          3h
metal-v1-84bffcc887-97qzw   1/1     Running   0          3h

kubectl describe pods/dj-5b445fbdf4-qf8sv -nprod

...
Containers:
  dj:
    Container ID:   docker://76e6d5f7101dfce60158a63cf7af9fcb3c821c087db360e87c5e2fb8850b7aa9
    Image:          970805265562.dkr.ecr.us-west-2.amazonaws.com/hello-world:latest
    Image ID:       docker-pullable://970805265562.dkr.ecr.us-west-2.amazonaws.com/[email protected]:581fe44cf2413a48f0cdf005b86b025501eaff6cafc7b26367860e07be060753
    Port:           9080/TCP
    Host Port:      0/TCP
    State:          Running
...

The injector controller installed earlier watches for new pods to be created and ensures that any new pods created in the prod namespace are injected with the App Mesh sidecar. Because the dj pods were already running before the injector was created, you’ll now force them to be re-created, this time with the sidecars auto-injected into them.

In production, there are more graceful ways to do this. For the purpose of this tutorial, an easy way to have the deployment re-create the pods in an innocuous fashion is to patch a simple date annotation into the deployment.

To do that with your current deployment, first get all the prod namespace pod names:

kubectl get pods -nprod

The output is the pod names:

NAME                        READY   STATUS    RESTARTS   AGE
dj-5b445fbdf4-qf8sv         1/1     Running   0          3h
jazz-v1-644856f4b4-mshnr    1/1     Running   0          3h
metal-v1-84bffcc887-97qzw   1/1     Running   0          3h

 

Under the READY column, you see 1/1, which indicates that one container is running for each pod.

Next, run the following commands to add a date label to each dj, jazz-v1, and metal-1 deployment, forcing the pods to be re-created:

kubectl patch deployment dj -nprod -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
kubectl patch deployment metal-v1 -nprod -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
kubectl patch deployment jazz-v1 -nprod -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"

Again, get the pods:

kubectl get pods -nprod

Under READY, you see 2/2, which indicates that two containers for each pod are running:

NAME                        READY   STATUS    RESTARTS   AGE
dj-6cfb85cdd9-z5hsp         2/2     Running   0          10m
jazz-v1-79d67b4fd6-hdrj9    2/2     Running   0          16s
metal-v1-769b58d9dc-7q92q   2/2     Running   0          18s

NOTE: If you don’t see this exact output, wait about 10 seconds (your redeployment is underway), and re-run the command.

Now describe the new dj pod to get more detail:

...
Containers:
  dj:
    Container ID:   docker://bef63f2e45fb911f78230ef86c2a047a56c9acf554c2272bc094300c6394c7fb
    Image:          970805265562.dkr.ecr.us-west-2.amazonaws.com/hello-world:latest
    ...
  envoy:
    Container ID:   docker://2bd0dc0707f80d436338fce399637dcbcf937eaf95fed90683eaaf5187fee43a
    Image:          111345817488.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-envoy:v1.8.0.2-beta
    ...

Both the original container and the auto-injected sidecar are running for any new pods created in the prod namespace.

Testing the App Mesh architecture

To test if the new architecture is working as expected, exec into the dj container. Get the name of your dj pod by listing all pods with the dj selector:

kubectl get pods -nprod -lapp=dj

The output should be similar to the following:

NAME                  READY     STATUS    RESTARTS   AGE
dj-5b445fbdf4-8xkwp   1/1       Running   0          32s

Next, exec into the dj pod returned from the last step:

kubectl exec -nprod -it <your-dj-pod-name> bash

The output should be similar to:

[email protected]:/usr/src/app#

Now that you have a root prompt into the dj pod, make a curl request to the virtual service jazz on port 9080. Your request simulates what would happen if code running in the same pod made a request to the jazz backend:

curl jazz.prod.svc.cluster.local:9080;echo

The output should be similar to the following:

["Astrud Gilberto","Miles Davis"]

Try it again, but issue the command to the virtual metal service:

curl metal.prod.svc.cluster.local:9080;echo

You should get a list of heavy metal bands:

["Megadeth","Judas Priest"]

When you’re done exploring this vast, service-mesh-enabled world of music, press CTRL-D, or type exit to exit the container’s shell:

[email protected]:/usr/src/app# exit
command terminated with exit code 1
$

 

Cleaning up

When you’re done experimenting and want to delete all the resources created during this series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ app and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own or throughout this series of posts.

Conclusion of Part 5

In this fifth part of the series, you learned how to enable existing microservices to run on App Mesh. In part 6, I demonstrate the true power of App Mesh by walking you through adding new versions of the metal and jazz services and demonstrating how to route between them.

 

 

PART 6: Deploying with the canary technique

In part 5 of this series, I demonstrated how to configure an existing microservices-based application (DJ App) to run on AWS App Mesh. In this post, I demonstrate how App Mesh can be used to deploy new versions of Amazon EKS-based microservices using the canary technique.

Prerequisites

Make sure that you’ve completed parts 1–5 of this series before running through the steps in this post.

Canary testing with v2

A canary release is a method of slowly exposing a new version of software. The theory is that by serving the new version of the software to a small percentage of requests, any problems only affect the small percentage of users before they’re discovered and rolled back.

So now, back to the DJ App scenario. Version 2 of the metal and jazz services is out, and they now include the city that each artist is from in the response. You’ll now release v2 versions of the metal and jazz services in a canary fashion using App Mesh. When you complete this process, requests to the metal and jazz services are distributed in a weighted fashion to both the v1 and v2 versions.

The following diagram shows the final (v2) seven-microservices-based application, running on an App Mesh service mesh.

 

 

To begin, roll out the v2 deployments, services, and virtual nodes with a single YAML file:

kubectl apply -nprod -f 5_canary/jazz_v2.yaml

The output should be similar to the following:

deployment.apps/jazz-v2 created
service/jazz-v2 created
virtualnode.appmesh.k8s.aws/jazz-v2 created

Next, update the jazz virtual service by modifying the route to spread traffic 50/50 across the two versions. Look at it now, and see that the current route points 100% to jazz-v1:

kubectl describe virtualservice jazz -nprod

Name:         jazz.prod.svc.cluster.local
Namespace:    prod
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:

{"apiVersion":"appmesh.k8s.aws/v1beta1","kind":"VirtualService","metadata":{"annotations":{},"name":"jazz.prod.svc.cluster.local","namesp...
API Version:  appmesh.k8s.aws/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2019-03-23T00:15:08Z
  Generation:          3
  Resource Version:    2851527
  Self Link:           /apis/appmesh.k8s.aws/v1beta1/namespaces/prod/virtualservices/jazz.prod.svc.cluster.local
  UID:                 b76eed59-4d00-11e9-87e6-06dd752b96a6
Spec:
  Mesh Name:  dj-app
  Routes:
    Http:
      Action:
        Weighted Targets:
          Virtual Node Name:  jazz-v1
          Weight:             100
      Match:
        Prefix:  /
    Name:        jazz-route
  Virtual Router:
    Name:  jazz-router
Status:
  Conditions:
Events:  <none>

Apply the updated service definition:

kubectl apply -nprod -f 5_canary/jazz_service_update.yaml

When you describe the virtual service again, you see the updated route:

kubectl describe virtualservice jazz -nprod

Name:         jazz.prod.svc.cluster.local
Namespace:    prod
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:

{"apiVersion":"appmesh.k8s.aws/v1beta1","kind":"VirtualService","metadata":{"annotations":{},"name":"jazz.prod.svc.cluster.local","namesp...
API Version:  appmesh.k8s.aws/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2019-03-23T00:15:08Z
  Generation:          4
  Resource Version:    2851774
  Self Link:           /apis/appmesh.k8s.aws/v1beta1/namespaces/prod/virtualservices/jazz.prod.svc.cluster.local
  UID:                 b76eed59-4d00-11e9-87e6-06dd752b96a6
Spec:
  Mesh Name:  dj-app
  Routes:
    Http:
      Action:
        Weighted Targets:
          Virtual Node Name:  jazz-v1
          Weight:             90
          Virtual Node Name:  jazz-v2
          Weight:             10
      Match:
        Prefix:  /
    Name:        jazz-route
  Virtual Router:
    Name:  jazz-router
Status:
  Conditions:
Events:  <none>

To deploy metal-v2, perform the same steps. Roll out the v2 deployments, services, and virtual nodes with a single YAML file:

kubectl apply -nprod -f 5_canary/metal_v2.yaml

The output should be similar to the following:

deployment.apps/metal-v2 created
service/metal-v2 created
virtualnode.appmesh.k8s.aws/metal-v2 created

Update the metal virtual service by modifying the route to spread traffic 50/50 across the two versions:

kubectl apply -nprod -f 5_canary/metal_service_update.yaml

When you describe the virtual service again, you see the updated route:

kubectl describe virtualservice metal -nprod

Name:         metal.prod.svc.cluster.local
Namespace:    prod
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:

{"apiVersion":"appmesh.k8s.aws/v1beta1","kind":"VirtualService","metadata":{"annotations":{},"name":"metal.prod.svc.cluster.local","names...
API Version:  appmesh.k8s.aws/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2019-03-23T00:15:08Z
  Generation:          2
  Resource Version:    2852282
  Self Link:           /apis/appmesh.k8s.aws/v1beta1/namespaces/prod/virtualservices/metal.prod.svc.cluster.local
  UID:                 b784e824-4d00-11e9-87e6-06dd752b96a6
Spec:
  Mesh Name:  dj-app
  Routes:
    Http:
      Action:
        Weighted Targets:
          Virtual Node Name:  metal-v1
          Weight:             50
          Virtual Node Name:  metal-v2
          Weight:             50
      Match:
        Prefix:  /
    Name:        metal-route
  Virtual Router:
    Name:  metal-router
Status:
  Conditions:
Events:  <none>

Testing the v2 jazz and metal services

Now that the v2 services are deployed, it’s time to test them out. To test if it’s working as expected, exec into the DJ pod. To do that, get the name of your dj pod by listing all pods with the dj selector:

kubectl get pods -nprod -l app=dj

The output should be similar to the following:

NAME                  READY     STATUS    RESTARTS   AGE
dj-5b445fbdf4-8xkwp   1/1       Running   0          32s

Next, exec into the DJ pod by running the following command:

kubectl exec -nprod -it <your dj pod name> bash

The output should be similar to the following:

[email protected]:/usr/src/app#

Now that you have a root prompt into the DJ pod, issue a curl request to the metal virtual service:

while [ 1 ]; do curl http://metal.prod.svc.cluster.local:9080/;echo; done

The output should loop about 50/50 between the v1 and v2 versions of the metal service, similar to:

...
["Megadeth","Judas Priest"]
["Megadeth (Los Angeles, California)","Judas Priest (West Bromwich, England)"]
["Megadeth","Judas Priest"]
["Megadeth (Los Angeles, California)","Judas Priest (West Bromwich, England)"]
...

Press CTRL-C to stop the looping.

Next, perform a similar test, but against the jazz service. Issue a curl request to the jazz virtual service from within the dj pod:

while [ 1 ]; do curl http://jazz.prod.svc.cluster.local:9080/;echo; done

The output should loop about in a 90/10 ratio between the v1 and v2 versions of the jazz service, similar to the following:

...
["Astrud Gilberto","Miles Davis"]
["Astrud Gilberto","Miles Davis"]
["Astrud Gilberto","Miles Davis"]
["Astrud Gilberto (Bahia, Brazil)","Miles Davis (Alton, Illinois)"]
["Astrud Gilberto","Miles Davis"]
...

Press CTRL-C to stop the looping, and then type exit to exit the pod’s shell.

Cleaning up

When you’re done experimenting and want to delete all the resources created during this tutorial series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ app and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own.

Conclusion of Part 6

In this final part of the series, I demonstrated how App Mesh can be used to roll out new microservice versions using the canary technique. Feel free to experiment further with the cluster by adding or removing microservices, and tweaking routing rules by changing weights and targets.

 

Geremy is a solutions architect at AWS.  He enjoys spending time with his family, BBQing, and breaking and fixing things around the house.

 

Securing credentials using AWS Secrets Manager with AWS Fargate

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/securing-credentials-using-aws-secrets-manager-with-aws-fargate/

This post is contributed by Massimo Re Ferre – Principal Developer Advocate, AWS Container Services.

Cloud security at AWS is the highest priority and the work that the Containers team is doing is a testament to that. A month ago, the team introduced an integration between AWS Secrets Manager and AWS Systems Manager Parameter Store with AWS Fargate tasks. Now, Fargate customers can easily consume secrets securely and parameters transparently from their own task definitions.

In this post, I show you an example of how to use Secrets Manager and Fargate integration to ensure that your secrets are never exposed in the wild.

Overview

AWS has engineered Fargate to be highly secure, with multiple, important security measures. One of these measures is ensuring that each Fargate task has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with other tasks.

Another area of security focus is the Amazon VPC networking integration, which ensures that tasks can be protected the way that an Amazon EC2 instance can be protected from a networking perspective.

This specific announcement, however, is important in the context of our shared responsibility model. For example, DevOps teams building and running solutions on the AWS platform require proper tooling and functionalities to securely manage secrets, passwords, and sensitive parameters at runtime in their application code. Our job is to empower them with platform capabilities to do exactly that and make it as easy as possible.

Sometimes, in a rush to get things out the door quick, we have seen some users trading off some security aspects for agility, from embedding AWS credentials in source code pushed to public repositories all the way to embedding passwords in clear text in privately stored configuration files. We have solved this problem for developers consuming various AWS services by letting them assign IAM roles to Fargate tasks so that their AWS credentials are transparently handled.

This was useful for consuming native AWS services, but what about accessing services and applications that are outside of the scope of IAM roles and IAM policies? Often, the burden of having to deal with these credentials is pushed onto the developers and AWS users in general. It doesn’t have to be this way. Enter the Secrets Manager and Fargate integration!

Starting with Fargate platform version 1.3.0 and later, it is now possible for you to instruct Fargate tasks to securely grab secrets from Secrets Manager so that these secrets are never exposed in the wild—not even in private configuration files.

In addition, this frees you from the burden of having to implement the undifferentiated heavy lifting of securing these secrets. As a bonus, because Secrets Manager supports secrets rotation, you also gain an additional level of security with no additional effort.

Twitter matcher example

In this example, you create a Fargate task that reads a stream of data from Twitter, matches a particular pattern in the messages, and records some information about the tweet in a DynamoDB table.

To do this, use a Python Twitter library called Tweepy to read the stream from Twitter and the AWS Boto 3 Python library to write to Amazon DynamoDB.

The following diagram shows the high-level flow:

The objective of this example is to show a simple use case where you could use IAM roles assigned to tasks to consume AWS services (such as DynamoDB). It also includes consuming external services (such as Twitter), for which explicit non-AWS credentials need to be stored securely.

This is what happens when you launch the Fargate task:

  • The task starts and inherits the task execution role (1) and the task role (2) from IAM.
  • It queries Secrets Manager (3) using the credentials inherited by the task execution role to retrieve the Twitter credentials and pass them onto the task as variables.
  • It reads the stream from Twitter (4) using the credentials that are stored in Secrets Manager.
  • It matches the stream with a configurable pattern and writes to the DynamoDB table (5) using the credentials inherited by the task role.
  • It matches the stream with a configurable pattern and writes to the DynamoDB table (5) and logs to CloudWatch (6) using the credentials inherited by the task role.

As a side note, while for this specific example I use Twitter as an external service that requires sensitive credentials, any external service that has some form of authentication using passwords or keys is acceptable. Modify the Python script as needed to capture relevant data from your own service to write to the DynamoDB table.

Here are the solution steps:

  • Create the Python script
  • Create the Dockerfile
  • Build the container image
  • Create the image repository
  • Create the DynamoDB table
  • Store the credentials securely
  • Create the IAM roles and IAM policies for the Fargate task
  • Create the Fargate task
  • Clean up

Prerequisites

To be able to execute this exercise, you need an environment configured with the following dependencies:

You can also skip this configuration part and launch an AWS Cloud9 instance.

For the purpose of this example, I am working with the AWS CLI, configured to work with the us-west-2 Region. You can opt to work in a different Region. Make sure that the code examples in this post are modified accordingly.

In addition to the list of AWS prerequisites, you need a Twitter developer account. From there, create an application and use the credentials provided that allow you to connect to the Twitter APIs. We will use them later in the blog post when we will add them to AWS Secrets Manager.

Note: many of the commands suggested in this blog post use $REGION and $AWSACCOUNT in them. You can either set environmental variables that point to the region you want to deploy to and to your own account or you can replace those in the command itself with the region and account number. Also, there are some configuration files (json) that use the same patterns; for those the easiest option is to replace the $REGION and $AWSACCOUNT placeholders with the actual region and account number.

Create the Python script

This script is based on the Tweepy streaming example. I modified the script to include the Boto 3 library and instructions that write data to a DynamoDB table. In addition, the script prints the same data to standard output (to be captured in the container log).

This is the Python script:

from __future__ import absolute_import, print_function from tweepy.streaming import StreamListener from tweepy import OAuthHandler from tweepy import Stream import json import boto3 import os

# DynamoDB table name and Region dynamoDBTable=os.environ['DYNAMODBTABLE'] region_name=os.environ['AWSREGION'] # Filter variable (the word for which to filter in your stream) filter=os.environ['FILTER'] # Go to http://apps.twitter.com and create an app. # The consumer key and secret are generated for you after consumer_key=os.environ['CONSUMERKEY'] consumer_secret=os.environ['CONSUMERSECRETKEY'] # After the step above, you are redirected to your app page. # Create an access token under the "Your access token" section access_token=os.environ['ACCESSTOKEN'] access_token_secret=os.environ['ACCESSTOKENSECRET'] class StdOutListener(StreamListener): """ A listener handles tweets that are received from the stream. This is a basic listener that prints received tweets to stdout. """ def on_data(self, data): j = json.loads(data) tweetuser = j['user']['screen_name'] tweetdate = j['created_at'] tweettext = j['text'].encode('ascii', 'ignore').decode('ascii') print(tweetuser) print(tweetdate) print(tweettext) dynamodb = boto3.client('dynamodb',region_name) dynamodb.put_item(TableName=dynamoDBTable, Item={'user':{'S':tweetuser},'date':{'S':tweetdate},'text':{'S':tweettext}}) return True def on_error(self, status): print(status) if __name__ == '__main__': l = StdOutListener() auth = OAuthHandler(consumer_key, consumer_secret) auth.set_access_token(access_token, access_token_secret) stream = Stream(auth, l) stream.filter(track=[filter]) 

Save this file in a directory and call it twitterstream.py.

This image requires seven parameters, which are clearly visible at the beginning of the script as system variables:

  • The name of the DynamoDB table
  • The Region where you are operating
  • The word or pattern for which to filter
  • The four keys to use to connect to the Twitter API services. Later, I explore how to pass these variables to the container, keeping in mind that some are more sensitive than others.

Create the Dockerfile

Now onto building the actual Docker image. To do that, create a Dockerfile that contains these instructions:

FROM amazonlinux:2
RUN yum install shadow-utils.x86_64 -y
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python get-pip.py
RUN pip install tweepy
RUN pip install boto3
COPY twitterstream.py .
RUN groupadd -r twitterstream && useradd -r -g twitterstream twitterstream
USER twitterstream
CMD ["python", "-u", "twitterstream.py"]

Save it as Dockerfile in the same directory with the twitterstream.py file.

Build the container image

Next, create the container image that you later instantiate as a Fargate task. Build the container image running the following command in the same directory:

docker build -t twitterstream:latest .

Don’t overlook the period (.) at the end of the command: it tells Docker to find the Dockerfile in the current directory.

You now have a local Docker image that, after being properly parameterized, can eventually read from the Twitter APIs and save data in a DynamoDB table.

Create the image repository

Now, store this image in a proper container registry. Create an Amazon ECR repository with the following command:

aws ecr create-repository --repository-name twitterstream --region $REGION

You should see something like the following code example as a result:

{
"repository": {
"registryId": "012345678910",
"repositoryName": "twitterstream",
"repositoryArn": "arn:aws:ecr:us-west-2:012345678910:repository/twitterstream",
"createdAt": 1554473020.0,
"repositoryUri": "012345678910.dkr.ecr.us-west-2.amazonaws.com/twitterstream"
}
}

Tag the local image with the following command:

docker tag twitterstream:latest $AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest

Make sure that you refer to the proper repository by using your AWS account ID and the Region to which you are deploying.

Grab an authorization token from AWS STS:

$(aws ecr get-login --no-include-email --region $REGION)

Now, push the local image to the ECR repository that you just created:

docker push $AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest

You should see something similar to the following result:

The push refers to repository [012345678910.dkr.ecr.us-west-2.amazonaws.com/twitterstream]
435b608431c6: Pushed
86ced7241182: Pushed
e76351c39944: Pushed
e29c13e097a8: Pushed
e55573178275: Pushed
1c729a602f80: Pushed
latest: digest: sha256:010c2446dc40ef2deaedb3f344f12cd916ba0e96877f59029d047417d6cb1f95 size: 1582

Now the image is safely stored in its ECR repository.

Create the DynamoDB table

Now turn to the backend DynamoDB table. This is where you store the extract of the Twitter stream being generated. Specifically, you store the user that published the Tweet, the date when the Tweet was published, and the text of the Tweet.

For the purpose of this example, create a table called twitterStream. This can be customized as one of the parameters that you have to pass to the Fargate task.

Run this command to create the table:

aws dynamodb create-table --region $REGION --table-name twitterStream \
                          --attribute-definitions AttributeName=user,AttributeType=S AttributeName=date,AttributeType=S \
                          --key-schema AttributeName=user,KeyType=HASH AttributeName=date,KeyType=RANGE \
                          --billing-mode PAY_PER_REQUEST

Store the credentials securely

As I hinted earlier, the Python script requires the Fargate task to pass some information as variables. You pass the table name, the Region, and the text to filter as standard task variables. Because this is not sensitive information, it can be shared without raising any concern.

However, other configurations are sensitive and should not be passed over in plaintext, like the Twitter API key. For this reason, use Secrets Manager to store that sensitive information and then read them within the Fargate task securely. This is what the newly announced integration between Fargate and Secrets Manager allows you to accomplish.

You can use the Secrets Manager console or the CLI to store sensitive data.

If you opt to use the console, choose other types of secrets. Under Plaintext, enter your consumer key. Under Select the encryption key, choose DefaultEncryptionKey, as shown in the following screenshot. For more information, see Creating a Basic Secret.

For this example, however, it is easier to use the AWS CLI to create the four secrets required. Run the following commands, but customize them with your own Twitter credentials:

aws secretsmanager create-secret --region $REGION --name CONSUMERKEY \
    --description "Twitter API Consumer Key" \
    --secret-string <your consumer key here> 
aws secretsmanager create-secret --region $REGION --name CONSUMERSECRETKEY \
    --description "Twitter API Consumer Secret Key" \
    --secret-string <your consumer secret key here> 
aws secretsmanager create-secret --region $REGION --name ACCESSTOKEN \
    --description "Twitter API Access Token" \
    --secret-string <your access token here> 
aws secretsmanager create-secret --region $REGION --name ACCESSTOKENSECRET \
    --description "Twitter API Access Token Secret" \
    --secret-string <your access token secret here> 

Each of those commands reports a message confirming that the secret has been created:

{
"VersionId": "7d950825-7aea-42c5-83bb-0c9b36555dbb",
"Name": "CONSUMERSECRETKEY",
"ARN": "arn:aws:secretsmanager:us-west-2:01234567890:secret:CONSUMERSECRETKEY-5D0YUM"
}

From now on, these four API keys no longer appear in any configuration.

The following screenshot shows the console after the commands have been executed:

Create the IAM roles and IAM policies for the Fargate task

To run the Python code properly, your Fargate task must have some specific capabilities. The Fargate task must be able to do the following:

  1. Pull the twitterstream container image (created earlier) from ECR.
  2. Retrieve the Twitter credentials (securely stored earlier) from Secrets Manager.
  3. Log in to a specific Amazon CloudWatch log group (logging is optional but a best practice).
  4. Write to the DynamoDB table (created earlier).

The first three capabilities should be attached to the ECS task execution role. The fourth should be attached to the ECS task role. For more information, see Amazon ECS Task Execution IAM Role.

In other words, the capabilities that are associated with the ECS agent and container instance need to be configured in the ECS task execution role. Capabilities that must be available from within the task itself are configured in the ECS task role.

First, create the two IAM roles that are eventually attached to the Fargate task.

Create a file called ecs-task-role-trust-policy.json with the following content (make sure you replace the $REGION, $AWSACCOUNT placeholders as well as the proper secrets ARNs):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now, run the following commands to create the twitterstream-task-role role, as well as the twitterstream-task-execution-role:

aws iam create-role --region $REGION --role-name twitterstream-task-role --assume-role-policy-document file://ecs-task-role-trust-policy.json

aws iam create-role --region $REGION --role-name twitterstream-task-execution-role --assume-role-policy-document file://ecs-task-role-trust-policy.json

Next, create a JSON file that codifies the capabilities required for the ECS task role (twitterstream-task-role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:$REGION:$AWSACCOUNT:table/twitterStream"
            ]
        }
    ]
}

Save the file as twitterstream-iam-policy-task-role.json.

Now, create a JSON file that codifies the capabilities required for the ECS task execution role (twitterstream-task-execution-role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERSECRETKEY-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKEN-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKENSECRET-XXXXXX"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Save the file as twitterstream-iam-policy-task-execution-role.json.

The following two commands create IAM policy documents and associate them with the IAM roles that you created earlier:

aws iam put-role-policy --region $REGION --role-name twitterstream-task-role --policy-name twitterstream-iam-policy-task-role --policy-document file://twitterstream-iam-policy-task-role.json

aws iam put-role-policy --region $REGION --role-name twitterstream-task-execution-role --policy-name twitterstream-iam-policy-task-execution-role --policy-document file://twitterstream-iam-policy-task-execution-role.json

Create the Fargate task

Now it’s time to tie everything together. As a recap, so far you have:

  • Created the container image that contains your Python code.
  • Created the DynamoDB table where the code is going to save the extract from the Twitter stream.
  • Securely stored the Twitter API credentials in Secrets Manager.
  • Created IAM roles with specific IAM policies that can write to DynamoDB and read from Secrets Manager (among other things).

Now you can tie everything together by creating a Fargate task that executes the container image. To do so, create a file called twitterstream-task.json and populate it with the following configuration:

{
    "family": "twitterstream", 
    "networkMode": "awsvpc", 
    "executionRoleArn": "arn:aws:iam::$AWSACCOUNT:role/twitterstream-task-execution-role",
    "taskRoleArn": "arn:aws:iam::$AWSACCOUNT:role/twitterstream-task-role",
    "containerDefinitions": [
        {
            "name": "twitterstream", 
            "image": "$AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest", 
            "essential": true,
            "environment": [
                {
                    "name": "DYNAMODBTABLE",
                    "value": "twitterStream"
                },
                {
                    "name": "AWSREGION",
                    "value": "$REGION"
                },                
                {
                    "name": "FILTER",
                    "value": "Cloud Computing"
                }
            ],    
            "secrets": [
                {
                    "name": "CONSUMERKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX"
                },
                {
                    "name": "CONSUMERSECRETKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERSECRETKEY-XXXXXX"
                },
                {
                    "name": "ACCESSTOKEN",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKEN-XXXXXX"
                },
                {
                    "name": "ACCESSTOKENSECRET",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKENSECRET-XXXXXX"
                }
            ],
            "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                            "awslogs-group": "twitterstream",
                            "awslogs-region": "$REGION",
                            "awslogs-stream-prefix": "twitterstream"
                    }
            }
        }
    ], 
    "requiresCompatibilities": [
        "FARGATE"
    ], 
    "cpu": "256", 
    "memory": "512"
}

To tweak the search string, change the value of the FILTER variable (currently set to “Cloud Computing”).

The Twitter API credentials are never exposed in clear text in these configuration files. There is only a reference to the Amazon Resource Names (ARNs) of the secret names. For example, this is the system variable CONSUMERKEY in the Fargate task configuration:

"secrets": [
                {
                    "name": "CONSUMERKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX"
                }

This directive asks the ECS agent running on the Fargate instance (that has assumed the specified IAM execution role) to do the following:

  • Connect to Secrets Manager.
  • Get the secret securely.
  • Assign its value to the CONSUMERKEY system variable to be made available to the Fargate task.

Register this task by running the following command:

aws ecs register-task-definition --region $REGION --cli-input-json file://twitterstream-task.json

In preparation to run the task, create the CloudWatch log group with the following command:

aws logs create-log-group --log-group-name twitterstream --region $REGION

If you don’t create the log group upfront, the task fails to start.

Create the ECS cluster

The last step before launching the Fargate task is creating an ECS cluster. An ECS cluster has two distinct dimensions:

  • The EC2 dimension, where the compute capacity is managed by the customer as ECS container instances)
  • The Fargate dimension, where the compute capacity is managed transparently by AWS.

For this example, you use the Fargate dimension, so you are essentially using the ECS cluster as a logical namespace.

Run the following command to create a cluster called twitterstream_cluster (change the name as needed). If you have a default cluster already created in your Region of choice, you can use that, too.

aws ecs create-cluster --cluster-name "twitterstream_cluster" --region $REGION

Now launch the task in the ECS cluster just created (in the us-west-2 Region) with a Fargate launch type. Run the following command:

aws ecs run-task --region $REGION \
  --cluster "twitterstream_cluster" \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=["subnet-6a88e013","subnet-6a88e013"],securityGroups=["sg-7b45660a"],assignPublicIp=ENABLED}" \
  --task-definition twitterstream:1

A few things to pay attention to with this command:

  • If you created more than one revision of the task (by re-running the aws ecs register-task-definition command), make sure to run the aws ecs run-task command with the proper revision number at the end.
  • Customize the network section of the command for your own environment:
    • Use the default security group in your VPC, as the Fargate task only needs outbound connectivity.
    • Use two public subnets in which to start the Fargate task.

The Fargate task comes up in a few seconds and you can see it from the ECS console, as shown in the following screenshot:

Similarly, the DynamoDB table starts being populated with the information collected by the script running in the task, as shown in the following screenshot:

Finally, the Fargate task logs all the activities in the CloudWatch Log group, as shown in the following screenshot:

The log may take a few minutes to populate and be consolidated in CloudWatch.

Clean up

Now that you have completed the walkthrough, you can tear down all the resources that you created to avoid incurring future charges.

First, stop the ECS task that you started:

aws ecs stop-task --cluster twitterstream_cluster --region $REGION --task 4553111a-748e-4f6f-beb5-f95242235fb5

Your task number is different. You can grab it either from the ECS console or from the AWS CLI. This is how you read it from the AWS CLI:

aws ecs list-tasks --cluster twitterstream_cluster --family twitterstream --region $REGION  
{
"taskArns": [
"arn:aws:ecs:us-west-2:693935722839:task/4553111a-748e-4f6f-beb5-f95242235fb5 "
]
}

Then, delete the ECS cluster that you created:

aws ecs delete-cluster --cluster "twitterstream_cluster" --region $REGION

Next, delete the CloudWatch log group:

aws logs delete-log-group --log-group-name twitterstream --region $REGION

The console provides a fast workflow to delete the IAM roles. In the IAM console, choose Roles and filter your search for twitter. You should see the two roles that you created:

Select the two roles and choose Delete role.

Cleaning up the secrets created is straightforward. Run a delete-secret command for each one:

aws secretsmanager delete-secret --region $REGION --secret-id CONSUMERKEY
aws secretsmanager delete-secret --region $REGION --secret-id CONSUMERSECRETKEY
aws secretsmanager delete-secret --region $REGION --secret-id ACCESSTOKEN
aws secretsmanager delete-secret --region $REGION --secret-id ACCESSTOKENSECRET

The next step is to delete the DynamoDB table:

aws dynamodb delete-table --table-name twitterStream --region $REGION

The last step is to delete the ECR repository. By default, you cannot delete a repository that still has container images in it. To address that, add the –force directive:

aws ecr delete-repository --region $REGION --repository-name twitterstream --force

You can de-register the twitterstream task definition by following this procedure in the ECS console. The task definitions remain inactive but visible in the system.

With this, you have deleted all the resources that you created.

Conclusion

In this post, I demonstrated how Fargate can interact with Secrets Manager to retrieve sensitive data (for example, Twitter API credentials). You can securely make the sensitive data available to the code running in the container inside the Fargate task.

I also demonstrated how a Fargate task with a specific IAM role can access other AWS services (for example, DynamoDB).

 

Improving and securing your game-binaries distribution at scale

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/improving-and-securing-your-game-binaries-distribution-at-scale/

This post is contributed by Yahav Biran | Sr. Solutions Architect, AWS and Scott Selinger | Associate Solutions Architect, AWS 

One of the challenges that game publishers face when employing CI/CD processes is the distribution of updated game binaries in a scalable, secure, and cost-effective way. Continuous integration and continuous deployment (CI/CD) processes enable game publishers to improve games throughout their lifecycle.

Often, CI/CD jobs contain minor changes that cause the CI/CD processes to push a full set of game binaries over the internet. This is a suboptimal approach. It negatively affects the cost of development network resources, customer network resources (output and input bandwidth), and the time it takes for a game update to propagate.

This post proposes a method of optimizing the game integration and deployments. Specifically, this method improves the distribution of updated game binaries to various targets, such as game-server farms. The proposed mechanism also adds to the security model designed to include progressive layers, starting from the Amazon EC2 instance that runs the game server. It also improves security of the game binaries, the game assets, and the monitoring of the game server deployments across several AWS Regions.

Why CI/CD in gaming is hard today

Game server binaries are usually a native application that includes binaries like graphic, sound, network, and physics assets, as well as scripts and media files. Game servers are usually developed with game engines like Unreal, Amazon Lumberyard, and Unity. Game binaries typically take up tens of gigabytes. However, because game developer teams modify only a few tens of kilobytes every day, frequent distribution of a full set of binaries is wasteful.

For a standard global game deployment, distributing game binaries requires compressing the entire binaries set and transferring the compressed version to destinations, then decompressing it upon arrival. You can optimize the process by decoupling the various layers, pushing and deploying them individually.

In both cases, the continuous deployment process might be slow due to the compression and transfer durations. Also, distributing the image binaries incurs unnecessary data transfer costs, since data is duplicated. Other game-binary distribution methods may require the game publisher’s DevOps teams to install and maintain custom caching mechanisms.

This post demonstrates an optimal method for distributing game server updates. The solution uses containerized images stored in Amazon ECR and deployed using Amazon ECS or Amazon EKS to shorten the distribution duration and reduce network usage.

How can containers help?

Dockerized game binaries enable standard caching with no implementation from the game publisher. Dockerized game binaries allow game publishers to stage their continuous build process in two ways:

  • To rebuild only the layer that was updated in a particular build process and uses the other cached layers.
  • To reassemble both packages into a deployable game server.

The use of ECR with either ECS or EKS takes care of the last mile deployment to the Docker container host.

Larger application binaries mean longer application loading times. To reduce the overall application initialization time, I decouple the deployment of the binaries and media files to allow the application to update faster. For example, updates in the application media files do not require the replication of the engine binaries or media files. This is achievable if the application binaries can be deployed in a separate directory structure. For example:

/opt/local/engine

/opt/local/engine-media

/opt/local/app

/opt/local/app-media

Containerized game servers deployment on EKS

The application server can be deployed as a single Kubernetes pod with multiple containers. The engine media (/opt/local/engine-media), the application (/opt/local/app), and the application media (/opt/local/app-media) spawn as Kubernetes initContainers and the engine binary (/opt/local/engine) runs as the main container.

apiVersion: v1
kind: Pod
metadata:
  name: my-game-app-pod
  labels:
    app: my-game-app
volumes:
      - name: engine-media-volume
          emptyDir: {}
      - name: app-volume
          emptyDir: {}
      - name: app-media-volume
          emptyDir: {}
      initContainers:
        - name: app
          image: the-app- image
          imagePullPolicy: Always
          command:
            - "sh"
            - "-c"
            - "cp /* /opt/local/engine-media"
          volumeMounts:
            - name: engine-media-volume
              mountPath: /opt/local/engine-media
        - name: engine-media
          image: the-engine-media-image
          imagePullPolicy: Always
          command:
            - "sh"
            - "-c"
            - "cp /* /opt/local/app"
          volumeMounts:
            - name: app-volume
              mountPath: /opt/local/app
        - name: app-media
          image: the-app-media-image
          imagePullPolicy: Always
          command:
            - "sh"
            - "-c"
            - "cp /* /opt/local/app-media"
          volumeMounts:
            - name: app-media-volume
              mountPath: /opt/local/app-media
spec:
  containers:
  - name: the-engine
    image: the-engine-image
    imagePullPolicy: Always
    volumeMounts:
       - name: engine-media-volume
         mountPath: /opt/local/engine-media
       - name: app-volume
         mountPath: /opt/local/app
       - name: app-media-volume
         mountPath: /opt/local/app-media
    command: ['sh', '-c', '/opt/local/engine/start.sh']

Applying multi-stage game binaries builds

In this post, I use Docker multi-stage builds for containerizing the game asset builds. I use AWS CodeBuild to manage the build and to deploy the updates of game engines like Amazon Lumberyard as ready-to-play dedicated game servers.

Using this method, frequent changes in the game binaries require less than 1% of the data transfer typically required by full image replication to the nodes that run the game-server instances. This results in significant improvements in build and integration time.

I provide a deployment example for Amazon Lumberyard Multiplayer Sample that is deployed to an EKS cluster, but this can also be done using different container orchestration technology and different game engines. I also show that the image being deployed as a game-server instance is always the latest image, which allows centralized control of the code to be scheduled upon distribution.

This example shows an update of only 50 MB of game assets, whereas the full game-server binary is 3.1 GB. With only 1.5% of the content being updated, that speeds up the build process by 90% compared to non-containerized game binaries.

For security with EKS, apply the imagePullPolicy: Always option as part of the Kubernetes best practice container images deployment option. This option ensures that the latest image is pulled every time that the pod is started, thus deploying images from a single source in ECR, in this case.

Example setup

  • Read through the following sample, a multiplayer game sample, and see how to build and structure multiplayer games to employ the various features of the GridMate networking library.
  • Create an AWS CodeCommit or GitHub repository (multiplayersample-lmbr) that includes the game engine binaries, the game assets (.pak, .cfg and more), AWS CodeBuild specs, and EKS deployment specs.
  • Create a CodeBuild project that points to the CodeCommit repo. The build image uses aws/codebuild/docker:18.09.0: the built-in image maintained by CodeBuild configured with 3 GB of memory and two vCPUs. The compute allocated for build capacity can be modified for cost and build time tradeoff.
  • Create an EKS cluster designated as a staging or an integration environment for the game title. In this case, it’s multiplayersample.

The binaries build Git repository

The Git repository is composed of five core components ordered by their size:

  • The game engine binaries (for example, BinLinux64.Dedicated.tar.gz). This is the compressed version of the game engine artifacts that are not updated regularly, hence they are deployed as a compressed file. The maintenance of this file is usually done by a different team than the developers working on the game title.
  • The game binaries (for example, MultiplayerSample_pc_Paks_Dedicated). This directory is maintained by the game development team and managed as a standard multi-branch repository. The artifacts under this directory get updated on a daily or weekly basis, depending on the game development plan.
  • The build-related specifications (for example, buildspec.yml  and Dockerfile). These files specify the build process. For simplicity, I only included the Docker build process to convey the speed of continuous integration. The process can be easily extended to include the game compilation and linked process as well.
  • The Docker artifacts for containerizing the game engine and the game binaries (for example, start.sh and start.py). These scripts usually are maintained by the game DevOps teams and updated outside of the regular game development plan. More details about these scripts can be found in a sample that describes how to deploy a game-server in Amazon EKS.
  • The deployment specifications (for example, eks-spec) specify the Kubernetes game-server deployment specs. This is for reference only, since the CD process usually runs in a separate set of resources like staging EKS clusters, which are owned and maintained by a different team.

The game build process

The build process starts with any Git push event on the Git repository. The build process includes three core phases denoted by pre_build, buildand post_build in multiplayersample-lmbr/buildspec.yml

  1. The pre_build phase unzips the game-engine binaries and logs in to the container registry (Amazon ECR) to prepare.
  2. The buildphase executes the docker build command that includes the multi-stage build.
    • The Dockerfile spec file describes the multi-stage image build process. It starts by adding the game-engine binaries to the Linux OS, ubuntu:18.04 in this example.
    • FROM ubuntu:18.04
    • ADD BinLinux64.Dedicated.tar /
    • It continues by adding the necessary packages to the game server (for example, ec2-metadata, boto3, libc, and Python) and the necessary scripts for controlling the game server runtime in EKS. These packages are only required for the CI/CD process. Therefore, they are only added in the CI/CD process. This enables a clean decoupling between the necessary packages for development, integration, and deployment, and simplifies the process for both teams.
    • RUN apt-get install -y python python-pip
    • RUN apt-get install -y net-tools vim
    • RUN apt-get install -y libc++-dev
    • RUN pip install mcstatus ec2-metadata boto3
    • ADD start.sh /start.sh
    • ADD start.py /start.py
    • The second part is to copy the game engine from the previous stage --from=0 to the next build stage. In this case, you copy the game engine binaries with the two COPY Docker directives.
    • COPY --from=0 /BinLinux64.Dedicated/* /BinLinux64.Dedicated/
    • COPY --from=0 /BinLinux64.Dedicated/qtlibs /BinLinux64.Dedicated/qtlibs/
    • Finally, the game binaries are added as a separate layer on top of the game-engine layers, which concludes the build. It’s expected that constant daily changes are made to this layer, which is why it is packaged separately. If your game includes other abstractions, you can break this step into several discrete Docker image layers.
    • ADD MultiplayerSample_pc_Paks_Dedicated /BinLinux64.Dedicated/
  3. The post_build phase pushes the game Docker image to the centralized container registry for further deployment to the various regional EKS clusters. In this phase, tag and push the new image to the designated container registry in ECR.

- docker tag $IMAGE_REPO_NAME:$IMAGE_TAG

$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG

docker push

$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG

The game deployment process in EKS

At this point, you’ve pushed the updated image to the designated container registry in ECR (/$IMAGE_REPO_NAME:$IMAGE_TAG). This image is scheduled as a game server in an EKS cluster as game-server Kubernetes deployment, as described in the sample.

In this example, I use  imagePullPolicy: Always.


containers:
…
        image: /$IMAGE_REPO_NAME:$IMAGE_TAG/multiplayersample-build
        imagePullPolicy: Always
        name: multiplayersample
…

By using imagePullPolicy, you ensure that no one can circumvent Amazon ECR security. You can securely make ECR the single source of truth with regards to scheduled binaries. However, ECR to the worker nodes via kubelet, the node agent. Given the size of a whole image combined with the frequency with which it is pulled, that would amount to a significant additional cost to your project.

However, Docker layers allow you to update only the layers that were modified, preventing a whole image update. Also, they enable secure image distribution. In this example, only the layer MultiplayerSample_pc_Paks_Dedicated is updated.

Proposed CI/CD process

The following diagram shows an example end-to-end architecture of a full-scale game-server deployment using EKS as the orchestration system, ECR as the container registry, and CodeBuild as the build engine.

Game developers merge changes to the Git repository that include both the preconfigured game-engine binaries and the game artifacts. Upon merge events, CodeBuild builds a multistage game-server image that is pushed to a centralized container registry hosted by ECR. At this point, DevOps teams in different Regions continuously schedule the image as a game server, pulling only the updated layer in the game server image. This keeps the entire game-server fleet running the same game binaries set, making for a secure deployment.

 

Try it out

I published two examples to guide you through the process of building an Amazon EKS cluster and deploying a containerized game server with large binaries.

Conclusion

Adopting CI/CD in game development improves the software development lifecycle by continuously deploying quality-based updated game binaries. CI/CD in game development is usually hindered by the cost of distributing large binaries, in particular, by cross-regional deployments.

Non-containerized paradigms require deployment of the full set of binaries, which is an expensive and time-consuming task. Containerized game-server binaries with AWS build tools and Amazon EKS-based regional clusters of game servers enable secure and cost-effective distribution of large binary sets to enable increased agility in today’s game development.

In this post, I demonstrated a reduction of more than 90% of the network traffic required by implementing an effective CI/CD system in a large-scale deployment of multiplayer game servers.

Integrating AWS X-Ray with AWS App Mesh

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/integrating-aws-x-ray-with-aws-app-mesh/

This post is contributed by Lulu Zhao | Software Development Engineer II, AWS

 

AWS X-Ray helps developers and DevOps engineers quickly understand how an application and its underlying services are performing. When it’s integrated with AWS App Mesh, the combination makes for a powerful analytical tool.

X-Ray helps to identify and troubleshoot the root causes of errors and performance issues. It’s capable of analyzing and debugging distributed applications, including those based on a microservices architecture. It offers insights into the impact and reach of errors and performance problems.

In this post, I demonstrate how to integrate it with App Mesh.

Overview

App Mesh is a service mesh based on the Envoy proxy that makes it easy to monitor and control microservices. App Mesh standardizes how your microservices communicate, giving you end-to-end visibility and helping to ensure high application availability.

With App Mesh, it’s easy to maintain consistent visibility and network traffic control for services built across multiple types of compute infrastructure. App Mesh configures each service to export monitoring data and implements consistent communications control logic across your application.

A service mesh is like a communication layer for microservices. All communication between services happens through the mesh. Customers use App Mesh to configure a service mesh that contains virtual services, virtual nodes, virtual routes, and corresponding routes.

However, it’s challenging to visualize the way that request traffic flows through the service mesh while attempting to identify latency and other types of performance issues. This is particularly true as the number of microservices increases.

It’s in exactly this area where X-Ray excels. To show a detailed workflow inside a service mesh, I implemented a tracing extension called X-Ray tracer inside Envoy. With it, I ensure that I’m tracing all inbound and outbound calls that are routed through Envoy.

Traffic routing with color app

The following example shows how X-Ray works with App Mesh. I used the Color App, a simple demo application, to showcase traffic routing.

This app has two Go applications that are included in the AWS X-Ray Go SDK: color-gateway and color-teller. The color-gateway application is exposed to external clients and responds to http://service-name:port/color, which retrieves color from color-teller. I deployed color-app using Amazon ECS. This image illustrates how color-gateway routes traffic into a virtual router and then into separate nodes using color-teller.

 

The following image shows client interactions with App Mesh in an X-Ray service map after requests have been made to the color-gateway and to color-teller.

Integration

There are two types of service nodes:

  • AWS::AppMesh::Proxy is generated by the X-Ray tracing extension inside Envoy.
  • AWS::ECS::Container is generated by the AWS X-Ray Go SDK.

The service graph arrows show the request workflow, which you may find helpful as you try to understand the relationships between services.

To send Envoy-generated segments into X-Ray, install the X-Ray daemon. The following code example shows the ECS task definition used to install the daemon into the container.

{
    "name": "xray-daemon",

    "image": "amazon/aws-xray-daemon",

    "user": "1337",

    "essential": true,

    "cpu": 32,

    "memoryReservation": 256,

    "portMappings": [

        {

            "hostPort": 2000,

            "containerPort": 2000,

            "protocol": "udp"

         }

After the Color app successfully launched, I made a request to color-gateway to fetch a color.

  • First, the Envoy proxy appmesh/colorgateway-vn in front of default-gateway received the request and routed it to the server default-gateway.
  • Then, default-gateway made a request to server default-colorteller-white to retrieve the color.
  • Instead of directly calling the color-teller server, the request went to the default-gateway Envoy proxy and the proxy routed the call to color-teller.

That’s the advantage of using the Envoy proxy. Envoy is a self-contained process that is designed to run in parallel with all application servers. All of the Envoy proxies form a transparent communication mesh through which each application sends and receives messages to and from localhost while remaining unaware of the broader network topology.

For App Mesh integration, the X-Ray tracer records the mesh name and virtual node name values and injects them into the segment JSON document. Here is an example:

“aws”: {
	“app_mesh”: {
		“mesh_name”: “appmesh”,
		“virtual_node_name”: “colorgateway-vn”
	}
},

To enable X-Ray tracing through App Mesh inside Envoy, you must set two environment variable configurations:

  • ENABLE_ENVOY_XRAY_TRACING
  • XRAY_DAEMON_PORT

The first one enables X-Ray tracing using 127.0.0.1:2000 as the default daemon endpoint to which generated segments are sent. If the daemon you installed listens on a different port, you can specify a port value to override the default X-Ray daemon port by using the second configuration.

Conclusion

Currently, AWS X-Ray supports SDKs written in multiple languages (including Java, Python, Go, .NET, and .NET Core, Node.js, and Ruby) to help you implement your services. For more information, see Getting Started with AWS X-Ray.

Predictive CPU isolation of containers at Netflix

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/predictive-cpu-isolation-of-containers-at-netflix-91f014d856c7?source=rss----2615bd06b42e---4

By Benoit Rostykus, Gabriel Hartmann

Noisy Neighbors

We’ve all had noisy neighbors at one point in our life. Whether it’s at a cafe or through a wall of an apartment, it is always disruptive. The need for good manners in shared spaces turns out to be important not just for people, but for your Docker containers too.

When you’re running in the cloud your containers are in a shared space; in particular they share the CPU’s memory hierarchy of the host instance.

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. However, the key insight here is that these caches are partially shared among the CPUs, which means that perfect performance isolation of co-hosted containers is not possible. If the container running on the core next to your container suddenly decides to fetch a lot of data from the RAM, it will inevitably result in more cache misses for you (and hence a potential performance degradation).

Linux to the rescue?

Traditionally it has been the responsibility of the operating system’s task scheduler to mitigate this performance isolation problem. In Linux, the current mainstream solution is CFS (Completely Fair Scheduler). Its goal is to assign running processes to time slices of the CPU in a “fair” way.

CFS is widely used and therefore well tested and Linux machines around the world run with reasonable performance. So why mess with it? As it turns out, for the large majority of Netflix use cases, its performance is far from optimal. Titus is Netflix’s container platform. Every month, we run millions of containers on thousands of machines on Titus, serving hundreds of internal applications and customers. These applications range from critical low-latency services powering our customer-facing video streaming service, to batch jobs for encoding or machine learning. Maintaining performance isolation between these different applications is critical to ensuring a good experience for internal and external customers.

We were able to meaningfully improve both the predictability and performance of these containers by taking some of the CPU isolation responsibility away from the operating system and moving towards a data driven solution involving combinatorial optimization and machine learning.

The idea

CFS operates by very frequently (every few microseconds) applying a set of heuristics which encapsulate a general concept of best practices around CPU hardware use.

Instead, what if we reduced the frequency of interventions (to every few seconds) but made better data-driven decisions regarding the allocation of processes to compute resources in order to minimize collocation noise?

One traditional way of mitigating CFS performance issues is for application owners to manually cooperate through the use of core pinning or nice values. However, we can automatically make better global decisions by detecting collocation opportunities based on actual usage information. For example if we predict that container A is going to become very CPU intensive soon, then maybe we should run it on a different NUMA socket than container B which is very latency-sensitive. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Optimizing placements through combinatorial optimization

What the OS task scheduler is doing is essentially solving a resource allocation problem: I have X threads to run but only Y CPUs available, how do I allocate the threads to the CPUs to give the illusion of concurrency?

As an illustrative example, let’s consider a toy instance of 16 hyperthreads. It has 8 physical hyperthreaded cores, split on 2 NUMA sockets. Each hyperthread shares its L1 and L2 caches with its neighbor, and shares its L3 cache with the 7 other hyperthreads on the socket:

If we want to run container A on 4 threads and container B on 2 threads on this instance, we can look at what “bad” and “good” placement decisions look like:

The first placement is intuitively bad because we potentially create collocation noise between A and B on the first 2 cores through their L1/L2 caches, and on the socket through the L3 cache while leaving a whole socket empty. The second placement looks better as each CPU is given its own L1/L2 caches, and we make better use of the two L3 caches available.

Resource allocation problems can be efficiently solved through a branch of mathematics called combinatorial optimization, used for example for airline scheduling or logistics problems.

We formulate the problem as a Mixed Integer Program (MIP). Given a set of K containers each requesting a specific number of CPUs on an instance possessing d threads, the goal is to find a binary assignment matrix M of size (d, K) such that each container gets the number of CPUs it requested. The loss function and constraints contain various terms expressing a priori good placement decisions such as:

  • avoid spreading a container across multiple NUMA sockets (to avoid potentially slow cross-sockets memory accesses or page migrations)
  • don’t use hyper-threads unless you need to (to reduce L1/L2 thrashing)
  • try to even out pressure on the L3 caches (based on potential measurements of the container’s hardware usage)
  • don’t shuffle things too much between placement decisions

Given the low-latency and low-compute requirements of the system (we certainly don’t want to spend too many CPU cycles figuring out how containers should use CPU cycles!), can we actually make this work in practice?

Implementation

We decided to implement the strategy through Linux cgroups since they are fully supported by CFS, by modifying each container’s cpuset cgroup based on the desired mapping of containers to hyper-threads. In this way a user-space process defines a “fence” within which CFS operates for each container. In effect we remove the impact of CFS heuristics on performance isolation while retaining its core scheduling capabilities.

This user-space process is a Titus subsystem called titus-isolate which works as follows. On each instance, we define three events that trigger a placement optimization:

  • add: A new container was allocated by the Titus scheduler to this instance and needs to be run
  • remove: A running container just finished
  • rebalance: CPU usage may have changed in the containers so we should reevaluate our placement decisions

We periodically enqueue rebalance events when no other event has recently triggered a placement decision.

Every time a placement event is triggered, titus-isolate queries a remote optimization service (running as a Titus service, hence also isolating itself… turtles all the way down) which solves the container-to-threads placement problem.

This service then queries a local GBRT model (retrained every couple of hours on weeks of data collected from the whole Titus platform) predicting the P95 CPU usage of each container in the coming 10 minutes (conditional quantile regression). The model contains both contextual features (metadata associated with the container: who launched it, image, memory and network configuration, app name…) as well as time-series features extracted from the last hour of historical CPU usage of the container collected regularly by the host from the kernel CPU accounting controller.

The predictions are then fed into a MIP which is solved on the fly. We’re using cvxpy as a nice generic symbolic front-end to represent the problem which can then be fed into various open-source or proprietary MIP solver backends. Since MIPs are NP-hard, some care needs to be taken. We impose a hard time budget to the solver to drive the branch-and-cut strategy into a low-latency regime, with guardrails around the MIP gap to control overall quality of the solution found.

The service then returns the placement decision to the host, which executes it by modifying the cpusets of the containers.

For example, at any moment in time, an r4.16xlarge with 64 logical CPUs might look like this (the color scale represents CPU usage):

Results

The first version of the system led to surprisingly good results. We reduced overall runtime of batch jobs by multiple percent on average while most importantly reducing job runtime variance (a reasonable proxy for isolation), as illustrated below. Here we see a real-world batch job runtime distribution with and without improved isolation:

Notice how we mostly made the problem of long-running outliers disappear. The right-tail of unlucky noisy-neighbors runs is now gone.

For services, the gains were even more impressive. One specific Titus middleware service serving the Netflix streaming service saw a capacity reduction of 13% (a decrease of more than 1000 containers) needed at peak traffic to serve the same load with the required P99 latency SLA! We also noticed a sharp reduction of the CPU usage on the machines, since far less time was spent by the kernel in cache invalidation logic. Our containers are now more predictable, faster and the machine is less used! It’s not often that you can have your cake and eat it too.

Next Steps

We are excited with the strides made so far in this area. We are working on multiple fronts to extend the solution presented here.

We want to extend the system to support CPU oversubscription. Most of our users have challenges knowing how to properly size the numbers of CPUs their app needs. And in fact, this number varies during the lifetime of their containers. Since we already predict future CPU usage of the containers, we want to automatically detect and reclaim unused resources. For example, one could decide to auto-assign a specific container to a shared cgroup of underutilized CPUs, to better improve overall isolation and machine utilization, if we can detect the sensitivity threshold of our users along the various axes of the following graph.

We also want to leverage kernel PMC events to more directly optimize for minimal cache noise. One possible avenue is to use the Intel based bare metal instances recently introduced by Amazon that allow deep access to performance analysis tools. We could then feed this information directly into the optimization engine to move towards a more supervised learning approach. This would require a proper continuous randomization of the placements to collect unbiased counterfactuals, so we could build some sort of interference model (“what would be the performance of container A in the next minute, if I were to colocate one of its threads on the same core as container B, knowing that there’s also C running on the same socket right now?”).

Conclusion

If any of this piques your interest, reach out to us! We’re looking for ML engineers to help us push the boundary of containers performance and “machine learning for systems” and systems engineers for our core infrastructure and compute platform.


Predictive CPU isolation of containers at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Enabling DNS resolution for Amazon EKS cluster endpoints

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/enabling-dns-resolution-for-amazon-eks-cluster-endpoints/

This post is contributed by Jeremy Cowan – Sr. Container Specialist Solution Architect, AWS

By default, when you create an Amazon EKS cluster, the Kubernetes cluster endpoint is public. While it is accessible from the internet, access to the Kubernetes cluster endpoint is restricted by AWS Identity and Access Management (IAM) and Kubernetes role-based access control (RBAC) policies.

At some point, you may need to configure the Kubernetes cluster endpoint to be private.  Changing your Kubernetes cluster endpoint access from public to private completely disables public access such that it can no longer be accessed from the internet.

In fact, a cluster that has been configured to only allow private access can only be accessed from the following:

  • The VPC where the worker nodes reside
  • Networks that have been peered with that VPC
  • A network that has been connected to AWS through AWS Direct Connect (DX) or a virtual private network (VPN)

However, the name of the Kubernetes cluster endpoint is only resolvable from the worker node VPC, for the following reasons:

  • The Amazon Route 53 private hosted zone that is created for the endpoint is only associated with the worker node VPC.
  • The private hosted zone is created in a separate AWS managed account and cannot be altered.

For more information, see Working with Private Hosted Zones.

This post explains how to use Route 53 inbound and outbound endpoints to resolve the name of the cluster endpoints when a request originates outside the worker node VPC.

Route 53 inbound and outbound endpoints

Route 53 inbound and outbound endpoints allow you to simplify the configuration of hybrid DNS.  DNS queries for AWS resources are resolved by Route 53 resolvers and DNS queries for on-premises resources are forwarded to an on-premises DNS resolver. However, you can also use these Route 53 endpoints to resolve the names of endpoints that are only resolvable from within a specific VPC, like the EKS cluster endpoint.

The following diagrams show how the solution works:

  • A Route 53 inbound endpoint is created in each worker node VPC and associated with a security group that allows inbound DNS requests from external subnets/CIDR ranges.
  • If the requests for the Kubernetes cluster endpoint originate from a peered VPC, those requests must be routed through a Route 53 outbound endpoint.
  • The outbound endpoint, like the inbound endpoint, is associated with a security group that allows inbound requests that originate from the peered VPC or from other VPCs in the Region.
  • A forwarding rule is created for each Kubernetes cluster endpoint.  This rule routes the request through the outbound endpoint to the IP addresses of the inbound endpoints in the worker node VPC, where it is resolved by Route 53.
  • The results of the DNS query for the Kubernetes cluster endpoint are then returned to the requestor.

If the request originates from an on-premises environment, you forego creating the outbound endpoints. Instead, you create a forwarding rule to forward requests for the Kubernetes cluster endpoint to the IP address of the Route 53 inbound endpoints in the worker node VPC.

Solution overview

For this solution, follow these steps:

  • Create an inbound endpoint in the worker node VPC.
  • Create an outbound endpoint in a peered VPC.
  • Create a forwarding rule for the outbound endpoint that sends requests to the Route 53 resolver for the worker node VPC.
  • Create a security group rule to allow inbound traffic from a peered network.
  • (Optional) Create a forwarding rule in your on-premises DNS for the Kubernetes cluster endpoint.

Prerequisites

EKS requires that you enable DNS hostnames and DNS resolution in each worker node VPC when you change the cluster endpoint access from public to private.  It is also a prerequisite for this solution and for all solutions that uses Route 53 private hosted zones.

In addition, you need a route that connects your on-premises network or VPC with the worker node VPC.  In a multi-VPC environment, this can be accomplished by creating a peering connection between two or more VPCs and updating the route table in those VPCs. If you’re connecting from an on-premises environment across a DX or an IPsec VPN, you need a route to the worker node VPC.

Configuring the inbound endpoint

When you provision an EKS cluster, EKS automatically provisions two or more cross-account elastic network interfaces onto two different subnets in your worker node VPC.  These network interfaces are primarily used when the control plane must initiate a connection with your worker nodes, for example, when you use kubectl exec or kubectl proxy. However, they can also be used by the workers to communicate with the Kubernetes API server.

When you change the EKS endpoint access to private, EKS associates a Route 53 private hosted zone with your worker node VPC.  Within this private hosted zone, EKS creates resource records for the cluster endpoint. These records correspond to the IP addresses of the two cross-account elastic network interfaces that were created in your VPC when you provisioned your cluster.

When the IP addresses of these cross-account elastic network interfaces change, for example, when EKS replaces unhealthy control plane nodes, the resource records for the cluster endpoint are automatically updated. This allows your worker nodes to continue communicating with the cluster endpoint when you switch to private access.  If you update the cluster to enable public access and disable private access, your worker nodes revert to using the public Kubernetes cluster endpoint.

By creating a Route 53 inbound endpoint in the worker node VPC, DNS queries are sent to the VPC DNS resolver of worker node VPC.  This endpoint is now capable of resolving the cluster endpoint.

Create an inbound endpoint in the worker node VPC

  1. In the Route 53 console, choose Inbound endpoints, Create Inbound endpoint.
  2. For Endpoint Name, enter a value such as <cluster_name>InboundEndpoint.
  3. For VPC in the Region, choose the VPC ID of the worker node VPC.
  4. For Security group for this endpoint, choose a security group that allows clients or applications from other networks to access this endpoint. For an example, see the Route 53 resolver diagram shown earlier in the post.
  5. Under IP addresses section, choose an Availability Zone that corresponds to a subnet in your VPC.
  6. For IP address, choose Use an IP address that is selected automatically.
  7. Repeat steps 7 and 8 for the second IP address.
  8. Choose Submit.

Or, run the following AWS CLI command:

export DATE=$(date +%s)
export INBOUND_RESOLVER_ID=$(aws route53resolver create-resolver-endpoint --name 
<name> --direction INBOUND --creator-request-id $DATE --security-group-ids <sgs> \
--ip-addresses SubnetId=<subnetId>,Ip=<IP address> SubnetId=<subnetId>,Ip=<IP address> \
| jq -r .ResolverEndpoint.Id)
aws route53resolver list-resolver-endpoint-ip-addresses --resolver-endpoint-id \
$INBOUND_RESOLVER_ID | jq .IpAddresses[].Ip

This outputs the IP addresses assigned to the inbound endpoint.

When you are done creating the inbound endpoint, select the endpoint from the console and choose View details.  This shows you a summary of the configuration for the endpoint.  Record the two IP addresses that were assigned to the inbound endpoint, as you need them later when configuring the forwarding rule.

Connecting from a peered VPC

An outbound endpoint is used to send DNS requests that cannot be resolved “locally” to an external resolver based on a set of rules.

If you are connecting to the EKS cluster from a peered VPC, create an outbound endpoint and forwarding rule in that VPC or expose an outbound endpoint from another VPC. For more information, see Forwarding Outbound DNS Queries to Your Network.

Create an outbound endpoint

  1. In the Route 53 console, choose Outbound endpoints, Create outbound endpoint.
  2. For Endpoint name, enter a value such as <cluster_name>OutboundEnpoint.
  3. For VPC in the Region, select the VPC ID of the VPC where you want to create the outbound endpoint, for example the peered VPC.
  4. For Security group for this endpoint, choose a security group that allows clients and applications from this or other network VPCs to access this endpoint. For an example, see the Route 53 resolver diagram shown earlier in the post.
  5. Under the IP addresses section, choose an Availability Zone that corresponds to a subnet in the peered VPC.
  6. For IP address, choose Use an IP address that is selected automatically.
  7. Repeat steps 7 and 8 for the second IP address.
  8. Choose Submit.

Or, run the following AWS CLI command:

export DATE=$(date +%s)
export OUTBOUND_RESOLVER_ID=$(aws route53resolver create-resolver-endpoint --name 
<name> --direction OUTBOUND --creator-request-id $DATE --security-group-ids <sgs> \
--ip-addresses SubnetId=<subnetId>,Ip=<IP address> SubnetId=<subnetId>,Ip=<Ip address> \
| jq -r .ResolverEndpoint.Id)
aws route53resolver list-resolver-endpoint-ip-addresses --resolver-endpoint-id \
$OUTBOUND_RESOLVER_ID | jq .IpAddresses[].Ip

This outputs the IP addresses that get assigned to the outbound endpoint.

Create a forwarding rule for the cluster endpoint

A forwarding rule is used to send DNS requests that cannot be resolved by the local resolver to another DNS resolver.  For this solution to work, create a forwarding rule for each cluster endpoint to resolve through the outbound endpoint. For more information, see Values That You Specify When You Create or Edit Rules.

  1. In the Route 53 console, choose Rules, Create rule.
  2. Give your rule a name, such as <cluster_name>Rule.
  3. For Rule type, choose Forward.
  4. For Domain name, type the name of the cluster endpoint for your EKS cluster.
  5. For VPCs that use this rule, select all of the VPCs to which this rule should apply.  If you have multiple VPCs that must access the cluster endpoint, include them in the list of VPCs.
  6. For Outbound endpoint, select the outbound endpoint to use to send DNS requests to the inbound endpoint of the worker node VPC.
  7. Under the Target IP addresses section, enter the IP addresses of the inbound endpoint that corresponds to the EKS endpoint that you entered in the Domain name field.
  8. Choose Submit.

Or, run the following AWS CLI command:

export DATE=$(date +%s)
aws route53resolver create-resolver-rule --name <name> --rule-type FORWARD \
--creator-request-id $DATE --domain-name <cluster_endpoint> --target-ips \
Ip=<IP of inbound endpoint>,Port=53 --resolver-endpoint-id <Id of outbound endpoint>

Accessing the cluster endpoint

After creating the inbound and outbound endpoints and the DNS forwarding rule, you should be able to resolve the name of the cluster endpoints from the peered VPC.

$ dig 9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com 

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.58.amzn1 <<>> 9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7168
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com. IN A
;; ANSWER SECTION:
9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com. 60 IN A 192.168.109.77
9FF86DB0668DC670F27F426024E7CDBD.sk1.us-east-1.eks.amazonaws.com. 60 IN A 192.168.74.42
;; Query time: 12 msec
;; SERVER: 172.16.0.2#53(172.16.0.2)
;; WHEN: Mon Apr 8 22:39:05 2019
;; MSG SIZE rcvd: 114

Before you can access the cluster endpoint, you must add the IP address range of the peered VPCs to the EKS control plane security group. For more information, see Tutorial: Creating a VPC with Public and Private Subnets for Your Amazon EKS Cluster.

Add a rule to the EKS cluster control plane security group

  1. In the EC2 console, choose Security Groups.
  2. Find the security group associated with the EKS cluster control plane.  If you used eksctl to provision your cluster, the security group is named as follows: eksctl-<cluster_name>-cluster/ControlPlaneSecurityGroup.
  3. Add a rule that allows port 443 inbound from the CIDR range of the peered VPC.
  4. Choose Save.

Run kubectl

With the proper security group rule in place, you should now be able to issue kubectl commands from a machine in the peered VPC against the cluster endpoint.

$ kubectl get nodes
NAME                             STATUS    ROLES     AGE       VERSION
ip-192-168-18-187.ec2.internal   Ready     <none>    22d       v1.11.5
ip-192-168-61-233.ec2.internal   Ready     <none>    22d       v1.11.5

Connecting from an on-premises environment

To manage your EKS cluster from your on-premises environment, configure a forwarding rule in your on-premises DNS to forward DNS queries to the inbound endpoint of the worker node VPCs. I’ve provided brief descriptions for how to do this for BIND, dnsmasq, and Windows DNS below.

Create a forwarding zone in BIND for the cluster endpoint

Add the following to the BIND configuration file:

zone "<cluster endpoint FQDN>" {
    type forward;
    forwarders { <inbound endpoint IP #1>; <inbound endpoint IP #2>; };
};

Create a forwarding zone in dnsmasq for the cluster endpoint

If you’re using dnsmasq, add the --server=/<cluster endpoint FQDN>/<inbound endpoint IP> flag to the startup options.

Create a forwarding zone in Windows DNS for the cluster endpoint

If you’re using Windows DNS, create a conditional forwarder.  Use the cluster endpoint FQDN for the DNS domain and the IPs of the inbound endpoints for the IP addresses of the servers to which to forward the requests.

Add a security group rule to the cluster control plane

Follow the steps in Adding A Rule To The EKS Cluster Control Plane Security Group. This time, use the CIDR of your on-premises network instead of the peered VPC.

Conclusion

When you configure the EKS cluster endpoint to be private only, its name can only be resolved from the worker node VPC. To manage the cluster from another VPC or your on-premises network, you can use the solution outlined in this post to create an inbound resolver for the worker node VPC.

This inbound endpoint is a feature that allows your DNS resolvers to easily resolve domain names for AWS resources. That includes the private hosted zone that gets associated with your VPC when you make the EKS cluster endpoint private. For more information, see Resolving DNS Queries Between VPCs and Your Network.  As always, I welcome your feedback about this solution.

Anatomy of CVE-2019-5736: A runc container escape!

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/anatomy-of-cve-2019-5736-a-runc-container-escape/

This post is courtesy of Samuel Karp, Senior Software Development Engineer — Amazon Container Services.

On Monday, February 11, CVE-2019-5736 was disclosed.  This vulnerability is a flaw in runc, which can be exploited to escape Linux containers launched with Docker, containerd, CRI-O, or any other user of runc.  But how does it work?  Dive in!

This concern has already been addressed for AWS, and no customer action is required. For more information, see the security bulletin.

A review of Linux process management

Before I explain the vulnerability, here’s a review of some Linux basics.

  • Processes and syscalls
  • What is /proc?
  • Dynamic linking

Processes and syscalls

Processes form the core unit of running programs on Linux. Every launched program is represented by one or more processes.  Processes contain a variety of data about the running program, including a process ID (pid), a table tracking in-use memory, a pointer to the currently executing instruction, a set of descriptors for open files, and so forth.

Processes interact with the operating system to perform a variety of operations (for example, reading and writing files, taking input, communicating on the network, etc.) via system calls, or syscalls.  Syscalls can perform a variety of actions. The ones I’m interested in today involve creating other processes (typically through fork(2) or clone(2)) and changing the currently running program into something else (execve(2)).

File descriptors are how a process interacts with files, as managed by the Linux kernel.  File descriptors are short identifiers (numbers) that are passed to the appropriate syscalls for interacting with files: read(2), write(2), close(2), and so forth.

Sometimes a process wants to spawn another process.  That might be a shell running a program you typed at the terminal, a daemon that needs a helper, or even concurrent processing without threads.  When this happens, the process typically uses the fork(2) or clone(2) syscalls.

These syscalls have some differences, but they both operate by creating another copy of the currently executing process and sharing some state.  That state can include things like the memory structures (either shared memory segments or copies of the memory) and file descriptors.

After the new process is started, it’s the responsibility of both processes to figure out which one they are (am I the parent? Am I the child?). Then, they take the appropriate action.  In many cases, the appropriate action is for the child to do some setup, and then execute the execve(2) syscall.

The following example shows the use of fork(2), in pseudocode:

func main() {
    child_pid= fork();
    if (child_pid > 0) {
        // This is the parent process, since child_pid is the pid of the child
        // process.
    } else if (child_pid == 0) {
        // This is the child process. It can retrieve its own pid via getpid(2),
        // if desired.  This child process still sees all the variables in memory
        // and all the open file descriptors.
    }
}

The execve(2) syscall instructs the Linux kernel to replace the currently executing program with another program, in-place.  When called, the Linux kernel loads the new executable as specified and pass the specified arguments.  Because this is done in place, the pid is preserved and a variety of other contextual information is carried over, including environment variables, the current working directory, and any open files.

func main() {
    // execl(3) is a wrapper around the execve(2) syscall that accepts the
    // arguments for the executed program as a list.
    // The first argument to execl(3) is the path of the executable to
    // execute, which in this case is the pwd(1) utility for printing out
    // the working directory.
    // The next argument to execl(3) is the first argument passed through
    // to the new program (in a C program, this would be the first element
    // of the argc array, or argc[0]).  By convention, this is the same as
    // the path of the executable.
    // The remaining arguments to execl(3) are the other arguments visible
    // in the new program's argc array, terminated by NULL.  As you're
    // not passing any additional arguments, just pass NULL here.
    execl("/bin/pwd", "/bin/pwd", NULL);
    // Nothing after this point executes, since the running process has been
    // replaced by the new pwd(1) program.
}

Wait…open files?  By default, open files are passed across the execve(2) boundary.  This is useful in cases where the new program can’t open the file, for example if there’s a new mount covering the existing path.  This is also the mechanism by which the standard I/O streams (stdin, stdout, and stderr) are made available to the new program.

While convenient in some use cases, it’s not always desired to preserve open file descriptors in the new program. This behavior can be changed by passing the O_CLOEXEC flag to open(2) when opening the file or by setting the FD_CLOEXEC flag with fcntl(2).  Using O_CLOEXEC or FD_CLOEXEC (which are both short for close-on-exec) prevents the new program from having access to the file descriptor.

func main() {
    // open(2) opens a file.  The first argument to open is the path of the file
    // and the second argument is a bitmask of flags that describe options
    // applied to the file that's opened.  open(2) then returns a file
    // descriptor, which can be used in subsequent syscalls to represent this
    // file.
    // For this example, open /dev/urandom, which is a file containing random
    // bytes.  Pass two flags: O_RDONLY and O_CLOEXEC; O_RDONLY indicates that
    // the file should be open for reading but not writing, and O_CLOEXEC
    // indicates that the file descriptor should not pass through the execve(2)
    // boundary.
    fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    // All valid file descriptors are positive integers, so a returned value < 0
    // indicates that an error occurred.
    if (fd < 0) {
        // perror(3) is a function to print out the last error that occurred.
        error("could not open /dev/urandom");
        // exit(3) causes a process to exit with a given exit code. Return 1
        // here to indicate that an error occurred.
        exit(1);
    }
}

What is /proc?
/proc (or proc(5)) is a pseudo-filesystem that provides access to a number of Linux kernel data structures.  Every process in Linux has a directory available for it called /proc/[pid].  This directory stores a bunch of information about the process, including the arguments it was given when the program started, the environment variables visible to it, and the open file descriptors.

The special files inside /proc/[pid]/fd describe the file descriptors that the process has open.  They look like symbolic links (symlinks), and you can see the original path of the file, but they aren’t exactly symlinks. You can pass them to open(2) even if the original path is inaccessible and get another working file descriptor.

Another file inside /proc/[pid] is called exe. This file is like the ones in /proc/[pid]/fd except that it points to the binary program that is executing inside that process.

/proc/[pid] also has a companion directory, /proc/self.  This directory is always the same as /proc/[pid] of the process that is accessing it. That is, you can always read your own /proc data from /proc/self without knowing your pid.

Dynamic linking

When writing programs, software developers typically use libraries—collections of previously written code intended to be reused.  Libraries can cover all sorts of things, from high-level concerns like machine learning to lower-level concerns like basic data structures or interfaces with the operating system.

In the code example above, you can see the use of a library through a call to a function defined in a library (fork).

Libraries are made available to programs through linking: a mechanism for resolving symbols (types, functions, variables, etc.) to their definition.  On Linux, programs can be statically linked, in which case all the linking is done at compile time and all symbols are fully resolved. Or they can be dynamically linked, in which case at least some symbols are unresolved until a runtime linker makes them available.

Dynamic linking makes it possible to replace some parts of the resulting code without recompiling the whole application. This is typically used for upgrading libraries to fix bugs, enhance performance, or to address security concerns.  In contrast, static linking requires re-compiling and re-linking each program that uses a given library to affect the same change.

On Linux, runtime linking is typically performed by ld-linux.so(8), which is provided by the GNU project toolchain.  Dynamically linked libraries are specified by a name embedded into the compiled binary.  This dynamic linker reads those names and then performs a search across a standard set of paths to find the associated library file (a shared object file, or .so).

The dynamic linker’s search path can be influenced by the LD_LIBRARY_PATH environment variable.  The LD_PRELOAD environment variable can tell the linker to load additional, user-specified libraries before all others. This is useful in debugging scenarios to allow selective overriding of symbols without having to rebuild a library entirely.

The vulnerability

Now that the cast of characters is set (fork(2), execve(2), open(2), proc(5), file descriptors, and linking), I can start talking about the vulnerability in runc.

runc is a container runtime.  Like a shell, its primary purpose is to launch other programs. However, it does so after manipulating Linux resources like cgroups, namespaces, mounts, seccomp, and capabilities to make what is referred to as a “container.”

The primary mechanism for setting up some of these resources, like namespaces, is through flags to the clone(2) syscall that take effect in the new process.  The target of the final execve(2) call is the program the user requested. It With a container, the target of the final execve(2) call can be specified in the container image or through explicit arguments.

The CVE announcement states:

The vulnerability allows a malicious container to […] overwrite the host runc binary […].  The level of user interaction is being able to run any command […] as root within a container [when creating] a new container using an attacker-controlled image.

The operative parts of this are: being able to overwrite the host runc binary (that seems bad) by running a command (that’s…what runc is supposed to do…).  Note too that the vulnerability is as simple as running a command and does not require running a container with elevated privileges or running in a non-default configuration.

Don’t containers protect against this?

Containers are, in many ways, intended to isolate the host from a given workload or to isolate a given workload from the host.  One of the main mechanisms for doing this is through a separate view of the filesystem.  With a separate view, the container shouldn’t be able to access the host’s files and should only be able to see its own.  runc accomplishes this using a mount namespace and mounting the container image’s root filesystem as /.  This effectively hides the host’s filesystem.

Even with techniques like this, things can pass through the mount namespace.  For example, the /proc/cmdline file contains the running Linux kernel’s command-line parameters.  One of those parameters typically indicates the host’s root filesystem, and a container with enough access (like CAP_SYS_ADMIN) can remount the host’s root filesystem within the container’s mount namespace.

That’s not what I’m talking about today, as that requires non-default privileges to run.  The interesting thing today is that the /proc filesystem exposes a path to the original program’s file, even if that file is not located in the current mount namespace.

What makes this troublesome is that interacting with Linux primitives like namespaces typically requires you to run as root, somewhere.  In most installations involving runc (including the default configuration in Docker, Kubernetes, containerd, and CRI-O), the whole setup runs as root.

runc must be able to perform a number of operations that require elevated privileges, even if your container is limited to a much smaller set of privileges. For example, namespace creation and mounting both require the elevated capability CAP_SYS_ADMIN, and configuring the network requires the elevated capability CAP_NET_ADMIN. You might see a pattern here.

An alternative to running as root is to leverage a user namespace. User namespaces map a set of UIDs and GIDs inside the namespace (including ones that appear to be root) to a different set of UIDs and GIDs outside the namespace.  Kernel operations that are user-namespace-aware can delineate privileged actions occurring inside the user namespace from those that occur outside.

However, user namespaces are not yet widely employed and are not enabled by default. The set of kernel operations that are user-namespace-aware is still growing, and not everyone runs the newest kernel or user-space software.

So, /proc exposes a path to the original program’s file, and the process that starts the container runs as root.  What if that original program is something important that you knew would run again… like runc?

Exploiting it!

runc’s job is to run commands that you specify.  What if you specified /proc/self/exe?  It would cause runc to spawn a copy of itself, but running inside the context of the container, with the container’s namespaces, root filesystem, and so on.  For example, you could run the following command:

docker run –rm amazonlinux:2 /proc/self/exe

This, by itself, doesn’t get you far—runc doesn’t hurt itself.

Generally, runc is dynamically linked against some libraries that provide implementations for seccomp(2), SELinux, or AppArmor.  If you remember from earlier, ld-linux.so(8) searches a standard set of file paths to provide these implementations at runtime.  If you start runc again inside the container’s context, with its separate filesystem, you have the opportunity to provide other files in place of the expected library files. These can run your own code instead of the standard library code.

There’s an easier way, though.  Instead of having to make something that looks like (for example) libseccomp, you can take advantage of a different feature of the dynamic linker: LD_PRELOAD.  And because runc lets you specify environment variables along with the path of the executable to run, you can specify this environment variable, too.

With LD_PRELOAD, you can specify your own libraries to load first, ahead of the other libraries that get loaded.  Because the original libraries still get loaded, you don’t actually have to have a full implementation of their interface.  Instead, you can selectively override some common functions that you might want and omit others that you don’t.

So now you can inject code through LD_PRELOAD and you have a target to inject it into: runc, by way of /proc/self/exe.  For your code to get run, something must call it.  You could search for a target function to override, but that means inspecting runc’s code to figure out what could get called and how.  Again, there’s an easier way.  Dynamic libraries can specify a “constructor” that is run immediately when the library is loaded.

Using the “constructor” along with LD_PRELOAD and specifying the command as /proc/self/exe, you now have a way to inject code and get it to run.  That’s it, right?  You can now write to /proc/self/exe and overwrite runc!

Not so fast.

The Linux kernel does have a bit of a protection mechanism to prevent you from overwriting the currently running executable.  If you open /proc/self/exe for writing, you get -ETXTBSY.  This error code indicates that the file is busy, where “TXT” refers to the text (code) section of the binary.

You know from earlier that execve(2) is a mechanism to replace the currently running executable with another, which means that the original executable isn’t in use anymore.  So instead of just having a single library that you load with LD_PRELOAD, you also must have another executable that can do the dirty work for you, which you can execve(2).

Normally, doing this would still be unsuccessful due to file permissions. Executables are typically not world-writable.  But because runc runs as root and does not change users, the new runc process that you started through /proc/self/exe and the helper program that you executed are also run as root.

After you gain write access to the runc file descriptor and you’ve replaced the currently executing program with execve(2), you can replace runc’s content with your own.  The other software on the system continues to start runc as part of its normal operation (for example, creating new containers, stopping containers, or performing exec operations inside containers). Your code has the chance to operate instead of runc.  When it gets run this way, your code runs as root, in the host’s context instead of in the container’s context.

Now you’re done!  You’ve successfully escaped the container and have full root access.

Putting that all together, you get something like the following pseudocode:

Preload pseudocode for preload.so

func constructor(){
    // /proc/self/exe is a virtual file pointing to the currently running
    // executable.  Open it here so that it can be passed to the next
    // process invoked.  It must be opened read-only, or the kernel will fail
    // the open syscall with ETXTBSY.  You cannot gain write access to the text
    // portion of a running executable.
    fd = open("/proc/self/exe", O_RDONLY);
    if (fd < 0) {
        error("could not open /proc/self/exe");
        exit(1);
    }
    // /proc/self/fd/%d is a virtual file representing the open file descriptor
    // to /proc/self/exe, which you opened earlier.
    filename = sprintf("/proc/self/fd/%d", fd);
    // execl is a call that executes a new executable, replacing the
    // currently running process and preserving aspects like the process ID.
    // Execute the "rewrite" binary, passing it arguments representing the
    // path of the open file descriptor. Because you did not pass O_CLOEXEC when
    // opening the file, the file descriptor remains open in the replacement
    // program and retains the same descriptor.
    execl("/rewrite", "/rewrite", filename, NULL);
    // execl never returns, except on an error
    error("couldn't execl");
}

Pseudocode for the rewrite program

// rewrite is your cooperating malicious program that takes an argument
// representing a file descriptor path, reopens it as read-write, and
// replaces the contents.  rewrite expects that it is unable to open
// the file on the first try, as the kernel has not closed it yet
func main(argc, *argv[]) {
    fd = 0;
    printf("Running\n");
    for(tries = 0; tries < 10000; tries++) {
        // argv[1] is the argument that contains the path to the virtual file
        // of the read-only file descriptor
        fd = open(argv[1], O_RDWR|O_TRUNC);
        if( fd >= 0 ) {
            printf("open succeeded\n");
            break;
        } else {
            if(errno != ETXTBSY) {
                // You expect a lot of ETXTBSY, so only print when you get something else
                error("open");
            }
        }
    }
    if (fd < 0) {
        error("exhausted all open attempts");
        exit(1);
    }
    dprintf(fd, "CVE-2019-5736\n");
    printf("wrote over runc!\n");
    fflush(stdout);

The above code was written by Noah Meyerhans, iliana weller, and Samuel Karp.

How does the patch work?

If you try the same approach with a patched runc, you instead see that opening the file with O_RDWR is denied.  This means that the patch is working!

The runc patch operates by taking advantage of some Linux kernel features introduced in kernel 3.17, specifically a syscall called memfd_create(2).  This syscall creates a temporary memory-backed file and a file descriptor that can be used to access the file.  This file descriptor has some special semantics:  It is automatically removed when the last reference to it is dropped. It’s in memory, so that just equates to freeing the memory.  It supports another useful feature: file sealing.  File sealing allows the file to be made immutable, even to processes that are running as root.

The runc patch changes the behavior of runc so that it creates a copy of itself in one of these temporary file descriptors, and then seals it.  The next time a process launches (via fork(2)) or a process is replaced (via execve(2)), /proc/self/exe will be this sealed, memory-backed file descriptor.  When your rewrite program attempts to modify it, the Linux kernel prevents it as it’s a sealed file.

Could I have avoided being vulnerable?

Yes, a few different mechanisms were available before the patch that provided mitigation for this vulnerability.  The one that I mentioned earlier is user namespaces. Mapping to a different user namespace inside the container would mean that normal Linux file permissions would effectively prevent runc from becoming writable because the compromised process inside the container is not running as the real root user.

Another mechanism, which is used by Google Container-Optimized OS, is to have the host’s root filesystem mounted as read-only.  A read-only mount of the runc binary itself would also prevent the runc binary from becoming writable.

SELinux, when correctly configured, may also prevent this vulnerability.

A different approach to preventing this vulnerability is to treat the Linux kernel as belonging to a single tenant, and spend your effort securing the kernel through another layer of separation.  This is typically accomplished using a hypervisor or a virtual machine.

Amazon invests heavily in this type of boundary. Amazon EC2 instances, AWS Lambda functions, and AWS Fargate tasks are secured from each other using techniques like these.  Amazon EC2 bare-metal instances leverage the next-generation Nitro platform that allows AWS to offer secure, bare-metal compute with a hardware root-of-trust.  Along with traditional hypervisors, the Firecracker virtual machine manager can be used to implement this technique with function- and container-like workloads.

Further reading

The original researchers who discovered this vulnerability have published their own post, CVE-2019-5736: Escape from Docker and Kubernetes containers to root on host. They describe how they discovered the vulnerability, as well as several other attempts.

Acknowledgements

I’d like to thank the original researchers who discovered the vulnerability for reporting responsibly and the OCI maintainers (and Aleksa Sarai specifically) who coordinated the disclosure. Thanks to Linux distribution maintainers and cloud providers who made updated packages available quickly.  I’d also like to thank the Amazonians who made it possible for AWS customers to be protected:

  • AWS Security who ran the whole process
  • Clare Liguori, Onur Filiz, Noah Meyerhans, iliana weller, and Tom Kirchner who performed analysis and validation of the patch
  • The Amazon Linux team (and iliana weller specifically) who backported the patch and built new Docker RPMs
  • The Amazon ECS, Amazon EKS, and AWS Fargate teams for making patched infrastructure available quickly
  • And all of the other AWS teams who put in extra effort to protect AWS customers

How to run AWS CloudHSM workloads on Docker containers

Post Syndicated from Mohamed AboElKheir original https://aws.amazon.com/blogs/security/how-to-run-aws-cloudhsm-workloads-on-docker-containers/

AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to generate and use your own encryption keys on the AWS Cloud. With CloudHSM, you can manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs. Your HSMs are part of a CloudHSM cluster. CloudHSM automatically manages synchronization, high availability, and failover within a cluster.

CloudHSM is part of the AWS Cryptography suite of services, which also includes AWS Key Management Service (KMS) and AWS Certificate Manager Private Certificate Authority (ACM PCA). KMS and ACM PCA are fully managed services that are easy to use and integrate. You’ll generally use AWS CloudHSM only if your workload needs a single-tenant HSM under your own control, or if you need cryptographic algorithms that aren’t available in the fully-managed alternatives.

CloudHSM offers several options for you to connect your application to your HSMs, including PKCS#11, Java Cryptography Extensions (JCE), or Microsoft CryptoNG (CNG). Regardless of which library you choose, you’ll use the CloudHSM client to connect to all HSMs in your cluster. The CloudHSM client runs as a daemon, locally on the same Amazon Elastic Compute Cloud (EC2) instance or server as your applications.

The deployment process is straightforward if you’re running your application directly on your compute resource. However, if you want to deploy applications using the HSMs in containers, you’ll need to make some adjustments to the installation and execution of your application and the CloudHSM components it depends on. Docker containers don’t typically include access to an init process like systemd or upstart. This means that you can’t start the CloudHSM client service from within the container using the general instructions provided by CloudHSM. You also can’t run the CloudHSM client service remotely and connect to it from the containers, as the client daemon listens to your application using a local Unix Domain Socket. You cannot connect to this socket remotely from outside the EC2 instance network namespace.

This blog post discusses the workaround that you’ll need in order to configure your container and start the client daemon so that you can utilize CloudHSM-based applications with containers. Specifically, in this post, I’ll show you how to run the CloudHSM client daemon from within a Docker container without needing to start the service. This enables you to use Docker to develop, deploy and run applications using the CloudHSM software libraries, and it also gives you the ability to manage and orchestrate workloads using tools and services like Amazon Elastic Container Service (Amazon ECS), Kubernetes, Amazon Elastic Container Service for Kubernetes (Amazon EKS), and Jenkins.

Solution overview

My solution shows you how to create a proof-of-concept sample Docker container that is configured to run the CloudHSM client daemon. When the daemon is up and running, it runs the AESGCMEncryptDecryptRunner Java class, available on the AWS CloudHSM Java JCE samples repo. This class uses CloudHSM to generate an AES key, then it uses the key to encrypt and decrypt randomly generated data.

Note: In my example, you must manually enter the crypto user (CU) credentials as environment variables when running the container. For any production workload, you’ll need to carefully consider how to provide, secure, and automate the handling and distribution of your HSM credentials. You should work with your security or compliance officer to ensure that you’re using an appropriate method of securing HSM login credentials for your application and security needs.

Figure 1: Architectural diagram

Figure 1: Architectural diagram

Prerequisites

To implement my solution, I recommend that you have basic knowledge of the below:

  • CloudHSM
  • Docker
  • Java

Here’s what you’ll need to follow along with my example:

  1. An active CloudHSM cluster with at least one active HSM. You can follow the Getting Started Guide to create and initialize a CloudHSM cluster. (Note that for any production cluster, you should have at least two active HSMs spread across Availability Zones.)
  2. An Amazon Linux 2 EC2 instance in the same Amazon Virtual Private Cloud in which you created your CloudHSM cluster. The EC2 instance must have the CloudHSM cluster security group attached—this security group is automatically created during the cluster initialization and is used to control access to the HSMs. You can learn about attaching security groups to allow EC2 instances to connect to your HSMs in our online documentation.
  3. A CloudHSM crypto user (CU) account created on your HSM. You can create a CU by following these user guide steps.

Solution details

  1. On your Amazon Linux EC2 instance, install Docker:
    
            # sudo yum -y install docker
            

  2. Start the docker service:
    
            # sudo service docker start
            

  3. Create a new directory and step into it. In my example, I use a directory named “cloudhsm_container.” You’ll use the new directory to configure the Docker image.
    
            # mkdir cloudhsm_container
            # cd cloudhsm_container           
            

  4. Copy the CloudHSM cluster’s CA certificate (customerCA.crt) to the directory you just created. You can find the CA certificate on any working CloudHSM client instance under the path /opt/cloudhsm/etc/customerCA.crt. This certificate is created during initialization of the CloudHSM Cluster and is needed to connect to the CloudHSM cluster.
  5. In your new directory, create a new file with the name run_sample.sh that includes the contents below. The script starts the CloudHSM client daemon, waits until the daemon process is running and ready, and then runs the Java class that is used to generate an AES key to encrypt and decrypt your data.
    
            #! /bin/bash
    
            # start cloudhsm client
            echo -n "* Starting CloudHSM client ... "
            /opt/cloudhsm/bin/cloudhsm_client /opt/cloudhsm/etc/cloudhsm_client.cfg &> /tmp/cloudhsm_client_start.log &
            
            # wait for startup
            while true
            do
                if grep 'libevmulti_init: Ready !' /tmp/cloudhsm_client_start.log &> /dev/null
                then
                    echo "[OK]"
                    break
                fi
                sleep 0.5
            done
            echo -e "\n* CloudHSM client started successfully ... \n"
            
            # start application
            echo -e "\n* Running application ... \n"
            
            java -ea -Djava.library.path=/opt/cloudhsm/lib/ -jar target/assembly/aesgcm-runner.jar --method environment
            
            echo -e "\n* Application completed successfully ... \n"                      
            

  6. In the new directory, create another new file and name it Dockerfile (with no extension). This file will specify that the Docker image is built with the following components:
    • The AWS CloudHSM client package.
    • The AWS CloudHSM Java JCE package.
    • OpenJDK 1.8. This is needed to compile and run the Java classes and JAR files.
    • Maven, a build automation tool that is needed to assist with building the Java classes and JAR files.
    • The AWS CloudHSM Java JCE samples that will be downloaded and built.
  7. Cut and paste the contents below into Dockerfile.

    Note: Make sure to replace the HSM_IP line with the IP of an HSM in your CloudHSM cluster. You can get your HSM IPs from the CloudHSM console, or by running the describe-clusters AWS CLI command.

    
            # Use the amazon linux image
            FROM amazonlinux:2
            
            # Install CloudHSM client
            RUN yum install -y https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-latest.el7.x86_64.rpm
            
            # Install CloudHSM Java library
            RUN yum install -y https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-jce-latest.el7.x86_64.rpm
            
            # Install Java, Maven, wget, unzip and ncurses-compat-libs
            RUN yum install -y java maven wget unzip ncurses-compat-libs
            
            # Create a work dir
            WORKDIR /app
            
            # Download sample code
            RUN wget https://github.com/aws-samples/aws-cloudhsm-jce-examples/archive/master.zip
            
            # unzip sample code
            RUN unzip master.zip
            
            # Change to the create directory
            WORKDIR aws-cloudhsm-jce-examples-master
            
            # Build JAR files
            RUN mvn validate && mvn clean package
            
            # Set HSM IP as an environmental variable
            ENV HSM_IP <insert the IP address of an active CloudHSM instance here>
            
            # Configure cloudhms-client
            COPY customerCA.crt /opt/cloudhsm/etc/
            RUN /opt/cloudhsm/bin/configure -a $HSM_IP
            
            # Copy the run_sample.sh script
            COPY run_sample.sh .
            
            # Run the script
            CMD ["bash","run_sample.sh"]                        
            

  8. Now you’re ready to build the Docker image. Use the following command, with the name jce_sample_client. This command will let you use the Dockerfile you created in step 6 to create the image.
    
            # sudo docker build -t jce_sample_client .
            

  9. To run a Docker container from the Docker image you just created, use the following command. Make sure to replace the user and password with your actual CU username and password. (If you need help setting up your CU credentials, see prerequisite 3. For more information on how to provide CU credentials to the AWS CloudHSM Java JCE Library, refer to the steps in the CloudHSM user guide.)
    
            # sudo docker run --env HSM_PARTITION=PARTITION_1 \
            --env HSM_USER=<user> \
            --env HSM_PASSWORD=<password> \
            jce_sample_client
            

    If successful, the output should look like this:

    
            * Starting cloudhsm-client ... [OK]
            
            * cloudhsm-client started successfully ...
            
            * Running application ...
            
            ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors 
            to the console.
            70132FAC146BFA41697E164500000000
            Successful decryption
                SDK Version: 2.03
            
            * Application completed successfully ...          
            

Conclusion

My solution provides an example of how to run CloudHSM workloads on Docker containers. You can use it as a reference to implement your cryptographic application in a way that benefits from the high availability and load balancing built in to AWS CloudHSM without compromising on the flexibility that Docker provides for developing, deploying, and running applications. If you have comments about this post, submit them in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author photo

Mohamed AboElKheir

Mohamed AboElKheir joined AWS in September 2017 as a Security CSE (Cloud Support Engineer) based in Cape Town. He is a subject matter expert for CloudHSM and is always enthusiastic about assisting CloudHSM customers with advanced issues and use cases. Mohamed is passionate about InfoSec, specifically cryptography, penetration testing (he’s OSCP certified), application security, and cloud security (he’s AWS Security Specialty certified).

Scheduling GPUs for deep learning tasks on Amazon ECS

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/scheduling-gpus-for-deep-learning-tasks-on-amazon-ecs/

This post is contributed by Brent Langston – Sr. Developer Advocate, Amazon Container Services

Last week, AWS announced enhanced Amazon Elastic Container Service (Amazon ECS) support for GPU-enabled EC2 instances. This means that now GPUs are first class resources that can be requested in your task definition, and scheduled on your cluster by ECS.

Previously, to schedule a GPU workload, you had to maintain your own custom configured AMI, with a custom configured Docker runtime. You also had to use custom vCPU logic as a stand-in for assigning your GPU workloads to GPU instances. Even when all that was in place, there was still no pinning of a GPU to a task. One task might consume more GPU resources than it should. This could cause other tasks to not have a GPU available.

Now, AWS maintains an ECS-optimized AMI that includes the correct NVIDIA drivers and Docker customizations. You can use this AMI to provision your GPU workloads. With this enhancement, GPUs can also be requested directly in the task definition. Like allocating CPU or RAM to a task, now you can explicitly request a number of GPUs to be allocated to your task. The scheduler looks for matching resources on the cluster to place those tasks. The GPUs are pinned to the task for as long as the task is running, and can’t be allocated to any other tasks.

I thought I’d see how easy it is to deploy GPU workloads to my ECS cluster. I’m working in the US-EAST-2 (Ohio) region, from my AWS Cloud9 IDE, so these commands work for Amazon Linux. Feel free to adapt to your environment as necessary.

If you’d like to run this example yourself, you can find all the code in this GitHub repo. If you run this example in your own account, be aware of the instance pricing, and clean up your resources when your experiment is complete.

Clone the repo using the following command:

git clone https://github.com/brentley/tensorflow-container.git

Setup

You need the latest version of the AWS CLI (for this post, I used 1.16.98):

echo “export PATH=$HOME/.local/bin:$HOME/bin:$PATH” >> ~/.bash_profile
source ~/.bash_profile
pip install --user -U awscli

Provision an ECS cluster, with two C5 instances, and two P3 instances:

aws cloudformation deploy --stack-name tensorflow-test --template-file cluster-cpu-gpu.yml --capabilities CAPABILITY_IAM                            

While AWS CloudFormation is provisioning resources, examine the template used to build your infrastructure. Open `cluster-cpu-gpu.yml`, and you see that you are provisioning a test VPC with two c5.2xlarge instances, and two p3.2xlarge instances. This gives you one NVIDIA Tesla V100 GPU per instance, for a total of two GPUs to run training tasks.

I adapted the TensorFlow benchmark Docker container to create a training workload. I use this container to compare the GPU scheduling and runtime.

When the CloudFormation stack is deployed, register a task definition with the ECS service:

aws ecs register-task-definition --cli-input-json file://gpu-1-taskdef.json

To request GPU resources in the task definition, the only change needed is to include a GPU resource requirement in the container definition:

            "resourceRequirements": [
                {
                    "type": "GPU",
                    "value": "1"
                }
            ],

Including this resource requirement ensures that the ECS scheduler allocates the task to an instance with a free GPU resource.

Launch a single-GPU training workload

Now you’re ready to launch the first GPU workload.

export cluster=$(aws cloudformation describe-stacks --stack-name tensorflow-test --query 
'Stacks[0].Outputs[?OutputKey==`ClusterName`].OutputValue' --output text) 
echo $cluster
aws ecs run-task --cluster $cluster --task-definition tensorflow-1-gpu

When you launch the task, the output shows the `guIds` values that are assigned to the task. This GPU is pinned to this task, and can’t be shared with any other tasks. If all GPUs are allocated, you can’t schedule additional GPU tasks until a running task with a GPU completes. That frees the GPU to be scheduled again.

When you look at the log output in Amazon CloudWatch Logs, you see that the container discovered one GPU: `/gpu0` and the training benchmark trained at a rate of 321.16 images/sec.

With your two p3.2xlarge nodes in the cluster, you are limited to two concurrent single GPU based workloads. To scale horizontally, you could add additional p3.2xlarge nodes. This would limit your workloads to a single GPU each.  To scale vertically, you could bump up the instance type,  which would allow you to assign multiple GPUs to a single task.  Now, let’s see how fast your TensorFlow container can train when assigned multiple GPUs.

Launch a multiple-GPU training workload

To begin, replace the p3.2xlarge instances with p3.16xlarge instances. This gives your cluster two instances that each have eight GPUs, for a total of 16 GPUs that can be allocated.

aws cloudformation deploy --stack-name tensorflow-test --template-file cluster-cpu-gpu.yml --parameter-overrides GPUInstanceType=p3.16xlarge --capabilities CAPABILITY_IAM

When the CloudFormation deploy is complete, register two more task definitions to launch your benchmark container requesting more GPUs:

aws ecs register-task-definition --cli-input-json file://gpu-4-taskdef.json  
aws ecs register-task-definition --cli-input-json file://gpu-8-taskdef.json 

Next, launch two TensorFlow benchmark containers, one requesting four GPUs, and one requesting eight GPUs:

aws ecs run-task --cluster $cluster --task-definition tensorflow-4-gpu
aws ecs run-task --cluster $cluster --task-definition tensorflow-8-gpu

With each task request, GPUs are allocated: four in the first request, and eight in the second request. Again, these GPUs are pinned to the task, and not usable by any other task until these tasks are complete.

Check the log output in CloudWatch Logs:

On the “devices” lines, you can see that the container discovered and used four (or eight) GPUs. Also, the total images/sec improved to 1297.41 with four GPUs, and 1707.23 with eight GPUs.

Because you can pin single or multiple GPUs to a task, running advanced GPU based training tasks on Amazon ECS is easier than ever!

Cleanup

To clean up your running resources, delete the CloudFormation stack:

aws cloudformation delete-stack --stack-name tensorflow-test

Conclusion

For more information, see Working with GPUs on Amazon ECS.

If you want to keep up on the latest container info from AWS, please follow me on Twitter and tweet any questions! @brentContained

Amazon ECS Task Placement

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/amazon-ecs-task-placement/

Intro

Amazon Elastic Container Service (ECS) is a highly scalable, high-performance container orchestration service that allows you to easily run and scale containerized applications on AWS. This post covers how Amazon Elastic Container Service (Amazon ECS) runs containers in a cluster. Topics include why AWS built the task placement engine, the different strategies and constraints available to decide where and how containers are run, and things to consider when picking placement strategies.

If you are not familiar with the relationship between ECS and Amazon EC2 or its components, see the Building Blocks of Amazon ECS post.

Task Placement

When a task is launched in a cluster, a decision has to be made to choose which container instance should run that task. Conversely, when scaling down a service, a decision has to be made to choose the specific task to be terminated.

Task placement

By default, ECS uses the following placement strategies:

  • When you run tasks with the RunTask API action, tasks are placed randomly in a cluster.
  • When you launch and terminate tasks with the CreateService API action, the service scheduler spreads the tasks across the Availability Zones (and the instances within the zones) in a cluster.

Before December 2016, tasks could only be placed by their default placement strategies. This meant making the decision yourself, such as writing your own scheduler, and calling the StartTask API action to achieve custom task placement. When you manually constrained the placement of your grouping of containers, you could only place based on CPU, memory, and ports. Additionally, while creating your own scheduler can be powerful, there’s a tradeoff with complexity.

AWS built the task placement engine, which removes the need for you to build, run, and manage your own scheduling and placement services. There are several new features that provide you with more control over how applications run across clusters through custom attributes.

You can think of this flow as a funnel with filters for your instances. Constraints must be obeyed. If an instance doesn’t fit, it isn’t used. Strategies are then used to sort the rest of the instances by preference to determine which are the “best.”

For every instantiation of your task, it runs through every step. Calling run-task with a count of n is effectively calling run-task n times (create-service also works the same way).

Cluster Constraints, Placement Constraints, Placement Strategies

Example

Here’s how to use these placement features. In this example, you use the AWS CLI run-task command. For the last couple of filters, I show how to use them with placement flags, but you can just as easily include them in your task definition file instead. This can all be done in the console as well. Start with the cluster shown earlier:

Task Placement Instances

aws ecs run-task --task-definition nouvelleApp \
--placement-constraints type="memberOf",expression="attribute:ecs.instance-type == t2.small" \
--placement-strategy --placement-strategy type="binpack",field="memory" \
--count 8

Cluster constraints

In the first step, eliminate all the instances that don’t have the required resources based on what you defined either in the JSON task definition or what you provided overrides for to RunTask.

Not enough CPU? Not enough memory? A port is needed, but it is already in use on that instance? Then the instance is eliminated from the set of valid candidates.

Task Placement Cluster Constraints

aws ecs run-task --task-definition nouvelleApp

Placement constraints

In the second step, keep only the instances that satisfy the attribute or task group constraints. Yes, this means that you can indicate what instance to use for a task (for example, to make sure that CPU-intensive jobs are scheduled on the right type of instance, or in which Availability Zone).

You can also create any custom tags of your choosing. The green tasks on the green instances, the blue tasks on the blue instances! You can also use the Cluster Query Language to write expressions to check for multiple attributes. In the next section, I cover how to write and use the attributes and expressions.

Placement Constraints

--placement-constraints type="memberOf",expression="attribute:ecs.instance-type == t2.micro"

Placement strategies

In the third step, filter on the following supported task placement strategies:

  • random
  • binpack
  • spread

By default, tasks are randomly placed with RunTask or spread across Availability Zones with CreateService. Spread is typically used to achieve high availability by making sure that multiple copies of a task are scheduled across multiple instances based on attributes such as Availability Zones.

Conversely, binpack places tasks together to be as cost-efficient as possible. Later in this post, you’ll see how these placement strategies work, as well as how to chain them together and why you may want to do so.

Task Placement Binpack

--placement-strategy type="binpack",field="memory"

Task copies

This isn’t part of the filter, but instead, the count flag is used to indicate how many copies (n) of a given task to run. Effectively, it tells ECS to re-run this workflow n times. By default, the count is set to 1, so run-task is executed one time. For services, the desired-count flag is used.

--count 8

Attributes, task groups, and expressions

For task placement, you can use instance fields, such as attributes, as well as task groups. These can be used in expressions for task placement constraints, or instance fields can be used standalone for task placement strategies. Here’s a quick overview of attributes, task groups, and expressions before you go any further.

Instance: Fields

Because you are using these fields with respect to instances in task placement, the instance: preface is optional and can be used either of the following ways with a field name or an attribute.

instance:<field>
<field>

Field names

The currently supported field names are as follows:

ec2InstanceId
agentConnected

Attributes

There are also instance attributes, which are prefaced with attribute. Again, instance: is optional:

attribute:<attribute-name>

Built-in attributes

The following are some of the provided attributes:

ecs.ami-id
ecs.availability-zone
ecs.instance-type
ecs.os-type
ecs.subnet-id
ecs.vpc-id

Custom attributes

Well, what if you don’t see an attribute that you want? This is where custom attributes come in handy! Want to differentiate between test and prod? What about blue versus green?

aws ecs put-attributes \
--attributes name=color,value=blue,targetId=<your-container-instance-arn>

Task groups

In addition to placing tasks based on attributes, you can use task groups. Every task is assigned a group ID that you can reference in placement. For both tasks and services, a default ID is given, or you can choose your own. Perhaps you want to run version 2 of a service but only on instances with version 1.

task:group

Expressions

Alright, so you have some attributes and task groups… now what? Well, AWS created the Cluster Query Language to make it easy to create expressions for task placement constraints. These attributes and task groups are used with the available comparison operators, which may look familiar if you’ve used Boolean operators before. Some of these operators can be written in multiple ways, such as “!” or “not”.

For instance, to create an expression using a single attribute to select only t2.micro instances, use the ecs.instance-type attribute and the string equality comparator as follows:

attribute:ecs.instance-type == t2.micro

For t2.micro and t2.nano instances, you have a few options. You could use the same syntax as earlier with the or comparator:

attribute:ecs.instance-type == t2.micro or attribute:ecs.instance-type == t2.nano

Another way is to use the in comparator with an argument list:

attribute:ecs.instance-type in [t2.micro, t2.nano]

To include all t2 instances, use a wildcard and the pattern match operator instead of listing out each one:

attribute:ecs.instance-type =~ t2.*

Task group comparisons work the same way. The following snippet selects any instance upon which the task group “database” is running:

task:group == database

To select only task groups that are not “database,” combine expressions:

not(task:group == database)

You can use these expressions to filter your instances:

aws ecs list-container-instances \
--filter "attribute:ecs.instance-type != t2.micro"
aws ecs list-container-instances \
--filter "attribute:color == blue"
aws ecs list-container-instances \
--filter "task:group == database"

These expressions and attributes, respectively, are also used for task placement constraints and strategies, which I cover in the next few sections.

Constraints

Now look at placement constraints. When determining task placement, there may be certain EC2 instances to include or exclude from running containers. For example, you may want to place tasks only on GPU types.

Task placement constraints let you define where your containers should run across your cluster. ECS currently supports two types of placement constraints: distinctInstance and memberOf. By default, ECS spreads tasks across Availability Zones and instances.

  "placementConstraints": [ 
      { 
         "expression": "string",
         "type": "string"
      }
   ],

Distinct Instance

Distinct InstanceThe distinctInstance constraint makes it possible to ensure that every container is started on a unique instance in your cluster. The distinctInstance constraint never places multiple copies of a task on a single instance, even if you request more running tasks than available instances.

For example if you decide to place five copies of a task, each time it filters out the instances that are already running the task.

aws ecs run-task --task-definition nouvelleApp \
--count 5 --placement-constraints type="distinctInstance"

Member of

Member of t2-micro The memberOf constraint describes a set of instances on which your tasks should run. It is for anything you could define as an attribute or task. It also takes in an expression of attributes written in the Cluster Query Language.

For example, if you have a small application and just want it to run on t2.micro instances:

aws ecs run-task --task-definition nouvelleApp \
--count 5 \
--placement-constraints 
type="memberOf",expression="attribute:ecs.instance-type == t2.micro"

You can create expressions using the Cluster Query Language to check for multiple attributes. Here’s how you can weed out all instances in the us-west-2c Availability Zone as well as instances that aren’t of type t2.nano or t2.micro:

aws ecs run-task --task-definition nouvelleApp \
--count 5 \
--placement-constraints type="memberOf",expression="attribute:ecs.availability-zone != us-west-2c and (attribute:ecs.instance-type == t2.nano or attribute:ecs.instance-type == t2.micro)"

Member of affinity

You can also use constraints to place all tasks with the same task group on the same instance (affinity):

aws ecs run-task --task-definition nouvelleApp \
--count 5 --group webserver \
--placement-constraints type=memberOf,expression="task:group == webserver"

Or you can ensure that instances never have more than one task in the same group (anti-affinity):

aws ecs run-task –task-definition nouvelleApp –count 5 –group webserver –placement-constraints type=memberOf,expression=”not(task:group == webserver)”

Strategies

Now look at placement strategies. Placement strategies are used to identify an instance that meets a specific strategy. ECS supports three task placement strategies:

  • random
  • binpack
  • spread

Random is how RunTask places tasks by default and is fairly straightforward (it doesn’t require further parameters). The two other strategies, binpack and spread, take opposite actions. Binpack places tasks on as few instances as possible, helping to optimize resource utilization, while spread places tasks evenly across your cluster to help maximize availability. By default, ECS uses spread with the ecs.availability-zone attribute to place tasks.

   "placementStrategy": [ 
      { 
         "field": "string",
         "type": "string"
      }
   ],

Random

Placement Random

 Random places tasks on instances at random. This still honors the other constraints that you specified, implicitly or explicitly. Specifically, it still makes sure that tasks are scheduled on instances with enough resources to run them.

aws ecs run-task --task-definition nouvelleApp \
--count 5 \
--placement-strategy type="random"

Bin packing

Placement Binpack

The binpack strategy tries to fit your workloads in as few instances as possible. It gets its name from the bin packing problem where the goal is to fit objects of various sizes in the smallest number of bins. It is well suited to scenarios for minimizing the number of instances in your cluster, perhaps for cost savings, and lends itself well to automatic scaling for elastic workloads, to shut down instances that are not in use.

When you use the binpack strategy, you must also indicate if you are trying to make optimal use of your instances’ CPU or memory. This is done by passing an extra field parameter, which tells the task placement engine which parameter to use to evaluate how “full” your “bins” are. It then chooses the instance with the least available CPU or memory (depending on which you pick). If there are multiple instances with this CPU or memory remaining, it chooses randomly.

aws ecs run-task --task-definition nouvelleApp \
--count 8 --placement-strategy type="binpack",field="cpu"

aws ecs run-task --task-definition nouvelleApp \
--count 8 --placement-strategy type="binpack",field="memory"

Spread

Placement Spread

The spread strategy, contrary to the binpack strategy, tries to put your tasks on as many different instances as possible. It is typically used to achieve high availability and mitigate risks, by making sure that you don’t put all your task-eggs in the same instance-baskets. Spread across Availability Zones, therefore, is the default placement strategy used for services.

When using the spread strategy, you must also indicate a field parameter. It is used to indicate the “bins” that you are considering. The accepted values are instanceID to balance tasks across all instances, host, or attribute key:value pairs such as attribute:ecs.availability-zone to balance tasks across zones. There are several AWS attributes that start with the “ecs” prefix, but you can be creative and create your own attributes.

aws ecs run-task --task-definition nouvelleApp \
--count 8 \
--placement-strategy type="spread",field="attribute:ecs.availability-zone"

Chaining placement strategies

Placement binpack spread

Now that you’ve seen how to use task placement strategies, you can also chain multiple task placement strategies with their respective attributes together. You can have up to five strategy rules per service. Perhaps you want to spread tasks across Availability Zones and binpack:

aws ecs run-task --task-definition nouvelleApp \
--count 8 \
--placement-strategy type="spread",field="attribute:ecs.availability-zone" type="binpack",field="memory"

Use cases

Here are some use cases for task placement so you can see how they can be solved by combining attributes, expressions, constraints, and strategies.

Task creation

Mariya is fairly new to using containers and especially container orchestrators. She wants to try ECS and has a simple application that she first wants to get running on a single node. (Solution: Use the RunTask API.)

aws ecs run-task --task-definition nouvelleApp

Scaling

After trying this, Mariya wants to scale her application to run 10 containers across any available nodes in her cluster. (Solution: This means she needs to run a task using either random or spread placement strategies.)

aws ecs run-task --task-definition nouvelleApp \
--count 10 \
--placement-strategy type="random"

Availability

Mariya then realizes that if she wants her tasks to automatically restart themselves if they fail, or if she wants more than 10 instantiations of her task running, she needs to create a service. (Solution: Create a service.)

aws ecs create-service --task-definition nouvelleApp \
--desiredCount 300 --placement-strategy type="random"

Christopher wants to achieve high availability by distributing his tasks amongst all the instances in his cluster so he minimizes impact if any one host goes down. (Solution: To do this he uses spread placement over host name.)

aws ecs run-task --task-definition nouvelleApp \
--count 9 \
--placement-strategy type="spread",field="host"

Ming-ya wants to run a monitoring container on each instance in her cluster. To help her do this, she creates a service with a high desired count and a distinctInstance placement constraint. The ECS service scheduler ensures that each instance in the cluster runs this task (up to the desired count).

aws ecs create-service --service-name monitoring \
--task-definition monitor \
--desiredCount 500 \
--placement-constraints type="distinctInstance"

Availability and Task Groups

Alex wants to run a fleet of webservers. For performance reasons, they want each webserver to have local access to a caching process that was written by another team. They define their webserver as one task, the caching server as a second task. When they launch their webserver task they uses a placement constraint so that the tasks are only placed on instances that are already hosting the cache task. (Solution: Use placement constraints with a task group.)

aws ecs run-task --task-definition cache \
--group caching --count 9 \
--placement-constraints type="distinctInstance"

aws ecs run-task --task-definition webserver \
--count 9 \
--placement-constraints type="distinctInstance" type="memberOf",expression="task:group == caching"

Availability and resource optimization

Jake wants to achieve high availability, but he has a limited budget and needs to optimize all the resources he uses. (Solution: Take a balanced approach of spreading over availability Availability Zones and binpacking on memory within a zone.)

aws ecs run-task --task-definition nouvelleApp \
--count 9 \
--placement-strategy type="spread",field="attribute:ecs.availability-zone" type="binpack",field="memory"

Instance type selection

Aditya has a GPU workload that they want to run in containers on ECS. He needs to ensure that only GPU-enabled instances are used for this workload. (Solution: Create a service and spread on instance type = G2* or whatever other GPU-enabled instance types are in the cluster)

aws ecs create-service --service-name workload \
--task-definition GPU --desiredCount 30 \
--placement-constraints type="memberOf",expression="attribute:ecs.instance-type =~ g2* or attribute:ecs.instance-type =~ p2*"

Conclusion

You’ve now looked at task placement at a high level, as well as:

  • Attributes, task groups, and expressions
  • Constraints
  • Strategies
  • Example use cases

To dive deeper into any of these aspects, check out Task Placement. Also, feel free to ask any questions!

@tiffanyfayj

 

Tagging container image repositories on Amazon ECR

Post Syndicated from Brent Langston original https://aws.amazon.com/blogs/compute/tagging-container-image-repositories-on-amazon-ecr/

Starting today, you can add tags to your Amazon Elastic Container Registry (Amazon ECR) resources. This new feature enables better grouping of ECR repositories, better searching and filtering in the console, and better cost allocation. In this post, I show you how to create a tagging strategy.

You might have many ECR repositories and want start assigning tags to each of them. Two strategies come to mind almost immediately:

  • You could have repositories to host your development Docker images, and keep different repositories for hosting production images.
  • You could group repositories together according to the organization of the development teams.

Or, you can follow both strategies. Here’s a typical ecommerce application as an example. The services are organized as follows:

  • Accounts team
    • users
    • password
    • email
    • 2fa
  • Inventory team
    • catalog
    • pricing
    • backorders
  • Cart team
    • contents
    • shipping
    • coupons

In this example company, there are 10 services to manage. Realistically, these services would easily number into the hundreds or thousands for many ecommerce websites. You would likely have one development set of repositories for each service, and another production set.

Tag the repositories by team and by service level: development or production. Here’s one example, for the Accounts team repos in development.

You can also use tags in an IAM policy to allow your dev teams access to the development version of their repositories. For example, you can restrict the ECS and EKS instances to their service level, and allow the SRE team access to all repositories.

Here is an example IAM policy that restricts access to the Accounts team:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ecr:GetAuthorizationToken",
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetRepositoryPolicy",
            "ecr:DescribeRepositories",
            "ecr:ListImages",
            "ecr:DescribeImages",
            "ecr:BatchGetImage",
            "ecr:InitiateLayerUpload",
            "ecr:UploadLayerPart",
            "ecr:CompleteLayerUpload",
            "ecr:PutImage"
        ],
        "Resource": "*",
        "Condition": {"StringLike": {"aws:RequestTag/Team": "Accounts"}}
        
    }]
}

After you configure these tags appropriately for your repos and set the IAM policy for specific teams, developers can push and pull from any repo tagged for their team. They can even access future repos that are added with their team’s tag. They do not have access to push or pull from a different team’s repo.

You can also use tags to track costs and review in the Cost Allocation report. This helps you understand how much you’re spending in dev or prod, and how much for each service and in dev or prod per service group. Add your billing tags to your repositories.

Summary

With today’s feature launch for adding tags to ECR resources, you can now apply policies and analyze costs by grouping ECR repositories together in many different ways. For more information, see Tagging Your Amazon ECS Resources. For the public roadmap for container releases, see the containers-roadmap on GitHub.

I hope you find this helpful. If you have other creative and useful tagging schemes, for ECR or ECS, please share in the comments.

— Brent

Introducing AWS App Mesh – service mesh for microservices on AWS

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/introducing-aws-app-mesh-service-mesh-for-microservices-on-aws/

AWS App Mesh is a service mesh that allows you to easily monitor and control communications across microservices applications on AWS. You can use App Mesh with microservices running on Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Container Service for Kubernetes (Amazon EKS), and Kubernetes running on Amazon EC2.

Today, App Mesh is available as a public preview. In the coming months, we plan to add new functionality and integrations.

Why App Mesh?

Many of our customers are building applications with microservices architectures, breaking applications into many separate, smaller pieces of software that are independently deployed and operated. Microservices help to increase the availability and scalability of an application by allowing each component to scale independently based on demand. Each microservice interacts with the other microservices through an API.

When you start building more than a few microservices within an application, it becomes difficult to identify and isolate issues. These can include high latencies, error rates, or error codes across the application. There is no dynamic way to route network traffic when there are failures or when new containers need to be deployed.

You can address these problems by adding custom code and libraries into each microservice and using open source tools that manage communications for each microservice. However, these solutions can be hard to install, difficult to update across teams, and complex to manage for availability and resiliency.

AWS App Mesh implements a new architectural pattern that helps solve many of these challenges and provides a consistent, dynamic way to manage the communications between microservices. With App Mesh, the logic for monitoring and controlling communications between microservices is implemented as a proxy that runs alongside each microservice, instead of being built into the microservice code. The proxy handles all of the network traffic into and out of the microservice and provides consistency for visibility, traffic control, and security capabilities to all of your microservices.

Use App Mesh to model how all of your microservices connect. App Mesh automatically computes and sends the appropriate configuration information to each microservice proxy. This gives you standardized, easy-to-use visibility and traffic controls across your entire application.  App Mesh uses Envoy, an open source proxy. That makes it compatible with a wide range of AWS partner and open source tools for monitoring microservices.

Using App Mesh, you can export observability data to multiple AWS and third-party tools, including Amazon CloudWatch, AWS X-Ray, or any third-party monitoring and tracing tool that integrates with Envoy. You can configure new traffic routing controls to enable dynamic blue/green canary deployments for your services.

Getting started

Here’s a sample application with two services, where service A receives traffic from the internet and uses service B for some backend processing. You want to route traffic dynamically between services B and B’, a new version of B deployed to act as the canary.

First, create a mesh, a namespace that groups related microservices that must interact.

Next, create virtual nodes to represent services in the mesh. A virtual node can represent a microservice or a specific microservice version. In this example, service A and B participate in the mesh and you manage the traffic to service B using App Mesh.

Now, deploy your services with the required Envoy proxy and with a mapping to the node in the mesh.

After you have defined your virtual nodes, you can define how the traffic flows between your microservices. To do this, define a virtual router and routes for communications between microservices.

A virtual router handles traffic for your microservices. After you create a virtual router, you create routes to direct traffic appropriately. These routes include the connection requests that the route should accept, where they should go, and the weighted amount of traffic to send. All of these changes to adjust traffic between services is computed and sent dynamically to the appropriate proxies by App Mesh to execute your deployment.

You now have a virtual router set up that accepts all traffic from virtual node A sending to the existing version of service B, as well some traffic to the new version, B’.

Exporting metrics, logs, and traces

One of benefits about placing a proxy in front of every microservice is that you can automatically capture metrics, logs, and traces about the communication between your services. App Mesh enables you to easily collect and export this data to the tools of your choice. Envoy is already integrated with several tools like Prometheus and Datadog.

During the preview, we are adding support for AWS services such as Amazon CloudWatch and AWS X-Ray. We have a lot more integrations planned as well.

Available now

AWS App Mesh is available as a public preview and you can start using it today in the North Virginia, Ohio, Oregon, and Ireland AWS Regions. During the preview, we plan to add new features and want to hear your feedback. You can check out our GitHub repository for examples and our roadmap.

— Nate

Scanning Docker Images for Vulnerabilities using Clair, Amazon ECS, ECR, and AWS CodePipeline

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/scanning-docker-images-for-vulnerabilities-using-clair-amazon-ecs-ecr-aws-codepipeline/

Post by Vikrama Adethyaa, Solution Architect and Tiffany Jernigan, Developer Advocate

 

Containers are an increasingly important way for you to package and deploy your applications. They are lightweight and provide a consistent, portable software environment for applications to easily run and scale anywhere.

A container is launched from a container image, an executable package that includes everything needed to run an application: the application code, configuration files, runtime (for example, Java, Python, etc.), libraries, and environment variables.

A container image is built up from a series of layers. For a Docker image, each layer in the image represents an instruction in the image’s Dockerfile. A parent image is the image on which your image is built. It refers to the contents of the FROM directive in the Dockerfile. Most Dockerfiles start from a parent image, and often the parent image was downloaded from a public registry.

It is incredibly difficult and time-consuming to manually track all the files, packages, libraries, and so on, included in an image along with the vulnerabilities that they may possess. Having a security breach is one of the costliest things an organization can endure. It takes years to build up a reputation and only seconds to tear it down.

One way to prevent breaches is to regularly scan your images and compare the dependencies to a known list of common vulnerabilities and exposures (CVEs). Public CVE lists contain an identification number, description, and at least one public reference for known cybersecurity vulnerabilities. The automatic detection of vulnerabilities helps increase awareness and best security practices across developer and operations teams. It encourages action to patch and address the vulnerabilities.

This post walks you through the process of setting up an automated vulnerability scanning pipeline. You use AWS CodePipeline to scan your container images for known security vulnerabilities and deploy the container only if the vulnerabilities are within the defined threshold.

This solution uses CoresOS Clair for static analysis of vulnerabilities in container images. Clair is an API-driven analysis engine that inspects containers layer-by-layer for known security flaws. Clair scans each container layer and provides a notification of vulnerabilities that may be a threat, based on the CVE database and similar data feeds from Red Hat, Ubuntu, and Debian.

Deploying Clair

Here’s how to install Clair on AWS. The following diagram shows the high-level architecture of Clair.

Clair uses PostgreSQL, so use Aurora PostgreSQL to host the Clair database. You deploy Clair as an ECS service with the Fargate launch type behind an Application Load Balancer. The Clair container is deployed in a private subnet behind the Application Load Balancer that is hosted in the public subnets. The private subnets must have a route to the internet using the NAT gateway, as Clair fetches the latest vulnerability information from multiple online sources.

Prerequisites

Ensure that the following are installed or configured on your workstation before you deploy Clair:

  • Docker
  • Git
  • AWS CLI installed
  • AWS CLI is configured with your access key ID and secret access key, and the default region as us-east-1

Download the AWS CloudFormation template for deploying Clair

To help you quickly deploy Clair on AWS and set up CodePipeline with automatic vulnerability detection, use AWS CloudFormation templates that can be downloaded from the aws-codepipeline-docker-vulnerability-scan GitHub repository. The repository also includes a simple, containerized NGINX website for testing your pipeline.

# Clone the GitHub repository
git clone https://github.com/aws-samples/aws-codepipeline-docker-vulnerability-scan.git

cd aws-codepipeline-docker-vulnerability-scan

VPC requirements

We recommend a VPC with the following specification for deploying CoreOS Clair:

  • Two public subnets
  • Two private subnets
  • NAT gateways to allow internet access for services in private subnets

You can create such a VPC using the AWS CloudFormation template networking-template.yaml that is included in the sample code you cloned from GitHub.

# Create the VPC
aws cloudformation create-stack \
--stack-name coreos-clair-vpc-stack \
--template-body file://networking-template.yaml

# Verify that stack creation is complete
aws cloudformation wait stack-create-complete \
–stack-name coreos-clair-vpc-stack

# Get stack outputs
aws cloudformation describe-stacks \
--stack-name coreos-clair-vpc-stack \
--query 'Stacks[].Outputs[]'

Build the Clair Docker image

First, create an Amazon Elastic Container Registry (Amazon ECR) repository to host your Clair Docker image. Then, build the Clair Docker image on your workstation and push it to the ECR repository that you created.

# Create the ECR repository
# Note the URI and ARN of the ECR Repository
aws ecr create-repository --repository-name coreos-clair

# Build the Docker image
docker build -t <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/coreos-clair:latest ./coreos-clair

# Push the Docker image to ECR
aws ecr get-login --no-include-email | bash
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/coreos-clair:latest

Deploy Clair using AWS CloudFormation

Now that the Clair Docker image has been built and pushed to ECR, deploy Clair as an ECS service with the Fargate launch type. The following AWS CloudFormation stack creates an ECS cluster named clair-demo-cluster and deploys the Clair service.

# Create the AWS CloudFormation stack
# <ECRRepositoryUri> - CoreOS Clair ECR repository URI without an image tag
# Example - <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/coreos-clair

aws cloudformation create-stack \
--stack-name coreos-clair-stack \
--template-body file://coreos-clair/clair-template.yaml \
--capabilities CAPABILITY_IAM \
--parameters \
ParameterKey="VpcId",ParameterValue="<VpcId>" \
ParameterKey="PublicSubnets",ParameterValue=\"<PublicSubnet01-ID>,<PublicSubnet02-ID>\" \
ParameterKey="PrivateSubnets",ParameterValue=\"<PrivateSubnet01-ID>,<PrivateSubnet02-ID>\" \
ParameterKey="ECRRepositoryUri",ParameterValue="<ECRRepositoryUri>"

# Verify that stack creation is complete
aws cloudformation wait stack-create-complete \
–stack-name coreos-clair-stack

# Get stack outputs
# Note the ClairAlbDnsName
aws cloudformation describe-stacks \
--stack-name coreos-clair-stack \
--query 'Stacks[].Outputs[]'

Deploying the sample website

Deploy a simple static website running on NGINX as a container. An AWS CloudFormation template is included in the sample code that you cloned from GitHub.

Create a CodeCommit repository for the NGINX website

You create an AWS CodeCommit repository to host the sample NGINX website code. This repository is the source of the pipeline that you create later. Before you proceed with the following steps, ensure SSH authentication to CodeCommit.

# Create the CodeCommit repository
# Note the cloneUrlSsh value
aws codecommit create-repository --repository-name my-nginx-website
 
# Clone the empty CodeCommit repository
cd ../
git clone <cloneUrlSsh>

# Copy the contents of nginx-website to my-nginx-website
cp -R aws-codepipeline-docker-vulnerability-scan/nginx-website/ my-nginx-website/

# Commit the changes
cd my-nginx-website/
git add *
git commit -m "Initial commit"
git push

Build the NGINX Docker image

Create an ECR repository to host your NGINX website Docker image. Build the image on your workstation using the file Dockerfile-amznlinux, where Amazon Linux is the parent image. After the image is built, push it to the ECR repository that you created.

# Create an ECR repository
# Note the URI and ARN of the ECR repository
aws ecr create-repository --repository-name nginx-website

# Build the Docker image
docker build -f Dockerfile-amznlinux -t <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/nginx-website:latest .

# Push the Docker image to ECR
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/nginx-website:latest

Deploy the NGINX website using AWS CloudFormation

Now deploy the NGINX website. The following stack deploys the NGINX website onto the same ECS cluster (clair-demo-cluster) as Clair.

# Create the AWS CloudFormation stack
# <ECRRepositoryUri> - Nginx-Website ECR Repository URI without Image tag
# Example: <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/nginx-website

cd ../aws-codepipeline-docker-vulnerability-scan/

aws cloudformation create-stack \
--stack-name nginx-website-stack \
--template-body file://nginx-website/nginx-website-template.yaml \
--capabilities CAPABILITY_IAM \
--parameters \
ParameterKey="VpcId",ParameterValue="<VpcId>" \
ParameterKey="PublicSubnets",ParameterValue=\"<PublicSubnet01-ID>,<PublicSubnet02-ID>\" \
ParameterKey="PrivateSubnets",ParameterValue=\"<PrivateSubnet01-ID>,<PrivateSubnet02-ID>\" \
ParameterKey="ECRRepositoryUri",ParameterValue="<ECRRepositoryUri>"

# Verify that stack creation is complete
aws cloudformation wait stack-create-complete \
–stack-name nginx-website-stack

# Get stack outputs
aws cloudformation describe-stacks \
--stack-name nginx-website-stack \
--query 'Stacks[].Outputs[]'

Note the AWS CloudFormation stack outputs. The stack output contains the Application Load Balancer URL for the NGINX website and the ECS service name of the NGINX website. You need the ECS service name for the pipeline.

Building the pipeline

In this section, you build a pipeline to automate vulnerability scanning for the nginx-website Docker image builds. Every time that a code change is made, the Docker image is rebuilt and scanned for vulnerabilities. Only if vulnerabilities are within the defined threshold is the container is deployed onto ECS. For more information, see Tutorial: Continuous Deployment with AWS CodePipeline.

The sample code includes an AWS CloudFormation template to create the pipeline. The buildspec.yml file is used by AWS CodeBuild to build the nginx-website Docker image and scan the image using Clair.

CodeBuild build spec

build spec is a collection of build commands and related settings, in YAML format, that AWS CodeBuild uses to run a build. You can include a build spec in the root directory of your application source code, or you can define a build spec when you create a build project.

In this sample app, you include the build spec in the root directory of your sample application source code. The buildspec.yml file is located in the /aws-codepipeline-docker-vulnerability-scan/nginx-website folder.

Use Klar, a simple tool to analyze images stored in a private or public Docker registry for security vulnerabilities using Clair. Klar serves as a client which coordinates the image checks between ECR and Clair.

In the buildspec.yml file, you set the variable CLAIR_OUTPUT=Critical. CLAIR_OUTPUT defines the severity level threshold. Vulnerabilities with severity levels higher than or equal to this threshold are outputted. The supported levels are:

  • Unknown
  • Negligible
  • Low
  • Medium
  • High
  • Critical
  • Defcon1

You can configure Klar to your requirements by setting the variables as defined in https://github.com/optiopay/klar.

# Set the following variables as CodeBuild project environment variables
# ECR_REPOSITORY_URI
# CLAIR_URL

version: 0.2
phases:
  pre_build:
    commands:
      - echo Fetching ECR Login
      - ECR_LOGIN=$(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - echo Logging in to Amazon ECR...
      - $ECR_LOGIN
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
      - echo Downloading Clair client Klar-2.1.1
      - wget https://github.com/optiopay/klar/releases/download/v2.1.1/klar-2.1.1-linux-amd64
      - mv ./klar-2.1.1-linux-amd64 ./klar
      - chmod +x ./klar
      - PASSWORD=`echo $ECR_LOGIN | cut -d' ' -f6`
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $ECR_REPOSITORY_URI:latest .
      - docker tag $ECR_REPOSITORY_URI:latest $ECR_REPOSITORY_URI:$IMAGE_TAG
  post_build:
    commands:
      - bash -c "if [ /"$CODEBUILD_BUILD_SUCCEEDING/" == /"0/" ]; then exit 1; fi"
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $ECR_REPOSITORY_URI:latest
      - docker push $ECR_REPOSITORY_URI:$IMAGE_TAG
      - echo Running Clair scan on the Docker Image
      - DOCKER_USER=AWS DOCKER_PASSWORD=${PASSWORD} CLAIR_ADDR=$CLAIR_URL CLAIR_OUTPUT=Critical ./klar $ECR_REPOSITORY_URI
      - echo Writing image definitions file...
      - printf '[{"name":"MyWebsite","imageUri":"%s"}]' $ECR_REPOSITORY_URI:$IMAGE_TAG > imagedefinitions.json
artifacts:
  files: imagedefinitions.json

The build spec does the following:

Pre-build stage:

  • Log in to ECR.
  • Download the Clair client Klar.

Build stage:

  • Build the Docker image and tag it as latest and with the Git commit ID.

Post-build stage:

  • Push the image to your ECR repository with both tags.
  • Trigger Klar to scan the image that you pushed to ECR for security vulnerabilities using Clair.
  • Write a file called imagedefinitions.json in the build root that has your Amazon ECS service’s container name and the image and tag. The deployment stage of your CD pipeline uses this information to create a new revision of your service’s task definition. It then updates the service to use the new task definition. The imagedefinitions.json file is required for the AWS CodeDeploy ECS job worker.

Deploy the pipeline

Deploy the pipeline using the AWS CloudFormation template provided with the sample code. The following template creates the CodeBuild project, CodePipeline pipeline, Amazon CloudWatch Events rule, and necessary IAM permissions.

# Deploy the pipeline
 
# Replace the following variables 
# WebsiteECRRepositoryARN – NGINX website ECR repository ARN
# WebsiteECRRepositoryURI – NGINX website ECR repository URI
# ClairAlbDnsName - Output variable from coreos-clair-stack
# EcsServiceName – Output variable from nginx-website-stack

aws cloudformation create-stack \
--stack-name nginx-website-codepipeline-stack \
--template-body file://clair-codepipeline-template.yaml \
--capabilities CAPABILITY_IAM \
--disable-rollback \
--parameters \
ParameterKey="EcrRepositoryArn",ParameterValue="<WebsiteECRRepositoryARN>" \
ParameterKey="EcrRepositoryUri",ParameterValue="<WebsiteECRRepositoryURI>" \
ParameterKey="ClairAlbDnsName",ParameterValue="<ClairAlbDnsName>" \
ParameterKey="EcsServiceName",ParameterValue="<WebsiteECSServiceName>"

# Verify that stack creation is complete
aws cloudformation wait stack-create-complete \
–stack-name nginx-website-codepipeline-stack

The pipeline is triggered after the AWS CloudFormation stack creation is complete. You can log in to the AWS Management Console to monitor the status of the pipeline. The vulnerability scan information is available in CloudWatch Logs.

You can also modify the CLAIR_OUTPUT value from Critical to High in the buildspec.yml file in the /cores-clair-ecs-cicd/nginx-website-repo folder and then check the status of the build.

Summary

I’ve described how to deploy Clair on AWS and set up a release pipeline for the automated vulnerability scanning of container images. The Clair instance can be used as a centralized Docker image vulnerability scanner and used by other CodeBuild projects. To meet your organization’s security requirements, define your vulnerability threshold in Klar by setting the variables, as defined in https://github.com/optiopay/klar.

Introducing private registry authentication support for AWS Fargate

Post Syndicated from tiffany jernigan (@tiffanyfayj) original https://aws.amazon.com/blogs/compute/introducing-private-registry-authentication-support-for-aws-fargate/

Private registry authentication support for Amazon Elastic Container Service (Amazon ECS) is now available with the AWS Fargate launch type! Now, in addition to Amazon Elastic Container Registry (Amazon ECR), you can use any private registry or repository of your choice for both EC2 and Fargate launch types.

For ECS to pull from a private repository, it needs a secret in AWS Secrets Manager with your registry credentials, an ECS task execution IAM role in AWS Identity Access Management (IAM) with a policy granting access to the secret, and a task with the secret and task execution IAM role ARNs in the task definition.

Diagram of ECS Private Registry Authentication Architecture

Here’s how to use ECS with a private repository on Docker Hub via the AWS Management Console.

Registry

If you don’t already have a private repository (or account), you can create a free repo now. To follow along, run the following commands in a terminal to pull an image, get the image ID, and push it to your new repository:

docker pull tiffanyfay/space
docker images tiffanyfay/space --format {{.ID}}
docker tag <image-id> <your-username/repository-name>:latest
docker login
docker push <your-username/repository-name>

Secrets Manager

In the Secrets Manager console, store a new secret with your Docker Hub credentials, which is used to access your private repository.

By default, Secrets Manager creates an encryption key, DefaultEncryptionKey, on your behalf. You can instead use an existing key or add a new one with AWS Key Management Service (AWS KMS), if you would prefer.

Choose Other type of secrets and add secret keys and values for username and password.

Next, create a name, such as dockerhub, and description for your secret.

Because the keys are corresponding to your Docker Hub credentials, leave rotation disabled.

On the next page, you can review your settings and store your secret. Open your new secret to see the details. Write down the Secret ARN value and keep it handy, as it is used in the next step and later, in your task definition.

IAM

Now that you have a secret, you need to provide Fargate permissions to read it. This is done via a task execution IAM role.

In the IAM console, choose Policies, Create policy. Provide Secrets Manager with read access for secretsmanager:GetSecretValue, with your secret’s ARN as the resource.

Name your policy dockerhubsecret.

If you chose to use your own encryption key, you also need to create a policy with kms:Decrypt permissions for KMS.

Next, choose Role to create an IAM role, which is used as your task execution role. Choose AWS service, Elastic Container Service, and Elastic Container Service Task.

Search for your dockerhubsecret policy and attach it to the role.

Lastly, give the role a name, such as ecsExecutionRoleDockerHub, and create it. Copy the role ARN value. Depending on how you create your task definition, you may need it.

ECS

While the mechanism to authenticate private registries is supported on both EC2 and Fargate launch types, for this example we will be launching a task on Fargate.

Before you can create a task, you need an ECS cluster, VPC, and subnets. If you don’t already have them, in the ECS console, choose Clusters, Get Started. Keep track of the cluster name, VPC ID, and subnet IDs, as you use them soon.

It’s time to create your task definition, which is used to create your task (grouping of up to ten containers that run on the same host). This is where you need your Secrets Manager ARN and IAM role name.

Choose Task Definitions, Create new Task Definition, and select the Fargate launch type. You can then configure your task definition via the wizard or scroll down, choose Configure via JSON and paste the following task definition after replacing fields with angle brackets. This task definition also works with the EC2 launch type.

{
    "family": "space-td",
    "containerDefinitions": [
        {
            "name": "space",
            "image": "<your-username/repository-name>",
            "portMappings": [
                {
                    "protocol": "tcp",
                    "containerPort": 80
                }
            ],
            "cpu": 0,
            "repositoryCredentials": {
                "credentialsParameter": "<secret-ARN>"
            }
        }
    ],
    "memory": "512",
    "cpu": "256",
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "networkMode": "awsvpc",
    "executionRoleArn": "<execution-role-ARN>"
}

If you use the wizard, give your task a name, such as space-td, and specify your task execution IAM role (ecsTaskExecutionRoleDockerHub), a task size of 0.5 GB of memory, and 0.25 vCPU.

Next, choose Container Definitions, Add container. Give the container a name, specify your image <your-username/repository-name>, check the box for private registry authentication, and add your secrets manager ARN and a container port 80. Choose Add.

After you create your task definition, choose Actions, Run Task, and specify the Fargate launch type, your cluster, cluster VPC, subnets, a security group with inbound permissions for your container ports (the default one provides access to port 80). Enable auto-assigning a public IP address.

Open the task from its ID to see the details:

When the Last status field is RUNNING, under Network, copy the public IP address and paste it in a browser.

If you used pushed tiffanyfay/space to your repository, you should see the following:

I hope this post has helped you. If you have any questions, feel free to reach out!

-tiffany

Special thanks to Yuling Zhou, Deepak Dayama, Derek Petersen, Varun Iyer, Adnan Khan and several others for their insights in this blog.

tiffany jernigan

tiffany jernigan

@tiffanyfayj
Tiffany is a developer advocate at Amazon for containers on AWS. Previously she worked at Docker and Intel in software engineering and as a hardware engineer after graduating from Georgia Tech in Electrical Engineering. In the majority of her free time she dabbles in photography and spends time with family and friends. You can find her on twitter/ig as tiffanyfayj.

Hosting ASP.NET Core applications in Amazon ECS using AWS Fargate

Post Syndicated from Sundar Narasiman original https://aws.amazon.com/blogs/compute/hosting-asp-net-core-applications-in-amazon-ecs-using-aws-fargate/

There is an increasing amount of customer interest in hosting microservices-based applications using Amazon Elastic Container Service (ECS), largely due to the benefits offered by AWS Fargate.

AWS Fargate is a compute engine for containers that allows you to run containers without needing to provision, manage, or scale any Amazon EC2 compute infrastructure. Fargate works with Amazon ECS and can run microservices developed in many programming languages or application frameworks. This includes Java, .NET Core, Python, Node.js, Go, or Ruby on Rails. Nowadays, enterprises that are building microservices applications using .NET are using .NET core because of the cross-platform support (the ability to run in Linux).

In this post, I cover how to host a cross-platform ASP.NET core application using AWS Fargate.

Reference architecture

A good reference architecture for AWS Fargate application deployment should cover the VPC, Subnets, Load Balancer, Internet Gateway, Elastic Network Interface (ENI), AWS Fargate Task, Network ACLs, and Security Groups. The architectural choices for VPC Networking, Load Balancing, and Container Networking are also important.

There are a couple of networking approaches for deploying containers in Amazon ECS:

  • Deploy containers in the public VPC Subnet with direct Internet access
  • Deploy containers in the private VPC Subnet without direct Internet access

Because the ASP.NET Core application is going to serve traffic from the Internet, we will deploy containers in the Public VPC Subnet with direct Internet access.

When it comes to sending traffic to containers through the Load Balancer, the following options are available:

  • A public Load Balancer that accepts traffic from the Internet and route it to container through the AWS Fargate Task’s Elastic Network Interface (ENI).
  • A private, Internal Load Balancer that only accepts traffic from other containers in the cluster

Because the ASP.NET Core application container lives in the web tier, go with a public Load Balancer. The public Load Balancer accepts traffic from the Internet and routes it to the container through the AWS Fargate Task’s Elastic Network Interface (ENI).

Based on these considerations, the reference architecture for deploying to AWS Fargate should look like this diagram:

This solution deploys containers in a public Subnet (inside a VPC). The AWS Fargate Task and the two containers are hosted with direct access to the internet. They are also accessible to clients, using the public Load Balancer.

Walkthrough

To implement this architecture, we will do the following:

  1. Containerize the ASP.NET core application.
  2. Configure the reverse-proxy server.
  3. Containerize the NGINX reverse-proxy server.
  4. Create the Docker Compose file.
  5. Push container images to Amazon ECR.
  6. Create the ECS cluster.
  7. Create an Application Load Balancer.
  8. Create an AWS Fargate Task definition.
  9. Create the Amazon ECS service.

Code examples

The code examples, Dockerfile definition, Docker Compose file, and ECS task definition for this solution are available in the amazon-ecs-fargate-aspnetcore GitHub repository.

Pre-requisites

The development environment needs to have the following pre-requisites :-

  • Mac OS latest version (or) Windows 10 with latest updates (or) Ubuntu 16.0.4 or higher
  • .NET core 2.0 or higher
  • Docker latest version
  • aws cli
  • aws-ecs cli

Containerize the ASP.NET Core application

The first step in this journey is to containerize the ASP.NET Core application.

If you are using Visual Studio 2017 or later with the latest updates in Windows, you can add container support to the solution. Open the context (right-click) menu for the existing project and add Docker support.

If you are developing in Linux or Mac OS, you must explicitly add a Dockerfile.

The Dockerfile definition should look like the following, irrespective of the operating system used for development.

FROM microsoft/aspnetcore:2.0
WORKDIR /mymvcweb
COPY bin/Release/netcoreapp2.0/publish . 
ENV ASPNETCORE_URLS http://+:5000
EXPOSE 5000
ENTRYPOINT ["dotnet", "mymvcweb.dll"]

This Dockerfile definition creates an application container based on the microsoft/aspnetcore:2.0 base image. It publishes the contents of the bin/Release folder to a specified work directory, starts the default Kestrel web server and listens on port 5000 to serve web traffic.

By default, ASP.NET core uses Kestrel as the web server. Kestrel is a lightweight HTTP server and is great for serving dynamic content from ASP.NET core. However, for capabilities such as serving static content, caching requests, compressing requests, and terminating SSL from the HTTP server, a dedicated reverse-proxy server like NGINX is required.

Configure the reverse-proxy server

NGINX can act as both the HTTP and reverse-proxy server. NGINX is highly adopted because of its asynchronous, event-driven architecture that allows it to serve thousands of concurrent requests with a low-memory footprint.

In this solution, deploy a NGINX (reverse-proxy server) container in front of the application (ASP.NET core) container, defined in the AWS Fargate Task.

The reverse-proxy configuration file nginx.conf should be defined as follows:

worker_processes 4;
 
events { worker_connections 1024; }
 
http {
    sendfile on;
 
    upstream app_servers {
        server 127.0.0.1:5000;
    }
 
    server {
        listen 80;
 
        location / {
            proxy_pass         http://app_servers;
            proxy_redirect     off;
            proxy_set_header   Host $host;
            proxy_set_header   X-Real-IP $remote_addr;
            proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header   X-Forwarded-Host $server_name;
        }
    }
}

The NGINX container is set to listen on port 80 and it is configured to forward the request to the application container listening on port 5000. The attribute upstream app_server in the nginx.conf file must be set with a value of mymvcweb:5000 in the local development environment.

Containerize the NGINX reverse-proxy server

Create a Dockerfile definition like the following to containerize the NGINX reverse-proxy server. It should look like the following:

FROM nginx
COPY nginx.conf /etc/nginx/nginx.conf

Create the Docker Compose file

Next, use docker-compose to define these two containers as a microservices in the local development environment. The Docker Compose file should look like the following:

version: '2'
services:
  mymvcweb:
    build:
      context: ./mymvcweb
      dockerfile: Dockerfile
    expose:
      - "5000"
  reverseproxy:
    build:
      context: ./reverseproxy
      dockerfile: Dockerfile
    ports:
      - "80:80"
    links :
      - mymvcweb

These two containers can be built and tested by issuing the following docker-compose commands:

docker-compose build
docker-compose up

Open http://localhost:80 in the browser and it should render the default view of ‘index. cshtml’. Whenever there is a change to the application code or container definition, the docker-compose cache should be cleaned to affect the latest changes. To do this, run the following docker-compose commands:

docker-compose stop
docker-compose rm
docker-compose rmi ‘containerimageid’

Push container images to Amazon ECR

Next, push the container images from the local environment to Amazon Elastic Container Registry (ECR) so that the container images are available in Amazon ECR before the creation of AWS Fargate cluster.

Before you deploy this application to ECS, the upstream app_server attribute in the nginx.conf file must be set with the value of 127.0.0.1:5000. This enables the communication with the upstream application container listening on port 5000.

The first step to push the container images to ECR is to fetch the docker login command with the required security tokens. Run the following command:

aws ecr get-login --no-include-email --region us-east-1

It should return you a Docker login command with a security token. Copy the command and tokens and run it.

The second step is to tag the local container image with the remote ECR repository. Run the following command:

docker tag aspnetcorefargate_mymvcweb:latest <yourawsaccountnumber>.dkr.ecr.us-east-1.amazonaws.com/mymvcweb:latest

The third step is to push the tagged image to the remote ECR registry. Run the following command:

docker push <yourawsaccountnumber>.dkr.ecr.us-east-1.amazonaws.com/mywebmvc:latest

The above steps are repeated for the NGINX container as well. Now you have the container images available in ECR.

Create the Amazon ECS cluster

The Amazon ECS cluster is a logical grouping for AWS Fargate and Amazon ECS tasks. The cluster remains an administrative boundary for running every application.

In the AWS Management Console, Navigate to Create Cluster and select Networking only.
Since we’re going to create and host the Amazon ECS Service with AWS Fargate as the launch type, the notion of the Amazon ECS Cluster becomes a logical boundary. We need not create ECS instances while creating Amazon ECS Cluster, when the launch type is Fargate. Hence, we can create the Fargate cluster with required networking constructs such as VPC and Subnets.

Name the cluster and select Creation of new VPC for this cluster.

Leave the rest of the fields as their default values. You now have a VPC with two public subnets.

Create an Application Load Balancer

Next, create an Application Load Balancer, as defined in the reference architecture. The Application Load Balancer is required to load balance across multiple AWS Fargate tasks.

In the EC2 console, navigate to Create Load Balancer. Name your Load Balancer as aspnetcorefargatealb.

For Scheme, select internet-facing. For IP address type, choose ipv4. The Load Balancer listens on port 80 (HTTP). The Load Balancer’s Security Group should also allow traffic on port 80 (HTTP) from the internet.

While configuring the routing for the Load Balancer, for Target type, choose ip. For Protocol, choose HTTP. For Path, enter / (forward slash).

For more information, see Creating an Application Load Balancer.

Create an AWS Fargate Task definition

The AWS Fargate Task definition is an important resource, acts as a blueprint for the AWS Fargate task. The Task definition defines parameters such as:

  • Container image URL
  • CPU
  • Memory
  • IAM execution role
  • Host port
  • Container port
  • Log configurations
  • Container networking mode
  • Task type
  • Mount point
  • Volume

A Fargate Task is the running instance of Task definition. Each Task represents a microservice. Tasks can be managed and independently scaled using AWS Fargate Service, which is explained in the upcoming sections.

In the console, choose Task Definitions, Create new Task Definition. For more information, see Creating a Task Definition.

Use the following AWS Fargate Task definition, which based on the reference architecture defined for this walkthrough. Replace <awsaccount> with your own account.

{
  "executionRoleArn": "arn:aws:iam::<awsaccount>:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/aspnetcorefargatetask",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 0,
      "environment": [],
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": 1024,
      "volumesFrom": [],
      "image": "<awsaccount>.dkr.ecr.us-east-1.amazonaws.com/reverseproxy: latest",
      "disableNetworking": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "privileged": null,
      "name": "reverseproxy"
    },
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/aspnetcorefargatetask",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 5000,
          "protocol": "tcp",
          "containerPort": 5000
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 0,
      "environment": [],
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": 1024,
      "volumesFrom": [],
      "image": "<awsaccount>.dkr.ecr.us-east-1.amazonaws.com/mymvcweb:latest",
      "disableNetworking": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "privileged": null,
      "name": "mymvcweb"
    }
  ],
  "placementConstraints": [],
  "memory": "2048",
  "taskRoleArn": "arn:aws:iam::<awsaccount>:role/aspnetecstaskroles",
  "compatibilities": [
    "EC2",
    "FARGATE"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:<awsaccount>:task-definition/aspnetcorefargatetask:1",
  "family": "aspnetcorefargatetask",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-ecr-pull"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.task-eni"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.ecr-auth"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "revision": 1,
  "status": "ACTIVE",
  "volumes": []
}

The above Task definition contains two containers, the ASP.NET core and the NGINX reverse-proxy server. Currently, awsvpc is the only networking mode supported for AWS Fargate Tasks. When an AWS Fargate Task is launched, the ECS container network plugin assigns a dedicated Elastic Network Interface (ENI) for the Tasks. This ENI does not share the global default network namespace with ECS instances.

You also specify the subnets for placing tasks across ECS instances. This means that the Subnet Security Group is also applicable to the ENI for the respective Tasks. This enables communication between two AWS Fargate Tasks, or other resources within the VPC. Because of the awsvpc network mode, calls from AWS Fargate Tasks do not go through the eth0 Docker bridge.

Create the Amazon ECS service

The AWS Fargate service is a managed AWS Fargate task. The desired state of the application can be defined using the AWS Fargate service. For more information, see Create a service.

In the console, choose Task Definitions and select the task definition that you just created.

On the Task Definition [name] page, select the revision of the task definition from which to create your service.

Review the task definition, and choose Actions, Create Service. For Launch type, choose FARGATE. Enter values for the rest of the fields:

  • Platform version: LATEST
  • Cluster: aspcorefargatecluster (or the cluster name you chose)
  • Service name: aspcorefargatesvc (or another name of your choice)
  • Number of tasks: 2
  • Minimum healthy percent: 50
  • Maximum percent: 200

On the Configure networking page, select the required VPC and subnets required for running the tasks.

Register the Application Load Balancer (ALB) that you created. The ECS scheduler has built-in intelligence, which makes it seamless to work with Application Load Balancer (ALB).

Then, configure Service Auto Scaling. Even though this is an optional feature, I recommend to enable service-level scaling. It addresses the key tenets of how a microservice should behave at runtime. For more information, see (Optional) Configuring Your Service to Use Service Auto Scaling.

I’m defining minimum number of tasks as 2, desired tasks as 2 and maximum tasks as 3.

Complete the Amazon ECS Service creation.

When the Amazon ECS Task gets placed, the ECS scheduler registers the Task as a target for the Load Balancer.

When the Task is healthy and passes the Load Balancer health checks, it is reflected in the healthy host count.

Access the DNS ‘A’ record of the Load Balancer in the browser. The ASP.NET core application should render successfully.

Conclusion

In this post, we took an existing ASP.NET core application, containerized it, and hosted it in Amazon ECS as a microservice using the AWS Fargate compute engine. AWS Fargate gives you a way to run containers directly without managing any EC2 instances and giving you full control over how the task is defined, including task networking and resources.

If you have questions or suggestions, please comment below.

Sundararajan Narasiman is an AWS Partner Solutions Architect