Tag Archives: Kubernetes

Improve Productivity and Reduce Overhead Expenses with Red Hat OpenShift Dedicated on AWS

Post Syndicated from Ryan Niksch original https://aws.amazon.com/blogs/architecture/improve-productivity-and-reduce-overhead-expenses-with-red-hat-openshift-dedicated-on-aws/

Red Hat OpenShift on AWS helps you develop, deploy, and manage container-based applications across on-premises and cloud environments. A recent case study from Cathay Pacific Airways proved that the use of the Red Hat OpenShift application platform can significantly improve developer productivity and reduce operational overhead by automating infrastructure, application deployment, and scaling. In this post, I explore how the architectural implementation and customization options of Red Hat OpenShift dedicated on AWS can cater to a variety of customer needs.

Red Hat OpenShift is a turn key solution providing  a container runtime, Kubernetes orchestration, container image repositories, pipeline, build process, monitoring, logging, role-based access control, granular policy-based control, and abstractions to simplify functions. Deploying a single turnkey solution, instead of building and integrating a collection of independent solutions or services, allows you to invest more time and effort in building meaningful applications for your business.

In the past, Red Hat OpenShift deployed on Amazon EC2 using an automated provisioning process with an open source solution, like the Red Hat OpenShift on AWS Quick Start. The Red Hat OpenShift Quick Start is an infrastructure as code solution which accelerates customer provisioning of Red Hat OpenShift on AWS. The OpenShift Quick Start adheres to the reference architecture to deploy Red Hat OpenShift on AWS in a resilient, scalable, well-architected manner. This reference architecture sees the control plane as a collection of load balanced master nodes for traffic routing, session state, scheduling, and monitoring. It also contains the application nodes where the customer’s containerized workloads run. This solution allowed customers to get up and running within three hours; however, it did not reduce management overhead because customers were required to monitor and maintain the infrastructure of the Red Hat OpenShift cluster.

Red Hat and AWS listened to customer feedback and created Red Hat OpenShift dedicated, a fully managed OpenShift implementation running exclusively on AWS. This implementation monitors the layers and functions, scales the layers to cater to consumption needs, and addresses operational concerns.

Customers now have access to a platform that helps manage control planes for business-critical solutions, like their developer and operational platforms.

Red Hat OpenShift Dedicated Infrastructure on AWS

You can purchase Red Hat OpenShift dedicated through the Red Hat account team. Red Hat OpenShift dedicated comes in two varieties: the Standard edition and the Cloud Choice edition (bring your own cloud).

redhar-openshit options

Figure 1: Red Hat OpenShift architecture illustrating master and infrastructure nodes spread over three Availability Zones and placed behind elastic load balancers.

Red Hat OpenShift dedicated adheres to the reference architecture defined by AWS and Red Hat. Master and infrastructure layers are spread across three AWS availability zones providing resilience within the OpenShift solution, as well as the underlying infrastructure.

Red Hat OpenShift Dedicated Standard Edition

In the Red Hat OpenShift dedicated standard edition, Red Hat deploys the OpenShift cluster into an AWS account owned and managed by Red Hat. Red Hat provides an aggregated bill for the OpenShift subscription fees, management fees, and AWS billing. This edition is ideal for customers who want everything to be managed for them. The Red Hat site reliability engineering  team (SRE) will monitor and manage healing, scaling, and patching of the cluster.

Red Hat OpenShift Cloud Choice Edition

The cloud choice edition allows customers to create their own AWS account, and then have the Red Hat OpenShift dedicated infrastructure provisioned into their existing account. The Red Hat SRE team provisions the Red Hat OpenShift cluster into the customer owned AWS account and manages the solution via IAM roles.

Figure 2: Red Hat OpenShift Cloud Choice IAM role separation

Red Hat provides billing for the Red Hat OpenShift Cloud Choice subscription and management fees, and AWS provides billing for the AWS resources. Keeping the Red Hat OpenShift infrastructure within your AWS account allows better cost controls.

Red Hat OpenShift Cloud Choice provides visibility into the resources running in your account; which is desirable if you have regulatory and auditing concerns. You can inspect, monitor, and audit resources within the AWS account — taking advantage of the rich AWS service set (AWS CloudTrail, AWS config, AWS CloudWatch, and AWS cost explorer).

You can also take advantage of cost management solutions like AWS organizations and consolidated billing. Customers with multiple business units using AWS can combine the usage across their accounts to share the volume pricing discounts resulting in cost savings for projects, departments, and companies.

Red Hat OpenShift Cloud Choice dedicated cannot be deployed into an account currently hosting other applications and resources. In order to maintain separation of control with the managed service, Red Hat OpenShift Cloud Choice dedicated requires an AWS account dedicated to the managed Red Hat OpenShift solution.

You can take advantage of cost reductions of up to 70% using Reserved Instances, which match the pervasive running instances. This is ideal for the master and infrastructure nodes of the Red Hat OpenShift solutions running in your account. The reference architecture for Red Hat OpenShift on AWS recommends spanning  nodes over three availability zones, which translates to three master instances. The master and infrastructure nodes scale differently; so, there will be three additional instances for the infrastructure nodes. Purchasing reserved instances to offset the costs of the master nodes and the infrastructure nodes can free up funds for your next project.

Interactions

DevOps teams using either edition of Red Hat OpenShift dedicated have a rich console experience providing control over networking between application workloads, storage, and monitoring. Granular drill down consoles enable operations teams to focus on what is most critical to their organization.

Each interface is controlled through granular role-based access control. Teams have visibility of high-level cluster overviews where they are able to see visualizations of the overall health of the cluster; and they have access to more granular overviews of views of hosts, nodes, and containers. Application owners, key stake holders, and operations teams have access to a customizable dashboard displaying the running state. Teams can drill down to the underlying nodes, and further into the PODs and containers, should they wish to explore the status or overall health of the containerized micro services. The cluster-wide event stream provides the same drill down experience to logging events.

The drill down console menu options are illustrated in the screenshots below:

In summary, the partnership of Red Hat and AWS created a fully managed solution which directly answers customer feedback requests for a fully managed application platform running on the availability, scalability, and cost benefits of AWS. The solution allows visibility and control whenever and wherever you need it.

About the author

Ryan Niksch

Ryan Niksch is a Partner Solutions Architect focusing on application platforms, hybrid application solutions, and modernization. Ryan has worn many hats in his life and has a passion for tinkering and a desire to leave everything he touches a little better than when he found it.

Using AWS App Mesh with Fargate

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/using-aws-app-mesh-with-fargate/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS App Mesh is a service mesh, which provides a framework to control and monitor services spanning multiple AWS compute environments. My previous post provided a walkthrough to get you started. In it, I showed deploying a simple microservice application to Amazon ECS and configuring App Mesh to provide traffic control and observability.

In this post, I show more advanced techniques using AWS Fargate as an ECS launch type. I show you how to deploy a specific version of the colorteller service from the previous post. Finally, I move on and explore distributing traffic across other environments, such as Amazon EC2 and Amazon EKS.

I simplified this example for clarity, but in the real world, creating a service mesh that bridges different compute environments becomes useful. Fargate is a compute service for AWS that helps you run containerized tasks using the primitives (the tasks and services) of an ECS application. This lets you work without needing to directly configure and manage EC2 instances.

 

Solution overview

This post assumes that you already have a containerized application running on ECS, but want to shift your workloads to use Fargate.

You deploy a new version of the colorteller service with Fargate, and then begin shifting traffic to it. If all goes well, then you continue to shift more traffic to the new version until it serves 100% of all requests. Use the labels “blue” to represent the original version and “green” to represent the new version. The following diagram shows programmer model of the Color App.

You want to begin shifting traffic over from version 1 (represented by colorteller-blue in the following diagram) over to version 2 (represented by colorteller-green).

In App Mesh, every version of a service is ultimately backed by actual running code somewhere, in this case ECS/Fargate tasks. Each service has its own virtual node representation in the mesh that provides this conduit.

The following diagram shows the App Mesh configuration of the Color App.

 

 

After shifting the traffic, you must physically deploy the application to a compute environment. In this demo, colorteller-blue runs on ECS using the EC2 launch type and colorteller-green runs on ECS using the Fargate launch type. The goal is to test with a portion of traffic going to colorteller-green, ultimately increasing to 100% of traffic going to the new green version.

 

AWS compute model of the Color App.

Prerequisites

Before following along, set up the resources and deploy the Color App as described in the previous walkthrough.

 

Deploy the Fargate app

To get started after you complete your Color App, configure it so that your traffic goes to colorteller-blue for now. The blue color represents version 1 of your colorteller service.

Log into the App Mesh console and navigate to Virtual routers for the mesh. Configure the HTTP route to send 100% of traffic to the colorteller-blue virtual node.

The following screenshot shows routes in the App Mesh console.

Test the service and confirm in AWS X-Ray that the traffic flows through the colorteller-blue as expected with no errors.

The following screenshot shows racing the colorgateway virtual node.

 

Deploy the new colorteller to Fargate

With your original app in place, deploy the send version on Fargate and begin slowly increasing the traffic that it handles rather than the original. The app colorteller-green represents version 2 of the colorteller service. Initially, only send 30% of your traffic to it.

If your monitoring indicates a healthy service, then increase it to 60%, then finally to 100%. In the real world, you might choose more granular increases with automated rollout (and rollback if issues arise), but this demonstration keeps things simple.

You pushed the gateway and colorteller images to ECR (see Deploy Images) in the previous post, and then launched ECS tasks with these images. For this post, launch an ECS task using the Fargate launch type with the same colorteller and envoy images. This sets up the running envoy container as a sidecar for the colorteller container.

You don’t have to manually configure the EC2 instances in a Fargate launch type. Fargate automatically colocates the sidecar on the same physical instance and lifecycle as the primary application container.

To begin deploying the Fargate instance and diverting traffic to it, follow these steps.

 

Step 1: Update the mesh configuration

You can download updated AWS CloudFormation templates located in the repo under walkthroughs/fargate.

This updated mesh configuration adds a new virtual node (colorteller-green-vn). It updates the virtual router (colorteller-vr) for the colorteller virtual service so that it distributes traffic between the blue and green virtual nodes at a 2:1 ratio. That is, the green node receives one-third of the traffic.

$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Step 2: Deploy the green task to Fargate

The fargate-colorteller.sh script creates parameterized template definitions before deploying the fargate-colorteller.yaml CloudFormation template. The change to launch a colorteller task as a Fargate task is in fargate-colorteller-task-def.json.

$ ./fargate-colorteller.sh
...

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-fargate-colorteller
$

 

Verify the Fargate deployment

The ColorApp endpoint is one of the CloudFormation template’s outputs. You can view it in the stack output in the AWS CloudFormation console, or fetch it with the AWS CLI:

$ colorapp=$(aws cloudformation describe-stacks --stack-name=$ENVIRONMENT_NAME-ecs-colorapp --query="Stacks[0
].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp> ].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp
http://DEMO-Publi-YGZIJQXL5U7S-471987363.us-west-2.elb.amazonaws.com

Assign the endpoint to the colorapp environment variable so you can use it for a few curl requests:

$ curl $colorapp/color
{"color":"blue", "stats": {"blue":1}}
$

The 2:1 weight of blue to green provides predictable results. Clear the histogram and run it a few times until you get a green result:

$ curl $colorapp/color/clear
cleared

$ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done

0: {"color":"blue", "stats": {"blue":1}}
1: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"blue", "stats": {"blue":0.67,"green":0.33}}
3: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
4: {"color":"blue", "stats": {"blue":0.6,"green":0.4}}
5: {"color":"gre
en", "stats": {"blue":0.5,"green":0.5}}
6: {"color":"blue", "stats": {"blue":0.57,"green":0.43}}
7: {"color":"blue", "stats": {"blue":0.63,"green":0.38}}
8: {"color":"green", "stats": {"blue":0.56,"green":0.44}}
...
199: {"color":"blue", "stats": {"blue":0.66,"green":0.34}}

This reflects the expected result for a 2:1 ratio. Check everything on your AWS X-Ray console.

The following screenshot shows the X-Ray console map after the initial testing.

The results look good: 100% success, no errors.

You can now increase the rollout of the new (green) version of your service running on Fargate.

Using AWS CloudFormation to manage your stacks lets you keep your configuration under version control and simplifies the process of deploying resources. AWS CloudFormation also gives you the option to update the virtual route in appmesh-colorapp.yaml and deploy the updated mesh configuration by running appmesh-colorapp.sh.

For this post, use the App Mesh console to make the change. Choose Virtual routers for appmesh-mesh, and edit the colorteller-route. Update the HTTP route so colorteller-blue-vn handles 33.3% of the traffic and colorteller-green-vn now handles 66.7%.

Run your simple verification test again:

$ curl $colorapp/color/clear
cleared
fargate $ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done
0: {"color":"green", "stats": {"green":1}}
1: {"color":"blue", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"green", "stats": {"blue":0.33,"green":0.67}}
3: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
4: {"color":"green", "stats": {"blue":0.2,"green":0.8}}
5: {"color":"green", "stats": {"blue":0.17,"green":0.83}}
6: {"color":"blue", "stats": {"blue":0.29,"green":0.71}}
7: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
...
199: {"color":"green", "stats": {"blue":0.32,"green":0.68}}
$

If your results look good, double-check your result in the X-Ray console.

Finally, shift 100% of your traffic over to the new colorteller version using the same App Mesh console. This time, modify the mesh configuration template and redeploy it:

appmesh-colorteller.yaml
  ColorTellerRoute:
    Type: AWS::AppMesh::Route
    DependsOn:
      - ColorTellerVirtualRouter
      - ColorTellerGreenVirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: colorteller-vr
      RouteName: colorteller-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: colorteller-green-vn
                Weight: 1
          Match:
            Prefix: "/"
$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Again, repeat your verification process in both the CLI and X-Ray to confirm that the new version of your service is running successfully.

 

Conclusion

In this walkthrough, I showed you how to roll out an update from version 1 (blue) of the colorteller service to version 2 (green). I demonstrated that App Mesh supports a mesh spanning ECS services that you ran as EC2 tasks and as Fargate tasks.

In my next walkthrough, I will demonstrate that App Mesh handles even uncontainerized services launched directly on EC2 instances. It provides a uniform and powerful way to control and monitor your distributed microservice applications on AWS.

If you have any questions or feedback, feel free to comment below.

Scaling Kubernetes deployments with Amazon CloudWatch metrics

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/

This post is contributed by Kwunhok Chan | Solutions Architect, AWS

 

In an earlier post, AWS introduced Horizontal Pod Autoscaler and Kubernetes Metrics Server support for Amazon Elastic Kubernetes Service. These tools make it easy to scale your Kubernetes workloads managed by EKS in response to built-in metrics like CPU and memory.

However, one common use case for applications running on EKS is the integration with AWS services. For example, you administer an application that processes messages published to an Amazon SQS queue. You want the application to scale according to the number of messages in that queue. The Amazon CloudWatch Metrics Adapter for Kubernetes (k8s-cloudwatch-adapter) helps.

 

Amazon CloudWatch Metrics Adapter for Kubernetes

The k8s-cloudwatch-adapter is an implementation of the Kubernetes Custom Metrics API and External Metrics API with integration for CloudWatch metrics. It allows you to scale your Kubernetes deployment using the Horizontal Pod Autoscaler (HPA) with CloudWatch metrics.

 

Prerequisites

Before starting, you need the following:

 

Getting started

Before using the k8s-cloudwatch-adapter, set up a way to manage IAM credentials to Kubernetes pods. The CloudWatch Metrics Adapter requires the following permissions to access metric data from CloudWatch:

cloudwatch:GetMetricData

Create an IAM policy with the following template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

For demo purposes, I’m granting admin permissions to my Kubernetes worker nodes. Don’t do this in your production environment. To associate IAM roles to your Kubernetes pods, you may want to look at kube2iam or kiam.

If you’re using an EKS cluster, you most likely provisioned it with AWS CloudFormation. The following command uses AWS CloudFormation stacks to update the proper instance policy with the correct permissions:

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
--role-name $(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[0].Parameters[?ParameterKey==`NodeInstanceRoleName`].ParameterValue' | jq -r ".[0]")

 

Make sure to replace ${STACK_NAME} with the nodegroup stack name from the AWS CloudFormation console .

 

You can now deploy the k8s-cloudwatch-adapter to your Kubernetes cluster.

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml

 

This deployment creates a new namespace—custom-metrics—and deploys the necessary ClusterRole, Service Account, and Role Binding values, along with the deployment of the adapter. Use the created custom resource definition (CRD) to define the configuration for the external metrics to retrieve from CloudWatch. The adapter reads the configuration defined in ExternalMetric CRDs and loads its external metrics. That allows you to use HPA to autoscale your Kubernetes pods.

 

Verifying the deployment

Next, query the metrics APIs to see if the adapter is deployed correctly. Run the following command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq.
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
  ]
}

There are no resources from the response because you haven’t registered any metric resources yet.

 

Deploying an Amazon SQS application

Next, deploy a sample SQS application to test out k8s-cloudwatch-adapter. The SQS producer and consumer are provided, together with the YAML files for deploying the consumer, metric configuration, and HPA.

Both the producer and consumer use an SQS queue named helloworld. If it doesn’t exist already, the producer creates this queue.

Deploy the consumer with the following command:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/consumer-deployment.yaml

 

You can verify that the consumer is running with the following command:

$ kubectl get deploy sqs-consumer
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sqs-consumer   1         1         1            0           5s

 

Set up Amazon CloudWatch metric and HPA

Next, create an ExternalMetric resource for the CloudWatch metric. Take note of the Kind value for this resource. This CRD resource tells the adapter how to retrieve metric data from CloudWatch.

You define the query parameters used to retrieve the ApproximateNumberOfMessagesVisible for an SQS queue named helloworld. For details about how metric data queries work, see CloudWatch GetMetricData API.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric:
  metadata:
    name: hello-queue-length
  spec:
    name: hello-queue-length
    resource:
      resource: "deployment"
      queries:
        - id: sqs_helloworld
          metricStat:
            metric:
              namespace: "AWS/SQS"
              metricName: "ApproximateNumberOfMessagesVisible"
              dimensions:
                - name: QueueName
                  value: "helloworld"
            period: 300
            stat: Average
            unit: Count
          returnData: true

 

Create the ExternalMetric resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/externalmetric.yaml

 

Then, set up the HPA for your consumer. Here is the configuration to use:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: hello-queue-length
      targetValue: 30

 

This HPA rule starts scaling out when the number of messages visible in your SQS queue exceeds 30, and scales in when there are fewer than 30 messages in the queue.

Create the HPA resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/hpa.yaml

 

Generate load using a producer

Finally, you can start generating messages to the queue:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/producer-deployment.yaml

On a separate terminal, you can now watch your HPA retrieving the queue length and start scaling the replicas. SQS metrics generate at five-minute intervals, so give the process a few minutes:

$ kubectl get hpa sqs-consumer-scaler -w

 

Clean up

After you complete this experiment, you can delete the Kubernetes deployment and respective resources.

Run the following commands to remove the consumer, external metric, HPA, and SQS queue:

$ kubectl delete deploy sqs-producer
$ kubectl delete hpa sqs-consumer-scaler
$ kubectl delete externalmetric sqs-helloworld-length
$ kubectl delete deploy sqs-consumer

$ aws sqs delete-queue helloworld

 

Other CloudWatch integrations

AWS recently announced the preview for Amazon CloudWatch Container Insights, which monitors, isolates, and diagnoses containerized applications running on EKS and Kubernetes clusters. To get started, see Using Container Insights.

 

Get involved

This project is currently under development. AWS welcomes issues and pull requests, and would love to hear your feedback.

How could this adapter be best implemented to work in your environment? Visit the Amazon CloudWatch Metrics Adapter for Kubernetes project on GitHub and let AWS know what you think.

Learning AWS App Mesh

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/learning-aws-app-mesh/

This post is contributed by Geremy Cohen | Solutions Architect, Strategic Accounts, AWS

At re:Invent 2018, AWS announced AWS App Mesh, a service mesh that provides application-level networking. App Mesh makes it easy for your services to communicate with each other across multiple types of compute infrastructure, including:

App Mesh standardizes how your services communicate, giving you end-to-end visibility and ensuring high availability for your applications. Service meshes like App Mesh help you run and monitor HTTP and TCP services at scale.

Using the open source Envoy proxy, App Mesh gives you access to a wide range of tools from AWS partners and the open source community. Because all traffic in and out of each service goes through the Envoy proxy, all traffic can be routed, shaped, measured, and logged. This extra level of indirection lets you build your services in any language desired without having to use a common set of communication libraries.

In this six-part series of the post, I walk you through setup and configuration of App Mesh for popular platforms and use cases, beginning with EKS. Here’s the list of the parts:

  1. Part 1: Introducing service meshes.
  2. Part 2: Prerequisites for running on EKS.
  3. Part 3: Creating example microservices on Amazon EKS.
  4. Part 4: Installing the sidecar injector and CRDs.
  5. Part 5: Configuring existing microservices.
  6. Part 6: Deploying with the canary technique.

Overview

Throughout the post series, I use diagrams to help describe what’s being built. In the following diagram:

  • The circle represents the container in which your app (microservice) code runs.
  • The dome alongside the circle represents the App Mesh (Envoy) proxy running as a sidecar container. When there is no dome present, no service mesh functionality is implemented for the pod.
  • The arrows show communications traffic between the application container and the proxy, as well as between the proxy and other pods.

PART 1: Introducing service meshes

Life without a service mesh

Best practices call for implementing observability, analytics, and routing capabilities across your microservice infrastructure in a consistent manner.

Between any two interacting services, it’s critical to implement logging, tracing, and metrics gathering—not to mention dynamic routing and load balancing—with minimal impact to your actual application code.

Traditionally, to provide these capabilities, you would compile each service with one or more SDKs that provided this logic. This is known as the “in-process design pattern,” because this logic runs in the same process as the service code.

When you only run a small number of services, running multiple SDKs alongside your application code may not be a huge undertaking. If you can find SDKs that provide the required functionality on the platforms and languages on which you are developing, compiling it into your service code is relatively straightforward.

As your application matures, the in-process design pattern becomes increasingly complex:

  • The number of engineers writing code grows, so each engineer must learn the in-process SDKs in use. They must also spend time integrating the SDKs with their own service logic and the service logic of others.
  • In shops where polyglot development is prevalent, as the number of engineers grow, so may the number of coding languages in use. In these scenarios, you’ll need to make sure that your SDKs are supported on these new languages.
  • The platforms that your engineering teams deploy services to may also increase and become disparate. You may have begun with Node.js containers on Kubernetes, but now, new microservices are being deployed with AWS Lambda, EC2, and other managed services. You’ll need to make sure that the SDK solution that you’ve chosen is compatible with these common platforms.
  • If you’re fortunate to have platform and language support for the SDKs you’re using, inconsistencies across the various SDK languages may creep in. This is especially true when you find a gap in language or platform support and implement custom operational logic for a language or platform that is unsupported.
  • Assuming you’ve accommodated for all the previous caveats, by using SDKs compiled into your service logic, you’re tightly coupling your business logic with your operations logic.

 

Enter the service mesh

Considering the increasing complexity as your application matures, the true value of service meshes becomes clear. With a service mesh, you can decouple your microservices’ observability, analytics, and routing logic from the underlying infrastructure and application layers.

The following diagram combines the previous two. Instead of incorporating these features at the code level (in-process), an out-of-process “sidecar proxy” container (represented by the pink dome) runs alongside your application code’s container in each pod.

 

In this model, consistent and decoupled analytics, logging, tracing, and routing logic capabilities are running alongside each microservice in your infrastructure as a sidecar proxy. Each sidecar proxy is configured by a unique configuration ruleset, based on the services it’s responsible for proxying. With 100% of the communications between pods and services proxied, 100% of the traffic is now observable and actionable.

 

App Mesh as the service mesh

App Mesh implements this sidecar proxy via the production-proven Envoy proxy. Envoy is arguably the most popular open-source service proxy. Created at Lyft in 2016, Envoy is a stable OSS project with wide community support. It’s defined as a “Graduated Project” by the Cloud Native Computing Foundation (CNCF). Envoy is a popular proxy solution due to its lightweight C++-based design, scalable architecture, and successful deployment record.

In the following diagram, a sidecar runs alongside each container in your application to provide its proxying logic, syncing each of their unique configurations from the App Mesh control plane.

Each one of these proxies must have its own unique configuration ruleset pushed to it to operate correctly. To achieve this, DevOps teams can push their intended ruleset configuration to the App Mesh API. From there, the App Mesh control plane reliably keeps all proxy instances up-to-date with their desired configurations. App Mesh dynamically scales to hundreds of thousands of pods, tasks, EC2 instances, and Lambda functions, adjusting configuration changes accordingly as instances scale up, down, and restart.

 

App Mesh components

App Mesh is made up of the following components:

  • Service mesh: A logical boundary for network traffic between the services that reside within it.
  • Virtual nodes: A logical pointer to a Kubernetes service, or an App Mesh virtual service.
  • Virtual routers: Handles traffic for one or more virtual services within your mesh.
  • Routes: Associated with a virtual router, it directs traffic that matches a service name prefix to one or more virtual nodes.
  • Virtual services: An abstraction of a real service that is either provided by a virtual node directly, or indirectly by means of a virtual router.
  • App Mesh sidecar: The App Mesh sidecar container configures your pods to use the App Mesh service mesh traffic rules set up for your virtual routers and virtual nodes.
  • App Mesh injector: Makes it easy to auto-inject the App Mesh sidecars into your pods.
  • App Mesh custom resource definitions: (CRD) Provided to implement App Mesh CRUD and configuration operations directly from the kubectl CLI.  Alternatively, you may use the latest version of the AWS CLI.

 

In the following parts, I walk you through the setup and configuration of each of these components.

 

Conclusion of Part 1

In this first part, I discussed in detail the advantages that service meshes provide, and the specific components that make up the App Mesh service mesh. I hope the information provided helps you to understand the benefit of all services meshes, regardless of vendor.

If you’re intrigued by what you’ve learned so far, don’t stop now!

For even more background on the components of AWS App Mesh, check out the official AWS App Mesh documentation, and when you’re ready, check out part 2 in this post where I guide you through completing the prerequisite steps to run App Mesh in your own environment.

 

 

PART 2: Setting up AWS App Mesh on Amazon EKS

 

In part 1 of this series, I discussed the functionality of service meshes like AWS App Mesh provided on Kubernetes and other services. In this post, I walk you through completing the prerequisites required to install and run App Mesh in your own Amazon EKS-based Kubernetes environment.

When you have the environment set up, be sure to leave it intact if you plan on experimenting in the future with App Mesh on your own (or throughout this series of posts).

 

Prerequisites

To run App Mesh, your environment must meet the following requirements.

  • An AWS account
  • The AWS CLI installed and configured
    • The minimal version supported is 1.16.133. You should have a Region set via the aws configure command. For this tutorial, it should work against all Regions where App Mesh and Amazon EKS are supported. Use us-west-2 if you don’t have a preference or are in doubt:
      aws configure set region us-west-2
  • The jq utility
    • The utility is required by scripts executed in this series. Make sure that you have it installed on the machine from which to run the tutorial steps.
  • Kubernetes and kubectl
    • The minimal Kubernetes and kubectl versions supported are 1.11. You need a Kubernetes cluster deployed on Amazon Elastic Compute Cloud (Amazon EC2) or on an Amazon EKS cluster. Although the steps in this tutorial demonstrate using App Mesh on Amazon EKS, the instructions also work on upstream k8s running on Amazon EC2.

Amazon EKS makes it easy to run Kubernetes on AWS. Start by creating an EKS cluster using eksctl.  For more information about how to use eksctl to spin up an EKS cluster for this exercise, see eksworkshop.com. That site has a great tutorial for getting up and running quickly with an account, as well as an EKS cluster.

 

Clone the tutorial repository

Clone the tutorial’s repository by issuing the following command in a directory of your choice:

git clone https://github.com/aws/aws-app-mesh-examples

Next, navigate to the repo’s /djapp examples directory:

cd aws-app-mesh-examples/examples/apps/djapp/

All the steps in this tutorial are executed out of this directory.

 

IAM permissions for the user and k8s worker nodes

Both k8s worker nodes and any principals (including yourself) running App Mesh AWS CLI commands must have the proper permissions to access the App Mesh service, as shown in the following code example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "appmesh:DescribeMesh",
                "appmesh:DescribeVirtualNode",
                "appmesh:DescribeVirtualService",
                "appmesh:DescribeVirtualRouter",
                "appmesh:DescribeRoute",
                "appmesh:CreateMesh",
                "appmesh:CreateVirtualNode",
                "appmesh:CreateVirtualService",
                "appmesh:CreateVirtualRouter",
                "appmesh:CreateRoute",
                "appmesh:UpdateMesh",
                "appmesh:UpdateVirtualNode",
                "appmesh:UpdateVirtualService",
                "appmesh:UpdateVirtualRouter",
                "appmesh:UpdateRoute",
                "appmesh:ListMeshes",
                "appmesh:ListVirtualNodes",
                "appmesh:ListVirtualServices",
                "appmesh:ListVirtualRouters",
                "appmesh:ListRoutes",
                "appmesh:DeleteMesh",
                "appmesh:DeleteVirtualNode",
                "appmesh:DeleteVirtualService",
                "appmesh:DeleteVirtualRouter",
                "appmesh:DeleteRoute"
            ],
            "Resource": "*"
        }
    ]
}

To provide users with the correct permissions, add the previous policy to the user’s role or group, or create it as an inline policy.

To verify as a user that you have the correct permissions set for App Mesh, issue the following command:

aws appmesh list-meshes

If you have the proper permissions and haven’t yet created a mesh, you should get back an empty response like the following. If you did have a mesh created, you get a slightly more verbose response.

{
"meshes": []
}

If you do not have the proper permissions, you’ll see a response similar to the following:

An error occurred (AccessDeniedException) when calling the ListMeshes operation: User: arn:aws:iam::123abc:user/foo is not authorized to perform: appmesh:ListMeshes on resource: *

As a user, these permissions (or even the Administrator Access role) enable you to complete this tutorial, but it’s critical to implement least-privileged access for production or internet-facing deployments.

 

Adding the permissions for EKS worker nodes

If you’re using an Amazon EKS-based cluster to follow this tutorial (suggested), you can easily add the previous permissions to your k8s worker nodes with the following steps.

First, get the role under which your k8s workers are running:

INSTANCE_PROFILE_NAME=$(aws iam list-instance-profiles | jq -r '.InstanceProfiles[].InstanceProfileName' | grep nodegroup)
ROLE_NAME=$(aws iam get-instance-profile --instance-profile-name $INSTANCE_PROFILE_NAME | jq -r '.InstanceProfile.Roles[] | .RoleName')
echo $ROLE_NAME

Upon running that command, the $ROLE_NAME environment variable should be output similar to:

eksctl-blog-nodegroup-ng-1234-NodeInstanceRole-abc123

Copy and paste the following code to add the permissions as an inline policy to your worker node instances:

cat << EoF > k8s-appmesh-worker-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "appmesh:DescribeMesh",
        "appmesh:DescribeVirtualNode",
        "appmesh:DescribeVirtualService",
        "appmesh:DescribeVirtualRouter",
        "appmesh:DescribeRoute",
        "appmesh:CreateMesh",
        "appmesh:CreateVirtualNode",
        "appmesh:CreateVirtualService",
        "appmesh:CreateVirtualRouter",
        "appmesh:CreateRoute",
        "appmesh:UpdateMesh",
        "appmesh:UpdateVirtualNode",
        "appmesh:UpdateVirtualService",
        "appmesh:UpdateVirtualRouter",
        "appmesh:UpdateRoute",
        "appmesh:ListMeshes",
        "appmesh:ListVirtualNodes",
        "appmesh:ListVirtualServices",
        "appmesh:ListVirtualRouters",
        "appmesh:ListRoutes",
        "appmesh:DeleteMesh",
        "appmesh:DeleteVirtualNode",
        "appmesh:DeleteVirtualService",
        "appmesh:DeleteVirtualRouter",
        "appmesh:DeleteRoute"
  ],
      "Resource": "*"
    }
  ]
}
EoF

aws iam put-role-policy --role-name $ROLE_NAME --policy-name AppMesh-Policy-For-Worker --policy-document file://k8s-appmesh-worker-policy.json

To verify that the policy was attached to the role, run the following command:

aws iam get-role-policy --role-name $ROLE_NAME --policy-name AppMesh-Policy-For-Worker

To test that your worker nodes are able to use these permissions correctly, run the following job from the project’s directory.

NOTE: The following YAML is configured for the us-west-2 Region. If you are running your cluster and App Mesh out of a different Region, modify the –region value found in the command attribute (not in the image attribute) in the YAML before proceeding, as shown below:

command: ["aws","appmesh","list-meshes","—region","us-west-2"]

Execute the job by running the following command:

kubectl apply -f awscli.yaml

Make sure that the job is completed by issuing the command:

kubectl get jobs

You should see that the desired and successful values are both one:

NAME     DESIRED   SUCCESSFUL   AGE
awscli   1         1            1m

Inspect the output of the job:

kubectl logs jobs/awscli

Similar to the list-meshes call, the output of this command shows whether your nodes can make App Mesh API calls successfully.

This output shows that the workers have proper access:

{
"meshes": []
}

While this output shows that they don’t:

An error occurred (AccessDeniedException) when calling the ListMeshes operation: User: arn:aws:iam::123abc:user/foo is not authorized to perform: appmesh:ListMeshes on resource: *

If you have to troubleshoot further, you must first delete the job before you run it again to test it:

kubectl delete jobs/awscli

After you’ve verified that you have the proper permissions set, you are ready to move forward and understand more about the demo application you’re going to build on top of App Mesh.

 

Cleaning up

When you’re done experimenting and want to delete all the resources created during this series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ App and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own or throughout this series of posts.

 

Conclusion of Part 2

In this second part of the series, I walked you through the prerequisites required to install and run App Mesh in an Amazon EKS-based Kubernetes environment. In part 3 , I show you how to create a simple microservice that can be implemented on an App Mesh service mesh.

 

 

PART 3: Creating example microservices on Amazon EKS

 

In part 2 of this series, I walked you through completing the setup steps needed to configure your environment to run AWS App Mesh. In this post, I walk you through creating three Amazon EKS-based microservices. These microservices work together to form an app called DJ App, which you use later to demonstrate App Mesh functionality.

 

Prerequisites

Make sure that you’ve completed parts 1 and 2 of this series before running through the steps in this post.

 

Overview of DJ App

I’ll now walk you through creating an example app on App Mesh called DJ App, which is used for a cloud-based music service. This application is composed of the following three microservices:

  • dj
  • metal-v1
  • jazz-v1

The dj service makes requests to either the jazz or metal backends for artist lists. If the dj service requests from the jazz backend, then musical artists such as Miles Davis or Astrud Gilberto are returned. Requests made to the metal backend return artists such as Judas Priest or Megadeth.

Today, the dj service is hardwired to make requests to the metal-v1 service for metal requests and to the jazz-v1 service for jazz requests. Each time there is a new metal or jazz release, a new version of dj must also be rolled out to point to its new upstream endpoints. Although it works for now, it’s not an optimal configuration to maintain for the long term.

App Mesh can be used to simplify this architecture. By virtualizing the metal and jazz service via kubectl or the AWS CLI, routing changes can be made dynamically to the endpoints and versions of your choosing. That minimizes the need for the complete re-deployment of DJ App each time there is a new metal or jazz service release.

 

Create the initial architecture

To begin, I’ll walk you through creating the initial application architecture. As the following diagram depicts, in the initial architecture, there are three k8s services:

  • The dj service, which serves as the DJ App entrypoint
  • The metal-v1 service backend
  • The jazz-v1 service backend

As depicted by the arrows, the dj service will make requests to either the metal-v1, or jazz-v1 backends.

First, deploy the k8s components that make up this initial architecture. To keep things organized, create a namespace for the app called prod, and deploy all of the DJ App components into that namespace. To create the prod namespace, issue the following command:

kubectl apply -f 1_create_the_initial_architecture/1_prod_ns.yaml

The output should be similar to the following:

namespace/prod created

Now that you’ve created the prod namespace, deploy the DJ App (the dj, metal, and jazz microservices) into it. Create the DJ App deployment in the prod namespace by issuing the following command:

kubectl apply -nprod -f 1_create_the_initial_architecture/1_initial_architecture_deployment.yaml

The output should be similar to:

deployment.apps "dj" created
deployment.apps "metal-v1" created
deployment.apps "jazz-v1" created

Create the services that front these deployments by issuing the following command:

kubectl apply -nprod -f 1_create_the_initial_architecture/1_initial_architecture_services.yaml

The output should be similar to:

service "dj" created
service "metal-v1" created
service "jazz-v1" created

Now, verify that everything has been set up correctly by getting all resources from the prod namespace. Issue this command:

kubectl get all -nprod

The output should display the dj, jazz, and metal pods, and the services, deployments, and replica sets, similar to the following:

NAME                            READY   STATUS    RESTARTS   AGE
pod/dj-5b445fbdf4-qf8sv         1/1     Running   0          1m
pod/jazz-v1-644856f4b4-mshnr    1/1     Running   0          1m
pod/metal-v1-84bffcc887-97qzw   1/1     Running   0          1m

NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/dj         ClusterIP   10.100.247.180   <none>        9080/TCP   15s
service/jazz-v1    ClusterIP   10.100.157.174   <none>        9080/TCP   15s
service/metal-v1   ClusterIP   10.100.187.186   <none>        9080/TCP   15s

NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dj         1         1         1            1           1m
deployment.apps/jazz-v1    1         1         1            1           1m
deployment.apps/metal-v1   1         1         1            1           1m

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/dj-5b445fbdf4         1         1         1       1m
replicaset.apps/jazz-v1-644856f4b4    1         1         1       1m
replicaset.apps/metal-v1-84bffcc887   1         1         1       1m

When you’ve verified that all resources have been created correctly in the prod namespace, test out this initial version of DJ App. To do that, exec into the DJ pod, and issue a curl request out to the jazz-v1 and metal-v1 backends. Get the name of the DJ pod by listing all the pods with the dj app selector:

kubectl get pods -nprod -l app=dj

The output should be similar to:

NAME                  READY     STATUS    RESTARTS   AGE
dj-5b445fbdf4-8xkwp   1/1       Running   0          32s

Next, exec into the DJ pod:

kubectl exec -nprod -it <your-dj-pod-name> bash

The output should be similar to:

[email protected]:/usr/src/app#

Now that you have a root prompt into the DJ pod, issue a curl request to the jazz-v1 backend service:

curl jazz-v1.prod.svc.cluster.local:9080;echo

The output should be similar to:

["Astrud Gilberto","Miles Davis"]

Try it again, but this time issue the command to the metal-v1.prod.svc.cluster.local backend on port 9080:

curl metal-v1.prod.svc.cluster.local:9080;echo

You should get a list of heavy metal bands:

["Megadeth","Judas Priest"]

When you’re done exploring this vast world of music, press CTRL-D, or type exit to exit the container’s shell:

[email protected]:/usr/src/app# exit
command terminated with exit code 1
$

Congratulations on deploying the initial DJ App architecture!

 

Cleaning up

When you’re done experimenting and want to delete all the resources created during this series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ app and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own or throughout this series of posts.

 

Conclusion of Part 3

In this third part of the series, I demonstrated how to create three simple Kubernetes-based microservices, which working together, form an app called DJ App. This app is later used to demonstrate App Mesh functionality.

In part 4, I show you how to install the App Mesh sidecar injector and CRDs, which make defining and configuring App Mesh components easy.

 

 

PART 4: Installing the sidecar injector and CRDs

 

In part 3 of this series, I walked you through setting up a basic microservices-based application called DJ App on Kubernetes with Amazon EKS. In this post, I demonstrate how to set up and configure the AWS App Mesh sidecar injector and custom resource definitions (CRDs).  As you will see later, the sidecar injector and CRD components make defining and configuring DJ App’s service mesh more convenient.

 

Prerequisites

Make sure that you’ve completed parts 1–3 of this series before running through the steps in this post.

 

Installing the App Mesh sidecar

As decoupled logic, an App Mesh sidecar container must run alongside each pod in the DJ App deployment. This can be set up in few different ways:

  1. Before installing the deployment, you could modify the DJ App deployment’s container specs to include App Mesh sidecar containers. When the app is deployed, it would run the sidecar.
  2. After installing the deployment, you could patch the deployment to include the sidecar container specs. Upon applying this patch, the old pods are torn down, and the new pods come up with the sidecar.
  3. You can implement the App Mesh injector controller, which watches for new pods to be created and automatically adds the sidecar data to the pods as they are deployed.

For this tutorial, I walk you through the App Mesh injector controller option, as it enables subsequent pod deployments to automatically come up with the App Mesh sidecar. This is not only quicker in the long run, but it also reduces the chances of typos that manual editing may introduce.

 

Creating the injector controller

To create the injector controller, run a script that creates a namespace, generates certificates, and then installs the injector deployment.

From the base repository directory, change to the injector directory:

cd 2_create_injector

Next, run the create.sh script:

./create.sh

The output should look similar to the following:

namespace/appmesh-inject created
creating certs in tmpdir /var/folders/02/qfw6pbm501xbw4scnk20w80h0_xvht/T/tmp.LFO95khQ
Generating RSA private key, 2048 bit long modulus
.........+++
..............................+++
e is 65537 (0x10001)
certificatesigningrequest.certificates.k8s.io/aws-app-mesh-inject.appmesh-inject created
NAME                                 AGE   REQUESTOR          CONDITION
aws-app-mesh-inject.appmesh-inject   0s    kubernetes-admin   Pending
certificatesigningrequest.certificates.k8s.io/aws-app-mesh-inject.appmesh-inject approved
secret/aws-app-mesh-inject created

processing templates
Created injector manifest at:/2_create_injector/inject.yaml

serviceaccount/aws-app-mesh-inject-sa created
clusterrole.rbac.authorization.k8s.io/aws-app-mesh-inject-cr unchanged
clusterrolebinding.rbac.authorization.k8s.io/aws-app-mesh-inject-binding configured
service/aws-app-mesh-inject created
deployment.apps/aws-app-mesh-inject created
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-app-mesh-inject unchanged

Waiting for pods to come up...

App Inject Pods and Services After Install:

NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
aws-app-mesh-inject   ClusterIP   10.100.165.254   <none>        443/TCP   16s
NAME                                   READY   STATUS    RESTARTS   AGE
aws-app-mesh-inject-5d84d8c96f-gc6bl   1/1     Running   0          16s

If you’re seeing this output, the injector controller has been installed correctly. By default, the injector doesn’t act on any pods—you must give it the criteria on what to act on. For the purpose of this tutorial, you’ll next configure it to inject the App Mesh sidecar into any new pods created in the prod namespace.

Return to the repo’s base directory:

cd ..

Run the following command to label the prod namespace:

kubectl label namespace prod appmesh.k8s.aws/sidecarInjectorWebhook=enabled

The output should be similar to the following:

namespace/prod labeled

Next, verify that the injector controller is running:

kubectl get pods -nappmesh-inject

You should see output similar to the following:

NAME                                   READY   STATUS    RESTARTS   AGE
aws-app-mesh-inject-78c59cc699-9jrb4   1/1     Running   0          1h

With the injector portion of the setup complete, I’ll now show you how to create the App Mesh components.

 

Choosing a way to create the App Mesh components

There are two ways to create the components of the App Mesh service mesh:

For this tutorial, I show you how to use kubectl to define the App Mesh components.  To do this, add the CRDs and the App Mesh controller logic that syncs your Kubernetes cluster’s CRD state with the AWS Cloud App Mesh control plane.

 

Adding the CRDs and App Mesh controller

To add the CRDs, run the following commands from the repository base directory:

kubectl apply -f 3_add_crds/mesh-definition.yaml
kubectl apply -f 3_add_crds/virtual-node-definition.yaml
kubectl apply -f 3_add_crds/virtual-service-definition.yaml

The output should be similar to the following:

customresourcedefinition.apiextensions.k8s.io/meshes.appmesh.k8s.aws created
customresourcedefinition.apiextensions.k8s.io/virtualnodes.appmesh.k8s.aws created
customresourcedefinition.apiextensions.k8s.io/virtualservices.appmesh.k8s.aws created

Next, add the controller by executing the following command:

kubectl apply -f 3_add_crds/controller-deployment.yaml

The output should be similar to the following:

namespace/appmesh-system created
deployment.apps/app-mesh-controller created
serviceaccount/app-mesh-sa created
clusterrole.rbac.authorization.k8s.io/app-mesh-controller created
clusterrolebinding.rbac.authorization.k8s.io/app-mesh-controller-binding created

Run the following command to verify that the App Mesh controller is running:

kubectl get pods -nappmesh-system

You should see output similar to the following:

NAME                                   READY   STATUS    RESTARTS   AGE
app-mesh-controller-85f9d4b48f-j9vz4   1/1     Running   0          7m

NOTE: The CRD and injector are AWS-supported open source projects. If you plan to deploy the CRD or injector for production projects, always build them from the latest AWS GitHub repos and deploy them from your own container registry. That way, you stay up-to-date on the latest features and bug fixes.

 

Cleaning up

When you’re done experimenting and want to delete all the resources created during this series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ app and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own or throughout this series of posts.

 

Conclusion of Part 4

In this fourth part of the series, I walked you through setting up the App Mesh sidecar injector and CRD components. In part 5, I show you how to define the App Mesh components required to run DJ App on a service mesh.

 

 

PART 5: Configuring existing microservices

 

In part 4 of this series, I demonstrated how to set up the AWS App Mesh Sidecar Injector and CRDs. In this post, I’ll show how to configure the DJ App microservices to run on top of App Mesh by creating the required App Mesh components.

 

Prerequisites

Make sure that you’ve completed parts 1–4 of this series before running through the steps in this post.

 

DJ App revisited

As shown in the following diagram, the dj service is hardwired to make requests to either the metal-v1 or jazz-v1 backends.

The service mesh-enabled version functionally does exactly what the current version does. The only difference is that you use App Mesh to create two new virtual services called metal and jazz. The dj service now makes a request to these metal or jazz virtual services, which route to their metal-v1 and jazz-v1 counterparts accordingly, based on the virtual services’ routing rules. The following diagram depicts this process.

By virtualizing the metal and jazz services, you can dynamically configure routing rules to the versioned backends of your choosing. That eliminates the need to re-deploy the entire DJ App each time there’s a new metal or jazz service version release.

 

Now that you have a better idea of what you’re building, I’ll show you how to create the mesh.

 

Creating the mesh

The mesh component, which serves as the App Mesh foundation, must be created first. Call the mesh dj-app, and define it in the prod namespace by executing the following command from the repository’s base directory:

kubectl create -f 4_create_initial_mesh_components/mesh.yaml

You should see output similar to the following:

mesh.appmesh.k8s.aws/dj-app created

Because an App Mesh mesh is a custom resource, kubectl can be used to view it using the get command. Run the following command:

kubectl get meshes -nprod

This yields the following:

NAME     AGE
dj-app   1h

As is the case for any of the custom resources you interact with in this tutorial, you can also view App Mesh resources using the AWS CLI:

aws appmesh list-meshes

{
    "meshes": [
        {
            "meshName": "dj-app",
            "arn": "arn:aws:appmesh:us-west-2:123586676:mesh/dj-app"
        }
    ]
}

aws appmesh describe-mesh --mesh-name dj-app

{
    "mesh": {
        "status": {
            "status": "ACTIVE"
        },
        "meshName": "dj-app",
        "metadata": {
            "version": 1,
            "lastUpdatedAt": 1553233281.819,
            "createdAt": 1553233281.819,
            "arn": "arn:aws:appmesh:us-west-2:123586676:mesh/dj-app",
            "uid": "10d86ae0-ece7-4b1d-bc2d-08064d9b55e1"
        }
    }
}

NOTE: If you do not see dj-app returned from the previous list-meshes command, then your user account (as well as your worker nodes) may not have the correct IAM permissions to access App Mesh resources. Verify that you and your worker nodes have the correct permissions per part 2 of this series.

 

Creating the virtual nodes and virtual services

With the foundational mesh component created, continue onward to define the App Mesh virtual node and virtual service components. All physical Kubernetes services that interact with each other in App Mesh must first be defined as virtual node objects.

Abstracting out services as virtual nodes helps App Mesh build rulesets around inter-service communication. In addition, as you define virtual service objects, virtual nodes may be referenced as inputs and target endpoints for those virtual services. Because of this, it makes sense to define the virtual nodes first.

Based on the first App Mesh-enabled architecture, the physical service dj makes requests to two new virtual services—metal and jazz. These services route requests respectively to the physical services metal-v1 and jazz-v1, as shown in the following diagram.

Because there are three physical services involved in this configuration, you’ll need to define three virtual nodes. To do that, enter the following:

kubectl create -nprod -f 4_create_initial_mesh_components/nodes_representing_physical_services.yaml

The output should be similar to:

virtualnode.appmesh.k8s.aws/dj created
virtualnode.appmesh.k8s.aws/jazz-v1 created
virtualnode.appmesh.k8s.aws/metal-v1 created

If you open up the YAML in your favorite editor, you may notice a few things about these virtual nodes.

They’re both similar, but for the purposes of this tutorial, examine just the metal-v1.prod.svc.cluster.local VirtualNode:

apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualNode
metadata:
  name: metal-v1
  namespace: prod
spec:
  meshName: dj-app
  listeners:
    - portMapping:
        port: 9080
        protocol: http
  serviceDiscovery:
    dns:
      hostName: metal-v1.prod.svc.cluster.local

...

According to this YAML, this virtual node points to a service (spec.serviceDiscovery.dns.hostName: metal-v1.prod.svc.cluster.local) that listens on a given port for requests (spec.listeners.portMapping.port: 9080).

You may notice that jazz-v1 and metal-v1 are similar to the dj virtual node, with one key difference; the dj virtual node contains a backend attribute:

apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualNode
metadata:
  name: dj
  namespace: prod
spec:
  meshName: dj-app
  listeners:
    - portMapping:
        port: 9080
        protocol: http
  serviceDiscovery:
    dns:
      hostName: dj.prod.svc.cluster.local
  backends:
    - virtualService:
        virtualServiceName: jazz.prod.svc.cluster.local
    - virtualService:
        virtualServiceName: metal.prod.svc.cluster.local

The backend attribute specifies that dj is allowed to make requests to the jazz and metal virtual services only.

At this point, you’ve created three virtual nodes:

kubectl get virtualnodes -nprod

NAME            AGE
dj              6m
jazz-v1         6m
metal-v1        6m

The last step is to create the two App Mesh virtual services that intercept and route requests made to jazz and metal. To do this, run the following command:

kubectl apply -nprod -f 4_create_initial_mesh_components/virtual-services.yaml

The output should be similar to:

virtualservice.appmesh.k8s.aws/jazz.prod.svc.cluster.local created
virtualservice.appmesh.k8s.aws/metal.prod.svc.cluster.local created

If you inspect the YAML, you may notice that it created two virtual service resources. Requests made to jazz.prod.svc.cluster.local are intercepted by App Mesh and routed to the virtual node jazz-v1.

Similarly, requests made to metal.prod.svc.cluster.local are routed to the virtual node metal-v1:

apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualService
metadata:
  name: jazz.prod.svc.cluster.local
  namespace: prod
spec:
  meshName: dj-app
  virtualRouter:
    name: jazz-router
  routes:
    - name: jazz-route
      http:
        match:
          prefix: /
        action:
          weightedTargets:
            - virtualNodeName: jazz-v1
              weight: 100

---
apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualService
metadata:
  name: metal.prod.svc.cluster.local
  namespace: prod
spec:
  meshName: dj-app
  virtualRouter:
    name: metal-router
  routes:
    - name: metal-route
      http:
        match:
          prefix: /
        action:
          weightedTargets:
            - virtualNodeName: metal-v1
              weight: 100

NOTE: Remember to use fully qualified DNS names for the virtual service’s metadata.name field to prevent the chance of name collisions when using App Mesh cross-cluster.

With these virtual services defined, to access them by name, clients (in this case, the dj container) first perform a DNS lookup to jazz.prod.svc.cluster.local or metal.prod.svc.cluster.local before making the HTTP request.

If the dj container (or any other client) cannot resolve that name to an IP, the subsequent HTTP request fails with a name lookup error.

The existing physical services (jazz-v1, metal-v1, dj) are defined as physical Kubernetes services, and therefore have resolvable names:

kubectl get svc -nprod

NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
dj         ClusterIP   10.100.247.180   <none>        9080/TCP   16h
jazz-v1    ClusterIP   10.100.157.174   <none>        9080/TCP   16h
metal-v1   ClusterIP   10.100.187.186   <none>        9080/TCP   16h

However, the new jazz and metal virtual services we just created don’t (yet) have resolvable names.

To provide the jazz and metal virtual services with resolvable IP addresses and hostnames, define them as Kubernetes services that do not map to any deployments or pods. Do this by creating them as k8s services without defining selectors for them. Because App Mesh is intercepting and routing requests made for them, they don’t have to map to any pods or deployments on the k8s-side.

To register the placeholder names and IP addresses for these virtual services, run the following command:

kubectl create -nprod -f 4_create_initial_mesh_components/metal_and_jazz_placeholder_services.yaml

The output should be similar to:

service/jazz created
service/metal created

You can now use kubectl to get the registered metal and jazz virtual services:

kubectl get -nprod virtualservices

NAME                           AGE
jazz.prod.svc.cluster.local    10m
metal.prod.svc.cluster.local   10m

You can also get the virtual service placeholder IP addresses and physical service IP addresses:

kubectl get svc -nprod

NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
dj         ClusterIP   10.100.247.180   <none>        9080/TCP   17h
jazz       ClusterIP   10.100.220.118   <none>        9080/TCP   27s
jazz-v1    ClusterIP   10.100.157.174   <none>        9080/TCP   17h
metal      ClusterIP   10.100.122.192   <none>        9080/TCP   27s
metal-v1   ClusterIP   10.100.187.186   <none>        9080/TCP   17h

As such, when name lookup requests are made to your virtual services alongside their physical service counterparts, they resolve.

Currently, if you describe any of the pods running in the prod namespace, they are running with just one container (the same one with which you initially deployed it):

kubectl get pods -nprod

NAME                        READY   STATUS    RESTARTS   AGE
dj-5b445fbdf4-qf8sv         1/1     Running   0          3h
jazz-v1-644856f4b4-mshnr    1/1     Running   0          3h
metal-v1-84bffcc887-97qzw   1/1     Running   0          3h

kubectl describe pods/dj-5b445fbdf4-qf8sv -nprod

...
Containers:
  dj:
    Container ID:   docker://76e6d5f7101dfce60158a63cf7af9fcb3c821c087db360e87c5e2fb8850b7aa9
    Image:          970805265562.dkr.ecr.us-west-2.amazonaws.com/hello-world:latest
    Image ID:       docker-pullable://970805265562.dkr.ecr.us-west-2.amazonaws.com/[email protected]:581fe44cf2413a48f0cdf005b86b025501eaff6cafc7b26367860e07be060753
    Port:           9080/TCP
    Host Port:      0/TCP
    State:          Running
...

The injector controller installed earlier watches for new pods to be created and ensures that any new pods created in the prod namespace are injected with the App Mesh sidecar. Because the dj pods were already running before the injector was created, you’ll now force them to be re-created, this time with the sidecars auto-injected into them.

In production, there are more graceful ways to do this. For the purpose of this tutorial, an easy way to have the deployment re-create the pods in an innocuous fashion is to patch a simple date annotation into the deployment.

To do that with your current deployment, first get all the prod namespace pod names:

kubectl get pods -nprod

The output is the pod names:

NAME                        READY   STATUS    RESTARTS   AGE
dj-5b445fbdf4-qf8sv         1/1     Running   0          3h
jazz-v1-644856f4b4-mshnr    1/1     Running   0          3h
metal-v1-84bffcc887-97qzw   1/1     Running   0          3h

 

Under the READY column, you see 1/1, which indicates that one container is running for each pod.

Next, run the following commands to add a date label to each dj, jazz-v1, and metal-1 deployment, forcing the pods to be re-created:

kubectl patch deployment dj -nprod -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
kubectl patch deployment metal-v1 -nprod -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
kubectl patch deployment jazz-v1 -nprod -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"

Again, get the pods:

kubectl get pods -nprod

Under READY, you see 2/2, which indicates that two containers for each pod are running:

NAME                        READY   STATUS    RESTARTS   AGE
dj-6cfb85cdd9-z5hsp         2/2     Running   0          10m
jazz-v1-79d67b4fd6-hdrj9    2/2     Running   0          16s
metal-v1-769b58d9dc-7q92q   2/2     Running   0          18s

NOTE: If you don’t see this exact output, wait about 10 seconds (your redeployment is underway), and re-run the command.

Now describe the new dj pod to get more detail:

...
Containers:
  dj:
    Container ID:   docker://bef63f2e45fb911f78230ef86c2a047a56c9acf554c2272bc094300c6394c7fb
    Image:          970805265562.dkr.ecr.us-west-2.amazonaws.com/hello-world:latest
    ...
  envoy:
    Container ID:   docker://2bd0dc0707f80d436338fce399637dcbcf937eaf95fed90683eaaf5187fee43a
    Image:          111345817488.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-envoy:v1.8.0.2-beta
    ...

Both the original container and the auto-injected sidecar are running for any new pods created in the prod namespace.

Testing the App Mesh architecture

To test if the new architecture is working as expected, exec into the dj container. Get the name of your dj pod by listing all pods with the dj selector:

kubectl get pods -nprod -lapp=dj

The output should be similar to the following:

NAME                  READY     STATUS    RESTARTS   AGE
dj-5b445fbdf4-8xkwp   1/1       Running   0          32s

Next, exec into the dj pod returned from the last step:

kubectl exec -nprod -it <your-dj-pod-name> bash

The output should be similar to:

[email protected]:/usr/src/app#

Now that you have a root prompt into the dj pod, make a curl request to the virtual service jazz on port 9080. Your request simulates what would happen if code running in the same pod made a request to the jazz backend:

curl jazz.prod.svc.cluster.local:9080;echo

The output should be similar to the following:

["Astrud Gilberto","Miles Davis"]

Try it again, but issue the command to the virtual metal service:

curl metal.prod.svc.cluster.local:9080;echo

You should get a list of heavy metal bands:

["Megadeth","Judas Priest"]

When you’re done exploring this vast, service-mesh-enabled world of music, press CTRL-D, or type exit to exit the container’s shell:

[email protected]:/usr/src/app# exit
command terminated with exit code 1
$

 

Cleaning up

When you’re done experimenting and want to delete all the resources created during this series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ app and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own or throughout this series of posts.

Conclusion of Part 5

In this fifth part of the series, you learned how to enable existing microservices to run on App Mesh. In part 6, I demonstrate the true power of App Mesh by walking you through adding new versions of the metal and jazz services and demonstrating how to route between them.

 

 

PART 6: Deploying with the canary technique

In part 5 of this series, I demonstrated how to configure an existing microservices-based application (DJ App) to run on AWS App Mesh. In this post, I demonstrate how App Mesh can be used to deploy new versions of Amazon EKS-based microservices using the canary technique.

Prerequisites

Make sure that you’ve completed parts 1–5 of this series before running through the steps in this post.

Canary testing with v2

A canary release is a method of slowly exposing a new version of software. The theory is that by serving the new version of the software to a small percentage of requests, any problems only affect the small percentage of users before they’re discovered and rolled back.

So now, back to the DJ App scenario. Version 2 of the metal and jazz services is out, and they now include the city that each artist is from in the response. You’ll now release v2 versions of the metal and jazz services in a canary fashion using App Mesh. When you complete this process, requests to the metal and jazz services are distributed in a weighted fashion to both the v1 and v2 versions.

The following diagram shows the final (v2) seven-microservices-based application, running on an App Mesh service mesh.

 

 

To begin, roll out the v2 deployments, services, and virtual nodes with a single YAML file:

kubectl apply -nprod -f 5_canary/jazz_v2.yaml

The output should be similar to the following:

deployment.apps/jazz-v2 created
service/jazz-v2 created
virtualnode.appmesh.k8s.aws/jazz-v2 created

Next, update the jazz virtual service by modifying the route to spread traffic 50/50 across the two versions. Look at it now, and see that the current route points 100% to jazz-v1:

kubectl describe virtualservice jazz -nprod

Name:         jazz.prod.svc.cluster.local
Namespace:    prod
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:

{"apiVersion":"appmesh.k8s.aws/v1beta1","kind":"VirtualService","metadata":{"annotations":{},"name":"jazz.prod.svc.cluster.local","namesp...
API Version:  appmesh.k8s.aws/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2019-03-23T00:15:08Z
  Generation:          3
  Resource Version:    2851527
  Self Link:           /apis/appmesh.k8s.aws/v1beta1/namespaces/prod/virtualservices/jazz.prod.svc.cluster.local
  UID:                 b76eed59-4d00-11e9-87e6-06dd752b96a6
Spec:
  Mesh Name:  dj-app
  Routes:
    Http:
      Action:
        Weighted Targets:
          Virtual Node Name:  jazz-v1
          Weight:             100
      Match:
        Prefix:  /
    Name:        jazz-route
  Virtual Router:
    Name:  jazz-router
Status:
  Conditions:
Events:  <none>

Apply the updated service definition:

kubectl apply -nprod -f 5_canary/jazz_service_update.yaml

When you describe the virtual service again, you see the updated route:

kubectl describe virtualservice jazz -nprod

Name:         jazz.prod.svc.cluster.local
Namespace:    prod
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:

{"apiVersion":"appmesh.k8s.aws/v1beta1","kind":"VirtualService","metadata":{"annotations":{},"name":"jazz.prod.svc.cluster.local","namesp...
API Version:  appmesh.k8s.aws/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2019-03-23T00:15:08Z
  Generation:          4
  Resource Version:    2851774
  Self Link:           /apis/appmesh.k8s.aws/v1beta1/namespaces/prod/virtualservices/jazz.prod.svc.cluster.local
  UID:                 b76eed59-4d00-11e9-87e6-06dd752b96a6
Spec:
  Mesh Name:  dj-app
  Routes:
    Http:
      Action:
        Weighted Targets:
          Virtual Node Name:  jazz-v1
          Weight:             90
          Virtual Node Name:  jazz-v2
          Weight:             10
      Match:
        Prefix:  /
    Name:        jazz-route
  Virtual Router:
    Name:  jazz-router
Status:
  Conditions:
Events:  <none>

To deploy metal-v2, perform the same steps. Roll out the v2 deployments, services, and virtual nodes with a single YAML file:

kubectl apply -nprod -f 5_canary/metal_v2.yaml

The output should be similar to the following:

deployment.apps/metal-v2 created
service/metal-v2 created
virtualnode.appmesh.k8s.aws/metal-v2 created

Update the metal virtual service by modifying the route to spread traffic 50/50 across the two versions:

kubectl apply -nprod -f 5_canary/metal_service_update.yaml

When you describe the virtual service again, you see the updated route:

kubectl describe virtualservice metal -nprod

Name:         metal.prod.svc.cluster.local
Namespace:    prod
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:

{"apiVersion":"appmesh.k8s.aws/v1beta1","kind":"VirtualService","metadata":{"annotations":{},"name":"metal.prod.svc.cluster.local","names...
API Version:  appmesh.k8s.aws/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2019-03-23T00:15:08Z
  Generation:          2
  Resource Version:    2852282
  Self Link:           /apis/appmesh.k8s.aws/v1beta1/namespaces/prod/virtualservices/metal.prod.svc.cluster.local
  UID:                 b784e824-4d00-11e9-87e6-06dd752b96a6
Spec:
  Mesh Name:  dj-app
  Routes:
    Http:
      Action:
        Weighted Targets:
          Virtual Node Name:  metal-v1
          Weight:             50
          Virtual Node Name:  metal-v2
          Weight:             50
      Match:
        Prefix:  /
    Name:        metal-route
  Virtual Router:
    Name:  metal-router
Status:
  Conditions:
Events:  <none>

Testing the v2 jazz and metal services

Now that the v2 services are deployed, it’s time to test them out. To test if it’s working as expected, exec into the DJ pod. To do that, get the name of your dj pod by listing all pods with the dj selector:

kubectl get pods -nprod -l app=dj

The output should be similar to the following:

NAME                  READY     STATUS    RESTARTS   AGE
dj-5b445fbdf4-8xkwp   1/1       Running   0          32s

Next, exec into the DJ pod by running the following command:

kubectl exec -nprod -it <your dj pod name> bash

The output should be similar to the following:

[email protected]:/usr/src/app#

Now that you have a root prompt into the DJ pod, issue a curl request to the metal virtual service:

while [ 1 ]; do curl http://metal.prod.svc.cluster.local:9080/;echo; done

The output should loop about 50/50 between the v1 and v2 versions of the metal service, similar to:

...
["Megadeth","Judas Priest"]
["Megadeth (Los Angeles, California)","Judas Priest (West Bromwich, England)"]
["Megadeth","Judas Priest"]
["Megadeth (Los Angeles, California)","Judas Priest (West Bromwich, England)"]
...

Press CTRL-C to stop the looping.

Next, perform a similar test, but against the jazz service. Issue a curl request to the jazz virtual service from within the dj pod:

while [ 1 ]; do curl http://jazz.prod.svc.cluster.local:9080/;echo; done

The output should loop about in a 90/10 ratio between the v1 and v2 versions of the jazz service, similar to the following:

...
["Astrud Gilberto","Miles Davis"]
["Astrud Gilberto","Miles Davis"]
["Astrud Gilberto","Miles Davis"]
["Astrud Gilberto (Bahia, Brazil)","Miles Davis (Alton, Illinois)"]
["Astrud Gilberto","Miles Davis"]
...

Press CTRL-C to stop the looping, and then type exit to exit the pod’s shell.

Cleaning up

When you’re done experimenting and want to delete all the resources created during this tutorial series, run the cleanup script via the following command line:

./cleanup.sh

This script does not delete any nodes in your k8s cluster. It only deletes the DJ app and App Mesh components created throughout this series of posts.

Make sure to leave the cluster intact if you plan on experimenting in the future with App Mesh on your own.

Conclusion of Part 6

In this final part of the series, I demonstrated how App Mesh can be used to roll out new microservice versions using the canary technique. Feel free to experiment further with the cluster by adding or removing microservices, and tweaking routing rules by changing weights and targets.

 

Geremy is a solutions architect at AWS.  He enjoys spending time with his family, BBQing, and breaking and fixing things around the house.

 

Updates to Amazon EKS Version Lifecycle

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/updates-to-amazon-eks-version-lifecycle/

Contributed by Nathan Taber and Michael Hausenblas

At re:Invent 2017 we introduced the Amazon Elastic Container Service for Kubernetes, or Amazon EKS for short. We consider these tenets as valid today as they were at launch:

  • EKS is a platform to run production-grade workloads. This means that security and reliability are our first priority. After that we focus on doing the heavy lifting for you in the control plane, including life cycle-related things like version upgrades.
  • EKS provides a native and upstream Kubernetes experience. This means, with EKS you get vanilla, un-forked Kubernetes. Of course, in keeping with our first tenant, we ensure the Kubernetes versions we run have security-related patches, even for older, supported versions as quickly as possible. However, in terms of portability there’s no special sauce and no lock in.
  • If you want to use additional AWS services, the integrations are as seamless as possible.
  • The EKS team in AWS actively contributes to the upstream Kubernetes project, both on the technical level as well as community, from communicating good practices to participation in SIGs and working groups.

The first two tenets are highlighted and that is for a good reason: on the one hand we aim to go in lock-step with the upstream release cadence as much as possible, including outcomes of the SIG PM as well as the LTS Working Group. Given that running a service for production applications is our main focus, we want to make sure that you can rely on the Kubernetes we run for you. This includes, but is not limited to, security considerations around community support for ongoing bug fixes and patches for critical vulnerabilities and exposures (CVEs).

In this post, we want to give you a heads-up on upcoming changes with out Amazon EKS is managing the lifecycle for Kubernetes versions, walk you through the process in general and then have a look at a concrete example, Kubernetes version 1.10. This version happens to be the first version that will be deprecated on Amazon EKS.

But why now?

Glad you asked. It’s really all about security. Past a certain point (usually 1 year), the Kubernetes community stops releasing bug and CVE patches. Additionally, the Kubernetes project does not encourage CVE submission for deprecated versions. This means that vulnerabilities specific to an older version of Kubernetes may not even be reported, leaving users exposed with no notice in the case of a vulnerability. We consider this to be an unacceptable security posture for our customers.

Earlier this year we announced support for Kubernetes 1.12 in EKS. That, together with our commitment to support three Kubernetes versions at any given point in time and the fact that 1.13 will land very soon in EKS means that we have to deprecate 1.10, after which the three supported versions, unsurprisingly, will be 1.11, 1.12, and (you guessed it) 1.13. OK, with that out of the way, let’s have a look at the options you have to move to the latest Kubernetes versions with Amazon EKS and then dive into the update and deprecation process in greater detail:

  • Ideally, you test a new version and move to one of the three supported ones, in time (details below).
  • If you are still on a version we deprecate, you will be upgraded automatically, after some time (details, again, below).
  • If you’re using a deprecated version beyond a certain point and we can’t upgrade the cluster, we may deactivate it.

A quick Kubernetes release cycle refresher

In a nutshell, the Kubernetes versioning and release regime is roughly following a four-releases-per-year pattern, with cadence varying between 70 and 130 days. It also lays out an expectation in terms of upgrades:

We expect users to stay reasonably up-to-date with the versions of Kubernetes they use in production, but understand that it may take time to upgrade, especially for production-critical components.

The formal API versioning allows for a strict deprecation policy which states, amongst other things, that stable (GA) API support is “12 months or 3 releases (whichever is longer)”.

Now that we’re on the same page how upstream Kubernetes releases are managed, let’s have a look at how we at AWS implement the process in EKS.

The EKS Process

In line with the Kubernetes community support for Kubernetes versions, Amazon EKS is committed to running at least three production-ready versions of Kubernetes at any given time, with a fourth version in deprecation. A new Kubernetes version is released as generally available by the Kubernetes project every 70 and 130 days (we take the average of 90 days for simplicity). New GA versions will be supported by EKS some time after GA release (typically at the first patch version release – 1.XX.1, but sometimes later). This means that the total time a version is in production with EKS should be roughly 270 days.

We will announce the deprecation of a given Kubernetes version (n) at least 60 days before the deprecation date and over time, will align the deprecation of a Kubernetes version on EKS to be on or after the date the Kubernetes project stops supporting the version upstream.

For example, we will announce deprecation of version 1.10 while 1.12 is available for EKS and complete the deprecation process after version 1.13 is available for EKS. We will announce the deprecation of 1.11 after 1.13 is available and complete the deprecation after 1.14 is available for EKS.

The following table shows how this will work:

 EKS Version

   Today

   Soon

 About +90 days

 About +180 days

 About +270 days

 Latest Available 

1.12

1.13

1.14

1.15

1.16

 Default 

1.11

1.12

1.13

1.14

1.15

 Oldest 

1.10

1.11

1.12

1.13

1.14

 In Deprecation 

1.10

1.11

1.12

1.13

When we announce the deprecation, we will give customers a specific date when new cluster creation will be disabled for the version targeted for deprecation. On this date, EKS clusters running the version targeted for deprecation will begin to be updated to the next EKS-supported version of Kubernetes. This means that if the deprecated version is 1.10, clusters will be automatically updated to version 1.11. If a cluster is automatically updated by EKS, customers will need to update the version of their worker nodes after the update is complete. Kubernetes has compatibility between masters and workers for at least 2 versions, so 1.10 workers will continue to operate when orchestrated by a 1.11 control plane.

Upcoming deprecation of Kubernetes 1.10 in EKS

Amazon EKS will deprecate Kubernetes version 1.10 on July 22, 2019. On this day, you will no longer be able to create new 1.10 clusters and all EKS clusters running Kubernetes version 1.10 will be updated to the latest available platform version of Kubernetes version 1.11.

We recommend that all Amazon EKS customers update their 1.10 clusters to Kubernetes version 1.11 or 1.12 as soon as possible.

 

Wrapping up

What can you do today to prepare? Well, first off, internalize the timeline and try to align internal processes with it. Our documentation has more information about the EKS Kubernetes version deprecation process and EKS updates. If you have any questions, send us a note on our version deprecation issue in the public containers roadmap on GitHub.

Making Cluster Updates Easy with Amazon EKS

Post Syndicated from Brandon Chavis original https://aws.amazon.com/blogs/compute/making-cluster-updates-easy-with-amazon-eks/

Kubernetes is rapidly evolving, with frequent feature releases, functionality updates, and bug fixes. Additionally, AWS periodically changes the way it configures Amazon Elastic Container Service for Kubernetes (Amazon EKS) to improve performance, support bug fixes, and enable new functionality. Previously, moving to a new Kubernetes version required you to re-create your cluster and migrate your applications. This is a time-consuming process that can result in application downtime.

Today, I’m excited to announce that EKS now performs managed, in-place cluster upgrades for both Kubernetes and EKS platform versions. This simplifies cluster operations and lets you quickly take advantage of the latest Kubernetes features, as well as the updates to EKS configuration and security patches, without any downtime. EKS also now supports Kubernetes version 1.11.5 for all new EKS clusters.

Updates for Kubernetes and EKS

There are two types of updates that you can apply to your EKS cluster, Kubernetes version updates and EKS platform version updates. Today, EKS supports upgrades between Kubernetes minor versions 1.10 and 1.11.

As new Kubernetes versions are released and validated for use with EKS, we will support three stable Kubernetes versions as part of the update process at any given time.

EKS platform versions

The EKS platform version contains Kubernetes patches and changes to the API server configuration. Platform versions are separate from but associated with Kubernetes minor versions.

When a new Kubernetes version is made available for EKS, its initial control plane configuration is released as the “eks.1” platform version. AWS releases new platform versions as needed to enable Kubernetes patches. AWS also releases new versions when there are EKS API server configuration changes that could affect cluster behavior.

Using this versioning scheme makes it possible to independently update the configuration of different Kubernetes versions. For example, AWS might need to release a patch for Kubernetes version 1.10 that is incompatible with Kubernetes version 1.11.

Currently, platform version updates are automatic. AWS plans to provide manual control over platform version updates through the UpdateClusterVersion API operation in the future.

Using the update API operations

There are three new EKS API operations to enable cluster updates:

  • UpdateClusterVersion
  • ListUpdates
  • DescribeUpdates

The UpdateClusterVersion operation can be used through the EKS CLI to start a cluster update between Kubernetes minor versions:

aws eks update-cluster-version --name Your-EKS-Cluster --kubernetes-version 1.11

You only need to pass in a cluster name and the desired Kubernetes version. You do not need to pick a specific patch version for Kubernetes. We pick patch versions that are stable and well-tested. This CLI command returns an “update” API object with several important pieces of information:

{
    "update" : {
        "updateId" : UUID,
        "updateStatus" : PENDING,
        "updateType" : VERSION-UPDATE
        "createdAt" : Timestamp
     }
 }

This update object lets you track the status of your requested modification to your cluster. This can show you if there was there an error due to a misconfiguration on your cluster and if the update in progress, completed, or failed.

You can also list and describe the status of the update independently, using the following operations:

aws eks list-updates --name Your-EKS-Cluster

This returns the in-flight updates for your cluster:

{
    "updates" : {
        "UUID-1",
        "UUID-2"
     },
     "nextToken" : null
 }

Finally, you can also describe a particular update to see details about the update’s status:

aws eks describe-update --name Your-EKS-Cluster --update-id UUID

{
    "update" : {
        "updateId" : UUID,
        "updateStatus" : FAILED,
        "updateType" : VERSION-UPDATE
        "createdAt" : Timestamp
        "error": {
            "errorCode" : DependentResourceNotFound
            "errorMessage" : The Role used for creating the cluster is deleted.
            "resources" : ["aws:iam:arn:role"] 
     }
 }

Considerations when updating

New Kubernetes versions introduce significant changes. I highly recommend that you test the behavior of your application against a new Kubernetes version before performing the update on a production cluster.

Generally, I recommend integrating EKS into your existing CI workflow to test how your application behaves on a new version before updating your production clusters.

Worker node updates

Today, EKS does not update your Kubernetes worker nodes when you update the EKS control plane. You are responsible for updating EKS worker nodes. You can find an overview of this process in Worker Node Updates.

The EKS team releases a set of EKS-optimized AMIs for worker nodes that correspond with each version of Kubernetes supported by EKS. You can find these AMIs listed in the documentation, and you can find the build configuration in a version-specific branch of the Amazon-EKS-AMI GitHub repository .

Getting started

You can start using Kubernetes version 1.11 today for all new EKS clusters. Use cluster updates to move to version 1.11 for all existing EKS clusters. You can learn more about the update process and APIs in our documentation.

Run your Kubernetes Workloads on Amazon EC2 Spot Instances with Amazon EKS

Post Syndicated from Roshni Pary original https://aws.amazon.com/blogs/compute/run-your-kubernetes-workloads-on-amazon-ec2-spot-instances-with-amazon-eks/

Contributed by Madhuri Peri, Sr. EC2 Spot Specialist SA, and Shawn OConnor, AWS Enterprise Solutions Architect

Many organizations today are using containers to package source code and dependencies into lightweight, immutable artifacts that can be deployed reliably to any environment.

Kubernetes (K8s) is an open-source framework for automated scheduling and management of containerized workloads. In addition to master nodes, a K8s cluster is made up of worker nodes where containers are scheduled and run.

Amazon Elastic Container Service for Kubernetes (Amazon EKS) is a managed service that removes the need to manage the installation, scaling, or administration of master nodes and the etcd distributed key-value store. It provides a highly available and secure K8s control plane.

This post demonstrates how to use Spot Instances as K8s worker nodes, and shows the areas of provisioning, automatic scaling, and handling interruptions (termination) of K8s worker nodes across your cluster.

What this post does not cover

This post focuses primarily on EC2 instance scaling. This post also assumes a default interruption mode of terminate for EC2 instances, though there are other interruption types, stop and hibernate. For stateless K8s sessions, I recommend choosing the interruption mode of terminate.

Spot Instances

Amazon EC2 Spot Instances are spare EC2 capacity that offer discounts of 70-90% over On-Demand prices. The Spot price is determined by term trends in supply and demand and the amount of On-Demand capacity on a particular instance size, family, Availability Zone, and AWS Region.

If the available On-Demand capacity of a particular instance type is depleted, the Spot Instance is sent an interruption notice two minutes ahead to gracefully wrap up things. I recommend a diversified fleet of instances, with multiple instance types created by Spot Fleets or EC2 Fleets.

You can use Spot Instances for various fault-tolerant and flexible applications. In a workload that uses container orchestration and management platforms like EKS or Amazon Elastic Container Service (Amazon ECS), the schedulers have built-in mechanisms to identify any pods or containers on these interrupted EC2 instances. The interrupted pods or containers are then replaced on other EC2 instances in the cluster.

Solution architecture

There are three goals to accomplish with this solution:

  1.  The cluster must scale automatically to match the demands of an application.
  2. Optimize for cost by using Spot Instances.
  3. The cluster must be resilient to Spot Instance interruptions.

These goals are accomplished with the following components:

Solution componentRole in solutionCodeDeployment
Cluster AutoscalerScales EC2 instances in or outOpen sourceK8s pod DaemonSet on On-Demand Instances
Auto Scaling groupProvisions Spot or On-Demand InstancesAWSVia CloudFormation
Spot Instance interrupt handlerSets K8s nodes to drain state, when the Spot Instance is interruptedOpen sourceK8s pod DaemonSet on all K8s nodes with the label lifecycle=EC2Spot

Here’s a diagram of the solution architecture.

There are a few important things to note in this architecture:

  • Cluster Autoscaler is being used to control all scaling activities, with changes to the MinSize and DesiredCapacity parameters of the Auto Scaling group. This separation of duties ensures that there are no race conditions.
  • The Auto Scaling groups are used purely to replace any lost instances automatically (for example, terminations or interruptions) and maintain the desired number of instances. There are no scaling policies attached to the groups.
  • Auto Scaling, at the time of this post, supports a single instance type. As noted by Jeff Barr’s post EC2 Fleet – Manage Thousands of On-Demand and Spot Instances with One Request, in H2 2018, Auto Scaling groups will support mixed instance types. At that point, multiple groups will not be required, and can collapse into a single group specifying all instance types.

Here’s a further breakdown on the components.

Cluster Autoscaler

Automatic scaling in K8s comes in two forms:

  • Horizontal Pod Autoscaler scales the pods in a deployment or replica set. It is implemented as a K8s API resource and a controller. The controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. It obtains the metrics from either the resource metrics API (for per-pod resource metrics), or the custom metrics API (for all other metrics).
  • Cluster Autoscaler scales the worker nodes available for pods to be placed. Cluster Autoscaler is the focus for this post.

Cluster Autoscaler is the default K8s component that can be used to perform pod scaling as well as scaling nodes in a cluster. It automatically increases the size of an Auto Scaling group so that pods have a place to run. And it attempts to remove idle nodes, that is, nodes with no running pods.

When a pod cannot be scheduled due to lack of available resources, Cluster Autoscaler determines that the cluster must scale up. Expander interfaces allow you to apply different pod placement strategies. Currently, the following strategies are supported:

  • Random – Randomly select an available node group.
  • Most Pods – Selects the group that can schedule the largest quantity of nodes. This can be used balance the load across groups of nodes.
  • Least Waste – This is commonly referred to as ‘bin packing.’ It selects the node-group with the least available tied resource (CPU or memory). This helps to reduce the total node footprint, and is the strategy used in this post.

Although Cluster Autoscaler is the de facto standard for automatic scaling in K8s, it is not part of the main release. Deploy it like any other pod in the kube-system namespace, like other management pods. Those management pods would prevent the cluster from scaling down. Override this default behavior by passing in the –-skip-nodes-with-system-pods=false flag.

But how do you reliably control scale-down operations so that you do not remove the pods that you need? This is accomplished using a pod disruption budget (PDB). A PDB limits the number of replicated pods that can be down at a given time. Create a PDB to ensure that you always have at least one Cluster Autoscaler pod running

In summary, Cluster Autoscaler does not remove nodes under the following scenarios:

  • Pods with a restrictive PDB.
  • Pods running in the kube-system namespace that are deployed (that is, not run on the node by default or which do not have a PDB).
  • Pods not backed by a controller object (not created by a deployment, replica set, job, stateful-set, and so on).
  • Pods running with local storage.
  • Pods running that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, and so on).

Auto Scaling Group

With Spot Instances, each instance type in each Availability Zone is a pool with its own Spot price based on the available capacity. A recommended best practice when working with Spot Instances is to use a diversified fleet of instances with multiple instance types, as created by Spot Fleet or EC2 Fleet. These APIs aim to fulfill the specified TargetCapacity across the instance types to launch the number of Spot Instances and optionally, On-Demand Instances.

Unfortunately, Cluster Autoscaler does not support Spot Fleets at this time. You need a different strategy to provide diversification. Cluster Autoscaler for AWS provides integration with Auto Scaling groups. It enables users to choose from four different options of deployment:

  • One Auto Scaling group
  • Multiple Auto Scaling groups
  • Auto-Discovery
  • Master Node setup

For this post, you use the Multi-ASG deployment option. For Cluster Autoscaler and other cluster administration and management pods that run on EKS worker nodes, create a small Auto Scaling group using On-Demand Instances. This ensures that the health of the cluster is not impacted by Spot interruptions.

In K8s, label selectors are used to control where pods are placed. Use the K8s node label selector to place the appropriate pods on Spot or On-Demand Instances.

Interrupt handler

The last component to consider handles how the cluster responds to the interruption of a Spot Instance. The workflow can be summarized as:

  • Identify that a Spot Instance is being reclaimed.
  • Use the 2-minute notification window to gracefully prepare the node for termination.
  • Taint the node and cordon it off to prevent new pods from being placed.
  • Drain connections on the running pods.
  • To maintain desired capacity, replace the pods on remaining nodes.

Spot interruptions are reported in the following ways:

For this post, you use a K8s DaemonSet, which means running one pod per node. The pod periodically polls the EC2 metadata service for a Spot termination notice. If a termination notice is received (HTTP status 200), then it tries to gracefully stop and restart on other nodes before the 2-minute grace period expires. This approach is based on an existing project at the kube-spot-termination-notice-handler GitHub repo.

 Walkthrough

Here’s the suggested workflow for this solution:

  1. Provision the worker nodes with EC2 instances using CloudFormation templates.
  2. Deploy the K8s Cluster Autoscaler pods as a DaemonSet, with a PDB.
  3. Deploy the Spot Instance interrupt handler pods as a DaemonSet.
  4. Deploy the sample application

Prerequisites

You should have the following resources or configurations before starting this walkthrough:

  • An EKS cluster master endpoint
  • An EKS service role ARN
  • Subnet IDs and the control plane security group values
  • EKS master cluster certificates
  • Configuration of kubectl against the master EKS endpoint

For more information, see Amazon EKS – Now Generally Available and Deploy a Kubernetes Application with Amazon Elastic Container Service for Kubernetes.

When you describe the EKS cluster, you get a response like the following sample output:

    "cluster": {
        "name": " DemoSpotClusterScale",
        "arn": "arn:aws:eks:us-west-2: 0123456789012:cluster/ DemoSpotClusterScale",
        "createdAt": 1528317531.751,
        "version": "1.10",
        "endpoint": "https://B960845ED5E21A3439ABB5E12F09CE88.sk1.us-west-2.eks.amazonaws.com",
        "roleArn": "arn:aws:iam::0123456789012:role/eksServiceRoleGA",
        "resourcesVpcConfig": {
            "subnetIds": [
                "subnet-3326464a",
                "subnet-c2b93b89",
                "subnet-13225b49"
            ],
            "securityGroupIds": [
                "sg-7fd0b70e"
            ],
            "vpcId": "vpc-c7c8c4be"
        },
        "status": "ACTIVE",
        "certificateAuthority": {
            "data": "<Your ca data here>"
        }
    }
}

I use the cluster name DemoSpotClusterScale throughout this post. Replace that with your cluster name in the following commands.

Get started

git clone https://github.com/awslabs/ec2-spot-labs.git

cd ec2-spot-labs/ec2-spot-eks-solution

Provision the worker nodes

Add worker nodes to your cluster so that you can deploy your applications. Worker nodes can be either Spot or On-Demand Instances. In this example, use Spot Instances for worker nodes.

You can use this customized AWS CloudFormation template to create the Auto Scaling groups described earlier. This template also labels the node with a lifecycle key value indicating whether it is an On-Demand or Spot Instance node.

The template deploys Auto Scaling groups dedicated to the following instance types:

  • Spot Instances, m4.large, across three Availability Zones.
  • Spot Instances, t2.medium, across three Availability Zones.
  • On-Demand Instances, across three Availability Zones.

Make sure that you apply the aws-auth-cm.yaml file with the appropriate NodeInstanceRole value, as provisioned by the CloudFormation template. Find this parameter on the Resources tab.

kubectl apply -f aws-auth-cm.yaml

If the kubectl get nodes command worked as documented, then you are ready to proceed to the next section

Deploying Cluster Autoscaler and PDB

  1. Download the manifest file cluster-autoscaler-ds.yaml. There are six K8s resources that enable the cluster-autoscaler add-on to work in the EKS environment:
    • Service account
    • Cluster role
    • Role
    • Cluster role binding
    • Role binding
    • Two Auto Scaling groups created by the CloudFormation template for Spot and On-Demand Instances

    You also see the cluster-autoscaler command with configured parameters.

  2. Edit the cluster-autoscaler-ds.yaml file to replace the [OD-NodeGroup-Name], [Spot-NodeGroup1-Name], [Spot-NodeGroup2-Name] sections in lines 141-143 with the resources created in your worker node cloudformation template as shown in screenshot above. Deploy the cluster-autoscaler-ds.yaml manifest
    $ kubectl create -f cluster-autoscaler/cluster-autoscaler-ds.yaml

  3. Monitor the deployment:
    $ kubectl logs cluster-autoscaler-<podgeneratedID> --namespace=kube-system

  4. Download and deploy the Cluster Autoscaler PDB:
    $ kubectl create -f cluster-autoscaler/cluster-autoscaler-pdb.yaml

Deploy the Spot Instance interrupt handler

Each K8s EC2 node being launched must have the lifecycle=Ec2Spot value for -node-label, as in the following example. This line is an excerpt from the CloudFormation template:

“sed -i s,MAX_PODS,”, !Join [ “”, [ “‘”, { “Fn::FindInMap”: [ MaxPodsPerNode, { Ref: SpotNode2InstanceType }, MaxPods ] }, ” –node-labels “, “lifecycle=Ec2Spot” , “‘” ] ], “,g /etc/systemd/system/kubelet.service”, “\n”,

The Docker image contains the instance metadata poll script, as shown in entrypoint.sh. Publish this image to your repository. In the following screenshot, I used my ECR repository. A sample image is available on Docker Hub.

Deploy the Spot interrupt handler pod using spec. This sets up the DaemonSet only on the instances that have a K8s label of lifecycle=Ec2Spot.

kubectl apply -f spot-termination-handler/deploy-k8-pod/spot-interrupt-handler.yaml

When the Spot Instance is interrupted, this pod catches the interruption and vacates the pods.

Deploy the sample application and test out scaling up & down

Deploy a sample application with three replicas. Create a new manifest file named greeter-sample.yaml from the code below, or download it from here

You are using node affinity to prefer deployment on Spot Instances. If the Ec2Spot label is unavailable, the manifest file allows the application to run elsewhere

$ kubectl create -f sample/greeter-sample.yaml

Scale up, and watch Cluster Autoscaler manage the Auto Scaling groups. Verify that Cluster Autoscaler is working by scaling up the sample service beyond the current limits of the cluster.

$ kubectl scale --replicas=50 deployment/greeter-sample

Check the AWS Management Console to confirm that the Auto Scaling groups are scaling up to meet demand. This may take a few minutes. You can also follow along with the pod deployment from the command line. You should see the pods transition from pending to running as nodes are scaled up.

$ kubectl get pods -o wide --watch

Scale down, and watch Cluster Autoscaler manage the Auto Scaling groups:

$ kubectl scale --replicas=1 deployment/greeter

Check the K8s logs to watch the terminations occur:

$ kubectl logs deployment/cluster-autoscaler-<podgeneratedID> –namespace=kube-system

Conclusion

In this post, I showed you how to use Spot Instances with K8s workloads, by provisioning, scaling, and managing terminations effectively in EKS clusters to leverage both cost and scale optimizations. Happy coding!

Running GPU-Accelerated Kubernetes Workloads on P3 and P2 EC2 Instances with Amazon EKS

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/running-gpu-accelerated-kubernetes-workloads-on-p3-and-p2-ec2-instances-with-amazon-eks/

This post contributed by Scott Malkie, AWS Solutions Architect

Amazon EC2 P3 and P2 instances, featuring NVIDIA GPUs, power some of the most computationally advanced workloads today, including machine learning (ML), high performance computing (HPC), financial analytics, and video transcoding. Now Amazon Elastic Container Service for Kubernetes (Amazon EKS) supports P3 and P2 instances, making it easy to deploy, manage, and scale GPU-based containerized applications.

This blog post walks through how to start up GPU-powered worker nodes and connect them to an existing Amazon EKS cluster. Then it demonstrates an example application to show how containers can take advantage of all that GPU power!

Prerequisites

You need an existing Amazon EKS cluster, kubectl, and the aws-iam-authenticator set up according to Getting Started with Amazon EKS.

Two steps are required to enable GPU workloads. First, join Amazon EC2 P3 or P2 GPU compute instances as worker nodes to the Kubernetes cluster. Second, configure pods to enable container-level access to the node’s GPUs.

Spinning up Amazon EC2 GPU instances and joining them to an existing Amazon EKS Cluster

To start the worker nodes, use the standard AWS CloudFormation template for Amazon EKS worker nodes, specifying the AMI ID of the new Amazon EKS-optimized AMI for GPU workloads. This AMI is available on AWS Marketplace.

Subscribe to the AMI and then launch it using the AWS CloudFormation template. The template takes care of networking, configuring kubelets, and placing your worker nodes into an Auto Scaling group, as shown in the following image.

This template creates an Auto Scaling group with up to two p3.8xlarge Amazon EC2 GPU instances. Powered by up to eight NVIDIA Tesla V100 GPUs, these instances deliver up to 1 petaflop of mixed-precision performance per instance to significantly accelerate ML and HPC applications. Amazon EC2 P3 instances have been proven to reduce ML training times from days to hours and to reduce time-to-results for HPC.

After the AWS CloudFormation template completes, the Outputs view contains the NodeInstanceRole parameter, as shown in the following image.

NodeInstanceRole needs to be passed in to the AWS Authenticator ConfigMap, as documented in the AWS EKS Getting Started Guide. To do so, edit the ConfigMap template and run the command kubectl apply -f aws-auth-cm.yaml in your terminal to apply the ConfigMap. You can then run kubectl get nodes —watch to watch the two Amazon EC2 GPU instances join the cluster, as shown in the following image.

Configuring Kubernetes pods to access GPU resources

First, use the following command to apply the NVIDIA Kubernetes device plugin as a daemon set on the cluster.

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.10/nvidia-device-plugin.yml

This command produces the following output:

Once the daemon set is running on the GPU-powered worker nodes, use the following command to verify that each node has allocatable GPUs.

kubectl get nodes \
"-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

The following output shows that each node has four GPUs available:

Next, modify any Kubernetes pod manifests, such as the following one, to take advantage of these GPUs. In general, adding the resources configuration (resources: limits:) to pod manifests gives containers access to one GPU. A pod can have access to all of the GPUs available to the node that it’s running on.

apiVersion: v1
kind: Pod
metadata:
  name: pod-name
spec:
  containers:
  - name: container-name
    ...
    resources:
      limits:
        nvidia.com/gpu: 4

As a more specific example, the following sample manifest displays the results of the nvidia-smi binary, which shows diagnostic information about all GPUs visible to the container.

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:latest
    args:
    - "nvidia-smi"
    resources:
      limits:
        nvidia.com/gpu: 4

Download this manifest as nvidia-smi-pod.yaml and launch it with kubectl apply -f nvidia-smi-pod.yaml.

To confirm successful nvidia-smi execution, use the following command to examine the log.

kubectl logs nvidia-smi

The above commands produce the following output:

Existing limitations

  • GPUs cannot be overprovisioned – containers and pods cannot share GPUs
  • The maximum number of GPUs that you can schedule to a pod is capped by the number of GPUs available to that pod’s node
  • Depending on your account, you might have Amazon EC2 service limits on how many and which type of Amazon EC2 GPU compute instances you can launch simultaneously

For more information about GPU support in Kubernetes, see the Kubernetes documentation. For more information about using Amazon EKS, see the Amazon EKS documentation. Guidance setting up and running Amazon EKS can be found in the AWS Workshop for Kubernetes on GitHub.

Please leave any comments about this post and share what you’re working on. I can’t wait to see what you build with GPU-powered workloads on Amazon EKS!

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

AWS Online Tech Talks – June 2018

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

 

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PTGet Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.
June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

 

AWS re:Invent
June 13, 2018 | 05:00 PM – 05:30 PM PTEpisode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.
Compute

June 25, 2018 | 01:00 PM – 01:45 PM PTAccelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PTEnsuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

 

Containers
June 25, 2018 | 09:00 AM – 09:45 AM PTRunning Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.

 

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PTOracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.
DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PTSet Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

 

Enterprise & Hybrid
June 18, 2018 | 09:00 AM – 09:45 AM PTDe-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PTLaunch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new

 

AWS Environments

June 21, 2018 | 11:00 AM – 11:45 AM PTLeading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PTEnabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PTFireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.
IoT

June 27, 2018 | 11:00 AM – 11:45 AM PTAWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

 

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PTIntegrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PTBuilding Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

 

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PTOptimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

 

Mobile
June 25, 2018 | 11:00 AM – 11:45 AM PTDrive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

 

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PTUnderstanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.
June 28, 2018 | 09:00 AM – 09:45 AM PTUsing Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

 

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PTProductionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

 

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PTDeep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connecting your applications to the scalable and reliable AWS storage services.
June 27, 2018 | 01:00 PM – 01:45 PM PTChanging the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.
June 28, 2018 | 11:00 AM – 11:45 AM PTBig Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

Kata Containers 1.0

Post Syndicated from ris original https://lwn.net/Articles/755230/rss

Kata Containers 1.0 has been released. “This first release of Kata Containers completes the merger of Intel’s Clear Containers and Hyper’s runV technologies, and delivers an OCI compatible runtime with seamless integration for container ecosystem technologies like Docker and Kubernetes.

[$] Securing the container image supply chain

Post Syndicated from corbet original https://lwn.net/Articles/754443/rss

“Security is hard” is a tautology, especially in the fast-moving world
of container orchestration. We have previously covered various aspects of
Linux container
security through, for example, the Clear Containers implementation
or the broader question of Kubernetes and
security
, but those are mostly concerned with container isolation; they do not address the
question of trusting a container’s contents. What is a container running?
Who built it and when? Even assuming we have good programmers and solid
isolation layers, propagating that good code around a Kubernetes cluster
and making strong assertions on the integrity of that supply chain is far
from trivial. The 2018 KubeCon
+ CloudNativeCon Europe
event featured some projects that could
eventually solve that problem.

[$] Updates in container isolation

Post Syndicated from corbet original https://lwn.net/Articles/754433/rss

At KubeCon
+ CloudNativeCon Europe
2018, several talks explored the topic of
container isolation and security. The last year saw the release of Kata Containers which, combined with
the CRI-O project, provided strong isolation
guarantees for containers using a hypervisor. During the conference, Google
released its own hypervisor called gVisor, adding yet another
possible solution for this problem. Those new developments prompted the
community to work on integrating the concept of “secure containers”
(or “sandboxed containers”) deeper
into Kubernetes. This work is now coming to fruition; it prompts us to look
again at how Kubernetes tries to keep the bad guys from wreaking havoc once
they break into a container.

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/754653/rss

Security updates have been issued by CentOS (dhcp), Debian (xen), Fedora (dhcp, flac, kubernetes, leptonica, libgxps, LibRaw, matrix-synapse, mingw-LibRaw, mysql-mmm, patch, seamonkey, webkitgtk4, and xen), Mageia (389-ds-base, exempi, golang, graphite2, libpam4j, libraw, libsndfile, libtiff, perl, quassel, spring-ldap, util-linux, and wget), Oracle (dhcp and kernel), Red Hat (389-ds-base, chromium-browser, dhcp, docker-latest, firefox, kernel-alt, libvirt, qemu-kvm, redhat-vertualization-host, rh-haproxy18-haproxy, and rhvm-appliance), Scientific Linux (389-ds-base, dhcp, firefox, libvirt, and qemu-kvm), and Ubuntu (poppler).

[$] Autoscaling for Kubernetes workloads

Post Syndicated from corbet original https://lwn.net/Articles/754153/rss

Technologies like containers, clusters, and Kubernetes offer the prospect
of rapidly scaling the available computing resources to match variable demands
placed on the system. Actually implementing that scaling can be a
challenge, though.
During KubeCon
+ CloudNativeCon Europe 2018
,
Frederic Branczyk from CoreOS (now
part of Red Hat) held a packed session
to introduce a standard and officially recommended way to scale workloads
automatically in Kubernetes
clusters.

Security updates for Friday

Post Syndicated from ris original https://lwn.net/Articles/754257/rss

Security updates have been issued by Arch Linux (libmupdf, mupdf, mupdf-gl, and mupdf-tools), Debian (firebird2.5, firefox-esr, and wget), Fedora (ckeditor, drupal7, firefox, kubernetes, papi, perl-Dancer2, and quassel), openSUSE (cairo, firefox, ImageMagick, libapr1, nodejs6, php7, and tiff), Red Hat (qemu-kvm-rhev), Slackware (mariadb), SUSE (xen), and Ubuntu (openjdk-8).

CI/CD with Data: Enabling Data Portability in a Software Delivery Pipeline with AWS Developer Tools, Kubernetes, and Portworx

Post Syndicated from Kausalya Rani Krishna Samy original https://aws.amazon.com/blogs/devops/cicd-with-data-enabling-data-portability-in-a-software-delivery-pipeline-with-aws-developer-tools-kubernetes-and-portworx/

This post is written by Eric Han – Vice President of Product Management Portworx and Asif Khan – Solutions Architect

Data is the soul of an application. As containers make it easier to package and deploy applications faster, testing plays an even more important role in the reliable delivery of software. Given that all applications have data, development teams want a way to reliably control, move, and test using real application data or, at times, obfuscated data.

For many teams, moving application data through a CI/CD pipeline, while honoring compliance and maintaining separation of concerns, has been a manual task that doesn’t scale. At best, it is limited to a few applications, and is not portable across environments. The goal should be to make running and testing stateful containers (think databases and message buses where operations are tracked) as easy as with stateless (such as with web front ends where they are often not).

Why is state important in testing scenarios? One reason is that many bugs manifest only when code is tested against real data. For example, we might simply want to test a database schema upgrade but a small synthetic dataset does not exercise the critical, finer corner cases in complex business logic. If we want true end-to-end testing, we need to be able to easily manage our data or state.

In this blog post, we define a CI/CD pipeline reference architecture that can automate data movement between applications. We also provide the steps to follow to configure the CI/CD pipeline.

 

Stateful Pipelines: Need for Portable Volumes

As part of continuous integration, testing, and deployment, a team may need to reproduce a bug found in production against a staging setup. Here, the hosting environment is comprised of a cluster with Kubernetes as the scheduler and Portworx for persistent volumes. The testing workflow is then automated by AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild.

Portworx offers Kubernetes storage that can be used to make persistent volumes portable between AWS environments and pipelines. The addition of Portworx to the AWS Developer Tools continuous deployment for Kubernetes reference architecture adds persistent storage and storage orchestration to a Kubernetes cluster. The example uses MongoDB as the demonstration of a stateful application. In practice, the workflow applies to any containerized application such as Cassandra, MySQL, Kafka, and Elasticsearch.

Using the reference architecture, a developer calls CodePipeline to trigger a snapshot of the running production MongoDB database. Portworx then creates a block-based, writable snapshot of the MongoDB volume. Meanwhile, the production MongoDB database continues serving end users and is uninterrupted.

Without the Portworx integrations, a manual process would require an application-level backup of the database instance that is outside of the CI/CD process. For larger databases, this could take hours and impact production. The use of block-based snapshots follows best practices for resilient and non-disruptive backups.

As part of the workflow, CodePipeline deploys a new MongoDB instance for staging onto the Kubernetes cluster and mounts the second Portworx volume that has the data from production. CodePipeline triggers the snapshot of a Portworx volume through an AWS Lambda function, as shown here

 

 

 

AWS Developer Tools with Kubernetes: Integrated Workflow with Portworx

In the following workflow, a developer is testing changes to a containerized application that calls on MongoDB. The tests are performed against a staging instance of MongoDB. The same workflow applies if changes were on the server side. The original production deployment is scheduled as a Kubernetes deployment object and uses Portworx as the storage for the persistent volume.

The continuous deployment pipeline runs as follows:

  • Developers integrate bug fix changes into a main development branch that gets merged into a CodeCommit master branch.
  • Amazon CloudWatch triggers the pipeline when code is merged into a master branch of an AWS CodeCommit repository.
  • AWS CodePipeline sends the new revision to AWS CodeBuild, which builds a Docker container image with the build ID.
  • AWS CodeBuild pushes the new Docker container image tagged with the build ID to an Amazon ECR registry.
  • Kubernetes downloads the new container (for the database client) from Amazon ECR and deploys the application (as a pod) and staging MongoDB instance (as a deployment object).
  • AWS CodePipeline, through a Lambda function, calls Portworx to snapshot the production MongoDB and deploy a staging instance of MongoDB• Portworx provides a snapshot of the production instance as the persistent storage of the staging MongoDB
    • The MongoDB instance mounts the snapshot.

At this point, the staging setup mimics a production environment. Teams can run integration and full end-to-end tests, using partner tooling, without impacting production workloads. The full pipeline is shown here.

 

Summary

This reference architecture showcases how development teams can easily move data between production and staging for the purposes of testing. Instead of taking application-specific manual steps, all operations in this CodePipeline architecture are automated and tracked as part of the CI/CD process.

This integrated experience is part of making stateful containers as easy as stateless. With AWS CodePipeline for CI/CD process, developers can easily deploy stateful containers onto a Kubernetes cluster with Portworx storage and automate data movement within their process.

The reference architecture and code are available on GitHub:

● Reference architecture: https://github.com/portworx/aws-kube-codesuite
● Lambda function source code for Portworx additions: https://github.com/portworx/aws-kube-codesuite/blob/master/src/kube-lambda.py

For more information about persistent storage for containers, visit the Portworx website. For more information about Code Pipeline, see the AWS CodePipeline User Guide.

Security updates for Tuesday

Post Syndicated from ris original https://lwn.net/Articles/751454/rss

Security updates have been issued by CentOS (libvorbis and thunderbird), Debian (pjproject), Fedora (compat-openssl10, java-1.8.0-openjdk-aarch32, libid3tag, python-pip, python3, and python3-docs), Gentoo (ZendFramework), Oracle (thunderbird), Red Hat (ansible, gcc, glibc, golang, kernel, kernel-alt, kernel-rt, krb5, kubernetes, libvncserver, libvorbis, ntp, openssh, openssl, pcs, policycoreutils, qemu-kvm, and xdg-user-dirs), SUSE (openssl and openssl1), and Ubuntu (python-crypto, ubuntu-release-upgrader, and wayland).

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/751346/rss

Security updates have been issued by Arch Linux (openssl and zziplib), Debian (ldap-account-manager, ming, python-crypto, sam2p, sdl-image1.2, and squirrelmail), Fedora (bchunk, koji, libidn, librelp, nodejs, and php), Gentoo (curl, dhcp, libvirt, mailx, poppler, qemu, and spice-vdagent), Mageia (389-ds-base, aubio, cfitsio, libvncserver, nmap, and ntp), openSUSE (GraphicsMagick, ImageMagick, spice-gtk, and wireshark), Oracle (kubernetes), Slackware (patch), and SUSE (apache2 and openssl).

timeShift(GrafanaBuzz, 1w) Issue 38

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/03/30/timeshiftgrafanabuzz-1w-issue-38/

Welcome to TimeShift We have an abridged version of timeShift this week due to the long holiday weekend. Earlier this week we released Grafana v5.0.4 which included fixes for alerting, snapshots, starting Grafana on K8s and more. See the section below for specific bug fixes. Enjoy this issue and we’ll see you next week! Latest Stable Release This week we rolled out Grafana 5.0.4. Bug fixes in the latest release include: Docker: Can’t start Grafana on Kubernetes 1.

Kubernetes 1.10 released

Post Syndicated from ris original https://lwn.net/Articles/750236/rss

Kubernetes 1.10 has been released. “This newest version stabilizes features in 3 key areas, including storage, security, and networking. Notable additions in this release include the introduction of external kubectl credential providers (alpha), the ability to switch DNS service to CoreDNS at install time (beta), and the move of Container Storage Interface (CSI) and persistent local volumes to beta.