Mixing AWS Graviton with x86 CPUs to optimize cost and resiliency using Amazon EKS


This post is written by Yahav Biran, Principal SA, and Yuval Dovrat, Israel Head Compute SA.

This post shows you how to integrate AWS Graviton-based Amazon EC2 instances into an existing Amazon Elastic Kubernetes Service (Amazon EKS) environment running on x86-based Amazon EC2 instances. Customers use mixed-CPU architectures to let their applications draw on a wide selection of Amazon EC2 instance types and to improve overall application resilience. Before running a mixed-CPU application in production, we strongly recommend that you test application performance on Graviton-based instances in a test environment. You can follow the AWS Graviton Transition Guide to learn more about porting your application to AWS Graviton.

This example shows how you can use KEDA (Kubernetes Event-driven Autoscaling) to control application capacity across CPU types in EKS. KEDA scales the application's Deployments based on the response latency measured by the Application Load Balancer (ALB). To simplify resource provisioning, the example also uses Karpenter, an open-source Kubernetes node provisioner, and the AWS Load Balancer Controller.

Solution Overview

This post covers two configurations for testing a mixed-CPU application. The first configuration (shown in Figure 1 below) is the "A/B Configuration". It uses an Application Load Balancer (ALB)-based Ingress to control traffic flowing to x86-based and Graviton-based node pools. You use this configuration to gradually migrate a live application from x86-based instances to Graviton-based instances, while validating the response time with Amazon CloudWatch.

Figure 1: A/B Configuration, with ALB Ingress for gradual transition between CPU types

In the second configuration, the “Karpenter Controlled Configuration” (shown in Figure 2 below as Config 2), Karpenter automatically controls the instance blend. Karpenter is configured to use weighted provisioners with values that prioritize AWS Graviton-based Amazon EC2 instances over x86-based Amazon EC2 instances.

Figure 2: Karpenter Controlled Configuration, with weighted provisioners topology

It is recommended that you start with the “A/B” configuration to measure the response time of live requests. Once your workload is validated on Graviton-based instances, you can build the second configuration to simplify the deployment configuration and increase resiliency. This enables your application to automatically utilize x86-based instances if needed, for example, during an unplanned large-scale event.

You can find a step-by-step guide on GitHub that helps you examine and try the example app deployment described in this post. The following sections provide an overview of that guide.

Code Migration to AWS Graviton

The first step is migrating your code from x86-based instances to Graviton-based instances. AWS has multiple resources to help you migrate your code. These include the AWS Graviton Fast Start Program, the AWS Graviton Technical Guide GitHub repository, the AWS Graviton Transition Guide, and the Porting Advisor for Graviton.

After making any required changes, you might need to recompile your application for the Arm64 architecture. This is necessary if your application is written in a language that compiles to machine code, such as Golang or C/C++, or if you need to rebuild native-code libraries used by interpreted or JIT-compiled languages, for example extensions built with the Python/C API or the Java Native Interface (JNI).

To allow your containerized application to run on both x86 and Graviton-based nodes, you must build OCI images for both the x86 and Arm64 architectures, push them to your image repository (such as Amazon ECR), and stitch them together by creating and pushing an OCI multi-architecture manifest list. You can find an overview of these steps in this AWS blog post. You can also find the AWS Cloud Development Kit (CDK) construct on GitHub to help get you started.

To simplify the process, use a Linux distribution package manager that resolves the correct architecture-specific package automatically, and avoid platform-specific software package names wherever possible. For example, use:

RUN yum install -y httpd

instead of:

# ARCH is aarch64 for Graviton builds or x86_64 for x86 builds
ARG ARCH=aarch64
RUN yum install -y httpd.${ARCH}

This blog post shows you how to automate multi-arch OCI image building in greater depth.

Application Deployment

Config 1 – A/B controlled topology

This topology allows you to migrate to Graviton while validating the application’s response time (approximately 300ms) on both x86 and Graviton-based instances. As shown in Figure 1, this design has a single Listener that forwards incoming requests to two Target Groups. One Target Group is associated with Graviton-based instances, while the other Target Group is associated with x86-based instances. The traffic ratio associated with each target group is defined in the Ingress configuration.

Here are the steps to create Config 1:

  1. Create two KEDA ScaledObjects that scale the number of pods based on the latency metric (TargetResponseTime in the AWS/ApplicationELB namespace) that matches the target group (triggers.metadata.dimensionValue). Declare the maximum acceptable latency in targetMetricValue: "0.3".
    Below is the Graviton Deployment's ScaledObject (spec.scaleTargetRef); note the comment that denotes the corresponding value for the x86 Deployment's ScaledObject:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
…
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: armsimplemultiarchapp #amdsimplemultiarchapp
…
  triggers:                 
    - type: aws-cloudwatch
      metadata:
        namespace: "AWS/ApplicationELB"
        dimensionName: "LoadBalancer"
        dimensionValue: "app/simplemultiarchapp/xxxxxx"
        metricName: "TargetResponseTime"
        targetMetricValue: "0.3"
  2. Once the topology has been created, add Amazon CloudWatch Container Insights to measure CPU, network throughput, and instance performance.
  3. To simplify testing and control for potential performance differences between instance generations, create two dedicated Karpenter provisioners and two Kubernetes Deployments (ReplicaSets), and specify the instance generation, CPU count, and CPU architecture for each one. This example uses c7g (Graviton3) and c6i (Intel). You will remove these constraints in the next topology to allow more allocation flexibility.

The x86-based instances Karpenter provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: x86provisioner
spec:
  requirements:
  - key: karpenter.k8s.aws/instance-generation
    operator: In
    values:
    - "6"
  - key: karpenter.k8s.aws/instance-cpu
    operator: In 
    values:
    - "2"
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64

The Graviton-based instances Karpenter provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: arm64provisioner
spec:
  requirements:
  - key: karpenter.k8s.aws/instance-generation
    operator: In
    values:
    - "7"
  - key: karpenter.k8s.aws/instance-cpu
    operator: In
    values:
    - "2"
  - key: kubernetes.io/arch
    operator: In
    values:
    - arm64
  4. Create two Kubernetes Deployment resources, one per CPU architecture, that use a nodeSelector to schedule one Deployment on Graviton-based instances and the other on x86-based instances. Similarly, create two NodePort Service resources, where each Service points to its architecture-specific ReplicaSet. A sketch of the Graviton-side resources is shown below.
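The following is a minimal sketch of the Graviton-side Deployment and Service, assuming the armsimplemultiarchapp naming used in the Ingress annotation later in this post; the image reference, labels, replica count, and container port are placeholders to adapt to your application (the x86 variant is identical apart from the names and the kubernetes.io/arch value):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: armsimplemultiarchapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: simplemultiarchapp
      cpu-arch: arm64
  template:
    metadata:
      labels:
        app: simplemultiarchapp
        cpu-arch: arm64
    spec:
      # Schedule these pods only on Graviton-based (arm64) nodes
      nodeSelector:
        kubernetes.io/arch: arm64
      containers:
        - name: app
          # Placeholder image reference; use your multi-arch image in Amazon ECR
          image: <account>.dkr.ecr.<region>.amazonaws.com/simplemultiarchapp:latest
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: armsimplemultiarchapp-svc
spec:
  # NodePort exposes the pods on every node so the ALB target group can reach them
  type: NodePort
  selector:
    app: simplemultiarchapp
    cpu-arch: arm64
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP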
  5. Create an Application Load Balancer using the AWS Load Balancer Controller to distribute incoming requests among the different pods. Control the traffic routing in the Ingress by adding an alb.ingress.kubernetes.io/actions.weighted-routing annotation. You can adjust the weights in the example below to meet your needs. This migration example started with a 100%-to-0% x86-to-Graviton ratio and adjusted the ratio over time in 10% increments until it reached a 0%-to-100% x86-to-Graviton ratio.
…
alb.ingress.kubernetes.io/actions.weighted-routing: |
  {
    …
      "targetGroups":[
        {
          "serviceName":"armsimplemultiarchapp-svc",
          "servicePort":"80","weight":50
        },
        {
          "serviceName":"amdsimplemultiarchapp-svc",
          "servicePort":"80","weight":50
        }
      ]
    }
  }

spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: weighted-routing
                port:
                  name: use-annotation

You can simulate live user requests against the example application's ALB endpoint. Amazon CloudWatch populates the ALB Target Group requests-per-second metrics, dimensioned by HTTP response code, to help you assess application throughput and CPU usage.

During the simulation, you will need to verify the following:

  • Pods on both Graviton-based and x86-based instances process a variable amount of traffic.
  • The application response time (p99) meets the performance requirement (300ms).

The orange (Graviton) and blue (x86) curves of HTTP 2xx responses (Figure 3) show the application throughput (HTTP requests/second) for each CPU architecture during the migration.

Figure 3: HTTP 2xx responses per CPU architecture, showing the gradual transition from x86 to Graviton using the ALB Ingress

Figure 4 shows an example of the application response time during the transition from x86-based instances to Graviton-based instances. The latency associated with each instance family grows and shrinks as the live request simulation changes the load on the application. In this example, the latency on x86-based instances (until 07:00) rose to 300ms because most of the request load was directed to x86-based pods. It began to converge at around 08:00, when more pods were powered by Graviton-based instances. Finally, after 15:00, the request load was processed entirely by Graviton-based instances.

Figure 4: Target response time (p99). Graviton-based pods show a response time (between 150ms and 300ms) similar to that of x86-based pods.

Config 2 – Karpenter Controlled Configuration

After fully testing the application on Graviton-based EC2 instances, you are ready to simplify the deployment topology with weighted provisioners while preserving the ability to launch x86-based instances as needed.

Here are the steps to create Config 2:

  1. Reuse the CPU-based provisioners from the previous topology, but assign a higher .spec.weight to the Graviton-based instance provisioner. The x86 provisioner is still deployed in case x86-based instances are required. The karpenter.k8s.aws/instance-family requirement can be broadened beyond the families used in Config 1, or specific families can be excluded by switching the operator to NotIn.

The x86-based Amazon EC2 instances Karpenter provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: x86provisioner
spec:
  requirements:
  - key: kubernetes.io/arch
    operator: In
    values: [amd64]

The Graviton-based Amazon EC2 instances Karpenter provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: priority-arm64provisioner
spec:
  weight: 10
  requirements:
  - key: kubernetes.io/arch
    operator: In
    values: [arm64]
  2. Next, merge the two Kubernetes Deployments into a single Deployment, similar to the original pre-migration Deployment (that is, with no nodeSelector pointing to a CPU-specific provisioner); a sketch of the combined resources follows the Ingress spec below.

The two Services are also combined into a single Kubernetes Service, and the actions.weighted-routing annotation is removed from the Ingress resource:

spec:
  rules:
    - http:
        paths:
          - path: /app
            pathType: Prefix
            backend:
              service:
                name: simplemultiarchapp-svc
                port:
                  number: 80
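As a reference, here is a minimal sketch of the combined Deployment and Service, using the same placeholder image, labels, and ports as the Config 1 sketch; with no nodeSelector, pods can be scheduled on either CPU architecture, and the multi-architecture image manifest ensures the correct image variant is pulled on each node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simplemultiarchapp
spec:
  replicas: 4
  selector:
    matchLabels:
      app: simplemultiarchapp
  template:
    metadata:
      labels:
        app: simplemultiarchapp
    spec:
      # No nodeSelector: Karpenter's weighted provisioners decide whether each
      # pod lands on a Graviton-based (arm64) or x86-based (amd64) node
      containers:
        - name: app
          # Placeholder image reference; use your multi-arch image in Amazon ECR
          image: <account>.dkr.ecr.<region>.amazonaws.com/simplemultiarchapp:latest
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: simplemultiarchapp-svc
spec:
  type: NodePort
  selector:
    app: simplemultiarchapp
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP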
  3. Unite the two KEDA ScaledObject resources from the first configuration into a single ScaledObject and point it at the single Deployment (for example, simplemultiarchapp). The new KEDA ScaledObject is:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: simplemultiarchapp-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simplemultiarchapp
…

Figure 5: Config 2 – weighted provisioners results. HTTP requests/second per CPU architecture, with Graviton (blue) serving the baseline and x86 (orange) absorbing the burst

A synthetic limit on Graviton CPU capacity (Provisioner.limits.resources.cpu) was set to illustrate scaling out to x86_64 CPUs. The total application throughput (Figure 5) is shown by the aarch64_200 (blue) and x86_64_200 (orange) series. Mixing CPUs did not impact the target response time (Figure 6). Karpenter behaved as expected: it prioritized Graviton-based instances and burst to x86-based Amazon EC2 instances when the Graviton CPU limit was reached, as sketched below.
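For illustration, such a cap can be expressed on the Graviton provisioner's spec.limits; the 200-vCPU value below is an assumed, illustrative figure rather than the exact limit used in this test:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: priority-arm64provisioner
spec:
  weight: 10
  # Cap the total vCPUs this provisioner may launch; once the cap is reached,
  # pending pods are satisfied by the lower-weight x86 provisioner
  limits:
    resources:
      cpu: "200"
  requirements:
  - key: kubernetes.io/arch
    operator: In
    values: [arm64]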

Figure 6: Config 2 – HTTP response time (p99) with the mixed-CPU provisioners. Mixing CPUs did not impact application latency when x86-based instances were added.

Conclusion

The use of a mixed-CPU architecture enables your application to utilize a wide selection of Amazon EC2 instance types and improves your application's resilience while meeting your service-level objectives. Application metrics can be used to control the migration with the ALB Ingress, Karpenter, and KEDA. Moreover, AWS Graviton-based Amazon EC2 instances can deliver up to 40% better price performance than comparable x86-based Amazon EC2 instances. Learn more about this example on GitHub and read more announcements about Graviton.