Implementing Attribute-Based Instance Type Selection using Terraform

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/implementing-attribute-based-instance-type-selection-using-terraform/

This blog post is written by Christian Melendez, Senior Specialist Solutions Architect, Flexible Compute – EC2 Spot and Carlos Manzanedo Rueda, WW SA Leader, Flexible Compute – EC2 Spot.

In this blog post we will cover the release of Terraform support for Attribute-Based Instance Type Selection (ABS). ABS simplifies the configuration required to acquire compute capacity for Instance Flexible workloads. Terraform is an open-source infrastructure as code software tool by HashiCorp. Hashicorp is an AWS Partner Network (APN) Advanced Technology Partner and member of the AWS DevOps Competency.

Introduction

Amazon EC2 provides a wide selection of instance types optimized to fit different use cases. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity and give you the flexibility to choose the appropriate mix of resources for your applications.

Workloads such as continuous integration, analytics, microservices on containers, etc., can use multiple instance types. Customers have been telling us that simplifying the configuration of instance flexible workloads is important. For workloads that are instance flexible, AWS released ABS to express workload requirements as a set of instance attributes such as: vCPU, memory, type of processor, etc. ABS translates these requirements and selects all matching instance types that meet the criteria. To select which instance to launch, Amazon EC2 Auto Scaling Groups and EC2 Fleet chose instances based on the allocation strategy configured. Lowest-price allocation strategy is supported on both Amazon EC2 On-Demand Instances and Amazon EC2 Spot Instances. The recommendation for Spot Instances is to use capacity-optimized, which select the optimal instances that reduce the frequency of interruptions. ABS does also future-proof EC2 Auto Scaling Group and EC2 Fleet configurations: any new instance type we launch that matches the selected attributes, will be included in the list automatically. No need to update your EC2 Auto Scaling Group or EC2 Fleet configuration.

Following our commitment to open-source projects, AWS has added support for ABS in the AWS Terraform provider. You can use ABS for launch templates, EC2 Auto Scaling Group, and EC2 Fleet resources. The minimum required version of the AWS provider is v4.16.0.

Applying instance flexibility is key for running fault-tolerant, elastic, reliable, and cost optimized workloads. By selecting a diversified choice of instances that qualify for your workload, your application will be better prepared to avoid scenarios where lack of capacity on a specific instance type could be an issue. This applies both to On-Demand and Spot Instance-based workloads. For Spot Instances, applying diversification is key. Spot Instances are spare capacity that can be reclaimed by EC2 when it is required. ABS allows you to specify diversification in simple terms, allowing EC2 Auto Scaling Group and EC2 Fleet’s allocation strategy to replace reclaimed Spot Instances with instances from other pools where capacity is available.

Instance Requirement Attributes

To represent the instance requirements for your workload using ABS, there are a set of attributes you can use within the instance_requirements block. When using Terraform, the only two required attributes are memory_mib and vcpu_count. The rest of the attributes provide default values that adhere to Instance Flexible workloads best practices. For example, bare_metal attribute is by default excluded. You can see the full list of ABS attributes in the Terraform docs site.

Once ABS attributes are configured, ABS picks a list of instance types that match the criteria. This list is especially important when you’re using Spot Instances. One of Spot Instances best practices is to diversify the instance types which, in combination with the capacity-optimized allocation strategy, gives you access to the highest amount of Spot capacity pools. For On-Demand Instances, the instance types list is important as well. There might be scenarios where On-Demand Instance pools lack capacity. By applying instance flexibility using ABS, you can avoid the InsufficientInstanceCapacity error. And in combination with the lowest-price allocation strategy, you get the lowest price instance types from your diversified selection.

There are different places where we can specify ABS attributes. We can specify them at the launch templates level and declare a base mechanism to select instances. In most cases, the recommendation is to configure ABS attributes at the EC2 Auto Scaling Group and EC2 Fleet level. Let’s explore each of these options.

Configuring instance requirements within launch templates

Launch templates are instance configuration templates where you specify parameters like AMI ID, instance type, key pair, and security groups to launch instances. You can use ABS attributes in a launch template when you need to be prescriptive, and define sane defaults or guardrails for your workloads. This way, EC2 Auto Scaling Group or EC2 Fleet simply reference and use the launch template.

You should use ABS attributes in launch templates when you want to prevent users from overriding the resources specified by a launch template. Note that is still possible to override those requirements.

Let’s say that we have a Java application that requires a minimum of 4 vCPUs and 8 GiB of memory, and has been using the c5.xlarge instance type. After performance testing we’ve identified that runs better with current instance types generations. The following code snippet represents how to define these requirements in a Launch Template. To see the full list of attributes, visit the launch template doc site.

resource "aws_launch_template" "abs" {
  name_prefix = "abs"
  image_id    = data.aws_ami.abs.id

  instance_requirements {
    memory_mib {
      min = 8192
    }
    vcpu_count {
      min = 4
    }
    instance_generations = ["current"]
  }
}

Note by using instance_requirements block in a LaunchTemplate, you’ll need to use the mixed_instances_policy block in the EC2 Auto Scaling Group.

resource "aws_autoscaling_group" "on_demand" {
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  max_size           = 1
  min_size           = 1
  mixed_instances_policy {
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.abs.id
      }
    }
  }
}

The EC2 Auto Scaling Group will use instance types that match the requirements in the launch template.

You can preview what are the instances that the EC2 Auto Scaling Group will select. The section “How to Preview Matching Instances without Launching Them” of the ABS blog post, describes how to preview the instances that will be selected.

Launching EC2 Spot Instances with EC2 Auto Scaling

Launch templates are very powerful. They allow you to decouple the attributes such as user_data from the actual instance management. They are idempotent and can be versioned which is key for rolling out changes in the configuration and applying EC2 Auto Scaling Group Instance Refresh.

Our recommendation is to define ABS attributes as overrides within the mixed_instances_policy block in EC2 Auto Scaling Groups. For most of the applications, we recommend using EC2 Auto Scaling Group to provision EC2 instances.

Let’s get back to our previous example. Now we want to be more prescriptive for the instance requirements our Java application uses.

Let’s assume that the Java application is memory intensive, requires a vCPU to Memory ratio of 4GB for every vCPU, and has been using the m5.large instance type. Additionally, it does not need hardware accelerators like GPUs, FPGAs, etc. Our application does also require a minimum of 2 vCPUs, and the range of memory has been reduced to 32GB to avoid large garbage collection scenarios. This time, we’d like to launch only Spot Instances. As mentioned earlier in this blog post, diversification is key for Spot. To improve the experience with Spot, let’s enable the capacity rebalance feature from the EC2 Auto Scaling Group to proactively replace instances that are at an elevated risk of being interrupted. The following code snippet represents the ABS attributes we need for the more prescriptive workload:

resource “aws_autoscaling_group” “spot” {
  availability_zones = [“us-east-1a”, “us-east-1b”, “us-east-1c”]
  desired_capacity   = 1
  max_size           = 1
  min_size           = 1
  capacity_rebalance = true

  mixed_instances_policy {
    instances_distribution {
      spot_allocation_strategy = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.x86.id
      }

      override {
        instance_requirements {
          memory_mib {
            min = 4096
            max = 32768
          }

          vcpu_count {
            min = 2
          }

          memory_gib_per_vcpu {
            min = 4
            max = 4
          }

          accelerator_count {
            max = 0
          }
        }
      }
    }
  }
}

Launching EC2 Spot Instances with EC2 Fleet

Another method we have to launch EC2 instances is the EC2 Fleet API. We recommend to use EC2 Fleet for workloads that need granular controls to provision capacity. For example, tight HPC workloads where instances must be close together (single Availability Zone and within the same placement group) and need similar instance types. EC2 Fleet is also used by capacity orchestrators such as Karpenter or Atlassian Escalator that implement tuned up and optimized logic to provision capacity.

Let’s say that this time the workload is a CPU bound workload and has been using the c5.9xlarge instance type. The workload can be retried and the application supports checkpointing, so it qualifies to use Spot Instances. Given we’ll be using Spot Instances, we would like to benefit from the capacity rebalance feature as we did before in the EC2 Auto Scaling Group example. The application requires very prescriptive ranges of vCPU and memory and we also need a minimum of 100 GB SSD local storage. While in most cases EC2 Auto Scaling Groups are appropriate solution to procure and maintain capacity, in this case we will use EC2 Fleet.

The following code snippet represents the ABS attributes we need for this workload:

resource "aws_ec2_fleet" "spot" {
  target_capacity_specification {
    default_target_capacity_type = "spot"
    total_target_capacity        = 5
  }

  spot_options {
    allocation_strategy = "capacity-optimized"
    maintenance_strategies {
      capacity_rebalance {
        replacement_strategy = "launch"
      }
    }
  }

  launch_template_config {
    launch_template_specification {
      launch_template_id = aws_launch_template.x86.id
      version            = aws_launch_template.x86.latest_version
    }

    override {
      instance_requirements {
        memory_mib {
          min = 65536
          max = 73728
        }

        vcpu_count {
          min = 32
          max = 36
        }

        cpu_manufacturers   = ["intel"]
        local_storage       = "required"
        local_storage_types = ["ssd"]

        total_local_storage_gb {
          min = 100
        }
      }
    }
  }
}

Multi-Architecture workloads using Graviton and x86 with EC2 Auto Scaling Groups

Another recent feature from EC2 Auto Scaling Groups is that you can build multi-architecture workloads. EC2 Auto Scaling Group allows you to mix Graviton and x86 instance types in the same EC2 Auto Scaling Group. Unlike the x86_64 instances we have used so far, AWS Graviton processors are custom built by AWS using 64-bit Arm. You need to use different launch templates as each architecture needs to use a different AMI. This can be defined in within the override block.

In the example below, we use ABS to define different attributes depending on the CPU architecture. And what’s great about doing this is that we don’t need to exclude instance types. Instances will be launched with a compatible CPU architecture based on the AMI that you specify in our launch template.

Besides supporting mixing architectures, EC2 Auto Scaling Groups allows to combine purchase models. For our example, this time we’ll use a more complex scenario to showcase how powerful and feature rich EC2 Auto Scaling Group has become. The following code snippet applies many of the configurations we’ve seen before, but the key difference here is that we have two overrides. One is for Graviton instances, and another one for x86 instances.

resource "aws_autoscaling_group" "on_demand_spot" {
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  desired_capacity   = 4
  max_size           = 10
  min_size           = 2
  capacity_rebalance = true

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 2
      on_demand_percentage_above_base_capacity = 0
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.arm.id
      }

      override {
        launch_template_specification {
          launch_template_id = aws_launch_template.arm.id
        }

        instance_requirements {
          memory_mib {
            min = 16384
            max = 16384
          }

          vcpu_count {
            max = 4
          }
        }
      }

      override {
        launch_template_specification {
          launch_template_id = aws_launch_template.x86.id
        }

        instance_requirements {
          memory_mib {
            min = 16384
          }

          vcpu_count {
            min = 4
          }
        }
      }
    }
  }
}

Multi-architecture workloads can also be applied to container orchestration. Thanks to the Amazon Elastic Container Registry (Amazon ECR) support for multi-architecture container images, you can use manifests to push both container images, and Amazon ECR will pull the proper image based on the CPU architecture. We have a workshop for Elastic Kubernetes Service (Amazon EKS) where you can learn more about how to deploy a multi-architecture workload in Amazon EKS.

Conclusion

In this post you’ve learned how to configure ABS to launch instances using Auto Scaling Groups and EC2 Fleet. You’ve learned and how to mix CPU architectures along with purchase models in the same EC2 Auto Scaling Group using Terraform while simplifying the configuration.

Our commitment with open-source projects such as Terraform is to help customers implement AWS best practices in a larger ecosystem easily. ABS support allows customers to get access to compute capacity by simply specifying the resource requirement attributes of their workloads rather than the instance names.

ABS simplifies the configuration for instance flexible workloads and removes the need to list the instances that qualify for your workload. Instead, it simplifies the configuration and future proof for scenarios where AWS includes new instances that qualify for the workload. For Spot workloads where instance diversification is key, ABS simplifies the selection of instances and helps to increase the total number of pools. For more information, visit the ABS user guide and Terraform documentation for Auto Scaling Groups and EC2 Fleet.

Noise