Tag Archives: AWS EC2 Bare Metal

Quick Restoration through Replacing the Root Volumes of Amazon EC2 instances

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/quick-restoration-through-replacing-the-root-volumes-of-amazon-ec2/

This blog post is written by Katja-Maja Krödel, IoT Specialist Solutions Architect, and Benjamin Meyer, Senior Solutions Architect, Game Tech.

Customers use Amazon Elastic Compute Cloud (Amazon EC2) instances to develop, deploy, and test applications. To use those instances most effectively, customers have expressed the need to set back their instance to a previous state within minutes or even seconds. They want to find a quick and automated way to manage setting back their instances at scale.

The feature of replacing Root Volumes of Amazon EC2 instances enables customers to replace the root volumes of running EC2 instances to a specific snapshot or its launch state. Without stopping the instance, this allows customers to fix issues while retaining the instance store data, networking, and AWS Identity and Access Management (IAM) configuration. Customers can resume their operations with their instance store data intact. This works for all virtualized EC2 instances and bare metal EC2 Mac instances today.

In this post, we show you how to design your architecture for automated Root Volume Replacement using this Amazon EC2 feature. We start with the automated snapshot creation, continue with automatically replacing the root volume, and finish with how to keep your environment clean after your replacement job succeeds.

What is Root Volume Replacement?

Amazon EC2 enables customers to replace the root Amazon Elastic Block Store (Amazon EBS) volume for an instance without stopping the instance to which it’s attached. An Amazon EBS root volume is replaced to the launch state, or any snapshot taken from the EBS volume itself. This allows issues to be fixed, such as root volume corruption or guest OS networking errors. Replacing the root volume of an instance includes the following steps:

  • A new EBS volume is created from a previously taken snapshot or the launch state
  • Reboot of the instance
  • While rebooting, the current root volume is detached and the new root volume is attached

The previous EBS root volume isn’t deleted and can be attached to an instance for later investigation of the volume. If replacing to a different state of the EBS than the launch state, then a snapshot of the current root volume is used.

An example use case is a continuous integration/continuous deployment (CI/CD) System that builds on EC2 instances to build artifacts. Within this system, you could alter the installed tools on the host and may cause failing builds on the same machine. To prevent any unclean builds, the introduced architecture is used to clean up the machine by replacing the root volume to a previously known good state. This is especially interesting for EC2 Mac Instances, as their Dedicated Host won’t undergo the scrubbing process, and the instance is more quickly restored than launching a fresh EC2 Mac instance on the same host.

Overview

The feature of replacing Root Volumes was introduced in April 2021 and has just been <TBD> extended to work for Bare Metal EC2 Mac Instances. This means that EC2 Mac Instances are included. If you want to reset an EC2 instance to a previously known good state, then you can create Snapshots of your EBS volumes. To reset the root volume to its launch state, no snapshot is needed. For non-root volumes, you can use these Snapshots to create new EBS volumes, and then attach those to your instance as well as detach them. To automate the process of replacing your root volume not only once, but also in a repeatable manner, we’re introducing you to an architecture that can fully-automate this process.

In the case that you use a snapshot to create a new root volume, you must take a new snapshot of that volume to be able to get back to that state later on. You can’t use a snapshot of a different volume to restore to, which is the reason that the architecture includes the automatic snapshot creation of a fresh root volume.

The architecture is built in three steps:

  1. Automation of Snapshot Creation for new EBS volumes
  2. Automation of replacing your Root Volume
  3. Preparation of the environment for the next Root Volume Replacement

The following diagram illustrates the architecture of this solution.

 Architecture of the automated creation of Root Volumes for Amazon EC2 Instances

In the next sections, we go through these concepts to design the automatic Root Volume Replacement Task.

Automation of Snapshot Creation for new EBS volumes

Architecture of the automated creation of Snapshots of new EBS Volumes.

The figure above illustrates the architecture for automatically creating a snapshot of an existing EBS volume. In this architecture, we focus on the automation of creating a snapshot whenever a new EBS root volume is created.

Amazon EventBridge is used to invoke an AWS Lambda function on an emitted createVolume event. For automated reaction to the event, you can add a rule to the EventBridge which will forward the event to an AWS Lambda function whenever a new EBS volume is created. The rule within EventBridge looks like this:

{
  "source": ["aws.ec2"],
  "detail-type": ["EBS Volume Notification"],
  "detail": {
    "event": ["createVolume"]
  }
}

An example event is emitted when an EBS root volume is created, which will then invoke the Lambda function to look like this:

{
   "version": "0",
   "id": "01234567-0123-0123-0123-012345678901",
   "detail-type": "EBS Volume Notification",
   "source": "aws.ec2",
   "account": "012345678901",
   "time": "yyyy-mm-ddThh:mm:ssZ",
   "region": "us-east-1",
   "resources": [
      "arn:aws:ec2:us-east-1:012345678901:volume/vol-01234567"
   ],
   "detail": {
      "result": "available",
      "cause": "",
      "event": "createVolume",
      "request-id": "01234567-0123-0123-0123-0123456789ab"
   }
}

The code of the function uses the resource ARN within the received event and requests resource details about the EBS volume from the Amazon EC2 APIs. Since the event doesn’t include information if it’s a root volume, then you must verify this using the Amazon EC2 API.

The following is a summary of the tasks of the Lambda function:

  1. Extract the EBS ARN from the EventBridge Event
  2. Verify that it’s a root volume of an EC2 Instance
  3. Call the Amazon EC2 API create-snapshot to create a snapshot of the root volume and add a tag replace-snapshot=true

Then, the tag is used to clean up the environment and get rid of snapshots that aren’t needed.

As an alternative, you can emit your own event to EventBridge. This can be used to automatically create snapshots to which you can restore your volume. Instead of reacting to the createVolume event, you can use a customized approach for this architecture.

Automation of replacing your Root Volume

Architecture of the automated creation of Snapshots of new EBS Volumes.

The figure above illustrates the procedure of replacing the EBS root volume. It starts with the event, which is created through the AWS Command Line Interface (AWS CLI), console, or usage of the API. This leads to creating a new volume from a snapshot or using the initial launch state. The EC2 instance is rebooted, and during that time the old root volume is detached and a new volume gets attached as the root volume.

To invoke the create-replace-root-volume-task, you can call the Amazon EC2 API with the following AWS CLI command:

aws ec2 create-replace-root-volume-task --instance-id <value> --snapshot <value> --tag-specifications ResourceType=string,Tags=[{Key=replaced-volume,Value=true}]

If you want to restore to launch state, then omit the --snapshot parameter:

aws ec2 create-replace-root-volume-task --instance-id <value> --tag-specifications ResourceType=string,Tags=[{Key=delete-volume,Value=true}]

After running this command, AWS will create a new EBS volume, add the tag to the old EBS replaced-volume=true, restart your instance, and attach the new volume to the instance as the root volume. The tag is used later to detect old root volumes and clean up the environment.

If this is combined with the earlier explained automation, then the automation will immediately take a snapshot from the new EBS volume. A restore operation can only be done to a snapshot of the current EBS root volume. Therefore, if no snapshot is taken from the freshly restored EBS volume, then no restore operation is possible except the restore to launch state.

Preparation of the Environment for the next Root Volume Replacement

After the task is completed, the old root volume isn’t removed. Additionally, snapshots of previous root volumes can’t be used to restore current root volumes. To clean up your environment, you can schedule a Lambda function which does the following steps:

  • Delete detached EBS volumes with the tag delete-volume=true
  • Delete snapshots with the tag replace-snapshot=true, which aren’t associated with an existing EBS volume

Conclusion

In this post, we described an architecture to quickly restore EC2 instances through Root Volume Replacement. The feature of replacing Root Volumes of Amazon EC2 instances, now including Bare Metal EC2 Mac instances, enables customers to replace the root volumes of running EC2 instances to a specific snapshot or its launch state. Customers can resume their operations with their instance store data intact. We’ve split the process of doing this in an automated and quick manner into three steps: Create a snapshot, run the replacement task, and reset your environment to be prepared for a following replacement task. If you want to learn more about this feature, then see the Announcement of replacing Root Volumes, as well as the documentation for this feature. <TBD Announcement Bare Metal>

Understanding the lifecycle of Amazon EC2 Dedicated Hosts

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/understanding-the-lifecycle-of-amazon-ec2-dedicated-hosts/

This post is written by Benjamin Meyer, Sr. Solutions Architect, and Pascal Vogel, Associate Solutions Architect.

Amazon Elastic Compute Cloud (Amazon EC2) Dedicated Hosts enable you to run software on dedicated physical servers. This lets you comply with corporate compliance requirements or per-socket, per-core, or per-VM licensing agreements by vendors, such as Microsoft, Oracle, and Red Hat. Dedicated Hosts are also required to run Amazon EC2 Mac Instances.

The lifecycles and states of Amazon EC2 Dedicated Hosts and Amazon EC2 instances are closely connected and dependent on each other. To operate Dedicated Hosts correctly and consistently, it is critical to understand the interplay between Dedicated Hosts and EC2 Instances. In this post, you’ll learn how EC2 instances are reliant on their (dedicated) hosts. We’ll also dive deep into their respective lifecycles, the connection points of these lifecycles, and the resulting considerations.

What is an EC2 instance?

An EC2 instance is a virtual server running on top of a physical Amazon EC2 host. EC2 instances are launched using a preconfigured template called Amazon Machine Image (AMI), which packages the information required to launch an instance. EC2 instances come in various CPU, memory, storage and GPU configurations, known as instance types, to enable you to choose the right instance for your workload. The process of finding the right instance size is known as right sizing. Amazon EC2 builds on the AWS Nitro System, which is a combination of dedicated hardware and the lightweight Nitro hypervisor. The EC2 instances that you launch in your AWS Management Console via Launch Instances are launched on AWS-controlled physical hosts.

What is an Amazon EC2 Bare Metal instance?

Bare Metal instances are instances that aren’t using the Nitro hypervisor. Bare Metal instances provide direct access to physical server hardware. Therefore, they let you run legacy workloads that don’t support a virtual environment, license-restricted business-critical applications, or even your own hypervisor. Workloads on Bare Metal instances continue to utilize AWS Cloud features, such as Amazon Elastic Block Store (Amazon EBS), Elastic Load Balancing (ELB), and Amazon Virtual Private Cloud (Amazon VPC).

What is an Amazon EC2 Dedicated Host?

An Amazon EC2 Dedicated Host is a physical server fully dedicated to a single customer. With visibility of sockets and physical cores of the Dedicated Host, you can address corporate compliance requirements, such as per-socket, per-core, or per-VM software licensing agreements.

You can launch EC2 instances onto a Dedicated Host. Instance families such as M5, C5, R5, M5n, C5n, and R5n allow for the launching of different instance sizes, such as4xlarge and 8xlarge, to the same host. Other instance families only support a homogenous launching of a single instance size. For more details, see Dedicated Host instance capacity.

As an example, let’s look at an M6i Dedicated Host. M6i Dedicated Hosts have 2 sockets and 64 physical cores. If you allocate a M6i Dedicated Host, then you can specify what instance type you’d like to support for allocation. In this case, possible instance sizes are:

  • large
  • xlarge
  • 2xlarge
  • 4xlarge
  • 8xlarge
  • 12xlarge
  • 16xlarge
  • 24xlarge
  • 32xlarge
  • metal

The number of instances that you can launch on a single M6i Dedicated Host depends on the selected instance size. For example:

  • In the case of xlarge (4 vCPUs), a maximum of 32 m6i.xlarge instances can be scheduled on this Dedicated Host.
  • In the case of 8xlarge (32 vCPUs), a maximum of 4 m6i.8xlarge instances can be scheduled on this Dedicated Host.
  • In the case of metal (128 vCPUs), a maximum of 1 m6i.metal instance can be scheduled on this Dedicated Host.

When launching an EC2 instance on a Dedicated Host, you’re billed for the Dedicated Host but not for the instance. The cost for Amazon EBS volumes is the same as in the case of regular EC2 instances.

Exemplary homogenious M6i Dedicated Host shown with 32 m6i.xlarge, four m6i.8xlarge and one m6i.metal each.

Exemplary M6i Dedicated Host instance selections: m6i.xlarge, m6i.8xlarge and m6i.metal

Understanding the EC2 instance lifecycle

Amazon EC2 instance lifecycle states and transitions

Throughout its lifecycle, an EC2 instance transitions through different states, starting with its launch and ending with its termination. Upon Launch, an EC2 instance enters the pending state. You can only launch EC2 instances on Dedicated Hosts in the available state. You aren’t billed for the time that the EC2 instance is in any state other than running. When launching an EC2 instance on a Dedicated Host, you’re billed for the Dedicated Host but not for the instance. Depending on the user action, the instance can transition into three different states from the running state:

  1. Via Reboot from the running state, the instance enters the rebooting state. Once the reboot is complete, it reenters the running state.
  2. In the case of an Amazon EBS-backed instance, a Stop or Stop-Hibernate transitions the running instance into the stopping state. After reaching the stopped state, it remains there until further action is taken. Via Start, the instance will reenter the pending and subsequently the running state. Via Terminate from the stopped state, the instance will enter the terminated state. As part of a Stop or Stop-Hibernate and subsequent Start, the EC2 instance may move to a different AWS-managed host. On Reboot, it remains on the same AWS-managed host.
  3. Via Terminate from the running state, the instance will enter the shutting-down state, and finally the terminated state. An instance can’t be started from the terminated state.

Understanding the Amazon EC2 Dedicated Host lifecycle

A diagram of the the Amazon EC2 Dedicated Host lifecycle states and transitions between them.

Amazon EC2 Dedicated Host lifecycle states and transitions

An Amazon EC2 Dedicated Host enters the available state as soon as you allocate it in your AWS account. Only if the Dedicated Host is in the available state, you can launch EC2 instances on it. You aren’t billed for the time that your Dedicated Host is in any state other than available. From the available state, the following states and state transitions can be reached:

  1. You can Release the Dedicated Host, transitioning it into the released state. Amazon EC2 Mac Instances Dedicated Hosts have a minimum allocation time of 24h. They can’t be released within the 24h. You can’t release a Dedicated Host that contains instances in one of the following states: pending, running, rebooting, stopping, or shutting down. Consequently, you must Stop or Terminate any EC2 instances on the Dedicated Host and wait until it’s in the available state before being able to release it. Once an instance is in the stopped state, you can move it to a different Dedicated Host by modifying its Instance placement configuration.
  2. The Dedicated Host may enter the pending state due to a number of reasons. In case of an EC2 Mac instance, stopping or terminating a Mac instance initiates a scrubbing workflow of the underlying Dedicated Host, during which it enters the pending state. This scrubbing workflow includes tasks such as erasing the internal SSD, resetting NVRAM, and more, and it can take up to 50 minutes to complete. Additionally, adding or removing a Dedicated Host to or from a Resource Group can cause the Dedicated Host to go into the pending state. From the pending state, the Dedicated Host will reenter the available state.
  3. The Dedicated Host may enter the under-assessment state if AWS is investigating a possible issue with the underlying infrastructure, such as a hardware defect or network connectivity event. While the host is in the under-assessment state, all of the EC2 instances running on it will have the impaired status. Depending on the nature of the underlying issue and if it’s configured, the Dedicated Host will initiate host auto recovery.

If Dedicated Host Auto Recovery is enabled for your host, then AWS attempts to restart the instances currently running on a defect Dedicated Host on an automatically allocated replacement Dedicated Host without requiring your manual intervention. When host recovery is initiated, the AWS account owner is notified by email and by an AWS Health Dashboard event. A second notification is sent after the host recovery has been successfully completed. Initially, the replacement Dedicated Host is in the pending state. EC2 instances running on the defect dedicated Host remain in the impaired status throughout this process. For more information, see the Host Recovery documentation.

Once all of the EC2 instances have been successfully relaunched on the replacement Dedicated Host, it enters the available state. Recovered instances reenter the running state. The original Dedicated Host enters the released-permanent-failure state. However, if the EC2 instances running on the Dedicated Host don’t support host recovery, then the original Dedicated Host enters the permanent-failure state instead.

Conclusion

In this post, we’ve explored the lifecycles of Amazon EC2 instances and Amazon EC2 Dedicated Hosts. We took a close look at the individual lifecycle states and how both lifecycles must be considered in unison to operate EC2 Instances on EC2 Dedicated Hosts correctly and consistently. To learn more about operating Amazon EC2 Dedicated Hosts, visit the EC2 Dedicated Hosts User Guide.