Tag Archives: Amazon EC2

Amazon CloudWatch Insights for Amazon EKS on EC2 using AWS Distro for OpenTelemetry Helm charts

Post Syndicated from Vimala Pydi original https://aws.amazon.com/blogs/architecture/amazon-cloudwatch-insights-for-amazon-eks-on-ec2-using-aws-distro-for-opentelemetry-helm-charts/

This blog provides a simplified three-step solution to collect metrics and logs from an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on Amazon Elastic Compute Cloud (Amazon EC2) using the AWS Distro for OpenTelemetry (ADOT) Helm charts repository and send them to Amazon CloudWatch Logs and Amazon CloudWatch Container Insights. The ADOT Helm charts repository contains Helm charts to provide easy mechanisms to set up the ADOT Collector and other collection agents like fluentbit to collect telemetry data such as metrics, logs and traces to send to AWS monitoring services.

Amazon EKS is a managed Kubernetes service that makes it easy for organizations to run Kubernetes on AWS Cloud and on premises. Organizations use Amazon EKS to automatically manage the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, storing cluster data, and performing other key tasks. ADOT is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. Applications can set up ADOT Collector and other collector agents only once to send correlated metrics and traces to multiple AWS and Partner monitoring solutions. Fluent Bit is an open-source log processor and forwarder that you can use to collect data such as metrics and logs from different sources. Helm deploys packaged applications to Kubernetes and structures them into Helm charts.

Solution overview

A high-level architecture diagram depicted in Figure 1 shows a simple solution for collecting metrics and logs to send to Amazon CloudWatch Container Insights by installing an ADOT Helm chart on your existing or new Amazon EKS cluster.

Here are the steps to set up an ADOT and fluentbit collector:

  1. Set up your environment and install the necessary tools to connect to an existing or newly created Amazon EKS cluster.
  2. Configure the necessary roles for AWS Identity and Access Management (IAM) roles for service accounts and install Helm charts for ADOT, enabling fluentbit.
  3. Monitor logs, metrics, and traces from Amazon CloudWatch Logs and Container Insights.
Architecture diagram for Helm chart installation of ADOT and fluentbit to an existing Amazon EKS cluster

Figure 1. Architecture diagram for Helm chart installation of ADOT and fluentbit to an existing Amazon EKS cluster


  • Existing AWS account with access to AWS Management Console
  • Intermediate-level knowledge and understanding of Amazon EKS
  • An existing or new Amazon EKS cluster

Install the tools

In this blog, AWS Cloud9 is used as an environment to connect to the Amazon EKS cluster and install Helm charts. If you choose to use AWS Cloud9, follow the step-by-step instructions provided in Creating an EC2 Environment. Refer to Getting started with Amazon EKS for additional instructions to install eksctl, create EKS clusters, and set up required IAM permissions for connecting to an EKS cluster.

  1. Log in to your Amazon EKS cluster and inspect the cluster. Select an EKS cluster in AWS Management Console. On the Resources tab, check the DaemonSets, as in Figure 2a.

    EKS cluster DaemonSets

    Figure 2a. EKS cluster DaemonSets

  2. Open Amazon CloudWatch and inspect the Log groups and Amazon CloudWatch Container Insights. Note that the Log groups and Amazon CloudWatch Container Insights in Figure 2b do not show any EKS cluster-specific logs.

    Container Insights before ADOT and fluentbit collector installation

    Figure 2b. Container Insights before ADOT and fluentbit collector installation

Install Helm and configure IAM roles

  1. Run the following command to install Helm, verify the version, and configure Bash completion for the Helm command:
    curl -ssl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
    helm version --short
    helm completion bash >> ~/.bash_completion
    . /etc/profile.d/bash_completion.sh
    . ~/.bash_completion
    source <(helm completion bash)
  2. Set up IAM roles for service accounts.
    Replace XXX in the following commands with your EKS Cluster name.

    eksctl create iamserviceaccount \
    --name fluent-bit \
    --role-name EKS-ADOT-CWCI-Helm-Chart-Role-CW \
    --namespace amazon-cloudwatch \
    --cluster XXX \
    --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
    --role-only \
    eksctl create iamserviceaccount \
    --name adot-collector-sa \
    --role-name EKS-ADOT-CWCI-Helm-Chart-Role-METRICS \
    --namespace amazon-metrics \
    --cluster XXX \
    --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
    --role-only \
  3. Deploy the ADOT Helm chart.
    Replace XXX in the following code with your EKS Cluster name.

    CWCI_ADOT_HELM_ROLE_ARN_CW=$(aws iam get-role --role-name EKS-ADOT-CWCI-Helm-Chart-Role-CW | jq .Role.Arn -r)
    CWCI_ADOT_HELM_ROLE_ARN_METRICS=$(aws iam get-role --role-name EKS-ADOT-CWCI-Helm-Chart-Role-METRICS | jq .Role.Arn -r)
    helm repo add adot-helm-repo https://aws-observability.github.io/aws-otel-helm-charts
    helm install adot-release adot-helm-repo/adot-exporter-for-eks-on-ec2  \
    --set clusterName=XXX --set awsRegion=us-east-1 --set fluentbit.enabled=true \
    --set adotCollector.daemonSet.service.metrics.receivers={awscontainerinsightreceiver} \
    --set adotCollector.daemonSet.service.metrics.exporters={awsemf} \
    --set adotCollector.daemonSet.cwexporters.logStreamName=EKSNode \
  4. Run the following commands to validate the successful deployment.
    • Verify that two new namespaces have been created.
      kubectl get ns
      The result should be:

      $ kubectl get ns
      NAME                STATUS           AGE
      amazon-cloudwatch   Active           2d20h
      amazon-metrics      Active           2d20h
    • Verify that a fluentbit pod was enabled as part of the ADOT Helm Chart under the amazon-cloudwatch namespace.
      kubectl get all -n amazon-cloudwatch
      The result should be:

      kubectl get all -n amazon-cloudwatch
      NAME                   READY   STATUS    RESTARTS   AGE
      pod/fluent-bit-9lrnt   1/1     Running   0          2d20h
      pod/fluent-bit-h9lvt   1/1     Running   0          2d20h
      pod/fluent-bit-nbqjm   1/1     Running   0          2d20h
    • Verify the adot-collector-pod under the amazon-metrics namespace.
      kubectl get all -n amazon-metrics
      The result should be:

      $ kubectl get all -n amazon-metrics
      NAME                                 READY   STATUS    RESTARTS   AGE
      pod/adot-collector-daemonset-6qcsd   1/1     Running   0          2d20h
      pod/adot-collector-daemonset-f92fr   1/1     Running   0          2d20h
      pod/adot-collector-daemonset-gmhbx   1/1     Running   0          2d20h
      NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
      daemonset.apps/adot-collector-daemonset   3         3         3       3            3           <none>          2d20h
  5. Validate the installation through the Amazon EKS cluster.
    Go to the Amazon EKS cluster and select the Resources tab. Under Workloads, select DaemonSets, and find the fluent-bit and adot-collector-daemonsets as demonstrated in Figure 3.

    DaemonSet under Amazon EKS cluster resources

    Figure 3. DaemonSet under Amazon EKS cluster resources

Monitor logs, metrics, and traces

Monitor the CloudWatch Logs and CloudWatch Insights.

  • In the Logs section, choose Log groups to view Amazon EKS cluster log groups with a prefix of /aws/containerinsights, as in Figure 4a.

    EKS cluster log groups

    Figure 4a. EKS cluster log groups

  • In the Insights section, choose Container Insights to view all the resources within your Amazon EKS cluster, as in Figure 4b.

    EKS cluster's Container Insights resources

    Figure 4b. EKS cluster’s Container Insights resources

  • On the Container Insights page, select Container map from the dropdown to check the container map for Amazon EKS clusters, as demonstrated in Figure 4c.

    EKS cluster's Container Insights container map

    Figure 4c. EKS cluster’s Container Insights container map

  • On the Container Insights page, select Performance monitoring from the dropdown to view various performance metrics for Amazon EKS cluster, as demonstrated in Figure 4d.

    EKS cluster's Container Insights performance monitoring

    Figure 4d. EKS cluster’s Container Insights performance monitoring


If you are no longer using the resources discussed in this blog, remove the excess AWS resources to avoid incurring charges. After you finish setting up ADOT and fluentbit collectors to send logs and metrics to Amazon CloudWatch Logs and Container Insights, clean up resources by uninstalling the ADOT Helm chart, deleting IAM Roles created for the services, deleting CloudWatch Logs, and deleting Container Insights.


In this blog we walked through a simple three-step solution to set up Amazon EKS cluster logs and Container Insights using Helm charts. The Helm chart installs ADOT and fluentbit as a DaemonSet in the existing EKS cluster to collect and port logs, metrics, and traces to Amazon CloudWatch Logs and Container Insights. The Amazon CloudWatch Container Insights provide insights into resources, monitor performance, and container map of all the resources within the Amazon EKS cluster.

Introducing VPC Lattice – Simplify Networking for Service-to-Service Communication (Preview)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-vpc-lattice-simplify-networking-for-service-to-service-communication-preview/

Modern applications are built using modular and distributed components. Each component is a service that implements its own subset of functionalities. To make these services communicate with each other, you need a way to let them discover where they are, authorize access, and route traffic. When troubleshooting issues, you need to keep communication configurations under control so that you can quickly understand what is happening at the application, service, and network levels. This can take a lot of your time.

Today, we are making available in preview Amazon VPC Lattice, a new capability of Amazon Virtual Private Cloud (Amazon VPC) that gives you a consistent way to connect, secure, and monitor communication between your services. With VPC Lattice, you can define policies for traffic management, network access, and monitoring so you can connect applications in a simple and consistent way across AWS compute services (instances, containers, and serverless functions). VPC Lattice automatically handles network connectivity between VPCs and accounts and network address translation between IPv4, IPv6, and overlapping IP addresses. VPC Lattice integrates with AWS Identity and Access Management (IAM) to give you the same authentication and authorization capabilities you are familiar with when interacting with AWS services today, but for your own service-to-service communication. With VPC Lattice, you have common controls to route traffic based on request characteristics and weighted routing for blue/green and canary-style deployments. For example, VPC Lattice allows you to mix and match compute types for a given service, which helps you modernize a monolith application architecture to microservices.

VPC Lattice is designed to be noninvasive, allowing teams across your organization to incrementally opt in over time. In this way, you are able to deliver applications faster by focusing on your application logic, while VPC Lattice handles service-to-service networking, security, and monitoring requirements.

How Amazon VPC Lattice Works
With VPC Lattice, you create a logical application layer network, called a service network, that connects clients and services across different VPCs and accounts, abstracting network complexity. A service network is a logical boundary that is used to automatically implement service discovery and connectivity as well as apply access and observability policies to a collection of services. It offers inter-application connectivity over HTTP/HTTPS and gRPC protocols within a VPC.

Once a VPC has been enabled for a service network, clients in the VPC will automatically be able to discover the services in the service network through DNS and will direct all inter-application traffic through VPC Lattice. You can use AWS Resource Access Manager (RAM) to control which accounts, VPCs, and applications can establish communication via VPC Lattice.

A service is an independently deployable unit of software that delivers a specific task or function. In VPC Lattice, a service is a logical component that can live in any VPC or account and can run on a mixture of compute types (virtual machines, containers, and serverless functions). A service configuration consists of:

  • One or two listeners that define the port and protocol that the service is expecting traffic on. Supported protocols are HTTP/1.1, HTTP/2, and gRPC, including HTTPS for TLS-enabled services.
  • Listeners have rules that consist of a priority, which specifies the order in which rules should be processed, one or more conditions that define when to apply the rule, and actions that forward traffic to target groups. Each listener has a default rule that takes effect when no additional rules are configured, or no conditions are met.
  • A target group is a collection of targets, or compute resources, that are running a specific workload you are trying to route toward. Targets can be Amazon Elastic Compute Cloud (Amazon EC2) instances, IP addresses, and Lambda functions. For Kubernetes workloads, VPC Lattice can target services and pods via the AWS Gateway Controller for Kubernetes. To have access to the AWS Gateway Controller for Kubernetes, you can join the preview.

VPC Lattice logical architecture.

To configure service access controls, you can use access policies. An access policy is an IAM resource policy that can be associated with a service network and individual services. With access policies, you can use the “PARC” (principal, action, resource, and condition) model to enforce context-specific access controls for services. For example, you can use an access policy to define which services can access a service you own. If you use AWS Organizations, you can limit access to a service network to a specific organization.

VPC Lattice also provides a service directory, a centralized view of the services that you own or have been shared with you via AWS RAM.

Using Amazon VPC Lattice
We expect people with different roles can use VPC Lattice. For example:

  • The service network administrator can:
    • Create and manage a service network.
    • Define access and monitoring for the service network.
    • Associate client and services.
    • Share the service network with other AWS accounts.
  • The service owner can:
    • Create and manage a service, including access and monitoring.
    • Define routing, for example, configuring listeners and rules that point to the target groups where the service is running.
    • Associate a service to service networks.

Let’s see how this works in practice. In this quick walkthrough, I am covering both roles.

Creating Two Backend Services
There is nothing specific to VPC Lattice in this section. I am just creating a couple of services, one running on Amazon EC2 and one on AWS Lambda, that I’ll use later when I configure networking with VPC Lattice.

In an Amazon Linux EC2 instance, I create a web app that replies “Hello from the instance” to HTTP requests. To allow access to the instance from clients coming via VPC Lattice, I add an inbound rule to the security group to allow TCP traffic on port 8080 from the VPC Lattice AWS-managed prefix list.

Here’s the app.py file. I am using Python and Flask for this app, but you don’t need to know them to follow along with the post.

from flask import Flask

app = Flask(__name__)

def index():
  return 'Hello from the instance'

def somePath(path):
  return 'Hello from the instance at path "{}"'.format(path)

app.run(host='', port=8080)

Here’s the requirements.txt file with the Python dependencies. There’s only one line because the only module I need is flask:


I install the dependencies:

pip3 install -r requirements.txt

Then, I start the web app using the nohup command to keep it running in case I log out of the instance:

nohup flask run --host= --port 8080 &

On the EC2 instance, the web service is now listening to HTTP traffic on port 8080.

In the Lambda console, I create a simple function using the Node.js 18.x runtime that replies “Hello from the function” to all invocations.

exports.handler = async (event) => {
    const response = {
        statusCode: 200,
        body: JSON.stringify('Hello from the function'),
    return response;

The two services are now both ready. Let’s use VPC Lattice to configure networking.

Creating VPC Lattice Target Groups
I start by creating two target groups, one for the EC2 instance and one for the Lambda function. In the VPC console, there is a new VPC Lattice section in the navigation pane. There, I choose Target groups and then Create target group.

For the first target group, I choose the Instances target type and enter a name.

Console screenshot.

I choose the protocol (HTTP) and port (8080) used by the web app running on the instance. I select the VPC where the instance is running and the protocol version (HTTP1).

Console screenshot.

Now I can configure the health check that will be used to test the target status. In this case, I use the default values proposed by the console.

Console screenshot.

In the next step, I can register the targets. I select the instance on which the web app is running from the list and choose to include it.

Console screenshot.

I review the selected targets (one instance in this case) and choose Submit.

In a similar way, I create a target group for the Lambda function. This time, I select the function from the list. I can choose which function version or function alias to use. For simplicity, I use the $LATEST version.

Console screenshot.

Creating VPC Lattice Services
Now that the target groups are ready, I choose Services in the navigation pane and then Create service. I enter a name and a description.

Console screenshot.

Now, I can choose the authentication type. If I choose None, the service network does not authenticate or authorize client access, and the auth policy, if present, is not used. I select AWS IAM and then, from the Apply policy template dropdown, the template that allows both authenticated and unauthenticated access.

Console screenshot.

In the Monitoring section, I turn on Access logs. As the destination for the access logs, I use an Amazon CloudWatch Log group that I created before. I also have the option to use an Amazon Simple Storage Service (Amazon S3) bucket or a Amazon Kinesis Data Firehose delivery stream.

Console screenshot.

In the next step, I define routing for the service. I choose Add listener. For the protocol, I configure the service to listen using HTTPS. In the default action, I choose to send two-thirds (Weight 20) of the requests to the instance target group and one-third (Weight 10) to the function target group.

Console screenshot.

Then, I add two additional rules. The first rule (Priority 10) sends all requests where the path is /to-instance to the instance target group.

Console screenshot.

The second rule (Priority 20) sends all traffic where the path is /to-function to the function target group.

Console screenshot.

In the next step, I am asked to associate the service with one or more service networks. I didn’t create a service network yet, so I skip this step for now and choose Next. I review the configuration and create the service.

Creating VPC Lattice Service Networks
Now, I create the service network so that I can associate the service and the VPCs I want to use. I choose Service network from the navigation pane and then Create service network. I enter a name and a description for the service network.

Console screenshot.

In the Associate services, I select the service I just created.

Console screenshot.

In the VPC associations, I select the VPC used by the instance where the web app runs. This can help in the future because it allows the web app to call other services associated with the service network.

Console screenshot.

Then, I select a second VPC where I have another EC2 instance that I want to use to run some tests.

Console screenshot.

For simplicity, in the Access section, I select the None auth type.

Console screenshot.

In the Monitoring section, I choose to send the access logs for the whole service network to an S3 bucket.

Console screenshot.

I review the summary of the configuration and create the service network. After a few seconds all service and VPC associations are active, and I can start using the service.

I write down the domain name of the service from the list of service associations.

Console screenshot.

Testing Access to the Service Using VPC Lattice
I look at the Routing tab of the service to find a nice recap of how the listener is handling routing towards the different target groups.

Console screenshot.

Then, I log into the EC2 instance in my second VPC and use curl to call the service domain name. As expected, I get about two-thirds of the responses from the instance and one-third from the function.

curl https://my-service-03e92ee54968d87ca.7d67968.vpc-lattice-svcs.us-west-2.on.aws
Hello from the instance

curl https://my-service-03e92ee54968d87ca.7d67968.vpc-lattice-svcs.us-west-2.on.aws
Hello from the instance

curl https://my-service-03e92ee54968d87ca.7d67968.vpc-lattice-svcs.us-west-2.on.aws
"Hello from the function"

When I call the /to-instance and /to-function paths, the additional rules forward the requests to the instance and the function, respectively.

curl https://my-service-03e92ee54968d87ca.7d67968.vpc-lattice-svcs.us-west-2.on.aws/to-instance
Hello from the instance "to-instance" path

curl https://my-service-03e92ee54968d87ca.7d67968.vpc-lattice-svcs.us-west-2.on.aws/to-function
"Hello from the function"

I can now review access to my service using the access log subscriptions I configured before.

For the service, I look in the CloudWatch Log group. There, I find a log stream containing detailed access information about the service.

Console screenshot.

The access log for all services associated with the service network is on the S3 bucket. I have only one service for now, but more are coming.

Console screenshot.

Available in Preview
Amazon VPC Lattice is available in preview in the US West (Oregon) Region.

VPC Lattice provides deployment consistency across AWS compute types so that you can connect your services across instances, containers, and serverless functions. You can use VPC Lattice to apply granular and rich traffic controls, such as policy-based routing and weighted targets to support blue/green and canary-style deployments.

VPC Lattice allows monitoring and troubleshooting service-to-service communication with detailed access logs and metrics that capture request type, volume of traffic, error rates, response time, and more. In this blog post, I only scratched the surface of what you can do with VPC Lattice.

Simplify the way you connect, secure, and monitor service-to-service communication with Amazon VPC Lattice.

New – Amazon EC2 Hpc6id Instances Optimized for High Performance Computing

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-hpc6id-instances-optimized-for-high-performance-computing/

We have given you the flexibility and ability to run the largest and most complex high performance computing (HPC) workloads with Amazon Elastic Compute Cloud (Amazon EC2) instances that feature enhanced networking like C5n, C6gnR5n, M5n, and our recently launched HPC instances Hpc6a.

We heard feedback from customers asking us to deliver more options to support their most intensive workloads with higher per-vCPU compute performance as well as larger memory and local disk storage to reduce job completion time for data-intensive workloads like Finite Element Analysis (FEA) and seismic processing.

Announcing Amazon EC2 Hpc6id Instance for HPC Workloads
Today, we announce the general availability of Amazon EC2 Hpc6id instances, a new instance type that is purpose-built for tightly coupled HPC workloads. Amazon EC2 Hpc6id instances are powered by 3rd Gen Intel Xeon Scalable processors (Ice Lake) that run at frequencies up to 3.5 GHz, 1024 GiB memory, 15.2 TB local SSD disk, 200 Gbps Elastic Fabric Adapter (EFA) network bandwidth, which is 4x higher than R6i instances.

Amazon EC2 Hpc6id instances have the best per-vCPU HPC performance when compared to similar x86-based EC2 instances for data-intensive HPC workloads.

Here are the detailed specs:

Instance Name CPUs RAM EFA Network Bandwidth Attached Storage
hpc6id.32xlarge 64 1024 GiB Up to 200 Gbps 15.2 TB local SSD disk

Amazon EC2 Hpc6id Instances Use Cases
Customers running license-bound scenarios can lower infrastructure and HPC software licensing costs with Hpc6id. Other customers with HPC codes that are optimized for Intel-specific features, such as Math Kernel Library or AVX-512, can migrate their largest HPC workloads to Hpc6id and scale up their workloads on AWS by taking advantage of 200 Gbps EFA bandwidth.

Other customers using HPC software codes that are optimized for per-CPU performance are also able to consolidate their workloads on fewer nodes and complete jobs faster with Hpc6id. Faster job completion time helps customers to reduce both infrastructure and software licensing costs. Customers can use Hpc6id instances to quickly carry out complex calculations across a range of cluster sizes—up to tens of thousands of cores.

Customers also can use Hpc6id instances with AWS ParallelCluster to provision Hpc6id instances alongside other instance types, giving customers the flexibility to run different workload types within the same HPC cluster. Hpc6id instances benefit from the AWS Nitro System, a rich collection of building blocks that offloads many of the traditional virtualization functions to dedicated hardware and software to deliver high performance, high availability, and high security while also reducing virtualization overhead.

Now Available
Amazon EC2 Hpc6id instances are available for purchase as On-Demand or Reserved Instances or with Savings Plans. Hpc6id instances are available in the US East (Ohio) and AWS GovCloud (US-West) Regions. To optimize Amazon EC2 Hpc6id instances networking for tightly coupled workloads, use cluster placement groups within a single Availability Zone.

To learn more, visit our Hpc6 instance page and get in touch with our HPC teamAWS re:Post for EC2, or through your usual AWS Support contacts.


New – ENA Express: Improved Network Latency and Per-Flow Performance on EC2

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-ena-express-improved-network-latency-and-per-flow-performance-on-ec2/

We know that you can always make great use of all available network bandwidth and network performance, and have done our best to supply it to you. Over the years, network bandwidth has grown from the 250 Mbps on the original m1 instance to 200 Gbps on the newest m6in instances. In addition to raw bandwidth, we have also introduced advanced networking features including Enhanced Networking, Elastic Network Adapters (ENAs), and (for tightly coupled HPC workloads) Elastic Fabric Adapters (EFAs).

Introducing ENA Express
Today we are launching ENA Express. Building on the Scalable Reliable Datagram (SRD) protocol that already powers Elastic Fabric Adapters, ENA Express reduces P99 latency of traffic flows by up to 50% and P99.9 latency by up to 85% (in comparison to TCP), while also increasing the maximum single-flow bandwidth from 5 Gbps to 25 Gbps. Bottom line, you get a lot more per-flow bandwidth and a lot less variability.

You can enable ENA Express on new and existing ENAs and take advantage of this performance right away for TCP and UDP traffic between c6gn instances running in the same Availability Zone.

Using ENA Express
I used a pair of c6gn instances to set up and test ENA Express. After I launched the instances I used the AWS Management Console to enable ENA Express for both instances. I find each ENI, select it, and choose Manage ENA Express from the Actions menu:

I enable ENA Express and ENA Express UDP and click Save:

Then I set the Maximum Transmission Unit (MTU) to 8900 on both instances:

$ sudo /sbin/ifconfig eth0 mtu 8900

I install iperf3 on both instances, and start the first one in server mode:

$ iperf3 -s
Server listening on 5201

Then I run the second one in client mode and observe the results:

$ iperf3 -c
Connecting to host, port 5201
[  4] local port 35622 connected to port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.80 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   1.00-2.00   sec  2.81 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   2.00-3.00   sec  2.80 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   3.00-4.00   sec  2.81 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   4.00-5.00   sec  2.81 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   5.00-6.00   sec  2.80 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   6.00-7.00   sec  2.80 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   7.00-8.00   sec  2.81 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   8.00-9.00   sec  2.81 GBytes  24.1 Gbits/sec    0   1.43 MBytes
[  4]   9.00-10.00  sec  2.81 GBytes  24.1 Gbits/sec    0   1.43 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  28.0 GBytes  24.1 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  28.0 GBytes  24.1 Gbits/sec                  receiver

The ENA driver reports on metrics that I can review to confirm the use of SRD:

ethtool -S eth0 | grep ena_srd
     ena_srd_mode: 3
     ena_srd_tx_pkts: 25858313
     ena_srd_eligible_tx_pkts: 25858323
     ena_srd_rx_pkts: 2831267
     ena_srd_resource_utilization: 0

The metrics work as follows:

  • ena_srd_mode indicates that SRD is enabled for TCP and UDP.
  • ena_srd_tx_pkts denotes the number of packets that have been transmitted via SRD.
  • ena_srd_eligible_pkts denotes the number of packets that were eligible for transmission via SRD. A packet is eligible for SRD if ENA-SRD is enabled on both ends of the connection, both connections reside in the same Availability Zone, and the packet is using either UDP or TCP.
  • ena_srd_rx_pkts denotes the number of packets that have been received via SRD.
  • ena_srd_resource_utilization denotes the percent of allocated Nitro network card resources that are in use, and is proportional to the number of open SRD connections. If this value is consistently approaching 100%, scaling out to more instances or scaling up to a larger instance size may be warranted.

Thing to Know
Here are a couple of things to know about ENA Express and SRD:

Access – I used the Management Console to enable and test ENA Express; CLI, API, CloudFormation and CDK support is also available.

Fallback – If a TCP or UDP packet is not eligible for transmission via SRD, it will simply be transmitted in the usual way.

UDP – SRD takes advantage of multiple network paths and “sprays” packets across them. This would normally present a challenge for applications that expect packets to arrive more or less in order, but ENA Express helps out by putting the UDP packets back into order before delivering them to you, taking the burden off of your application. If you have built your own reliability layer over UDP, or if your application does not require packets to arrive in order, you can enable ENA Express for TCP but not for UDP.

Instance Types and Sizes – We are launching with support for the 16xlarge size of the c6gn instances, with additional instance families and sizes in the works.

Resource Utilization – As I hinted at above, ENA Express uses some Nitro card resources to process packets. This processing also adds a few microseconds of latency per packet processed, and also has a moderate but measurable effect on the maximum number of packets that a particular instance can process per second. In situations where high packet rates are coupled with small packet sizes, ENA Express may not be appropriate. In all other cases you can simply enable SRD to enjoy higher per-flow bandwidth and consistent latency.

Pricing – There is no additional charge for the use of ENA Express.

Regions – ENA Express is available in all commercial AWS Regions.

All About SRD
I could write an entire blog post about SRD, but my colleagues beat me to it! Here are some great resources to help you to learn more:

A Cloud-Optimized Transport for Elastic and Scalable HPC – This paper reviews the challenges that arise when trying to run HPC traffic across a TCP-based network, and points out that the variability (latency outliers) can have a profound effect on scaling efficiency, and includes a succinct overview of SRD:

Scalable reliable datagram (SRD) is optimized for hyper-scale datacenters: it provides load balancing across multiple paths and fast recovery from packet drops or link failures. It utilizes standard ECMP functionality on the commodity Ethernet switches and works around its limitations: the sender controls the ECMP path selection by manipulating packet encapsulation.

There’s a lot of interesting detail in the full paper, and it is well worth reading!

In the Search for Performance, There’s More Than One Way to Build a Network – This 2021 blog post reviews our decision to build the Elastic Fabric Adapter, and includes some important data (and cool graphics) to demonstrate the impact of packet loss on overall application performance. One of the interesting things about SRD is that it keeps track of the availability and performance of multiple network paths between transmitter and receiver, and sprays packets across up to 64 paths at a time in order to take advantage of as much bandwidth as possible and to recover quickly in case of packet loss.


New General Purpose, Compute Optimized, and Memory-Optimized Amazon EC2 Instances with Higher Packet-Processing Performance

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-general-purpose-compute-optimized-and-memory-optimized-amazon-ec2-instances-with-higher-packet-processing-performance/

Today I would like to tell you about the next generation of Intel-powered general purpose, compute-optimized, and memory-optimized instances. All three of these instance families are powered by 3rd generation Intel Xeon Scalable processors (Ice Lake) running at 3.5 GHz, and are designed to support your data-intensive workloads with up to 200 Gbps of network bandwidth, the highest EBS performance in EC2 (up to 80 Gbps of bandwidth and up to 350,000 IOPS), and the ability to handle up to twice as many packets per second (PPS) as earlier instances.

New General Purpose (M6in/M6idn) Instances
The original general purpose EC2 instance (m1.small) was launched in 2006 and was the one and only instance type for a little over a year, until we launched the m1.large and m1.xlarge in late 2007. After that, we added the m3 in 2012, m4 in 2015, and the first in a very long line of m5 instances starting in 2017. The family tree branched in 2018 with the addition of the m5d instances with local NVMe storage.

And that brings us to today, and to the new m6in and m6idn instances, both available in 9 sizes:

Name vCPUs Memory Local Storage
(m6idn only)
Network Bandwidth EBS Bandwidth EBS IOPS
2 8 GiB 118 GB Up to 25 Gbps Up to 20 Gbps Up to 87,500
4 16 GiB 237 GB Up to 30 Gbps Up to 20 Gbps Up to 87,500
8 32 GiB 474 GB Up to 40 Gbps Up to 20 Gbps Up to 87,500
16 64 GiB 950 GB Up to 50 Gbps Up to 20 Gbps Up to 87,500
32 128 GiB 1900 GB 50 Gbps 20 Gbps 87,500
48 192 GiB 2950 GB
(2 x 1425)
75 Gbps 30 Gbps 131,250
64 256 GiB 3800 GB
(2 x 1900)
100 Gbps 40 Gbps 175,000
96 384 GiB 5700 GB
(4 x 1425)
150 Gbps 60 Gbps 262,500
128 512 GiB 7600 GB
(4 x 1900)
200 Gbps 80 Gbps 350,000

The m6in and m6idn instances are available in the US East (Ohio, N. Virginia) and Europe (Ireland) regions in On-Demand and Spot form. Savings Plans and Reserved Instances are available.

New C6in Instances
Back in 2008 we launched the first in what would prove to be a very long line of Amazon Elastic Compute Cloud (Amazon EC2) instances designed to give you high compute performance and a higher ratio of CPU power to memory than the general purpose instances. Starting with those initial c1 instances, we went on to launch cluster computing instances in 2010 (cc1) and 2011 (cc2), and then (once we got our naming figured out), multiple generations of compute-optimized instances powered by Intel processors: c3 (2013), c4 (2015), and c5 (2016). As our customers put these instances to use in environments where networking performance was starting to become a limiting factor, we introduced c5n instances with 100 Gbps networking in 2018. We also broadened the c5 instance lineup by adding additional sizes (including bare metal), and instances with blazing-fast local NVMe storage.

Today I am happy to announce the latest in our lineup of Intel-powered compute-optimized instances, the c6in, available in 9 sizes:

Name vCPUs Memory
Network Bandwidth EBS Bandwidth
c6in.large 2 4 GiB Up to 25 Gbps Up to 20 Gbps Up to 87,500
c6in.xlarge 4 8 GiB Up to 30 Gbps Up to 20 Gbps Up to 87,500
c6in.2xlarge 8 16 GiB Up to 40 Gbps Up to 20 Gbps Up to 87,500
c6in.4xlarge 16 32 GiB Up to 50 Gbps Up to 20 Gbps Up to 87,500
c6in.8xlarge 32 64 GiB 50 Gbps 20 Gbps 87,500
c6in.12xlarge 48 96 GiB 75 Gbps 30 Gbps 131,250
c6in.16xlarge 64 128 GiB 100 Gbps 40 Gbps 175,000
c6in.24xlarge 96 192 GiB 150 Gbps 60 Gbps 262,500
c6in.32xlarge 128 256 GiB 200 Gbps 80 Gbps 350,000

The c6in instances are available in the US East (Ohio, N. Virginia), US West (Oregon), and Europe (Ireland) Regions.

As I noted earlier, these instances are designed to be able to handle up to twice as many packets per second (PPS) as their predecessors. This allows them to deliver increased performance in situations where they need to handle a large number of small-ish network packets, which will accelerate many applications and use cases includes network virtual appliances (firewalls, virtual routers, load balancers, and appliances that detect and protect against DDoS attacks), telecommunications (Voice over IP (VoIP) and 5G communication), build servers, caches, in-memory databases, and gaming hosts. With more network bandwidth and PPS on tap, heavy-duty analytics applications that retrieve and store massive amounts of data and objects from Amazon Amazon Simple Storage Service (Amazon S3) or data lakes will benefit. For workloads that benefit from low latency local storage, the disk versions of the new instances offer twice as much instance storage versus previous generation.

New Memory-Optimized (R6in/R6idn) Instances
The first memory-optimized instance was the m2, launched in 2009 with the now-quaint Double Extra Large and Quadruple Extra Large names, and a higher ration of memory to CPU power than the earlier m1 instances. We had yet to learn our naming lesson and launched the High Memory Cluster Eight Extra Large (aka cr1.8xlarge) in 2013, before settling on the r prefix and launching r3 instances in 2013, followed by r4 instances in 2014, and r5 instances in 2018.

And again that brings us to today, and to the new r6in and r6idn instances, also available in 9 sizes:

Name vCPUs Memory Local Storage
(r6idn only)
Network Bandwidth EBS Bandwidth EBS IOPS
2 16 GiB 118 GB Up to 25 Gbps Up to 20 Gbps Up to 87,500
4 32 GiB 237 GB Up to 30 Gbps Up to 20 Gbps Up to 87,500
8 64 GiB 474 GB Up to 40 Gbps Up to 20 Gbps Up to 87,500
16 128 GiB 950 GB Up to 50 Gbps Up to 20 Gbps Up to 87,500
32 256 GiB 1900 GB 50 Gbps 20 Gbps 87,500
48 384 GiB 2950 GB
(2 x 1425)
75 Gbps 30 Gbps 131,250
64 512 GiB 3800 GB
(2 x 1900)
100 Gbps 40 Gbps 175,000
96 768 GiB 5700 GB
(4 x 1425)
150 Gbps 60 Gbps 262,500
128 1024 GiB 7600 GB
(4 x 1900)
200 Gbps 80 Gbps 350,000

The r6in and r6idn instances are available in the US East (Ohio, N. Virginia), US West (Oregon), and Europe (Ireland) regions in On-Demand and Spot form. Savings Plans and Reserved Instances are available.

Inside the Instances
As you can probably guess from these specs and from the blog post that I wrote to launch the c6in instances, all of these new instance types have a lot in common. I’ll do a rare cut-and-paste from that post in order to reiterate all of the other cool features that are available to you:

Ice Lake Processors – The 3rd generation Intel Xeon Scalable processors run at 3.5 GHz, and (according to Intel) offer a 1.46x average performance gain over the prior generation. All-core Intel Turbo Boost mode is enabled on all instance sizes up to and including the 12xlarge. On the larger sizes, you can control the C-states. Intel Total Memory Encryption (TME) is enabled, protecting instance memory with a single, transient 128-bit key generated at boot time within the processor.

NUMA – Short for Non-Uniform Memory Access, this important architectural feature gives you the power to optimize for workloads where the majority of requests for a particular block of memory come from one of the processors, and that block is “closer” (architecturally speaking) to one of the processors. You can control processor affinity (and take advantage of NUMA) on the 24xlarge and 32xlarge instances.

NetworkingElastic Network Adapter (ENA) is available on all sizes of m6in, m6idn, c6in, r6in, and r6idn instances, and Elastic Fabric Adapter (EFA) is available on the 32xlarge instances. In order to make use of these adapters, you will need to make sure that your AMI includes the latest NVMe and ENA drivers. You can also make use of Cluster Placement Groups.

io2 Block Express – You can use all types of EBS volumes with these instances, including the io2 Block Express volumes that we launched earlier this year. As Channy shared in his post (Amazon EBS io2 Block Express Volumes with Amazon EC2 R5b Instances Are Now Generally Available), these volumes can be as large as 64 TiB, and can deliver up to 256,000 IOPS. As you can see from the tables above, you can use a 24xlarge or 32xlarge instance to achieve this level of performance.

Choosing the Right Instance
Prior to today’s launch, you could choose a c5n, m5n, or r5n instance to get the highest network bandwidth on an EC2 instance, or an r5b instance to have access to the highest EBS IOPS performance and high EBS bandwidth. Now, customers who need high networking or EBS performance can choose from a full portfolio of instances with different memory to vCPU ratio and instance storage options available, by selecting one of c6in, m6in, m6idn, r6in, or r6idn instances.

The higher performance of the c6in instances will allow you to scale your network intensive workloads that need a low memory to vCPU, such as network virtual appliances, caching servers, and gaming hosts.

The higher performance of m6in instances will allow you to scale your network and/or EBS intensive workloads such as data analytics, and telco applications including 5G User Plane Functions (UPF). You have the option to use the m6idn instance for workloads that benefit from low-latency local storage, such as high-performance file systems, or distributed web-scale in-memory caches.

Similarly, the higher network and EBS performance of the r6in instances will allow you to scale your network-intensive SQL, NoSQL, and in-memory database workloads, with the option to use the r6idn when you need low-latency local storage.


New Amazon EC2 Instance Types In the Works – C7gn, R7iz, and Hpc7g

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-ec2-instance-types-in-the-works-c7gn-r7iz-and-hpc7g/

We are getting ready to launch three new Amazon Elastic Compute Cloud (Amazon EC2) instance types and I am happy to be able to give you a sneak peek at them today.

C7gn Instances are designed for your most demanding network-intensive workloads: network virtual appliances (firewalls, virtual routers, load balancers, and so forth), data analytics, and tightly-coupled cluster computing jobs. They are powered by AWS Graviton3E processors and will support up to 200 Gbps of network bandwidth, along with 50% higher packet processing performance. The c7gn instances will be available in multiple sizes with up to 64 vCPUs and 128 GiB of memory. We are launching the preview today and you can Sign Up Today to join in.

Hpc7g Instances are also powered by AWS Graviton3E processors, with up to 35% higher vector instruction processing performance than the Graviton3. They are designed to give you the best price/performance for tightly coupled compute-intensive HPC and distributed computing workloads, and deliver 200 Gbps of dedicated network bandwidth that is optimized for traffic between instances in the same VPC. The hpc7g instances will be available in multiple sizes with up to 64 vCPUs and 128 GiB of memory. I’ll have more information to share on these instances in early 2023.

R7iz Instances are powered by the latest 4th generation Intel Xeon Scalable Processors (code named Sapphire Rapids) and run at a sustained all-core turbo frequency of 3.9 GHz. With high performance and DDR5 memory, these instances are a perfect match for your Electronic Design Automation (EDA), financial, actuarial, and simulation workloads. They are also great hosts for relational databases and other commercial software that is licensed on a per-core basis. The r7iz instances will be available in multiple sizes with up to 128 vCPUs and 1 TiB of memory. We are launching the instances in preview today and you can Sign up Today to participate.


Our guide to AWS Compute at re:Invent 2022

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/our-guide-to-aws-compute-at-reinvent-2022/

This blog post is written by Shruti Koparkar, Senior Product Marketing Manager, Amazon EC2.

AWS re:Invent is the most transformative event in cloud computing and it is starting on November 28, 2022. AWS Compute team has many exciting sessions planned for you covering everything from foundational content, to technology deep dives, customer stories, and even hands on workshops. To help you build out your calendar for this year’s re:Invent, let’s look at some highlights from the AWS Compute track in this blog. Please visit the session catalog for a full list of AWS Compute sessions.

Learn what powers AWS Compute

AWS offers the broadest and deepest functionality for compute. Amazon Elastic Cloud Compute (Amazon EC2) offers granular control for managing your infrastructure with the choice of processors, storage, and networking.

The AWS Nitro System is the underlying platform for our all our modern EC2 instances. It enables AWS to innovate faster, further reduce cost for our customers, and deliver added benefits like increased security and new instance types.

Discover the benefits of AWS Silicon

AWS has invested years designing custom silicon optimized for the cloud. This investment helps us deliver high performance at lower costs for a wide range of applications and workloads using AWS services.

  • Explore the AWS journey into silicon innovation with our “CMP201: Silicon Innovation at AWS” session. We will cover some of the thought processes, learnings, and results from our experience building silicon for AWS Graviton, AWS Nitro System, and AWS Inferentia.
  • To learn about customer-proven strategies to help you make the move to AWS Graviton quickly and confidently while minimizing uncertainty and risk, attend “CMP410: Framework for adopting AWS Graviton-based instances”.

 Explore different use cases

Amazon EC2 provides secure and resizable compute capacity for several different use-cases including general purpose computing for cloud native and enterprise applications, and accelerated computing for machine learning and high performance computing (HPC) applications.

High performance computing

  • HPC on AWS can help you design your products faster with simulations, predict the weather, detect seismic activity with greater precision, and more. To learn how to solve world’s toughest problems with extreme-scale compute come join us for “CMP205: HPC on AWS: Solve complex problems with pay-as-you-go infrastructure”.
  • Single on-premises general-purpose supercomputers can fall short when solving increasingly complex problems. Attend “CMP222: Redefining supercomputing on AWS” to learn how AWS is reimagining supercomputing to provide scientists and engineers with more access to world-class facilities and technology.
  • AWS offers many solutions to design, simulate, and verify the advanced semiconductor devices that are the foundation of modern technology. Attend “CMP320: Accelerating semiconductor design, simulation, and verification” to hear from ARM and Marvel about how they are using AWS to accelerate EDA workloads.

Machine Learning

Cost Optimization

Hear from our customers

We have several sessions this year where AWS customers are taking the stage to share their stories and details of exciting innovations made possible by AWS.

Get started with hands-on sessions

Nothing like a hands-on session where you can learn by doing and get started easily with AWS compute. Our speakers and workshop assistants will help you every step of the way. Just bring your laptop to get started!

You’ll get to meet the global cloud community at AWS re:Invent and get an opportunity to learn, get inspired, and rethink what’s possible. So build your schedule in the re:Invent portal and get ready to hit the ground running. We invite you to stop by the AWS Compute booth and chat with our experts. We look forward to seeing you in Las Vegas!

Introducing the price-capacity-optimized allocation strategy for EC2 Spot Instances

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/introducing-price-capacity-optimized-allocation-strategy-for-ec2-spot-instances/

This blog post is written by Jagdeep Phoolkumar, Senior Specialist Solution Architect, Flexible Compute and Peter Manastyrny, Senior Product Manager Tech, EC2 Core.

Amazon EC2 Spot Instances are unused Amazon Elastic Compute Cloud (Amazon EC2) capacity in the AWS Cloud available at up to a 90% discount compared to On-Demand prices. One of the best practices for using EC2 Spot Instances is to be flexible across a wide range of instance types to increase the chances of getting the aggregate compute capacity. Amazon EC2 Auto Scaling and Amazon EC2 Fleet make it easy to configure a request with a flexible set of instance types, as well as use a Spot allocation strategy to determine how to fulfill Spot capacity from the Spot Instance pools that you provide in your request.

The existing allocation strategies available in Amazon EC2 Auto Scaling and Amazon EC2 Fleet are called “lowest-price” and “capacity-optimized”. The lowest-price allocation strategy allocates Spot Instance pools where the Spot price is currently the lowest. Customers told us that in some cases the lowest-price strategy picks the Spot Instance pools that are not optimized for capacity availability and results in more frequent Spot Instance interruptions. As an improvement over lowest-price allocation strategy, in August 2019 AWS launched the capacity-optimized allocation strategy for Spot Instances, which helps customers tap into the deepest Spot Instance pools by analyzing capacity metrics. Since then, customers have seen a significantly lower interruption rate with capacity-optimized strategy when compared to the lowest-price strategy. You can read more about these customer stories in the Capacity-Optimized Spot Instance Allocation in Action at Mobileye and Skyscanner blog post. The capacity-optimized allocation strategy strictly selects the deepest pools. Therefore, sometimes it can pick high-priced pools even when there are low-priced pools available with marginally less capacity. Customers have been telling us that, for an optimal experience, they would like an allocation strategy that balances the best trade-offs between lowest-price and capacity-optimized.

Today, we’re excited to share the new price-capacity-optimized allocation strategy that makes Spot Instance allocation decisions based on both the price and the capacity availability of Spot Instances. The price-capacity-optimized allocation strategy should be the first preference and the default allocation strategy for most Spot workloads.

This post illustrates how the price-capacity-optimized allocation strategy selects Spot Instances in comparison with lowest-price and capacity-optimized. Furthermore, it discusses some common use cases of the price-capacity-optimized allocation strategy.


The price-capacity-optimized allocation strategy makes Spot allocation decisions based on both capacity availability and Spot prices. In comparison to the lowest-price allocation strategy, the price-capacity-optimized strategy doesn’t always attempt to launch in the absolute lowest priced Spot Instance pool. Instead, price-capacity-optimized attempts to diversify as much as possible across the multiple low-priced pools with high capacity availability. As a result, the price-capacity-optimized strategy in most cases has a higher chance of getting Spot capacity and delivers lower interruption rates when compared to the lowest-price strategy. If you factor in the cost associated with retrying the interrupted requests, then the price-capacity-optimized strategy becomes even more attractive from a savings perspective over the lowest-price strategy.

We recommend the price-capacity-optimized allocation strategy for workloads that require optimization of cost savings, Spot capacity availability, and interruption rates. For existing workloads using lowest-price strategy, we recommend price-capacity-optimized strategy as a replacement. The capacity-optimized allocation strategy is still suitable for workloads that either use similarly priced instances, or ones where the cost of interruption is so significant that any cost saving is inadequate in comparison to a marginal increase in interruptions.


In this section, we illustrate how the price-capacity-optimized allocation strategy deploys Spot capacity when compared to the other two allocation strategies. The following example configuration shows how Spot capacity could be allocated in an Auto Scaling group using the different allocation strategies:

    "AutoScalingGroupName": "myasg ",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-abcde12345"
            "Overrides": [
                    "InstanceRequirements": {
                        "VCpuCount": {
                            "Min": 4,
                            "Max": 4
                        "MemoryMiB": {
                            "Min": 0,
                            "Max": 16384
                        "InstanceGenerations": [
                        "BurstablePerformance": "excluded",
                        "AcceleratorCount": {
                            "Max": 0
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "spot-allocation-strategy"
    "MinSize": 10,
    "MaxSize": 100,
    "DesiredCapacity": 60,
    "VPCZoneIdentifier": "subnet-a12345a,subnet-b12345b,subnet-c12345c"

First, Amazon EC2 Auto Scaling attempts to balance capacity evenly across Availability Zones (AZ). Next, Amazon EC2 Auto Scaling applies the Spot allocation strategy using the 30+ instances selected by attribute-based instance type selection, in each Availability Zone. The results after testing different allocation strategies are as follows:

  • Price-capacity-optimized strategy diversifies over multiple low-priced Spot Instance pools that are optimized for capacity availability.
  • Capacity-optimize strategy identifies Spot Instance pools that are only optimized for capacity availability.
  • Lowest-price strategy by default allocates the two lowest priced Spot Instance pools that aren’t optimized for capacity availability

To find out how each allocation strategy fares regarding Spot savings and capacity, we compare ‘Cost of Auto Scaling group’ (number of instances x Spot price/hour for each type of instance) and ‘Spot interruptions rate’ (number of instances interrupted/number of instances launched) for each allocation strategy. We use fictional numbers for the purpose of this post. However, you can use the Cloud Intelligence Dashboards to find the actual Spot Saving, and the Amazon EC2 Spot interruption dashboard to log Spot Instance interruptions. The example results after a 30-day period are as follows:

Allocation strategy

Instance allocation

Cost of Auto Scaling group

Spot interruptions rate


40 c6i.xlarge

20 c5.xlarge

$4.80/hour 3%


60 c5.xlarge




30 c5a.xlarge

30 m5n.xlarge



As per the above table, with the price-capacity-optimized strategy, the cost of the Auto Scaling group is only 5 cents (1%) higher, whereas the rate of Spot interruptions is six times lower (3% vs 20%) than the lowest-price strategy. In summary, from this exercise you learn that the price-capacity-optimized strategy provides the optimal Spot experience that is the best of both the lowest-price and capacity-optimized allocation strategies.

Common use-cases of price-capacity-optimized allocation strategy

Earlier we mentioned that the price-capacity-optimized allocation strategy is recommended for most Spot workloads. To elaborate further, in this section we explore some of these common workloads.

Stateless and fault-tolerant workloads

Stateless workloads that can complete ongoing requests within two minutes of a Spot interruption notice, and the fault-tolerant workloads that have a low cost of retries, are the best fit for the price-capacity-optimized allocation strategy. This category has workloads such as stateless containerized applications, microservices, web applications, data and analytics jobs, and batch processing.

Workloads with a high cost of interruption

Workloads that have a high cost of interruption associated with an expensive cost of retries should implement checkpointing to lower the cost of interruptions. By using checkpointing, you make the price-capacity-optimized allocation strategy a good fit for these workloads, as it allocates capacity from the low-priced Spot Instance pools that offer a low Spot interruptions rate. This category has workloads such as long Continuous Integration (CI), image and media rendering, Deep Learning, and High Performance Compute (HPC) workloads.


We recommend that customers use the price-capacity-optimized allocation strategy as the default option. The price-capacity-optimized strategy helps Amazon EC2 Auto Scaling groups and Amazon EC2 Fleet provision target capacity with an optimal experience. Updating to the price-capacity-optimized allocation strategy is as simple as updating a single parameter in an Amazon EC2 Auto Scaling group and Amazon EC2 Fleet.

To learn more about allocation strategies for Spot Instances, visit the Spot allocation strategies documentation page.

Running AI-ML Object Detection Model to Process Confidential Data using Nitro Enclaves

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/running-ai-ml-object-detection-model-to-process-confidential-data-using-nitro-enclaves/

This blog post was written by, Antoine Awad, Solutions Architect, Kevin Taylor, Senior Solutions Architect and Joel Desaulniers, Senior Solutions Architect.

Machine Learning (ML) models are used for inferencing of highly sensitive data in many industries such as government, healthcare, financial, and pharmaceutical. These industries require tools and services that protect their data in transit, at rest, and isolate data while in use. During processing, threats may originate from the technology stack such as the operating system or programs installed on the host which we need to protect against. Having a process that enforces the separation of roles and responsibilities within an organization minimizes the ability of personnel to access sensitive data. In this post, we walk you through how to run ML inference inside AWS Nitro Enclaves to illustrate how your sensitive data is protected during processing.

We are using a Nitro Enclave to run ML inference on sensitive data which helps reduce the attack surface area when the data is decrypted for processing. Nitro Enclaves enable you to create isolated compute environments within Amazon EC2 instances to protect and securely process highly sensitive data. Enclaves have no persistent storage, no interactive access, and no external networking. Communication between your instance and your enclave is done using a secure local channel called a vsock. By default, even an admin or root user on the parent instance will not be able to access the enclave.


Our example use-case demonstrates how to deploy an AI/ML workload and run inferencing inside Nitro Enclaves to securely process sensitive data. We use an image to demonstrate the process of how data can be encrypted, stored, transferred, decrypted and processed when necessary, to minimize the risk to your sensitive data. The workload uses an open-source AI/ML model to detect objects in an image, representing the sensitive data, and returns a summary of the type of objects detected. The image below is used for illustration purposes to provide clarity on the inference that occurs inside the Nitro Enclave. It was generated by adding bounding boxes to the original image based on the coordinates returned by the AI/ML model.

Image of airplanes with bounding boxes

Figure 1 – Image of airplanes with bounding boxes

To encrypt this image, we are using a Python script (Encryptor app – see Figure 2) which runs on an EC2 instance, in a real-world scenario this step would be performed in a secure environment like a Nitro Enclave or a secured workstation before transferring the encrypted data. The Encryptor app uses AWS KMS envelope encryption with a symmetrical Customer Master Key (CMK) to encrypt the data.

Image Encryption with AWS KMS using Envelope Encryption

Figure 2 – Image Encryption with AWS KMS using Envelope Encryption

Note, it’s also possible to use asymmetrical keys to perform the encryption/decryption.

Now that the image is encrypted, let’s look at each component and its role in the solution architecture, see Figure 3 below for reference.

  1. The Client app reads the encrypted image file and sends it to the Server app over the vsock (secure local communication channel).
  2. The Server app, running inside a Nitro Enclave, extracts the encrypted data key and sends it to AWS KMS for decryption. Once the data key is decrypted, the Server app uses it to decrypt the image and run inference on it to detect the objects in the image. Once the inference is complete, the results are returned to the Client app without exposing the original image or sensitive data.
  3. To allow the Nitro Enclave to communicate with AWS KMS, we use the KMS Enclave Tool which uses the vsock to connect to AWS KMS and decrypt the encrypted key.
  4. The vsock-proxy (packaged with the Nitro CLI) routes incoming traffic from the KMS Tool to AWS KMS provided that the AWS KMS endpoint is included on the vsock-proxy allowlist. The response from AWS KMS is then sent back to the KMS Enclave Tool over the vsock.

As part of the request to AWS KMS, the KMS Enclave Tool extracts and sends a signed attestation document to AWS KMS containing the enclave’s measurements to prove its identity. AWS KMS will validate the attestation document before decrypting the data key. Once validated, the data key is decrypted and securely returned to the KMS Tool which securely transfers it to the Server app to decrypt the image.

Solution architecture diagram for this blog post

Figure 3 – Solution architecture diagram for this blog post

Environment Setup


Before we get started, you will need the following prequisites to deploy the solution:

  1. AWS account
  2. AWS Identity and Access Management (IAM) role with appropriate access

AWS CloudFormation Template

We are going to use AWS CloudFormation to provision our infrastructure.

  1. Download the CloudFormation (CFN) template nitro-enclave-demo.yaml. This template orchestrates an EC2 instance with the required networking components such as a VPC, Subnet and NAT Gateway.
  2. Log in to the AWS Management Console and select the AWS Region where you’d like to deploy this stack. In the example, we select Canada (Central).
  3. Open the AWS CloudFormation console at: https://console.aws.amazon.com/cloudformation/
  4. Choose Create Stack, Template is ready, Upload a template file. Choose File to select nitro-enclave-demo.yaml that you saved locally.
  5. Choose Next, enter a stack name such as NitroEnclaveStack, choose Next.
  6. On the subsequent screens, leave the defaults, and continue to select Next until you arrive at the Review step
  7. At the Review step, scroll to the bottom and place a checkmark in “I acknowledge that AWS CloudFormation might create IAM resources with custom names.” and click “Create stack”
  8. The stack status is initially CREATE_IN_PROGRESS. It will take around 5 minutes to complete. Click the Refresh button periodically to refresh the status. Upon completion, the status changes to CREATE_COMPLETE.
  9. Once completed, click on “Resources” tab and search for “NitroEnclaveInstance”, click on its “Physical ID” to navigate to the EC2 instance
  10. On the Amazon EC2 page, select the instance and click “Connect”
  11. Choose “Session Manager” and click “Connect”

EC2 Instance Configuration

Now that the EC2 instance has been provisioned and you are connected to it, follow these steps to configure it:

  1. Install the Nitro Enclaves CLI which will allow you to build and run a Nitro Enclave application:
    sudo amazon-linux-extras install aws-nitro-enclaves-cli -y
    sudo yum install aws-nitro-enclaves-cli-devel -y
  2. Verify that the Nitro Enclaves CLI was installed successfully by running the following command:
    nitro-cli --version

    Nitro Enclaves CLI

  3. To download the application from GitHub and build a docker image, you need to first install Docker and Git by executing the following commands:
    sudo yum install git -y
    sudo usermod -aG ne ssm-user
    sudo usermod -aG docker ssm-user
    sudo systemctl start docker && sudo systemctl enable docker

Nitro Enclave Configuration

A Nitro Enclave is an isolated environment which runs within the EC2 instance, hence we need to specify the resources (CPU & Memory) that the Nitro Enclaves allocator service dedicates to the enclave.

  1. Enter the following commands to set the CPU and Memory available for the Nitro Enclave allocator service to allocate to your enclave container:
    sudo sed -r "s/^(\s*${MEM_KEY}\s*:\s*).*/\1${DEFAULT_MEM}/" -i "${ALLOCATOR_YAML}"
    sudo systemctl start nitro-enclaves-allocator.service && sudo systemctl enable nitro-enclaves-allocator.service
  2. To verify the configuration has been applied, run the following command and note the values for memory_mib and cpu_count:
    cat /etc/nitro_enclaves/allocator.yaml

    Enclave Configuration File

Creating a Nitro Enclave Image

Download the Project and Build the Enclave Base Image

Now that the EC2 instance is configured, download the workload code and build the enclave base Docker image. This image contains the Nitro Enclaves Software Development Kit (SDK) which allows an enclave to request a cryptographically signed attestation document from the Nitro Hypervisor. The attestation document includes unique measurements (SHA384 hashes) that are used to prove the enclave’s identity to services such as AWS KMS.

  1. Clone the Github Project
    cd ~/ && git clone https://github.com/aws-samples/aws-nitro-enclaves-ai-ml-object-detection.git
  2. Navigate to the cloned project’s folder and build the “enclave_base” image:
    cd ~/aws-nitro-enclaves-ai-ml-object-detection/enclave-base-image
    sudo docker build ./ -t enclave_base

    Note: The above step will take approximately 8-10 minutes to complete.

Build and Run The Nitro Enclave Image

To build the Nitro Enclave image of the workload, build a docker image of your application and then use the Nitro CLI to build the Nitro Enclave image:

  1. Download TensorFlow pre-trained model:
    cd ~/aws-nitro-enclaves-ai-ml-object-detection/src
    mkdir -p models/faster_rcnn_openimages_v4_inception_resnet_v2_1 && cd models/
    wget -O tensorflow-model.tar.gz https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1?tf-hub-format=compressed
    tar -xvf tensorflow-model.tar.gz -C faster_rcnn_openimages_v4_inception_resnet_v2_1
  2. Navigate to the use-case folder and build the docker image for the application:
    cd ~/aws-nitro-enclaves-ai-ml-object-detection/src
    sudo docker build ./ -t nitro-enclave-container-ai-ml:latest
  3. Use the Nitro CLI to build an Enclave Image File (.eif) using the docker image you built in the previous step:
    sudo nitro-cli build-enclave --docker-uri nitro-enclave-container-ai-ml:latest --output-file nitro-enclave-container-ai-ml.eif
  4. The output of the previous step produces the Platform configuration registers or PCR hashes and a nitro enclave image file (.eif). Take note of the PCR0 value, which is a hash of the enclave image file.Example PCR0:
        "Measurements": {
            "PCR0": "7968aee86dc343ace7d35fa1a504f955ee4e53f0d7ad23310e7df535a187364a0e6218b135a8c2f8fe205d39d9321923"
  5. Launch the Nitro Enclave container using the Enclave Image File (.eif) generated in the previous step and allocate resources to it. You should allocate at least 4 times the EIF file size for enclave memory. This is necessary because the tmpfs filesystem uses half of the memory and the remainder of the memory is used to uncompress the initial initramfs where the application executable resides. For CPU allocation, you should allocate CPU in full cores i.e. 2x vCPU for x86 hyper-threaded instances.
    In our case, we are going to allocate 14GB or 14,366 MB for the enclave:

    sudo nitro-cli run-enclave --cpu-count 2 --memory 14336 --eif-path nitro-enclave-container-ai-ml.eif

    Note: Allow a few seconds for the server to boot up prior to running the Client app in the below section “Object Detection using Nitro Enclaves”.

Update the KMS Key Policy to Include the PCR0 Hash

Now that you have the PCR0 value for your enclave image, update the KMS key policy to only allow your Nitro Enclave container access to the KMS key.

  1. Navigate to AWS KMS in your AWS Console and make sure you are in the same region where your CloudFormation template was deployed
  2. Select “Customer managed keys”
  3. Search for a key with alias “EnclaveKMSKey” and click on it
  4. Click “Edit” on the “Key Policy”
  5. Scroll to the bottom of the key policy and replace the value of “EXAMPLETOBEUPDATED” for the “kms:RecipientAttestation:PCR0” key with the PCR0 hash you noted in the previous section and click “Save changes”

AI/ML Object Detection using a Nitro Enclave

Now that you have an enclave image file, run the components of the solution.

Requirements Installation for Client App

  1. Install the python requirements using the following command:
    cd ~/aws-nitro-enclaves-ai-ml-object-detection/src
    pip3 install -r requirements.txt
  2. Set the region that your CloudFormation stack is deployed in. In our case we selected Canada (Centra)
  3. Run the following command to encrypt the image using the AWS KMS key “EnclaveKMSKey”, make sure to replace “ca-central-1” with the region where you deployed your CloudFormation template:
    python3 ./envelope-encryption/encryptor.py --filePath ./images/air-show.jpg --cmkId alias/EnclaveKMSkey --region $CFN_REGION
  4. Verify that the output contains: file encrypted? True
    Note: The previous command generates two files: an encrypted image file and an encrypted data key file. The data key file is generated so we can demonstrate an attempt from the parent instance at decrypting the data key.

Launching VSock Proxy

Launch the VSock Proxy which proxies requests from the Nitro Enclave to an external endpoint, in this case, to AWS KMS. Note the file vsock-proxy-config.yaml contains a list of endpoints which allow-lists the endpoints that an enclave can communicate with.

cd ~/aws-nitro-enclaves-ai-ml-object-detection/src
vsock-proxy 8001 "kms.$CFN_REGION.amazonaws.com" 443 --config vsock-proxy-config.yaml &

Object Detection using Nitro Enclaves

Send the encrypted image to the enclave to decrypt the image and use the AI/ML model to detect objects and return a summary of the objects detected:

cd ~/aws-nitro-enclaves-ai-ml-object-detection/src
python3 client.py --filePath ./images/air-show.jpg.encrypted | jq -C '.'

The previous step takes around a minute to complete when first called. Inside the enclave, the server application decrypts the image, runs it through the AI/ML model to generate a list of objects detected and returns that list to the client application.

Parent Instance Credentials

Attempt to Decrypt Data Key using Parent Instance Credentials

To prove that the parent instance is not able to decrypt the content, attempt to decrypt the image using the parent’s credentials:

cd ~/aws-nitro-enclaves-ai-ml-object-detection/src
aws kms decrypt --ciphertext-blob fileb://images/air-show.jpg.data_key.encrypted --region $CFN_REGION

Note: The command is expected to fail with AccessDeniedException, since the parent instance is not allowed to decrypt the data key.

Cleaning up

  1. Open the AWS CloudFormation console at: https://console.aws.amazon.com/cloudformation/.
  2. Select the stack you created earlier, such as NitroEnclaveStack.
  3. Choose Delete, then choose Delete Stack.
  4. The stack status is initially DELETE_IN_PROGRESS. Click the Refresh button periodically to refresh its status. The status changes to DELETE_COMPLETE after it’s finished and the stack name no longer appears in your list of active stacks.


In this post, we showcase how to process sensitive data with Nitro Enclaves using an AI/ML model deployed on Amazon EC2, as well as how to integrate an enclave with AWS KMS to restrict access to an AWS KMS CMK so that only the Nitro Enclave is allowed to use the key and decrypt the image.

We encrypt the sample data with envelope encryption to illustrate how to protect, transfer and securely process highly sensitive data. This process would be similar for any kind of sensitive information such as personally identifiable information (PII), healthcare or intellectual property (IP) which could also be the AI/ML model.

Dig deeper by exploring how to further restrict your AWS KMS CMK using additional PCR hashes such as PCR1 (hash of the Linux kernel and bootstrap), PCR2 (Hash of the application), and other hashes available to you.

Also, try our comprehensive Nitro Enclave workshop which includes use-cases at different complexity levels.

Simplifying Amazon EC2 instance type flexibility with new attribute-based instance type selection features

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/simplifying-amazon-ec2-instance-type-flexibility-with-new-attribute-based-instance-type-selection-features/

This blog is written by Rajesh Kesaraju, Sr. Solution Architect, EC2-Flexible Compute and Peter Manastyrny, Sr. Product Manager, EC2.

Today AWS is adding two new attributes for the attribute-based instance type selection (ABS) feature to make it even easier to create and manage instance type flexible configurations on Amazon EC2. The new network bandwidth attribute allows customers to request instances based on the network requirements of their workload. The new allowed instance types attribute is useful for workloads that have some instance type flexibility but still need more granular control over which instance types to run on.

The two new attributes are supported in EC2 Auto Scaling Groups (ASG), EC2 Fleet, Spot Fleet, and Spot Placement Score.

Before exploring the new attributes in detail, let us review the core ABS capability.

ABS refresher

ABS lets you express your instance type requirements as a set of attributes, such as vCPU, memory, and storage when provisioning EC2 instances with ASG, EC2 Fleet, or Spot Fleet. Your requirements are translated by ABS to all matching EC2 instance types, simplifying the creation and maintenance of instance type flexible configurations. ABS identifies the instance types based on attributes that you set in ASG, EC2 Fleet, or Spot Fleet configurations. When Amazon EC2 releases new instance types, ABS will automatically consider them for provisioning if they match the selected attributes, removing the need to update configurations to include new instance types.

ABS helps you to shift from an infrastructure-first to an application-first paradigm. ABS is ideal for workloads that need generic compute resources and do not necessarily require the hardware differentiation that the Amazon EC2 instance type portfolio delivers. By defining a set of compute attributes instead of specific instance types, you allow ABS to always consider the broadest and newest set of instance types that qualify for your workload. When you use EC2 Spot Instances to optimize your costs and save up to 90% compared to On-Demand prices, instance type diversification is the key to access the highest amount of Spot capacity. ABS provides an easy way to configure and maintain instance type flexible configurations to run fault-tolerant workloads on Spot Instances.

We recommend ABS as the default compute provisioning method for instance type flexible workloads including containerized apps, microservices, web applications, big data, and CI/CD.

Now, let us dive deep on the two new attributes: network bandwidth and allowed instance types.

How network bandwidth attribute for ABS works

Network bandwidth attribute allows customers with network-sensitive workloads to specify their network bandwidth requirements for compute infrastructure. Some of the workloads that depend on network bandwidth include video streaming, networking appliances (e.g., firewalls), and data processing workloads that require faster inter-node communication and high-volume data handling.

The network bandwidth attribute uses the same min/max format as other ABS attributes (e.g., vCPU count or memory) that assume a numeric value or range (e.g., min: ‘10’ or min: ‘15’; max: ‘40’). Note that setting the minimum network bandwidth does not guarantee that your instance will achieve that network bandwidth. ABS will identify instance types that support the specified minimum bandwidth, but the actual bandwidth of your instance might go below the specified minimum at times.

Two important things to remember when using the network bandwidth attribute are:

  • ABS will only take burst bandwidth values into account when evaluating maximum values. When evaluating minimum values, only the baseline bandwidth will be considered.
    • For example, if you specify the minimum bandwidth as 10 Gbps, instances that have burst bandwidth of “up to 10 Gbps” will not be considered, as their baseline bandwidth is lower than the minimum requested value (e.g., m5.4xlarge is burstable up to 10 Gbps with a baseline bandwidth of 5 Gbps).
    • Alternatively, c5n.2xlarge, which is burstable up to 25 Gbps with a baseline bandwidth of 10 Gbps will be considered because its baseline bandwidth meets the minimum requested value.
  • Our recommendation is to only set a value for maximum network bandwidth if you have specific requirements to restrict instances with higher bandwidth. That would help to ensure that ABS considers the broadest possible set of instance types to choose from.

Using the network bandwidth attribute in ASG

In this example, let us look at a high-performance computing (HPC) workload or similar network bandwidth sensitive workload that requires a high volume of inter-node communications. We use ABS to select instances that have at minimum 10 Gpbs of network bandwidth and at least 32 vCPUs and 64 GiB of memory.

To get started, you can create or update an ASG or EC2 Fleet set up with ABS configuration and specify the network bandwidth attribute.

The following example shows an ABS configuration with network bandwidth attribute set to a minimum of 10 Gbps. In this example, we do not set a maximum limit for network bandwidth. This is done to remain flexible and avoid restricting available instance type choices that meet our minimum network bandwidth requirement.

Create the following configuration file and name it: my_asg_network_bandwidth_configuration.json

    "AutoScalingGroupName": "network-bandwidth-based-instances-asg",
    "DesiredCapacityType": "units",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "LaunchTemplate-x86",
                "Version": "$Latest"
            "Overrides": [
                "InstanceRequirements": {
                    "VCpuCount": {"Min": 32},
                    "MemoryMiB": {"Min": 65536},
                    "NetworkBandwidthGbps": {"Min": 10} }
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 30,
            "SpotAllocationStrategy": "capacity-optimized"
    "MinSize": 1,
    "MaxSize": 10,
    "VPCZoneIdentifier": "subnet-f76e208a, subnet-f76e208b, subnet-f76e208c"

Next, let us create an ASG using the following command:

my_asg_network_bandwidth_configuration.json file

aws autoscaling create-auto-scaling-group --cli-input-json file://my_asg_network_bandwidth_configuration.json

As a result, you have created an ASG that may include instance types m5.8xlarge, m5.12xlarge, m5.16xlarge, m5n.8xlarge, and c5.9xlarge, among others. The actual selection at the time of the request is made by capacity optimized Spot allocation strategy. If EC2 releases an instance type in the future that would satisfy the attributes provided in the request, that instance will also be automatically considered for provisioning.

Considered Instances (not an exhaustive list)

Instance Type        Network Bandwidth
m5.8xlarge             “10 Gbps”

m5.12xlarge           “12 Gbps”

m5.16xlarge           “20 Gbps”

m5n.8xlarge          “25 Gbps”

c5.9xlarge               “10 Gbps”

c5.12xlarge             “12 Gbps”

c5.18xlarge             “25 Gbps”

c5n.9xlarge            “50 Gbps”

c5n.18xlarge          “100 Gbps”

Now let us focus our attention on another new attribute – allowed instance types.

How allowed instance types attribute works in ABS

As discussed earlier, ABS lets us provision compute infrastructure based on our application requirements instead of selecting specific EC2 instance types. Although this infrastructure agnostic approach is suitable for many workloads, some workloads, while having some instance type flexibility, still need to limit the selection to specific instance families, and/or generations due to reasons like licensing or compliance requirements, application performance benchmarking, and others. Furthermore, customers have asked us to provide the ability to restrict the auto-consideration of newly released instances types in their ABS configurations to meet their specific hardware qualification requirements before considering them for their workload. To provide this functionality, we added a new allowed instance types attribute to ABS.

The allowed instance types attribute allows ABS customers to narrow down the list of instance types that ABS considers for selection to a specific list of instances, families, or generations. It takes a comma separated list of specific instance types, instance families, and wildcard (*) patterns. Please note, that it does not use the full regular expression syntax.

For example, consider container-based web application that can only run on any 5th generation instances from compute optimized (c), general purpose (m), or memory optimized (r) families. It can be specified as “AllowedInstanceTypes”: [“c5*”, “m5*”,”r5*”].

Another example could be to limit the ABS selection to only memory-optimized instances for big data Spark workloads. It can be specified as “AllowedInstanceTypes”: [“r6*”, “r5*”, “r4*”].

Note that you cannot use both the existing exclude instance types and the new allowed instance types attributes together, because it would lead to a validation error.

Using allowed instance types attribute in ASG

Let us look at the InstanceRequirements section of an ASG configuration file for a sample web application. The AllowedInstanceTypes attribute is configured as [“c5.*”, “m5.*”,”c4.*”, “m4.*”] which means that ABS will limit the instance type consideration set to any instance from 4th and 5th generation of c or m families. Additional attributes are defined to a minimum of 4 vCPUs and 16 GiB RAM and allow both Intel and AMD processors.

Create the following configuration file and name it: my_asg_allow_instance_types_configuration.json

    "AutoScalingGroupName": "allow-instance-types-based-instances-asg",
    "DesiredCapacityType": "units",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "LaunchTemplate-x86",
                "Version": "$Latest"
            "Overrides": [
                "InstanceRequirements": {
                    "VCpuCount": {"Min": 4},
                    "MemoryMiB": {"Min": 16384},
                    "CpuManufacturers": ["intel","amd"],
                    "AllowedInstanceTypes": ["c5.*", "m5.*","c4.*", "m4.*"] }
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 30,
            "SpotAllocationStrategy": "capacity-optimized"
    "MinSize": 1,
    "MaxSize": 10,
    "VPCZoneIdentifier": "subnet-f76e208a, subnet-f76e208b, subnet-f76e208c"

As a result, you have created an ASG that may include instance types like m5.xlarge, m5.2xlarge, c5.xlarge, and c5.2xlarge, among others. The actual selection at the time of the request is made by capacity optimized Spot allocation strategy. Please note that if EC2 will in the future release a new instance type which will satisfy the other attributes provided in the request, but will not be a member of 4th or 5th generation of m or c families specified in the allowed instance types attribute, the instance type will not be considered for provisioning.

Selected Instances (not an exhaustive list)











As you can see, ABS considers a broad set of instance types for provisioning, however they all meet the compute attributes that are required for your workload.


To delete both ASGs and terminate all the instances, execute the following commands:

aws autoscaling delete-auto-scaling-group --auto-scaling-group-name network-bandwidth-based-instances-asg --force-delete

aws autoscaling delete-auto-scaling-group --auto-scaling-group-name allow-instance-types-based-instances-asg --force-delete


In this post, we explored the two new ABS attributes – network bandwidth and allowed instance types. Customers can use these attributes to select instances based on network bandwidth and to limit the set of instances that ABS selects from. The two new attributes, as well as the existing set of ABS attributes enable you to save time on creating and maintaining instance type flexible configurations and make it even easier to express the compute requirements of your workload.

ABS represents the paradigm shift in the way that our customers interact with compute, making it easier than ever to request diversified compute resources at scale. We recommend ABS as a tool to help you identify and access the largest amount of EC2 compute capacity for your instance type flexible workloads.

AWS Week in Review – November 7, 2022

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-november-7-2022/

With three weeks to go until AWS re:Invent opens in Las Vegas, the AWS News Blog Team is hard at work creating blog posts to share the latest launches and previews with you. As usual, we have a strong mix of new services, new features, and a surprise or two.

Last Week’s Launches
Here are some launches that caught my eye last week:

Amazon SNS Data Protection and Masking – After a quick public preview, this cool feature is now generally available. It uses pattern matching, machine learning models, and content policies to help protect data at scale. You can find many different kinds of personally identifiable information (PII) and protected health information (PHI) in message bodies and either block message delivery or mask (de-identify) the sensitive data, all in real-time and on a per-topic basis. To learn more, read the blog post or the message data protection documentation.

Amazon Textract Updates – This service extracts text, handwriting, and data from any document or image. This past week we updated the AnalyzeID function so that it can now extract the machine readable zone (MRZ) on passports issued by the United States, and we added the entire OCR output to the API response. We also updated the machine learning models that power the AnalyzeDocument function, with a focus on single-character boxed forms commonly found on tax and immigration documents. Finally, we updated the AnalyzeExpense function with support for new fields and higher accuracy for existing fields, bringing the total field count to more than 40.

Another Amazon Braket Processor – Our quantum computing service now supports Aquila, a new 256-qubit quantum computer from QuEra that is based on a programmable array of neutral Rubidium atoms. According to the What’s New, Aquila supports the Analog Hamiltonian Simulation (AHS) paradigm, allowing it to solve for the static and dynamic properties of quantum systems composed of many interacting particles.

Amazon S3 on Outposts – This service now lets you use additional S3 Lifecycle rules to optimize capacity management. You can expire objects as they age or are replaced with newer versions, with control at the bucket level, or for subsets defined by prefixes, object tags, or object sizes. There’s more info in the What’s New and in the S3 documentation.

AWS CloudFormation – There were two big updates last week: support for Amazon RDS Multi-AZ deployments with two readable standbys, and better access to detailed information on failed stack instances for operations on CloudFormation StackSets.

Amazon MemoryDB for Redis – You can now use data tiering as a lower cost way to to scale your clusters up to hundreds of terabytes of capacity. This new option uses a combination of instance memory and SSD storage in each cluster node, with all data stored durably in a multi-AZ transaction log. There’s more information in the What’s New and the blog post.

Amazon EC2 – You can now remove launch permissions for Amazon Machine Images (AMIs) that are directly shared with your AWS account.

X in Y – We launched existing AWS services and instance types in additional Regions:

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some additional news items that you may find interesting:

AWS Open Source News and Updates – My colleague Ricardo Sueiras highlights new open source projects, tools, and demos from the AWS Community. Read Installment 134 to see what’s going on!

New Case Study – A new AWS case study describes how Taggle (a company focused on smart water solutions in Australia) created an IoT platform that runs on AWS and uses Amazon Kinesis Data Streams to store & ingest data in real time. Using AWS allowed them to scale to accommodate 80,000 additional sensors that will roll out in 2022.

Upcoming AWS Events
re:Invent 2022AWS re:Invent is just three weeks away! Join us live from November 28th to December 2nd for keynotes, training and certification opportunities, and over 1,500 technical sessions. If you cannot make it to Las Vegas you can also join us online to watch the keynotes and leadership sessions live. Be sure to check out the re:Invent 2022 Attendee Guides, each curated by an AWS Hero, AWS industry team, or AWS partner.

PeerTalk – If you will be attending re:Invent in person and are interested in meeting with me or any of our featured experts, be sure to check out PeerTalk, our new onsite networking program.

That’s all for this week!


This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS.

How Wego secured developer connectivity to Amazon Relational Database Service instances

Post Syndicated from Adriaan de Jonge original https://aws.amazon.com/blogs/architecture/how-wego-secured-developer-connectivity-to-amazon-relational-database-service-instances/

How do you securely access Amazon Relational Database Service (Amazon RDS) instances from a developer’s laptop? Online travel marketplace, Wego, shares their journey from bastion hosts in the public subnet to lightweight VPN tunnels on top of Session Manager, a capability of AWS Systems Manager, using temporary access keys.

In this post, we explore how developers get access to allow-listed resources in their virtual private cloud (VPC) directly from their workstation, by tunnelling VPN over secure shell (SSH), which, in turn, is tunneled over Session Manager.

Note: This blog post is not intended as a step-by-step, how-to guide. Commands stated here are for illustrative purposes and may need customization.

Wego’s architecture before starting this journey

In 2021, Wego’s developer connectivity architecture was based on jump hosts in a public subnet, as illustrated in Figure 1.

Original Wego architecture

Figure 1. Original Wego architecture

Figure 1 demonstrates a network architecture with both public and private subnets. The public subnet contains an Amazon Elastic Compute Cloud (Amazon EC2) instance that serves as jump host. The diagram illustrates a VPN tunnel between the developer’s desktop and the VPC.

In Wego’s previous architecture, the jump host was connected to the internet for terminal access through the secure shell (SSH) protocol, which accepts traffic at Port 22. Despite restrictions to the allowed source IP addresses, exposing Port 22 to the internet can increase the likeliness of a security breach; it is possible to spoof (mimic) an allowed IP address and attempt a denial of service attack.

Moving the jump host to a private subnet with Session Manager

Session Manager helps minimize the likeliness of a security breach. Figure 2 demonstrates how Wego moved the jump host from a public subnet to a private subnet. In this architecture, Session Manager serves as the main entry point for incoming network traffic.

Wego's new architecture using Session Manager

Figure 2. Wego’s new architecture using Session Manager

We will explore how developers connect to Amazon RDS directly from their workstation in this architecture.

Tunnel TCP traffic through Session Manager

Session Manager is best known for its terminal access capability, but it can also tunnel TCP connections. This is helpful if you want to access EC2 instances from your local workstation (Figure 3).

Tunneling TCP traffic over Session Manager

Figure 3. Tunneling TCP traffic over Session Manager

Here’s an example command to forward traffic from local host Port 8888 to an EC2 instance:

$ aws ssm start-session --target <instance-id> \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["8888"], "localPortNumber":["8888"]}'

This assumes the target EC2 instance is configured with AWS Systems Manager connectivity.

Tunnel SSH traffic over Session Manager

SSH is a protocol built on top of TCP; therefore, you can tunnel SSH traffic similarly (Figure 4).

Tunneling SSH traffic over Session Manager

Figure 4. Tunneling SSH traffic over Session Manager

To allow a short-hand notation for SSH over SSM, add the following configuration to the ~/.ssh/config configuration file:

host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h \
        --document-name AWS-StartSSHSession \
        --parameters 'portNumber=%p'"

You can now connect to the EC2 instance over SSH with the following command:

ssh -i <key-file> <username>@<ec2-instance-id>

For example:

ssh -i my_key [email protected]

Ideally, your key-file is a short-lived credential, as recommended by the AWS Well-Architected Framework, as it narrows the window of opportunity for a security breach. However, it can be tedious to manage short-lived credentials. This is where EC2 Instance Connect comes to the rescue!

Replace SSH keys with EC2 Instance Connect

EC2 Instance Connect is available both on the AWS console and the command line. It makes it easier to work with short-lived keys. On the command line, it allows us to install our own temporary access credentials into a private EC2 instance for the duration of 60 seconds (Figure 5).

Connecting to SSH with temporary keys

Figure 5. Connecting to SSH with temporary keys

Ensure the EC2 instance connect plugin is installed on your workstation:

pip3 install ec2instanceconnectcli

This blog post assumes you are using Amazon Linux on the EC2 instance with all pre-requisites installed. Make sure your IAM role or user has the required permissions.

To generate a temporary SSH key pair, insert:

$ ssh-keygen -t rsa -f my_key
$ ssh-add my_key

To install the public key into the EC2 instance, insert:

$ aws ec2-instance-connect send-ssh-public-key \
  --instance-id <instance-id> \
  --instance-os-user <username> \
  --ssh-public-key <location ssh key public key> \
  --availability-zone <availabilityzone> \
  --region <region>

For example:

$ aws ec2-instance-connect send-ssh-public-key \
  --instance-id i-1234567890abcdef0 \
  --instance-os-user ec2-user \
  --ssh-public-key file://my_key.pub \
  --availability-zone ap-southeast-1b \
  --region ap-southeast-1

Connect to the EC2 instance within 60 seconds and delete the key after use.

Tunneling VPN over SSH, then over Session Manager

In this section, we adopt a third-party, open-source tool that is not supported by AWS, called sshuttle. sshuttle is a transparent proxy server that works as a VPN over SSH. It is based on Python and released under the LGPL 2.1 license. It runs across a wide range of Linux distributions and on macOS (Figure 6).

Tunneling VPN over SSH over Session Manager

Figure 6. Tunneling VPN over SSH over Session Manager

Why do we need to tunnel VPN over SSH, rather than using the earlier TCP over Session Manager? Keep in mind that the developer’s goal is to connect to Amazon RDS, not Amazon EC2. The SSM tunnel only works for connections to EC2 instances, not Amazon RDS.

A lightweight VPN solution, like sshuttle, bridges this gap by allowing you to forward traffic from Amazon EC2 to Amazon RDS. From the developer’s perspective, this works transparently, as if it is regular network traffic.

To install sshuttle, use one of the documented commands:

$ pip3 install sshuttle

To start sshuttle, use the following command pattern:

$ sshuttle -r <username>@<instance-id> <private CIDR range>

For example:

$ sshuttle -r [email protected]

Make sure the security group for the RDS DB instance allows network access from the jump host. You can now connect directly from the developer’s workstation to the RDS DB instance based on its IP address.

Advantages of this architecture

In this blog post, we layered a VPN over SSH that, in turn, is layered over Session Manager, plus we used temporary SSH keys.

Wego designed this architecture, and it was practical and stable for day-to-day use. They found that this solution runs at lower cost than AWS Client VPN and is sufficient for the use case of developers accessing online development environments.

Wego’s new architecture has a number of advantages, including:

  • More easily connecting to workloads in private and isolated subnets
  • Inbound security group rules are not required for the jump host, as Session Manager is an outbound connection
  • Access attempts are logged in AWS CloudTrail
  • Access control uses standard IAM policies, including tag-based resource access
  • Security groups and network access control lists still apply to “allow” or “deny” traffic to specific destinations
  • SSH keys are installed only temporarily for 60 seconds through EC2 Instance Connect


In this blog post, we explored Wego’s access patterns that can help you reduce your exposure to potential security attacks. Whether you adopt Wego’s full architecture or only adopt intermediary steps (like SSH over Session Manager and EC2 Instance Connect), reducing exposure to the public subnet and shortening the lifetime of access credentials can improve your security posture!

Further reading

AWS Week in Review – October 31, 2022

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-week-in-review-october-31-2022/

No tricks, just treats in this weekly roundup of news and announcements. Let’s switch our AWS Management Console into dark mode and dive right into it.

Last Week’s Launches
Here are some launches that got my attention during the previous week:

AWS Local Zones in Hamburg and Warsaw now generally available – AWS Local Zones help you run latency-sensitive applications closer to end users. The AWS Local Zones in Hamburg, Germany, and Warsaw, Poland, are the first Local Zones in Europe. AWS Local Zones are now generally available in 20 metro areas globally, with announced plans to launch 33 additional Local Zones in metro areas around the world. See the full list of available and announced AWS Local Zones, and learn how to get started.

Amazon SageMaker multi-model endpoint (MME) now supports GPU instances – MME is a managed capability of SageMaker Inference that lets you deploy thousands of models on a single endpoint. MMEs can now run multiple models on a GPU core, share GPU instances behind an endpoint across multiple models, and dynamically load and unload models based on the incoming traffic. This can help you reduce costs and achieve better price performance. Learn how to run multiple deep learning models on GPU with Amazon SageMaker multi-model endpoints.

Amazon EC2 now lets you replace the root Amazon EBS volume for a running instance – You can now use the Replace Root Volume for patching features in Amazon EC2 to replace your instance root volume using an updated AMI without needing to stop the instance. This makes patching of the guest operating system and applications easier, while retraining the instance store data, networking, and IAM configuration. Check out the documentation to learn more.

AWS Fault Injection Simulator now supports network connectivity disruption – AWS Fault Injection Simulator (FIS) is a managed service for running controlled fault injection experiments on AWS. AWS FIS now has a new action type to disrupt network connectivity and validate that your applications are resilient to a total or partial loss of connectivity. To learn more, visit Network Actions in the AWS FIS user guide.

Amazon SageMaker Automatic Model Tuning now supports Grid Search – SageMaker Automatic Model Tuning helps you find the hyperparameter values that result in the best-performing model for a chosen metric. Until now, you could choose between random, Bayesian, and hyperband search strategies. Grid search now lets you cover every combination of the specified hyperparameter values for use cases in which you need reproducible tuning results. Learn how Amazon SageMaker Automatic Model Tuning now supports grid search.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some additional news items that you may find interesting:

Celebrating over 20 years of AI/ML innovation – On October 25, we hosted the AWS AI/ML Innovation Day. Bratin Saha and other leaders in the field shared the great strides we have made in the past and discussed what’s next in the world of ML. You can watch the recording here.

AWS open-source news and updates – My colleague Ricardo Sueiras writes this weekly open-source newsletter in which he highlights new open-source projects, tools, and demos from the AWS Community. Read edition #133 here.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS re:Invent is only 4 weeks away! Join us live in Las Vegas from November 28–December 2 for keynote announcements, training and certification opportunities, access to 1,500+ technical sessions, and much more. Seats are still available to reserve, and walk-ups are available onsite. You can also join us online to watch live keynotes and leadership sessions.

If you are into machine learning like me, check out the ML attendee guide. AWS Machine Learning Hero Vinicius Caridá put together recommended sessions and tips and tricks for building your agenda. We also have attendee guides on additional topics and industries.

On November 2, there is a virtual event for building modern .NET applications on AWS. You can register for free.

On November 11–12, AWS User Groups in India are hosting the AWS Community Day India 2022, with success stories, use cases, and much more from industry leaders. Sign up for free to join this virtual event.

That’s all for this week. Check back next Monday for another Week in Review!

— Antje

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

How USAA built an Amazon S3 malware scanning solution

Post Syndicated from Jonathan Nguyen original https://aws.amazon.com/blogs/architecture/how-usaa-built-an-amazon-s3-malware-scanning-solution/

United Services Automobile Association (USAA) is a San Antonio-based insurance, financial services, banking, and FinTech company supporting millions of military members and their families. USAA has partnered with Amazon Web Services (AWS) to digitally transform and build multiple USAA solutions that help keep members safe and save members money and time.

Why build a S3 malware scanning solution?

As complex companies’ businesses continue to grow, there may be an increased need for collaboration and interactions with outside vendors. Prior to developing an Amazon Simple Storage Solution (Amazon S3) scanning solution, a security review and approval process for application teams to ingest data into an AWS Organization from external vendors’ AWS accounts may be warranted, to ensure additional threats are not being introduced. This could result in a lengthy review and exception process, and subsequently, could hinder the velocity of application teams’ collaboration with external vendors.

USAA security standards, like those of most companies, require all data from external vendors to be treated as untrusted, and therefore must be scanned by an antivirus or antimalware solution prior to being ingested by downstream processes within the AWS environment. Companies looking to automate the scanning process may want to consider a solution where all incoming external data flow through a demilitarized drop zone to be scanned, and subsequently released to downstream processes if malware and viruses are not detected.

S3 malware scanning solution overview

Dedicated AWS accounts should be provisioned for specific data classifications and used as a demilitarized zone (DMZ) for an untrusted staging area. The solution discussed in this blog uses a dedicated staging AWS account that controls the release of Amazon S3 objects to other AWS accounts within an AWS Organization. AWS accounts within an AWS Organization should follow security best practices in terms of infrastructure, networking, logging, and security. External vendors should explicitly be given limited permissions to appropriate resources in their respective staging S3 bucket.

A staging S3 bucket should have specific resource policies restricting which applications and identity and access management (IAM) principals can interact with S3 objects using object attributes, such as object tags, to determine whether an object has been scanned, and what the results of that scan are. Additional guardrails are implemented using Service Control Policies (SCP) to restrict authorized IAM principals to create or modify S3 object attributes (Figure 1).

Amazon S3 antivirus and antimalware scanning architecture workflow

Figure 1. Amazon S3 antivirus and antimalware scanning architecture workflow

  1. The external vendor copies an object to the staging S3 bucket.
  2. The staging S3 bucket has event notifications configured and generates an event.
  3. The S3 PutObject event is sent to an Object Created Amazon Simple Queue Service (Amazon SQS) queue topic.
  4. An Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling group is configured to scale based on messages in the Object Created SQS queue.
  5. An antivirus and antimalware scanning service application on the Amazon EC2 instances takes the following actions on objects within the Object Created Amazon SQS queue:
    a. Tag the S3 object with an “In Progress” status.
    b. Get the object from the Staging S3 bucket and stores it in a local ephemeral file system.
    c. Scan the copied object using antivirus or antimalware tool.
    d. Based on the antivirus or antimalware scan results, tag the S3 object with the scan results (for example, No_Malware_Detected vs. Malware_Detected).
    e. Create and publish a payload to the Object Scanned Amazon Simple Notification Service (Amazon SNS) topic, allowing application team filtering.
    f. Delete the message from the Object Created SQS queue.
  6. Application teams are subscribed to the Object Scanned SNS topic with a filter for their application.
  7. For any objects where a virus or malware is detected, a company can use its cyber threat response team to conduct a thorough analysis and take appropriate actions.

USAA built a custom anti-virus and anti-malware scanning application using EC2 instances, using a private, hardened Amazon Machine Image (AMI). For cost-efficacy purposes, the EC2 automatic scaling event can be configured based on Object Created SQS queue depth and Service Level Objective (SLO). A serverless version of an anti-virus and anti-malware solution can be used instead of an EC2 application, depending on your specific use-case and other factors. Some important factors include antivirus and antimalware tool serverless support, resource tuning and configuration requirements, and additional AWS services to manage that could possibly result in a bottleneck. If your enterprise is going with a serverless approach, you can use open-source tools such as ClamAV using Lambda functions.

In the event of an infected object, proper guardrails and response mechanisms need to be in place. USAA teams have developed playbooks to monitor the health and performance of S3 scanning solution, as well as responding to detected virus or malware.

This cloud native, event-driven solution has benefited multiple USAA application teams who have previously requested the ability to ingest data into AWS workloads from teams outside of USAA’s AWS Organization, and allowed additional capabilities and functionality to better serve their members. To enhance this solution even further, USAA’s security team plans to incorporate additional mechanisms to find specific objects that either failed or required additional processing, without having to scan all objects in the buckets. This can be accomplished by including an additional AWS Lambda function and Amazon DynamoDB table to track object metadata as objects get added to the Object Created SQS queue for processing. The metadata could possibly include information such as S3 bucket origin, S3 object key, version ID, scan status, and the original S3 event payload to replay the event into the Object Created SQS queue. The Lambda function primarily ensures the DynamoDB table is kept up to date as objects are processed, as well as handling issues for objects that may need to be reprocessed. The DynamoDB table also has time-to-live (TTL) configured to clear records as they expire from the Staging S3 bucket.


In this post, we reviewed how USAA’s Public Cloud Security team facilitated collaboration and interactions with external vendors and AWS workloads securely by creating a scalable solution to scan S3 objects for virus and malware prior to releasing objects downstream. The solution uses native AWS services and can be utilized for any use-cases requiring antivirus or antimalware capabilities. Because the S3 object scanning solution uses EC2 instances, you can use your existing antivirus or antimalware enterprise tool.

Building highly resilient applications with on-premises interdependencies using AWS Local Zones

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-highly-resilient-applications-with-on-premises-interdependencies-using-aws-local-zones/

This blog post is written by Rachel Rui Liu, Senior Solutions Architect.

AWS Local Zones are a type of infrastructure deployment that places compute, storage, database, and other select AWS services close to large population and industry centers.

Following the successful launch of the AWS Local Zones in 16 US cities since 2019, in Feb 2022, AWS announced plans to launch new AWS Local Zones in 32 metropolitan areas in 26 countries worldwide.

With Local Zones, we’ve seen use cases in two common categories.

The first category of use cases is for workloads that require extremely low latency between end-user devices and workload servers. For example, let’s consider media content creation and real-time multiplayer gaming. For these use cases, deploying the workload to a Local Zone can help achieve down to single-digit milliseconds latency between end-user devices and the AWS infrastructure, which is ideal for a good end-user experience.

This post will focus on addressing the second category of use cases, which is commonly seen in an enterprise hybrid architecture, where customers must achieve low latency between AWS infrastructure and existing on-premises data centers.  Compared to the first category of use cases, these use cases can tolerate slightly higher latency between the end-user devices and the AWS infrastructure. However, these workloads have dependencies to these on-premises systems, so the lowest possible latency between AWS infrastructure and on-premises data centers is required for better application performance. Here are a few examples of these systems:

  • Financial services sector mainframe workloads hosted on premises serving regional customers.
  • Enterprise Active Directory hosted on premise serving cloud and on-premises workloads.
  • Enterprise applications hosted on premises processing a high volume of locally generated data.

For workloads deployed in AWS, the time taken for each interaction with components still hosted in the on-premises data center is increased by the latency. In turn, this delays responses received by the end-user. The total latency accumulates and results in suboptimal user experiences.

By deploying modernized workloads in Local Zones, you can reduce latency while continuing to access systems hosted in on-premises data centers, thereby reducing the total latency for the end-user. At the same time, you can enjoy the benefits of agility, elasticity, and security offered by AWS, and can apply the same automation, compliance, and security best practices that you’ve been familiar with in the AWS Regions.

Enterprise workload resiliency with Local Zones

While designing hybrid architectures with Local Zones, resiliency is an important consideration. You want to route traffic to the nearest Local Zone for low latency. However, when disasters happen, it’s critical to fail over to the parent Region automatically.

Let’s look at the details of hybrid architecture design based on real world deployments from different angles to understand how the architecture achieves all of the design goals.

Hybrid architecture with resilient network connectivity

The following diagram shows a high-level overview of a resilient enterprise hybrid architecture with Local Zones, where you have redundant connections between the AWS Region, the Local Zone, and the corporate data center.

resillient network connectivity

Here are a few key points with this network connectivity design:

  1. Use AWS Direct Connect or Site-to-Site VPN to connect the corporate data center and AWS Region.
  2. Use Direct Connect or self-hosted VPN to connect the corporate data center and the Local Zone. This connection will provide dedicated low-latency connectivity between the Local Zone and corporate data center.
  3. Transit Gateway is a regional service. When attaching the VPC to AWS Transit Gateway, you can only add subnets provisioned in the Region. Instances on subnets in the Local Zone can still use Transit Gateway to reach resources in the Region.
  4. For subnets provisioned in the Region, the VPC route table should be configured to route the traffic to the corporate data center via Transit Gateway.
  5. For subnets provisioned in Local Zone, the VPC route table should be configured to route the traffic to the corporate data center via the self-hosted VPN instance or Direct Connect.

Hybrid architecture with resilient workload deployment

The next examples show a public and a private facing workload.

To simplify the diagram and focus on application layer architecture, the following diagrams assume that you are using Direct Connect to connect between AWS and the on-premises data center.

Example 1: Resilient public facing workload

With a public facing workload, end-user traffic will be routed to the Local Zone. If the Local Zone is unavailable, then the traffic will be routed to the Region automatically using an Amazon Route 53 failover policy.

public facing workload resilliency
Here are the key design considerations for this architecture:

  1. Deploy the workload in the Local Zone and put the compute layer in an AWS AutoScaling Group, so that the application can scale up and down depending on volume of requests.
  2. Deploy the workload in both the Local Zone and an AWS Region, and put the compute layer into an autoscaling group. The regional deployment will act as pilot light or warm standby with minimal footprint. But it can scale out when the Local Zone is unavailable.
  3. Two Application Load Balancers (ALBs) are required: one in the Region and one in the Local Zone. Each ALB will dispatch the traffic to each workload cluster inside the autoscaling group local to it.
  4. An internet gateway is required for public facing workloads. When using a Local Zone, there’s no extra configuration needed: define a single internet gateway and attach it to the VPC.

If you want to specify an Elastic IP address to be the workload’s public endpoint, the Local Zone will have a different address pool than the Region. Noting that BYOIP is unsupported for Local Zones.

  1. Create a Route 53 DNS record with “Failover” as the routing policy.
  • For the primary record, point it to the alias of the ALB in the Local Zone. This will set Local Zone as the preferred destination for the application traffic which minimizes latency for end-users.
  • For the secondary record, point it to the alias of the ALB in the AWS Region.
  • Enable health check for the primary record. If health check against the primary record fails, which indicates that the workload deployed in the Local Zone has failed to respond, then Route 53 will automatically point to the secondary record, which is the workload deployed in the AWS Region.

Example 2: Resilient private workload

For a private workload that’s only accessible by internal users, a few extra considerations must be made to keep the traffic inside of the trusted private network.

private workload resilliency

The architecture for resilient private facing workload has the same steps as public facing workload, but with some key differences. These include:

  1. Instead of using a public hosted zone, create private hosted zones in Route 53 to respond to DNS queries for the workload.
  2. Create the primary and secondary records in Route 53 just like the public workload but referencing the private ALBs.
  3. To allow end-users onto the corporate network (within offices or connected via VPN) to resolve the workload, use the Route 53 Resolver with an inbound endpoint. This allows end-users located on-premises to resolve the records in the private hosted zone. Route 53 Resolver is designed to be integrated with an on-premises DNS server.
  4. No internet gateway is required for hosting the private workload. You might need an internet gateway in the Local Zone for other purposes: for example, to host a self-managed VPN solution to connect the Local Zone with the corporate data center.

Hosting multiple workloads

Customers who host multiple workloads in a single VPC generally must consider how to segregate those workloads. As with workloads in the AWS Region, segregation can be implemented at a subnet or VPC level.

If you want to segregate workloads at the subnet level, you can extend your existing VPC architecture by provisioning extra sets of subnets to the Local Zone.

segregate workloads at subnet level

Although not shown in the diagram, for those of you using a self-hosted VPN to connect the Local Zone with an on-premises data center, the VPN solution can be deployed in a centralized subnet.

You can continue to use security groups, network access control lists (NACLs) , and VPC route tables – just as you would in the Region to segregate the workloads.

If you want to segregate workloads at the VPC level, like many of our customers do, within the Region, inter-VPC routing is generally handled by Transit Gateway. However, in this case, it may be undesirable to send traffic to the Region to reach a subnet in another VPC that is also extended to the Local Zone.

segregate workloads at VPC level

Key considerations for this design are as follows:

  1. Direct Connect is deployed to connect the Local Zone with the corporate data center. Therefore, each VPC will have a dedicated Virtual Private Gateway provisioned to allow association with the Direct Connect Gateway.
  2. To enable inter-VPC traffic within the Local Zone, peer the two VPCs together.
  3. Create a VPC route table in VPC A. Add a route for Subnet Y where the destination is the peering link. Assign this route table to Subnet X.
  4. Create a VPC route table in VPC B. Add a route for Subnet X where the destination is the peering link. Assign this route table to Subnet Y.
  5. If necessary, add routes for on-premises networks and the transit gateway to both route tables.

This design allows traffic between subnets X and Y to stay within the Local Zone, thereby avoiding any latency from the Local Zone to the AWS Region while still permitting full connectivity to all other networks.


In this post, we summarized the use cases for enterprise hybrid architecture with Local Zones, and showed you:

  • Reference architectures to host workloads in Local Zones with low-latency connectivity to corporate data centers and resiliency to enable fail over to the AWS Region automatically.
  • Different design considerations for public and private facing workloads utilizing this hybrid architecture.
  • Segregation and connectivity considerations when extending this hybrid architecture to host multiple workloads.

Hopefully you will be able to follow along with these reference architectures to build and run highly resilient applications with local system interdependencies using Local Zones.

How Shiji Group created a global guest profile store on AWS

Post Syndicated from Maximilian Schellhorn original https://aws.amazon.com/blogs/architecture/how-shiji-group-created-a-global-guest-profile-store-on-aws/

Shiji Group provides global software solutions for the hospitality industry. The Shiji Enterprise Platform enables customers to manage large hotel property portfolios using software as a service (SaaS). Among functionalities such as reservations, housekeeping, finance, and integrations with external systems, the guest profile is a key aspect of the system. Besides personal information (such as name and address) and billing details, the guest profile can include room preferences and entertainment options.

A property portfolio can span multiple hotels across the globe, and each hotel location can offer better customer service by consolidating data. Once the guest gives their cross-border data processing consent (CBDPC), profile information can be shared between properties. This provides a centralized and seamless experience for the hotel guest no matter which hotel in the portfolio was chosen.

In the following blog post, you will explore the architecture of the guest profile store that replicates the profile across multiple geographic areas. We will review the single Region design first and its infrastructure components and architectural patterns. We will then show the evolution to a multi-Region architecture.

Single Region architecture with CQRS

The ability to find relevant guest profile data fast is essential in the day-to-day hospitality business. Therefore, the following architecture uses the command query responsibility segregation (CQRS) pattern to provide high scalability and rich full-text search capabilities without sacrificing performance. With CQRS, write requests (commands) are targeting a different service than read requests (queries). This allows systems to store an item (such as a profile) in a search-optimized format for serving reads, while providing a simple schema for writes.

The microservices for the guest profile architecture are operated as containers on Amazon Elastic Kubernetes Service (Amazon EKS). The write model of the guest profile is stored in an Amazon Relational Database Service (Amazon RDS) PostgreSQL database. A separate read model uses Amazon OpenSearch Service. For interservice communication, Shiji runs a self-managed Apache Kafka cluster on Amazon Elastic Compute Cloud (Amazon EC2).

The following diagram provides a walk through the single Region architecture:

Single Region architecture with CQRS

Figure 1. Single Region architecture with CQRS

  1. The front desk employee creates the Guest Profile upon first interaction with the hotel guest (name, address, billing, and room preferences).
  2. The request is routed to the Kong API Management Solution that is running in an Amazon EKS Kubernetes cluster. It acts as the single entry-point to the system. It identifies the type of request by parsing the URL and forwarding write requests to the profile-write-model-service.
  3. The service validates the request. It stores the data and ProfileCreated event in the PostgreSQL database, Amazon RDS.
  4. A change data capture (CDC) mechanism publishes the ProfileCreated event to an Apache Kafka Local Profiles topic.
  5. The profile-read-model-service subscribes to the Local Profiles topic and stores the profile in an optimized read format in Amazon OpenSearch. Whenever the hotel performs a guest profile search, results will now be provided via the profile-read-model-service.

Multi-Region networking setup

Shiji operates in multiple AWS Regions to provide low latency, regulatory requirements, and resilience across the globe. The previously presented single Region architecture can be replicated to multiple AWS Regions (eu-central-1 and ap-southeast-1, for example). Hotels with a given property portfolio that operate in the same Region can reuse the profile store of the Shiji Enterprise Platform. However, hotels that are being operated in a different AWS Region can be interconnected as well.

This is achieved by providing an AWS Transit Gateway in a separate networking account that connects the different Regions with a VPC attachment:

Multi-Region networking setup

Figure 2. Multi-Region networking setup

The account segregation provides an additional layer of flexibility to add further Regions in the future.

Multi-Region event replication

Upon first arrival, guests can choose to sign a cross-border-data processing consent (CBDPC). This permits the hotel to share the profile information globally. If accepted, the profile-write-model-service creates an additional ProfileCreated event that gets published to a GlobalProfilesEU Apache Kafka topic. This topic is accessible for subscribers in the target Region, which replicates relevant profiles into the local database as follows.

A replicator-service in the target Region (ap-souteast-1) is now able to subscribe to the GlobalProfileEU topic in (eu-central-1), via the established network connection from the previous section. It republishes the event to a local ReplicatedProfiles topic that the profile-write-model-service subscribes to and saves to the local database:

Event replication

Figure 3. Event replication

Putting it all together: The multi-Region guest profile store

The following diagram combines all the components from the previous sections. It provides an end-to-end look at the multi-Region guest profile architecture. Due to the event driven nature of the system, the architecture can be extended without changing the initial flow outlined in the single Region design.

Multi-Region guest profile architecture

Figure 4. Multi-Region guest profile architecture

  1. If the hotel guest signed a cross-border data processing consent (CBDPC), the ProfileCreated event will also be published to a Global Profiles topic.
  2. The replicator-service in the target Region (for example, ap-southeast-1) subscribes to the Global Profiles topic of the source Region (for example, eu-central-1). It then publishes the event to its local Replicated Profiles topic.
  3. The profile-write-model-service in the target Region subscribes to the Replicated Profiles topic and records the item in the Amazon RDS PostgreSQL database with information about the source Region. This will initiate the local replication similar to the single Region design, and therefore creates a consistent experience between both Regions.

Conclusion and outlook

In this blog post, we showed how Shiji built a modern multi-Region microservice architecture on AWS. You have learned about patterns such as CQRS, which provide a scalable solution for both read and write traffic. We’ve also shown what is needed to interconnect two physically separated Regions. With cross-border data processing consent (CBDPC), you have seen how the ownership of guest data can be secured and utilized. The single Region architecture already provided a solid baseline for this solution architecture. The event-driven nature of the system permitted us to add additional functionality for the final multi-Region architecture.

The ability to manage a global guest profile within the main system as well as at the property itself is a huge advantage for enterprise hotel companies. It permits hotels to deliver a unified experience to their guests no matter where the guest is within the hotel or on their journey. Food preferences, spa, room, and more, can all be managed from a single guest profile. This centralized information hasn’t been possible within the hotel’s property management system (PMS) until recently.

Visit Shiji Enterprise Platform for more information.

Adding approval notifications to EC2 Image Builder before sharing AMIs

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/adding-approval-notifications-to-ec2-image-builder-before-sharing-amis-2/

­­­­­This blog post is written by, Glenn Chia Jin Wee, Associate Cloud Architect, and Randall Han, Professional Services.

You may be required to manually validate the Amazon Machine Image (AMI) built from an Amazon Elastic Compute Cloud (Amazon EC2) Image Builder pipeline before sharing this AMI to other AWS accounts or to an AWS organization. Currently, Image Builder provides an end-to-end pipeline that automatically shares AMIs after they’ve been built.

In this post, we will walk through the steps to enable approval notifications before AMIs are shared with other AWS accounts. Image Builder supports automated image testing using test components. The recommended best practice is to automate test steps, however situations can arise where test steps become either challenging to automate or internal compliance policies mandate manual checks be conducted prior to distributing images. In such situations, having a manual approval step is useful if you would like to verify the AMI configuration before it is shared to other AWS accounts or an AWS Organization. A manual approval step reduces the potential for sharing an incorrectly configured AMI with other teams which can lead to downstream issues. This solution sends an email with a link to approve or reject the AMI. Users approve the AMI after they’ve verified that it is built according to specifications. Upon approving the AMI, the solution automatically shares it with the specified AWS accounts.

OverviewArchitecture Diagram

  1. In this solution, an Image Builder Pipeline is run that builds a Golden AMI in Account A. After the AMI is built, Image Builder publishes data about the AMI to an Amazon Simple Notification Service (Amazon SNS)
  2. The SNS Topic passes the data to an AWS Lambda function that subscribes to it.
  3. The Lambda function that subscribes to this topic retrieves the data, formats it, and then starts an SSM Automation, passing it the AMI Name and ID.
  4. The first step of the SSM Automation is a manual approval step. The SSM Automation first publishes to an SNS Topic that has an email subscription with the Approver’s email. The approver will receive the email with a URL that they can click to approve the step.
  5. The approval step defines a specific AWS Identity and Access Management (IAM) Role as an approver. This role has the minimum required permissions to approve the manual approval step. After performing manual tests on the Golden AMI, the Approver principal will assume this role.
  6. After assuming this role, the approver will click on the approval link that was sent via email. After approving the step, an AWS Lambda Function is triggered.
  7. This Lambda Function shares the Golden AMI with Account B and sends an email notifying the Target Account Recipients that the AMI has been shared.


For this walkthrough, you will need the following:

  • Two AWS accounts – one to host the solution resources, and the second which receives the shared Golden AMI.
    • In the account that hosts the solution, prepare an AWS Identity and Access Management (IAM) principal with the sts:AssumeRole permission. This principal must assume the IAM Role that is listed as an approver in the Systems Manager approval step. The ARN of this IAM principal is used in the AWS CloudFormation Approver parameter, This ARN is added to the trust policy of approval IAM Role.
    • In addition, in the account hosting the solution, ensure that the IAM principal deploying the CloudFormation template has the required permissions to create the resources in the stack.
  • A new Amazon Virtual Private Cloud (Amazon VPC) will be created from the stack. Make sure that you have fewer than five VPCs in the selected Region.


In this section, we will guide you through the steps required to deploy the Image Builder solution. The solution is deployed with CloudFormation.

In this scenario, we deploy the solution within the approver’s account. The approval email will be sent to a predefined email address for manual approval, before the newly created AMI is shared to target accounts.

The approver first assumes the approval IAM Role and then selects the approval link. This leads to the Systems Manager approval page. Upon approval, an email notification will be sent to the predefined target account email address, notifying the relevant stakeholders that the AMI has been successfully shared.

The high-level steps we will follow are:

  1. In Account A, deploy the provided AWS CloudFormation template. This includes an example Image Builder Pipeline, Amazon SNS topics, Lambda functions, and an SSM Automation Document.
  2. Approve the SNS subscription from your supplied email address.
  3. Run the pipeline from the Amazon EC2 Image Builder Console.
  4. [Optional] To conduct manual tests, launch an Amazon EC2 instance from the built AMI after the pipeline runs.
  5. An email will be sent to you with options to approve or reject the step. Ensure that you have assumed the IAM Role that is the approver before clicking the approval link that leads to the SSM console approval page.
  6. Upon approving the step, an AWS Lambda function shares the AMI to the Account B and also sends an email to the target account email recipients notifying them that the AMI has been shared.
  7. Log in to Account B and verify that the AMI has been shared.

Step 1: Deploy the AWS CloudFormation template

1. The CloudFormation template, template.yaml that deploys the solution can also found at this GitHub repository. Follow the instructions at the repository to deploy the stack.

Step 2: Verify your email address

  1. After running the deployment, you will receive an email prompting you to confirm the Subscription at the approver email address. Choose Confirm subscription.

SNS Topic Subscription confirmation email

  1. This leads to the following screen, which shows that your subscription is confirmed.


  1. Repeat the previous 2 steps for the target email address.

Step 3: Run the pipeline from the Image Builder console

  1. In the Image Builder console, under Image pipelines, select the checkbox next to the Pipeline created, choose Actions, and select Run pipeline.


Note: The pipeline takes approximately 20 – 30 minutes to complete.

Step 4: [Optional] Launch an Amazon EC2 instance from the built AMI

If you have a requirement to manually validate the AMI before sharing it with other accounts or to the AWS organization an approver will launch an Amazon EC2 instance from the built AMI and conduct manual tests on the EC2 instance to make sure it is functional.

  1. In the Amazon EC2 console, under Images, choose AMIs. Validate that the AMI is created.


  1. Follow AWS docs: Launching an EC2 instances from a custom AMI for steps on how to launch an Amazon EC2 instance from the AMI.

Step 5: Select the approval URL in the email sent

  1. When the pipeline is run successfully, you will receive another email with a URL to approve the AMI.


  1. Before clicking on the Approve link, you must assume the IAM Role that is set as an approver for the Systems Manager step.
  2. In the CloudFormation console, choose the stack that was deployed.


4. Choose Outputs and copy the IAM Role name.


5. While logged in as the IAM Principal that has permissions to assume the approval IAM Role, follow the instructions at AWS IAM documentation for switching a role using the console to assume the approval role.
In the Switch Role page, in Role paste the name of the IAM Role that you copied in the previous step.

Note: This IAM Role was deployed with minimum permissions. Hence, seeing warning messages in the console is expected after assuming this role.


6. Now in the approval email, select the Approve URL. This leads to the Systems Manager console. Choose Submit.


7. After approving the manual step, the second step is executed, which shares the AMI to the target account.


Step 6: Verify that the AMI is shared to Account B

  1. Log in to Account B.
  2. In the Amazon EC2 console, under Images, choose AMIs. Then, in the dropdown, choose Private images. Validate that the AMI is shared.


  1. Verify that a success email notification was sent to the target account email address provided.


Clean up

This section provides the necessary information for deleting various resources created as part of this post.

  1. Deregister the AMIs that were created and shared.
    1. Log in to Account A and follow the steps at AWS documentation: Deregister your Linux AMI.
  2. Delete the CloudFormation stack. For instructions, refer to Deleting a stack on the AWS CloudFormation console.


In this post, we explained how to enable approval notifications for an Image Builder pipeline before AMIs are shared to other accounts. This solution can be extended to share to more than one AWS account or even to an AWS organization. With this solution, you will be notified when new golden images are created, allowing you to verify the accuracy of their configuration before sharing them to for wider use. This reduces the possibility of sharing AMIs with misconfigurations that the written tests may not have identified.

We invite you to experiment with different AMIs created using Image Builder, and with different Image Builder components. Check out this GitHub repository for various examples that use Image Builder. Also check out this blog on Image builder integrations with EC2 Auto Scaling Instance Refresh. Let us know your questions and findings in the comments, and have fun!

Amazon EC2 Trn1 Instances for High-Performance Model Training are Now Available

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-ec2-trn1-instances-for-high-performance-model-training-are-now-available/

Deep learning (DL) models have been increasing in size and complexity over the last few years, pushing the time to train from days to weeks. Training large language models the size of GPT-3 can take months, leading to an exponential growth in training cost. To reduce model training times and enable machine learning (ML) practitioners to iterate fast, AWS has been innovating across chips, servers, and data center connectivity.

At AWS re:Invent 2021, we announced the preview of Amazon EC2 Trn1 instances powered by AWS Trainium chips. AWS Trainium is optimized for high-performance deep learning training and is the second-generation ML chip built by AWS, following AWS Inferentia.

Today, I’m excited to announce that Amazon EC2 Trn1 instances are now generally available! These instances are well-suited for large-scale distributed training of complex DL models across a broad set of applications, such as natural language processing, image recognition, and more.

Compared to Amazon EC2 P4d instances, Trn1 instances deliver 1.4x the teraFLOPS for BF16 data types, 2.5x more teraFLOPS for TF32 data types, 5x the teraFLOPS for FP32 data types, 4x inter-node network bandwidth, and up to 50 percent cost-to-train savings. Trn1 instances can be deployed in EC2 UltraClusters that serve as powerful supercomputers to rapidly train complex deep learning models. I’ll share more details on EC2 UltraClusters later in this blog post.

New Trn1 Instance Highlights
Trn1 instances are available today in two sizes and are powered by up to 16 AWS Trainium chips with 128 vCPUs. They provide high-performance networking and storage to support efficient data and model parallelism, popular strategies for distributed training.

Trn1 instances offer up to 512 GB of high-bandwidth memory, deliver up to 3.4 petaFLOPS of TF32/FP16/BF16 compute power, and feature an ultra-high-speed NeuronLink interconnect between chips. NeuronLink helps avoid communication bottlenecks when scaling workloads across multiple Trainium chips.

Trn1 instances are also the first EC2 instances to enable up to 800 Gbps of Elastic Fabric Adapter (EFA) network bandwidth for high-throughput network communication. This second generation EFA delivers lower latency and up to 2x more network bandwidth compared to the previous generation. Trn1 instances also come with up to 8 TB of local NVMe SSD storage for ultra-fast access to large datasets.

The following table lists the sizes and specs of Trn1 instances in detail.

Instance Name
vCPUs AWS Trainium Chips Accelerator Memory NeuronLink Instance Memory Instance Networking Local Instance Storage
trn1.2xlarge 8 1 32 GB N/A 32 GB Up to 12.5 Gbps 1x 500 GB NVMe
trn1.32xlarge 128 16 512 GB Supported 512 GB 800 Gbps 4x 2 TB NVMe

Trn1 EC2 UltraClusters
For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre high-performance storage and are deployed in EC2 UltraClusters. EC2 UltraClusters are hyperscale clusters interconnected with a non-blocking petabit-scale network. This gives you on-demand access to a supercomputer to cut model training time for large and complex models from months to weeks or even days.

Amazon EC2 Trn1 UltraCluster

AWS Trainium Innovation
AWS Trainium chips include specific scalar, vector, and tensor engines that are purpose-built for deep learning algorithms. This ensures higher chip utilization as compared to other architectures, resulting in higher performance.

Here is a short summary of additional hardware innovations:

  • Data Types: AWS Trainium supports a wide range of data types, including FP32, TF32, BF16, FP16, and UINT8, so you can choose the most suitable data type for your workloads. It also supports a new, configurable FP8 (cFP8) data type, which is especially relevant for large models because it reduces the memory footprint and I/O requirements of the model.
  • Hardware-Optimized Stochastic Rounding: Stochastic rounding achieves close to FP32-level accuracy with faster BF16-level performance when you enable auto-casting from FP32 to BF16 data types. Stochastic rounding is a different way of rounding floating-point numbers, which is more suitable for machine learning workloads versus the commonly used Round Nearest Even rounding. By setting the environment variable NEURON_RT_STOCHASTIC_ROUNDING_EN=1 to use stochastic rounding, you can train a model up to 30 percent faster.
  • Custom Operators, Dynamic Tensor Shapes: AWS Trainium also supports custom operators written in C++ and dynamic tensor shapes. Dynamic tensor shapes are key for models with unknown input tensor sizes, such as models processing text.

AWS Trainium shares the same AWS Neuron SDK as AWS Inferentia, making it easy for everyone who is already using AWS Inferentia to get started with AWS Trainium.

For model training, the Neuron SDK consists of a compiler, framework extensions, a runtime library, and developer tools. The Neuron plugin natively integrates with popular ML frameworks, such as PyTorch and TensorFlow.

The AWS Neuron SDK supports just-in-time (JIT) compilation, in addition to ahead-of-time (AOT) compilation, to speed up model compilation, and Eager Debug Mode, for a step-by-step execution.

To compile and run your model on AWS Trainium, you need to change only a few lines of code in your training script. You don’t need to tweak your model or think about data type conversion.

Get Started with Trn1 Instances
In this example, I train a PyTorch model on an EC2 Trn1 instance using the available PyTorch Neuron packages. PyTorch Neuron is based on the PyTorch XLA software package and enables conversion of PyTorch operations to AWS Trainium instructions.

Each AWS Trainium chip includes two NeuronCore accelerators, which are the main neural network compute units. With only a few changes to your training code, you can train your PyTorch model on AWS Trainium NeuronCores.

SSH into the Trn1 instance and activate a Python virtual environment that includes the PyTorch Neuron packages. If you’re using a Neuron-provided AMI, you can activate the preinstalled environment by running the following command:

source aws_neuron_venv_pytorch_p36/bin/activate

Before you can run your training script, you need to make a few modifications. On Trn1 instances, the default XLA device should be mapped to a NeuronCore.

Let’s start by adding the PyTorch XLA imports to your training script:

import torch, torch_xla
import torch_xla.core.xla_model as xm

Then, place your model and tensors onto an XLA device:


When the model is moved to the XLA device (NeuronCore), subsequent operations on the model are recorded for later execution. This is XLA’s lazy execution which is different from PyTorch’s eager execution. Within the training loop, you have to mark the graph to be optimized and run on the XLA device using xm.mark_step(). Without this mark, XLA cannot determine where the graph ends.

for data, target in train_loader:
	output = model(data)
	loss = loss_fn(output, target)

You can now run your training script using torchrun <my_training_script>.py.

When running the training script, you can configure the number of NeuronCores to use for training by using torchrun –nproc_per_node.

For example, to run a multi-worker data parallel model training on all 32 NeuronCores in one trn1.32xlarge instance, run torchrun --nproc_per_node=32 <my_training_script>.py.

Data parallel is a strategy for distributed training that allows you to replicate your script across multiple workers, with each worker processing a portion of the training dataset. The workers then share their result with each other.

For more details on supported ML frameworks, model types, and how to prepare your model training script for large-scale distributed training across trn1.32xlarge instances, have a look at the AWS Neuron SDK documentation.

Profiling Tools
Let’s have a quick look at useful tools to keep track of your ML experiments and profile Trn1 instance resource consumption. Neuron integrates with TensorBoard to track and visualize your model training metrics.

AWS Neuron SDK TensorBoard integration

On the Trn1 instance, you can use the neuron-ls command to describe the number of Neuron devices present in the system, along with the associated NeuronCore count, memory, connectivity/topology, PCI device information, and the Python process that currently has ownership of the NeuronCores:

AWS Neuron SDK neuron-ls command

Similarly, you can use the neuron-top command to see a high-level view of the Neuron environment. This shows the utilization of each of the NeuronCores, any models that are currently loaded onto one or more NeuronCores, process IDs for any processes that are using the Neuron runtime, and basic system statistics relating to vCPU and memory usage.

AWS Neuron SDK neuron-top command

Available Now
You can launch Trn1 instances today in the AWS US East (N. Virginia) and US West (Oregon) Regions as On-Demand, Reserved, and Spot Instances or as part of a Savings Plan. As usual with Amazon EC2, you pay only for what you use. For more information, see Amazon EC2 pricing.

Trn1 instances can be deployed using AWS Deep Learning AMIs, and container images are available via managed services such as Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), and AWS ParallelCluster.

To learn more, visit our Amazon EC2 Trn1 instances page, and please send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Antje

Best Practices for Hosting Regulated Gaming Workloads in AWS Local Zones and on AWS Outposts

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/best-practices-for-hosting-regulated-gaming-workloads-in-aws-local-zones-and-on-aws-outposts/

This blog post is written by Shiv Bhatt, Manthan Raval, and Pawan Matta, who are Senior Solutions Architects with AWS.

Many industries are subject to regulations that are created to protect the interests of the various stakeholders. For some industries, the specific details of the regulatory requirements influence not only the organization’s operations, but also their decisions for adopting new technology. In this post, we highlight the workload residency challenges that you may encounter when you deploy regulated gaming workloads, and how AWS Local Zones and AWS Outposts can help you address those challenges.

Regulated gaming workloads and residency requirements

A regulated gaming workload is a type of workload that’s subject to federal, state, local, or tribal laws related to the regulation of gambling and real money gaming. Examples of these workloads include sports betting, horse racing, casino, poker, lottery, bingo, and fantasy sports. The operators provide gamers with access to these workloads through online and land-based channels, and they’re required to follow various regulations required in their jurisdiction. Some regulations define specific workload residency requirements, and depending on the regulatory agency, the regulations could require that workloads be hosted within a specific city, state, province, or country. For example, in the United States, different state and tribal regulatory agencies dictate whether and where gaming operations are legal in a state, and who can operate. The agencies grant licenses to the operators of regulated gaming workloads, which then govern who can operate within the state, and sometimes, specifically where these workloads can be hosted. In addition, federal legislation can also constrain how regulated gaming workloads can be operated. For example, the United States Federal Wire Act makes it illegal to facilitate bets or wagers on sporting events across state lines. This regulation requires that operators make sure that users who place bets in a specific state are also within the borders of that state.

Benefits of using AWS edge infrastructure with regulated gaming workloads

The use of AWS edge infrastructure, specifically Local Zones and Outposts to host a regulated gaming workload, can help you meet workload residency requirements. You can manage Local Zones and Outposts by using the AWS Management Console or by using control plane API operations, which lets you seamlessly consume compute, storage, and other AWS services.

Local Zones

Local Zones are a type of AWS infrastructure deployment that place compute, storage, database, and other select services closer to large population, industry, and IT centers. Like AWS Regions, Local Zones enable you to innovate more quickly and bring new products to market sooner without having to worry about hardware and data center space procurement, capacity planning, and other forms of undifferentiated heavy-lifting. Local Zones have their own connections to the internet, and support AWS Direct Connect, so that workloads hosted in the Local Zone can serve local end-users with very low-latency communications. Local Zones are by default connected to a parent Region via Amazon’s redundant and high-bandwidth private network. This lets you extend Amazon Virtual Private Cloud (Amazon VPC) in the AWS Region to Local Zones. Furthermore, this provides applications hosted in AWS Local Zones with fast, secure, and seamless access to the broader portfolio of AWS services in the AWS Region. You can see the full list of AWS services supported in Local Zones on the AWS Local Zones features page.

You can start using Local Zones right away by enabling them in your AWS account. There are no setup fees, and as with the AWS Region, you pay only for the services that you use. There are three ways to pay for Amazon Elastic Compute Cloud (Amazon EC2) instances in Local Zones: On-Demand, Savings Plans, and Spot Instances. See the full list of cities where Local Zones are available on the Local Zones locations page.


Outposts is a family of fully-managed solutions that deliver AWS infrastructure and services to most customer data center locations for a consistent hybrid experience. For a full list of countries and territories where Outposts is available, see the Outposts rack FAQs and Outposts servers FAQs. Outposts is available in various form factors, from 1U and 2U Outposts servers to 42U Outposts racks, and multiple rack deployments. To learn more about specific configuration options and pricing, see Outposts rack and Outposts servers.

You configure Outposts to work with a specific AWS Region using AWS Direct Connect or an internet connection, which lets you extend Amazon VPC in the AWS Region to Outposts. Like Local Zones, this provides applications hosted on Outposts with fast, secure, and seamless access to the broader portfolio of AWS services in the AWS Region. See the full list of AWS services supported on Outposts rack and on Outposts servers.

Choosing between AWS Regions, Local Zones, and Outposts

When you build and deploy a regulated gaming workload, you must assess the residency requirements carefully to make sure that your workload complies with regulations. As you make your assessment, we recommend that you consider separating your regulated gaming workload into regulated and non-regulated components. For example, for a sports betting workload, the regulated components might include sportsbook operation, and account and wallet management, while non-regulated components might include marketing, the odds engine, and responsible gaming. In describing the following scenarios, it’s assumed that regulated and non-regulated components must be fault-tolerant.

For hosting the non-regulated components of your regulated gaming workload, we recommend that you consider using an AWS Region instead of a Local Zone or Outpost. An AWS Region offers higher availability, larger scale, and a broader selection of AWS services.

For hosting regulated components, the type of AWS infrastructure that you choose will depend on which of the following scenarios applies to your situation:

  1. Scenario one: An AWS Region is available in your jurisdiction and local regulators have approved the use of cloud services for your regulated gaming workload.
  2. Scenario two: An AWS Region isn’t available in your jurisdiction, but a Local Zone is available, and local regulators have approved the use of cloud services for your regulated gaming workload.
  3. Scenario three: An AWS Region or Local Zone isn’t available in your jurisdiction, or local regulators haven’t approved the use of cloud services for your regulated gaming workload, but Outposts is available.

Let’s look at each of these scenarios in detail.

Scenario one: Use an AWS Region for regulated components

When local regulators have approved the use of cloud services for regulated gaming workloads, and an AWS Region is available in your jurisdiction, consider using an AWS Region rather than a Local Zone and Outpost. For example, in the United States, the State of Ohio has announced that it will permit regulated gaming workloads to be deployed in the cloud on infrastructure located within the state when sports betting goes live in January 2023. By using the US East (Ohio) Region, operators in the state don’t need to procure and manage physical infrastructure and data center space. Instead, they can use various compute, storage, database, analytics, and artificial intelligence/machine learning (AI/ML) services that are readily available in the AWS Region. You can host a regulated gaming workload entirely in a single AWS Region, which includes Availability Zones (AZs) – multiple, isolated locations within each AWS Region. By deploying your workload redundantly across at least two AZs, you can help make sure of the high availability, as shown in the following figure.

AWS Region hosting regulated and non-regulated components

Scenario two: Use a Local Zone for regulated components

A second scenario might be that local regulators have approved the use of cloud services for regulated gaming workloads, and an AWS Region isn’t available in your jurisdiction, but a Local Zone is available. In this scenario, consider using a Local Zone rather than Outposts. A Local Zone can support more elasticity in a more cost-effective way than Outposts can. However, you might also consider using a Local Zone and Outposts together to increase availability and scalability for regulated components. Let’s consider the State of Illinois, in the United States, which allows regulated gaming workloads to be deployed in the cloud, if workload residency requirements are met. Operators in this state can host regulated components in a Local Zone in Chicago, and they can also use Outposts in their data center in the same state, for high availability and disaster recovery, as shown in the following figure.

Route ingress gaming traffic through an AWS Region hosting non-regulated components, with a Local Zone and Outposts hosting regulated components

Scenario three: Use of Outposts for regulated components

When local regulators haven’t approved the use of cloud services for regulated gaming workloads, or when an AWS Region or Local Zone isn’t available in your jurisdiction, you can still choose to host your regulated gaming workloads on Outposts for a consistent cloud experience, if Outposts is available in your jurisdiction. If you choose to use Outposts, then note that as part of the shared responsibility model, customers are responsible for attesting to physical security and access controls around the Outpost, as well as environmental requirements for the facility, networking, and power. Use of Outposts requires you to procure and manage the data center within the city, state, province, or country boundary (as required by local regulations) that may be suitable to host regulated components, depending on the jurisdiction. Furthermore, you should procure and configure supported network connections between Outposts and the parent AWS Region. During the Outposts ordering process, you should account for the compute and network capacity required to support the peak load and availability design.

For a higher availability level, you should consider procuring and deploying two or more Outposts racks or Outposts servers in a data center. You might also consider deploying redundant network paths between Outposts and the parent AWS Region. However, depending on your business service level agreement (SLA) for regulated gaming workload, you might choose to spread Outposts racks across two or more isolated data centers within the same regulated boundary, as shown in the following figure.

Route ingress gaming traffic through an AWS Region hosting non-regulated components, with an Outposts hosting regulated components

Options to route ingress gaming traffic

You have two options to route ingress gaming traffic coming into your regulated and non-regulated components when you deploy the configurations that we described previously in Scenarios two and three. Your gaming traffic can come through to the AWS Region, or through the Local Zones or Outposts. Note that the benefits that we mentioned previously around selecting the AWS Region for deploying regulated and non-regulated components are the same when you select an ingress route.

Let’s discuss the benefits and trade offs for each of these options.

Option one: Route ingress gaming traffic through an AWS Region

If you choose to route ingress gaming traffic through an AWS Region, your regulated gaming workloads benefit from access to the wide range of tools, services, and capacity available in the AWS Region. For example, native AWS security services, like AWS WAF and AWS Shield, which provide protection against DDoS attacks, are currently only available in AWS Regions. Only traffic that you route into your workload through an AWS Region benefits from these services.

If you route gaming traffic through an AWS Region, and non-regulated components are hosted in an AWS Region, then traffic has a direct path to non-regulated components. In addition, gaming traffic destined to regulated components, hosted in a Local Zone and on Outposts, can be routed through your non-regulated components and a few native AWS services in the AWS Region, as shown in Figure 2.

Option two: Route ingress gaming traffic through a Local Zone or Outposts

Choosing to route ingress gaming traffic through a Local Zone or Outposts requires careful planning to make sure that tools, services, and capacity are available in that jurisdiction, as shown in the following figure. In addition, consider how choosing this route will influence the pillars of the AWS Well-Architected Framework. This route might require deploying and managing most of your non-regulated components in a Local Zone or on Outposts as well, including native AWS services that aren’t available in Local Zones or on Outposts. If you plan to implement this topology, then we recommend that you consider using AWS Partner solutions to replace the native AWS services that aren’t available in Local Zones or Outposts.

Route ingress gaming traffic through a Local Zone and Outposts that are hosting regulated and non-regulated components, with an AWS Region hosting limited non-regulated components


If you’re building regulated gaming workloads, then you might have to follow strict workload residency and availability requirements. In this post, we’ve highlighted how Local Zones and Outposts can help you meet these workload residency requirements by bringing AWS services closer to where they’re needed. We also discussed the benefits of using AWS Regions in compliment to the AWS edge infrastructure, and several reliability and cost design considerations.

Although this post provides information to consider when making choices about using AWS for regulated gaming workloads, you’re ultimately responsible for maintaining compliance with the gaming regulations and laws in your jurisdiction. You’re in the best position to determine and maintain ultimate responsibility for determining whether activities are legal, including evaluating the jurisdiction of the activities, how activities are made available, and whether specific technologies or services are required to make sure of compliance with the applicable law. You should always review these regulations and laws before you deploy regulated gaming workloads on AWS.

How Launchmetrics improves fashion brands performance using Amazon EC2 Spot Instances

Post Syndicated from Ivo Pinto original https://aws.amazon.com/blogs/architecture/how-launchmetrics-improves-fashion-brands-performance-using-amazon-ec2-spot-instances/

Launchmetrics offers its Brand Performance Cloud tools and intelligence to help fashion, luxury, and beauty retail executives optimize their global strategy. Launchmetrics initially operated their whole infrastructure on-premises; however, they wanted to scale their data ingestion while simultaneously providing improved and faster insights for their clients. These business needs led them to build their architecture in AWS cloud.

In this blog post, we explain how Launchmetrics’ uses Amazon Web Services (AWS) to crawl the web for online social and print media. Using the data gathered, Launchmetrics is able to provide prescriptive analytics and insights to their clients. As a result, clients can understand their brand’s momentum and interact with their audience, successfully launching their products.

Architecture overview

Launchmetrics’ platform architecture is represented in Figure 1 and composed of three tiers:

  1. Crawl
  2. Data Persistence
  3. Processing
Launchmetrics backend architecture

Figure 1. Launchmetrics backend architecture

The Crawl tier is composed of several Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances launched via Auto Scaling groups. Spot Instances take advantage of unused Amazon EC2 capacity at a discounted rate compared with On-Demand Instances, which are compute instances that are billed per-hour or -second with no long-term commitments. Launchmetrics heavily leverages Spot Instances. The Crawl tier is responsible for retrieving, processing, and storing data from several media sources (represented in Figure 1 with the number 1).

The Data Persistence tier consists of two components: Amazon Kinesis Data Streams and Amazon Simple Queue Service (Amazon SQS). Kinesis Data Streams stores data that the Crawl tier collects, while Amazon SQS stores the metadata of the whole process. In this context, metadata helps Launchmetrics gain insight into when the data is collected and if it has started processing. This is key information if a Spot Instance is interrupted, which we will dive deeper into later.

The third tier, Processing, also makes use of Spot Instances and is responsible for pulling data from the Data Persistence tier (represented in Figure 1 with the number 2). It then applies proprietary algorithms, both analytics and machine learning models, to create consumer insights. These insights are stored in a data layer (not depicted) that consists of an Amazon Aurora cluster and an Amazon OpenSearch Service cluster.

By having this separation of tiers, Launchmetrics is able to use a decoupled architecture, where each component can scale independently and is more reliable. Both the Crawl and the Data Processing tiers use Spot Instances for up to 90% of their capacity.

Data processing using EC2 Spot Instances

When Launchmetrics decided to migrate their workloads to the AWS cloud, Spot Instances were one of the main drivers. As Spot Instances offer large discounts without commitment, Launchmetrics was able to track more than 1200 brands, translating to 1+ billion end users. Daily, this represents tracking upwards of 500k influencer profiles, 8 million documents, and around 70 million social media comments.

Aside from the cost-savings with Spot Instances, Launchmetrics incurred collateral benefits in terms of architecture design: building stateless, decoupled, elastic, and fault-tolerant applications. In turn, their stack architecture became more loosely coupled, as well.

All Launchmetrics Auto Scaling groups have the following configuration:

  • Spot allocation strategy: cost-optimized
  • Capacity rebalance: true
  • Three availability zones
  • A diversified list of instance types

By using Auto Scaling groups, Launchmetrics is able to scale worker instances depending on how many items they have in the SQS queue, increasing the instance efficiency. Data processing workloads like the ones Launchmetrics’ platform have, are an exemplary use of multiple instance types, such as M5, M5a, C5, and C5a. When adopting Spot Instances, Launchmetrics considered other instance types to have access to spare capacity. As a result, Launchmetrics found out that workload’s performance improved, as they use instances with more resources at a lower cost.

By decoupling their data processing workload using SQS queues, processes are stopped when an interruption arrives. As the Auto Scaling group launches a replacement Spot Instance, clients are not impacted and data is not lost. All processes go through a data checkpoint, where a new Spot Instance resumes processing any pending data. Spot Instances have resulted in a reduction of up to 75% of related operational costs.

To increase confidence in their ability to deal with Spot interruptions and service disruptions, Launchmetrics is exploring using AWS Fault Injection Simulator to simulate faults on their architecture, like a Spot interruption. Learn more about how this service works on the AWS Fault Injection Simulator now supports Spot Interruptions launch page.

Reporting data insights

After processing data from different media sources, AWS aided Launchmetrics in producing higher quality data insights, faster: the previous on-premises architecture had a time range of 5-6 minutes to run, whereas the AWS-driven architecture takes less than 1 minute.

This is made possible by elasticity and availability compute capacity that Amazon EC2 provides compared with an on-premises static fleet. Furthermore, offloading some management and operational tasks to AWS by using AWS managed services, such as Amazon Aurora or Amazon OpenSearch Service, Launchmetrics can focus on their core business and improve proprietary solutions rather than use that time in undifferentiated activities.

Building continuous delivery pipelines

Let’s discuss how Launchmetrics makes changes to their software with so many components.

Both of their computing tiers, Crawl and Processing, consist of standalone EC2 instances launched via Auto Scaling groups and EC2 instances that are part of an Amazon Elastic Container Service (Amazon ECS) cluster. Currently, 70% of Launchmetrics workloads are still running with Auto Scaling groups, while 30% are containerized and run on Amazon ECS. This is important because for each of these workload groups, the deployment process is different.

For workloads that run on Auto Scaling groups, they use an AWS CodePipeline to orchestrate the whole process, which includes:

I.  Creating a new Amazon Machine Image (AMI) using AWS CodeBuild
II. Deploying the newly built AMI using Terraform in CodeBuild

For containerized workloads that run on Amazon ECS, Launchmetrics also uses a CodePipeline to orchestrate the process by:

III. Creating a new container image, and storing it in Amazon Elastic Container Registry
IV. Changing the container image in the task definition, and updating the Amazon ECS service using CodeBuild


In this blog post, we explored how Launchmetrics is using EC2 Spot Instances to reduce costs while producing high-quality data insights for their clients. We also demonstrated how decoupling an architecture is important for handling interruptions and why following Spot Instance best practices can grant access to more spare capacity.

Using this architecture, Launchmetrics produced faster, data-driven insights for their clients and increased their capacity to innovate. They are continuing to containerize their applications and are projected to have 100% of their workloads running on Amazon ECS with Spot Instances by the end of 2023.

To learn more about handling EC2 Spot Instance interruptions, visit the AWS Best practices for handling EC2 Spot Instance interruptions blog post. Likewise, if you are interested in learning more about AWS Fault Injection Simulator and how it can benefit your architecture, read Increase your e-commerce website reliability using chaos engineering and AWS Fault Injection Simulator.