Tag Archives: Amazon EC2

Implementing advanced AWS Graviton adoption strategies across AWS Regions

2025-08-26 Matt Howard

Post Syndicated from Matt Howard original https://aws.amazon.com/blogs/compute/implementing-advanced-aws-graviton-adoption-strategies-across-aws-regions/

AWS Graviton Processors can offer cost savings, improved performance, and reduce your carbon footprint when using Amazon Elastic Compute Cloud (Amazon EC2) instances. When expanding your Graviton deployment across multiple AWS Regions, careful planning helps you navigate considerations around regional instance type availability and capacity optimization. This post shows how to implement advanced configuration strategies for Graviton-enabled EC2 Auto Scaling groups across multiple Regions, helping you maximize instance availability, reduce costs, and maintain consistent application performance even in AWS Regions with limited Graviton instance type availability.

Instance type flexibility strategies

One of the most effective strategies for maximizing Graviton availability is to be flexible across multiple instance types and families. Instance families (such as m7g, c7g, and r7g) group similar instances with different sizes, where each size offers proportionally more vCPUs and memory. When configuring EC2 Auto Scaling groups, aim for at least 10 instance types rather than limiting to just one or two specific types. EC2 Auto Scaling supports this flexibility through the mixed instances group, which allows you to specify multiple instance types in a single group. Consider this example AWS CloudFormation template snippet for an EC2 Auto Scaling group MixedInstancesPolicy that only specifies two Graviton instance types across two different families:

"MixedInstancesPolicy": {
  "Overrides": [
    {
      "InstanceType": "m7g.large"
    },
    {
      "InstanceType": "c7g.xlarge"
    }
  ]
}

This limited selection significantly reduces your ability to access available capacity pools. Assuming this workload needs a minimum of 2 vCPU and 8 GiB of memory, you can add these additional eight Graviton instance types: m6g.large, m8g.large, m6gd.large, m7gd.large, m8gd.large, c6g.xlarge, c6gd.xlarge, and c8g.xlarge. These allow you to meet the recommendation of being flexible across 10 instance types. While some of these instance types may have different price points, you can manage these cost implications through allocation strategies discussed later in this post.

To efficiently identify all compatible Graviton instance types available for your workload, you can use the GetInstanceTypesFromInstanceRequirements Amazon EC2 API. This approach removes the manual effort of researching and choosing individual instance types.

aws ec2 get-instance-types-from-instance-requirements \
--architecture-types arm64 \
--virtualization-types hvm \
--instance-requirements '{"VCpuCount": {"Min": 2,"Max":8}, "MemoryMiB": {"Min": 8000}, "InstanceGenerations":["current"]}' \
--region us-east-1

This example command returns dozens of compatible Graviton instance types across multiple families (c7g, c7gd, c7gn, m7g, m7gd, etc.), thus expanding your capacity options. An EC2 Auto Scaling group’s mixed instance policy can allow up to 40 instance types, thus you have more room for even greater flexibility.

After expanding your instance type selection, you need to configure how EC2 Auto Scaling chooses between the available instance types. The OnDemandAllocationStrategy CloudFormation property controls this behavior, offering two approaches: “lowest-price” and “prioritized”. With the “lowest-price” strategy, EC2 Auto Scaling launches instances from the lowest-priced capacity pool available:

"OnDemandAllocationStrategy": "lowest-price"

This strategy helps manage costs when you’ve included a variety of instance types. Even with expanded instance type flexibility, your workloads will automatically select the most cost-effective option from available capacity pools. Alternatively, you can use the “prioritized” strategy when you want more control over which instance types are chosen first:

"OnDemandAllocationStrategy": "prioritized"

Regional adaptation techniques

Not all AWS Regions have the same Graviton instance types available. Regional variation in instance type availability creates a challenge when deploying applications consistently across multiple AWS Regions. To handle these differences, expand your instance type flexibility beyond the minimum 10 types to make sure of sufficient options in each AWS Region where you operate.

To implement this flexibility across AWS Regions, you must determine which Graviton instance types are available in each target AWS Region. AWS provides several methods to access this information: check the Amazon EC2 Instance Types by Region documentation for a comprehensive list, use the DescribeInstanceTypeOfferings Amazon EC2 API to programmatically identify available types, or visit the EC2 Instance Types page in the AWS Management Console.

You can also run the GetInstanceTypesFromInstanceRequirements API across different AWS Regions to understand Regional differences. For example, running identical queries in the US East (N. Virginia) and Asia Pacific (Taipei) Regions reveals significant variations: over 70 compatible instance types in the US East (N. Virginia) and 27 in Asia Pacific (Taipei) Regions.

# Query for US East (N. Virginia)
aws ec2 get-instance-types-from-instance-requirements \
--architecture-types arm64 \
--virtualization-types hvm \
--instance-requirements '{"VCpuCount": {"Min": 2,"Max":8}, "MemoryMiB": {"Min": 8000}, "InstanceGenerations":["current"]}' \
--region us-east-1

# Query for Asia Pacific (Taipei)
aws ec2 get-instance-types-from-instance-requirements \
--architecture-types arm64 \
--virtualization-types hvm \
--instance-requirements '{"VCpuCount": {"Min": 2,"Max":8}, "MemoryMiB": {"Min": 8000}, "InstanceGenerations":["current"]}' \
--region ap-east-2

When operating across multiple AWS Regions, design a single mixed instance policy that works everywhere by including instance types available in all AWS Regions where you operate. Based on the previous query results, you might include these 10 instance types that are available in both AWS Regions: m6g.large, m7g.large, m6gd.large, m7gd.large, c6g.xlarge, c7g.xlarge, m6g.xlarge, m7g.xlarge, c6gn.xlarge, and m6gd.xlarge.

You should also span your EC2 Auto Scaling group across multiple Availability Zones (AZs) for greater resiliency and access to deeper capacity pools. To determine available AZs in your AWS Region, refer to the Availability Zones documentation or check your Amazon Virtual Private Cloud (Amazon VPC) to identify which AZs its subnets use through the DescribeSubnets Amazon EC2 API. Configure your EC2 Auto Scaling group to use all available AZs using the CloudFormation AWS::AutoScaling::AutoScalingGroup AvailabilityZones parameter:

"AvailabilityZones": [
  "us-west-2a",
  "us-west-2b",
  "us-west-2c",
  "us-west-2d",
]

Best practices for EC2 Spot Instances usage with Graviton-based instances

Although optimizing for regional availability and AZ distribution provides a strong foundation, further enhancing your Graviton deployment strategy with proper Amazon EC2 Spot Instances configuration can significantly improve cost efficiency without sacrificing reliability. When using Spot Instances with Graviton, you should implement strategies that maximize your chances of obtaining and maintaining capacity.

First, the Spot Instance Advisor provides valuable information about the interruption frequency of different instance types across AWS Regions. Use this tool to identify Graviton instance types with lower interruption rates in your target AWS Regions. Then, expand your mixed instance group to include these other instance types. Especially for Spot Instance workloads, maximize your instance type flexibility by specifying up to the full limit of 40 instance types for EC2 Auto Scaling groups mixed instance policies. This broad selection increases your chances of finding available Spot Instances capacity.

Beyond instance type selection, the allocation strategy you choose significantly impacts your ability to maintain Spot Instances capacity. Configure your Spot allocation strategy using the AWS::AutoScaling::AutoScalingGroup InstancesDistribution property with the SpotAllocationStrategy parameter set to price-capacity-optimized to choose Spot pools with the lowest interruption risk while still considering price:

"InstancesDistribution": {
  "SpotAllocationStrategy": "price-capacity-optimized"
}

For workloads that can benefit from more time beyond the standard two-minute Spot interruption notice, enable Capacity Rebalancing. This feature, configured using the AWS::AutoScaling::AutoScalingGroup CapacityRebalanceproperty, enables EC2 Auto Scaling to proactively respond to rebalance recommendations by launching a new Spot Instance before a running instance receives the two-minute Spot Instance interruption notice, which provides more time for graceful transitions:

"CapacityRebalance": true

For maximum flexibility and capacity access, consider mixing x86 and ARM architectures in your launch templates. Although the Graviton capacity pools are newer and sometimes smaller than their x86 counterparts, a mixed architecture approach makes sure that you can still launch instances even when one architecture has limited availability. For detailed instructions, refer to the AWS post: Supporting AWS Graviton2 and x86 instance types in the same Auto Scaling group.

Attribute-based instance type selection

Although mixed instance policies with explicit instance type lists provide excellent flexibility, AWS offers an even more powerful approach for dynamic instance selection: attribute-based instance type selection. This streamlines management by allowing you to specify the attributes your application needs rather than specific instance types, automatically adapting to new instance types and handling Regional differences in availability.

Implement attribute-based instance type selection in your EC2 Launch Template through the AWS::EC2::LaunchTemplate InstanceRequirements property:

{
  "InstanceRequirements": {
    "AcceleratorCount": {
      "Max": 0
    },
    "BareMetal": "excluded",
    "BaselinePerformanceFactors": {
      "Cpu": {
        "References": [
          {
            "InstanceFamily": "c7g"
          }
        ]
      }
    },
    "InstanceGenerations": [
      "current"
    ],
    "MemoryMiB": {
      "Min": 8000
    },
    "VCpuCount": {
      "Min": 4
    }
  }
}

The BaselinePerformanceFactors parameter of the AWS::EC2::LaunchTemplate InstanceRequirements property enables performance protection. This feature makes sure that your EC2 Auto Scaling group uses instance types that meet or exceed a specified performance baseline. When you specify an instance family such as “c7g” as the baseline reference, Amazon EC2 automatically excludes instance types that fall below this performance level, even if they match your other specified attributes. For Graviton deployments, specifying “c7g” makes sure that only instance types with performance like or better than Graviton3 processors are chosen.

Attribute-based instance type selection also allows you to specify instance types in your template that may not yet be available in an AWS Region by using the AllowedInstanceTypes parameter:

{
  "AllowedInstanceTypes": [
    "m6g.large",
    "m7g.large",
    "m8g.large"
  ]
}

This approach allows your EC2 Auto Scaling group to use newer instance types where available and automatically deploy them in other AWS Regions as soon as they become available. This single template approach simplifies the deployment and management of your EC2 instance selection in EC2 Auto Scaling groups across many regions.

Special considerations

The following special considerations should be taken into account.

Performance testing with multiple instance types

When implementing instance type flexibility, a common concern is the need to test all instance types with your application. Testing 40 different instance types isn’t practical for most organizations. Instead, consider these streamlined approaches to reduce testing overhead while maintaining performance confidence. First, Graviton instance families within the same generation (for example, c7g, m7g, and r7g) use the same processor, providing similar performance profiles across families. Therefore, you can include multiple instance types from the same generation after testing a representative instance. Second, you should also consider including variants within families (such as c7gd with NVMe storage), because these provide specialized capabilities without changing the fundamental CPU architecture. Third, for maximum flexibility, include multiple instance generations. If your application runs well on Graviton3, then it likely performs even better on Graviton4, allowing you to specify both in your EC2 Auto Scaling group.

Reserving specific Graviton instance types

If your workload needs a specific Graviton instance type, then we recommend that you use EC2 Capacity Reservations, which allow you to reserve compute capacity for EC2 instances in a specific AZ for any duration. On-Demand Capacity Reservations (ODCR) are for immediate use and come with no term commitment. Alternatively, Future-dated Capacity Reservations allow you to specify when you need the capacity to become available along with a commitment duration.

Amazon EMR workloads

Although Amazon EMR clusters must exist in only one AZ, you can use Amazon EMR instance fleets to choose multiple subnets across different AZs. Then, when launching a cluster, Amazon EMR searches across these subnets to find specified instances and purchasing options, thus providing access to a deeper capacity pool. For Instance Fleets you can specify up to 30 EC2 instance types for each primary, core, and task node group, which significantly improves instance flexibility and availability. For more information go to the Responding to Amazon EMR cluster insufficient instance capacity events documentation.

Conclusion

In this post, we covered advanced strategies for maximizing AWS Graviton adoption across multiple AWS Regions. You can use the AWS CloudFormation examples provided in this post as templates for your own implementations. Following these approaches allows you to maintain consistent application performance and maximize Graviton instance availability across all AWS Regions where you operate, even as Graviton availability continues to expand across the AWS global infrastructure. For comprehensive guidance on maximizing your Graviton deployment, explore the AWS Graviton Technical Guide.

AWS Weekly Roundup: Amazon Aurora 10th anniversary, Amazon EC2 R8 instances, Amazon Bedrock and more (August 25, 2025)

2025-08-25 Betty Zheng (郑予彬)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-aurora-10th-anniversary-amazon-ec2-r8-instances-amazon-bedrock-and-more-august-25-2025/

As I was preparing for this week’s roundup, I couldn’t help but reflect on how database technology has evolved over the past decade. It’s fascinating to see how architectural decisions made years ago continue to shape the way we build modern applications. This week brings a special milestone that perfectly captures this evolution in cloud database innovation as Amazon Aurora celebrated 10 years of database innovation.

Amazon Web Services (AWS) Vice President Swami Sivasubramanian reflected on LinkedIn about his journey with Amazon Aurora, calling it “one of the most interesting products” he’s worked on. When Aurora launched in 2015, it shifted the database landscape by separating compute and storage. Now trusted by hundreds of thousands of customers across industries, Aurora has grown from a MySQL-compatible database to a comprehensive platform featuring innovations such as Aurora DSQL, serverless capabilities, I/O-Optimized pricing, zero-ETL integrations, and generative AI support. Last week’s celebration on August 21 highlighted this decade-long transformation that continues to simplify database scaling for customers.

Last week’s launches

In addition to the inspiring celebrations, here are some AWS launches that caught my attention:

AWS Billing and Cost Management introduces customizable Dashboards — This new feature consolidates cost data into visual dashboards with multiple widget types and visualization options, combining information from Cost Explorer, Savings Plans, and Reserved Instance reports to help organizations track spending patterns and share standardized cost reporting across accounts.
Amazon Bedrock simplifies access to OpenAI open weight models — AWS has streamlined access to OpenAI’s open weight models (gpt-oss-120b and gpt-oss-20b), making them automatically available to all users without manual activation while maintaining administrator control through IAM policies and service control policies.
Amazon Bedrock adds batch inference support for Claude Sonnet 4 and GPT-OSS models —This feature provides asynchronous processing of multiple inference requests with 50 percent lower pricing compared to on-demand inference, optimizing high-volume AI tasks such as document analysis, content generation, and data extraction with Amazon CloudWatch metrics for tracking batch workload progress
AWS launching Amazon EC2 R8i and R8i-flex memory-optimized instances — Powered by custom Intel Xeon 6 processors, these new instances deliver up to 20 percent better performance and 2.5 times higher memory throughput than R7i instances, making them ideal for memory-intensive workloads like databases and big data analytics, with R8i-flex offering additional cost savings for applications that don’t fully utilize compute resources.
Amazon S3 introduces batch data verification feature — A new capability in S3 Batch Operations that offers efficient verification of billions of objects using multiple checksum algorithms without downloading or restoring data, generating detailed integrity reports for compliance and audit purposes regardless of storage class or object size.

Other AWS news

Here are some additional projects and blog posts that you might find interesting:

Amazon introduces DeepFleet foundation models for multirobot coordination — Trained on millions of hours of data from Amazon fulfillment and sortation centers, these pioneering models predict future traffic patterns for robot fleets, representing the first foundation models specifically designed for coordinating multiple robots in complex environments.
Building Strands Agents with a few lines of code — A new blog demonstrates how to build multi-agent AI systems with a few lines of code, enabling specialized agents to collaborate seamlessly, handle complex workflows, and share information through standardized protocols for creating distributed AI systems beyond individual agent capabilities.
AWS Security Incident Response introduces ITSM integrations — New integrations with Jira and ServiceNow provide bidirectional synchronization of security incidents, comments, and attachments, streamlining response while maintaining existing processes, with open source code available on GitHub for customization and extension to additional IT service management (ITSM) platforms.
Finding root-causes using a network digital twin graph and agentic AI — A detailed blog post shows how AWS collaborated with NTT DOCOMO to build a network digital twin using graph databases and autonomous AI agents, helping telecom operators to move beyond correlation to identify true root causes of complex network issues, predict future problems, and improve overall service reliability.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:

AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Toronto (September 4), Los Angeles (September 17), and Bogotá (October 9).
AWS re:Invent 2025 — This flagship annual conference is coming to Las Vegas from December 1–5. The event catalog is now available. Mark your calendars for this not to be missed gathering of the AWS community.
AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Adria (September 5), Baltic (September 10), Aotearoa (September 18), South Africa (September 20), Bolivia (September 20), Portugal (September 27).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here for upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Betty

Best performance and fastest memory with the new Amazon EC2 R8i and R8i-flex instances

2025-08-19 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/best-performance-and-fastest-memory-with-the-new-amazon-ec2-r8i-and-r8i-flex-instances/

Today, we’re announcing general availability of the new eighth generation, memory optimized Amazon Elastic Compute Cloud (Amazon EC2) R8i and R8i-flex instances powered by custom Intel Xeon 6 processors, available only on AWS. They deliver the highest performance and fastest memory bandwidth among comparable Intel processors in the cloud. These instances deliver up to 15 percent better price performance, 20 percent higher performance, and 2.5 times more memory throughput compared to previous generation instances.

With these improvements, R8i and R8i-flex instances are ideal for a variety of memory intensive workloads such as SQL and NoSQL databases, distributed web scale in-memory caches (Memcached and Redis), in-memory databases such as SAP HANA, and real-time big data analytics (Apache Hadoop and Apache Spark clusters). For a majority of the workloads that don’t fully utilize the compute resources, the R8i-flex instances are a great first choice to achieve an additional 5 percent better price performance and 5 percent lower prices.

Improvements made to both instances compared to their predecessors
In terms of performance, R8i and R8i-flex instances offer 20 percent better performance than R7i instances, with even higher gains for specific workloads. These instances are up to 30 percent faster for PostgreSQL databases, up to 60 percent faster for NGINX web applications, and up to 40 percent faster for AI deep learning recommendation models compared to previous generation R7i instances, with sustained all-core turbo frequency now reaching 3.9 GHz (compared to 3.2 GHz in the previous generation). They also feature a 4.6x larger L3 cache and significantly better memory throughput, offering 2.5 times higher memory bandwidth than the seventh generation. With this higher performance across all the vectors, you can run a greater number of workloads while keeping costs down.

R8i instances now scale up to 96xlarge with up to 384 vCPUs and 3TB memory (versus 48xlarge sizes in the seventh generation), helping you to scale up database applications. R8i instances are SAP certified to deliver 142,100 aSAPS, which is highest among all comparable machines in on premises and cloud environments, delivering exceptional performance for your mission-critical SAP workloads. R8i-flex instances offer the most common sizes, from large to 16xlarge, and are a great first choice for applications that don’t fully utilize all compute resources. Both R8i and R8i-flex instances use the latest sixth generation AWS Nitro Cards, delivering up to two times more network and Amazon Elastic Block Storage (Amazon EBS) bandwidth compared to the previous generation, which greatly improves network throughput for workloads handling small packets, such as web, application, and gaming servers.

R8i and R8i-flex instances also support bandwidth configuration with 25 percent allocation adjustments between network and Amazon EBS bandwidth, enabling better database performance, query processing, and logging speeds. Additional enhancements include FP16 datatype support for Intel AMX to support workloads such as deep learning training and inference and other artificial intelligence and machine learning (AI/ML) applications.

The specs for the R8i instances are as follows.

Instance size	vCPUs	Memory (GiB)	Network bandwidth (Gbps)	EBS bandwidth (Gbps)
r8i.large	2	16	Up to 12.5	Up to 10
r8i.xlarge	4	32	Up to 12.5	Up to 10
r8i.2xlarge	8	64	Up to 15	Up to 10
r8i.4xlarge	16	128	Up to 15	Up to 10
r8i.8xlarge	32	256	15	10
r8i.12xlarge	48	384	22.5	15
r8i.16xlarge	64	512	30	20
r8i.24xlarge	96	768	40	30
r8i.32xlarge	128	1024	50	40
r8i.48xlarge	192	1536	75	60
r8i.96xlarge	384	3072	100	80
r8i.metal-48xl	192	1536	75	60
r8i.metal-96xl	384	3072	100	80

The specs for the R8i-flex instances are as follows.

Instance size	vCPUs	Memory (GiB)	Network bandwidth (Gbps)	EBS bandwidth (Gbps)
r8i-flex.large	2	16	Up to 12.5	Up to 10
r8i-flex.xlarge	4	32	Up to 12.5	Up to 10
r8i-flex.2xlarge	8	64	Up to 15	Up to 10
r8i-flex.4xlarge	16	128	Up to 15	Up to 10
r8i-flex.8xlarge	32	256	Up to 15	Up to 10
r8i-flex.12xlarge	48	384	Up to 22.5	Up to 15
r8i-flex.16xlarge	64	512	Up to 30	Up to 20

When to use the R8i-flex instances
As stated earlier, R8i-flex instances are more affordable versions of the R8i instances, offering up to 5 percent better price performance at 5 percent lower prices. They’re designed for workloads that benefit from the latest generation performance but don’t fully use all compute resources. These instances can reach up to the full CPU performance 95 percent of the time and work well for in-memory databases, distributed web scale cache stores, mid-size in-memory analytics, real-time big data analytics, and other enterprise applications. R8i instances are recommended for more demanding workloads that need sustained high CPU, network, or EBS performance such as analytics, databases, enterprise applications, and web scale in-memory caches.

Available now
R8i and R8i-flex instances are available today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Spain) AWS Regions. As usual with Amazon EC2, you pay only for what you use. For more information, refer to Amazon EC2 Pricing. Check out the full collection of memory optimized instances to help you start migrating your applications.

To learn more, visit our Amazon EC2 R8i instances page and Amazon EC2 R8i-flex instances page. Send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

– Veliswa

Achieve low-latency data processing with Amazon EMR on AWS Local Zones

2025-08-18 Gagan Brahmi

Post Syndicated from Gagan Brahmi original https://aws.amazon.com/blogs/big-data/achieve-low-latency-data-processing-with-amazon-emr-on-aws-local-zones/

Enterprises today require both single-digit millisecond latency data processing and data residency compliance for their applications. By deploying Amazon EMR on AWS Local Zones, organizations can achieve single-digit millisecond latency data processing for applications while maintaining data residency compliance. This post demonstrates how to use AWS Local Zones to deploy EMR clusters closer to your users, enabling millisecond-level response times. We use a Secure Web Gateway as an example and implement Amazon EMR with Apache Flink on AWS Local Zones to process network traffic with ultra-low latency. We also go through the process of creating an EMR cluster on AWS Local Zones, highlighting performance optimizations and architecture considerations specific to edge deployments. This approach uses AWS Local Zones to bring Amazon EMR’s data processing capabilities closer to your users and data sources – ideal for security applications or any other latency-sensitive workloads.

Solution overview

The following diagram illustrates the solution architecture.

Sample Architecture for Secure Web Gateway on AWS LocalZones

The solution consists of several key components:

AWS Local Zones deployment – Positioned close to corporate offices to minimize latency
Network traffic interception – Using AWS Transit Gateway and virtual private cloud (VPC) endpoints
Request queuing and rules streaming – Using Apache Kafka on Amazon Elastic Kubernetes Service (Amazon EKS) to queue the incoming and outgoing network requests as well as stream rules as they are updated by the security administrator
EMR cluster – Running Flink for real-time stream processing and functions to combine rules
Policy management system – For defining and updating security rules
Logging – Using Amazon Simple Storage Service (Amazon S3) for visibility, compliance, and data analytics

In this scenario, the Secure Web Gateway is designed to inspect and make decisions on network traffic within single-digit milliseconds. The workflow consists of the following steps:

The corporate office uses AWS Direct Connect to connect to AWS Local Zones.
The security administrator defines the rules from a rules interface running on Kubernetes pods on Amazon EKS. As the rules are added or modified, they are sent to the swg_rules Kafka topic running on Amazon EKS. These rules are stored and processed by Flink running on the EMR cluster.
A corporate user requests for a software as a service (SaaS) application from the corporate office. The request is routed through Direct Connect to the Local Zone.
The Secure Web Gateway proxy service running on Kubernetes pods on Amazon EKS receives the access request, which is sent to the swg_requests Kafka topic.
Flink running on EMR evaluates and consumes the messages from the swg_requests Kafka topic and determines the routing decision, which is sent back to the swg_decisions Kafka topic.
The Secure Web Gateway proxy service consumes the swg_decisions topic and routes the traffic to the SaaS application, if the access request is allowed. If the request is denied, the proxy responds back to the users with the reason or violations details, if any.

Due to the real-time nature of the solution, the security administrator can add, modify, or remove the rules through the swg_rules topic as Flink constantly consumes and evaluates this topic.In the following sections, we discuss the key components of the solution in more detail.

AWS Local Zones: The foundation

AWS Local Zones provide low-latency extensions of AWS Regions positioned near large population and industry centers. For our Secure Web Gateway use case, deploying in a Local Zones offers several advantages:

Proximity to corporate offices – Reducing round-trip latency for traffic inspection. AWS Local Zones is designed to provide applications with low latency aiming for single-digit millisecond performance.
AWS-native security controls – Using AWS security features.
Consistent connectivity – Reliable connection between corporate networks and AWS resources.

The Local Zone hosts our EMR cluster and networking components, making sure traffic inspection through the Secure Web Gateway happens with single-digit millisecond latency. For scenarios where traffic inspection doesn’t require single-digit millisecond latency, deploying hosting the solution on EMR cluster in a Region should work fine.

Amazon EMR with Apache Flink: The decision engine

The core intelligence of our Secure Web Gateway solution is powered by Amazon EMR running Flink for real-time stream processing. With Amazon EMR running on Flink, we take advantage of the optimized real-time stream processing capability offered by Flink. EMR running in AWS Local Zones helps users perform complex data processing closer to their data centers or corporate locations without worrying any potential latency introduced for moving the data to other Regions. In this particular solution, we use Flink’s stateful processing, which allows for maintaining the session context across multiple network requests/packets. The solution also provides a dynamic rules engine that is combined with the real-time stream of requests for network access.

Architectural component choice considerations

Amazon EMR offers several deployment options for different kinds of workloads and use cases, including Amazon EMR on EKS. AWS also provides Amazon Managed Service for Apache Flink, a fully managed service that simplifies the process of building and managing Flink applications. As of this writing, both the EMR on EKS deployment option and Amazon Managed Service for Apache Flink are not available in AWS Local Zones.

Prerequisites

Before proceeding with this deployment, ensure you have:

AWS account with AWS IAM permissions for Amazon VPC, EMR, and Local Zones management
Basic familiarity with the AWS Management Console

Deploy Amazon EMR on a Local Zone

To deploy Amazon EMR on a Local Zone, you first need to enable the Local Zone for the AWS account. For instructions, refer to Step 1 and Step 2 in Getting started with AWS Local Zones.

After you have enabled a Local Zone and created a Local Zone subnet, create your EMR cluster. For instructions, refer to Step 1: Configure data resources and launch an Amazon EMR cluster. You can follow the instructions provided for the AWS Management Console. Make sure you select the appropriate Amazon EMR release version (5.28.0 or later for Local Zone support). Select the applications you need, which in this case is Hadoop and Flink.

A crucial step to launching an EMR cluster in a Local Zone is selecting the Local Zone network configuration. Choose the VPC that contains your Local Zone subnet, and choose the subnet that you created in the Local Zone.

Review all other configurations and settings for your cluster and make any final adjustments as needed, then choose Create cluster to launch your EMR cluster in the Local Zone.

Performance and scaling considerations

The Local Zone EMR deployment can be scaled based on traffic patterns. You can manually scale the EMR cluster horizontally by adding more worker nodes during peak traffics to provide low-latency performance, after you have increased the number of users that access the Secure Web Gateway. Alternatively, you can set up a scheduled action to scale the EMR cluster at predetermined times based on known workload patterns. You can also perform vertical scaling by using Amazon Elastic Compute Cloud (Amazon EC2) instance types with more compute capacity. Consider using the manual resize option for EMR clusters to modify the cluster size based on workload requirements.

Another important performance consideration is to optimize Flink checkpointing for fault tolerance. To learn more, see Optimizing job restart times for task recovery and scaling operations.

Security considerations

Although this architecture prioritizes low-latency performance, implementing proper security controls is essential for production deployments. The solution handles sensitive corporate network traffic that requires protection through encryption, access controls, and monitoring. For comprehensive security guidance specific to EMR deployments, refer to Security in Amazon EMR. Consider the following key areas:

Data protection – Enable encryption at rest and in transit using Amazon EMR security configurations, including Amazon S3 encryption and TLS certificates for inter-node communication
Access control – Implement AWS Identity and Access Management (IAM) roles with least privilege for Amazon EMR service roles, EC2 instance profiles, and runtime roles to isolate job access
Network security – Deploy EMR clusters in private subnets with security groups following least privilege, and enable the Amazon EMR block public access feature

Benefits of Amazon EMR

Using Amazon EMR on AWS Local Zones in this architecture offers several key benefits:

Low latency – Providing the compute in AWS Local Zones close to corporate offices helps you achieve low-latency processing.
Real-time inspection – Flink’s streaming capabilities unlocks the ability to process real-time inspection for network requests.
Complex policy application – With Flink on Amazon EMR, you can build a complex policy application that, for instance, can detect sophisticated access patterns across multiple events and time windows that would be impossible with traditional rule-based systems.
Scalability – Amazon EMR provides the flexibility to automatically scale the cluster with a custom policy. Moreover, Amazon EMR release 6.15.0 and higher supports Flink autoscaler, which automatically scales the individual Flink job vertexes based on the job metrics.
Compliance – Logging all the events to a durable storage like Amazon S3 helps users improve their security and audit posture.

Clean up

To avoid incurring unnecessary charges, clean up the resources you created during this walkthrough. Follow these steps in order:

Step 1: Terminate the EMR cluster

Open the Amazon EMR console
Select your EMR cluster from the list
Choose Terminate
Confirm the termination when prompted
Wait for the cluster status to change to “TERMINATED”

Step 2: Clean up VPC resources

In the Amazon VPC console, delete the Local Zone subnet you created
If you created a custom VPC specifically for this demo, delete any associated:
- Route tables
- Internet gateways
- Security groups (other than default)
- The VPC itself

Step 3: Disable the Local Zone (optional)

In the EC2 console, go to Zones under “Settings”
Find your enabled Local Zone
Choose Manage and disable the zone if you no longer need it for other workloads

Step 4: Review additional resources Check for and clean up any other resources you may have created:

S3 buckets used for logging or EMR storage
CloudWatch log groups
Any custom IAM roles or policies created specifically for this architecture

Conclusion

This implementation of Amazon EMR on AWS Local Zones demonstrates how enterprises can bring powerful data processing capabilities to the edge while maintaining single-digit millisecond latency. By showcasing a Secure Web Gateway application, we have illustrated just one of many possible use cases where performance-sensitive workloads can benefit from this architecture.As the edge computing landscape evolves, we anticipate organizations will increasingly use this pattern for additional use cases, including:

Real-time fraud detection for financial transactions requiring immediate decision-making
Connected vehicle applications where processing telemetry data with minimal latency is critical
Internet of Things (IoT) sensor analytics that require immediate insights from operational technology environments
Augmented reality experiences where processing must happen close to end-users

We encourage you to evaluate your latency-sensitive workloads and consider how AWS Local Zones with Amazon EMR might help you implement architectures previously perceived highly challenging. Start small with a proof of concept like the one outlined here, measure the performance gains, and expand to production use cases with confidence. Implementing a Secure Web Gateway in AWS Local Zones with Amazon EMR and Flink offers enterprises a powerful solution for securing corporate traffic. By using the proximity of Local Zones and the real-time processing capabilities of Flink, organizations can implement sophisticated security policies without the latency penalties traditionally associated with traffic inspection.

About the authors

Gagan Brahmi is a Specialist Senior Solutions Architect at Amazon Web Services (AWS), specializing in Data Analytics and AI/ML solutions. With over 20 years in information technology, he helps customers architect scalable, high-performance analytics platforms using distributed data processing, real-time streaming technologies, and machine learning services on AWS. When not designing cloud solutions, Gagan enjoys exploring new places with his family.

Arun Shanmugam is a Senior Analytics Solutions Architect at AWS, with a focus on building modern data architecture. He has been successfully delivering scalable data analytics solutions for customers across diverse industries. Outside of work, Arun is an avid outdoor enthusiast who actively engages in CrossFit, road biking, and cricket.

George Oakes is a Senior Hybrid Solutions Architect at AWS, with a focus on edge, on-premise, and low latency architectures. He has been successfully delivering scalable hybrid AWS solutions for customers across diverse industries. Outside of work, George is an avid outdoor enthusiast who enjoys hiking and visiting parks and UNESCO sites around.

AWS Weekly Roundup: Single GPU P5 instances, Advanced Go Driver, Amazon SageMaker HyperPod and more (August 18, 2025)

2025-08-18 Prasad Rao

Post Syndicated from Prasad Rao original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-single-gpu-p5-instances-advanced-go-driver-amazon-sagemaker-hyperpod-and-more-august-18-2025/

Let me start this week’s update with something I’m especially excited about – the upcoming BeSA (Become a Solutions Architect) cohort. BeSA is a free mentoring program that I host along with a few other AWS employees on a volunteer basis to help people excel in their cloud careers. Last week, the instructors’ lineup was finalized for the 6-week cohort starting September 6. The cohort will focus on migration and modernization on AWS. Visit the BeSA website to learn more.

Another highlight for me last week was the announcement of six new AWS Heroes for their technical leadership and exceptional contributions to the AWS community. Read the full announcement to learn more about these community leaders.

Last week’s launches
Here are some launches from last week that got my attention:

Amazon EC2 Single GPU P5 instances are now generally available — You can right-size your machine learning (ML) and high performance computing (HPC) resources cost-effectively with the new Amazon Elastic Compute Cloud (Amazon EC2) P5 instance size with one NVIDIA H100 GPU.
AWS Advanced Go Driver is generally available — You can now use the AWS Advanced Go Driver with Amazon Relational Database Service (Amazon RDS) and Amazon Aurora PostgreSQL-Compatible and MySQL-Compatible database clusters for faster switchover and failover times, Federated Authentication, and authentication with AWS Secrets Manager or AWS Identity and Access Management (IAM). You can install the PostgreSQL and MySQL packages for Windows, Mac, or Linux, by following the installation guides in GitHub.
Expanded support for Cilium with Amazon EKS Hybrid Nodes — Cilium is a Cloud Native Computing Foundation (CNCF) graduated project that provides core networking capabilities for Kubernetes workloads. Now, you can receive support from AWS for a broader set of Cilium features when using Cilium with Amazon EKS Hybrid Nodes including application ingress, in-cluster load balancing, Kubernetes network policies, and kube-proxy replacement mode.
Amazon SageMaker AI now supports P6e-GB200 UltraServers — You can accelerate training and deployment of foundational models (FMs) at trillion-parameter scale by using up to 72 NVIDIA Blackwell GPUs under one NVLink domain with the new P6e-GB200 UltraServer support in Amazon SageMaker HyperPod and Model Training.
Amazon SageMaker HyperPod now supports fine-grained quota allocation of compute resources, topology-aware-scheduling of LLM tasks and custom Amazon Machine Images (AMIs) — You can allocate fine-grained compute quota for GPU, Trainium accelerator, vCPU, and vCPU memory within an instance to optimize compute resource distribution. With topology-aware scheduling, you can schedule your large language model (LLM) tasks on an optimal network topology to minimize network communication and enhance training efficiency. Using custom AMIs, you can deploy clusters with pre-configured, security-hardened environments that meet your specific organizational requirements.

Additional updates
Here are some additional news items and blog posts that I found interesting:

Celebrating 10 years of Amazon Aurora innovation — Join the livestream event on August 21, 2025, to celebrate a decade of Aurora database innovation.
AWS named as a Leader in 2025 Gartner Magic Quadrant for Strategic Cloud Platform Services — For the fifteenth consecutive year, Gartner has named AWS a Leader in the Gartner Magic Quadrant for Strategic Cloud Platform Services (SCPS), making AWS the longest-running Magic Quadrant Leader.
Introducing AWS Cloud Control API (CCAPI) MCP Server — You can now use natural language to managing cloud infrastructure using CCAPI MCP Server. You can create, read, update, delete, and list resources using natural language.
Introducing Amazon Bedrock AgentCore Identity — AgentCore Identity provides a centralized capability to manage agent identities, securing credentials, and supporting seamless integration with AWS and third-party services through Sigv4, standardized OAuth 2.0 flows, and API key.
Introducing Amazon Bedrock AgentCore Gateway — A fully managed service to connect AI agents with tools and services. It serves as a centralized tool server, providing a unified interface where agents can discover, access, and invoke tools.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS and AWS Community events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — The AWS flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities.
AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Coming up soon are summits in Johannesburg (August 20) and Toronto (September 4).
AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Adria (September 5), Baltic (September 10), Aotearoa (September 18), and South Africa (September 20).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here for upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Prasad

Amazon EC2 defenses against L1TF Reloaded

2025-08-11 Ali Saidi

Post Syndicated from Ali Saidi original https://aws.amazon.com/blogs/security/ec2-defenses-against-l1tf-reloaded/

The guest data of AWS customers running on the AWS Nitro System and Nitro Hypervisor is not at risk from a new attack dubbed “L1TF Reloaded.” No additional action is required by AWS customers; however, AWS continues to recommend that customers isolate their workloads using instance, enclave, or function boundaries as described in AWS public documentation. The AWS Nitro System and Nitro Hypervisor are designed to help protect against this class of attacks.

A research paper titled Rain: Transiently Leaking Data from Public Clouds Using Old Vulnerabilities, and its presentation titled Spectre in the real world: Leaking your private data from the cloud with CPU vulnerabilities, demonstrate the attack L1TF Reloaded, which combines half-Spectre gadgets with L1 Terminal Fault (L1TF) to leak guest data. While this attack can successfully leak guest data from upstream Linux/Kernel-based Virtual Machine (KVM) and other cloud providers, it does not impact the guest data of AWS customers running on the AWS Nitro System and Nitro Hypervisor.

The Nitro Hypervisor’s protection against L1TF Reloaded is not the result of a specific patch or reactive mitigation, but rather due to the proactive approach to security at AWS. The fundamental security design principles of the Nitro Hypervisor—particularly the implementation of secret hiding through an extensive use of the eXclusive Page Frame Ownership (XFPO) concept (in some contexts referred to as process-local memory)—provides robust protection against this class of attacks. L1TF Reloaded represents an innovative approach to transient execution attacks, showing how threat actors can combine seemingly mitigated vulnerabilities to create new attacks that are more than the sum of their parts. The research is impressive and constructs a multilayer end-to-end exploit with real-world applicability. AWS sponsored a portion of this work and would like to thank the researchers for their collaboration and coordinated disclosure. The remainder of this post is a deeper dive into the published research.

The Nitro Hypervisor: Purpose-built for security

The Nitro Hypervisor is a foundational component of the AWS Nitro System, designed from the ground up with security as a primary consideration. Unlike traditional hypervisors that evolved from general-purpose operating systems, the Nitro Hypervisor, which is based on Linux/Kernel-based Virtual Machine (KVM), has been intentionally minimized and purpose-built with only the capabilities needed to perform its assigned functions.

The Nitro Hypervisor’s responsibilities are deliberately constrained: it receives virtual machine (VM) management requests from the Nitro Controller, partitions memory and CPU resources using hardware virtualization features, and assigns PCIe devices, including both Physical (PF) and Single Root I/O Virtualization (SR-IOV) Virtual Functions (VF) provided by Nitro hardware (such as NVMe for EBS and instance storage, and Elastic Network Adapter for networking) and third party devices (GPUs), to VMs. Critically, the Nitro Hypervisor excludes entire categories of functionality that exist in conventional hypervisors. There is no networking stack, no general-purpose file system implementations, no peripheral device-driver support, no shell, and no interactive access mode. This meticulous exclusion of non-essential features helps avoid entire classes of issues and attack vectors that can impact other hypervisors, such as remote networking attacks or driver-based privilege escalations.

Understanding transient execution vulnerabilities

To understand why the Nitro Hypervisor’s defenses are effective against L1TF Reloaded, it is important to first understand the fundamentals of transient execution vulnerabilities that emerged in 2018. Modern CPUs implement out-of-order and prediction-based speculative execution to optimize performance by executing operations before they are needed or before the CPU knows whether it should perform them at all. When predictions are wrong, or the CPU encounters execution faults, the CPU will eventually detect these errors and roll back all speculatively computed changes to the architectural state. However, traces of these “transient executions” remain detectable in the microarchitectural state, such as data that was speculatively loaded into CPU caches, creating opportunities for data leakage through side-channel attacks.

Half-Spectre gadgets: Incomplete but dangerous code patterns

While traditional Spectre attacks require complete “gadgets” that both access secret data and transmit it through side channels, researchers have identified a weaker class of gadgets called “half-Spectre gadgets.” These are incomplete Spectre-like code patterns that perform speculative out-of-bounds memory accesses, but lack the transmission component that would make them immediately exploitable.

A classic Spectre v1 gadget contains two key elements: first, a speculative access that loads secret data (such as x = A[index] where index is out of bounds), and second, a transmission mechanism that leaks the data through a side channel (such as y = B[64 * x] that creates cache patterns based on the secret value). Half-Spectre gadgets contain only the first element—the speculative access—without the transmission component.

Because half-Spectre gadgets appear harmless in isolation, they are commonly found throughout software, including hypervisors. These gadgets typically arise from array-indexing operations where bounds checking occurs, but the transient execution window allows out-of-bounds access before the bounds check resolves. The gadgets can be either absolute (directly providing the address to access) or relative (controlling an offset from a base address), with relative gadgets being more common due to typical array indexing patterns. The key insight of L1TF Reloaded is that half-Spectre gadgets, while harmless alone, become dangerous when combined with other vulnerabilities like L1TF. A threat actor can trigger a half-Spectre gadget in the hypervisor to speculatively load arbitrary data into the L1 data cache and then use L1TF to extract that cached data—effectively turning the “harmless” half-Spectre gadget into a complete gadget.

Intel L1TF: Leveraging speculative address translation

L1 Terminal Fault (L1TF), discovered in January 2018 and disclosed in August 2018, represents a significant type of transient execution vulnerability that affects Intel processors up to Coffee Lake. These processors are used in some 5th generation EC2 instance families and all older instance types. L1TF leverages faulty address translations during transient execution when accessing invalid page table entries. Under normal operation, when a CPU encounters a Page Table Entry (PTE) with the present bit cleared or reserved bits set, address translation should halt immediately. However, during transient execution, Intel processors affected by L1TF ignore these invalid page table states and utilize a partially translated address. If the target data exists in the L1 data cache, the CPU will speculatively load it and make it available to subsequent instructions, even though the access should be blocked. This behavior is particularly problematic in virtualized environments. A malicious guest operating system can deliberately clear present bits in its own page tables to trigger terminal faults. When this happens, the CPU skips the normal host address translation process and passes the guest physical address directly to the L1 data cache. This allows the threat actor to potentially read any cached physical memory on the system, regardless of ownership or privilege boundaries. For affected processors, comprehensive software mitigation requires expensive measures, like disabling Simultaneous Multi-Threading (SMT), flushing the L1 data cache on every context switch, or disabling Extended Page Tables (EPT) entirely—performance costs so significant that many systems implement only partial mitigations.

The L1TF Reloaded attack: Exploiting mitigation gaps using Spectre

The research paper demonstrates how threat actors can combine half-Spectre gadgets with L1TF to create a powerful attack vector against hypervisors that lack complete implementation of the previously outlined mitigations. The attack shows that vulnerabilities considered individually mitigated can still be leveraged if combined in novel ways. L1TF Reloaded works by leveraging the fact that while L1TF mitigations like L1 data cache flushing and core scheduling help prevent guest-to-guest attacks, they do not fully mitigate guest-to-host attacks. The attack operates across logical cores that share the L1 data cache in an SMT core. On one logical core, the threat actor triggers a half-Spectre gadget. By mistraining the branch predictor, the threat actor causes the hypervisor to speculatively access out-of-bounds memory, loading sensitive data into the shared L1 data cache. Simultaneously, on the other logical core, the threat actor uses L1TF to extract the cached data. While other research papers have demonstrated L1TF exploitation, this research paper has successfully demonstrated a multilayer end-to-end attack on upstream Linux/KVM and other cloud providers. The authors were able to use an existing half-Spectre gadget, break host Kernel Address Space Layout Randomization (KASLR), gain host address translation capability, find all the processes running on the host, identify the victim VM, break guest KASLR, gain guest address translation capability, identify the init process in the victim VM, enumerate the child processes of the init process, identify the nginx webserver process, locate the private TLS certificate in the guest process heap, and finally leak the private TLS certificate. However, when they attempted the same attack on AWS instances, they encountered a critical limitation: while they could leak some non-sensitive host data, they were unable to access guest data due to what they described as “an undocumented defense in the hypervisor that unmaps victim data from it. This “undocumented defense” is the Nitro Hypervisor’s implementation of secret hiding—a fundamental architectural decision that prevented this type of attack.

Secret hiding: Rethinking hypervisor memory architecture

Traditional hypervisor designs follow a hierarchical privilege model where each higher level of privilege is granted access to all lower level memory. In conventional systems, the hypervisor running at the highest privilege level can access all VM memory, ostensibly for legitimate management purposes. However, this design creates a vulnerability: if a threat actor can trick the hypervisor into speculatively accessing guest data, that data becomes available for extraction through side-channel attacks. The Nitro Hypervisor takes a fundamentally different approach through a technique called secret hiding. Instead of following the traditional model where the hypervisor has access to all VM memory (Figure 1), the Nitro Hypervisor makes sure that guest data is not present in the hypervisor’s virtual address space. By removing VM memory pages from the hypervisor’s virtual address space (Figure 2), we avoid the possibility of transient execution attacks accessing guest data, even if a threat actor successfully triggers gadgets within the hypervisor.

Figure 1: Memory view of the hypervisor without mitigations in the context of VM1

Figure 2: Memory view of the Nitro Hypervisor in the context of VM1. While no guest memory is mapped, only the state of the active guest can be accessed with other guest states remaining inaccessible.

This architectural decision means that when transient execution occurs in the Nitro Hypervisor—whether through L1TF, half-Spectre gadgets, or other transient execution vulnerabilities—there is simply no guest data available to be leaked, creating a barrier against this class of vulnerabilities. The Nitro Hypervisor retains access only to its own data, but guest data remain isolated and inaccessible. While we could not anticipate L1TF Reloaded exactly, we knew transient execution vulnerabilities would continue to be discovered and built defense-in-depth mechanisms which blocked extraction of guest data on AWS instances. This design decision was made proactively during the Nitro Hypervisor development, based on our threat model that explicitly includes guest-to-host attacks that exploit the hypervisor. By assuming that threat actors might find ways to trigger transient execution vulnerabilities within the Nitro Hypervisor—whether through known vulnerabilities like L1TF or future unknown attack vectors—we designed the system to limit the scope of such attacks from the outset.

Beyond memory: Protecting guest CPU context

When VMs are scheduled and context-switched, guest CPU context information such as general-purpose and floating-point register content must be saved and restored. Guest CPU context can contain highly sensitive information. Registers might contain cryptographic keys, memory addresses that could defeat Address Space Layout Randomization (ASLR), or other secrets that applications rely on for security. In traditional hypervisors, guest CPU context is often stored in memory accessible to the hypervisor, creating another potential target for transient execution attacks. The original XPFO (eXclusive Page Frame Ownership) implementation makes sure that either user space or the kernel—but not both—can access a memory page and does not protect guest CPU context since it is exclusively owned by the kernel. The Nitro Hypervisor extends the XPFO concept to guest CPU context by saving it in memory—also known as process-local memory—that is solely mapped by process-specific kernel Page Table Entries (PTEs), as is shown in Figure 2 above. This memory is specifically designed to be only accessible from the Nitro Hypervisor in the context of the process it belongs to. This makes sure that even if a threat actor successfully triggers transient execution vulnerabilities within the Nitro Hypervisor, they cannot access the guest CPU context from other guests. The researchers confirmed this protection, noting that the AWS threat model accounts for guest-to-host attacks and that secret hiding, combined with existing L1 data cache flushing and core scheduling, prevented them from leaking guest data. This comprehensive approach to secret hiding demonstrates the defense-in-depth philosophy of the Nitro System: rather than protecting only known attack vectors, AWS systematically identifies and protects potential sources of guest data leakage, including both VM memory and guest CPU context.

Applying secret hiding principles to Xen

Most AWS Xen instances are now running on the AWS Nitro System and hence enjoy the benefits of the Nitro Hypervisor thanks to Xen-on-Nitro. For our portfolio of instance families running on the AWS Xen Hypervisor, we have implemented similar secret hiding principles to provide protection against transient execution attacks.

Defense in depth: The Nitro Hypervisor’s proven security model

L1TF Reloaded represents an important advancement in our understanding of how seemingly mitigated vulnerabilities can be combined to create new attack vectors. The researchers of the Rain paper demonstrated how L1TF and half-Spectre gadgets can work together to leak guest data from hypervisors. We are pleased to support their work and collaborate with them. The Nitro Hypervisor’s protection against L1TF Reloaded is not the result of a specific patch or reactive mitigation, but rather due to AWS deeply investing in securing multi-tenant cloud environments against sophisticated adversaries. This research reinforces our confidence in the Nitro System’s security model against both known and unknown attack vectors. The proactive security approach of AWS includes designing systems with defense-in-depth principles from the ground up. The threat landscape will continue to evolve, and at the same time, the defense-in-depth mechanisms built into the Nitro Hypervisor and our other products and services will continue to help protect AWS customers from sophisticated attacks, while maintaining the performance and functionality they depend on.

If you have questions or feedback about this post, contact AWS Support.

AWS Weekly Roundup: Amazon DocumentDB, AWS Lambda, Amazon EC2, and more (August 4, 2025)

2025-08-04 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-documentdb-aws-lambda-amazon-ec2-and-more-august-4-2025/

This week brings an array of innovations spanning from generative AI capabilities to enhancements of foundational services. Whether you’re building AI-powered applications, managing databases, or optimizing your cloud infrastructure, these updates help build more advanced, robust, and flexible applications.

Last week’s launches
Here are the launches that got my attention this week:

Amazon DocumentDB – Amazon DocumentDB Serverless is now available offering an on-demand, fully managed MongoDB API-compatible document database service. Read more in Channy’s post.
Amazon Q Developer CLI – You can now create custom agents to help you customize the CLI agent to be more effective when performing specialized tasks such as code reviews and troubleshooting. More info in this blog.
Amazon Bedrock Data Automation – Now supports DOC/DOCX files for document processing and H.265 encoded video files for video processing, making it easier to build multimodal data analysis pipelines.
Amazon DynamoDB – Introduced the Amazon DynamoDB data modeling Model Context Protocol (MCP) tool, providing a structured, natural-language-driven workflow to translate application requirements into DynamoDB data models.
AWS Lambda – Response streaming now supports a default maximum response payload size of 200 MB, 10 times higher than before. Lambda response streaming helps you build applications that progressively stream response payloads back to clients, improving performance for latency sensitive workloads by reducing time to first byte (TTFB) performance.
Powertools for AWS – Introducing v2 of Powertools for AWS Lambda (Java), a developer toolkit that helps you implement serverless best practices and directly translates AWS Well-Architected recommendations.
Amazon SNS – Now supports three additional message filtering operators: wildcard matching, anything-but wildcard matching, and anything-but prefix matching. SNS now also supports message group IDs in standard topics, enabling fair queue functionality for subscribed Amazon SQS standard queues.
Amazon CloudFront – Now offers two capabilities to enhance origin timeout controls: a response completion timeout and support for custom response timeout values for Amazon S3 origins. These capabilities give you more control over how to handle slow or unresponsive origins.
Amazon EC2 – You are now able to force terminate EC2 instances that are stuck in the shutting-down state.
Amazon EC2 Auto Scaling – You can now use AWS Lambda functions as notification targets for EC2 Auto Scaling lifecycle hooks. For example, you can use this to trigger custom actions when an instance enters a wait state.
Amazon SES – You can now provision isolated tenants within a single SES account and apply automated reputation policies to manage email sending.
AWS Management Console – You can now view your AWS Applications in the Service menu in the console navigation bar. With this view you can see all your Applications and choose an Application to see all its associated resources.
Amazon Connect – Amazon Connect UI builder now features an updated user interface to reduce the complexity to build structured workflows. It also simplified forecast editing with a new UI experience that improves planning accuracy. The Contact Control Panel now features an updated and more intuitive user interface. Amazon Connect also introduced new actions and workflows into the agent workspace. These actions are powered by third-party applications running in the background.
AWS Clean Rooms – Now publishes events to Amazon EventBridge for status changes in a Clean Rooms collaboration, further simplifying how companies and their partners analyze and collaborate on their collective datasets without revealing or copying one another’s underlying data.
AWS Entity Resolution – Introduced rule-based fuzzy matching using Levenshtein Distance, Cosine Similarity, and Soundex algorithms to help resolve consumer records across fragmented, inconsistent, and often incomplete datasets.

Additional updates
Here are some additional projects, blog posts, and news items that I found interesting:

Amazon Strands Agents SDK: A technical deep dive into agent architectures and observability – Nice overview to build single and multi-agent architectures.
Build dynamic web research agents with the Strands Agents SDK and Tavily – Showing how easy it is to add a new tool.
Structured outputs with Amazon Nova: A guide for builders – Good tips implemented on top of native tool use with constrained decoding.
Automate the creation of handout notes using Amazon Bedrock Data Automation – A solution to build an automated, serverless solution to transform webinar recordings into comprehensive handouts.
Build modern serverless solutions following best practices using Amazon Q Developer CLI and MCP – Adding the AWS Serverless MCP server.
Introducing Amazon Bedrock AgentCore Browser Tool – More info on this tool that enables AI agents to interact seamlessly with websites.
Introducing Amazon Bedrock AgentCore Code Interpreter – A fully managed service that enables AI agents to securely execute code in isolated sandbox environments.

Upcoming AWS events
Check your calendars so that you can sign up for these upcoming events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — AWS’s flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities.

AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Mexico City (August 6) and Jakarta (August 7).

AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Australia (August 15), Adria (September 5), Baltic (September 10), and Aotearoa (September 18).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Danilo

How to migrate your Amazon EC2 Oracle Transparent Data Encryption database encryption keystore to AWS CloudHSM

2025-07-30 Bhushan Bhale

Post Syndicated from Bhushan Bhale original https://aws.amazon.com/blogs/security/how-to-migrate-your-ec2-oracle-transparent-data-encryption-tde-database-encryption-wallet-to-cloudhsm/

July 30, 2025: This post has been republished to migrate the Amazon EC2 Oracle Transparent Data Encryption database encryption keystore to AWS CloudHSM using AWS CloudHSM Client SDK 5.

Encrypting databases is crucial for protecting sensitive data, helping you to be aligned with security regulations and safeguarding against data loss. Oracle Transparent Data Encryption (TDE) is a feature that you can use to encrypt data at rest within an Oracle database. TDE uses envelope encryption. Envelope encryption is when the encryption key used to encrypt the tables of your database is encrypted by a primary key that resides either in a software keystore or on a hardware keystore, such as a hardware security module (HSM). This primary key is non-exportable by design to protect the confidentiality and integrity of your database operation. This gives you a more granular encryption scheme on your data. Hence, TDE for Oracle is a common use case for HSM devices such as AWS CloudHSM.

Oracle TDE supports keystores to securely store the TDE primary encryption keys. You can use either the TDE wallet (software keystore) or external key managers such as an HSM device. In this solution, we show you how to migrate a TDE keystore for an Oracle 19c database installed on Amazon Elastic Compute Cloud (Amazon EC2) from a software-based TDE wallet to AWS CloudHSM.

Using an external key manager, such as CloudHSM, offers several benefits over keeping keys on the Oracle wallet on the host:

Enhanced security: CloudHSM provides FIPS 140 validated hardware security, keeping the encryption key in a tamper-resistant module.
Centralized key management: CloudHSM supports centralized management of encryption keys, making it straightforward to rotate, back up, and audit keys.
Compliance: Your regulatory requirements may include encryption, and using CloudHSM can help you meet these compliance needs.

When you move from one type of keystore to another, new TDE primary keys are created inside the new keystore. To make sure that you have access to backups that rely on your past encryption keys, consider leaving the keystore running for your normal recovery window period or copying existing keys to the new keystore with exact key labels. Being able to access prior primary keys will help avoid data re-encryption.

You can use TDE to encrypt data online or offline. Encrypting TDE tablespace online minimizes disruption to database operations; however, it requires twice the storage space as the tablespace being encrypted, because the encryption process happens on a copy of the original tablespace.

Solution overview

In this solution, you migrate a TDE keystore for an Oracle 19c database from a software-based TDE wallet to CloudHSM, using the steps shown in Figure 1. Start by moving the current encryption keystore, which is your original TDE wallet, to a software wallet. This is done by replacing the PKCS#11 provider of your original HSM with the CloudHSM PKCS#11 software library (steps 1–2), next you reverse migrate to a local wallet (steps 3–5). The third step is to switch the encryption wallet for your database to your CloudHSM cluster (steps 6 and 7). After this process is complete, your database will automatically re-encrypt the data keys using the new primary key.

Figure 1: Steps to migrate your EC2 Oracle TDE database encryption wallet to CloudHSM

Note: The following instructions were tested using Oracle version 19c.

Prerequisites

You must have the following prerequisites in place to complete the solution in this post.

AWS CloudHSM cluster: You need to have a CloudHSM cluster set up and configured with an admin EC2 instance for interacting with CloudHSM following steps and best practices covered in Getting started with AWS CloudHSM.
Oracle database: Make sure that your Oracle database is up and running. This post assumes that you have an Oracle Database 19c database running on an EC2 Linux instance and there is network connectivity set up to CloudHSM as explained in this Configure the Client Amazon EC2 instance security groups for AWS CloudHSM.

Migrate an Oracle database keystore to a CloudHSM external keystore

As the first step in the migration, you need to migrate your Oracle database keystore to a CloudHSM external keystore. You do this by installing the CloudHSM client and the PKCS#11 library and then configuring the PKCS#11 library to connect to the HSM cluster.

Install the CloudHSM client:

Install the latest CloudHSM client software on your EC2 instance.
Configure the client to connect to HSMs in your cluster. For Linux EC2, use the following command:
```
sudo /opt/cloudhsm/bin/configure-cli -a <The ENI IPv4 / IPv6 addresses of the HSM>
```
Copy the CloudHSM issuing certificate created when you initialized the cluster (customerCA.crt) to the /opt/cloudhsm/etc folder. For more information, see Activate the cluster in AWS CloudHSM.
Validate connectivity to the CloudHSM cluster.
```
/opt/cloudhsm/bin/cloudhsm-cli interactive
```

aws-cloudhsm > login --username hsm-crypto-user --role admin
aws-cloudhsm > user create --username hsm-crypto-user --role crypto-user

Install the PKCS#11 Library

Install the PKCS #11 library for AWS CloudHSM Client SDK 5.
Configure Oracle to use the PKCS library:
1. Copy the PKCS#11 library to the appropriate Oracle folder. Typically, this is:
```
     cp /opt/cloudhsm/libcloudhsm_pkcs11.so /opt/oracle/extapi/[32,64]/hsm/aws/{VERSION}/libcloudhsm_pkcs11.so
```
2. Make sure that the folder /opt/oracle has the correct ownership, usually oracle:dba as owner:group.

Configure PKCS#11 library to connect to the HSM cluster

Use the following commands:

sudo /opt/cloudhsm/bin/configure-pkcs11 -a <HSM IP addresses>
sudo /opt/cloudhsm/bin/configure-pkcs11 --hsm-ca-cert <customerCA certificate file>

Configure the Oracle wallet location

In this section, you configure Oracle to point to CloudHSM using the sqlnet.ora file.

To configure the Oracle wallet location:

Edit the sqlnet.ora parameter ENCRYPTION_WALLET_LOCATION to point to the HSM:
```
ENCRYPTION_WALLET_LOCATION=
  (SOURCE=(METHOD=HSM)
  )
```
Verify that the WALLET_ROOT parameter is pointing to the current file-based TDE wallet location. This parameter defines the location where the TDE wallet (and other related files) will be stored. You can set it to an existing directory, preferably one in your $ORACLE_BASE or $ORACLE_HOME directory, but other locations are also possible.
```
show parameter wallet_root
show parameter tde_configuration
```

Use the following commands to set WALLET_ROOT if it hasn’t already been set.

ALTER SYSTEM SET WALLET_ROOT = '/u01/app/oracle/admin/orcl/wallet' SCOPE=BOTH SID='*';
ALTER SYSTEM SET TDE_CONFIGURATION=“KEYSTORE_CONFIGURATION=FILE” SCOPE=BOTH SID=“*”

Note:

In Oracle Database 19c and later, the ENCRYPTION_WALLET_LOCATION. parameter in sqlnet.ora is deprecated in favor of using WALLET_ROOT and TDE_CONFIGURATION.

You can also use the V$ENCRYPTION_WALLET view to check the current keystore location and status.

Point the Oracle database to use a local file-based keystore and CloudHSM

The KEYSTORE_CONFIGURATION attribute within TDE_CONFIGURATION determines the keystore type.

To point the Oracle database:

Use the following code to point the database to the local keystore and CloudHSM.

ALTER SYSTEM SET TDE_CONFIGURATION="KEYSTORE_CONFIGURATION=HSM|FILE" SCOPE = BOTH SID = '*';

Restart the database to have consistent results.

Verify that the keystore file-based wallet is open

To proceed with encryption key migration, you need to check the current keystore status and make sure that the file-based wallet is open.

To verify that the wallet is open:

Check to see if the file-based wallet is open.
```
Select * from V$encryption_wallet;
```
Figure 2: Verify that the wallet status is OPEN

If the wallet status is not OPEN, use the following command to open it:

  ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "wallet_password";

Migrate the encryption key to CloudHSM

Use the ADMINISTER KEY MANAGEMENT SET ENCRYPTION KEY command to initiate the TDE primary encryption key migration.

To migrate the encryption key:

Use the following command to migrate the encryption key:

ADMINISTER KEY MANAGEMENT SET ENCRYPTION KEY IDENTIFIED BY "hsm-crypto-user:password" MIGRATE USING "wallet-password" WITH BACKUP USING 'backup_tag';

The parameters used to migrate the encryption key are:

SET ENCRYPTION KEY: Specifies that the command is related to the TDE primary encryption key
IDENTIFIED BY: Specifies the details for migrating the keystore, including the external keystore user and password
MIGRATE USING: Specifies the password for the file-based wallet containing the primary encryption key
WITH BACKUP: Creates a backup of the keystore before the migration

Verify that the migration is complete

At this point, the migration from Oracle to Cloud HSM should be complete. Use the following steps to verify it.

To verify the migration:

Check the wallet status again. If the migration was successful, the WALLET_TYPE will be HSM and the WALLET_OR will be PRIMARY.
```
Select * from V$encryption_wallet;
```
Figure 3: Verify that WALLET_TYPE and WALLET_OR are correct

In the wallet is not open, use:

SQL>administer key management set keystore open identified by "hsm-crypto-user:password"

Verify that the database can access encrypted data without issues, confirming that the migration was successful.

Setup auto-login

Create auto-login to open the wallet during database restarts to connect to AWS CloudHSM.

To set up auto_login:

Create a new file-based keystore with the same username and password as the CloudHSM crypto user.

ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '/etc/oracle/wallets/<path>/tde' IDENTIFIED BY "hsm-crypto-user:password";

Add the CloudHSM crypto user password to a keystore (TDE wallet).

ADMINISTER KEY MANAGEMENT ADD SECRET 'hsm-crypto-user:password' FOR CLIENT 'HSM_PASSWORD' TO KEYSTORE '/etc/oracle/wallets/<path>/tde' IDENTIFIED BY "hsm-crypto-user:password" WITH BACKUP;

The following command creates a new auto-login keystore. This is useful for scenarios where the keystore needs to be accessed without human intervention.
```
ADMINISTER KEY MANAGEMENT CREATE AUTO_LOGIN KEYSTORE FROM KEYSTORE '/etc/oracle/wallets/<path>/tde' IDENTIFIED BY "<hsm-crypto-user:password>";
```

Open the newly created file based keystore.

ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN IDENTIFIED BY "<hsm-crypto-user:password>"

Key rotation

Key rotation helps you to adhere to security best practices by providing several data security benefits. Regular rotation of TDE keys reduces the window of opportunity for a bad actor who might have obtained a key, thereby minimizing the impact of a potential breach.

Many established security frameworks and compliance standards—such as PCI DSS and HIPAA—recommend or require regular key rotation to maintain the integrity and confidentiality of encrypted data. By making sure that keys aren’t used indefinitely, you can help reduce the risk of exposure or compromise, which reinforces overall security.

Encryption algorithms can become less secure over time because of advancements in computing power or newly discovered vulnerabilities. By rotating keys regularly, you can transition to stronger encrypting methods as needed, and so improve protection against emerging risks.

When to rotate TDE keys

The frequency of key rotation depends on several factors, including organizational policies, regulatory requirements, and the sensitivity of the data being protected. Here are some common practices:

Annually: Many organizations rotate TDE keys once a year to align with common compliance requirements.
Quarterly: For higher-security environments or more sensitive data, rotating keys every quarter can provide an additional layer of security.
If keys are compromised or suspected to be compromised: If you believe a key to be compromised, rotating that key as soon as possible is recommended to reduce the impact window.

Oracle TDE primary key rotation with an HSM key

In this section, you choose a 32-bit hex value to use as a prefix when generating a key, then use that key to update the Oracle database to use the new primary key.

Sign in to the database instance as a user who has the ADMINISTER KEY MANAGEMENT or SYSKM privilege and execute following command:
```
ADMINISTER KEY MANAGEMENT SET ENCRYPTION KEY IDENTIFIED BY "<hsm-crypto-user:password>"
```
Decide on a 32-bit hex value pattern to be used. We used 15A5142C9E2D3C2F18FD435814257DFD in this example.

Add the prefix ORACLE.TDE.HSM.MK to the hex pattern.

ORACLE.TDE.HSM.MK.0615A5142C9E2D3C2F18FD435814257DFD

key generate-symmetric aes --label 'ORACLE.TDE.HSM.MK.0615A5142C9E2D3C2F18FD435814257DFD' --key-length-bytes 32 —attributes encrypt=true decrypt=true“

Share the key with the Oracle hsm-crypto-user in case the original key was generated through another user.

 key share --filter attr.label=""ORACLE.TDE.HSM.MK.0615A5142C9E2D3C2F18FD435814257DFD"" attr.class=secret-key --username hsm_crypto_user --role crypto-user

Update the Oracle database to use the new primary TDE key.

ADMINISTER KEY MANAGEMENT USE KEY '0615A5142C9E2D3C2F18FD435814257DFD' FORCE KEYSTORE IDENTIFIED BY "hsm-crypto-user:password"

By following these guidelines, you can enhance the security of your encrypted data, align with regulatory requirements, and maintain robust key management practices.

Conclusion

In this post, we’ve shown you the importance of Transparent Data Encryption (TDE) and the benefits of using an external key manager such as AWS CloudHSM for storing TDE encryption keys. We’ve discussed the benefits of TDE compared to encrypting underlying storage and why using an external key manager is superior to keeping keys on the Oracle wallet on the host. Following these guidelines can help you enhance the security of your encrypted data, align with regulatory requirements, and maintain robust key management practices.

The key takeaways from this post are:

TDE offers granular encryption, compliance benefits, and robust key management.
External key managers provide enhanced security, centralized management, improved auditability and scalability.
Regular rotation of TDE keys is crucial for maintaining security, aligning with regulations, and following recommended practices in key management.

To start securing your Oracle databases with TDE and AWS CloudHSM, visit the AWS Management Console. Follow the steps outlined in this guide to migrate your TDE encryption keystore to AWS CloudHSM and begin rotating your keys regularly to enhance your data security posture. By taking these actions, you can make sure that your sensitive data remains protected, your organization remains aligned with regulations, and you are following the best practices in data encryption and key management.

For more information, see:

If you have feedback about this post, submit comments in the Comments section below.

New Amazon EC2 P6e-GB200 UltraServers accelerated by NVIDIA Grace Blackwell GPUs for the highest AI performance

2025-07-09 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-p6e-gb200-ultraservers-powered-by-nvidia-grace-blackwell-gpus-for-the-highest-ai-performance/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P6e-GB200 UltraServers, accelerated by NVIDIA GB200 NVL72 to offer the highest GPU performance for AI training and inference. Amazon EC2 UltraServers connect multiple EC2 instances using a dedicated, high-bandwidth, and low-latency accelerator interconnect across these instances.

The NVIDIA Grace Blackwell Superchips connect two high-performance NVIDIA Blackwell tensor core GPUs and an NVIDIA Grace CPU based on Arm architecture using the NVIDIA NVLink-C2C interconnect. Each Grace Blackwell Superchip delivers 10 petaflops of FP8 compute (without sparsity) and up to 372 GB HBM3e memory. With the superchip architecture, GPU and CPU are colocated within one compute module, increasing bandwidth between GPU and CPU significantly compared to current generation EC2 P5en instances.

With EC2 P6e-GB200 UltraServers, you can access up to 72 NVIDIA Blackwell GPUs within one NVLink domain to use 360 petaflops of FP8 compute (without sparsity) and 13.4 TB of total high bandwidth memory (HBM3e). Powered by the AWS Nitro System, P6e-GB200 UltraServers are deployed in EC2 UltraClusters to securely and reliably scale to tens of thousands of GPUs.

EC2 P6e-GB200 UltraServers deliver up to 28.8 Tbps of total Elastic Fabric Adapter (EFAv4) networking. EFA is also coupled with NVIDIA GPUDirect RDMA to enable low-latency GPU-to-GPU communication between servers with operating system bypass.

EC2 P6e-GB200 UltraServers specifications
EC2 P6e-GB200 UltraServers are available in sizes ranging from 36 to 72 GPUs under NVLink. Here are the specs for EC2 P6e-GB200 UltraServers:

UltraServer type	GPUs	GPU memory (GB)	vCPUs	Instance memory (GiB)	Instance storage (TB)	Aggregate EFA Network Bandwidth (Gbps)	EBS bandwidth (Gbps)
u-p6e-gb200x36	36	6660	1296	8640	202.5	14400	540
u-p6e-gb200x72	72	13320	2592	17280	405	28800	1080

P6e-GB200 UltraServers are ideal for the most compute and memory intensive AI workloads, such as training and inference of frontier models, including mixture of experts models and reasoning models, at the trillion-parameter scale.

You can build agentic and generative AI applications, including question answering, code generation, video and image generation, speech recognition, and more.

P6e-GB200 UltraServers in action
You can use EC2 P6e-GB200 UltraServers in the Dallas Local Zone through EC2 Capacity Blocks for ML. The Dallas Local Zone (us-east-1-dfw-2a) is an extension of the US East (N. Virginia) Region.

To reserve your EC2 Capacity Blocks, choose Capacity Reservations on the Amazon EC2 console. You can select Purchase Capacity Blocks for ML and then choose your total capacity and specify how long you need the EC2 Capacity Block for u-p6e-gb200x36 or u-p6e-gb200x72 UltraServers.

Once Capacity Block is successfully scheduled, it is charged up front and its price doesn’t change after purchase. The payment will be billed to your account within 12 hours after you purchase the EC2 Capacity Blocks. To learn more, visit Capacity Blocks for ML in the Amazon EC2 User Guide.

To run instances within your purchased Capacity Block, you can use AWS Management Console, AWS Command Line Interface (AWS CLI) or AWS SDKs. On the software side, you can start with the AWS Deep Learning AMIs. These images are preconfigured with the frameworks and tools that you probably already know and use: PyTorch, JAX, and a lot more.

You can also integrate EC2 P6e-GB200 UltraServers seamlessly with various AWS managed services. For example:

Amazon SageMaker Hyperpod provides managed, resilient infrastructure that automatically handles the provisioning and management of P6e-GB200 UltraServers, replacing faulty instances with preconfigured spare capacity within the same NVLink domain to maintain performance.
Amazon Elastic Kubernetes Services (Amazon EKS) allows one managed node group to span across multiple P6e-GB200 UltraServers as nodes, automating their provisioning and lifecycle management within Kubernetes clusters. You can use EKS topology-aware routing for P6e-GB200 UltraServers, enabling optimal placement of tightly coupled components of distributed workloads within a single UltraServer’s NVLink-connected instances.
Amazon FSx for Lustre file systems provide data access for P6e-GB200 UltraServers at the hundreds of GB/s of throughput and millions of input/output operations per second (IOPS) required for large-scale HPC and AI workloads. For fast access to large datasets, you can use up to 405 TB of local NVMe SSD storage or virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3).

Now available
Amazon EC2 P6e-GB200 UltraServers are available today in the Dallas Local Zone (us-east-1-dfw-2a) through EC2 Capacity Blocks for ML. For more information, visit the Amazon EC2 pricing page.

Give Amazon EC2 P6e-GB200 UltraServers a try in the Amazon EC2 console. To learn more, visit the Amazon EC2 P6e instances page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Channy

AWS Weekly Roundup: EC2 C8gn instances, Amazon Nova Canvas virtual try-on, and more (July 7, 2025)

2025-07-08 Elizabeth Fuentes

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-bedrock-api-keys-amazon-nova-canvas-virtual-try-on-and-more-july-7-2025/

Every Monday we tell you about the best releases and blogs that caught our attention last week.

Before continuing with this AWS Weekly Roundup, I’d like to share that last month I moved with my family to San Francisco, California, to start a new role as Developer Advocate/SDE, GenAI.

This excites me because I’ll have the opportunity to connect with new communities in the Bay Area while tackling exciting new challenges. If you’re part of a community focused on building generative AI and agentics applications, or know of one, I’d love to connect. Let’s connect!

Last week’s launches
Here are the launches from last week:

New Amazon EC2 C8gn instances powered by AWS Graviton4 offering up to 600Gbps network bandwidth – Amazon Elastic Compute Cloud (Amazon EC2) C8gn instances are now generally available, powered by AWS Graviton4 processors and 6th generation AWS Nitro Cards. These network-optimized instances deliver up to 600 Gbps network bandwidth. This represents the highest bandwidth among EC2 network-optimized instances, with up to 192 vCPUs and 384 GiB memory. They provide 30% higher compute performance than C7gn instances and are ideal for network-intensive workloads like virtual appliances, data analytics, and cluster computing jobs.
Build the highest resilience apps with multi-Region strong consistency in Amazon DynamoDB global tables – Amazon DynamoDB global tables now supports multi-Region strong consistency (MRSC) for applications requiring zero Recovery Point Objective (RPO). This capability ensures applications can read the latest data from any Region during outages, addressing critical needs in payment processing and financial services. MRSC requires three AWS Regions configured as either three full replicas or two replicas plus a witness, providing the highest level of application resilience for mission-critical workloads.
Amazon Nova Canvas update: Virtual try-on and style options now available – Amazon Nova Canvas introduces virtual try-on capabilities that help you visualize how clothing looks on a person by combining two images, plus eight new pre-trained style options (3D animation, design sketch, vector illustration, graphic novel, etc.) for generating images with improved artistic consistency. Available in three AWS Regions, these features enhance AI-powered image generation capabilities for retailers and content creators seeking realistic product visualizations.
Amazon Q in Connect now supports 7 languages for proactive recommendations – Amazon Q in Connect, a generative AI-powered assistant for customer service, now provides proactive recommendations in seven languages: English, Spanish, French, Portuguese, Mandarin, Japanese, and Korean. The AI-powered customer service assistant detects customer intent during voice and chat interactions to help agents resolve issues quickly and accurately.
Amazon Aurora MySQL and Amazon RDS for MySQL integration with Amazon SageMaker is now available – This integration provides near real-time data availability for analytics. It automatically extracts MySQL data into lakehouses with Apache Iceberg compatibility. You can then access this data seamlessly through various analytics engines and machine learning tools.
Amazon Aurora DSQL is now available in additional AWS Regions – Amazon Aurora DSQL expands to Asia Pacific (Seoul) and now supports multi-Region clusters across Asia Pacific and European regions. This serverless, distributed SQL database offers unlimited scalability, highest availability, and zero infrastructure management with AWS Free Tier access.

Other AWS blog posts

Optimize RAG in production environments using Amazon SageMaker JumpStart and Amazon OpenSearch Service – Learn how to optimize Retrieval Augmented Generation (RAG) in production environments using Amazon SageMaker JumpStart and Amazon OpenSearch Service. This comprehensive guide demonstrates implementing RAG workflows with LangChain, covers OpenSearch optimization strategies, provides setup instructions, and explains benefits of combining these AWS services for scalable, cost-effective generative AI applications.v
Agentic GenAI App Using Bedrock, MCP servers on EKS – This post shows how to build a scalable AI chat application using Amazon Bedrock, Strands Agent, and Model Context Protocol (MCP) servers deployed on Amazon Elastic Kubernetes Service (Amazon EKS). The architecture combines agentic workflows with containerized microservices for intelligent, auto-scaling conversations with multiple foundation models.
Enforce table level access control on data lake tables using AWS Glue 5.0 with AWS Lake Formation – AWS Glue 5.0 introduces Full-Table Access (FTA) control for Apache Spark with AWS Lake Formation, providing table-level security without fine-grained access overhead. This feature supports native Spark SQL/DataFrames for Lake Formation tables. It enables read/write operations on Iceberg and Hive tables with improved performance and lower costs.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:

AWS re:Invent – Register now to get a head start on choosing your best learning path, booking travel and accommodations, and bringing your team to learn, connect, and have fun. Early-career professionals can apply for the All Builders Welcome Grant program, designed to remove financial barriers and create diverse pathways into cloud technology. Applications are now open and close on July 15, 2025.
AWS NY Summit – You can gain insights from Swami’s keynote featuring the latest cutting-edge AWS technologies in compute, storage, and generative AI. My News Blog team is also preparing some exciting news for you. If you’re unable to attend in person, you can still participate by registering for the global live stream. Also, save the date for these upcoming Summits in July and August near your city.
AWS Builders Online Series – If you’re based in one of the Asia Pacific time zones, join and learn fundamental AWS concepts, architectural best practices, and hands-on demonstrations to help you build, migrate, and deploy your workloads on AWS.
Join AWS Gen AI Lofts – Experience AWS Gen AI Lofts across San Francisco, Berlin, Dubai, Dublin, Bengaluru, Manchester, Paris, Tel Aviv, and additional locations – hands-on workshops, expert guidance, investor networking, and collaborative spaces designed to accelerate your generative AI startup journey.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Eli

New Amazon EC2 C8gn instances powered by AWS Graviton4 offering up to 600Gbps network bandwidth

2025-06-30 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-c8gn-instances-powered-by-aws-graviton4-offering-up-to-600gbps-network-bandwidth/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) C8gn network optimized instances powered by AWS Graviton4 processors and the latest 6th generation AWS Nitro Card. EC2 C8gn instances deliver up to 600Gbps network bandwidth, the highest bandwidth among EC2 network optimized instances.

You can use C8gn instances to run the most demanding network intensive workloads, such as security and network virtual appliances (virtual ﬁrewalls, routers, load balancers, proxy servers, DDoS appliances), data analytics, and tightly-coupled cluster computing jobs.

EC2 C8gn instances specifications
C8gn instances provide up to 192 vCPUs and 384 GiB memory, and offer up to 30 percent higher compute performance compared Graviton3-based EC2 C7gn instances.

Here are the specs for C8gn instances:

Instance Name	vCPUs	Memory (GiB)	Network Bandwidth (Gbps)	EBS Bandwidth (Gbps)
c8gn.medium	1	2	Up to 25	Up to 10
c8gn.large	2	4	Up to 30	Up to 10
c8gn.xlarge	4	8	Up to 40	Up to 10
c8gn.2xlarge	8	16	Up to 50	Up to 10
c8gn.4xlarge	16	32	50	10
c8gn.8xlarge	32	64	100	20
c8gn.12xlarge	48	96	150	30
c8gn.16xlarge	64	128	200	40
c8gn.24xlarge	96	192	300	60
c8gn.metal-24xl	96	192	300	60
c8gn.48xlarge	192	384	600	60
c8gn.metal-48xl	192	384	600	60

You can launch C8gn instances through the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.

If you’re using C7gn instances now, you will have straightforward experience migrating network intensive workloads to C8gn instances because the new instances offer similar vCPU and memory ratios. To learn more, check out the collection of Graviton resources to help you start migrating your applications to Graviton instance types.

You can also visit the Level up your compute with AWS Graviton page to begin your Graviton adoption journey.

Now available
Amazon EC2 C8gn instances are available today in US East (N. Virginia) and US West (Oregon) Regions. Two metal instance sizes are only available in US East (N. Virginia) Region. These instances can be purchased as On-Demand, Savings Plan, Spot instances, or as Dedicated instances and Dedicated hosts.

Give C8gn instances a try in the Amazon EC2 console. To learn more, refer to the Amazon EC2 C8g instance page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Channy

AWS Weekly Roundup: New AWS Heroes, Amazon Q Developer, EC2 GPU price reduction, and more (June 9, 2025)

2025-06-09 Betty Zheng (郑予彬)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-new-aws-heroes-amazon-q-developer-ec2-gpu-price-reduction-and-more-june-9-2025/

The AWS Heroes program recognizes a vibrant, worldwide group of AWS experts whose enthusiasm for knowledge-sharing has a real impact within the community. Heroes go above and beyond to share knowledge in a variety of ways in developer community. We introduce our newest AWS Heroes in the second quarter of 2025.

To find and connect with more AWS Heroes near you, visit the categories in which they specialize Community Heroes, Container Heroes, Data Heroes, DevTools Heroes, Machine Learning Heroes, Security Heroes, and Serverless Heroes.

Last week’s launches
In addition to the inspiring celebrations, here are some AWS launches that caught my attention.

Agentic capabilities for Amazon Q Developer Chat in the AWS Management Console and chat applications – In the AWS Management Console, Microsoft Teams, and Slack, Amazon Q Developer can now answer more complex queries by automatically identifying appropriate tools across more than 200 AWS services, breaking queries into executable steps, and showing its reasoning process as it works.
Claude Sonnet 4 model on Amazon Q Developer CLI – Amazon Q Developer CLI now supports Anthropic’s Claude Sonnet 4 and allows developers to switch between premium models (Sonnet 4, 3.7, and 3.5) using simple commands like /model or q chat --model.
Amazon API Gateway now supports dynamic request routing for REST APIs – Amazon API Gateway now supports routing rules for REST APIs using custom domain names, allowing traffic distribution based on HTTP headers, URL base paths, or both.
Amazon Athena announces managed query results to streamline analysis workflows – Managed query results is a new feature that automatically stores, encrypts, and manages the lifecycle of query results for customers at no additional cost.
A new monitoring dashboard in the AWS Network Firewall console – The new dashboard provides enhanced visibility into network traffic patterns, including top flows, TLS Server Name Indication (SNI), HTTP Host headers, long-lived TCP connections, and failed TCP handshakes.

For a full list of AWS announcements, be sure to keep an eye on What’s New at AWS.

Other AWS news
Here are some additional projects, blog posts that you might find interesting:

Up to 45 percent price reduction for Amazon EC2 NVIDIA GPU-accelerated instances – AWS is reducing the price of NVIDIA GPU-accelerated Amazon EC2 instances (P4d, P4de, P5, and P5en) by up to 45 percent for On-Demand and Savings Plan usage. We are also making the very new P6-B200 instances available through Savings Plans to support large-scale deployments.
Introducing public AWS API models – AWS now provides daily updates of Smithy API models on GitHub, enabling developers to build custom SDK clients, understand AWS API behaviors, and create developer tools for better AWS service integration.
The AWS Asia Pacific (Taipei) Region is now open – The new Region provides customers with data residency requirements to securely store data in Taiwan while providing even lower latency. Customers across industries can benefit from the secure, scalable, and reliable cloud infrastructure to drive digital transformation and innovation.
Amazon EC2 has simplified the AMI cleanup workflow – Amazon EC2 now supports automatically deleting underlying Amazon Elastic Block Store (Amazon EBS) snapshots when deregistering Amazon Machine Images (AMIs).
The Lab where AWS designs custom chips – Visit Annapurna Labs in Austin, Texas—a combination of offices, workshops, and even a mini data center—where Amazon Web Services (AWS) engineers are designing the future of computing.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events.

Join re:Inforce from anywhere – If you aren’t able to make it to Philadelphia (June 16–18), tune in remotely. Get free access to the re:Inforce keynote and innovation talks live as they happen.
AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Shanghai (June 19 – 20), Milano (June 18), Mumbai (June 19) and Japan (June 25 – 26).
AWS re:Invent – Mark your calendars for AWS re:Invent (December 1 – 5) in Las Vegas. Registration is now open
AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Mexico (June 14), Nairobi, Kenya (June 14) and Colombia (June 28)

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Betty

Announcing up to 45% price reduction for Amazon EC2 NVIDIA GPU-accelerated instances

2025-06-05 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/announcing-up-to-45-price-reduction-for-amazon-ec2-nvidia-gpu-accelerated-instances/

Customers across industries are harnessing the power of generative AI on AWS to boost employee productivity, deliver exceptional customer experiences, and streamline business processes. However, the growth in demand for GPU capacity has outpaced industry-wide supply, making GPUs a scarce resource and increasing the cost of securing them.

As Amazon Web Services (AWS) grows, we work hard to lower our costs so that we can pass those savings back to our customers. Regular price reductions on AWS services have been a standard way for AWS to pass on the economic efficiencies gained from our scale back to our customers.

Today, we’re announcing up to 45 percent price reduction for Amazon Elastic Compute Cloud (Amazon EC2) NVIDIA GPU-accelerated instances: P4 (P4d and P4de) and P5 (P5 and P5en) instance types. This price reduction to On-Demand and Savings Plan pricing applies to all Regions where these instances are available. The pricing reduction applies to On-Demand purchases beginning June 1 and to Savings Plan purchases effective after June 4.

Here is a table of price reductions percentage (%) from May 31, 2025 baseline prices by instance types and pricing plans:

Instance type	NVIDIA GPUs	On-Demand	EC2 Instance Savings Plans		Compute Savings Plans
Instance type	NVIDIA GPUs	On-Demand	1 year	3 years	1 year	3 years
P4d	A100	33%	31%	25%	31%	–
P4de	A100	33%	31%	25%	31%	–
P5	H100	44%	–	45%	44%	25%
P5en	H200	25%	–	26%	25%	–

Savings Plans are a flexible pricing model that offer low prices on compute usage, in exchange for a commitment to a consistent amount of usage (measured in $/hour) for a 1- or 3- year term. We offers two types of Savings Plans:

EC2 Instance Savings Plans provide the lowest prices, offering savings in exchange for commitment to usage of individual instance families in a Region (for example, P5 usage in the US (N. Virginia) Region).
Compute Savings Plans provide the most flexibility and help to reduce your costs regardless of instance family, size, Availability Zones, and Regions (for example, from P4d to P5en instances, shift a workload between US Regions).

To provide increased accessibility to reduced pricing, we are making at-scale On-Demand capacity available for:

P4d instances in the Asia Pacific (Seoul), Asia Pacific (Sydney), Canada (Central), and Europe (London) Regions
P4de instances in the US East (N. Virginia) Region
P5 instances in the Asia Pacific (Mumbai), Asia Pacific (Tokyo), Asia Pacific (Jakarta), and South America (São Paulo) Regions
P5en instances in the Asia Pacific (Mumbai), Asia Pacific (Tokyo), and Asia Pacific (Jakarta) Regions

We are also now delivering Amazon EC2 P6-B200 instances through Savings Plan to support large scale deployments, which became available on May 15, 2025 at launch only through EC2 Capacity Blocks for ML. EC2 P6-B200 instances, powered by NVIDIA Blackwell GPUs, accelerate a broad range of GPU-enabled workloads but are especially well-suited for large-scale distributed AI training and inferencing.

These pricing updates reflect the AWS commitment to making advanced GPU computing more accessible while passing cost savings directly to customers.

Give Amazon EC2 NVIDIA GPU-accelerated instances a try in the Amazon EC2 console. To learn more about these pricing updates, visit Amazon EC2 Pricing page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Channy

Optimizing ODCR usage through AI-powered capacity insights

2025-06-05 Ankush Goyal

Post Syndicated from Ankush Goyal original https://aws.amazon.com/blogs/compute/optimizing-odcr-usage-through-ai-powered-capacity-insights/

Efficient resource management is crucial for organizations seeking to optimize cloud costs while making sure of seamless access to compute capacity. Amazon Elastic Compute Cloud (Amazon EC2) On-Demand Capacity Reservations (ODCRs) provide the flexibility to reserve compute capacity within a specific Availability Zone (AZ) for any duration. This makes sure that critical workloads always have the necessary resources available, minimizing the risk of capacity shortages.

However, managing ODCRs across multiple teams and accounts presents several challenges:

Limited visibility across teams: In large organizations, tracking ODCR usage across teams and business units can be difficult, often leading to underused or duplicate reservations.
Complexity in optimization: Without clear insights, adjusting, releasing, or modifying reservations to align with changing workloads becomes a cumbersome process.
Cost management challenges: Unused or oversized ODCRs contribute to unnecessary expenses, making it essential to continuously monitor and optimize usage.

In this post, we demonstrate how Amazon Bedrock Agents can help organizations gain actionable insights into ODCR usage across their Amazon Web Services (AWS) environment. Unlike traditional approaches that rely on predefined patterns or manual tracking, this solution dynamically retrieves the latest ODCR usage data and provides intelligent, query-based recommendations to fulfill the capacity needs. This solution is serverless and pay-per-use and incurs minimal operational cost.

This approach allows IT leaders, cloud architects, and finance teams to optimize reservations, control costs, and enhance overall resource management—without the complexity of traditional analysis methods.

Solution overview

This solution addresses two specific use cases:

The team must create a new ODCR to accommodate an upcoming project.
The team needs to expand the capacity of their existing ODCR.

The system consists of three essential components:

ODCRSupervisorAgent: Functions as the main coordinator, handling user inquiries and managing requests to specialized subordinate agents.
CapacityPlanningAgent: Reviews existing ODCRs throughout the organization, recommending SPLIT and MOVE operations to optimize capacity allocation for new projects.
AugmentationAgent: Identifies opportunities to increase existing ODCR capacity by recommending MOVE operations from other organizational ODCRs.

The following figure shows the high-level architecture for this solution.

The system architecture uses AWS services for comprehensive ODCR management, as shown in the following figure. The platform combines Amazon Cognito for authentication, AWS Amplify for front-end delivery, and AWS Lambda for serverless computing. AWS Resource Explorer enables efficient discovery and tracking of ODCRs across AWS Regions, while Lambda functions periodically query Resource Explorer to retrieve detailed ODCR information and usage metrics. AWS Identity and Access Management (IAM) plays a crucial role in securing the system, managing fine-grained access controls and permissions across all components. Amazon Bedrock Agents and its multi-agent capabilities generate intelligent recommendations through coordinated agent interactions, allowing organizations to optimize ODCR usage and implement data-driven capacity planning strategies.

How the solution functions

At the heart of this system is an Amazon Bedrock Agent powered by Action Groups, which map user queries to backend tasks using Lambda. These Lambda functions act as the system’s intelligence layer: fetching, filtering, and formatting ODCR data for the agent to reason over. The following three Lambda functions are deployed as part of the AWS CloudFormation template:

get_az_mapping_info Lamba function: This Lambda function takes an AWS account ID and Region as input. It returns a mapping of AZ names to their corresponding AZ IDs for that account and AWS Region. This helps the Capacity Planning Agent align AZs across accounts.

find_eligiable_odcrs Lambda function: This Lambda function supports the Capacity Planning Agent by identifying suitable ODCRs that meet the specified instance type and AZ. It uses Resource Explorer to search for ODCRs across the organization. Then, the function assumes cross-account roles to access ODCRs in different AWS accounts to gather detailed information. After retrieving the necessary data, it filters the ODCRs based on instance type, tenancy, and AZ. Finally, it returns a formatted list of eligible ODCRs, which are sourced from either the specified account or an alternative account with adequate capacity.

find_eligible_odcrs_for_move Lambda function: This Lambda function assists the Augmentation Agent by finding all capacity reservations within the same account as the specified ODCR. It uses Resource Explorer to discover ODCRs and, when necessary, assumes cross-account roles to gather ODCR information. The function filters ODCRs based on matching instance type, tenancy, and AZ, then returns a formatted list of eligible ODCRs that can provide more capacity through MOVE operations.

Each Lambda function receives structured inputs from the Amazon Bedrock Agent, executes its designated task, and returns structured responses. Then, these responses are used by the agent to generate intelligent, actionable answers to user queries.

Together, these Lambda functions extend the capabilities of the Amazon Bedrock Agent by integrating with external data sources and automating complex ODCR management logic—allowing the system to deliver accurate, dynamic recommendations.

Prerequisites

You must have the following in place to implement the solution in this post:

An AWS account or multiple accounts with AWS Organizations configured.
- Enable Trusted Access for Resource Explorer in Organizations.
Foundational Model (FM) access in Amazon Bedrock for Anthropic’s Claude 3.5 Haiku v1 in the same Region where you plan to deploy this solution.
Turn on Resource Explorer.
- Setting up and configuring Resource Explorer
- We recommend assigning a delegated administrator account—the account where you plan to deploy the ODCR solution—and creating the view in that account. Note this account number because it is needed later while deploying cross account role.
- Create a Resource Explorer view that filters for resourcetype:ec2:capacity-reservation to enable searching for ODCRs in the delegated account.
- After setting up Resource Explorer and creating a view, note the Amazon Resource Name (ARN). You need this ARN when deploying the ODCR CloudFormation template.
Active ODCRs configured and maintained across multiple accounts within your Organization.
The accompanying CloudFormation template downloaded from the aws-samples GitHub repo.
- Cross Account IAM Role
- ODCR Solution

Deploy solution resources using CloudFormation

This solution is designed to be deployed in a single Region. If deploying outside us-east-1, then you must modify the CloudFormation parameters to reference the correct FM version and adjust for Regional availability, such as any necessary cross-Region inference profile configurations.

Deploy the Cross Account IAM Role template: To access ODCR data across your Organization, deploy a CloudFormation StackSet from your payer account. During deployment, you must provide the account number where you plan to deploy the ODCR solution. This is the same account number that you used as delegate account for Resource Explorer under the prerequisites. This account number is added to the trust policy of the IAM role, allowing the solution in that account to assume the role created by the StackSet. The deployment automatically provisions a standardized IAM role with the necessary permissions (for example ec2:DescribeCapacityReservations and ec2:DescribeAvailabilityZones) in each target account.
Deploy the ODCR Solution template: Deploy the provided template in the delegated administrator account that you chose for Resource Explorer. This sets up the core components of the solution and integrates with your payer account for organization-wide ODCR insights.

Resources deployed by the CloudFormation template

After you have deployed the CloudFormation template, the following AWS resources are created in your account.

Cross Account IAM Role template

IAM resource
- CrossAccountODCRAccessRole

ODCR Solution template

Amazon Cognito resources:
- User Pool: ODCRAgentUserPool
- App Client: ODCRAgentApp
- Identity Pool: odcr-identity-pool
- Groups: ODCRAdmins
- User: ODCR User
IAM resources:
- IAM roles:
  - GetAZMappingFunctionRole
  - LambdaODCRAccessRole
  - BedrockAgentExecutionRole
  - ODCRSupervisorAgentExecutionRole
  - CognitoAuthRole
Lambda functions:
- get_az_mapping_info
- find_eligible_odcrs
- find_eligible_odcrs_for_move
Amazon Bedrock Agents
- ODCRSupervisorAgent
- CapacityPlanningAgent with action groups:
  - get_az_mapping_info
  - find_eligible_odcrs
- AugmentationAgent with action group:
  - find_eligible_odcrs_for_move

After you deploy the final CloudFormation template, copy the following from the Outputs tab on the CloudFormation console to use during the configuration of your application after it’s deployed in Amplify:

AWSRegion
BedrockAgentAliasId
BedrockAgentId
BedrockAgentName
IdentityPoolId
UserPoolClientId
UserPoolId

The following screenshot shows you what the Outputs tab looks like.

Deploy the Amplify application for front-end

You need to manually deploy the Amplify application using the front-end code found on GitHub. Complete the following steps, as shown in the following figure:

Download the front-end code AWS-Amplify-Frontend.zip from GitHub.
Use the .zip file to manually deploy the application in Amplify.
Return to the Amplify page and use the domain that it automatically generated to access the application.

After completing these steps, your environment is ready to analyze ODCR usage and receive recommendations powered by Amazon Bedrock Agents.

Solution walkthrough

After deploying the application in Amplify, return to the Amplify console. Use the auto-generated domain URL shown there to access the front-end. Upon accessing the application URL, you’re prompted to provide information related to Amazon Cognito and Amazon Bedrock Agents. This information is necessary to securely authenticate users and allow the front end to interact with the Amazon Bedrock Agent. It enables the application to manage user sessions and make authorized API calls to AWS services on behalf of the user.

You can enter information with the values that you collected from the CloudFormation stack outputs, as shown in the following screenshot:

A temporary password was automatically generated during deployment and sent to the email address provided when launching the CloudFormation template. At first sign-in, you’re prompted to reset your password.When you’re signed in, you can begin interacting with the application to analyze and manage your ODCR usage. The interface supports natural language queries to streamline capacity planning. The following are some example questions that you can ask:

I require [X units] of capacity for the [INSTANCE TYPE] instance type in Availability Zone [AZ ID], within account [ACCOUNT NUMBER]. Could you please advise if there are any existing On-Demand Capacity Reservations (ODCRs) that can fulfill this requirement? Additionally, what operations would be necessary to utilize the available capacity?

I have a requirement to add [X additional units] of capacity to an existing On-Demand Capacity Reservation with ID [cr-0012abcd123456789]. Could you please advise on the steps or operations required to fulfill this request?

Cleaning up

If you decide to discontinue using this solution, then you can follow these steps to remove it, its associated resources deployed using CloudFormation, and the Amplify deployment:

Delete the CloudFormation StackSet:
1. On the CloudFormation console, choose StackSets in the navigation pane.
2. Locate your StackSet for the Cross-Account IAM Role (you assigned a name to it).
3. Choose Delete stacks from StackSet to remove all stack instances.
4. After all instances are deleted, choose StackSet.
5. Choose Delete StackSet from the Actions menu.
Delete the CloudFormation stack:
1. On the CloudFormation console, choose Stacks in the navigation pane.
2. Locate the stack you created during the deployment process (you assigned a name to it).
3. Choose the stack and choose Delete.
Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.

Conclusion

In this post, we demonstrated how to use Amazon Bedrock Agents and AWS services to build an intelligent ODCR management solution. Combining AWS Resource Explorer for organization-wide visibility, Amazon Bedrock Agents for intelligent recommendations, and a secure front-end powered by AWS Amplify, organizations can now make data-driven decisions about their capacity reservations. This solution helps teams optimize ODCR usage, reduce costs, and efficiently manage compute capacity across their AWS Organization. The artificial intelligence (AI)-powered approach eliminates manual tracking and analysis, allowing teams to focus on strategic capacity planning rather than operational overhead.

Get started today by deploying this solution in your AWS environment, and unlock the power of AI-driven capacity insights for more efficient ODCR management.

PackScan: Building real-time sort center analytics with AWS Services

2025-05-30 Sairam Vangapally

Post Syndicated from Sairam Vangapally original https://aws.amazon.com/blogs/big-data/packscan-building-real-time-sort-center-analytics-with-aws-services/

Amazon manages a complex logistics network with multiple touch points, from fulfillment centers to sort centers to final customer delivery. Among these, sort centers play a crucial role in the middle mile, providing faster and more efficient package movement. Within Amazon’s Middle Mile operations, high-volume sort centers process millions of packages daily, making immediate access to operational data essential for optimizing efficiency and decision-making. Real-time visibility into key metrics—such as package movements, container statuses, and associate productivity—is critical for smooth logistics operations. To address the need for real-time operational planning, the Amazon Middle Mile team developed PackScan, a cloud-based platform designed to provide instant insights across the network. By significantly reducing data latency, PackScan enables proactive decision-making, so teams can monitor inbound package flows, optimize outbound shipments based on live data, track associate productivity, identify bottlenecks, and enhance overall operational efficiency—all in real time.

In this post, we explore how PackScan uses Amazon cloud-based services to drive real-time visibility, improve logistics efficiency, and support the seamless movement of packages across Amazon’s Middle Mile network.

Prerequisites

This post assumes a foundational understanding of the following services and concepts:

Core services such as Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), AWS Lambda, Amazon Data Firehose, and Amazon OpenSearch Service
Infrastructure components like Amazon Elastic Compute Cloud (Amazon EC2), Amazon Load Balancers, and security configurations including network rules and security groups
Visualization tools such as Grafana hosted on Amazon EC2

Although hands-on experience is not required, a conceptual understanding of these services will help in understanding the architecture, design patterns, and components discussed throughout the article.

Business challenges

Amazon’s sort centers handle over 15 million packages daily across more than 120 facilities in North America. Given this scale, even minor delays in operational insights can lead to inefficiencies, increased costs, and escalations. Traditionally, data latencies of up to an hour have restricted the ability to make proactive decisions, directly affecting productivity, resource allocation, and responsiveness—especially during peak periods like holiday seasons and big deal days.

Without immediate visibility into package movements, container statuses, and associate performance, operational teams face challenges in identifying and resolving bottlenecks in real time. The lack of timely insights can disrupt the flow of packages, leading to shipment delays, reduced throughput, and suboptimal facility performance. Addressing these inefficiencies required a solution capable of delivering real-time, high-fidelity data to support rapid decision-making.

To bridge this gap, Amazon’s Middle Mile organization needed a scalable platform that could enhance visibility, minimize latency, and provide up-to-the-minute insights into logistics operations. PackScan was designed to meet these demands, giving teams access to the real-time data necessary to optimize workflows, mitigate bottlenecks, and improve overall efficiency.

Data flow

In 2024, PackScan was deployed across 80 sort centers in the USA, enabling real-time package analytics. The solution powers Grafana dashboards, which refresh every 10 seconds by fetching live package data from OpenSearch Service. With this near real-time visibility, operations teams can monitor package movement and sorting efficiency across sort centers. The following diagram outlines how package scan data is ingested, processed, and made actionable.

Each sort center is equipped with hardware at inbound stations where packages arrive from trailers. Integrated barcode scanners automatically scan each package as it enters the sorting process. Every scan generates an SNS event, capturing key attributes such as the package ID, dimensions, the associate who performed the scan, and the timestamp and location of the scan.

After they’re generated, these SNS events are ingested into Data Firehose through a Lambda function, where the data undergoes real-time enrichment. During this process, additional attributes are appended, including the business logic rules. The enriched data is then streamed into OpenSearch Service, where events are indexed to enable fast and efficient querying. With the indexed package scan events available in OpenSearch Service, real-time analytics and monitoring become possible. The Grafana dashboards query this data every 10 seconds, providing operational insights into package inflow metrics and associate performance.

Solution overview

PackScan was implemented using a structured and scalable approach, using AWS cloud-based services to enable high-frequency data ingestion, real-time processing, and actionable insights. The architecture is designed to minimize latency while providing reliability, scalability, and operational efficiency. The solution is built around a serverless, event-driven architecture that dynamically scales based on data ingestion volumes. The architecture—illustrated in the following figure—enabled us to build a real-time data solution, utilizing the advantages of various AWS services to provide low-latency analytics, high scalability, and real-time operational insights across Amazon’s sort centers.

The following are the key components and features of the solution:

Real-time data processing – Lambda functions serve as the processing backbone of the system, handling 500,000 scan events per second. Each incoming event is processed by applying data transformations, enrichment, and validation before passing it downstream.
High-frequency data ingestion and streaming – Data Firehose is the primary ingestion pipeline, handling millions of scan events daily from thousands of barcode scanners across multiple sort centers. The Firehose streams handle incoming data of 12,000 PUT requests per second, maintaining smooth ingestion and low-latency streaming. Data retention policies are set to buffer and forward enriched events every 60 seconds or upon reaching 5 MB batch size, optimizing storage and processing efficiency.
Optimized querying and operational insights – OpenSearch Service is used to index and store the processed scan events, providing real-time querying and anomaly detection. The OpenSearch cluster consists of 12 data nodes (r5.4xlarge.search) and 3 primary nodes (r5.large.search), processing up to 10 GB of data per day with a rolling index strategy, where indexes are rotated every 24 hours to maintain query performance. The system supports concurrent queries per second, enabling logistics teams to perform rapid lookups and gain instant visibility into package movements.
Live visualization and dashboarding – Grafana, hosted on an m5.12xlarge EC2 instance, provides real-time visualization of key logistics metrics. The dashboards refresh every 10 seconds, querying OpenSearch and displaying up-to-the-minute package analytics. The setup includes multiple preconfigured dashboards, monitoring package flow at different inbound stations, and workforce efficiency. These dashboards support concurrent users, enabling supervisors and associates to track and optimize operations proactively. The following screenshot shows one of the real-time dashboards, with details of package flow by different routes within sort centers.

The entire PackScan architecture is designed for automatic scaling, adjusting dynamically based on data ingestion volume to maintain efficiency during peak and off-peak operations. This approach provides cost-effective resource utilization while maintaining high availability and performance.

Business outcomes

The implementation of PackScan has led to measurable improvements in operational efficiency, workforce productivity, and real-time decision-making across Amazon’s sort centers. By reducing data latency and enabling real-time insights, PackScan has transformed logistics operations in meaningful ways:

Widespread deployment – PackScan was deployed across 80 sort centers, supporting approximately 1,000 display monitors that provide real-time operational insights.
Significant reduction in data latency – Data latency dropped from approximately 1 hour to less than 1 minute, allowing for real-time operational responsiveness and minimizing workflow disruptions.
Proactive operational management – With dynamic workload balancing and instant bottleneck identification, supervisors can now address issues as they arise, leading to smoother operations and fewer escalations.
Boost in workforce productivity – The real-time performance feedback has enhanced associate engagement, resulting in a 25% increase in throughput per hour and 12% reduction in labor hours.

Overall, PackScan has redefined real-time logistics visibility within Amazon’s Middle Mile operations, empowering operational teams with actionable insights, enhanced workforce efficiency, and a data-driven approach to package movement and sort center performance.

Lessons learned and best practices

The deployment and scaling of PackScan provided valuable insights into optimizing real-time logistics visibility. Several key lessons and best practices emerged from this implementation:

Cloud architecture drives efficiency – Adopting Amazon technologies provides seamless scalability, reduced operational overhead, and lower infrastructure costs, while maintaining high reliability. The following table shows an approximate breakdown of monthly service costs observed in production. This is an estimation based on current pricing; we recommend checking the respective AWS service pricing pages to generate the most up-to-date quote. This architecture demonstrates that with combination of provisioned and serverless design, production-ready solutions can be built and scaled at a fraction of the cost of traditional infrastructure.

AWS Service	Description	Estimated Monthly Cost
Amazon EC2	Three EC2 instances of type m5.12xlarge hosting Grafana	$1,700
AWS Lambda	Streams SNS events to Data Firehose	$4,000
Amazon Data Firehose	Real-time data delivery with 12,000 records streaming to OpenSearch Service	$1,500
Amazon OpenSearch Service	Indexing and querying package scan events	$28,000

Real-time visibility is a game changer – Immediate access to operational data enhances agility, enabling teams to make timely, data-driven decisions that prevent bottlenecks and improve throughput.
Continuous monitoring enhances decision-making – Operational dashboards should evolve with business needs. Regular monitoring and updates provide accuracy, usability, and relevance in driving informed decision-making.

By applying these best practices, PackScan has set a foundation for scalable, real-time logistics management, making sure that Amazon’s Middle Mile operations remain proactive, efficient, and highly responsive to changing business demands.

Conclusion

PackScan has successfully transformed real-time operational visibility within Amazon’s sort centers, addressing critical challenges in data latency, workforce productivity, and logistics efficiency. By using AWS services, particularly Data Firehose for real-time data delivery and OpenSearch Service for analytics, PackScan has enabled proactive decision-making, streamlined operations, and enhanced throughput in high-volume sort environments. Looking ahead, future enhancements will focus on further elevating operational intelligence and scalability, including:

Integrating predictive analytics to anticipate workflow bottlenecks and optimize resource allocation
Scaling the solution across additional operational scenarios, providing greater resilience and adaptability to dynamic logistics environments

With these advancements, PackScan will continue to drive operational excellence, cost-efficiency, and real-time decision-making capabilities, reinforcing Amazon’s commitment to innovation in logistics and supply chain management.

For those interested in implementing similar solutions, we recommend exploring AWS Serverless Architecture Patterns and the AWS Architecture Blog for additional insights and best practices in building scalable, real-time analytics solutions.

About the authors

Sairam Vangapally is a Data Engineer at Amazon with extensive experience architecting real-time, large-scale data platforms that power critical logistics operations across North America. He has led the design and deployment of end-to-end data pipelines, enabling high-throughput ingestion, transformation, and analytics at scale. He is passionate about building resilient data infrastructure and driving cross-functional collaboration to deliver solutions that accelerate operational insights and business impact.

Nitin Goyal serves as a Data Engineering Manager in Amazon’s Sort Center organization, where he leads initiatives to optimize operational efficiency across North American facilities. With over nine years of tenure at Amazon spanning multiple teams, he specializes in architecting high-performance data systems, with particular emphasis on real-time streaming pipelines, artificial intelligence, and low-latency solutions. His expertise drives the development of sophisticated operational workflows that enhance sort center productivity and effectiveness.

How to secure your instances with multi-factor authentication

2025-05-23 Sangavi P

Post Syndicated from Sangavi P original https://aws.amazon.com/blogs/compute/how-to-secure-your-instances-with-multi-factor-authentication/

At AWS, security is our top priority. We strongly recommend implementing comprehensive security controls across all application layers to ensure defense-in-depth. This multi-layered approach helps protect your workloads, data, and infrastructure from potential threats. In this post, we walk through implementing an additional layer of authentication security for your Amazon Linux 2023 (AL2023) Amazon Elastic Compute Cloud (Amazon EC2) instances by using two-factor authentication while connecting to the instance through Secure Shell (SSH).

SSH Access Security Fundamentals

The most common tool to connect to Linux servers is SSH. When an EC2 instance is launched, you are prompted to either create a new key pair or use an existing key pair to connect to the EC2 instance using SSH. The key pair is a combination of a private and public key, where the public key is stored within the instance in the ~/.ssh/authorized_keys file, while the private key is stored in the user’s machine. A compromised local machine containing SSH private keys poses a significant security risk. An attacker who obtains both the private key and the corresponding EC2 instance username could gain unauthorized SSH access to your instance with the same permissions as the compromised user account. To prevent anyone with the private keypair from accessing the instance you should implement two-factor auth through multi-factor authentication (MFA).

Configuring security groups to allow unrestricted SSH access (0.0.0.0/0) to EC2 instances’ public IP addresses creates a significant security vulnerability. This practice exposes your instances to potential brute force attacks and unauthorized access attempts from anywhere on the internet. To overcome this, we either recommend a user restricts the access to only “My IP” in the security group, or to have a bastion host or a jump box in front of your instances and access your instances through your bastion host. Implementing MFA on top of this would tighten the security while accessing the instances through SSH.

The following figure shows a common architecture anti-pattern:

The following figure shows the recommended architectures:

Prerequisites

Before proceeding, install the Google Authenticator app on your mobile device – you’ll need it to generate one-time passwords (OTPs) for Multi-Factor Authentication (MFA) on your Amazon Linux 2023 instances.

Configuration Steps

Install Google Authenticator in the EC2 instance

$ sudo dnf install google-authenticator qrencode -y

Configuring Google Authenticator

After installing the package, the application has to be initialized to generate a key for the user that you are logged in as (for example ec2-user) to add the authentication to that user account. Execute the following command to initialize the application.

google-authenticator

You are asked if the authentication tokens used should be time-based. In this example, you use time-based tokens.

Do you want authentication tokens to be time-based (y/n) y

This generates a QR code that you should scan using your Google Authenticator app. Then, enter the verification code from the application in the terminal after scanning, or manually enter the account name, which is any name, to recognize the instance and the secret key displayed in the terminal in the application to register your instance. After confirming the code, the system will generate emergency scratch codes. Store these codes securely – they serve as backup authentication methods if you lose access to your Google Authenticator app or mobile device. Each scratch code can be used only once.

After registering the instance details in the mobile application through QR code or manual operation, in the SSH terminal you are asked if the google_authenticator file should be updated for user ec2-user. Typing y saves the secret key, scratch codes, and the other configuration options that you select later on in the file. Run the initialization app and go through the same procedure for each user account to enable MFA on each account.

Do you want me to update your "/home/ec2-user/.google_authenticator" file (y/n) y

Choose y for the following question to refuse multiple uses of the same authentication token to enhance security and prevent a man-in-the-middle attack.

Do you want to disallow multiple uses of the same authentication token? 
This restricts you to one login about every 30s, but it increases your chances 
to notice or even prevent man-in-the-middle attacks (y/n) y

Choose n for the following question to have three valid codes in a 1:30-minute window unless you are facing issues.

By default, a new token is generated every 30 seconds by the mobile app.
In order to compensate for possible time-skew between the client and the server, 
we allow an extra token before and after the current time. 
This allows for a time skew of up to 30 seconds between authentication server and client. 
If you experience problems with poor time synchronization, 
you can increase the window from its default size of 3 permitted codes 
(one previous code, the current code, the next code) to 17 permitted codes 
(the 8 previous codes, the current code, and the 8 next codes). 
This will permit for a time skew of up to 4 minutes between client and server. 
Do you want to do so? (y/n) n

Choose y for the following question to enable rate-limiting to protect against brute-force logic attempts.

If the computer that you are logging into isn't hardened against brute-force 
login attempts, you can enable rate-limiting for the authentication module.
By default, this limits attackers to no more than 3 login attempts every 30s. 
Do you want to enable rate-limiting (y/n) y

Configure SSH to use the Google Pluggable Authentication module

By default PAM is not configured to use MFA for SSH connections. Now that you have the MFA module configured and running, you must modify the PAM configurations to use MFA authentication.

sudo vi /etc/pam.d/sshd

There are two options for configuration to choose from based on your requirement.

Option 1: Configuring MFA for all users in the instance

Add the following to the bottom of the file to use Google Authenticator to enforce MFA for SSH connections. When this option is enabled, the system will prompt all users for MFA during SSH connections, regardless of whether MFA has been configured for their individual accounts. This applies to every user attempting to access the instance, which may disrupt access for users without MFA configuration.

Important: Before implementing this setting, ensure all users who need instance access have properly configured MFA to prevent potential lockouts.

auth required pam_google_authenticator.so 
auth required pam_permit.so

Option 2: Configuring for specific users without affecting others

If there are other service accounts or users within the instance that should be able to log in without MFA, then add nullok at the end of the following statement. This means that users who don’t run Google Authenticator initialization won’t be asked for a second authentication.

auth required pam_google_authenticator.so nullok 
auth required pam_permit.so

Comment out the password requirement, as SSH key pairs provide stronger security than password-based authentication. In this configuration, users will need both an SSH key and a verification code from Google Authenticator to establish an SSH connection – eliminating password prompts while maintaining robust security through two-factor authentication. Note that if you leave password authentication enabled (by not commenting out this line), users will be required to provide three factors: their SSH key, password and MFA code to access the instance.

#auth       substack     password-auth

Save the file. You must change the SSH configuration to make it prompt for a second authentication. Run the following command to make changes in the configuration file.

sudo vi /etc/ssh/sshd_config.d/50-redhat.conf

Then, change ChallengeResponseAuthentication no to ChallengeResponseAuthentication yes.

Lastly, you must let SSH know that it should use key pair along with interactive authentication for the MFA module to login. At the bottom of the file, add the following:

AuthenticationMethods publickey,keyboard-interactive

Save the file. Restart the SSH service running in the instance to let the changes take effect. Restarting the SSH service (the SSH daemon) stops the main sshd process and starts a new one, but it doesn’t disconnect existing SSH sessions.

sudo /etc/init.d/sshd restart

To test the solution, open a new terminal window and SSH into the instance, and you are asked for a verification code. Keep your session in the original terminal window open while you SSH from your new window.

Type the code that’s generated on your Google Authenticator app and you are logged in to your instance.

Using MFA allows to add a further layer of security to Amazon Linux 2023 EC2 instances while logging in.

Cleanup

To avoid incurring charges, please stop or terminate the launched Amazon EC2 instance if not in use.

Conclusion

In this post, you learned how to enhance your Amazon Linux 2023 EC2 instance security by implementing multi-factor authentication (MFA) with Google Authenticator. This setup requires users to provide both their SSH key pair and a time-based one-time password from their application when connecting to instances, adding an essential extra layer of protection.

New Amazon EC2 P6-B200 instances powered by NVIDIA Blackwell GPUs to accelerate AI innovations

2025-05-16 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-p6-b200-instances-powered-by-nvidia-blackwell-gpus-to-accelerate-ai-innovations/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P6-B200 instances powered by NVIDIA B200 to address customer needs for high performance and scalability in artificial intelligence (AI), machine learning (ML), and high performance computing (HPC) applications.

Amazon EC2 P6-B200 instances accelerate a broad range of GPU-enabled workloads but are especially well-suited for large-scale distributed AI training and inferencing for foundation models (FMs) with reinforcement learning (RL) and distillation, multimodal training and inference, and HPC applications such as climate modeling, drug discovery, seismic analysis, and insurance risk modeling.

When combined with Elastic Fabric Adapter (EFAv4) networking, hyperscale clustering by EC2 UltraClusters, and advanced virtualization and security capabilities by AWS Nitro System, you can train and serve FMs with increased speed, scale, and security. These instances also deliver up to two times the performance for AI training (time to train) and inference (tokens/sec) compared to EC2 P5en instances.

You can accelerate time-to-market for training FMs and deliver faster inference throughput, which lowers inference cost and helps increase adoption of generative AI applications as well as increased processing performance for HPC applications.

EC2 P6-B200 instances specifications
New EC2 P6-B200 instances provide eight NVIDIA B200 GPUs with 1440 GB of high bandwidth GPU memory, 5th Generation Intel Xeon Scalable processors (Emerald Rapids), 2 TiB of system memory, and 30 TB of local NVMe storage.

Here are the specs for EC2 P6-B200 instances:

Instance size	GPUs (NVIDIA B200)	GPU memory (GB)	vCPUs	GPU Peer to peer (GB/s)	Instance storage (TB)	Network bandwidth (Gbps)	EBS bandwidth (Gbps)
P6-b200.48xlarge	8	1440 HBM3e	192	1800	8 x 3.84 NVMe SSD	8 x 400	100

These instances feature up to 125 percent improvement in GPU TFLOPs, 27 percent increase in GPU memory size, and 60 percent increase in GPU memory bandwidth compared to P5en instances.

P6-B200 instances in action
You can use P6-B200 instances in the US West (Oregon) AWS Region through EC2 Capacity Blocks for ML. To reserve your EC2 Capacity Blocks, choose Capacity Reservations on the Amazon EC2 console.

Select Purchase Capacity Blocks for ML and then choose your total capacity and specify how long you need the EC2 Capacity Block for p6-b200.48xlarge instances. The total number of days that you can reserve EC2 Capacity Blocks is 1-14 days, 21 days, 28 days, or multiples of 7 up to 182 days. You can choose your earliest start date for up to 8 weeks in advance.

Now, your EC2 Capacity Block will be scheduled successfully. The total price of an EC2 Capacity Block is charged up front, and the price doesn’t change after purchase. The payment will be billed to your account within 12 hours after you purchase the EC2 Capacity Blocks. To learn more, visit Capacity Blocks for ML in the Amazon EC2 User Guide.

When launching P6-B200 instances, you can use AWS Deep Learning AMIs (DLAMI) to support EC2 P6-B200 instances. DLAMI provides ML practitioners and researchers with the infrastructure and tools to quickly build scalable, secure, distributed ML applications in preconfigured environments.

To run instances, you can use AWS Management Console, AWS Command Line Interface (AWS CLI) or AWS SDKs.

You can integrate EC2 P6-B200 instances seamlessly with various AWS managed services such as Amazon Elastic Kubernetes Services (Amazon EKS), Amazon Simple Storage Service (Amazon S3), and Amazon FSx for Lustre. Support for Amazon SageMaker HyperPod is also coming soon.

Now available
Amazon EC2 P6-B200 instances are available today in the US West (Oregon) Region and can be purchased as EC2 Capacity blocks for ML.

Give Amazon EC2 P6-B200 instances a try in the Amazon EC2 console. To learn more, refer to the Amazon EC2 P6 instance page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Channy

How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

Enhanced remote desktop experience: Amazon DCV with Amazon Linux 2023

2025-05-13 Madhur Kulkarni

Post Syndicated from Madhur Kulkarni original https://aws.amazon.com/blogs/compute/enhanced-remote-desktop-experience-amazon-dcv-with-amazon-linux-2023/

Amazon DCV has evolved as a powerful remote display protocol, enabling secure high-performance remote desktop access and application streaming. This blog talks about how DCV remote display capabilities are now integrated with Amazon Linux 2023 (AL2023).

Overview

This post introduces new Graphical Desktop with AL2023 and provides an overview of new features available through DCV. The Graphical Desktop comes with GNOME 47 for a smooth UI experience that you can connect using DCV, enabling remote desktop access from anywhere. It also provides an overview of more tools such as a terminal emulator with Ptyxis for improved CLI experience, Mozilla Firefox for secure web browsing, an image viewer with Loupe, a text editor, and a file manager for file navigation, and.

Core features

AL2023 introduces an enhanced desktop experience, specifically tailored for remote access needs, as shown in the following Figure-1. DCV technology allows you to connect seamlessly to Graphical Desktop interface with GNOME 47. Users benefit from native DCV protocol support that enables high-performance remote access, featuring dynamic resolution adaptation and hardware-accelerated video encoding. Enhanced security features include advanced encryption and granular access controls.

Although DCV supports multiple desktop environments, the use of GNOME 47 is specifically part of the AL23 current release. The GNOME 47 system uses Mutter 47.0 as its window manager and compositor, alongside the GTK 4 toolkit for its user interface. This includes window management capabilities that provide more precise control over application placement and sizing, while improved multi-monitor support makes sure that workspaces expand seamlessly across displays. Most importantly, there is a native desktop-like experience with the DCV local features such as clipboard sharing, audio redirection, and multi-monitor capabilities, which deliver a seamless and responsive remote environment.

Figure 1. DCV desktop interface with AL2023

Ptyxis delivers exceptional performance with SSH, SFTP, and TLS/SSL support, as shown in the following figure. You can experience GPU-accelerated text rendering with a crystal-clear display, supporting UTF-8/16 and Unicode 15.0, while maintaining minimal input lag at 60 Hz refresh rates. The DCV 4K support enables high-resolution (3840 x 2160 pixels) remote desktop streaming, which allows users to work with graphically intensive applications while maintaining excellent visual quality. Ptyxis is deeply integrated with GNOME through D-Bus and GNOME I/O (GIO) interfaces, providing access to global search and system notifications. Users can use advanced session management with JSON-based configurations, tab groups, split views up to 16 panes, and automatic session restoration. The terminal includes full 256-color and true-color support, compatible with Bash, Zsh, and Fish shells, while maintaining robust connection stability.

Figure 2. Terminal Emulator

Firefox in AL2023 is optimized specifically for remote desktop scenarios, as shown in the following figure. The browser features hardware-accelerated rendering and WebGL 2.0 support, delivering smooth graphics and responsive page loading. Enhanced browser capabilities provide better performance for 3D applications and interactive web content. Users can experience optimized video playback with minimal frame drops and improved synchronization, which are particularly important for remote streaming needs. Integration with DCV streaming technology enables efficient resource usage and provides a local-like experience when accessing remote workstations, featuring seamless audio-video synchronization, smooth multi-monitor support, and native peripheral device integration.

Figure 3. Mozilla Firefox with DCV

More features

The GNOME Text Editor seamlessly integrates with AL2023, providing a modern, distraction-free interface for coding and text editing within the DCV environment, as shown in the following figure. As the default text editor in the AL23 GNOME desktop, it offers essential features such as syntax highlighting, dark/light themes, and autosave functionality, making it ideal for remote development work.

Figure 4. GNOME Text Editor

Loupe offers a sleek and intuitive image viewing experience in AL2023 when accessed through DCV, as shown in the following figure. It features a clean interface with smooth animations, efficient image loading, and gesture support, all while maintaining responsive performance over the DCV remote desktop connection. This makes it ideal for viewing and basic image manipulation tasks.

Figure 5. Image Viewer features and options

The GNOME File manager in AL2023 provides a robust and intuitive interface for managing files and folders when accessed through the DCV remote desktop environment, as shown in the following figure. It offers essential features such as drag-and-drop functionality, list and grid views, file search, and seamless integration with cloud storage, all while maintaining responsive performance over the DCV optimized remote connection protocol. You can use DCV to upload files to and download files from DCV session storage. For instructions on how to enable and configure session storage, go to Enabling session storage in the DCV Administrator Guide.

Figure 6. File Manager

Conclusion

The Amazon DCV team is committed to delivering the best remote desktop experience possible, and these enhancements demonstrate that commitment. In this post, we demonstrated how our integrated solution, from the GNOME 47 intuitive interface to the powerful terminal capabilities of Ptyxis, creates a seamless remote workspace. Using these improvements allows you to enhance productivity and overall user experience in remote desktop environments. These enhanced capabilities offer a significant step forward in remote computing, thereby providing tools and optimizations designed to meet the evolving needs of the distributed and flexible work environments today.

For a deeper dive into setup and advanced configurations, you should review our comprehensive DCV admin guides, which provide detailed information to help you maximize the potential of these new features.

AWS Weekly Roundup: South America expansion, Q Developer in OpenSearch, and more (May 12, 2025)

2025-05-12 Micah Walter

Post Syndicated from Micah Walter original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-south-america-expansion-q-developer-in-opensearch-and-more-may-12-2025/

I’ve always been fascinated by how quickly we’re able to stand up new Regions and Availability Zones at AWS. Today there are 36 launched Regions and 114 launched Availability Zones. That’s amazing!

This past week at AWS was marked by significant expansion to our global infrastructure. The announcement of a new Region in the works for South America means customers will have more options for meeting their low latency and data residency requirements. Alongside the expansion, AWS announced the availability of numerous instance types in additional Regions.

In addition to the infrastructure expansion, AWS is also expanding the reach of Amazon Q Developer into Amazon OpenSearch Service.

Last week’s launches

Introducing Amazon Q Developer in Amazon OpenSearch Service — Amazon Q Developer is now integrated with Amazon OpenSearch Service, providing AI-powered assistance to help developers quickly build search applications with relevant code suggestions and troubleshooting guidance.
In the works – AWS South America (Chile) Region — AWS has announced plans for a new South America (Chile) Region, expanding its global infrastructure to provide lower latency and enhanced data residency options for customers in that geographic region.
Introducing the AWS Zero Trust Accelerator for Government — The new AWS Zero Trust Accelerator for Government helps public sector organizations implement zero trust security architectures more efficiently, with specialized tools and guidance for compliance requirements.
Accelerate the transfer of data from an Amazon EBS snapshot to a new EBS volume — AWS has announced the general availability of Amazon EBS Provisioned Rate for Volume Initialization, a new feature that allows users to accelerate data transfer from EBS snapshots to new EBS volumes at specified rates between 100 MiB/s and 300 MiB/s.

Instance announcements

AWS expanded instance availability for an array of instance types across additional Regions.

Amazon EC2 C8gd, M8gd, and R8gd instances are now available in AWS Region Europe (Frankfurt) — Amazon EC2 C8gd, M8gd, and R8gd instances are now available in the AWS Europe (Frankfurt) Region, bringing Graviton-powered compute instances with local NVMe-based SSD storage to more customers.
Amazon EC2 R7g instances are now available in additional AWS Regions — Amazon EC2 R7g memory-optimized instances powered by AWS Graviton3 processors are now available in additional AWS Regions, expanding deployment options for memory-intensive workloads.
Amazon EC2 R7g instances are now available in AWS GovCloud (US-East) — Amazon EC2 R7g instances are now available in AWS GovCloud (US-East), bringing high-performance, memory-optimized compute options to government and regulated workloads.
Amazon EC2 X2idn instances now available in AWS Israel (Tel Aviv) Region — Amazon EC2 X2idn memory-optimized instances with Intel Xeon Scalable processors are now available in the AWS Israel (Tel Aviv) Region, supporting in-memory databases and high-performance computing workloads.
Amazon EC2 P5en instances are now available in the AWS US West (N. California) Region — Amazon EC2 P5en instances, designed for high-performance ML training and generative AI inferencing, are now available in the AWS US West (N. California) Region.

Additional updates

AWS Systems Manager adds customization options for onboarding configuration — AWS Systems Manager now offers expanded customization options for onboarding configuration, giving administrators more flexibility when setting up and managing their resources.
Amazon MSK enables seamless certificate renewals on MSK Provisioned clusters — Amazon MSK enables seamless certificate renewals on MSK Provisioned clusters, simplifying security management for Apache Kafka environments.
AWS Resource Explorer supports 41 new resource types — AWS Resource Explorer has expanded to support 41 new resource types, making it easier to discover and visualize more AWS resources across your accounts.
Amazon Managed Service for Prometheus now available in the AWS Canada (Central) Region — Amazon Managed Service for Prometheus is now available in the AWS Canada (Central) Region, offering container monitoring capabilities to more customers in North America.
AWS End User Messaging is now available in Mexico (Central) — AWS End User Messaging is now available in Mexico (Central), allowing businesses to send SMS, voice, and email messages to customers in the geographic region.
Amazon WorkSpaces is now available in AWS Europe (Paris) Region Amazon WorkSpaces, AWS’s managed desktop virtualization service, is now available in AWS Europe (Paris) Region, expanding virtual desktop infrastructure options for European customers.

Upcoming events

We are in the middle of AWS Summit season! AWS Summits run throughout the summer in cities all around the world. Be sure to check the calendar to find out when a AWS Summit is happening near you. Here are the remaining Summits for May, 2025.

Seoul, South Korea – May 14th – 15th
Dubai, United Arab Emirates – May 21st, 2025
Tel Aviv, Israel – May 28th, 2025
Singapore – May 29th, 2025

How is the News Blog doing? Take this 1 minute survey!

Announcing second-generation AWS Outposts racks with breakthrough performance and scalability on-premises

2025-04-29 Micah Walter

Post Syndicated from Micah Walter original https://aws.amazon.com/blogs/aws/announcing-second-generation-aws-outposts-racks-with-breakthrough-performance-and-scalability-on-premises/

Today we’re announcing the general availability of second-generation AWS Outposts racks, which marks the latest innovation from AWS for edge computing. This new generation includes support for the latest x86-powered Amazon Elastic Compute Cloud (Amazon EC2) instances, new simplified network scaling and configuration, and accelerated networking instances designed specifically for ultra-low latency and high-throughput workloads. These enhancements deliver greater performance for a broad range of on-premises workloads, such as core trading systems of financial services and telecom 5G Core workloads.

Customers like athenahealth, FanDuel, First Abu Dhabi Bank, Mercado Libre, Liberty Latin America, Riot Games, Vector Limited, and Wiwynn are already using Outposts racks for workloads that need to stay on-premises. The second-generation Outposts rack can provide low latency, local data processing, or data residency needs, such as game servers for multi-player online games, customer transaction data, medical records, industrial and manufacturing control systems, telecom Business Support Systems (BSS), and edge inference of a variety of machine learning (ML) models. Customers can now take advantage of the latest generation of processors and more advanced configurations of Outposts racks to support faster processing, higher memory capacity, and increased network bandwidth.

Latest generation EC2 instances

We’re excited to announce local support for the latest generation (7th generation) of x86-powered Amazon EC2 instances on AWS Outposts racks, starting with C7i compute-optimized instances, M7i general-purpose instances, and R7i memory-optimized instances. These new instances deliver twice the vCPU, memory, and network bandwidth while providing up to 40% better performance compared to C5, M5, and R5 instances on previous generation Outposts racks. They are powered by 4th Gen Intel Xeon Scalable processors and are ideal for a broad range of on-premises workloads requiring enhanced performance such as larger databases, more memory-intensive applications, advanced real-time big data analytics, high-performance video encoding and streaming, and CPU-based edge inference with more sophisticated ML models. Support for more latest generation EC2 instances, including GPU-enabled instances, is coming soon.

Simplified network scaling and configuration

We’ve completely reimagined networking in our latest Outposts generation, making it simpler and more scalable than ever. At the heart of this upgrade is our new Outposts network rack, which acts as a central hub for all your compute and storage traffic.

This new design brings three major benefits to the table. First, you can now scale your compute resources independently from your networking infrastructure, giving you more flexibility and cost efficiency as your workloads grow. Second, we’ve built in network resilience from the ground up, with the network rack automatically handling device failures to keep your systems running smoothly. Third, connecting to your on-premises environment and AWS Regions is now a breeze – you can configure everything from IP addresses to VLAN and BGP settings through straightforward APIs or our updated console interface.

Specialized Amazon EC2 instances with accelerated networking

We’re introducing a new category of specialized Amazon EC2 instances on Outposts racks with accelerated networking. These instances are purpose built for the most latency-sensitive, compute-intensive, and throughput-intensive mission-critical workloads on-premises. To deliver the best possible performance, in addition to the Outpost logical network, these instances feature a secondary physical network with network accelerator cards connected to top-of-rack (TOR) switches.

First in this category are bmn-sf2e instances, designed for ultra-low latency with deterministic performance. The new instances run on Intel’s latest Sapphire Rapids processors (4th Gen Xeon Scalable), delivering 3.9 GHz sustained performance across all cores with generous memory allocation – 8GB of RAM for every CPU core. We’ve equipped bmn-sf2e instances with AMD Solarflare X2522 network cards that connect directly to top-of-rack switches.

For financial services customers, especially capital market firms, these instances offer deterministic networking through native Layer 2 (L2) multicast, precision time protocol (PTP), and equal cable lengths. This enables customers to meet regulatory requirements around fair trading and equal access while easily connecting to their existing trading infrastructure.

Instance Name	vCPUs	Memory (DDR5)	Network Bandwidth	NVMe SSD Storage	Accelerated Network Cards	Accelerated Bandwidth (Gbps)
bmn-sf2e.metal-16xl	64	512 GiB	25 Gbps	2 x 8 TB (16 TB)	2	100
bmn-sf2e.metal-32xl	128	1024 GiB	50 Gbps	4 x 8 TB (32 TB)	4	200

The second instance type, bmn-cx2, is optimized for high throughput and low latency. This instance features NVIDIA ConnectX-7 400G NICs physically connected to high-speed top-of-rack switches, delivering up to 800 Gbps bare metal network bandwidth operating at near line rate. With native Layer 2 (L2) multicast and hardware PTP support, this instance is ideal for high-throughput workloads like real-time market data distribution, risk analytics, and telecom 5G core network applications.

Instance Name	vCPUs	Memory (DDR5)	Network Bandwidth	NVMe SSD Storage	Accelerated Network Cards	Accelerated Bandwidth (Gbps)
bmn-cx2.metal-48xl	192	1536 GiB	50 Gbps	4 x 4 TB (16 TB)	2	800

Bottom line, the new generation of Outposts racks deliver enhanced performance, scalability, and resiliency for a broad range of on-premises workloads, even for mission-critical workloads with the most stringent latency and throughput requirements. You can make your selection and initiate your order from the AWS Management Console. The new instances maintain consistency with regional deployments by supporting the same APIs, AWS Management Console, automation, governance policies, and security controls in the cloud and on-premises, improving developer productivity and IT efficiency.

Things to know

At launch, second-generation Outposts racks can be shipped to US and Canada and be parented back to 6 AWS Regions including US East (N. Virginia and Ohio), US West (Oregon), EU West (London and France) and Asia Pacific (Singapore). Support for more countries and territories and AWS Regions is coming soon. At launch, second-generation Outposts racks locally support a subset of AWS services found in previous generation Outposts racks. Support for more EC2 instance types and more AWS services is coming soon.

To learn more, visit the AWS Outposts racks product page and user guide. You can also talk to an Outposts expert if you are ready to discuss your on-premises needs.

— Micah;

How is the News Blog doing? Take this 1 minute survey!

Instance type flexibility strategies

Regional adaptation techniques

Best practices for EC2 Spot Instances usage with Graviton-based instances

Attribute-based instance type selection

Special considerations

Performance testing with multiple instance types

Reserving specific Graviton instance types

Amazon EMR workloads

Conclusion

Solution overview

AWS Local Zones: The foundation

Amazon EMR with Apache Flink: The decision engine

Architectural component choice considerations

Prerequisites

Deploy Amazon EMR on a Local Zone

Performance and scaling considerations

Security considerations

Benefits of Amazon EMR

Clean up

Conclusion

About the authors

The Nitro Hypervisor: Purpose-built for security

Understanding transient execution vulnerabilities

Half-Spectre gadgets: Incomplete but dangerous code patterns

Intel L1TF: Leveraging speculative address translation

The L1TF Reloaded attack: Exploiting mitigation gaps using Spectre

Secret hiding: Rethinking hypervisor memory architecture

Beyond memory: Protecting guest CPU context

Applying secret hiding principles to Xen

Defense in depth: The Nitro Hypervisor’s proven security model

Solution overview

Prerequisites

Migrate an Oracle database keystore to a CloudHSM external keystore

Configure the Oracle wallet location

Point the Oracle database to use a local file-based keystore and CloudHSM

Verify that the keystore file-based wallet is open

Migrate the encryption key to CloudHSM

Verify that the migration is complete

Setup auto-login

Key rotation

When to rotate TDE keys

Oracle TDE primary key rotation with an HSM key

Conclusion

Solution overview

How the solution functions

Prerequisites

Deploy solution resources using CloudFormation

Resources deployed by the CloudFormation template

Deploy the Amplify application for front-end

Solution walkthrough

Cleaning up

Conclusion

Prerequisites

Business challenges

Data flow

Solution overview

Business outcomes

Lessons learned and best practices

Conclusion

About the authors

SSH Access Security Fundamentals

Prerequisites

Configuration Steps

Install Google Authenticator in the EC2 instance

Configuring Google Authenticator

Configure SSH to use the Google Pluggable Authentication module

Option 1: Configuring MFA for all users in the instance

Option 2: Configuring for specific users without affecting others

Cleanup

Conclusion

Overview

Core features

More features

Conclusion

The collective thoughts of the interwebz