Tag Archives: Compute

New general-purpose Amazon EC2 M8i and M8i Flex instances are now available

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-general-purpose-amazon-ec2-m8i-and-m8i-flex-instances-are-now-available/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) general-purpose M8i and M8i-Flex instances powered by custom Intel Xeon 6 processors available only on AWS with sustained all-core 3.9 GHz turbo frequency. These instances deliver the highest performance and fastest memory bandwidth among comparable Intel processors in the cloud. They also deliver up to 15 percent better price performance, up to 20 percent higher performance, and 2.5 times more memory bandwidth compared to previous generation M7i and M7i-Flex instances.

M8i and M8i-Flex instances are ideal for running general-purpose workloads such as web application servers, virtual desktops, batch processing, microservices, databases, and enterprise applications. In terms of performance, these instances are up to 60 percent faster for NGINX web applications, up to 30 percent faster for PostgreSQL database workloads, and up to 40 percent faster for AI deep learning recommendation models compared to M7i and M7i-Flex instances.

Like R8i and R8i-Flex instances, these instances use the new sixth generation AWS Nitro Cards, delivering up to two times more network and Amazon Elastic Block Store (Amazon EBS) bandwidth compared to the previous generation instances. This greatly improves network throughput for workloads handling small packets, such as web, application, and gaming servers. They also support bandwidth configuration with 25 percent allocation adjustments between network and Amazon EBS bandwidth, enabling better database performance, query processing, and logging speeds.

M8i instances
M8i instances provide up to 384 vCPUs and 1.5 TB of memory, including bare metal instances that provide dedicated access to the underlying physical hardware. These SAP-certified instances help you run large application servers and databases, gaming servers, CPU-based inference, and video streaming workloads that need the largest instance sizes or continuously high CPU usage.

Here are the specs for M8i instances:

Instance size vCPUs Memory (GiB) Network bandwidth (Gbps) EBS bandwidth (Gbps)
m8i.large 2 8 Up to 12.5 Up to 10
m8i.xlarge 4 16 Up to 12.5 Up to 10
m8i.2xlarge 8 32 Up to 15 Up to 10
m8i.4xlarge 16 64 Up to 15 Up to 10
m8i.8xlarge 32 128 15 10
m8i.12xlarge 48 192 22.5 15
m8i.16xlarge 64 256 30 20
m8i.24xlarge 96 384 40 30
m8i.32xlarge 128 512 50 40
m8i.48xlarge 192 768 75 60
m8i.96xlarge 384 1536 100 80
m8i.metal-48xl 192 768 75 60
m8i.metal-96xl 384 1536 100 80
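
To make the table easier to act on, here is a minimal Python sketch that picks the smallest virtualized M8i size covering a given vCPU and memory requirement. The vCPU and memory figures are copied from the table above; the helper name `smallest_m8i_size` is hypothetical, and the bare metal sizes are left out for simplicity.

```python
# (vCPUs, memory in GiB) per size, copied from the M8i table above.
M8I_SIZES = {
    "m8i.large": (2, 8),
    "m8i.xlarge": (4, 16),
    "m8i.2xlarge": (8, 32),
    "m8i.4xlarge": (16, 64),
    "m8i.8xlarge": (32, 128),
    "m8i.12xlarge": (48, 192),
    "m8i.16xlarge": (64, 256),
    "m8i.24xlarge": (96, 384),
    "m8i.32xlarge": (128, 512),
    "m8i.48xlarge": (192, 768),
    "m8i.96xlarge": (384, 1536),
}


def smallest_m8i_size(min_vcpus, min_memory_gib):
    """Return the smallest listed size that meets both requirements, or None."""
    candidates = [
        (vcpus, name)
        for name, (vcpus, memory) in M8I_SIZES.items()
        if vcpus >= min_vcpus and memory >= min_memory_gib
    ]
    return min(candidates)[1] if candidates else None


print(smallest_m8i_size(10, 100))  # m8i.8xlarge
```

A workload needing 10 vCPUs and 100 GiB lands on m8i.8xlarge because memory, not vCPU count, is the binding constraint.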

M8i-Flex instances
M8i-Flex instances are a lower-cost variant of the M8i instances, with 5 percent better price performance at 5 percent lower prices. They’re designed for workloads that benefit from the latest generation performance but don’t fully utilize all compute resources. These instances can reach full CPU performance 95 percent of the time.
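
One way to read the two 5 percent figures together is as a small worked calculation. The sketch below assumes price performance means performance divided by price (the post does not define it) and that the workload gets the same performance as on M8i; the function name is hypothetical.

```python
def price_performance_gain(relative_performance, relative_price):
    """Fractional change in performance per dollar versus a 1.0 / 1.0 baseline."""
    return relative_performance / relative_price - 1.0


# Assumed scenario: same performance as M8i at a 5 percent lower price.
gain = price_performance_gain(relative_performance=1.0, relative_price=0.95)
print(f"{gain:.1%}")  # 5.3%
```

Under those assumptions, a 5 percent price cut alone accounts for roughly the stated 5 percent price performance improvement.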

Here are the specs for the M8i-Flex instances:

Instance size vCPUs Memory (GiB) Network bandwidth (Gbps) EBS bandwidth (Gbps)
m8i-flex.large 2 8 Up to 12.5 Up to 10
m8i-flex.xlarge 4 16 Up to 12.5 Up to 10
m8i-flex.2xlarge 8 32 Up to 15 Up to 10
m8i-flex.4xlarge 16 64 Up to 15 Up to 10
m8i-flex.8xlarge 32 128 Up to 15 Up to 10
m8i-flex.12xlarge 48 192 Up to 22.5 Up to 15
m8i-flex.16xlarge 64 256 Up to 30 Up to 20

If you’re currently using earlier generations of general-purpose instances, you can adopt M8i-Flex instances without having to make changes to your application or your workload.
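
Because no application changes are needed, moving to M8i-Flex can be as small as changing the instance type in a launch request. The following is a minimal sketch using the boto3 `run_instances` call; the AMI ID is a placeholder, the helper names are hypothetical, and the Region is one of those listed in this post.

```python
def build_launch_request(ami_id, instance_type="m8i-flex.large"):
    """Build keyword arguments for the EC2 RunInstances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(ami_id, region="us-east-1"):
    """Launch one instance; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the request builder works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_launch_request(ami_id))
    return response["Instances"][0]["InstanceId"]


# "ami-EXAMPLE" is a placeholder, not a real image ID.
request = build_launch_request("ami-EXAMPLE")
print(request["InstanceType"])
```

Calling `launch` creates a billable instance, so the example only builds and prints the request.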

Now available
Amazon EC2 M8i and M8i-Flex instances are available today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Spain) AWS Regions. M8i and M8i-Flex instances can be purchased as On-Demand, Savings Plan, and Spot instances. M8i instances are also available in Dedicated Instances and Dedicated Hosts. To learn more, visit the Amazon EC2 Pricing page.

Give M8i and M8i-Flex instances a try in the Amazon EC2 console. To learn more, visit the Amazon EC2 M8i instances page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy

Best performance and fastest memory with the new Amazon EC2 R8i and R8i-flex instances

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/best-performance-and-fastest-memory-with-the-new-amazon-ec2-r8i-and-r8i-flex-instances/

Today, we’re announcing general availability of the new eighth generation, memory optimized Amazon Elastic Compute Cloud (Amazon EC2) R8i and R8i-flex instances powered by custom Intel Xeon 6 processors, available only on AWS. They deliver the highest performance and fastest memory bandwidth among comparable Intel processors in the cloud. These instances deliver up to 15 percent better price performance, 20 percent higher performance, and 2.5 times more memory throughput compared to previous generation instances.

With these improvements, R8i and R8i-flex instances are ideal for a variety of memory intensive workloads such as SQL and NoSQL databases, distributed web scale in-memory caches (Memcached and Redis), in-memory databases such as SAP HANA, and real-time big data analytics (Apache Hadoop and Apache Spark clusters). For a majority of the workloads that don’t fully utilize the compute resources, the R8i-flex instances are a great first choice to achieve an additional 5 percent better price performance and 5 percent lower prices.

Improvements made to both instances compared to their predecessors
In terms of performance, R8i and R8i-flex instances offer 20 percent better performance than R7i instances, with even higher gains for specific workloads. These instances are up to 30 percent faster for PostgreSQL databases, up to 60 percent faster for NGINX web applications, and up to 40 percent faster for AI deep learning recommendation models compared to previous generation R7i instances, with sustained all-core turbo frequency now reaching 3.9 GHz (compared to 3.2 GHz in the previous generation). They also feature a 4.6x larger L3 cache and significantly better memory throughput, offering 2.5 times higher memory bandwidth than the seventh generation. With this higher performance across all the vectors, you can run a greater number of workloads while keeping costs down.

R8i instances now scale up to 96xlarge with up to 384 vCPUs and 3 TB of memory (versus 48xlarge sizes in the seventh generation), helping you to scale up database applications. R8i instances are SAP certified to deliver 142,100 aSAPS, which is the highest among all comparable machines in on-premises and cloud environments, delivering exceptional performance for your mission-critical SAP workloads. R8i-flex instances offer the most common sizes, from large to 16xlarge, and are a great first choice for applications that don’t fully utilize all compute resources. Both R8i and R8i-flex instances use the latest sixth generation AWS Nitro Cards, delivering up to two times more network and Amazon Elastic Block Store (Amazon EBS) bandwidth compared to the previous generation, which greatly improves network throughput for workloads handling small packets, such as web, application, and gaming servers.

R8i and R8i-flex instances also support bandwidth configuration with 25 percent allocation adjustments between network and Amazon EBS bandwidth, enabling better database performance, query processing, and logging speeds. Additional enhancements include FP16 datatype support for Intel AMX to support workloads such as deep learning training and inference and other artificial intelligence and machine learning (AI/ML) applications.

The specs for the R8i instances are as follows.

Instance size vCPUs Memory (GiB) Network bandwidth (Gbps) EBS bandwidth (Gbps)
r8i.large 2 16 Up to 12.5 Up to 10
r8i.xlarge 4 32 Up to 12.5 Up to 10
r8i.2xlarge 8 64 Up to 15 Up to 10
r8i.4xlarge 16 128 Up to 15 Up to 10
r8i.8xlarge 32 256 15 10
r8i.12xlarge 48 384 22.5 15
r8i.16xlarge 64 512 30 20
r8i.24xlarge 96 768 40 30
r8i.32xlarge 128 1024 50 40
r8i.48xlarge 192 1536 75 60
r8i.96xlarge 384 3072 100 80
r8i.metal-48xl 192 1536 75 60
r8i.metal-96xl 384 3072 100 80

The specs for the R8i-flex instances are as follows.

Instance size vCPUs Memory (GiB) Network bandwidth (Gbps) EBS bandwidth (Gbps)
r8i-flex.large 2 16 Up to 12.5 Up to 10
r8i-flex.xlarge 4 32 Up to 12.5 Up to 10
r8i-flex.2xlarge 8 64 Up to 15 Up to 10
r8i-flex.4xlarge 16 128 Up to 15 Up to 10
r8i-flex.8xlarge 32 256 Up to 15 Up to 10
r8i-flex.12xlarge 48 384 Up to 22.5 Up to 15
r8i-flex.16xlarge 64 512 Up to 30 Up to 20

When to use the R8i-flex instances
As stated earlier, R8i-flex instances are more affordable versions of the R8i instances, offering up to 5 percent better price performance at 5 percent lower prices. They’re designed for workloads that benefit from the latest generation performance but don’t fully use all compute resources. These instances can reach full CPU performance 95 percent of the time and work well for in-memory databases, distributed web scale cache stores, mid-size in-memory analytics, real-time big data analytics, and other enterprise applications. R8i instances are recommended for more demanding workloads that need sustained high CPU, network, or EBS performance such as analytics, databases, enterprise applications, and web scale in-memory caches.
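
The guidance above can be sketched as a small decision helper. This is only an illustration of the rule of thumb in this post, not an official sizing tool; the function name and the boolean flag are hypothetical, and the flex size list comes from the R8i-flex table.

```python
# Sizes offered by R8i-flex, from the R8i-flex table above.
FLEX_SIZES = {"large", "xlarge", "2xlarge", "4xlarge", "8xlarge", "12xlarge", "16xlarge"}


def recommend_instance_type(size, needs_sustained_performance):
    """Return a suggested instance type for a size such as '4xlarge'."""
    if size in FLEX_SIZES and not needs_sustained_performance:
        return f"r8i-flex.{size}"
    # Larger sizes and sustained CPU, network, or EBS demand point to R8i.
    return f"r8i.{size}"


print(recommend_instance_type("4xlarge", needs_sustained_performance=False))
print(recommend_instance_type("48xlarge", needs_sustained_performance=False))
```

The first call suggests r8i-flex.4xlarge; the second falls back to r8i.48xlarge because R8i-flex stops at 16xlarge.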

Available now
R8i and R8i-flex instances are available today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Spain) AWS Regions. As usual with Amazon EC2, you pay only for what you use. For more information, refer to Amazon EC2 Pricing. Check out the full collection of memory optimized instances to help you start migrating your applications.

To learn more, visit our Amazon EC2 R8i instances page and Amazon EC2 R8i-flex instances page. Send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

– Veliswa

AWS Weekly Roundup: Kiro, AWS Lambda remote debugging, Amazon ECS blue/green deployments, Amazon Bedrock AgentCore, and more (July 21, 2025)

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-kiro-aws-lambda-remote-debugging-amazon-ecs-blue-green-deployments-amazon-bedrock-agentcore-and-more-july-21-2025/

I’m writing this as I depart from Ho Chi Minh City back to Singapore. Just realized what a week it’s been, so let me rewind a bit. This week, I tried my first Corne keyboard, wrapped up rehearsals for AWS Summit Jakarta with speakers who are absolutely raising the bar, and visited Vietnam to participate as a technical keynote speaker in AWS Community Day Vietnam, an energetic gathering of hundreds of cloud practitioners and AWS enthusiasts who shared knowledge through multiple technical tracks and networking sessions.

I presented a keynote titled “Reinvent perspective as modern developers”, covering serverless, containers, and how we can shorten the learning curve and be more productive with Amazon Q Developer and Kiro. I got a chance to talk with a couple of AWS Community Builders and community developers, who shared how Amazon Q Developer addressed their challenges in building applications, with several highlighting significant productivity improvements and smoother learning curves in their cloud development journeys.

As I head back to Singapore, I’m carrying with me not just memories of delicious cà phê sữa đá (iced milk coffee), but also fresh perspectives and inspirations from this vibrant community of cloud innovators.

Introducing Kiro
One of the highlights from last week was definitely Kiro, an AI IDE that helps you deliver from concept to production through a simplified developer experience for working with AI agents. Kiro goes beyond “vibe coding” with features like specs and hooks that help get prototypes into production systems with proper planning and clarity.

Join the waitlist to get notified when it becomes available.

Last week’s AWS Launches
In other news, last week we had AWS Summit in New York, where we released several services. Here are some launches that caught my attention:

  • Monitor and debug event-driven applications with new Amazon EventBridge logging — Amazon EventBridge now provides enhanced logging capabilities that offer comprehensive event lifecycle tracking with detailed information about successes, failures, and status codes. This new observability feature addresses microservices and event-driven architecture monitoring challenges by providing visibility into the complete event journey.

  • Amazon EKS enables ultra-scale AI/ML workloads with support for 100k nodes per cluster — Amazon EKS now supports up to 100,000 worker nodes in a single cluster, enabling customers to scale up to 1.6 million AWS Trainium accelerators or 800K NVIDIA GPUs. This industry-leading scale empowers customers to train trillion-parameter models and advance AGI development while maintaining Kubernetes conformance and familiar developer experience.
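
As a quick sanity check on the EKS scale figures, the totals quoted above imply a fixed number of accelerators per node. The per-node counts below are derived purely from the numbers in this post, not from any instance specification.

```python
MAX_NODES = 100_000
TRAINIUM_TOTAL = 1_600_000
GPU_TOTAL = 800_000

# Implied accelerators per worker node at maximum cluster size.
trainium_per_node = TRAINIUM_TOTAL // MAX_NODES
gpus_per_node = GPU_TOTAL // MAX_NODES
print(trainium_per_node, gpus_per_node)  # 16 8
```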

From AWS Builder Center
In case you missed it, we just launched AWS Builder Center and integrated community.aws. Here are my top picks from the posts:

Upcoming AWS events
Check your calendars and sign up for upcoming AWS and AWS Community events:

  • AWS re:Invent – Register now to get a head start on choosing your best learning path, booking travel and accommodations, and bringing your team to learn, connect, and have fun. If you’re an early-career professional, you can apply to the All Builders Welcome Grant program, which is designed to remove financial barriers and create diverse pathways into cloud technology.
  • AWS Builders Online Series – If you’re based in one of the Asia Pacific time zones, join and learn fundamental AWS concepts, architectural best practices, and hands-on demonstrations to help you build, migrate, and deploy your workloads on AWS.
  • AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Taipei (July 29), Mexico City (August 6), and Jakarta (June 26–27).
  • AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Singapore (August 2), Australia (August 15), Adria (September 5), Baltic (September 10), and Aotearoa (September 18).

You can browse all upcoming AWS led in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Donnie

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!


Join Builder ID: Get started with your AWS Builder journey at builder.aws.com

Accelerate safe software releases with new built-in blue/green deployments in Amazon ECS

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/accelerate-safe-software-releases-with-new-built-in-blue-green-deployments-in-amazon-ecs/

While containers have revolutionized how development teams package and deploy applications, these teams have had to carefully monitor releases and build custom tooling to mitigate deployment risks, which slows down shipping velocity. At scale, development teams spend valuable cycles building and maintaining undifferentiated deployment tools instead of innovating for their business.

Starting today, you can use the built-in blue/green deployment capability in Amazon Elastic Container Service (Amazon ECS) to make your application deployments safer and more consistent. This new capability eliminates the need to build custom deployment tooling while giving you the confidence to ship software updates more frequently with rollback capability.

Here’s how you can enable the built-in blue/green deployment capability in the Amazon ECS console.

You create a new “green” application environment while your existing “blue” environment continues to serve live traffic. After monitoring and testing the green environment thoroughly, you route the live traffic from blue to green. With this capability, Amazon ECS now provides built-in functionality that makes containerized application deployments safer and more reliable.

Below is a diagram illustrating how blue/green deployment works by shifting application traffic from the blue environment to the green environment. You can learn more at the Amazon ECS blue/green service deployments workflow page.

Amazon ECS orchestrates this entire workflow while providing event hooks to validate new versions using synthetic traffic before routing production traffic. You can validate new software versions in production environments before exposing them to end users and roll back near-instantaneously if issues arise. Because this functionality is built directly into Amazon ECS, you can add these safeguards by simply updating your configuration without building any custom tooling.

Getting started
Let me walk you through a demonstration that showcases how to configure and use blue/green deployments for an ECS service. Before that, there are a few setup steps that I need to complete, including configuring AWS Identity and Access Management (IAM) roles, which you can find on the Required resources for Amazon ECS blue/green deployments Documentation page.

For this demonstration, I want to deploy a new version of my application using the blue/green strategy to minimize risk. First, I need to configure my ECS service to use blue/green deployments. I can do this through the ECS console, AWS Command Line Interface (AWS CLI), or using infrastructure as code.

Using the Amazon ECS console, I create a new service and configure it as usual:

In the Deployment Options section, I choose ECS as the Deployment controller type, then Blue/green as the Deployment strategy. Bake time is the time after the production traffic has shifted to green, when instant rollback to blue is available. When the bake time expires, blue tasks are removed.

We’re also introducing deployment lifecycle hooks. These are event-driven mechanisms you can use to augment the deployment workflow. I can select which AWS Lambda function I’d like to use as a deployment lifecycle hook. The Lambda function can perform the required business logic, but it must return a hook status.

Amazon ECS supports the following lifecycle hooks during blue/green deployments. You can learn more about each stage on the Deployment lifecycle stages page.

  • Pre scale up
  • Post scale up
  • Test traffic shift
  • Post test traffic shift
  • Production traffic shift
  • Post production traffic shift

For my application, I want to test when the test traffic shift is complete and the green service handles all of the test traffic. Since there’s no end-user traffic, a rollback at this stage will have no impact on users. This makes Post test traffic shift suitable for my use case as I can test it first with my Lambda function.
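
For readers who prefer infrastructure as code over the console, the same choices can be sketched as a deployment configuration structure. This is a minimal sketch, assuming the field names `strategy`, `bakeTimeInMinutes`, and `lifecycleHooks` as I understand them from the Amazon ECS API; verify them against the current API reference before use. The ARNs, the helper name, and the 5 minute bake time are placeholders.

```python
import json


def blue_green_deployment_configuration(hook_lambda_arn, hook_role_arn, bake_minutes=5):
    """Build a deploymentConfiguration structure for a blue/green ECS service."""
    return {
        "strategy": "BLUE_GREEN",
        "bakeTimeInMinutes": bake_minutes,
        "lifecycleHooks": [
            {
                "hookTargetArn": hook_lambda_arn,
                "roleArn": hook_role_arn,
                # Run the hook once the green revision serves all test traffic.
                "lifecycleStages": ["POST_TEST_TRAFFIC_SHIFT"],
            }
        ],
    }


# Placeholder ARNs for illustration only.
config = blue_green_deployment_configuration(
    "arn:aws:lambda:us-west-2:123456789012:function:validate-green",
    "arn:aws:iam::123456789012:role/ecs-hook-role",
)
print(json.dumps(config, indent=2))
```

A structure like this would be passed as the deployment configuration when creating or updating the service through the AWS CLI or an SDK.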

Switching context for a moment, let’s focus on the Lambda function that I use to validate the deployment before allowing it to proceed. In my Lambda function as a deployment lifecycle hook, I can perform any business logic, such as synthetic testing, calling another API, or querying metrics.

Within the Lambda function, I must return a hookStatus. A hookStatus of SUCCEEDED moves the deployment to the next step. If the status is FAILED, Amazon ECS rolls back to the blue deployment. If it’s IN_PROGRESS, Amazon ECS invokes the Lambda function again after 30 seconds.
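
The three outcomes can be sketched as a small mapping from the state of a validation check to a hook response. This is an illustrative sketch rather than the function used in this post: the helper `hook_response` and the hard-coded check state are hypothetical, and the success value is written as SUCCEEDED to match the example function in this post.

```python
def hook_response(check_finished, check_passed):
    """Map the state of a validation check to a hookStatus response."""
    if not check_finished:
        # Ask Amazon ECS to invoke the hook again later.
        return {"hookStatus": "IN_PROGRESS"}
    if check_passed:
        return {"hookStatus": "SUCCEEDED"}
    return {"hookStatus": "FAILED"}


def lambda_handler(event, context):
    # Hypothetical stand-in: a real hook would query test results or metrics here.
    check_finished, check_passed = True, True
    return hook_response(check_finished, check_passed)


print(lambda_handler({}, None))
```

Returning IN_PROGRESS is useful when the validation depends on something slow, such as a test run that has not finished yet.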

In the following example, I set up my validation with a Lambda function that performs a file upload as part of a test suite for my application.

import json
import logging
import os

import urllib3

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

# Initialize HTTP client
http = urllib3.PoolManager()

def lambda_handler(event, context):
    """
    Validation hook that tests the green environment with file upload
    """
    logger.info(f"Event: {json.dumps(event)}")
    logger.info(f"Context: {context}")
    
    try:
        # In a real scenario, you would construct the test endpoint URL
        test_endpoint = os.getenv("APP_URL")
        
        # Create a test file for upload
        test_file_content = "This is a test file for deployment validation"
        test_file_data = test_file_content.encode('utf-8')
        
        # Prepare multipart form data for file upload
        fields = {
            'file': ('test.txt', test_file_data, 'text/plain'),
            'description': 'Deployment validation test file'
        }
        
        # Send POST request with file upload to /process endpoint
        response = http.request(
            'POST', 
            test_endpoint,
            fields=fields,
            timeout=30
        )
        
        logger.info(f"POST /process response status: {response.status}")
        
        # Check if response has OK status code (200-299 range)
        if 200 <= response.status < 300:
            logger.info("File upload test passed - received OK status code")
            return {
                "hookStatus": "SUCCEEDED"
            }
        else:
            logger.error(f"File upload test failed - status code: {response.status}")
            return {
                "hookStatus": "FAILED"
            }
            
    except Exception as error:
        logger.error(f"File upload test failed: {str(error)}")
        return {
            "hookStatus": "FAILED"
        }

When the deployment reaches the lifecycle stage that is associated with the hook, Amazon ECS automatically invokes my Lambda function with deployment context. My validation function can run comprehensive tests against the green revision—checking application health, running integration tests, or validating performance metrics. The function then signals back to ECS whether to proceed or abort the deployment.

As I chose the blue/green deployment strategy, I also need to configure the load balancers and/or Amazon ECS Service Connect. In the Load balancing section, I select my Application Load Balancer.

In the Listener section, I use an existing listener on port 80 and select two Target groups.

Happy with this configuration, I create the service and wait for ECS to provision my new service.

Testing blue/green deployments
Now, it’s time to test my blue/green deployments. For this test, Amazon ECS will trigger my Lambda function after the test traffic shift is completed. My Lambda function will return FAILED in this case because it attempts a file upload to my application, but my application doesn’t have this capability.

I update my service and check Force new deployment, knowing the blue/green deployment capability will roll back if it detects a failure. I select this option because I haven’t modified the task definition but still need to trigger a new deployment.

At this stage, I have both blue and green environments running, with the green revision handling all the test traffic. Meanwhile, based on Amazon CloudWatch Logs of my Lambda function, I also see that the deployment lifecycle hooks work as expected and emit the following payload:

[INFO]	2025-07-10T13:15:39.018Z	67d9b03e-12da-4fab-920d-9887d264308e	Event: 
{
    "executionDetails": {
        "testTrafficWeights": {},
        "productionTrafficWeights": {},
        "serviceArn": "arn:aws:ecs:us-west-2:123:service/EcsBlueGreenCluster/nginxBGservice",
        "targetServiceRevisionArn": "arn:aws:ecs:us-west-2:123:service-revision/EcsBlueGreenCluster/nginxBGservice/9386398427419951854"
    },
    "executionId": "a635edb5-a66b-4f44-bf3f-fcee4b3641a5",
    "lifecycleStage": "POST_TEST_TRAFFIC_SHIFT",
    "resourceArn": "arn:aws:ecs:us-west-2:123:service-deployment/EcsBlueGreenCluster/nginxBGservice/TFX5sH9q9XDboDTOv0rIt"
}
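
A hook function will usually want just a few fields from this payload. Here is a minimal sketch that extracts them; the helper name `summarize_hook_event` is hypothetical, and the sample event is a trimmed copy of the payload shown above.

```python
def summarize_hook_event(event):
    """Pull out the fields a validation hook is most likely to need."""
    details = event.get("executionDetails", {})
    service_arn = details.get("serviceArn", "")
    return {
        "stage": event.get("lifecycleStage"),
        # The service name is the last path segment of the service ARN.
        "service": service_arn.rsplit("/", 1)[-1],
        "target_revision": details.get("targetServiceRevisionArn"),
    }


# Trimmed copy of the payload shown above.
sample_event = {
    "executionDetails": {
        "serviceArn": "arn:aws:ecs:us-west-2:123:service/EcsBlueGreenCluster/nginxBGservice",
        "targetServiceRevisionArn": "arn:aws:ecs:us-west-2:123:service-revision/EcsBlueGreenCluster/nginxBGservice/9386398427419951854",
    },
    "lifecycleStage": "POST_TEST_TRAFFIC_SHIFT",
}
print(summarize_hook_event(sample_event))
```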

As expected, my AWS Lambda function returns FAILED as hookStatus because it failed to perform the test.

[ERROR]	2025-07-10T13:18:43.392Z	67d9b03e-12da-4fab-920d-9887d264308e	File upload test failed: HTTPConnectionPool(host='xyz.us-west-2.elb.amazonaws.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f8036273a80>, 'Connection to xyz.us-west-2.elb.amazonaws.com timed out. (connect timeout=30)'))

Because the validation wasn’t completed successfully, Amazon ECS tries to roll back to the blue version, which is the previous working deployment version. I can monitor this process through ECS events in the Events section, which provides detailed visibility into the deployment progress.

Amazon ECS successfully rolls back the deployment to the previous working version. The rollback happens near-instantaneously because the blue revision remains running and ready to receive production traffic. There is no end-user impact during this process, as production traffic never shifted to the new application version—ECS simply rolled back test traffic to the original stable version. This eliminates the typical deployment downtime associated with traditional rolling deployments.

I can also see the rollback status in the Last deployment section.

Throughout my testing, I observed that the blue/green deployment strategy provides consistent and predictable behavior. Furthermore, the deployment lifecycle hooks provide more flexibility to control the behavior of the deployment. Each service revision maintains immutable configuration including task definition, load balancer settings, and Service Connect configuration. This means that rollbacks restore exactly the same environment that was previously running.

Additional things to know
Here are a couple of things to note:

  • Pricing – The blue/green deployment capability is included with Amazon ECS at no additional charge. You pay only for the compute resources used during the deployment process.
  • Availability – This capability is available in all commercial AWS Regions.

Get started with blue/green deployments by updating your Amazon ECS service configuration in the Amazon ECS console.

Happy deploying!
Donnie

New Amazon EC2 P6e-GB200 UltraServers accelerated by NVIDIA Grace Blackwell GPUs for the highest AI performance

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-p6e-gb200-ultraservers-powered-by-nvidia-grace-blackwell-gpus-for-the-highest-ai-performance/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P6e-GB200 UltraServers, accelerated by NVIDIA GB200 NVL72 to offer the highest GPU performance for AI training and inference. Amazon EC2 UltraServers connect multiple EC2 instances using a dedicated, high-bandwidth, and low-latency accelerator interconnect across these instances.

The NVIDIA Grace Blackwell Superchips connect two high-performance NVIDIA Blackwell tensor core GPUs and an NVIDIA Grace CPU based on Arm architecture using the NVIDIA NVLink-C2C interconnect. Each Grace Blackwell Superchip delivers 10 petaflops of FP8 compute (without sparsity) and up to 372 GB HBM3e memory. With the superchip architecture, GPU and CPU are colocated within one compute module, increasing bandwidth between GPU and CPU significantly compared to current generation EC2 P5en instances.

With EC2 P6e-GB200 UltraServers, you can access up to 72 NVIDIA Blackwell GPUs within one NVLink domain to use 360 petaflops of FP8 compute (without sparsity) and 13.4 TB of total high bandwidth memory (HBM3e). Powered by the AWS Nitro System, P6e-GB200 UltraServers are deployed in EC2 UltraClusters to securely and reliably scale to tens of thousands of GPUs.
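
The aggregate figures follow from the per-superchip numbers quoted earlier. A short worked calculation, using only values from this post (the memory figure is the "up to" value, so the result is an upper bound):

```python
GPUS_PER_SUPERCHIP = 2
PFLOPS_PER_SUPERCHIP = 10   # FP8, without sparsity
HBM_GB_PER_SUPERCHIP = 372  # "up to" figure from this post

gpus = 72
superchips = gpus // GPUS_PER_SUPERCHIP
total_pflops = superchips * PFLOPS_PER_SUPERCHIP
total_hbm_tb = superchips * HBM_GB_PER_SUPERCHIP / 1000

print(superchips, total_pflops, round(total_hbm_tb, 1))  # 36 360 13.4
```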

EC2 P6e-GB200 UltraServers deliver up to 28.8 Tbps of total Elastic Fabric Adapter (EFAv4) networking. EFA is also coupled with NVIDIA GPUDirect RDMA to enable low-latency GPU-to-GPU communication between servers with operating system bypass.

EC2 P6e-GB200 UltraServers specifications
EC2 P6e-GB200 UltraServers are available in sizes ranging from 36 to 72 GPUs under NVLink. Here are the specs for EC2 P6e-GB200 UltraServers:

UltraServer type GPUs GPU memory (GB) vCPUs Instance memory (GiB) Instance storage (TB) Aggregate EFA Network Bandwidth (Gbps) EBS bandwidth (Gbps)
u-p6e-gb200x36 36 6660 1296 8640 202.5 14400 540
u-p6e-gb200x72 72 13320 2592 17280 405 28800 1080
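
Dividing the table values by the GPU count shows that both UltraServer types offer the same resources per GPU, so the larger type is a straight doubling. A small sketch, with figures copied from the table above and a hypothetical helper name:

```python
# (GPUs, GPU memory GB, vCPUs, EFA Gbps) from the table above.
ULTRASERVERS = {
    "u-p6e-gb200x36": (36, 6660, 1296, 14400),
    "u-p6e-gb200x72": (72, 13320, 2592, 28800),
}


def per_gpu(name):
    """Return (GPU memory GB, vCPUs, EFA Gbps) per GPU for one UltraServer type."""
    gpus, gpu_mem_gb, vcpus, efa_gbps = ULTRASERVERS[name]
    return gpu_mem_gb / gpus, vcpus / gpus, efa_gbps / gpus


for name in ULTRASERVERS:
    print(name, per_gpu(name))
```

Both types come out at 185 GB of GPU memory, 36 vCPUs, and 400 Gbps of EFA bandwidth per GPU.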

P6e-GB200 UltraServers are ideal for the most compute and memory intensive AI workloads, such as training and inference of frontier models, including mixture of experts models and reasoning models, at the trillion-parameter scale.

You can build agentic and generative AI applications, including question answering, code generation, video and image generation, speech recognition, and more.

P6e-GB200 UltraServers in action
You can use EC2 P6e-GB200 UltraServers in the Dallas Local Zone through EC2 Capacity Blocks for ML. The Dallas Local Zone (us-east-1-dfw-2a) is an extension of the US East (N. Virginia) Region.

To reserve your EC2 Capacity Blocks, choose Capacity Reservations on the Amazon EC2 console. You can select Purchase Capacity Blocks for ML and then choose your total capacity and specify how long you need the EC2 Capacity Block for u-p6e-gb200x36 or u-p6e-gb200x72 UltraServers.

Once a Capacity Block is successfully scheduled, it is charged up front and its price doesn’t change after purchase. The payment is billed to your account within 12 hours after you purchase the EC2 Capacity Blocks. To learn more, visit Capacity Blocks for ML in the Amazon EC2 User Guide.

To run instances within your purchased Capacity Block, you can use the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs. On the software side, you can start with the AWS Deep Learning AMIs. These images are preconfigured with the frameworks and tools that you probably already know and use: PyTorch, JAX, and a lot more.

You can also integrate EC2 P6e-GB200 UltraServers seamlessly with various AWS managed services. For example:

  • Amazon SageMaker HyperPod provides managed, resilient infrastructure that automatically handles the provisioning and management of P6e-GB200 UltraServers, replacing faulty instances with preconfigured spare capacity within the same NVLink domain to maintain performance.
  • Amazon Elastic Kubernetes Service (Amazon EKS) allows one managed node group to span across multiple P6e-GB200 UltraServers as nodes, automating their provisioning and lifecycle management within Kubernetes clusters. You can use EKS topology-aware routing for P6e-GB200 UltraServers, enabling optimal placement of tightly coupled components of distributed workloads within a single UltraServer’s NVLink-connected instances.
  • Amazon FSx for Lustre file systems provide data access for P6e-GB200 UltraServers at the hundreds of GB/s of throughput and millions of input/output operations per second (IOPS) required for large-scale HPC and AI workloads. For fast access to large datasets, you can use up to 405 TB of local NVMe SSD storage or virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3).

Now available
Amazon EC2 P6e-GB200 UltraServers are available today in the Dallas Local Zone (us-east-1-dfw-2a) through EC2 Capacity Blocks for ML. For more information, visit the Amazon EC2 pricing page.

Give Amazon EC2 P6e-GB200 UltraServers a try in the Amazon EC2 console. To learn more, visit the Amazon EC2 P6e instances page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy

New Amazon EC2 C8gn instances powered by AWS Graviton4 offering up to 600Gbps network bandwidth

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-c8gn-instances-powered-by-aws-graviton4-offering-up-to-600gbps-network-bandwidth/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) C8gn network optimized instances powered by AWS Graviton4 processors and the latest 6th generation AWS Nitro Card. EC2 C8gn instances deliver up to 600Gbps network bandwidth, the highest bandwidth among EC2 network optimized instances.

You can use C8gn instances to run the most demanding network intensive workloads, such as security and network virtual appliances (virtual firewalls, routers, load balancers, proxy servers, DDoS appliances), data analytics, and tightly-coupled cluster computing jobs.

EC2 C8gn instances specifications
C8gn instances provide up to 192 vCPUs and 384 GiB memory, and offer up to 30 percent higher compute performance compared to Graviton3-based EC2 C7gn instances.

Here are the specs for C8gn instances:

Instance Name   | vCPUs | Memory (GiB) | Network Bandwidth (Gbps) | EBS Bandwidth (Gbps)
c8gn.medium     | 1     | 2            | Up to 25                 | Up to 10
c8gn.large      | 2     | 4            | Up to 30                 | Up to 10
c8gn.xlarge     | 4     | 8            | Up to 40                 | Up to 10
c8gn.2xlarge    | 8     | 16           | Up to 50                 | Up to 10
c8gn.4xlarge    | 16    | 32           | 50                       | 10
c8gn.8xlarge    | 32    | 64           | 100                      | 20
c8gn.12xlarge   | 48    | 96           | 150                      | 30
c8gn.16xlarge   | 64    | 128          | 200                      | 40
c8gn.24xlarge   | 96    | 192          | 300                      | 60
c8gn.metal-24xl | 96    | 192          | 300                      | 60
c8gn.48xlarge   | 192   | 384          | 600                      | 60
c8gn.metal-48xl | 192   | 384          | 600                      | 60

You can launch C8gn instances through the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.
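When choosing a size programmatically, the specifications table can be treated as data. The following is a minimal sketch, not an AWS API: the `C8GN_SIZES` list is transcribed from the table, and `smallest_size_for` is a hypothetical helper that returns the first size meeting a network bandwidth target and a vCPU floor. Sizes listed with "Up to" figures and the metal sizes are left out, because the table gives only an upper bound for the former and the latter duplicate the specs of their virtualized counterparts.

```python
# Fixed-bandwidth C8gn sizes transcribed from the specifications table:
# (instance name, vCPUs, memory GiB, network Gbps, EBS Gbps).
C8GN_SIZES = [
    ("c8gn.4xlarge", 16, 32, 50, 10),
    ("c8gn.8xlarge", 32, 64, 100, 20),
    ("c8gn.12xlarge", 48, 96, 150, 30),
    ("c8gn.16xlarge", 64, 128, 200, 40),
    ("c8gn.24xlarge", 96, 192, 300, 60),
    ("c8gn.48xlarge", 192, 384, 600, 60),
]


def smallest_size_for(network_gbps, vcpus=0):
    """Return the smallest listed size meeting both requirements, or None."""
    for name, size_vcpus, _memory, net, _ebs in C8GN_SIZES:
        if net >= network_gbps and size_vcpus >= vcpus:
            return name
    return None


print(smallest_size_for(120))      # c8gn.12xlarge
print(smallest_size_for(100, 64))  # c8gn.16xlarge
print(smallest_size_for(700))      # None
```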

If you’re using C7gn instances now, you will have a straightforward experience migrating network intensive workloads to C8gn instances because the new instances offer similar vCPU and memory ratios. To learn more, check out the collection of Graviton resources to help you start migrating your applications to Graviton instance types.

You can also visit the Level up your compute with AWS Graviton page to begin your Graviton adoption journey.

Now available
Amazon EC2 C8gn instances are available today in the US East (N. Virginia) and US West (Oregon) Regions. The two metal instance sizes are available only in the US East (N. Virginia) Region. These instances can be purchased as On-Demand, Savings Plan, Spot instances, or as Dedicated instances and Dedicated hosts.

Give C8gn instances a try in the Amazon EC2 console. To learn more, refer to the Amazon EC2 C8g instance page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy

Amazon Linux 2023 achieves FIPS 140-3 validation

Post Syndicated from Mahak Arora original https://aws.amazon.com/blogs/compute/amazon-linux-2023-achieves-fips-140-3-validation/

AWS announced that Amazon Linux 2023 (AL2023) has achieved Federal Information Processing Standards (FIPS) 140-3 Level 1 validation of our cryptographic modules, marking a significant milestone in our commitment to providing secure, compliant operating system options for regulated workloads. FIPS certified modules are particularly important for US and Canadian government workloads, healthcare applications requiring HIPAA compliance, financial services, defense contractors, and other regulated industries. FIPS 140-3, which supersedes FIPS 140-2, represents the latest government security standard for cryptographic modules, jointly validated by the National Institute of Standards and Technology (NIST) and the Canadian Centre for Cyber Security (CCCS) through the Cryptographic Module Validation Program (CMVP). The validation follows the rigorous requirements outlined in the FIPS 140-3 standard and encompasses critical cryptographic modules including OpenSSL, the Linux Kernel Cryptographic API, NSS, GnuTLS, and Libgcrypt.

These modules have been extensively tested for robust security capabilities such as approved cryptographic algorithms, secure key management, strong entropy generation, and protected memory boundaries. The validation process was conducted by a NIST-accredited lab and further reviewed by the Cryptographic Module Validation Program (CMVP). The certificate details can be verified on the CMVP Active Validation List.

To enable FIPS mode on AL2023, refer to our FIPS Mode enablement guide on AL2023. Amazon Linux maintains its compliance information through the AWS Compliance Programs portal for FIPS 140-3 and the official NIST Guidelines and Compliance FAQs, to help you meet global regulatory requirements. For regular updates and best practices, follow the AWS Security Blog and the FIPS-related FAQs on Amazon Linux 2 and Amazon Linux 2023, which provide detailed configuration steps and operational guidance for regulated environments. You can also reach out to your AWS account team for help finding the resources you need.
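After following the enablement guide, it can be useful to confirm from code that the kernel is running in FIPS mode. The sketch below relies on the standard Linux kernel flag file `/proc/sys/crypto/fips_enabled`, which contains `1` when FIPS mode is active. The function name `fips_enabled` and its `path` parameter are illustrative, and this check is not a substitute for the steps in the guide.

```python
from pathlib import Path


def fips_enabled(path="/proc/sys/crypto/fips_enabled"):
    """Return True if the kernel flag file reports FIPS mode as active."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # The file is absent on kernels built without FIPS support.
        return False


if __name__ == "__main__":
    print("FIPS mode enabled:", fips_enabled())
```

The `path` parameter exists only so the check can be exercised against a test file on machines where the flag file is missing.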

If you have questions about this post, contact AWS Support.

Announcing up to 45% price reduction for Amazon EC2 NVIDIA GPU-accelerated instances

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/announcing-up-to-45-price-reduction-for-amazon-ec2-nvidia-gpu-accelerated-instances/

Customers across industries are harnessing the power of generative AI on AWS to boost employee productivity, deliver exceptional customer experiences, and streamline business processes. However, the growth in demand for GPU capacity has outpaced industry-wide supply, making GPUs a scarce resource and increasing the cost of securing them.

As Amazon Web Services (AWS) grows, we work hard to lower our costs so that we can pass those savings back to our customers. Regular price reductions on AWS services have been a standard way for AWS to pass on the economic efficiencies gained from our scale back to our customers.

Today, we’re announcing up to 45 percent price reduction for Amazon Elastic Compute Cloud (Amazon EC2) NVIDIA GPU-accelerated instances: P4 (P4d and P4de) and P5 (P5 and P5en) instance types. This price reduction to On-Demand and Savings Plan pricing applies to all Regions where these instances are available. The pricing reduction applies to On-Demand purchases beginning June 1 and to Savings Plan purchases effective after June 4.

Here is a table of price reduction percentages from the May 31, 2025 baseline prices, by instance type and pricing plan:

Instance type NVIDIA GPUs On-Demand EC2 Instance Savings Plans Compute Savings Plans
1 year 3 years 1 year 3 years
P4d A100 33% 31% 25% 31%
P4de A100 33% 31% 25% 31%
P5 H100 44% 45% 44% 25%
P5en H200 25% 26% 25%

Savings Plans are a flexible pricing model that offers low prices on compute usage, in exchange for a commitment to a consistent amount of usage (measured in $/hour) for a 1- or 3-year term. We offer two types of Savings Plans:

  • EC2 Instance Savings Plans provide the lowest prices, offering savings in exchange for commitment to usage of individual instance families in a Region (for example, P5 usage in the US East (N. Virginia) Region).
  • Compute Savings Plans provide the most flexibility and help to reduce your costs regardless of instance family, size, Availability Zones, and Regions (for example, when you move from P4d to P5en instances or shift a workload between US Regions).
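To make the size of a commitment concrete, here is a small worked example. The hourly figure is hypothetical, not an AWS price, and `total_commitment` is an illustrative helper; the sketch only multiplies a $/hour commitment across the term, assuming 365-day years.

```python
HOURS_PER_YEAR = 24 * 365  # assumes a 365-day year


def total_commitment(dollars_per_hour, term_years):
    """Total spend committed over a 1- or 3-year Savings Plan term."""
    if term_years not in (1, 3):
        raise ValueError("Savings Plans terms are 1 or 3 years")
    return dollars_per_hour * HOURS_PER_YEAR * term_years


# Hypothetical commitment of $10/hour.
print(total_commitment(10, 1))  # 87600
print(total_commitment(10, 3))  # 262800
```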

To provide increased accessibility to reduced pricing, we are making at-scale On-Demand capacity available for:

  • P4d instances in the Asia Pacific (Seoul), Asia Pacific (Sydney), Canada (Central), and Europe (London) Regions
  • P4de instances in the US East (N. Virginia) Region
  • P5 instances in the Asia Pacific (Mumbai), Asia Pacific (Tokyo), Asia Pacific (Jakarta), and South America (São Paulo) Regions
  • P5en instances in the Asia Pacific (Mumbai), Asia Pacific (Tokyo), and Asia Pacific (Jakarta) Regions

We are also now offering Amazon EC2 P6-B200 instances through Savings Plans to support large-scale deployments. When these instances launched on May 15, 2025, they were available only through EC2 Capacity Blocks for ML. EC2 P6-B200 instances, powered by NVIDIA Blackwell GPUs, accelerate a broad range of GPU-enabled workloads but are especially well-suited for large-scale distributed AI training and inferencing.

These pricing updates reflect the AWS commitment to making advanced GPU computing more accessible while passing cost savings directly to customers.

Give Amazon EC2 NVIDIA GPU-accelerated instances a try in the Amazon EC2 console. To learn more about these pricing updates, visit the Amazon EC2 Pricing page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy

Enhance AI-assisted development with Amazon ECS, Amazon EKS and AWS Serverless MCP server

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/enhance-ai-assisted-development-with-amazon-ecs-amazon-eks-and-aws-serverless-mcp-server/

Today, we’re introducing specialized Model Context Protocol (MCP) servers for Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and AWS Serverless, now available in the AWS Labs GitHub repository. These open source solutions extend AI development assistants’ capabilities with real-time, contextual responses that go beyond their pre-trained knowledge. While large language models (LLMs) within AI assistants rely on public documentation, MCP servers deliver current context and service-specific guidance to help you prevent common deployment errors and provide more accurate service interactions.

You can use these open source solutions to develop applications faster, using up-to-date knowledge of Amazon Web Services (AWS) capabilities and configurations during the build and deployment process. Whether you’re writing code in your integrated development environment (IDE), or debugging production issues, these MCP servers support AI code assistants with deep understanding of Amazon ECS, Amazon EKS, and AWS Serverless capabilities, accelerating the journey from code to production. They work with popular AI-enabled IDEs, including Amazon Q Developer on the command line (CLI), to help you build and deploy applications using natural language commands.

  • The Amazon ECS MCP Server containerizes and deploys applications to Amazon ECS within minutes by configuring all relevant AWS resources, including load balancers, networking, auto-scaling, monitoring, Amazon ECS task definitions, and services. Using natural language instructions, you can manage cluster operations, implement auto-scaling strategies, and use real-time troubleshooting capabilities to identify and resolve deployment issues quickly.
  • For Kubernetes environments, the Amazon EKS MCP Server provides AI assistants with up-to-date, contextual information about your specific EKS environment. It offers access to the latest EKS features, knowledge base, and cluster state information. This gives AI code assistants more accurate, tailored guidance throughout the application lifecycle, from initial setup to production deployment.
  • The AWS Serverless MCP Server enhances the serverless development experience by providing AI coding assistants with comprehensive knowledge of serverless patterns, best practices, and AWS services. Using AWS Serverless Application Model Command Line Interface (AWS SAM CLI) integration, you can handle events and deploy infrastructure while implementing proven architectural patterns. This integration streamlines function lifecycles, service integrations, and operational requirements throughout your application development process. The server also provides contextual guidance for infrastructure as code decisions, AWS Lambda specific best practices, and event schemas for AWS Lambda event source mappings.

Let’s see it in action
If this is your first time using AWS MCP servers, visit the Installation and Setup guide in the AWS Labs GitHub repository for installation instructions. Once installed, add the following MCP server configuration to your local setup:

Install Amazon Q for command line and add the configuration to ~/.aws/amazonq/mcp.json. If you’re already an Amazon Q CLI user, add only the configuration.

{
  "mcpServers": {
    "awslabs.aws-serverless-mcp": {
      "command": "uvx",
      "timeout": 60,
      "args": ["awslabs.aws_serverless_mcp_server@latest"]
    },
    "awslabs.ecs-mcp-server": {
      "disabled": false,
      "command": "uvx",
      "timeout": 60,
      "args": ["awslabs.ecs-mcp-server@latest"]
    },
    "awslabs.eks-mcp-server": {
      "disabled": false,
      "timeout": 60,
      "command": "uvx",
      "args": ["awslabs.eks-mcp-server@latest"]
    }
  }
}
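Because a malformed mcp.json (for example, one with a stray trailing comma) can keep servers from loading, it may help to check the file before starting Amazon Q CLI. The following sketch uses only Python's standard `json` module. The `enabled_servers` helper is hypothetical; it assumes the `mcpServers` layout of the configuration above and treats a missing `disabled` key as enabled.

```python
import json
from pathlib import Path


def enabled_servers(config_text):
    """Parse MCP config text and return names of servers not marked disabled."""
    config = json.loads(config_text)  # raises json.JSONDecodeError if invalid
    servers = config.get("mcpServers", {})
    return [name for name, spec in servers.items()
            if not spec.get("disabled", False)]


if __name__ == "__main__":
    path = Path.home() / ".aws" / "amazonq" / "mcp.json"
    if path.exists():
        print(enabled_servers(path.read_text()))
    else:
        print(f"No config found at {path}")
```

Strict JSON parsers reject trailing commas, so a `JSONDecodeError` here points to a syntax problem in the file rather than in the MCP servers themselves.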

For this demo, I’m going to use the Amazon Q CLI to create an application that understands video, using 02_using_converse_api.ipynb from the Amazon Nova model cookbook repository as sample code. To do this, I send the following prompt:

I want to create a backend application that automatically extracts metadata and understands the content of images and videos uploaded to an S3 bucket and stores that information in a database. I'd like to use a serverless system for processing. Could you generate everything I need, including the code and commands or steps to set up the necessary infrastructure, for it to work from start to finish? - Use 02_using_converse_api.ipynb as example code for the image and video understanding.

Amazon Q CLI identifies the necessary tools, including the MCP server awslabs.aws-serverless-mcp-server. Through a single interaction, the AWS Serverless MCP server determines all requirements and best practices for building a robust architecture.

I ask Amazon Q CLI to build and test the application, but it encounters an error. Amazon Q CLI quickly resolves the issue using available tools. I verify success by checking the record created in the Amazon DynamoDB table and testing the application with the dog2.jpeg file.

To enhance video processing capabilities, I decided to migrate my media analysis application to a containerized architecture. I used this prompt:

I'd like you to create a simple application like the media analysis one, but instead of being serverless, it should be containerized. Please help me build it in a new CDK stack.

Amazon Q Developer begins building the application. I took advantage of this time to grab a coffee. When I returned to my desk, coffee in hand, I was pleasantly surprised to find the application ready. To ensure everything was up to current standards, I simply asked:

please review the code and all app using the awslabsecs_mcp_server tools 

Amazon Q Developer CLI gives me a summary with all the improvements and a conclusion.

I ask it to make all the necessary changes. Once they are ready, I ask Amazon Q Developer CLI to deploy the application in my account, all using natural language.

After a few minutes, I confirm that I have a complete containerized application, from the S3 bucket to all the necessary networking.

I ask Amazon Q Developer CLI to test the app by sending it the-sea.mp4 video file, and I receive a timed out error. Amazon Q CLI then decides to use the fetch_task_logs tool from awslabsecs_mcp_server to review the logs, identify the error, and fix it.

After a new deployment, I try it again, and the application successfully processes the video file.

I can see the records in my Amazon DynamoDB table.

To test the Amazon EKS MCP server, I have code for a web app in the auction-website-main folder, and I want to build a robust web app. For that, I ask Amazon Q CLI to help me with this prompt:

Create a web application using the existing code in the auction-website-main folder. This application will grow, so I would like to create it in a new EKS cluster

Once the Dockerfile is created, Amazon Q CLI identifies generate_app_manifests from awslabseks_mcp_server as a reliable tool to create Kubernetes manifests for the application.

Then it creates a new EKS cluster using the manage_eks_stacks tool.

Once the app is ready, the Amazon Q CLI deploys it and gives me a summary of what it created.

I can see the cluster status in the console.

After a few minutes, and after resolving a couple of issues using the search_eks_troubleshoot_guide tool, the application is ready to use.

Now I have a Kitties marketplace web app, deployed on Amazon EKS using only natural language commands through Amazon Q CLI.

Get started today
Visit the AWS Labs GitHub repository to start using these AWS MCP servers and enhance your AI-powered development. The repository includes implementation guides, example configurations, and additional specialized servers, such as a server to run AWS Lambda functions, which transforms your existing AWS Lambda functions into AI-accessible tools without code modifications, and the Amazon Bedrock Knowledge Bases Retrieval MCP server, which provides seamless access to your Amazon Bedrock knowledge bases. Other AWS specialized servers in the repository include documentation, example configurations, and implementation guides to begin building applications with greater speed and reliability.

To learn more about MCP Servers for AWS Serverless and Containers and how they can transform your AI-assisted application development, visit the Introducing AWS Serverless MCP Server: AI-powered development for modern applications, Automating AI-assisted container deployments with the Amazon ECS MCP Server, and Accelerating application development with the Amazon EKS MCP server deep-dive blogs.

— Eli

Amazon Inspector enhances container security by mapping Amazon ECR images to running containers

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/amazon-inspector-enhances-container-security-by-mapping-amazon-ecr-images-to-running-containers/

When running container workloads, you need to understand how software vulnerabilities create security risks for your resources. Until now, you could identify vulnerabilities in your Amazon Elastic Container Registry (Amazon ECR) images, but couldn’t determine if these images were active in containers or track their usage. With no visibility if these images were being used on running clusters, you had limited ability to prioritize fixes based on actual deployment and usage patterns.

Starting today, Amazon Inspector offers two new features that enhance vulnerability management, giving you a more comprehensive view of your container images. First, Amazon Inspector now maps Amazon ECR images to running containers, enabling security teams to prioritize vulnerabilities based on containers currently running in your environment. With these new capabilities, you can analyze vulnerabilities in your Amazon ECR images and prioritize findings based on whether they are currently running and when they last ran in your container environment. Additionally, you can see the cluster Amazon Resource Name (ARN) and the number of EKS pods or ECS tasks where an image is deployed, helping you prioritize fixes based on usage and severity.

Second, we’re extending vulnerability scanning support to minimal base images including scratch, distroless, and Chainguard images, and extending support for additional ecosystems including Go toolchain, Oracle JDK & JRE, Amazon Corretto, Apache Tomcat, Apache httpd, WordPress (core, themes, plugins), and Puppeteer, helping teams maintain robust security even in highly optimized container environments.

Through continual monitoring and tracking of images running on containers, Amazon Inspector helps teams identify which container images are actively running in their environment and where they’re deployed, detecting Amazon ECR images running on containers in Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS), and any associated vulnerabilities. This solution supports teams managing Amazon ECR images across single AWS accounts, cross-account scenarios, and AWS Organizations with delegated administrator capabilities, enabling centralized vulnerability management based on the running patterns of container images.

Let’s see it in action
Amazon ECR image scanning helps identify vulnerabilities in your container images through enhanced scanning, which integrates with Amazon Inspector to provide automated, continual scanning of your repositories. To use this new feature, you have to enable enhanced scanning through the Amazon ECR console by following the steps in the Configuring enhanced scanning for images in Amazon ECR documentation page. I already have Amazon ECR enhanced scanning enabled, so I don’t have to take any action.

In the Amazon Inspector console, I navigate to General settings and select ECR scanning settings from the navigation panel. Here, I can configure the new Image re-scan mode settings by choosing between Last in-use date and Last pull date. I leave it as it is by default with Last in-use date and set the Image last in use date to 14 days. These settings make it so that Inspector monitors my images based on when they were running in the last 14 days in my Amazon ECS or Amazon EKS environments. After applying these settings, Amazon Inspector starts tracking information about images running on containers and incorporating it into vulnerability findings, helping me focus on images actively running in containers in my environment.

After it’s configured, I can view information about images running on containers in the Details menu, where I can see last in-use and pull dates, along with EKS pods or ECS tasks count.

When selecting the number of Deployed ECS Tasks/EKS Pods, I can see the cluster ARN, last use dates, and Type for each image.

For cross-account visibility demonstration, I have a repository with EKS pods deployed in two accounts. In the Resources coverage menu, I navigate to Container repositories, select my repository name and choose the Image tag. As before, I can see the number of deployed EKS pods/ECS tasks.

When I select the number of deployed EKS pods/ECS tasks, I can see that it is running in a different account.

In the Findings menu, I can review any vulnerabilities, and by selecting one, I can find the Last in use date and Deployed ECS Tasks/EKS Pods involved in the vulnerability under Resource affected data, helping me prioritize remediation based on actual usage.

In the All Findings menu, you can now search for vulnerabilities within account management, using filters such as Account ID, Image in use count and Image last in use at.

Key features and considerations
Monitoring based on container image lifecycle – Amazon Inspector now determines image activity based on the image push date (a duration of 14, 30, 60, 90, or 180 days, or lifetime), the image pull date (14, 30, 60, 90, or 180 days), the stopped duration (from never to 14, 30, 60, 90, or 180 days), and the status of the image running on the container. This flexibility lets organizations tailor their monitoring strategy based on actual container image usage rather than only repository events. For Amazon EKS and Amazon ECS workloads, the last in use, push, and pull durations are set to 14 days, which is now the default for new customers.

Image runtime-aware finding details – To help prioritize remediation efforts, each finding in Amazon Inspector now includes the lastInUseAt date and InUseCount, indicating when an image was last running on the containers and the number of deployed EKS pods/ECS tasks currently using it. Amazon Inspector monitors both Amazon ECR last pull date data and data about images running on Amazon ECS tasks or Amazon EKS pods for all accounts, updating this information at least once daily. Amazon Inspector integrates these details into all findings reports and seamlessly works with Amazon EventBridge. You can filter findings based on the lastInUseAt field using rolling window or fixed range options, and you can filter images based on their last running date within the last 14, 30, 60, or 90 days.
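As an illustration of the rolling-window filter, the sketch below applies it to findings that have already been exported. The record layout is hypothetical (plain dictionaries with `lastInUseAt` as an ISO 8601 timestamp and `InUseCount` as an integer), and the `in_use_within` helper is not part of any Amazon Inspector SDK; it only shows the filtering and prioritization logic.

```python
from datetime import datetime, timedelta, timezone


def in_use_within(findings, days, now=None):
    """Keep findings whose image last ran within the past `days` days,
    ordered so the most widely deployed images come first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = [f for f in findings
              if datetime.fromisoformat(f["lastInUseAt"]) >= cutoff]
    return sorted(recent, key=lambda f: f["InUseCount"], reverse=True)


# Hypothetical exported findings.
findings = [
    {"id": "f-1", "lastInUseAt": "2025-05-01T00:00:00+00:00", "InUseCount": 0},
    {"id": "f-2", "lastInUseAt": "2025-06-10T00:00:00+00:00", "InUseCount": 3},
    {"id": "f-3", "lastInUseAt": "2025-06-12T00:00:00+00:00", "InUseCount": 12},
]
now = datetime(2025, 6, 15, tzinfo=timezone.utc)
print([f["id"] for f in in_use_within(findings, 14, now)])  # ['f-3', 'f-2']
```

Sorting by `InUseCount` is one possible prioritization; combining it with finding severity would be a natural next step.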

Comprehensive security coverage – Amazon Inspector now provides unified vulnerability assessments for both traditional Linux distributions and minimal base images including scratch, distroless, and Chainguard images through a single service. This extended coverage eliminates the need for multiple scanning solutions while maintaining robust security practices across your entire container ecosystem, from traditional distributions to highly optimized container environments. The service streamlines security operations by providing comprehensive vulnerability management through a centralized platform, enabling efficient assessment of all container types.

Enhanced cross-account visibility – Security management across single accounts, cross-account setups, and AWS Organizations is now supported through delegated administrator capabilities. Amazon Inspector shares information about images running on containers within the same organization, which is particularly valuable for accounts maintaining golden image repositories. Amazon Inspector provides all ARNs for Amazon EKS and Amazon ECS clusters where images are running, if the resource belongs to the account with an API, providing comprehensive visibility across multiple AWS accounts. The system updates deployed EKS pods or ECS tasks information at least once daily and automatically maintains accuracy as accounts join or leave the organization.

Availability and pricing – The new container mapping capabilities are available now, at no additional cost, in all AWS Regions where Amazon Inspector is offered. To get started, visit the Amazon Inspector documentation. For pricing details and Regional availability, refer to the Amazon Inspector pricing page.

PS: Writing a blog post at AWS is always a team effort, even when you see only one name under the post title. In this case, I want to thank Nirali Desai for her generous help, technical guidance, and expertise, which made this overview possible and comprehensive.

— Eli



New Amazon EC2 P6-B200 instances powered by NVIDIA Blackwell GPUs to accelerate AI innovations

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-p6-b200-instances-powered-by-nvidia-blackwell-gpus-to-accelerate-ai-innovations/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P6-B200 instances powered by NVIDIA B200 to address customer needs for high performance and scalability in artificial intelligence (AI), machine learning (ML), and high performance computing (HPC) applications.

Amazon EC2 P6-B200 instances accelerate a broad range of GPU-enabled workloads but are especially well-suited for large-scale distributed AI training and inferencing for foundation models (FMs) with reinforcement learning (RL) and distillation, multimodal training and inference, and HPC applications such as climate modeling, drug discovery, seismic analysis, and insurance risk modeling.

When combined with Elastic Fabric Adapter (EFAv4) networking, hyperscale clustering by EC2 UltraClusters, and advanced virtualization and security capabilities by AWS Nitro System, you can train and serve FMs with increased speed, scale, and security. These instances also deliver up to two times the performance for AI training (time to train) and inference (tokens/sec) compared to EC2 P5en instances.

You can accelerate time-to-market for training FMs and deliver faster inference throughput, which lowers inference cost and helps increase adoption of generative AI applications. You can also increase processing performance for HPC applications.

EC2 P6-B200 instances specifications
New EC2 P6-B200 instances provide eight NVIDIA B200 GPUs with 1440 GB of high bandwidth GPU memory, 5th Generation Intel Xeon Scalable processors (Emerald Rapids), 2 TiB of system memory, and 30 TB of local NVMe storage.

Here are the specs for EC2 P6-B200 instances:

Instance size | GPUs (NVIDIA B200) | GPU memory (GB) | vCPUs | GPU Peer to peer (GB/s) | Instance storage (TB) | Network bandwidth (Gbps) | EBS bandwidth (Gbps)
P6-b200.48xlarge | 8 | 1440 HBM3e | 192 | 1800 | 8 x 3.84 NVMe SSD | 8 x 400 | 100

These instances feature up to 125 percent improvement in GPU TFLOPs, 27 percent increase in GPU memory size, and 60 percent increase in GPU memory bandwidth compared to P5en instances.
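The aggregate figures follow from the per-device values in the specifications. This short check uses only numbers stated above; the variable names are illustrative.

```python
gpus = 8
gpu_memory_gb = 1440            # total HBM3e across all GPUs
nvme_count, nvme_tb_each = 8, 3.84
net_count, net_gbps_each = 8, 400

per_gpu_gb = gpu_memory_gb / gpus
storage_tb = round(nvme_count * nvme_tb_each, 2)
network_gbps = net_count * net_gbps_each

print(per_gpu_gb)    # 180.0 GB of memory per GPU
print(storage_tb)    # 30.72 TB of local NVMe, quoted as 30 TB
print(network_gbps)  # 3200 Gbps of aggregate network bandwidth
```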

P6-B200 instances in action
You can use P6-B200 instances in the US West (Oregon) AWS Region through EC2 Capacity Blocks for ML. To reserve your EC2 Capacity Blocks, choose Capacity Reservations on the Amazon EC2 console.

Select Purchase Capacity Blocks for ML and then choose your total capacity and specify how long you need the EC2 Capacity Block for p6-b200.48xlarge instances. The total number of days that you can reserve EC2 Capacity Blocks is 1-14 days, 21 days, 28 days, or multiples of 7 up to 182 days. You can choose your earliest start date for up to 8 weeks in advance.
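The duration and start-date rules above are easy to get wrong when scripting a purchase request. As a sketch, the two hypothetical helpers below encode them as stated in this post (1 to 14 days, or a multiple of 7 from 21 up to 182 days, starting no more than 8 weeks ahead); they are not part of any AWS SDK.

```python
from datetime import date, timedelta


def valid_duration(days):
    """True if `days` is 1-14, or a multiple of 7 from 21 up to 182."""
    return 1 <= days <= 14 or (21 <= days <= 182 and days % 7 == 0)


def valid_start(start, today):
    """True if `start` is no more than 8 weeks after `today`."""
    return today <= start <= today + timedelta(weeks=8)


print(valid_duration(10))   # True
print(valid_duration(15))   # False
print(valid_duration(35))   # True
print(valid_duration(189))  # False
print(valid_start(date(2025, 7, 1), today=date(2025, 5, 15)))  # True
```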

Now, your EC2 Capacity Block will be scheduled successfully. The total price of an EC2 Capacity Block is charged up front, and the price doesn’t change after purchase. The payment will be billed to your account within 12 hours after you purchase the EC2 Capacity Blocks. To learn more, visit Capacity Blocks for ML in the Amazon EC2 User Guide.

You can launch P6-B200 instances using AWS Deep Learning AMIs (DLAMI), which support EC2 P6-B200 instances. DLAMI provides ML practitioners and researchers with the infrastructure and tools to quickly build scalable, secure, distributed ML applications in preconfigured environments.

To run instances, you can use the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.
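As a sketch of the SDK path, the following builds the request parameters for launching into a purchased Capacity Block with the AWS SDK for Python (boto3). The AMI ID and Capacity Reservation ID are placeholders, and `build_run_instances_params` is a hypothetical helper. The sketch assumes the `capacity-block` market type together with a targeted Capacity Reservation; check the Capacity Blocks for ML section of the Amazon EC2 User Guide before relying on it. The actual call is left commented out so the example runs without credentials.

```python
def build_run_instances_params(ami_id, reservation_id, count=1):
    """Build RunInstances parameters that target a Capacity Block."""
    return {
        "ImageId": ami_id,
        "InstanceType": "p6-b200.48xlarge",
        "MinCount": count,
        "MaxCount": count,
        # Capacity Blocks use a dedicated market type.
        "InstanceMarketOptions": {"MarketType": "capacity-block"},
        # Target the reservation created by the Capacity Block purchase.
        "CapacityReservationSpecification": {
            "CapacityReservationTarget": {
                "CapacityReservationId": reservation_id
            }
        },
    }


# Placeholder IDs; replace with a DLAMI ID and your own reservation ID.
params = build_run_instances_params("ami-EXAMPLE", "cr-EXAMPLE")
print(params["InstanceMarketOptions"])

# With boto3 installed and credentials configured:
# import boto3
# boto3.client("ec2", region_name="us-west-2").run_instances(**params)
```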

You can integrate EC2 P6-B200 instances seamlessly with various AWS managed services such as Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Simple Storage Service (Amazon S3), and Amazon FSx for Lustre. Support for Amazon SageMaker HyperPod is also coming soon.

Now available
Amazon EC2 P6-B200 instances are available today in the US West (Oregon) Region and can be purchased as EC2 Capacity Blocks for ML.

Give Amazon EC2 P6-B200 instances a try in the Amazon EC2 console. To learn more, refer to the Amazon EC2 P6 instance page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy



Powering generative AI/ML solutions with AWS Outposts Servers at Edge locations

Post Syndicated from Art Baudo original https://aws.amazon.com/blogs/compute/powering-generative-ai-ml-solutions-with-aws-outposts-servers-at-edge-locations/

This post is written by Brian Daugherty, Principal Solutions Architect, Leonardo Queirolo, Senior Cloud Support Engineer, and Reet Kundu, Senior Cloud Support Engineer


Many organizations are vigorously pursuing generative AI initiatives in the Amazon Web Services (AWS) cloud today because generative AI drives advances in productivity, efficiency, and innovation.

However, for some organizations, industries, and use-cases, there is a compelling need to deploy generative AI not only in the cloud, but also at the edge due to factors such as application latency and proximity to critical data.

AWS Outposts can help these organizations address this need by extending AWS services to the edge, such as generative AI services, while maintaining the same tooling and orchestration capabilities found in AWS Regions.

Industrial and manufacturing use-cases are a primary focus of AWS Outposts Servers, which can be deployed on-premises to minimize latency and ensure stable connectivity between orchestration and control applications, such as Manufacturing Execution Systems (MES) or Supervisory Control and Data Acquisition (SCADA) systems, and the industrial processes they control.

This post explores how to use Outposts Servers to power generative AI solutions at the edge. The example use-case demonstrates real-time anomaly detection for industrial processes and an edge-based human machine interface including a small language model (SLM) with Retrieval-Augmented Generation (RAG) to guide operators on best practices for problem resolution. Although the use case is specific, the tools and methods can be applied to many other edge generative AI use cases.

For a hands-on experience to implement this solution using Outposts Servers, fill out this form with your contact information and we will get back to you with lab access. A detailed step-by-step guide to develop the hands-on example is available in this link.

Architecture overview

As depicted in the following diagram, the solution is distributed in three modules. The first module (1) guides you to establish low-latency, local connectivity to an MQTT broker within the same on-premises network as your lab Amazon Elastic Compute Cloud (Amazon EC2) instance. You configure essential AWS infrastructure (Amazon S3, AWS Secrets Manager, AWS Identity and Access Management (IAM)) to manage the deployment, authentication, and permissions of AWS IoT Greengrass components. You deploy a component to the existing Greengrass core device on your lab EC2 instance to retrieve synthetic Arduino sensor data from the broker using its Local Network Interface (LNI).

Figure 1 – Architectural diagram of the solution to perform low-latency, local inference through generative AI and ML models running on Outposts Servers

In the second module (2), you deploy a component that detects anomalies in sensor data in real time. This component runs on the Outposts Server EC2 instance hosting the AWS IoT Greengrass core device, performing inference directly at the edge. You use synthetic Arduino sensor data to generate anomalies and observe them being detected by the model. You configure an IoT rule to send the anomaly count to the Amazon CloudWatch Dashboard in the Region. This provides centralized monitoring while making sure that raw and sensitive data are processed locally at the edge, where latency and connectivity are assured.

In the third module (3), you deploy a comprehensive edge computing solution to enhance operational visibility and decision-making capabilities at the local level. The solution includes a local dashboard that provides real-time telemetry, displaying raw sensor data and detected anomalies. A virtual assistant is integrated with an SLM to provide context-aware responses based on the factory data, along with a forecasting capability to predict future anomaly trends.

Outposts Server

Outposts Servers provide fully managed AWS infrastructure, services, APIs, and tools for edge use-cases. Two form factors are available: 1U servers are AWS Graviton based, and 2U servers are third-generation Intel Xeon Scalable processor based.

Enabling anomaly detection at the edge

Outposts Servers allow local sensor data processing for low-latency anomaly detection and resilience against external connectivity issues, as shown in the following figure. The example uses synthetic Arduino devices with gyroscope sensor data, simulating industrial sensors sending data to an MQTT broker on an EC2 instance in the Outposts Server. Gyroscope data is used in various monitoring systems, such as motion control systems, orientation detection, and stability and balance mechanisms. The Lab EC2 instance fetches sensor data through the MQTT client and processes it using a machine learning (ML) model for anomaly detection.

Figure 2 – Architectural diagram showing data flow from Arduino sensors through MQTT broker and EC2 on Outposts Server to perform local inference

Outposts server LNI

Local communication between the synthetic Arduino sensor data, the MQTT broker, and the Lab EC2 instance uses the LNI, which provides Layer 2 presence on the local network. The setup requires creating an Elastic Network Interface (ENI) on an Outposts subnet with the LNI enabled, attaching it to the Lab EC2 instance, and verifying connectivity to the MQTT broker’s LNI IP using the command ping -c 5 <MQTT_BROKER_LNI_IP>. This enables direct, low-latency communication between the components, which is crucial for this edge computing scenario.
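A ping confirms that the broker host answers, but not that the MQTT listener itself is reachable. The following is a minimal sketch of a complementary check, assuming the broker listens on TCP port 1883 as in the workshop configuration. The function name broker_reachable is hypothetical and is not part of the workshop code.

```python
import socket


def broker_reachable(host: str, port: int = 1883, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

Calling broker_reachable("<MQTT_BROKER_LNI_IP>") with the broker's actual LNI IP address from the Lab EC2 instance should return True once the ENI with the LNI enabled is attached and the broker is running.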

AWS IoT Greengrass

AWS IoT Greengrass is an open source edge runtime and cloud service for device software management and deployment supported on Outposts Server. This hybrid approach combines the benefits of edge computing with centralized management, such as:

  • Centralized artifact management: store and version component artifacts in Amazon S3, enabling consistent deployment across multiple edge locations.
  • Secure configuration: use Secrets Manager to handle sensitive information and credentials unique to each edge location.
  • Fleet monitoring: use CloudWatch for centralized monitoring and logging across your distributed edge deployment.
  • Automated updates: deploy software updates and model improvements across your edge fleet through AWS IoT Greengrass component management.

AWS IoT Greengrass components, such as the one used for anomaly detection, can be deployed to EC2 instances running on Outposts Servers. After configuring the Lab EC2 instance with Greengrass, you can download components from an S3 bucket. The first component deploys a subscriber that receives synthetic Arduino sensor data from the MQTT broker, as shown in the following configuration.

{
    "broker": "<MQTT_BROKER_LNI_IP>",
    "port": 1883,
    "client_id": "OutpostsServerMLEdge_<workshop-id>",
    "sensor_name": "ArduinoSensor_<arduino-id>",
    "topic": "arduino/ArduinoSensor_<arduino-id>/3-axis-rotation",
    "thing_name": "OutpostsServerMLEdge_Sub",
    "mqttauth_creds": "<ARN_SECRET_MQTT_CREDENTIALS>"
}
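A component can fail in confusing ways if a placeholder in this configuration is left unfilled or a key is missing. The following is a small sketch of a pre-deployment check. The function load_subscriber_config is hypothetical, and the rule that the topic must embed the sensor name is an assumption inferred from the sample above, not a documented requirement.

```python
import json

# Keys taken from the sample subscriber configuration shown above
REQUIRED_KEYS = {
    "broker", "port", "client_id", "sensor_name",
    "topic", "thing_name", "mqttauth_creds",
}


def load_subscriber_config(text: str) -> dict:
    """Parse the subscriber configuration and run basic sanity checks."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing configuration keys: {sorted(missing)}")
    if not 0 < int(config["port"]) < 65536:
        raise ValueError(f"invalid port: {config['port']}")
    # Assumption: the topic follows the pattern used in the sample
    expected_topic = f"arduino/{config['sensor_name']}/3-axis-rotation"
    if config["topic"] != expected_topic:
        raise ValueError(f"topic does not match sensor_name: {config['topic']}")
    return config
```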

The second component is the Anomaly Detector artifact that processes sensor data in real-time, detects anomalies using a pre-trained model, and sends anomaly counts to AWS IoT Core. Key components include:

  • edge_application.py: script for processing sensor data, performing local inference using pre-trained model in ONNX format, and publishing anomaly counts to AWS IoT Core. It is used for local inference, so that the raw data is not exposed outside the Edge location.
  • model: directory storing “arduino.onnx”, a pre-trained Autoencoder model for anomaly detection.
  • statistics: directory storing the values of different statistical functions (for example, mean and standard deviation) from the training phase and used by edge_application.py for inference.
  • functions: directory storing the code of the functions, such as the code to publish to the AWS IoT Core.
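To make the role of the model and statistics directories concrete, the following sketch shows one common way an autoencoder is used for anomaly detection: normalize a reading with the training mean and standard deviation, reconstruct it, and flag it when the reconstruction error exceeds a threshold. This is an illustration under stated assumptions, not the code in edge_application.py. The reconstruct callable stands in for the ONNX model, and the function names and threshold are hypothetical.

```python
from typing import Callable, Sequence


def normalize(sample: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> list:
    """Scale each axis with the mean and standard deviation saved at training time."""
    return [(x - m) / s for x, m, s in zip(sample, mean, std)]


def reconstruction_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between the input and the autoencoder output."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def is_anomaly(
    sample: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
    reconstruct: Callable[[list], list],
    threshold: float,
) -> bool:
    """Flag a reading whose normalized form the model cannot reconstruct well."""
    z = normalize(sample, mean, std)
    return reconstruction_error(z, reconstruct(z)) > threshold


def clamp_reconstruct(z: list) -> list:
    """Stand-in for the autoencoder: reproduces only values in a 'normal' range."""
    return [max(-1.0, min(1.0, v)) for v in z]
```

With a mean of [0, 0, 0] and a standard deviation of [1, 1, 1], a reading of [0.5, -0.2, 0.1] is reconstructed exactly and is not flagged, while [5.0, 0.0, 0.0] produces a large error on the X axis and is flagged for any threshold below about 5.3.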

After deployment of subscriber and detector components, the Lab EC2 instance processes synthetic gyroscope data from Arduino sensors, detecting anomalies during X, Y, or Z axis movement:

Real-time Dashboard showing sensor data and anomaly count

Real-time anomaly detection results from gyroscope sensor data across X, Y, and Z axes.

Building upon the foundation of Outposts Server, Local Network Interface (LNI), and AWS IoT Greengrass, this solution extends beyond anomaly detection to deliver comprehensive edge AI capabilities. These core components work together to enable advanced generative AI applications at the edge, as demonstrated in the following sections.

Edge generative AI applications with Outposts Server

The solution demonstrates the implementation of key edge generative AI capabilities:

  • Contextual virtual assistance: providing on-site personnel with AI-powered guidance and troubleshooting using local operational data, SOPs, and technical documentation.
  • Predictive insights: using foundation models (FMs) to forecast future trends based on historical data, enabling proactive planning and optimization.
  • Real-time operational dashboard: integrating sensor data visualization with AI-powered insights and forecasts in a unified local interface that maintains operations during connectivity interruptions.

1. Contextual virtual assistance at the edge

The solution implements the virtual assistant through an AWS IoT Greengrass component. The following is a snippet from the component recipe showing the key configuration parameters:

{
    "ComponentConfiguration": {
        "DefaultConfiguration": {
            // Workshop defaults, SLM runs locally on same EC2 instance
            "SLM_endpoint": "http://localhost:8080",  
            "embedding_model": "all-MiniLM-L6-v2",    
            "knowledge_base_directory": "Factory_Data" 
        }
    }
    // Additional component recipe configurations...
}

Although the solution demonstrates a streamlined setup with the SLM running on the same EC2 instance as the AWS IoT Greengrass component, the architecture enables flexible deployment options through the SLM_endpoint configuration. Organizations can:

  • Deploy the SLM on a dedicated resource in their on-premises network (for example "http://<LNI-IP-DEDICATED-RESOURCE>:8080")
  • Use existing hardware infrastructure accessible through LNI
  • Scale SLM compute resources independently from the AWS IoT Greengrass component
  • Maintain low-latency communication through local network interfaces

The implementation showcases a streamlined approach to RAG at the edge through three main components:

Knowledge base management: the solution uses Amazon S3 for document storage (PDFs, Markdown, text) with automatic edge deployment through AWS IoT Greengrass. Alternatively, you can store the documents in local storage. A vector database, such as ChromaDB, handles local vector storage and similarity search, enabling efficient knowledge base updates with centralized control.
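The similarity search that the vector database performs can be illustrated with a toy example. The following sketch does not use ChromaDB or the all-MiniLM-L6-v2 embedding model. It substitutes word-count vectors and cosine similarity so that it runs with the standard library only, and the function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

A real deployment replaces embed with a sentence embedding model and top_k with a query against the vector store, but the ranking principle is the same.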

Flexible query processing: the implementation provides a streamlined interface for RAG management, allowing users to load site-specific knowledge bases and switch between basic SLM and RAG-enhanced responses with local context:

if prompt := st.chat_input("Question"):
    # Augment the prompt with local context when a knowledge base is loaded
    if "db" in st.session_state:
        prompt = augmentPrompt(prompt, st.session_state["db"])
    response = getStreamingAnswer(prompt, SLM_MODEL_ENDPOINT)
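The body of augmentPrompt is not shown in this post. As a rough illustration of what prompt augmentation typically involves, the following sketch prepends retrieved passages to the question. The function build_augmented_prompt and its prompt wording are hypothetical and should not be read as the workshop's implementation.

```python
def build_augmented_prompt(question: str, context_chunks: list) -> str:
    """Prepend retrieved passages to the operator's question."""
    if not context_chunks:
        # Without retrieved context, fall back to the plain question
        return question
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```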

Modular SLM integration: The solution uses a standardized chat completion API, which allows for integration with different SLM deployments while maintaining a consistent interface across the edge fleet:

def getStreamingAnswer(question: str, endpoint: str):
    # Wrap the question in the model's chat template
    chat_template = '<|user|>\n{input} <|end|>\n<|assistant|>'
    payload = {
        'messages': [{'content': chat_template.format(input=question)}],
        'stream': True
    }
    # Standardized chat completion route exposed by the SLM endpoint
    SLM_URL = endpoint + '/v1/chat/completions'
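The snippet above stops before the response is consumed. The following sketch shows how a streamed reply could be reassembled, assuming the SLM server emits OpenAI-style server-sent events: lines of the form data: {json} with the text under choices[0].delta.content, terminated by data: [DONE]. That response format is an assumption about the deployed SLM server, and iter_stream_text is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming chat completion lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other event fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

If the request is sent with the requests library and stream=True, the lines can come from response.iter_lines(decode_unicode=True), and each yielded fragment can be written to the chat window as it arrives.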

This flexible architecture can be adapted for many industrial use-cases where latency and proximity to local data-sources and processes are critical.

2. Predictive insights using local models

The solution demonstrates forecasting capabilities using Chronos, a small and efficient time series forecasting model that can run entirely at the edge. The following solution implementation shows how to process historical data and generate predictions using Chronos on the AWS IoT Greengrass component deployed on Outposts Server:

import numpy as np
import torch
from chronos import ChronosPipeline

# Load Chronos model locally on the Outposts Server
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",
    torch_dtype=torch.bfloat16,
)

# Generate forecasts with confidence intervals
# (df, pred_length, and n_samples are defined elsewhere in the component)
def predict_anomaly_count_data():
    forecast = pipeline.predict(
        context=torch.tensor(df["total_anomalies"]),
        prediction_length=pred_length,
        num_samples=n_samples,
        top_k=50,
        top_p=1.0,
    )

    # Calculate the 10th, 50th, and 90th percentile bounds across the samples
    low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
    return low, median, high
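The quantile step turns many sampled forecast paths into a lower bound, a median, and an upper bound for each future time step. The following standard-library sketch reproduces that calculation on a small list of sample paths, using the same linear interpolation that np.quantile applies by default. The function names are hypothetical.

```python
def quantile(values: list, q: float) -> float:
    """Quantile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def forecast_bands(samples: list, qs: tuple = (0.1, 0.5, 0.9)) -> list:
    """Compute one band per quantile from a list of sample paths.

    samples has shape [num_samples][prediction_length]. The result has
    shape [len(qs)][prediction_length].
    """
    steps = list(zip(*samples))  # regroup values by time step
    return [[quantile(list(step), q) for step in steps] for q in qs]
```

For five sample paths whose first step takes the values 0 through 4, the median for that step is 2 and the 10 percent and 90 percent bounds are 0.4 and 3.6.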

Although the solution uses sample data for the demonstration, this architecture allows organizations to process complex, real-time data at each edge location. Companies can choose to upload only aggregated metrics to CloudWatch or Amazon QuickSight for fleet monitoring and BI analysis, making sure that sensitive raw data remains secure at the edge.

3. Real-time operational dashboard

The solution showcases resilient monitoring in which all inter-component communication occurs within the local network and processing happens on the Outposts server, ensuring full functionality during external network interruptions. The dashboard is accessible through the LNI of the Outposts server, allowing local clients to maintain access through the LNI IP address even when connectivity to the Region is lost.

Through a unified interface, the dashboard provides:

  • Real-time visualization of sensor readings
  • Anomaly detection results from the local ML component
  • AI-powered insights from the local SLM
  • Trend forecasting from the Chronos model

Real-time Dashboard showing sensor data and anomaly count

Virtual Assistant leveraging Factory Data to provide contextualized answers

Chronos forecasting anomaly count based on historical data

Conclusion

The implementation demonstrates how AWS Outposts Server enables organizations to use both traditional ML and advanced generative AI capabilities at the edge for a variety of industrial and manufacturing use-cases where low latency and proximity to sensitive or real-time data are business- and process-critical.

To get started with AWS Outposts and explore use cases like this edge AI solution, fill out this form and our team will contact you with lab access and additional guidance. For a detailed walkthrough of this specific edge AI example, refer to this step-by-step guide. For more information about AWS Outposts Server, see the AWS Outposts Server User Guide.

Anchoring AWS Outposts servers with AWS Direct Connect

Post Syndicated from Art Baudo original https://aws.amazon.com/blogs/compute/anchoring-aws-outposts-servers-with-aws-direct-connect/

This post is written by Perry Wald, Principal GTM SA, Hybrid Edge; Eric Vasquez, Senior SA, Hybrid Edge; and Fernando Galves, Gen AI Solutions Architect, Outposts

AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs, and tools to customer premises. Outposts servers, launched in 2022, are 1U or 2U rack-mountable hosts that can run Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Container Service (Amazon ECS), as well as other smaller-scale edge services such as AWS IoT Greengrass. This version of Outposts is primarily focused on bringing lower-latency AWS compute capabilities to the edge at many user locations.

During Outposts provisioning, you or AWS creates a service link connection that connects your Outposts server to your chosen AWS Region or home Region. Outposts depends on regional connectivity “to reach out to home,” needing very little in terms of networking. Looking at the network requirements, it needs:

  • DHCP, to assign an IP address and a default gateway
  • Public DNS, to resolve the name of the initial regional endpoint, to allow automated setup, and
  • Internet access, so that when the regional endpoint has been resolved, the Outpost can reach that endpoint, with a minimum of 500 Mbps and a maximum round-trip latency of 175 ms

User challenges with internet connectivity at the edge

When you order an Outposts server, you are responsible for installing the server. Outposts servers are self-provisioning and need a service link connection between your Outposts and the AWS Region (or home Region). This connection allows for the management of Outposts and the exchange of traffic to and from the AWS Region. Server deployment can be broken down into the following steps: installing the Outposts servers, powering them on, and providing authentication details through a command line. Then, the Outposts servers reach out to the regional endpoint and provision themselves. Your Outpost status shows as Active when the process has completed, which could take a few hours depending on service link bandwidth.

Although this has been suitable for the vast majority of use cases, there are some locations that can’t provide internet connectivity in their environments. This has mostly been in use cases where there is a strong security reason for not having an internet connection (such as financial services kiosks, small manufacturing facilities, and defense), so as to avoid risks such as DDoS attacks and potential hack attempts, or to meet requirements for receiving an authority to operate (ATO).

These locations either have some form of Direct Connect link of their own or, more commonly, have a centralized Direct Connect link to AWS and an MPLS network linking all their remote sites to a central one. In both of these scenarios, the requirement is to allow the Outposts servers to resolve and reach the public endpoint for setup, and subsequently the public anchor endpoint for management. This is done without leaving the AWS ecosystem, without unnecessary exposure to potential internet threats, and without adding more self-managed systems, by making use of AWS services instead.

To meet this requirement, we identified several key things that need to be provided if the user does not have internet connectivity at the remote location, as follows:

  1. DHCP, to provide the Outposts servers with an IP address, default gateway, and DNS servers.
  2. Public DNS access to resolve both the setup endpoint, and when live, the anchor endpoint.
  3. Public internet access, without exposing the user location to potentially harmful traffic from the internet.

Direct Connect VIF options

There are three different types of Virtual Interfaces (VIF) possible to configure on an AWS Direct Connect link:

  • Public VIF: A public VIF can access all AWS public services using public IP addresses.
  • Private VIF: A private VIF should be used to access an Amazon Virtual Private Cloud (Amazon VPC) using private IP addresses.
  • Transit VIF: A transit VIF should be used to access one or more Amazon VPC Transit Gateways associated with Direct Connect gateways.

Transit VIF option

A transit VIF can be used to solve both of these issues. First, a transit VIF deploys an ENI within a VPC (known as an attachment), so that traffic coming from the transit VIF into a VPC can be routed. This is because it follows the rule that, for non-transitive VPC routing, the traffic has to either be sourced or targeted for an ENI in the VPC.

If the traffic is forwarded to a regional VPC through the transit gateway, then it can be forwarded to the internet through a NAT gateway. This is an enhancement of the architecture to use a transit gateway to provide a single egress point for multiple VPCs to the internet. For more information, see Creating a single internet exit point from multiple VPCs Using AWS Transit Gateway. In this case, instead of the transit gateway routing multiple VPCs to the internet, it’s routing to an on-premises connection.

Using a transit gateway to forward traffic to a NAT gateway allows you to provide internet connectivity for the Outposts servers without managing virtual appliances, because NAT gateway provides this as a service. NAT gateways also only allow outbound access, so they provide security against any attempted external access by a bad actor from the internet. This works for Outposts servers since they only need outbound access. Outposts always initiate communication to an anchor or service endpoint, and they never receive communication except as a response.

Figure 1. Architectural diagram showing the use of a Transit VIF and NAT gateway in a Region reaching regional endpoints

DNS provisioning

Although the preceding architecture solves the challenge of how we provide a path for IP packets to transit between the Outposts servers and the public endpoints needed, it doesn’t solve the issue of resolving DNS names. If the remote site is isolated from the internet, then it has no clear way to resolve DNS.

Amazon Route 53 Resolver endpoints allow you to deploy an IP address within a VPC subnet, which provides DNS resolution. There are two types of resolver endpoints: outbound and inbound.

Outbound resolver endpoints are used by AWS to send DNS queries to your on-premises DNS servers. Inbound resolver endpoints are used by your DNS servers (and hosts) to resolve addresses within Route 53.

Route 53 can resolve public DNS names, so the Outposts service endpoint outposts.<region-name>.amazonaws.com becomes resolvable by an inbound resolver endpoint.

Configuring the Outposts egress VPC

  1. Set up the service link egress VPC, build subnets, and deploy a NAT gateway and a transit gateway.
  2. Create a Route 53 Resolver inbound endpoint.
  3. Configure DHCP on the switch, and make sure that the DNS value matches the resolver endpoint.
  4. Configure the transit VIF on the switch, build a BGP peer, and attach it to your transit gateway.
  5. Confirm propagation settings on the transit gateway and default routes.
  6. Confirm routes on subnets to allow traffic out to the internet, and back to your Outposts servers.
  7. Test name resolution (dig) and HTTPS connectivity (curl) to the service endpoint.
  8. If needed, install your Outposts servers.
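The test in step 7 can also be scripted from a host on the same network segment as the Outposts servers. The following is a minimal sketch that mirrors the dig and curl checks with the Python standard library. It only verifies name resolution and TCP reachability on port 443, not a full HTTPS exchange, and the function names are hypothetical. The hostname pattern is the one given earlier in this post.

```python
import socket


def service_endpoint(region: str) -> str:
    """Build the Outposts service endpoint name for a Region."""
    return f"outposts.{region}.amazonaws.com"


def resolve(hostname: str) -> list:
    """Return the sorted IP addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_443_open(hostname: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to port 443 succeeds."""
    try:
        with socket.create_connection((hostname, 443), timeout=timeout):
            return True
    except OSError:
        return False
```

If resolve(service_endpoint("<region-name>")) returns an empty list, revisit the DHCP DNS value and the inbound resolver endpoint. If resolution works but tcp_443_open fails, check the transit gateway and NAT gateway routes.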

Public VIF option

Using a public VIF allows you to provide an internet connection directly to the on-premises site. In turn, this means you need to implement firewalls and security functions on this connection, adding more layers of operational overhead. A public VIF also means that the on-premises end of the VIF can be accessed by any public IP on the AWS public network, regardless of the instance to which that IP is mapped. A public VIF is a public IP endpoint on the AWS public network. You should treat public VIF traffic as internet-based traffic. This can become cumbersome for firewall teams if they have to allow-list known AWS IP ranges and manage a stateful firewall for a large range of AWS IPs.

Furthermore, even if the user is happy to implement and manage a firewall on the end of that public VIF, there is still the question of how the Outpost would resolve DNS names in this setup, both for the setup endpoint and for the subsequent anchor endpoints. Unless the private network already has DNS resolution to a public DNS, there are no DNS servers that DHCP can point to in order to give the Outposts servers name resolution. This is because there is no public DNS endpoint within the AWS public network. Traffic from a user’s public VIF can access the AWS public network, but it can’t exit it to other public networks. For example, if you had configured DHCP to point to one of the well-known DNS servers (such as 8.8.8.8), then, since this DNS server lives outside of the AWS public network, requests originating from the on-premises side of a public VIF would be dropped as they hit the border of the AWS autonomous system.

The only way for a DNS request to be resolved would be to build a BIND forwarding service within a VPC, provide it with a public IP address, and point the DHCP DNS values at this IP address.

This network configuration introduces complexity, and won’t be possible for those with highly regulated workloads. You would need to manage a firewall on-premises, allow a public network to reach the on-premises location, and manage a BIND server setup within a VPC. For these reasons, a public VIF is generally not an option unless the user is already running one, and is familiar with the steps to secure it.

Figure 2. Architectural diagram showing traffic flow using a public VIF and AWS Outposts

Private VIF option

A private VIF, whether connected directly to a virtual private gateway (VGW) or through a Direct Connect gateway, does not solve the problem, because VPCs do not support transitive routing. To explain this another way, any traffic following a routing rule in a subnet route table has to either originate from, or be destined for, an IP address (or to be more explicit, an Elastic Network Interface (ENI)) inside that VPC.

Virtual private gateways do not have an ENI associated with them, but are pointed to as a next hop within a subnet routing table. If we take this example and look at what the Outposts servers would be trying to pass as traffic, then it would send a packet with a source address of the Outposts servers, and a destination address of the Outposts service public endpoint (assuming that it could resolve it). When this packet reaches the VPC, then neither the source nor destination address would belong to an ENI within the VPC. Therefore, VPC routing would drop the packet.

Even if there was a routing rule on the subnet pointing the next hop for all traffic to a NAT gateway (ideal for internet egress), the routing still wouldn’t work. This is because the packet from the Outposts servers doesn’t have a destination of the NAT gateway, but instead a destination of the setup endpoint on the internet.
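The rule described above can be expressed as a small check. The following sketch is a simplified model of the rule as stated in this post for traffic arriving through a virtual private gateway. It is not a model of the actual VPC data plane, and the function name and all IP addresses are hypothetical examples.

```python
from ipaddress import ip_address


def vgw_packet_forwarded(src: str, dst: str, eni_addresses: set) -> bool:
    """Apply the non-transitive routing rule to a packet arriving through a VGW.

    The packet is forwarded only if its source or destination address
    belongs to an ENI inside the VPC.
    """
    enis = {ip_address(a) for a in eni_addresses}
    return ip_address(src) in enis or ip_address(dst) in enis


# Hypothetical addresses: a NAT gateway ENI and an instance ENI in the VPC
VPC_ENIS = {"10.0.1.10", "10.0.2.20"}
OUTPOSTS_SERVER = "192.168.50.10"   # on-premises address
PUBLIC_ENDPOINT = "203.0.113.25"    # stand-in for the public setup endpoint

# Dropped: neither address belongs to an ENI in the VPC,
# even though a NAT gateway ENI exists there.
dropped = not vgw_packet_forwarded(OUTPOSTS_SERVER, PUBLIC_ENDPOINT, VPC_ENIS)

# Forwarded: the destination is an ENI inside the VPC.
forwarded = vgw_packet_forwarded(OUTPOSTS_SERVER, "10.0.2.20", VPC_ENIS)
```

In this model, the presence of a NAT gateway in the VPC does not change the outcome, because the packet is addressed to the public endpoint and not to the NAT gateway's ENI.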

It’s possible to use a combination of ingress routing and transparent proxies to ingest the traffic and pass it to an instance running a proxy service to forward to the internet. However, this adds the complexity of managing and maintaining proxy servers. For these reasons, a private VIF is generally not recommended.

Figure 3. Architectural diagram showing VGW and packet drops because of transitive routing not being supported

Conclusion

In this post, we discussed architecture patterns you can use to provision your Outposts servers when public internet connectivity is unavailable. To get started with Outposts servers, please visit our Server User Guide. For more information, contact us.

Implementing network traffic inspection on AWS Outposts rack

Post Syndicated from Art Baudo original https://aws.amazon.com/blogs/compute/implementing-network-traffic-inspection-on-aws-outposts-rack-2/

This post is written by Arun Kumar N C, Technical Account Manager; Debapriyo Jogi, Technical Account Manager; and Ashish Nagaraj, Cloud Support Engineer 2

Organizations are increasingly adopting hybrid cloud architectures that combine the scalability of cloud computing with the control and compliance benefits of on-premises infrastructure. AWS Outposts extends AWS infrastructure, AWS services, APIs, and tools to on-premises locations for workloads that require low latency, local data processing, or data residency. Outposts comes in a variety of form factors, from 42U Outposts racks to 1U and 2U Outposts servers. This post will focus on implementing network traffic inspection on Outposts rack.

Comprehensive security is critical for organizations deploying production workloads on Outposts. Network traffic inspection serves as a crucial security control, protecting against threats while enabling secure communication between different network segments. This post provides guidance on how to implement effective network traffic inspection across your hybrid cloud infrastructure using Outposts rack.

Overview

In the coming sections we will cover strategies for network traffic inspection on Outposts rack, focusing on outbound internet access and communication with on-premises networks. We explore AWS native services and third-party tools, offering a comprehensive overview of your options. We will cover architectural patterns, implementation guides, and best practices to help build a strong security posture for your hybrid cloud environment.

Securing internet-facing applications

Securing internet-facing applications on Outposts requires a robust, multi-layered approach for high availability and comprehensive security. We will explore two key architectural patterns that ensure enterprise-grade security for your workloads below.

Amazon CloudFront with AWS WAF integration

This architecture uses multiple AWS services including AWS Shield and AWS WAF for multi-layered security, Amazon CloudFront for global content delivery, and an Application Load Balancer (ALB) on Outposts for on-premises traffic management. Applications are deployed on Outposts, with CloudFront as the content delivery network. AWS WAF rules on CloudFront protect against web exploits, while the ALB distributes requests to application instances within Outposts.

This diagram illustrates AWS CloudFront with WAF integration connecting AWS cloud services to a customer data center through the internet. The setup includes CloudFront protected by WAF and Shield, EC2 instances in a VPC, and an Outpost deployment in the customer data center with ALB and EC2 instances.

Figure 1 – Amazon CloudFront with AWS WAF integration

  1. The user sends a request via web browser or mobile app to access the application.
  2. CloudFront receives the request at an AWS edge location and performs content-based routing.
  3. CloudFront integrates with AWS WAF to filter web traffic and block common attack patterns.
  4. The ALB routes the request to the appropriate targets.
  5. The application on Outposts processes the request and generates a response.

This flow ensures secure and efficient handling of user requests using both cloud and on-premises resources.

ALB with AWS WAF


This architecture offers more control over traffic routing while using AWS WAF for security. Applications are deployed on Outposts, but the ALB is in the parent Region, as AWS WAF cannot be associated with Outposts ALBs. The regional ALB handles incoming traffic, with AWS WAF providing firewall capabilities. After passing through AWS WAF, traffic is routed to Outposts applications. This configuration allows advanced WAF features but may introduce latency, as traffic must first reach the regional ALB. This trade-off between security and latency should be considered based on application needs.

Note: A critical dependency exists on the service link connection, as application traffic routing relies on the regional ALB. Service link failures will disrupt workload operations, making connection resilience essential for this architecture.

Figure 2 – ALB with AWS WAF

  1. User sends a request via web browser or mobile app for a webpage, API call, or service.
  2. The ALB in the AWS Region receives the request and performs Layer 7 content-based routing.
  3. ALB integrates with AWS WAF for security inspection.
  4. If the request passes, ALB routes it to the appropriate target in Outposts, selecting a specific instance or service.
  5. The application on Outposts processes the request, generates a response, and returns it.
  6. The response travels back through Outposts ALB to the regional ALB, which forwards it to the user’s browser or app.

Inspection between the Outpost subnet and regional subnet

Network traffic inspection between the Outpost and regional subnets is vital for security in hybrid cloud deployments. It makes sure traffic between Outposts and the parent Region complies with security policies and requirements. Two main architectural approaches exist for implementing this inspection:

  1. Using a third-party firewall in the Outpost subnet.
  2. Using AWS Network Firewall in an AWS Region.

Both approaches support various connectivity (service link) options between Outposts and the Region, including AWS Direct Connect.

Using third-party firewall in the Outpost subnet

This architecture uses a third-party firewall in the Outposts subnet, routing all traffic between the Outposts and regional subnets through it. This setup enables local traffic inspection, reducing latency while enforcing security policies before traffic leaves the Outposts.

This diagram illustrates AWS Region and customer data center connected via service-linked VPN. Outpost deployment includes third-party firewall EC2 instance and target EC2 instances in Outpost subnet.

Figure 3 – Third-party firewall in the Outpost subnet

Traffic can originate from either Outposts or AWS regional subnet.

  1. Traffic originating from the Outpost to AWS Region:

a. Traffic is sent to the third-party firewall in the Outpost.
b. The firewall inspects the traffic and applies security policies.
c. If allowed, the firewall forwards traffic to the Region.
d. Traffic travels via service link connectivity (Direct Connect or public internet) to the regional subnet.

  2. Traffic originating from AWS Region to the Outpost:

a. Traffic originates in the regional subnet.
b. Traffic travels via service link connectivity (Direct Connect or public internet).
c. Upon reaching the Outpost, the traffic is sent to the third-party firewall.
d. The firewall inspects packets and applies security policies.
e. If allowed, the firewall forwards traffic to the Outpost subnet destination.

Using AWS Network Firewall in an AWS Region

In this architecture, a Network Firewall is deployed in the regional VPC, routing all traffic between the Outpost and regional subnets through it. This centralized approach ensures consistent policy enforcement with AWS native tools. The firewall inspects all traffic between Outposts and the AWS infrastructure in the Region.

This diagram illustrates AWS Network Firewall in a Region connected to customer data center via service-linked VPN. Includes VPC with Network Firewall, and EC2 in Outpost routing the traffic through the AWS Network Firewall endpoint.

Figure 4 – AWS Network Firewall in an AWS Region

Traffic can originate from either the Outposts subnet or AWS regional subnet.

  1. All traffic is routed to the Network Firewall in the AWS Region.
  2. The firewall applies configured rules, including:
  • Custom rules for specific security needs.
  • Managed AWS rule groups for common threats.
  • Third-party rule groups for specialized protection.
  3. If traffic passes all rules, it is forwarded to its destination (Outpost or Region).
  4. Return traffic follows the same path, so all traffic is inspected by the Network Firewall.

Inspection between on-premises and Outposts through Local Gateway

Network traffic inspection between on-premises networks and Outposts via Local Gateway (LGW) is essential for securing hybrid environments. It helps keep communication between Outposts workloads and on-premises infrastructure secure.
Two primary architectural approaches are available, as explained in the following sections. The choice depends on infrastructure, security needs, and operational preferences.

Using third-party firewall on Outposts

For more details on implementing network traffic inspection between on-premises networks and Outposts via LGW, refer to Implementing network traffic inspection on AWS Outposts rack.

This post expands on the preceding blog by offering detailed guidance on architectural options and traffic flows for inspecting network traffic between on-premises environments and Outposts via LGW.

Using your on-premises router/firewall

This approach uses the existing firewall capabilities of your on-premises router/firewall. The network is configured to route all traffic between the on-premises environment and Outposts through this router/firewall. The LGW on your Outpost connects directly to your router/firewall, which handles the firewall functions. This setup uses the on-premises security infrastructure and policies, ensuring continuity in security management while integrating Outposts into the broader network security strategy.

Traffic flow:

  1. Traffic originates from on-premises network
  2. Passes through your router with the firewall
  3. Router inspects the traffic
  4. If allowed, traffic is sent to Outposts through the LGW

Outbound inspection to the internet from Outposts instances

Outbound internet traffic inspection for Outposts instances is useful for security and controlling access to external resources. Three architectural approaches are available for implementing this inspection, which are discussed in the following sections.

Using Customer-Owned IP (CoIP) with on-premises firewall

In this architecture, Outposts instances are assigned Customer-Owned IP (CoIP) addresses, with all outbound internet traffic routed through the on-premises network and firewall. The LGW connects the Outposts environment to the on-premises network. This setup enables organizations to leverage existing on-premises security and internet connectivity while ensuring consistent IP addressing across their hybrid environment.

This diagram illustrates Customer-Owned IP (CoIP) implementation in a customer data center, where Outpost EC2 instances use CoIP addresses, routing through LGW to an on-premises firewall for inspection.

Figure 5 – Customer-Owned IP (CoIP) with on-premises firewall

  1. An Outposts instance with a CoIP address initiates outbound internet traffic.
  2. The traffic is routed to the LGW on the Outpost.
  3. The LGW forwards the traffic to the on-premises network.
  4. The traffic reaches the on-premises firewall, which inspects it and applies security policies and rules.
  5. If allowed, the firewall forwards the traffic to the internet through the on-premises connection.
  6. Return traffic follows the reverse path, being inspected by the firewall before reaching the Outposts instance.

Using CoIP with third-party firewalls on Outposts

In this configuration, you assign CoIP addresses to your Outposts instances and deploy a third-party firewall appliance directly on the Outposts rack. Outbound internet traffic from these instances is routed through the local firewall running on EC2 before reaching the internet via the LGW. This approach ensures local traffic inspection while preserving the advantages of CoIP addressing, enabling seamless integration with existing IP management systems.

This diagram illustrates third-party firewalls on Outposts as EC2 instances. Customer data center contains Outpost subnet with EC2 instances using CoIP, connected to LGW. Traffic routes through Customer Edge Router/Firewall before reaching Internet

Figure 6 – CoIP with third-party firewalls on Outposts

  • An Outposts instance with a CoIP address initiates outbound internet traffic.
  • The traffic is routed to the third-party firewall deployed on the Outpost.
  • The firewall performs deep packet inspection, applying security policies and rules.
  • If allowed, the firewall forwards the traffic to the LGW.
  • The LGW sends the traffic to the internet through the on-premises connection.
  • Return traffic follows the reverse path, being inspected by the firewall before reaching the Outposts instance.

Using Internet Gateway (IGW) with Network Firewall in the Region

This architecture provides secure outbound internet access for Outposts workloads by using services in the parent Region. The VPC extends to include the Outposts rack, with internet-bound traffic routed via the service link to the AWS Region. In the Region, the Network Firewall inspects the traffic before forwarding it to the Internet Gateway (IGW) for internet access.

Traffic flow:

  1. Traffic is sent to the parent Region via the service link.
  2. In the Region, traffic is routed to the Network Firewall.
  3. The Network Firewall inspects the traffic and applies rules.
  4. If allowed, traffic is forwarded to the IGW via the NAT Gateway.
  5. The IGW sends the traffic to the internet.
  6. Return traffic follows the reverse path, inspected before reaching Outposts.

Conclusion

Implementing effective network traffic inspection for AWS Outposts requires a strategic approach balancing security, efficiency, and architectural complexity. We’ve explored multiple architectural patterns for implementing network traffic inspection with Outposts rack.

Reach out to your AWS account team or AWS Support to learn more about traffic inspection on Outposts.

Migrating your on-premises workloads to AWS Outposts Rack

Post Syndicated from Art Baudo original https://aws.amazon.com/blogs/compute/migrating-your-on-premises-workloads-to-aws-outposts-rack-2/

This post is written by Craig Warburton, Senior Solutions Architect, Hybrid; Sedji Gaouaou, Senior Solutions Architect, Hybrid; and Brian Daugherty, Principal Solutions Architect, Hybrid.

Migrating workloads to AWS Outposts Rack offers you the opportunity to gain the benefits of cloud computing while keeping your data and applications on premises.

For organizations with strict data residency requirements, by deploying AWS infrastructure and services on premises, you can keep sensitive data and mission-critical applications within your own data centers or facilities, helping ensure compliance with data sovereignty laws and regulatory frameworks.

On the other hand, if your organization does not have stringent data residency requirements, you may opt for a hybrid approach, using both Outposts Rack and the AWS Regions. With this flexibility, you can process and store data in the most appropriate location based on factors such as latency, cost optimization, and application requirements.

In this post, we cover options to migrate your workloads to an Outposts Rack, taking into account your specific data residency requirements. We explore strategies, tools, and best practices to enable a successful migration tailored to your organization’s needs.

Overview

AWS has several services to help you migrate and rehost workloads, including AWS Migration Hub, AWS Application Migration Service, and AWS Elastic Disaster Recovery. Alternatively, you can use backup and recovery solutions provided by AWS partners.

At AWS, we use the 7 Rs framework to help organizations evaluate and choose the appropriate migration strategy for moving applications and workloads to the AWS Cloud. The 7 Rs represent:

  1. Rehosting (rehost or lift and shift)
  2. Replatforming (lift, tinker, and shift)
  3. Repurchasing (republish or re-vendor)
  4. Refactoring (re-architecting)
  5. Retiring
  6. Retaining (revisit)
  7. Relocating (remigrate).

This post focuses on rehosting and the services available to help rehost on-premises applications to Outposts Rack.

Before getting started with any migration, AWS recommends a three-phase approach to migrating workloads to the cloud (AWS Region or Outposts Rack). The three phases are assess, mobilize, and migrate and modernize.

Figure 1: Diagram showing the three migration phases of assess, mobilize, and migrate and modernize

This post describes the steps that you can take in the migrate and modernize phase. However, the assess and mobilize phases are also critical to allow you to understand what applications are migrated, the dependencies between them, and the planning associated with how and when migration occurs.

Workload migration to Outposts Rack: With staging environment in a Region

After deploying an Outposts Rack to your desired on-premises location, you can perform migrations of on-premises systems and virtual machines using either Application Migration Service and AMI creation or third-party backup and recovery services. Both scenarios are described in the following sections.

Scenario 1: Using Application Migration Service with AMI creation

Application Migration Service is able to lift and shift a large number of physical or virtual servers without compatibility issues, performance disruption, or long cutover windows.

In this scenario, at least one Outposts Rack is deployed on premises with the following prerequisites:

  • An AWS Replication Agent installed on each source server
  • At least one Outposts Rack installed and activated
  • VPC in an AWS Region
  • Staging subnet for staging migrated instances
  • Cutover subnet for validating migrated instances
  • Extended VPC spanning Region to the Outposts Rack
  • Migrated resources subnet where instances will be deployed from AMIs

The following diagram shows the solution architecture including the prerequisites and the on-premises servers that will be migrated to the Outposts Rack.

Figure 2: Architecture diagram showing migration with Application Migration Service

Step 1: Outposts Rack configuration

You can work with AWS specialists to size your Outposts for your workload and application requirements. In this scenario, you don’t need additional Outposts Rack capacity for migration because the staging area will be deployed in the Region (see 1 in Figure 2).

Step 2: Prepare Application Migration Service

Set up Application Migration Service from the console in the Region to which your Outposts Rack is anchored. If this is your first setup, then choose Get started on the Application Migration Service console. When creating the replication settings template, ensure that your staging area is using subnets in the anchor Region (see 2 in Figure 2).

Step 3: Install the AWS Replication Agent to the source servers or machines

For large migrations, source servers may have a wide variety of operating system versions and may be distributed across multiple data centers. Application Migration Service offers the MGN connector, a feature that allows you to automate running commands on your source environment. Finally, ensure that communication is possible between the agent and Application Migration Service (see 3 in Figure 2).

In the following image, there is an example of deploying the AWS Replication Agent providing the necessary parameters (AWS Region, AWS access key and AWS secret access key).

Figure 2: Architecture diagram showing migration with Application Migration Service

When the AWS Replication Agent is installed, the server is added to the Application Migration Service console. Next, it undergoes the initial synchronization process, which is complete when the server shows the Ready for testing lifecycle state in the Application Migration Service console.

Step 4: Configure launch settings

Prior to testing or cutting over an instance, you must configure the launch settings by creating Amazon Elastic Compute Cloud (Amazon EC2) launch templates, ensuring that your cutover subnet is selected and that you choose an available instance type (see 4 in Figure 2). The instance type right-sizing feature helps with this choice: when you select the Basic option, Application Migration Service launches a test or cutover instance type that best matches the OS, CPU, and RAM of your source server.

Step 5: Install AWS Systems Manager Agent on your cutover instances

When the launch settings are defined, you must activate the post-launch actions for either a specific server or all the servers. You must leave the Install the Systems Manager agent and allow executing actions on launched servers option toggled on in order for post-launch actions to work. Turning the option off prevents Application Migration Service from installing the AWS Systems Manager Agent on your servers, and post-launch actions are no longer executed (see 5 in Figure 2).

Figure 3: Post-launch actions on the Application Migration Service console

Step 6: Testing and cutover in Region

When you have configured the launch settings for each source server, you are ready to launch the servers as test instances. Best practice is to test instances before cutover.

Figure 4: Application Migration Service console ready to launch test instances

After completing the testing of all the source servers, you are ready for cutover (see 6 in Figure 2). Prior to launching cutover instances, check that the source servers are listed as Ready for cutover under Migration lifecycle and Healthy under Data replication status.

Figure 5: Application Migration Console ready for cutover
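
The pre-cutover check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the server list has been exported into plain dictionaries; the keys `name`, `migration_lifecycle`, and `data_replication_status` are hypothetical names that mirror the console columns and are not the service's API field names.

```python
def ready_for_cutover(servers):
    """Split servers into (ready, blocked) lists of server names.

    A server counts as ready only when both console columns show the
    values named in the text: 'Ready for cutover' and 'Healthy'.
    """
    ready, blocked = [], []
    for server in servers:
        ok = (
            server["migration_lifecycle"] == "Ready for cutover"
            and server["data_replication_status"] == "Healthy"
        )
        (ready if ok else blocked).append(server["name"])
    return ready, blocked


# Hypothetical inventory used only for illustration.
inventory = [
    {"name": "web-01", "migration_lifecycle": "Ready for cutover",
     "data_replication_status": "Healthy"},
    {"name": "db-01", "migration_lifecycle": "Ready for testing",
     "data_replication_status": "Healthy"},
]
ready, blocked = ready_for_cutover(inventory)
print("ready:", ready)
print("blocked:", blocked)
```

Keeping the blocked list explicit makes it easier to see which servers still need testing or have replication issues before a cutover wave starts.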

To launch the cutover instances, choose the instances you want to cut over and then choose Launch cutover instances under Cutover (see Figure 5). The Application Migration Service console indicates Cutover finalized when the cutover has completed successfully. The chosen source servers’ Migration lifecycle column shows the Cutover complete status, the Data replication status column shows Disconnected, and the Next step column shows Mark as archived. The source servers have now been successfully migrated into AWS. You can now archive your source servers that have launched cutover instances.

Step 7: Create a Migration AMI

After migrating all your workloads to the Region to which the Outpost is anchored, create Amazon Machine Images (AMIs) from the migrated instances. When you create an AMI from an instance, Amazon EC2 powers down the instance before creating the AMI to make sure that everything on the instance is stopped and in a consistent state during the creation process. If you are confident that your instance is in a consistent state appropriate for AMI creation, you can tell Amazon EC2 not to power down and reboot the instance.

This step can be automated using an existing Post Launch Action.
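
If you script AMI creation yourself, the reboot behavior described in this step maps to the `NoReboot` parameter of the EC2 CreateImage API. The following is a minimal sketch that only builds the request parameters; the helper name `create_image_params`, the instance ID, and the AMI name are placeholders chosen for illustration.

```python
def create_image_params(instance_id, name, consistent_state_confirmed=False):
    """Build keyword arguments for the EC2 CreateImage API.

    NoReboot stays False unless the caller explicitly confirms that the
    instance is already in a consistent state, so the safe behavior
    (power down, then image) is the default.
    """
    return {
        "InstanceId": instance_id,
        "Name": name,
        "NoReboot": consistent_state_confirmed,
    }


# Placeholder identifiers, not values from the original post.
params = create_image_params("i-0123456789abcdef0", "migrated-app-server")
print(params)

# With boto3 installed and credentials configured, the call would be:
#   import boto3
#   boto3.client("ec2").create_image(**params)
```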

Step 8: Launch instances on AWS Outposts

The final part is to launch instances from your created AMIs on your Outpost. To identify the EC2 instance types configured on your Outpost, you can use the following AWS Command Line Interface (AWS CLI) command:

aws outposts get-outpost-instance-types \
  --outpost-id op-abcdefgh123456789

The output of this command lists the instance types and sizes configured on your Outpost:

InstanceTypes:
- InstanceType: c5.xlarge
- InstanceType: c5.4xlarge
- InstanceType: r5.2xlarge
- InstanceType: r5.4xlarge

With knowledge of the instance types configured, you can now determine how many of each are available. For example, the following AWS CLI command, which is run on the account that owns the Outpost, lists the number of c5.xlarge instances available for use:

aws cloudwatch get-metric-statistics \
  --namespace AWS/Outposts \
  --metric-name AvailableInstanceType_Count \
  --statistics Average --period 3600 \
  --start-time $(date -u -Iminutes -d '-1hour') \
  --end-time $(date -u -Iminutes) \
  --dimensions \
  Name=OutpostId,Value=op-abcdefgh123456789 \
  Name=InstanceType,Value=c5.xlarge

This command returns:

Datapoints:
- Average: 10.0
  Timestamp: '2024-04-10T10:39:00+00:00'
  Unit: Count
Label: AvailableInstanceType_Count

The output indicates that there were (on average) 10 c5.xlarge instances available in the specified time period (one hour). Using the same command for the other instance types, you discover that there are also 20 c5.4xlarge, 10 r5.2xlarge, and 6 r5.4xlarge available for use in completing the necessary EC2 launch templates.
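
Once you have these figures, comparing them against your migration plan is simple arithmetic. The following Python sketch assumes the `Datapoints` list has already been retrieved for each instance type (for example with the CLI command shown earlier). The function names `available_count` and `shortfall` and the planned counts are illustrative and not part of the original post.

```python
import math


def available_count(datapoints):
    """Return a conservative whole-number count from CloudWatch datapoints.

    Each datapoint is a dict with an 'Average' key, as in the output above.
    The minimum average is used and rounded down, so a brief dip in
    availability is not hidden.
    """
    if not datapoints:
        return 0
    return math.floor(min(dp["Average"] for dp in datapoints))


def shortfall(planned, available):
    """Map instance type -> number of planned instances that do not fit."""
    return {
        itype: count - available.get(itype, 0)
        for itype, count in planned.items()
        if count > available.get(itype, 0)
    }


# Availability figures from the text; the planned counts are hypothetical.
available = {
    "c5.xlarge": available_count([{"Average": 10.0, "Unit": "Count"}]),
    "c5.4xlarge": 20,
    "r5.2xlarge": 10,
    "r5.4xlarge": 6,
}
planned = {"c5.xlarge": 8, "r5.4xlarge": 7}
print(shortfall(planned, available))  # {'r5.4xlarge': 1}
```

An empty result means every planned instance type fits within the capacity currently reported for the Outpost.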

Scenario 2: Using partner backup and replication solutions

You may already be using a third-party or AWS Partner solution to create on-premises backups of bare-metal or virtualized systems. These solutions often use local disk-arrays or object stores to create tiered backups of systems covering restore-points going back years, days, or just a few hours or minutes.

These solutions may also have inherent capabilities to restore from these backups directly to AWS. This enables migration of on-premises systems to EC2 instances deployed to Outposts Rack.

In the scenario illustrated in Figure 6, the partner backup and replication service (BR) creates backups (see 1 in Figure 6) of virtual machines to on-premises disk or object storage repositories. Using the service’s AWS integration, virtual machines can be restored (see 2 in Figure 6) to an EC2 instance deployed on Outposts Rack, which is also on-premises. The restoration may follow a process that uses helper instances and volumes (see 3 in Figure 6) during intermediate steps to create Amazon Elastic Block Store (Amazon EBS) snapshots (see 4 in Figure 6) and then AMIs of the systems being migrated (see 5 in Figure 6), which are ultimately deployed (see 6 in Figure 6) to Outposts Rack.

Figure 6: Architecture diagram of the partner backup and replication scenario

When deploying an AMI created from a restored instance you must specify the target VPC and subnet. These should be the VPC being extended to the Outpost and a subnet that has been created in that VPC on the Outpost. You also need to specify an EC2 instance type that is available on the Outpost, which can be discovered using the process described in the previous section.
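
The launch requirements in the preceding paragraph can be checked before calling the EC2 RunInstances API. The following is a minimal sketch that validates the instance type against the types discovered earlier and builds the request parameters; the helper name `run_instances_params` and all identifiers are placeholders for illustration.

```python
def run_instances_params(ami_id, instance_type, outpost_subnet_id, available_types):
    """Build RunInstances keyword arguments for a launch on an Outpost.

    The subnet must be one created on the Outpost inside the extended VPC;
    the VPC is implied by the subnet, so only SubnetId is passed.
    """
    if instance_type not in available_types:
        raise ValueError(
            f"{instance_type} is not configured on this Outpost; "
            f"choose one of {sorted(available_types)}"
        )
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "SubnetId": outpost_subnet_id,
        "MinCount": 1,
        "MaxCount": 1,
    }


# Instance types from the earlier example output; IDs are placeholders.
outpost_types = {"c5.xlarge", "c5.4xlarge", "r5.2xlarge", "r5.4xlarge"}
launch = run_instances_params(
    "ami-0123456789abcdef0", "r5.2xlarge", "subnet-0123456789abcdef0", outpost_types
)
print(launch)

# With boto3 installed and credentials configured, the call would be:
#   import boto3
#   boto3.client("ec2").run_instances(**launch)
```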

Workload migration to Outposts Rack using AWS Elastic Disaster Recovery (DRS)

Data residency can be a critical consideration for organizations that collect and store sensitive information, such as personally identifiable information (PII), financial data, or medical records. AWS Elastic Disaster Recovery, supported on Outposts Rack, helps enable seamless replication of on-premises data to Outposts Rack and addresses data residency concerns by keeping data within your on-premises environment, using Amazon EBS and Amazon S3 on Outposts.

In this scenario, an Outposts Rack is deployed on premises with the following prerequisites:

  • At least one Outposts Rack installed and activated
  • The Outposts Rack must be in Direct VPC Routing (DVR) mode
  • VPC extended to the Outposts Rack containing subnets for staging and target resources
  • Amazon S3 on Outposts (necessary for all Elastic Disaster Recovery replication destinations)
  • An AWS Replication Agent installed on each source server

The following diagram shows the solution architecture and includes the on-premises servers that are migrated from the local network to the Outposts Rack. It also includes the staging VPC used to deploy the replication servers on Outposts Rack, Amazon S3 on Outposts to store the local Amazon EBS snapshots, and the target VPC extended to Outposts Rack.

Figure 7: Architecture diagram for workflow migration to Outposts Rack

Step 1: Outposts Rack configuration

To use Elastic Disaster Recovery on Outposts Rack, you need to configure both Amazon EBS and Amazon S3 on Outposts to support continuous replication and point-in-time recovery for your workload needs (see 1 in Figure 7). Specifically, you need to size the Amazon EBS and Amazon S3 on Outposts capacity according to your workload capacity requirements and application interdependencies. To do this, you can define dependency groups: each dependency group is a collection of applications and their underlying infrastructure with technical or non-technical dependencies. A 2:1 ratio of EBS volume capacity to workload size is recommended for near-continuous replication, and a 1:1 ratio of Amazon S3 on Outposts capacity to workload size is recommended for EBS snapshots. For example, to migrate 40 TB of workloads, you need to plan for 80 TB of EBS volumes and 40 TB of Amazon S3 on Outposts capacity.
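
The sizing guidance above reduces to two multiplications. The following Python sketch applies the recommended ratios to a workload size; the function name `drs_outposts_capacity_tb` is illustrative, and the ratios are exposed as parameters only so that the guidance can be adjusted if it changes.

```python
def drs_outposts_capacity_tb(workload_tb, ebs_ratio=2.0, s3_ratio=1.0):
    """Estimate Outposts storage needed for a migration of workload_tb.

    Defaults follow the ratios in the text: 2:1 for EBS volumes and
    1:1 for Amazon S3 on Outposts.
    """
    if workload_tb < 0:
        raise ValueError("workload_tb must not be negative")
    return {
        "ebs_tb": workload_tb * ebs_ratio,
        "s3_on_outposts_tb": workload_tb * s3_ratio,
    }


# The 40 TB example from the text.
print(drs_outposts_capacity_tb(40))  # {'ebs_tb': 80.0, 's3_on_outposts_tb': 40.0}
```

If you migrate in dependency groups, the same function can be applied per group to size each wave separately.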

Step 2: Extend VPC to your Outposts Rack

When your Outpost has been provisioned and is available, extend the necessary Amazon Virtual Private Cloud (Amazon VPC) connection to the Outpost from the Region by creating the desired staging and target subnets (see 2 in Figure 7).

Step 3: Prepare Elastic Disaster Recovery service

Prepare the Elastic Disaster Recovery service from the Console to set the default replication and launch settings. When defining these settings, make sure that the Outposts resources available are chosen for staging and target subnets and instance and storage type (see 3 in Figure 7).

Step 4: Install the AWS Replication Agent to the source servers or machines

The next phase is to install the AWS Replication Agent to the source servers and to make sure that communication is possible between the AWS Replication Agent and your Outposts replication subnet through the Outposts local gateway, which makes sure that replication traffic uses the local network (see 4 in Figure 7).

Step 5: Continuous block-level replication

Staging area resources are automatically created and managed by Elastic Disaster Recovery. When the AWS Replication Agent has been deployed, continuous block-level replication (compressed and encrypted in transit) occurs (see 5 in Figure 7) over the local network.

Step 6: Launch Outposts Rack resources

Finally, migrated instances can now be launched using Outposts Rack resources based on the launch settings defined previously (see 6 in Figure 7).

Conclusion

In this post, you have learned how to migrate your workloads from your on-premises environment to AWS Outposts Rack based on your specific data residency requirements. When you have the flexibility of using AWS Regional services, AWS migration services or partner solutions can be used with infrastructure already in place. If your data must stay on-premises, then using AWS Elastic Disaster Recovery allows you to migrate your data without using Regional services, allowing you to migrate to Outposts Rack without your data leaving the boundary of a certain geographic location.

To learn more about an end-to-end migration and modernization journey, visit the AWS Migration Hub.

Implementing a serverless architecture to detect absence of Guardrails in Amazon Bedrock inference API calls

Post Syndicated from Art Baudo original https://aws.amazon.com/blogs/compute/implementing-a-serverless-architecture-to-detect-absence-of-guardrails-in-amazon-bedrock-inference-api-calls/

This post is written by Sayan Chakraborty, Senior Solutions Architect, AWS

In today’s rapidly evolving artificial intelligence (AI) landscape, organizations are increasingly harnessing the power of foundation models through Amazon Bedrock to build sophisticated generative AI applications. Although this technology opens up exciting possibilities, it also brings forth important considerations around responsible AI implementation and content safety.

Amazon Bedrock Guardrails serve as a crucial safeguard, helping organizations filter out harmful content, prevent prompt injection attacks (LLM01:2025 from OWASP Top 10 for generative AI), and maintain ethical AI practices. These configurable safeguards are essential for enterprises committed to responsible AI development, especially when scaling their applications across various use cases.

However, there’s a critical consideration: although Guardrails are powerful, they’re optional by default in Amazon Bedrock inference API calls. For organizations that mandate the use of Guardrails as part of their responsible AI strategy, a solution is needed to make sure of consistent implementation across all API requests.

In this post, we explore how to build a serverless architecture that automatically detects when Guardrails are absent in Amazon Bedrock inference API calls. We demonstrate how enterprises can implement automated monitoring and alerting systems to maintain compliance with their AI safety standards, making sure that Guardrails are properly implemented wherever needed. This solution is particularly valuable for organizations prioritizing secure and responsible AI deployment at scale.

Prerequisites

Before proceeding with the implementation, make sure that you do the following:

1. Create an AWS account if you do not already have one, and log in. The AWS Identity and Access Management (IAM) user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.

2. Have AWS Command Line Interface (AWS CLI) installed and configured.

3. Have Git installed.

Architecture

The following diagram shows an event-driven architecture of this solution.

Figure 1: Solution architecture diagram

Amazon Bedrock supports model invocation logging. When enabled, it collects the full request data, response data, and metadata associated with all model invocation calls performed in your AWS account. Logging can be configured to send the logs to supported destinations such as Amazon CloudWatch Logs and Amazon S3. This solution uses an S3 bucket to collect these logs. Note that this solution supports the below Amazon Bedrock inference APIs:

As logs get stored in the S3 bucket, an Amazon S3 event notification is generated to an Amazon EventBridge event bus. A rule that matches “Object Created” events from Amazon S3 routes these events to an AWS Step Functions state machine, which defines the orchestration logic to inspect the model invocation logs for missing Guardrails, and sends out an alert to a monitored email address when applicable.
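
To make the rule concrete, the following Python sketch builds an EventBridge event pattern for Amazon S3 “Object Created” events and checks a sample event against it. The bucket name and object key are placeholders, and the `matches` function is a deliberately simplified stand-in for EventBridge matching that handles only exact-value lists and nested objects.

```python
import json


def object_created_pattern(bucket_name):
    """Event pattern for S3 'Object Created' events from one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }


def matches(pattern, event):
    """Simplified matcher: exact-value lists and nested dicts only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Placeholder bucket and key, shaped like an S3 event delivered by EventBridge.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "my-bedrock-logs"},
        "object": {"key": "logs/example.json.gz"},
    },
}
pattern = object_created_pattern("my-bedrock-logs")
print(json.dumps(pattern, indent=2))
print(matches(pattern, sample_event))  # True
```

Scoping the pattern to the log bucket keeps unrelated S3 activity in the account from starting the state machine.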

Walkthrough of the orchestration

As mentioned previously, the Step Functions state machine is the orchestration engine that performs the business logic for this solution, as events are received from new logs created in the S3 bucket. When opened in the Workflow studio in the Step Functions console, you should observe the following diagram.

Figure 2: Step Functions state machine diagram as seen in workflow studio

1. The first step in the state machine is to call an AWS Lambda function to get the logs from the S3 bucket using the bucket name and object key supplied in the event object received from EventBridge.

2. If the log shows that the Amazon Bedrock API invocation was successful, then the state machine collects the output object of the API response from the log that is needed for further evaluation.

3. The next step is to check if Amazon Bedrock Guardrails was used. This is done by looking for specific objects in the Amazon Bedrock API output that was captured from the logs.

4. If a Guardrail was detected, then the flow completes successfully, and no further action is needed.

5. If a Guardrail was not detected, then the next step in the state machine collects a few pieces of information from the log file that are necessary to record the transaction and adds the transaction date. Then, the transaction is logged to the transactions table in Amazon DynamoDB.

6. A user or a role may be making a lot of API calls to Amazon Bedrock each day. Therefore, the solution implements a mechanism to prevent the monitored email address from being swamped by emails reporting the same user or role more than once each day. This is done in parallel to Step 5, where the flow checks if the principal’s identity (user ID/IAM role) is recorded as notified on the current date, by querying the notifications table in DynamoDB. If no results are found, meaning that a notification hasn’t been sent yet, then an email is sent out to a monitored email address through an Amazon Simple Notification Service (Amazon SNS) topic. Furthermore, an item is inserted into the notifications table in DynamoDB to prevent sending more notifications on the same day for the same principal.
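
The decision logic in the steps above can be sketched in plain Python, with in-memory collections standing in for the two DynamoDB tables. This is an illustration under stated assumptions, not the repository's implementation: the log field names (`identity.arn`, `requestId`, `output.outputBodyJson`) and the guardrail marker keys in `GUARDRAIL_MARKERS` are assumptions about the log and response format and should be checked against your own logs.

```python
from datetime import date

# Assumed marker keys that indicate a guardrail was applied to the response.
GUARDRAIL_MARKERS = ("amazon-bedrock-guardrailAction", "amazon-bedrock-trace")


def guardrail_used(output_body, markers=GUARDRAIL_MARKERS):
    """Return True if any assumed guardrail marker appears in the output."""
    return any(marker in output_body for marker in markers)


def process_log(record, transactions, notified, today):
    """Return 'skipped', 'ok', 'recorded', or 'notified' for one log record.

    transactions: list standing in for the transactions table.
    notified: set of (principal, ISO date) standing in for the notifications table.
    """
    output_body = record.get("output", {}).get("outputBodyJson")
    if not isinstance(output_body, dict):
        return "skipped"  # no successful output to evaluate
    if guardrail_used(output_body):
        return "ok"
    principal = record["identity"]["arn"]
    day = today.isoformat()
    transactions.append(
        {"principal": principal, "date": day, "requestId": record.get("requestId")}
    )
    if (principal, day) in notified:
        return "recorded"  # already alerted for this principal today
    notified.add((principal, day))
    return "notified"  # this is where an SNS email would be sent


# Hypothetical log record used only for illustration.
record = {
    "identity": {"arn": "arn:aws:iam::111122223333:user/alice"},
    "requestId": "req-1",
    "output": {"outputBodyJson": {"completion": "hello"}},
}
transactions, notified = [], set()
print(process_log(record, transactions, notified, date(2025, 1, 1)))  # notified
print(process_log(record, transactions, notified, date(2025, 1, 1)))  # recorded
```

Every unguarded call is recorded, but only the first one per principal per day produces an alert, which matches the once-a-day notification behavior described in step 6.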

Solution deployment

For deployment instructions, follow along in the GitHub repo or use this post. An AWS CloudFormation template is provided to deploy the solution.

1. Create an S3 bucket to store the model invocation logs from Amazon Bedrock. Under bucket Properties, turn the EventBridge notifications to On. This enables Amazon S3 to send an event notification to the EventBridge default event bus whenever a log file is created in the bucket by Amazon Bedrock.

2. Go to the Amazon Bedrock console and enable Model invocation logging under Bedrock Configuration > Settings, from the left navigation pane. Specify the bucket created in Step 1 under S3 location.

Figure 3: Amazon Bedrock settings for Model invocation logging

3. Create two more S3 buckets: one that the Step Functions state machine uses to store Bedrock model invocation errors detected in the log, and one that stores the Lambda function code for this solution. Inside the latter bucket, create a folder called code (or any other preferred name) and upload the ZIP archive from the lambda-code folder of this repository into that Amazon S3 folder. Note the names of these two S3 buckets and the Amazon S3 object key for the Lambda ZIP file. These must be specified as input parameters to the CloudFormation template.

4. From the CloudFormation console or using the AWS CLI, create a stack using the template provided in this repository called bedrock-guardrails-detection-template.yaml. For inputs, specify the BedrockLogsBucket (from Step 1), BedrockLogsErrorBucket (from Step 3), LambdaFunctionCodeBucket (from Step 3), LambdaFunctionCodeBucketKey (S3 object key for the ZIP file uploaded in Step 3, for example code/get-bedrock-logs-from-s3.py.zip), and NotificationEmailAddress (email address to subscribe to the SNS topic). It may take a few minutes for the CloudFormation stack deployment to complete.

5. When deployment is complete, access the email inbox for the email address specified during the CloudFormation stack deployment, and confirm the subscription using the email sent from the Amazon SNS topic. The email should be titled: AWS Notification – Subscription Confirmation. Choose the Confirm subscription link inside the email to complete the subscription process. The email account is now ready to receive notifications from this solution.

Scaling to multiple AWS accounts

The architecture discussed previously shows how Guardrails can be detected from within the same AWS account where Amazon Bedrock APIs are invoked. However, in most production environments, there are multiple AWS accounts where independent teams may be deploying their own generative AI workloads using Amazon Bedrock in their own accounts. To collect model invocation logs from all those accounts, EventBridge can be configured to send events from event buses in separate source workload accounts to a central event bus deployed in a central destination governance account. This central event bus can have a rule to route events to the Step Functions state machine deployed in that central governance account. The deployment model looks like the following diagram.

To learn more about sending and receiving events between AWS accounts in EventBridge, refer to the documentation.

Figure 4: Cross-account guardrail detection solution
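
As a rough illustration of the routing rule on the central event bus, the following Python sketch builds an EventBridge event pattern for Amazon S3 "Object Created" events and checks a sample event against it. The bucket name, account ID, and object key are hypothetical, the pattern in the provided CloudFormation template may differ, and the `matches` helper implements only exact-value matching, a small subset of what EventBridge supports.

```python
import json

LOGS_BUCKET = "example-bedrock-logs-bucket"  # hypothetical bucket name

# Event pattern: S3 "Object Created" events for the model invocation logs bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": [LOGS_BUCKET]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: nested dicts recurse, lists mean 'any of these values'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "account": "111122223333",  # hypothetical source workload account
    "detail": {
        "bucket": {"name": LOGS_BUCKET},
        "object": {"key": "logs/example.json.gz"},
    },
}
other_event = dict(sample_event, **{"detail-type": "Object Deleted"})

print(json.dumps(event_pattern, indent=2))
print(matches(event_pattern, sample_event), matches(event_pattern, other_event))
```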

Further considerations and clean up

Amazon Bedrock model invocation logging captures requests and responses from model invocations and stores the logs in the destination of your choosing. In this sample it is in an S3 bucket that you create. The following are some more security considerations.

1. To protect information, you may choose to encrypt the contents using server-side encryption with AWS KMS keys (SSE-KMS) on the S3 bucket, and specify a customer managed encryption key. More details are in this Amazon Bedrock user guide.

2. Perform regular cleanup of the model invocation logs bucket using an Amazon S3 lifecycle configuration rule as mentioned in this post.

To avoid ongoing charges, clean up your environment by following these steps to delete the resources you created by following this post, if they are no longer needed:

1. Delete the stack:
aws cloudformation delete-stack --stack-name STACK_NAME

2. Confirm the stack has been deleted:
aws cloudformation list-stacks --query "StackSummaries[?contains(StackName,'STACK_NAME')].StackStatus"

3. Empty the S3 buckets that you created manually as prerequisites for the CloudFormation stack, then delete the buckets.

4. Turn off model invocation logging under Settings in the Amazon Bedrock console if you no longer need it.

Conclusion

This post discussed implementing a serverless event-driven architecture to detect the absence of Guardrails in Amazon Bedrock inference API calls. As organizations increasingly use foundation models through Amazon Bedrock for generative AI applications, implementing AI responsibly becomes crucial.

The solution presents an event-driven architecture that automatically detects when Guardrails are missing in API calls. It uses Amazon Bedrock model invocation logging, storing logs in an Amazon S3 bucket. When new logs are created, Amazon S3 sends an event notification to the Amazon EventBridge default event bus, which routes the event to an AWS Step Functions state machine. Then, the state machine inspects the logs for missing Guardrails and sends alerts through Amazon SNS to a monitored email address.

The architecture includes features to prevent notification flooding and can scale across multiple AWS accounts. The post provides detailed deployment instructions using AWS CloudFormation and includes security considerations and cleanup procedures. With this solution you can help your organization maintain compliance with AI safety standards while scaling generative AI applications.

Efficiently manage Amazon EC2 On-Demand Capacity Reservations (ODCRs) with split, move, and modify

Post Syndicated from Art Baudo original https://aws.amazon.com/blogs/compute/efficiently-manage-amazon-ec2-on-demand-capacity-reservations-odcrs-with-split-move-and-modify/

This post is written by Ninad Joshi, Senior Solutions Architect, Ballu Singh, Principal Solutions Architect, and Ankush Goyal, Enterprise Support Lead AWS.

Introduction

In today’s cloud-first world, managing compute capacity efficiently while maintaining application availability is crucial for your business. Amazon EC2 On-Demand Capacity Reservations (ODCRs) are a valuable tool for reserving compute capacity, but managing reservations across multiple teams and accounts is challenging. Recently, AWS introduced new capabilities (split, move, and modify) that improve how organizations can manage their Capacity Reservations. In this post, we explore how these features can transform your operations.

Common ODCR management challenges

As a consumer of ODCR, you might face several challenges managing your Capacity Reservations. These challenges include but are not limited to the following:

  • Underused reserved capacity in some accounts
  • Inability to redistribute excess capacity efficiently
  • Difficulty in managing existing capacity across multiple AWS accounts
  • Difficulty in modifying reservation attributes post-creation

With multiple development teams and various projects running simultaneously, you might struggle with efficient capacity allocation. You might also find yourself dealing with situations where one team has excess capacity while another desperately needs it.

Use case 1: Redistributing capacity across teams

The unused capacity dilemma

Consider a scenario where your machine learning (ML) team has an ODCR for ten c5.2xlarge instances, but they’re only using five. Meanwhile, your Analytics team urgently needs three Amazon Elastic Compute Cloud (Amazon EC2) instances of the same type for a new project. Previously, your Analytics team would have had to create a new reservation, adding the overhead of managing their own Capacity Reservation. Meanwhile, the five unused capacity slots of the ODCR owned by your ML team result in unnecessary costs.

Split capability to the rescue

Using the new split capability, you can now divide the existing ODCR (see ODCR-1 in the following figure), which has a total capacity of ten EC2 instances, and create a new ODCR from three of the unused capacity slots.

Figure 1: Before split, ODCR-1 with original total and unused capacity

This results in the creation of two ODCRs:

  1. Original ODCR: total capacity of seven instances for the ML team
  2. New ODCR: three instances for the Analytics team

The following figure illustrates the split result:

Figure 2: After split, ODCR-1 with updated total and unused capacity, and newly created ODCR-2
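
The arithmetic of this split can be sketched with a small Python model. This is an illustration of the numbers in the example only, not the Amazon EC2 API: the `Reservation` class and `split_unused` function are hypothetical names, and the model covers only splitting unused capacity.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    total: int  # reserved instance slots
    used: int   # slots occupied by running instances

    @property
    def unused(self) -> int:
        return self.total - self.used


def split_unused(source: Reservation, quantity: int, new_name: str) -> Reservation:
    """Carve `quantity` unused slots out of `source` into a new reservation."""
    if not 0 < quantity <= source.unused:
        raise ValueError("quantity must be between 1 and the number of unused slots")
    source.total -= quantity
    return Reservation(new_name, total=quantity, used=0)


odcr1 = Reservation("ODCR-1", total=10, used=5)  # ML team: ten reserved, five in use
odcr2 = split_unused(odcr1, 3, "ODCR-2")         # three slots for the Analytics team

print(odcr1.name, odcr1.total, odcr1.unused)
print(odcr2.name, odcr2.total, odcr2.unused)
```

After the split, ODCR-1 keeps seven slots (five used, two unused) and ODCR-2 starts with three unused slots, matching the two reservations listed above.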

Sharing across accounts

The split operation creates the new ODCR in the same AWS account. If your teams operate under the same AWS account, then the split operation is direct without any further steps. However, if your teams use different AWS accounts, then you would need to use AWS Resource Access Manager (AWS RAM) to share the newly created ODCR after the split operation. This enables cross-account capacity management while maintaining centralized control.

Refer to the AWS Documentation for more information on pre-requisites and considerations when splitting off capacity from one reservation to a new one.

Refer to the API and CLI documentation for further information on the split capability such as parameters, exceptions, and limits.

Use case 2: moving capacity between reservations

Scaling for growth

After a few days, your Analytics team needs capacity for one more instance for their expanding project, so you need to add capacity to ODCR-2.

Move capability to the rescue

Instead of creating a new ODCR for this purpose, you can move one of the unused slots from ODCR-1 to ODCR-2. This saves you the multiple steps involved in reserving new capacity, avoids disrupting running workloads, and simplifies ODCR management. This rebalancing improves resource usage without further procurement.

Figure 3: Before move, ODCR-1 with unused capacity and ODCR-2 with current capacity

Figure 4: After move, ODCR-1 with reduced capacity and ODCR-2 with additional capacity
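
The move in this use case is simple bookkeeping between two reservations, sketched below in Python. The dictionaries and the `move_unused` function are hypothetical stand-ins rather than the Amazon EC2 API, and the sketch assumes that ODCR-1 still has five instances in use and that the Analytics team is already using all three slots in ODCR-2.

```python
def move_unused(source: dict, destination: dict, quantity: int) -> None:
    """Shift `quantity` unused slots from one reservation to another."""
    unused = source["total"] - source["used"]
    if not 0 < quantity <= unused:
        raise ValueError("quantity must be between 1 and the number of unused slots")
    source["total"] -= quantity
    destination["total"] += quantity


odcr1 = {"total": 7, "used": 5}  # state after the split in use case 1
odcr2 = {"total": 3, "used": 3}  # assumption: all three Analytics slots are in use

move_unused(odcr1, odcr2, 1)
print(odcr1, odcr2)
```

The combined total across both reservations stays at ten, which is why no new capacity has to be procured.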

Refer to the AWS Documentation for more information on pre-requisites and considerations when moving capacity from one reservation to another one.

Refer to the API and CLI documentation for further information on the move capability such as parameters, exceptions, and limits.

Use case 3: adjusting reservation attributes for changing workload patterns

Dynamic workload requirements

When your data processing workload patterns change significantly, you must adapt. Initially, you might have set up your ODCR with specific instance matching criteria, making it a targeted reservation for predictable workloads. However, as you introduce more dynamic, impromptu analysis projects, you need more flexibility in how instances can be launched against your reservation.

Modify feature to the rescue

Using the modify capability, you can now change the reservation’s attributes without creating a new reservation or disrupting running workloads. You can modify your ODCR by:

  • Changing instance quantity
  • Changing instance eligibility from Targeted to Open
  • Adjusting the reservation’s end date to align with your project timeline

This modification allows you to:

  • Launch new instances more flexibly without strict instance eligibility
  • Improve the usage of reserved capacity across different projects
  • Maintain cost optimization while adapting to changing business needs

The modify feature provides this flexibility while making sure that your existing workloads continue running uninterrupted, making it an invaluable tool for dynamic environments. See the following figures for an example where the instance quantity of ODCR-2 is modified from four to six:

Figure 5: Before modify, ODCR-2 with total capacity of four and instance eligibility of targeted

Figure 6: After modify, ODCR-2 with new total capacity of six and instance eligibility of open

Increasing the size of an ODCR or creating a new one is subject to Amazon EC2 On-Demand capacity availability. Therefore, if unused capacity is available in an existing ODCR, then moving or splitting that capacity could be a better option than modifying an ODCR.

Refer to the AWS Documentation for more information on pre-requisites and considerations when modifying Capacity Reservations.

Refer to the API and CLI documentation for further information on the modify capability such as parameters, exceptions, and limits.

Special considerations for split capacity

In the preceding sections, we saw how you can use the split capability to detach excess unused capacity to create an ODCR for another team. However, you can also use this capability to split used capacity to create new ODCRs. This capability is particularly helpful when you want to split partially used ODCRs to create a new one for easier tracking and management. Along with the considerations for splitting unused/excess capacity, the following considerations apply for splitting used capacity:

  1. The used capacity can only be split for an ODCR with open instance eligibility that isn’t shared with any account.
  2. The instances running inside the reservation are of open eligibility (in other words they are not targeting the reservation).
  3. When you split the used capacity, the eligible instances are randomly selected. You cannot specify which running instances are split. If a sufficient number of eligible instances aren’t found to fulfill the split quantity, then the split operation fails. When you specify the quantity of instances to be split, by default any unused capacity is moved first, followed by any eligible running instances (the used capacity in your reservation).

In the next sections, we review different scenarios where you can or can’t use the split capability.

Scenario 1: managing internal ODCRs (Capacity Reservation not shared with any other AWS account)

For your internal projects, when managing ODCRs that aren’t shared with external partners (other AWS accounts) and all have open instance eligibility, consider this example with ODCR-1:

  • Total capacity of ten c5.2xlarge instances, all with open instance eligibility
  • Eight instances currently in use by your ML team
  • Two unused instances

Figure 7: Before split, ODCR-1 with total capacity of 10 and 2 unused instances

This ODCR isn’t shared with any external AWS accounts, so you have maximum flexibility in splitting the reservation. You can split up to nine instances into a new reservation (total capacity minus one), regardless of how many instances are currently in use. In this scenario, you can split used as well as unused capacity. This gives you significant freedom in restructuring the capacity allocation for your internal teams.

Figure 8: After split, ODCR-1 remains with total capacity of one, and ODCR-2 with total capacity of nine with two unused capacities

Scenario 2: managing shared ODCRs with partners (Capacity Reservation shared with other AWS account)

When you need to share your ODCR with a partner’s AWS account, consider this scenario where ODCR-1 has:

  • Total capacity of ten c5.2xlarge instances
  • Eight instances in use by both your team and your partner’s team
  • Two unused instances

Figure 9: Before split, ODCR-1 shared with another AWS account

In this case, your options are more limited. ODCR-1 is shared with your partner’s AWS account, thus you can only split the unused capacity (maximum of two instances). After split, the newly created ODCR (ODCR-2) remains in your AWS account and isn’t shared with any other AWS account. This restriction helps prevent any disruption to your partner’s running workloads while still allowing for some flexibility in capacity management.

Figure 10: After split, ODCR-1 remains shared with another AWS account, and newly created ODCR-2 isn’t shared

These scenarios demonstrate important factors about capacity management in both internal and partner-shared environments. Carefully consider the sharing status of your ODCRs before planning any splits or modifications, so that operations stay smooth for both your teams and your partners.
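
The split rules from the two scenarios can be expressed as a short Python sketch. It is a simplified model under stated assumptions, not the service's implementation: `plan_split` and its parameters are hypothetical names, `eligible_running` counts running instances with open eligibility in an open reservation, and the sketch ignores the random selection of which running instances are split.

```python
def plan_split(total: int, used: int, eligible_running: int,
               quantity: int, shared: bool) -> tuple:
    """Return (unused_slots_split, running_instances_split) for a requested split."""
    unused = total - used
    if not 0 < quantity < total:
        raise ValueError("a split must leave at least one slot in the source reservation")
    from_unused = min(quantity, unused)    # unused capacity is taken first
    from_running = quantity - from_unused  # the remainder must come from used capacity
    if from_running and shared:
        raise ValueError("a shared reservation can only split unused capacity")
    if from_running > eligible_running:
        raise ValueError("not enough eligible running instances to fulfill the split")
    return from_unused, from_running


# Scenario 1: unshared ODCR-1, ten slots, eight in use, split nine.
scenario_1 = plan_split(total=10, used=8, eligible_running=8, quantity=9, shared=False)

# Scenario 2: shared ODCR-1, ten slots, eight in use, only the two unused slots can go.
scenario_2 = plan_split(total=10, used=8, eligible_running=8, quantity=2, shared=True)

print(scenario_1, scenario_2)
```

In this model, asking the shared reservation for three slots raises an error because the third slot would have to come from a running instance.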

Special considerations for move capability

The move capability enables you to redistribute available (or excess) capacity between ODCRs. However, in certain cases, you can also use this capability to move used instances between ODCRs. This capability is particularly helpful if you want to merge partially used ODCRs into one for easier tracking and management. Along with the considerations for moving unused capacity, the following considerations apply for moving used capacity:

  1. Both source and destination ODCR are of open instance eligibility and in active state.
  2. The instances running inside the reservation are of open eligibility (in other words they are not targeting the reservation).
  3. Both source and destination ODCRs are owned by the same account.
  4. The source and destination ODCRs can be shared, but they must be shared with the same list of accounts when you move the used portion. This same-accounts condition doesn’t apply to the unused portion of the ODCR.

When you specify the quantity of instances to be moved, by default any unused capacity is moved first, followed by any eligible running instances (the used capacity in your reservation).

In the next sections, we review where you can or can’t use this capability.

Scenario 1: source and destination ODCRs not shared with other account(s) (Team Transfers)

When you manage capacity between internal teams that use the same AWS account (Account-A), the process is straightforward. For example, when consolidating the ML team’s resources:

  • ODCR-1 (ML Team A): has ten capacities total (all with open eligibility), with eight in use and two unused.
  • ODCR-2 (ML Team B): has five capacities (all with open eligibility), all in use.

Figure 11: Before move, ODCR-1 and ODCR-2 both in the same AWS account, unshared

Both ODCRs belong to the same account, aren’t shared externally, and have open instance eligibility. Therefore, you can freely move all ten instances from ODCR-1 to ODCR-2, creating a unified pool of 15 instances for the consolidated ML team.

Figure 12: After moving capacity from ODCR-1, ODCR-2 has combined total capacity of 15 with 2 unused

Scenario 2: source and destination ODCRs shared with the same account(s) (External Partner Collaboration)

If your ML team (ODCR-1) collaborates with an external AI research partner (Account-B), your setup might look like the following:

  • ODCR-1: ten instances (eight used, two unused), all with open instance eligibility, shared with the research partner through AWS RAM.
  • ODCR-2: Five instances (all used), all with open instance eligibility, for internal Analytics team.

Figure 13: Before move, ODCR-1 and ODCR-2 both in the same AWS account, with ODCR-1 shared with other AWS account

When your Analytics team needs more capacity, you can only move the two unused instances from ODCR-1 to ODCR-2, as the other eight are actively used in the partner collaboration.

Figure 14: Since ODCR-1 is shared with other AWS account, only unused capacity is moved to ODCR-2

Scenario 3: source and destination ODCRs shared with different account(s) (Multi-Partner Projects)

In this scenario involving managing capacity across different partner engagements:

  • ODCR-1: Ten instances (eight used, two unused), shared with a database partner (Account-B).
  • ODCR-2: Five instances (all used), shared with a security partner (Account-C).

Figure 15: ODCR-1 and ODCR-2 are shared with different AWS account

Because the two ODCRs are shared with different accounts, you can only move the two unused capacities from ODCR-1 to ODCR-2. This avoids any disruption to the database partner’s workloads.

Figure 16: Only unused capacity moved to ODCR-2 due to shared capacity reservations

These scenarios teach valuable lessons about capacity management in multi-account environments. You can develop a comprehensive sharing strategy that balances flexibility with partner commitments, enabling you to optimize your resource usage while maintaining strong partner relationships.
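
The three move scenarios follow from the considerations listed earlier, which the following Python sketch encodes. It is a simplified model, not the Amazon EC2 API: the `Odcr` class and `max_movable` function are hypothetical, and the sketch assumes that every running instance has open eligibility.

```python
from dataclasses import dataclass


@dataclass
class Odcr:
    owner: str
    total: int
    used: int
    eligibility: str = "open"
    state: str = "active"
    shared_with: frozenset = frozenset()


def max_movable(src: Odcr, dst: Odcr) -> int:
    """Unused capacity can always move; used capacity only if every condition holds."""
    unused = src.total - src.used
    used_can_move = (
        src.eligibility == dst.eligibility == "open"
        and src.state == dst.state == "active"
        and src.owner == dst.owner
        and src.shared_with == dst.shared_with  # same list of accounts, possibly empty
    )
    return unused + (src.used if used_can_move else 0)


# Scenario 1: neither reservation is shared.
s1 = max_movable(Odcr("Account-A", 10, 8), Odcr("Account-A", 5, 5))

# Scenario 2: only ODCR-1 is shared with the research partner (Account-B).
s2 = max_movable(
    Odcr("Account-A", 10, 8, shared_with=frozenset({"Account-B"})),
    Odcr("Account-A", 5, 5),
)

# Scenario 3: the reservations are shared with different partners.
s3 = max_movable(
    Odcr("Account-A", 10, 8, shared_with=frozenset({"Account-B"})),
    Odcr("Account-A", 5, 5, shared_with=frozenset({"Account-C"})),
)

print(s1, s2, s3)
```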

Conclusion

The new ODCR features from AWS (split, move, and modify) represent a significant advancement in cloud capacity management. For your organization, these features transform how you handle compute resources, enabling more efficient operations and cost management. The ability to dynamically adjust and share Capacity Reservations provides the flexibility you need while maintaining the stability necessary for your critical workloads.

As cloud infrastructure continues to evolve, these features demonstrate the AWS commitment to addressing real-world challenges that you face when managing complex cloud environments. If you’re looking to optimize your AWS infrastructure, then these new ODCR capabilities offer powerful tools for better capacity management and resource usage.

To enhance your understanding of these capabilities, we’ve created a GitHub repository containing APIs for implementation purposes. For more details, refer to the updated Capacity Reservations documentation. If you have any questions or feedback, feel free to share them in the comments section or contact AWS Support.

Architecting for seamless on-premises connectivity with AWS Outposts servers

Post Syndicated from aostan original https://aws.amazon.com/blogs/compute/architecting-for-seamless-on-premises-connectivity-with-aws-outposts-servers/

This post is written by Mark Nguyen, Principal Solutions Architect, AWS and Ryan Fillis, Solutions Architect, AWS.

AWS Outposts brings native AWS services, infrastructure, and operating models to virtually any data center, co-location space, or on-premises facility. Deploying Outposts servers in your environment necessitates additional considerations regarding local network connectivity and Amazon Elastic Compute Cloud (Amazon EC2) instance networking. This post demonstrates the scalability of Outposts servers through automation and the deployment of Amazon EC2 network interfaces. This reduces the number of manual steps required to configure an Outposts server.

This post details physically connecting your servers to your Local Area Network (LAN) and the networking options available for EC2 instances running on Outposts. We cover the physical cabling options and virtual networking components such as VPCs and subnets, and walk through an example setup for an EC2 instance with a user-data script to route traffic locally over your on-premises network.

This post assumes that you have some familiarity with Outposts servers. If you would like a general refresher, see What is AWS Outposts. For more information about how to provision your Outposts server, see Installing an AWS Outposts server.

Basic Amazon EC2 networking using a single interface

When you launch an EC2 instance on an Outposts server, a single interface is created for network connectivity. This default setting, depicted in the following diagram, is the most direct method for your instance to communicate externally.

Figure 1: Simple network connectivity on an Outposts server

When deploying an EC2 instance to an Outposts server, there are certain differences in using the default Elastic Network Interface (ENI) as compared to deploying in an AWS Region. Understanding these differences is critical before modifying the network configuration, which you do in the next step.

ENI differentiators between Outposts servers and the Region:

  • Primary interface: The primary interface is an ENI. This ENI is associated with a subnet within a VPC. This VPC is extended from the Region to the Outposts server.
  • IP address configuration: The primary network interface within the guest operating system (OS) of the EC2 instance must be configured to obtain an IP address through DHCP. The assigned IP address is from the IP address range of the VPC subnet associated with the Outposts server.
  • Security group: A security group is associated with the ENI. This security group falls within the VPC that is extended from the Region. The user must apply appropriate access control rules to permit access to the EC2 instance. You may reuse security groups that already exist within the VPC.
  • Outbound traffic: By default, an EC2 instance uses its ENI to direct outbound traffic toward the VPC subnet. Traffic flows according to the routing table associated with the Outposts server’s VPC subnet.
  • Inbound traffic: If you’re only using an ENI, then traffic destined to EC2 instances on Outposts servers must traverse through the service link. In the preceding diagram, the user communicates with the EC2 instance over the internet. Traffic from the internet reaches the Region through the Internet Gateway of the VPC. Then, the VPC forwards the traffic to the appropriate subnet of the Outposts server (through the service link) and reaches the EC2 instance. The user must configure the necessary VPC components (Internet Gateway and associated routing table entries) for internet connectivity.
  • Local network connectivity: There is no local network connectivity using the ENI. For local network connectivity, see the next section where we discuss the Outposts server Local Network Interface (LNI).

Local network connectivity for EC2 instances

Outposts servers allow you to communicate through the Local Network Interface (LNI) in addition to the ENI. The LNI is a logical networking component that connects the Amazon EC2 instances in your Outposts subnet to your on-premises network.

The Outposts server EC2 instance local communications characteristics:

  • Local network traffic needs the use of an LNI.
  • The subnets on Outposts servers must be enabled for LNIs. This is done by entering the following command:

aws ec2 modify-subnet-attribute \
--subnet-id subnet-1a2b3c4d \
--enable-lni-at-device-index 1

  • IP address assignment for the LNI can be DHCP or static.
  • You can’t apply VPC security groups to the LNI. To control traffic on the LNI, you can use an OS based firewall, external on-premises firewall, or other security devices.
  • Amazon CloudWatch metrics are produced for each LNI.
  • Outposts servers don’t tag VLAN traffic. If VLAN tags are needed, then the network interface settings inside the guest OS must apply the VLAN tags. Multiple VLAN interfaces can exist within the same LNI (in this case you would be using the LNI as a VLAN trunk).
  • Local traffic bandwidth depends on the instance type. The larger the instance type, the higher the throughput of the LNI. The maximum throughput is 10 Gbps.
  • EC2 instances that communicate locally always have at least two interfaces: one ENI and one or more LNIs. Therefore, the instance OS’s routing table must be configured based on the desired traffic behavior.

Example configuration: Local traffic for EC2 instance on Outposts server

Figure 2: Example scenario topology

In the example scenario, we want to launch an Amazon Linux 2023 instance and route all default traffic through the local network. Eth0 is the primary interface (ENI) and is used for traffic towards the Region. Eth1 is the LNI and is used for all other traffic. A user-data script is used to make the necessary routing changes at launch.

Here is a sample user-data script. These commands run as root so there is no need to prepend each command with sudo.

User data script (my_userdata.txt):

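#!/bin/bash
# Send VPC-bound traffic through the ENI, then drop the ENI's default route
route add -net 172.31.0.0/16 gw 172.31.239.1
route del default gw 172.31.239.1
# Persist the change: copy the generated systemd-networkd files to /etc
cp -RL /run/systemd/network/* /etc/systemd/network/
# Add the static VPC route to the ENI (ens5) configuration
echo -e '\n[Route]\nDestination=172.31.0.0/16\nGateway=172.31.239.1\nGatewayOnLink=yes' >> /etc/systemd/network/70-ens5.network
# Stop the ENI from installing a default route at boot
sed -i -e 's/UseGateway=true/UseGateway=false/g' /etc/systemd/network/70-ens5.network.d/eni.conf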

We can break down this script to observe the intent of each command:

route add -net 172.31.0.0/16 gw 172.31.239.1 
route del default gw 172.31.239.1

When an instance is launched on an Outposts server, it automatically has a default route that points toward the VPC through the ENI. In the example scenario, the desired configuration is to send all default traffic through the LNI toward the on-premises LAN, not through the ENI. To accomplish this, we add a specific route toward the VPC on the ENI and remove the ENI’s default route. The first line adds a route to the VPC CIDR (172.31.0.0/16), using 172.31.239.1 as the gateway. The second line removes the default route that uses 172.31.239.1 (through the ENI) as the gateway.

Traffic not destined for the VPC routes through the LNI. This includes all local traffic and internet-bound traffic. The local network’s DHCP server provides a default-gateway in its DHCP lease. Therefore, there is already a default route assigned to the LNI. This steers any traffic without a more specific route, including internet traffic, toward the LNI.
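
The resulting behavior is ordinary longest-prefix routing, which the following Python sketch illustrates with the standard ipaddress module. It is a simplified model of the two routes described above, not a dump of the instance's real routing table: it omits connected-subnet routes, and the destination addresses are arbitrary examples.

```python
import ipaddress

# The two routes that matter after the user-data script has run.
ROUTES = [
    (ipaddress.ip_network("172.31.0.0/16"), "ENI"),  # static route toward the VPC
    (ipaddress.ip_network("0.0.0.0/0"), "LNI"),      # default route from the local DHCP lease
]


def egress_interface(destination: str) -> str:
    """Pick the interface whose route has the longest prefix matching the destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, iface) for net, iface in ROUTES if addr in net]
    return max(candidates, key=lambda route: route[0].prefixlen)[1]


for dest in ("172.31.10.20", "192.168.1.50", "8.8.8.8"):
    print(dest, "->", egress_interface(dest))
```

Only destinations inside the VPC CIDR match the more specific /16 route. Everything else, including on-premises and internet addresses, falls through to the default route on the LNI.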

Next, the user-data script makes the network settings persistent after reboot. The procedure varies depending on your OS. In the case of Amazon Linux 2023, it uses systemd-networkd.

cp -RL /run/systemd/network/* /etc/systemd/network/

This command copies the configuration files from the /run/systemd/network/ folder to /etc/systemd/network/. The configuration files in the /etc/systemd/network/ folder override the default settings and load during boot. The next step is to modify the newly copied network configuration files.

echo -e '\n[Route]\nDestination=172.31.0.0/16\nGateway=172.31.239.1\nGatewayOnLink=yes' >> /etc/systemd/network/70-ens5.network

In this case the ENI is ens5. This line appends the static route section to the 70-ens5.network configuration file. This makes the static route added earlier in the script (route add -net 172.31.0.0/16 gw 172.31.239.1) persistent across reboots.

sed -i -e 's/UseGateway=true/UseGateway=false/g' /etc/systemd/network/70-ens5.network.d/eni.conf

Next, the user-data script edits the configuration file, eni.conf, so that the ENI doesn’t install a default route at bootup. This is accomplished using sed to search and replace true with false for the UseGateway parameter.

Launching an instance with ENI and LNI

Now that the user-data script has been created, use the AWS Command Line Interface (AWS CLI) to launch an EC2 instance:

aws ec2 run-instances \
--image-id ami-051f8a213df8bc089 \
--count 1 \
--instance-type c6id.xlarge \
--key-name my_key \
--user-data file://my_userdata.txt \
--network-interfaces '[
  { "DeviceIndex":0, "SubnetId":"subnet-0ca6abe6b34adfcce", "Groups": ["sg-0a9f8c2200c0a56f1"] },
  { "DeviceIndex":1, "SubnetId":"subnet-0ca6abe6b34adfcce", "Groups": ["sg-0a9f8c2200c0a56f1"] }]' \
--tag-specifications '[{ "ResourceType":"instance","Tags":[
  { "Key":"Name", "Value":"server1" } ] }]'

We can break down the parameters used in the preceding command:

--image-id ami-051f8a213df8bc089 \

This specifies the Amazon Machine Image (AMI) ID. ami-051f8a213df8bc089 is the AMI ID for Amazon Linux 2023 in us-east-1.

--count 1 \

This specifies how many EC2 instances to launch. You can launch multiple at the same time.

--instance-type c6id.xlarge

This specifies the instance type. By default, Outposts 2U servers are slotted with the c6id.8xlarge instance type and Outposts 1U servers are slotted with the c6gd.8xlarge instance type. You can adjust the slotting assignment during the ordering process or you can change the slotting assignment later by using the Self-service Capacity Management feature for AWS Outposts.

--key-name my_key

This specifies the name of the key pair whose public key is added to your EC2 instance. The key pair must already exist in the same Region of your AWS account.

--user-data file://my_userdata.txt

This specifies the filename that contains your user-data script (that was created previously).

{ "DeviceIndex":0, "SubnetId":"subnet-0ca6abe6b34adfcce", "Groups": ["sg-0a9f8c2200c0a56f1"] }, \

{ "DeviceIndex":1, "SubnetId":"subnet-0ca6abe6b34adfcce", "Groups": ["sg-0a9f8c2200c0a56f1"] }]' \

This specifies the network interface configuration. By default, a single network interface, the ENI, is created. This example calls for a second interface for the LNI. DeviceIndex:0 is for the ENI and doesn’t change. DeviceIndex:1 is for the LNI, which we defined when we enabled LNI for the subnet (--enable-lni-at-device-index 1). The SubnetId refers to the subnet that was created on the Outposts server. If you want to deploy to a different Outposts server, then change the SubnetId. Groups refers to the security group that you would like assigned to the ENI. Security groups aren’t supported for the LNI, so the security group specified for DeviceIndex:1 is there only to satisfy the command syntax check. A security group will not be applied to the LNI.
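Because the two interface entries differ only in their DeviceIndex, the JSON can be generated rather than typed by hand. The following is a minimal sketch; the build_network_interfaces function and its argument order are hypothetical helpers introduced here, not part of the AWS CLI.

```shell
# Print the --network-interfaces JSON for one ENI (index 0) and one LNI.
# Arguments: subnet ID, security group ID, LNI device index (defaults to 1).
build_network_interfaces() {
  subnet_id="$1"
  security_group_id="$2"
  lni_device_index="${3:-1}"
  printf '[{"DeviceIndex":0,"SubnetId":"%s","Groups":["%s"]},' \
    "$subnet_id" "$security_group_id"
  printf '{"DeviceIndex":%s,"SubnetId":"%s","Groups":["%s"]}]\n' \
    "$lni_device_index" "$subnet_id" "$security_group_id"
}

network_interfaces="$(build_network_interfaces subnet-0ca6abe6b34adfcce sg-0a9f8c2200c0a56f1)"
echo "$network_interfaces"
```

The resulting string can then be passed as --network-interfaces "$network_interfaces". The LNI device index must match the value used when LNI was enabled on the subnet.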

--tag-specifications '[{ "ResourceType":"instance","Tags":[ \

{ "Key":"Name", "Value":"server1" } ] }]'

This assigns a name to the EC2 instance, which in this case is server1.

Conclusion

AWS Outposts servers allow you to run native AWS services on-premises by providing local compute. This supports workloads with low latency and data residency requirements through on-premises processing.

Although Outposts servers integrate seamlessly with the AWS cloud, there are some unique networking considerations when deploying in your data center environment. Amazon EC2 instances on the Outposts server can route traffic over the AWS global network, but you can also enable Local Network Interfaces (LNIs) to directly access your on-premises networks.

In this post we’ve demonstrated using user-data scripts during instance launch to automate hybrid cloud networking flows tailored to your requirements. With proper planning, you can use the benefits of consistent AWS services and tooling while maintaining connectivity to your existing on-premises infrastructure.

Ready to get started with hybrid cloud networking on Outposts servers? Check out the Outposts server documentation and best practices guide to begin planning your on-premises deployment.

Streamlining AMI creation with EC2 Image Builder components in AWS Marketplace

Post Syndicated from Art Baudo original https://aws.amazon.com/blogs/compute/streamliningamicreationwith-ec2imagebuilder/

This post is written by Smriti Ohri, Senior Product Manager, EC2 and Omar Chehab, Senior Product Manager, AWS Marketplace.

At re:Invent 2024, Amazon Web Services (AWS) announced the availability of third-party EC2 Image Builder components in AWS Marketplace. EC2 Image Builder is a fully managed service that streamlines the customization, testing, distribution, and lifecycle management of images. You can use this new feature to procure third-party components from AWS Marketplace directly on the EC2 Image Builder console and in the AWS Marketplace website. You can add multiple of these components to create your golden images.

A golden image is a customized and pre-configured Amazon Machine Image (AMI) needed for launching Amazon Elastic Compute Cloud (Amazon EC2) instances. It includes a standardized set of software, configurations, and security settings that meet an organization’s specific requirements, promoting consistency and efficiency across all EC2 instances.

EC2 Image Builder provides Amazon managed components, and you can build your own components that help when building custom images. However, you may need third-party software to build your golden images. Procuring this software can be time-consuming and necessitates custom setup. This integration aims to address these challenges by providing the ability to add third-party software from AWS Marketplace directly while creating golden images using EC2 Image Builder. While creating the image, you can customize your image recipe to use the latest version of components published in AWS Marketplace and make sure that you always remain up to date.

This post shows you how to find, subscribe to, and incorporate components from AWS Marketplace using the EC2 Image Builder console.

Prerequisites

You must have access to subscribe to a product in AWS Marketplace. Check AWS Marketplace subscription permissions.

Solution overview

Three high-level steps are involved in using the third-party component from AWS Marketplace in EC2 Image Builder:

  1. Discover and subscribe to the third-party component on the EC2 Image Builder console.
  2. Build the golden image with the third-party component.
  3. Launch the EC2 instance using the golden image.

Solution walkthrough: Streamlining AMI creation with EC2 Image Builder components in AWS Marketplace

To perform the solution, go through the steps in the following sections.

Discover and subscribe to a component by Cribl

To discover and subscribe to the component, follow these steps:

  1. On the EC2 Image Builder console, in the navigation pane, choose Discover products. On the Components tab, you can view the list of available AWS Marketplace image products and the associated components. As shown in the following screenshot, choose View subscription options, which shows the different pricing offered.

 Figure 1: Discover components on EC2 Image Builder console

  2. To subscribe to the product, from the dropdown menu choose the available offers and choose Subscribe, as shown in the following screenshot. You can now start using the associated component in your image recipe.

Figure 2: Subscribe to the product that has the component

Build the golden image with the third-party component

To use the component, you can either subscribe to it first, or you can create the pipeline and subscribe to the component later based on your preference. For this walkthrough, I already subscribed to the component. The following section shows how to create a pipeline to build a custom AMI using the component to which I subscribed. You can follow a similar process to install other components to create your golden AMIs. The high-level steps are:

  1. Create the recipe.
  2. Create the pipeline.

To create the recipe, follow these steps:

  1. On the EC2 Image Builder console, choose Image recipes and Create image recipe. A recipe has a base image and the components that you want to install on it.

For this example, Amazon Linux was chosen as the base image operating system and “Amazon Linux 2023 x86” as the image name.

  2. In the Build components section, choose Add build components and, from the dropdown, choose AWS Marketplace. Search for the component to which you subscribed and choose Add to recipe, as shown in the following screenshot.

You can choose to use the latest version or a specific version of the component. For this walkthrough, the latest available version was selected.

Figure 3: Create recipe and add components from AWS Marketplace

The pipeline is an automation configuration where you define the infrastructure configuration, image workflows, and distribution configuration. To create the pipeline, follow these steps:

  1. On the EC2 Image Builder console, choose Image pipelines and Create image pipeline. Provide the name of the pipeline and choose a Build schedule. You can also enable scanning, which scans your AMIs for Common Vulnerabilities and Exposures (CVEs) using Amazon Inspector.

For more information, refer to Amazon Inspector integration in Image Builder in the EC2 Image Builder User Guide. For this example, image scanning is enabled and the option to manually trigger the pipeline was selected.

Figure 4: Create the pipeline with the recipe and other configurations

  2. Choose the recipe you created with third-party components from AWS Marketplace.
  3. Choose the image workflows for the image creation process and define infrastructure configurations for creating the image.

You can choose Dedicated Host, Dedicated Instance, or Shared Tenancy. By default, it uses Shared Tenancy. For this example, the default configuration was selected. I chose the c5.large instance type since that is the supported instance type for this component.

Figure 5: Select the supported instance type in the infrastructure configurations

  4. Provide the distribution configuration details to share or copy the output image to other accounts and in other AWS Regions.

To allow these accounts to use any component from AWS Marketplace, you must share license entitlements with these accounts using AWS License Manager. Instructions for sharing license entitlements are outside the scope of this post. To learn more, refer to Associating licenses with AMI based products using AWS License Manager.

  5. Choose the pipeline that you created and choose Run pipeline. After a while, the image is created and ready to use.

Run the EC2 instance using the golden image

Create an EC2 instance with the output golden image. You can also view the product code stamped on the AMIs, as shown in the following figure.

 

Figure 6: View the output image to check the product code

Conclusion

This feature helps you save time and automate the process of using the latest versions of the software. With this integration, you get a diverse set of software components from verified sellers in AWS Marketplace to address the monitoring, security, governance, and compliance needs of your organization. You can learn more about these components in the documentation. Visit AWS Marketplace to view all supported EC2 Image Builder components.

If you’re an AWS Partner, then you can publish your software as components in AWS Marketplace to cater to your customers. To learn more about onboarding your software to AWS Marketplace, visit this blog post. You can reach out to [email protected] if you have questions about this new feature or the publishing process.

Start building your custom AMIs using components from Marketplace today.

Dynamically reconfigure your AWS Outposts capacity using Capacity Tasks

Post Syndicated from aostan original https://aws.amazon.com/blogs/compute/dynamically-reconfigure-your-aws-outposts-capacity-using-capacity-tasks/

This post is written by Brianna Rosentrater, Hybrid Edge Specialist SA and Adam Duffield, Senior Technical Account Manager.

AWS Outposts extends AWS infrastructure, AWS services, APIs, and tools to on-premises locations for workloads that require low latency, local data processing, or data residency. Outposts comes in a variety of form factors, from 42U Outposts racks to 1U and 2U Outposts servers. Outposts now supports self-service capacity management, making it easy for you to view and manage compute capacity on your Outposts. A default capacity configuration for each new Outpost is determined during the ordering process. This default configuration can subsequently be modified to create a range of instance sizes and quantities to meet your changing business needs. To do so, you create a capacity task, specify the instance sizes and quantity, and run the capacity task to implement the changes. This post focuses on how to use capacity tasks to perform multi-host reconfigurations and view existing capacity configurations.

Overview

Amazon Elastic Compute Cloud (Amazon EC2) capacity on an Outpost is determined by the total volume of compute capacity within the Outpost when ordered. Outposts can also be scaled up or out as needed during your commitment term. For further details on Outpost capacity planning including best practices, refer to the Capacity Planning – AWS Outposts High Availability Design and Architecture whitepaper. We recommend planning spare capacity for N+M host availability when making modifications to your Outpost capacity configuration if your workloads need to be highly available. To calculate, take the number of hosts (N) you need to run all your workloads, and then add (M) additional hosts to meet your requirements for server availability during failure and maintenance events.
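The N+M calculation can be sketched as simple arithmetic. In the following minimal example, the hosts_needed function, the 250 vCPU requirement, and the 96 vCPUs per host are all illustrative assumptions; substitute the figures for your own hosts and workloads.

```shell
# Ceiling division: number of hosts needed to supply the required vCPUs.
hosts_needed() {
  required_vcpus="$1"
  vcpus_per_host="$2"
  echo $(( (required_vcpus + vcpus_per_host - 1) / vcpus_per_host ))
}

n="$(hosts_needed 250 96)"   # N: hosts needed to run all workloads
m=1                          # M: spare hosts for failure and maintenance events
total=$(( n + m ))
echo "N=$n M=$m total=$total"   # prints "N=3 M=1 total=4"
```

Rounding up matters here: 250 vCPUs on 96 vCPU hosts needs three hosts, not two, before any spare capacity is added.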

Viewing existing Outposts capacity configuration

Outposts users now have visibility into capacity configurations at both an instance family and host level. This gives greater insight into capacity usage and instance placements. Within the Outposts console, after choosing the Outpost ID on which you want to view the capacity configuration, two new views have been provided: the Instance view and the Rack view.

Instance view

The Instance view provides a granular breakdown of the currently deployed instances on the Outpost along with an overall view of instance family capacity pools and their usage, as shown in the following figure. The Instances section gives detailed information about the deployed instances: their instance ID, instance size, AWS managed service (if applicable), and the asset ID of the host where the instance is running.

Figure 1 – Outposts Instance View

The Instance capacity distribution summary displays how the various instance sizes are allocated within each instance family, as shown in the following figure. Each host of the same instance family contributes its capacity to the overall pool, which is represented in this section as a percentage rather than number of instance slots. This shows the configured capacity, but it doesn’t reflect any level of usage.

Figure 2 – Outposts Instance Capacity Distribution Summary

The Instance capacity distribution details section, shown in the following figure, provides a more detailed breakdown of each instance family capacity pool. This section provides a view of the total available instance capacity, the number of used instances, and the number of instances that are unavailable to you at that time (such as when a hardware failure occurs).

Figure 3 – Instance Capacity Distribution Details

Rack view

The Rack view tab provides a more granular view of the overall configuration of each host on a per rack basis, as shown in the following figure. It allows you to analyze the spread and usage of the instance size allocations across each host (asset) and, when choosing the show instance details button, provides the instance ID of each used slot. Using the search box, you can filter by Instance Family or Instance Size to provide a more concise view. If you’re using Outposts server, the Rack view tab will show the capacity configuration of your server.

Figure 4 – Rack View

Obtaining a view of current configuration

Alongside these views, two buttons are available on each of the pages. The Export JSON button gives the ability to download a JSON formatted copy of the current configuration for an Outpost. This is especially useful if you’re looking to record current state, or wanting to use the JSON upload option when submitting a new capacity task. The JSON file structure provides the overall configuration of each capacity pool. However, it doesn’t provide any details in terms of usage. The second button, Modify Instance Capacity, provides a shortcut to creating a capacity task.

This level of capacity visibility is also now available through AWS Command Line Interface (AWS CLI)/Outposts API calls, which some may prefer over the console. For example, the list-assets CLI command can be used to obtain a breakdown of the capacity configuration of each Outpost host:

aws outposts list-assets --outpost-identifier outpost-arn

Figure 5 – list-assets CLI command sample output

If you want to obtain details of running instances on an Outpost, such as the instance size, the asset ID on which an instance is running, and the related AWS service name (if relevant), then the list-asset-instances CLI command can be used:

aws outposts list-asset-instances --outpost-identifier outpost-arn

Figure 6 – list-asset-instances CLI command sample output

The list-asset-instances CLI command also allows you to filter through numerous dimensions, such as instance type or AWS service. For example, this can be particularly useful for quickly identifying all running instances of a certain type, such as the m5d.large instance type by using the following command:

aws outposts list-asset-instances --outpost-identifier outpost-arn --instance-type-filter m5d.large

Modifying the Outposts capacity configuration

Due to the finite nature of Outposts capacity, changing operational requirements often mean that adjustments need to be made to your Outposts capacity configuration over time as new workloads are identified or applications need scaling.

From the Outposts console page, choosing Capacity Tasks from the left-hand menu gives a list of previously run capacity tasks and their status. From here, you can choose Create Capacity Task to start the process. To create a capacity task there are two options available: using an interactive capacity configuration tool through the Modify an Outpost capacity configuration option, or uploading a JSON file containing the necessary configuration through the Upload a capacity configuration option.

Figure 7 – Capacity Tasks web form

The interactive Modify an Outpost capacity configuration option using the simple Auto-balance feature and UI is the easiest way for those unfamiliar with Outpost capacity management to get started with making changes. Using this option, you can also choose one of two methods for the task:

  • Run once: This results in the capacity task attempting to run a single time. If any instances block the successful application of the configuration, then the task fails.
  • Run periodically over 48 hours or less: In the event of blocking instances, the capacity task is paused until the instances are stopped. The task rechecks the status every 10 minutes until it can run. If instances aren’t stopped within 48 hours, then the task is cancelled.

To build out the capacity task, capacity pools are displayed, grouped by instance families, and automatically populated with the current configuration of instance sizes and corresponding vCPU allocation. From here, you can make necessary changes to the existing capacity. Specifically, you can add new instance sizes and amend instance quantities in the corresponding fields. The total vCPU count for each instance family updates automatically to reflect your changes. The Auto-balance feature automatically adjusts the quantities of individual instance sizes to fit within the total vCPU capacity available for the host, which is reflected for each capacity pool at the end. If a capacity pool is over- or under-used, a warning is displayed. You can underprovision a host if you choose, but the unprovisioned capacity is unusable. Attempting to overprovision the host results in the capacity task failing to run.
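The over- and under-provisioning check described above comes down to summing the vCPUs of the requested instances and comparing the result with the pool total. The following is a minimal sketch of that arithmetic, not the console's Auto-balance logic; the vcpus_for_size and check_pool functions are hypothetical helpers, and the vCPU figures are those of the m5 and c5 families.

```shell
# vCPUs per instance size for the m5 and c5 families.
vcpus_for_size() {
  case "$1" in
    large)   echo 2 ;;
    xlarge)  echo 4 ;;
    2xlarge) echo 8 ;;
    4xlarge) echo 16 ;;
    *) echo "unknown size: $1" >&2; return 1 ;;
  esac
}

# Usage: check_pool TOTAL_VCPUS SIZE:COUNT [SIZE:COUNT ...]
check_pool() {
  pool_vcpus="$1"
  shift
  used=0
  for entry in "$@"; do
    size="${entry%%:*}"     # text before the colon
    count="${entry##*:}"    # text after the colon
    per_instance="$(vcpus_for_size "$size")" || return 1
    used=$(( used + per_instance * count ))
  done
  if [ "$used" -gt "$pool_vcpus" ]; then
    echo "over by $(( used - pool_vcpus )) vCPUs"
  elif [ "$used" -lt "$pool_vcpus" ]; then
    echo "under by $(( pool_vcpus - used )) vCPUs"
  else
    echo "exact"
  fi
}

check_pool 192 xlarge:24 4xlarge:6   # 96 + 96 = 192, prints "exact"
check_pool 192 xlarge:48 large:4     # 192 + 8 = 200, prints "over by 8 vCPUs"
```

Running a check like this before submitting a configuration makes it easier to spot a pool that would be rejected as overprovisioned or that would leave capacity unusable.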

Figure 8 – Modifying existing capacity configuration

When the necessary changes have been made to the capacity pools, the second part of the capacity task configuration is choosing instances that should not be impacted by the running of the capacity task. During the run, you may not be in a position to stop certain instances, such as databases or AWS managed services such as Elastic Load Balancing (ELB) or Amazon ElastiCache, due to the impact to production workloads. Choosing these instances allows the capacity task to automatically try to find a path that avoids impacting them. However, in some situations capacity tasks may fail if the chosen instances block the successful running of the task, and there is no possible solution that avoids all the chosen instances. For example, if a capacity task was to remove all c5.xlarge instances and an instance was chosen to ‘keep as-is’ that was running on this instance size, then the task would fail. To avoid this, make sure to include these instances in your capacity configuration. For example, if you have five critical m5.4xlarge instances that must remain running, then include 5 m5.4xlarge instances in your m5 capacity pool configuration.

Figure 9 – Instances to keep as-is

After configuring the capacity pools and choosing the necessary instances to keep as-is, an overview of the changes is presented allowing you to validate the configuration prior to running. When you have reviewed the summary, select Create Task to trigger the execution of the capacity task using the chosen method.

You can observe the status of a capacity task by choosing the capacity task ID. When it’s initially submitted, the status shows as Requested. During this time, the capacity task evaluates the necessary changes to determine if the task can proceed or if instances need stopping. If the Run once option was chosen and instances do need stopping, then the task moves to a cancelled status and provides details of the blocking instances. Alternatively, if Run periodically was chosen, the task remains in the Requested status until the listed instances have been stopped. In the event that the blocking instances can’t be stopped, the capacity task is cancelled after 48 hours.

While a task is running, the Outpost hosts that are impacted by the configuration changes are placed into an isolated state. This means that capacity for new instance launches may be impacted. This isolation only lasts a few minutes while the capacity task is running, but it may impact Auto Scaling groups if a capacity task coincides with a scaling event.
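If you script around capacity tasks, the status progression described above can be followed with a simple polling loop. The following is a minimal sketch that runs against a simulated status sequence so it can be tried offline. The fetch_status and wait_for_capacity_task functions are hypothetical, and the status strings (REQUESTED, IN_PROGRESS, COMPLETED, FAILED, CANCELLED) are assumptions modeled on the console states; check the values returned by your own get-capacity-task calls before relying on them.

```shell
# Simulated status source: hands out one status per call from a fixed list.
# In real use, this function could instead set current_status from the output
# of `aws outposts get-capacity-task` (exact query left as an assumption).
remaining_statuses="REQUESTED REQUESTED IN_PROGRESS COMPLETED"
fetch_status() {
  current_status="${remaining_statuses%% *}"
  case "$remaining_statuses" in
    *" "*) remaining_statuses="${remaining_statuses#* }" ;;
  esac
}

# Usage: wait_for_capacity_task MAX_POLLS INTERVAL_SECONDS
# Returns 0 on completion, 1 on failure or cancellation, 2 on timeout.
wait_for_capacity_task() {
  max_polls="$1"
  interval="$2"
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    fetch_status
    polls=$(( polls + 1 ))
    case "$current_status" in
      COMPLETED) echo "completed after $polls polls"; return 0 ;;
      FAILED|CANCELLED) echo "stopped with status $current_status"; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "gave up after $polls polls"
  return 2
}

wait_for_capacity_task 10 0   # prints "completed after 4 polls"
```

For a task submitted with periodic running, an interval of 600 seconds and a limit of 288 polls would cover the same 48 hour window that the service uses for its own rechecks.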

Instead of using the interactive capacity configurator UI, you can also choose to upload a JSON file to the console containing the necessary configuration. Using this method, choosing instances to keep as-is isn’t available, and the method is automatically chosen as Run once. When a capacity task JSON file is uploaded, the resulting plan is displayed in the following text box and can be amended if needed. Alternatively, rather than uploading the file, the contents can be directly pasted into the text box. Choosing Next moves to the review screen where the remainder of the process continues in line with using the interactive capacity configurator UI.

Figure 10 – Upload a Capacity Configuration using JSON

You may also prefer using the AWS CLI/Outposts API for creating capacity tasks, and a number of new CLI/API actions are now available to support this:

  • cancel-capacity-task / CancelCapacityTask
  • get-capacity-task / GetCapacityTask
  • list-blocking-instances-for-capacity-task / ListBlockingInstancesForCapacityTask
  • list-capacity-tasks / ListCapacityTasks
  • start-capacity-task / StartCapacityTask

In addition to the same options available within the console, it is possible to request a dry run of the capacity task to determine if the instance type and instance size changes are above or below the available instance capacity. Requesting a dry run doesn’t make any changes to your plan.

For example, using the CLI to submit a capacity task to homogeneously slot an Outpost with 2 x m5 and 2 x c5 hosts (192 vCPU for each capacity pool) with xlarge instance sizes, and using periodic running could be achieved by running the following:

aws outposts start-capacity-task \
--outpost-identifier outpost-arn \
--instance-pools '[{"InstanceType":"c5.xlarge","Count":48},{"InstanceType":"m5.xlarge","Count":48}]' \
--task-action-on-blocking-instances WAIT_FOR_EVACUATION
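The count of 48 in the preceding command follows from the pool size: each xlarge instance uses 4 vCPUs, so a 192 vCPU pool holds 192 / 4 = 48 of them. The following minimal sketch derives the --instance-pools value from those numbers; the variable names are illustrative assumptions.

```shell
pool_vcpus=192        # vCPUs in each capacity pool (2 hosts per family)
vcpus_per_xlarge=4    # vCPUs used by one c5.xlarge or m5.xlarge instance
count=$(( pool_vcpus / vcpus_per_xlarge ))

# Build the JSON value for --instance-pools from the computed count.
instance_pools="$(printf '[{"InstanceType":"c5.xlarge","Count":%d},{"InstanceType":"m5.xlarge","Count":%d}]' \
  "$count" "$count")"
echo "$instance_pools"
```

The string can then be passed as --instance-pools "$instance_pools", which keeps the counts consistent if the pool size or instance size changes.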

Conclusion

This post demonstrated how to run a capacity task on Outposts and view your existing capacity configuration. For more information on how to manage and monitor your capacity configuration on Outposts, see the Capacity management for AWS Outposts user guide and the Capacity planning section of the AWS Outposts High Availability Design and Architecture Considerations whitepaper. For steps specific to your environment, see the Modify AWS Outposts instance capacity section of the Outposts rack or Outposts server user guide. Reach out to your AWS account team to learn more about Outposts and self-service capacity management.