Tag Archives: Compute

Build multi-step applications and AI workflows with AWS Lambda durable functions

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/build-multi-step-applications-and-ai-workflows-with-aws-lambda-durable-functions/

Modern applications increasingly require complex and long-running coordination between services, such as multi-step payment processing, AI agent orchestration, or approval processes awaiting human decisions. Building these traditionally required significant effort to implement state management, handle failures, and integrate multiple infrastructure services.

Starting today, you can use AWS Lambda durable functions to build reliable multi-step applications directly within the familiar AWS Lambda experience. Durable functions are regular Lambda functions with the same event handler and integrations you already know. You write sequential code in your preferred programming language, and durable functions track progress, automatically retry on failures, and suspend execution for up to one year at defined points, without paying for idle compute during waits.

AWS Lambda durable functions use a checkpoint and replay mechanism, known as durable execution, to deliver these capabilities. After enabling a function for durable execution, you add the new open source durable execution SDK to your function code. You then use SDK primitives like “steps” to add automatic checkpointing and retries to your business logic and “waits” to efficiently suspend execution without compute charges. When execution terminates unexpectedly, Lambda resumes from the last checkpoint, replaying your event handler from the beginning while skipping completed operations.
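The checkpoint and replay idea can be illustrated with a small, self-contained toy model. This is not the durable execution SDK: ToyDurableContext, Suspend, and run_to_completion are hypothetical names invented for this sketch, checkpoints live in an in-memory dictionary, and a wait is treated as elapsed as soon as the next invocation starts.

```python
class Suspend(Exception):
    """Toy signal that ends the current invocation while a wait is pending."""


class ToyDurableContext:
    """Hypothetical stand-in for a durable context; not the real SDK."""

    def __init__(self, checkpoints):
        self.checkpoints = checkpoints  # survives across invocations
        self._position = 0              # index of the next durable operation

    def step(self, func):
        index = self._position
        self._position += 1
        if index in self.checkpoints:
            # Replay: the step already completed, so return the saved result.
            return self.checkpoints[index]
        result = func()
        self.checkpoints[index] = result  # checkpoint after success
        return result

    def wait(self):
        index = self._position
        self._position += 1
        if index in self.checkpoints:
            return  # replay: the wait already elapsed
        # Toy simplification: record the wait, then stop this invocation.
        self.checkpoints[index] = "wait-elapsed"
        raise Suspend()


side_effects = []


def reserve_stock():
    side_effects.append("reserve")
    return "reserved"


def charge_card():
    side_effects.append("charge")
    return "charged"


def handler(context):
    first = context.step(reserve_stock)
    context.wait()
    second = context.step(charge_card)
    return [first, second]


def run_to_completion(handler_func):
    """Re-invoke the handler from the top until it finishes."""
    checkpoints = {}
    invocations = 0
    while True:
        invocations += 1
        try:
            return handler_func(ToyDurableContext(checkpoints)), invocations
        except Suspend:
            continue


if __name__ == "__main__":
    print(run_to_completion(handler))
    print(side_effects)
```

In this model the handler runs twice from the beginning, but reserve_stock executes only once, because the second invocation reads its result from the checkpoint instead of calling it again.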

Getting started with AWS Lambda durable functions
Let me walk you through how to use durable functions.

First, I create a new Lambda function in the console and select Author from scratch. In the Durable execution section, I select Enable. Note that the durable execution setting can only be configured during function creation and currently can’t be modified for existing Lambda functions.

After I create my Lambda durable function, I can get started with the provided code.

Lambda durable functions introduces two core primitives that handle state management and recovery:

  • Steps—The context.step() method adds automatic retries and checkpointing to your business logic. After a step is completed, it will be skipped during replay.
  • Wait—The context.wait() method pauses execution for a specified duration. The function terminates, and execution is suspended and later resumed, without compute charges during the wait.

Additionally, Lambda durable functions provides other operations for more complex patterns: create_callback() creates a callback that you can use to await results for external events like API responses or human approvals, wait_for_condition() pauses until a specific condition is met like polling a REST API for process completion, and parallel() or map() operations for advanced concurrency use cases.

Building a production-ready order processing workflow
Now let’s expand the default example to build a production-ready order processing workflow. This demonstrates how to use callbacks for external approvals, handle errors properly, and configure retry strategies. I keep the code intentionally concise to focus on these core concepts. In a full implementation, you could enhance the validation step with Amazon Bedrock to add AI-powered order analysis.

Here’s how the order processing workflow works:

  • First, validate_order() checks order data to ensure all required fields are present.
  • Next, send_for_approval() sends the order for external human approval and waits for a callback response, suspending execution without compute charges.
  • Then, process_order() completes order processing.
  • Throughout the workflow, try-except error handling distinguishes between terminal errors that stop execution immediately and recoverable errors inside steps that trigger automatic retries.

Here’s the complete order processing workflow with step definitions and the main handler:

import random
from aws_durable_execution_sdk_python import (
    DurableContext,
    StepContext,
    durable_execution,
    durable_step,
)
from aws_durable_execution_sdk_python.config import (
    Duration,
    StepConfig,
    CallbackConfig,
)
from aws_durable_execution_sdk_python.retries import (
    RetryStrategyConfig,
    create_retry_strategy,
)


@durable_step
def validate_order(step_context: StepContext, order_id: str) -> dict:
    """Validates order data using AI."""
    step_context.logger.info(f"Validating order: {order_id}")
    # In production: calls Amazon Bedrock to validate order completeness and accuracy
    return {"order_id": order_id, "status": "validated"}


@durable_step
def send_for_approval(step_context: StepContext, callback_id: str, order_id: str) -> dict:
    """Sends order for approval using the provided callback token."""
    step_context.logger.info(f"Sending order {order_id} for approval with callback_id: {callback_id}")
    
    # In production: send callback_id to external approval system
    # The external system will call Lambda SendDurableExecutionCallbackSuccess or
    # SendDurableExecutionCallbackFailure APIs with this callback_id when approval is complete
    
    return {
        "order_id": order_id,
        "callback_id": callback_id,
        "status": "sent_for_approval"
    }


@durable_step
def process_order(step_context: StepContext, order_id: str) -> dict:
    """Processes the order with retry logic for transient failures."""
    step_context.logger.info(f"Processing order: {order_id}")
    # Simulate flaky API that sometimes fails
    if random.random() > 0.4:
        step_context.logger.info("Processing failed, will retry")
        raise Exception("Processing failed")
    return {
        "order_id": order_id,
        "status": "processed",
        "timestamp": "2025-11-27T10:00:00Z",
    }


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> dict:
    try:
        order_id = event.get("order_id")
        
        # Step 1: Validate the order
        validated = context.step(validate_order(order_id))
        if validated["status"] != "validated":
            raise Exception("Validation failed")  # Terminal error - stops execution
        context.logger.info(f"Order validated: {validated}")
        
        # Step 2: Create callback
        callback = context.create_callback(
            name="awaiting-approval",
            config=CallbackConfig(timeout=Duration.from_minutes(3))
        )
        context.logger.info(f"Created callback with id: {callback.callback_id}")
        
        # Step 3: Send for approval with the callback_id
        approval_request = context.step(send_for_approval(callback.callback_id, order_id))
        context.logger.info(f"Approval request sent: {approval_request}")
        
        # Step 4: Wait for the callback result
        # This blocks until external system calls SendDurableExecutionCallbackSuccess or SendDurableExecutionCallbackFailure
        approval_result = callback.result()
        context.logger.info(f"Approval received: {approval_result}")
        
        # Step 5: Process the order with custom retry strategy
        retry_config = RetryStrategyConfig(max_attempts=3, backoff_rate=2.0)
        processed = context.step(
            process_order(order_id),
            config=StepConfig(retry_strategy=create_retry_strategy(retry_config)),
        )
        if processed["status"] != "processed":
            raise Exception("Processing failed")  # Terminal error
        
        context.logger.info(f"Order successfully processed: {processed}")
        return processed
        
    except Exception as error:
        context.logger.error(f"Error processing order: {error}")
        raise error  # Re-raise to fail the execution

This code demonstrates several important concepts:

  • Error handling—The try-except block handles terminal errors. When an unhandled exception is raised outside of a step (like the validation check), it terminates the execution immediately. This is useful when there’s no point in retrying, such as invalid order data.
  • Step retries—Inside a step, exceptions trigger automatic retries based on the default retry strategy (as in step 1) or a configured RetryStrategy (as in the process_order step, step 5). This handles transient failures like temporary API unavailability.
  • Logging—I use context.logger for the main handler and step_context.logger inside steps. The context logger suppresses duplicate logs during replay.
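To make the retry settings from step 5 concrete, here is a rough sketch of what max_attempts=3 and backoff_rate=2.0 could mean as a schedule. It is an illustration only, not the SDK's implementation: backoff_delays and run_with_retries are hypothetical helpers, the 1-second initial delay is an assumption, and any jitter or delay cap the SDK may apply is not modeled.

```python
import time


def backoff_delays(max_attempts, backoff_rate, initial_delay_seconds=1.0):
    """Delays between attempts; N attempts leave N - 1 gaps to wait in."""
    return [
        initial_delay_seconds * backoff_rate ** gap
        for gap in range(max_attempts - 1)
    ]


def run_with_retries(func, max_attempts, backoff_rate,
                     initial_delay_seconds=1.0, sleep=time.sleep):
    """Call func until it succeeds or max_attempts is exhausted."""
    delays = backoff_delays(max_attempts, backoff_rate, initial_delay_seconds)
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt - 1])


if __name__ == "__main__":
    # Assumed 1-second initial delay, doubled between attempts.
    print(backoff_delays(max_attempts=3, backoff_rate=2.0))
```

Passing a custom sleep function, such as a list's append method, makes the schedule easy to inspect without actually waiting.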

Now I create a test event with order_id and invoke the function asynchronously to start the order workflow. I navigate to the Test tab and fill in the optional Durable execution name to identify this execution. Note that durable functions provide built-in idempotency. If I invoke the function twice with the same execution name, the second invocation returns the existing execution result instead of creating a duplicate.
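The idempotency behavior described here can be pictured with a toy registry keyed by execution name. This is a conceptual sketch only: ToyExecutionRegistry is a hypothetical class, and the real service tracks executions durably rather than in an in-memory dictionary.

```python
class ToyExecutionRegistry:
    """Hypothetical model of name-based idempotency; not a Lambda API."""

    def __init__(self):
        self._results = {}

    def invoke(self, execution_name, handler, event):
        if execution_name in self._results:
            # Same name as an earlier execution: return the existing result.
            return self._results[execution_name]
        result = handler(event)
        self._results[execution_name] = result
        return result


if __name__ == "__main__":
    runs = []

    def handler(event):
        runs.append(event["order_id"])
        return {"order_id": event["order_id"], "status": "processed"}

    registry = ToyExecutionRegistry()
    registry.invoke("order-123-run", handler, {"order_id": "123"})
    registry.invoke("order-123-run", handler, {"order_id": "123"})
    print(len(runs))  # the handler ran once for the repeated name
```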

I can monitor the execution by navigating to the Durable executions tab in the Lambda console:

Here I can see each step’s status and timing. The execution shows CallbackStarted followed by InvocationCompleted, which indicates the function has terminated and execution is suspended to avoid idle charges while waiting for the approval callback.

I can now complete the callback directly from the console by choosing Send success or Send failure, or programmatically using the Lambda API.

I choose Send success.

After the callback completes, the execution resumes and processes the order. If the process_order step fails due to the simulated flaky API, it automatically retries based on the configured strategy. Once an attempt succeeds, the execution completes successfully.

Monitoring executions with Amazon EventBridge
You can also monitor durable function executions using Amazon EventBridge. Lambda automatically sends execution status change events to the default event bus, allowing you to build downstream workflows, send notifications, or integrate with other AWS services.

To receive these events, create an EventBridge rule on the default event bus with this pattern:

{
  "source": ["aws.lambda"],
  "detail-type": ["Durable Execution Status Change"]
}
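To see what this pattern selects, the following sketch applies the same exact-match rule locally: an event matches when, for every field in the pattern, the event's value appears in the pattern's list. The matches helper and the sample events are hypothetical; in particular, the contents of the detail field are assumed for illustration and do not reflect the real event schema.

```python
import json

PATTERN = {
    "source": ["aws.lambda"],
    "detail-type": ["Durable Execution Status Change"],
}


def matches(pattern, event):
    """True when each pattern field lists the event's value for that field."""
    return all(event.get(field) in allowed for field, allowed in pattern.items())


if __name__ == "__main__":
    durable_event = {
        "source": "aws.lambda",
        "detail-type": "Durable Execution Status Change",
        "detail": {"status": "SUCCEEDED"},  # assumed shape, for illustration
    }
    other_event = {"source": "aws.ec2", "detail-type": "Some Other Event"}
    print(matches(PATTERN, durable_event), matches(PATTERN, other_event))
    print(json.dumps(PATTERN))  # serialized form of the pattern
```

Fields that the pattern doesn't mention, such as detail here, are ignored by this kind of matching, so the rule receives every status change event.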

Things to know
Here are key points to note:

  • Availability—Lambda durable functions are now available in the US East (Ohio) AWS Region. For the latest Region availability, visit the AWS Capabilities by Region page.
  • Programming language support—At launch, AWS Lambda durable functions support JavaScript/TypeScript (Node.js 22/24) and Python (3.13/3.14). We recommend bundling the durable execution SDK with your function code using your preferred package manager. The SDKs are fast-moving, and bundling them lets you update dependencies as new features become available.
  • Using Lambda versions—When deploying durable functions to production, use Lambda versions to ensure replay always happens on the same code version. If you update your function code while an execution is suspended, replay will use the version that started the execution, preventing inconsistencies from code changes during long-running workflows.
  • Testing your durable functions—You can test durable functions locally without AWS credentials using the separate testing SDK with pytest integration and the AWS Serverless Application Model (AWS SAM) command line interface (CLI) for more complex integration testing.
  • Open source SDKs—The durable execution SDKs are open source for JavaScript/TypeScript and Python. You can review the source code, contribute improvements, and stay updated with the latest features.
  • Pricing—To learn more about AWS Lambda durable functions pricing, refer to the AWS Lambda pricing page.

Get started with AWS Lambda durable functions by visiting the AWS Lambda console. To learn more, refer to the AWS Lambda durable functions documentation page.

Happy building!

Donnie

Introducing Amazon EC2 X8aedz instances powered by 5th Gen AMD EPYC processors for memory-intensive workloads

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/introducing-amazon-ec2-x8aedz-instances-powered-by-5th-gen-amd-epyc-processors-for-memory-intensive-workloads/

Today, we’re announcing the availability of new memory-optimized, high-frequency Amazon Elastic Compute Cloud (Amazon EC2) X8aedz instances powered by a 5th Gen AMD EPYC processor. These instances offer the highest CPU frequency in the cloud, 5 GHz. They deliver up to two times higher compute performance and 31% better price-performance compared to previous generation X2iezn instances.

X8aedz instances are ideal for electronic design automation (EDA) workloads, such as physical layout and physical verification jobs, and relational databases that benefit from high single-threaded processor performance and a large memory footprint. The combination of 5 GHz processors and local NVMe storage enables faster processing of memory-intensive backend EDA workloads such as floor planning, logic placement, clock tree synthesis (CTS), routing, and power/signal integrity analysis. The high memory-to-vCPU ratio of 32:1 makes these instances particularly effective for applications with vCPU-based licensing models.

Let me explain the instance type naming: The “a” suffix indicates an AMD processor, “e” denotes extended memory in the memory-optimized instance family, “d” represents local NVMe-based SSDs physically connected to the host server, and “z” indicates high-frequency processors.

X8aedz instances
X8aedz instances are available in eight sizes ranging from 2–96 vCPUs with 64–3,072 GiB of memory, including two bare metal sizes. X8aedz instances feature up to 75 Gbps of network bandwidth with support for the Elastic Fabric Adapter (EFA), up to 60 Gbps of throughput to the Amazon Elastic Block Store (Amazon EBS), and up to 8 TB of local NVMe SSD storage.

Here are the specs for X8aedz instances:

Instance name | vCPUs | Memory (GiB) | NVMe SSD storage (GB) | Network bandwidth (Gbps) | EBS bandwidth (Gbps)
x8aedz.large | 2 | 64 | 158 | Up to 18.75 | Up to 15
x8aedz.xlarge | 4 | 128 | 316 | Up to 18.75 | Up to 15
x8aedz.3xlarge | 12 | 384 | 950 | Up to 18.75 | Up to 15
x8aedz.6xlarge | 24 | 768 | 1,900 | 18.75 | 15
x8aedz.12xlarge | 48 | 1,536 | 3,800 | 37.5 | 30
x8aedz.24xlarge | 96 | 3,072 | 7,600 | 75 | 60
x8aedz.metal-12xl | 48 | 1,536 | 3,800 | 37.5 | 30
x8aedz.metal-24xl | 96 | 3,072 | 7,600 | 75 | 60
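As a quick check of the 32:1 memory-to-vCPU ratio mentioned earlier, the short script below recomputes it from the vCPU and memory figures in the table. The numbers are copied from the table; nothing else is assumed.

```python
# (vCPUs, memory in GiB) per size, taken from the specs table.
SIZES = {
    "x8aedz.large": (2, 64),
    "x8aedz.xlarge": (4, 128),
    "x8aedz.3xlarge": (12, 384),
    "x8aedz.6xlarge": (24, 768),
    "x8aedz.12xlarge": (48, 1536),
    "x8aedz.24xlarge": (96, 3072),
    "x8aedz.metal-12xl": (48, 1536),
    "x8aedz.metal-24xl": (96, 3072),
}


def memory_per_vcpu(sizes):
    """GiB of memory available per vCPU for each instance size."""
    return {name: memory / vcpus for name, (vcpus, memory) in sizes.items()}


if __name__ == "__main__":
    for name, ratio in memory_per_vcpu(SIZES).items():
        print(f"{name}: {ratio:.0f} GiB per vCPU")
```

Every size works out to 32 GiB per vCPU, which is why the ratio matters for software licensed per vCPU: each licensed vCPU comes with the same large memory allocation regardless of instance size.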

With up to 60 Gbps of Amazon EBS bandwidth and up to 8 TB of local NVMe SSD storage, you can achieve faster database response times and reduced latency for EDA operations, ultimately accelerating time-to-market for chip designs. These instances also support the instance bandwidth configuration feature, which offers flexibility in allocating resources between network and EBS bandwidth. You can scale network or EBS bandwidth by 25% and improve database (read and write) performance, query processing, and logging speeds.

X8aedz instances use sixth-generation AWS Nitro cards, which offload CPU virtualization, storage, and networking functions to dedicated hardware and software, enhancing performance and security for your workloads.

Now available
Amazon EC2 X8aedz instances are now available in the US West (Oregon) and Asia Pacific (Tokyo) AWS Regions, and additional Regions will be coming soon. For Regional availability and future roadmap, search for the instance type in the AWS CloudFormation resources tab of the AWS Capabilities by Region page.

You can purchase these instances as On-Demand Instances, Spot Instances, and Dedicated Instances, or through Savings Plans. To learn more, visit the Amazon EC2 Pricing page.

Give X8aedz instances a try in the Amazon EC2 console. To learn more, visit the Amazon EC2 X8aedz instances page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy

Amazon FSx for NetApp ONTAP now integrates with Amazon S3 for seamless data access

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/amazon-fsx-for-netapp-ontap-now-integrates-with-amazon-s3-for-seamless-data-access/

Today, we’re announcing the ability to access your data in Amazon FSx for NetApp ONTAP file systems using Amazon Simple Storage Service (Amazon S3). With this capability, you can use your enterprise file data to augment generative AI applications with Amazon Bedrock Knowledge Bases for Retrieval Augmented Generation (RAG), train machine learning (ML) models with Amazon SageMaker, generate insights with Amazon S3 integrated third-party services, use comprehensive research capabilities in AI-powered business intelligence (BI) tools such as Amazon Quick Suite, and run analyses using Amazon S3 based cloud-native applications, all while your file data continues to reside in your FSx for NetApp ONTAP file system.

Amazon FSx for NetApp ONTAP is the first fully AWS managed NetApp ONTAP file system in the cloud. You can use it to migrate on-premises applications that rely on NetApp ONTAP or other network-attached storage (NAS) appliances to AWS without having to change how you manage your data. FSx for NetApp ONTAP provides the popular capabilities, high performance, and data management APIs of ONTAP file systems with the added benefits of the AWS Cloud, such as simplified management, on-demand scaling, and seamless integration with other AWS services.

Over the years, AWS has developed a broad range of industry-leading AI, ML, and analytics services and applications that work with data in Amazon S3 that organizations use to innovate faster, discover new insights, and make even better data-driven decisions. However, some organizations want to use these services with their enterprise file data stored in NetApp ONTAP or other NAS appliances.

How to get started
You can create and attach an S3 Access Point to your FSx for ONTAP file system using the Amazon FSx console, the AWS Command Line Interface (AWS CLI), or the AWS SDK.

I have an existing FSx for ONTAP file system, demo-create-s3access, which I created by following the steps in Creating file systems in the FSx for ONTAP documentation. Using the Amazon FSx console, I now choose the file system ID fs-0c45b011a7f071d70 to access the full details of the file system.

I’ll attach the access point to the volume of the file system. I choose the volume vol1 and then select Create S3 Access Point from the Actions dropdown menu.


I enter details such as the access point name, the type of file system user identity, and the network configuration, then choose Create S3 Access Point to finalize the process.


After it’s created, the access point my-s3-accesspoint is ready to allow access to the file data stored in my file system demo-create-s3access from Amazon S3. Amazon S3 Access Points are S3 endpoints that can be attached to Amazon FSx volumes and used to perform Amazon S3 object operations.


Using the access point my-s3-accesspoint, I can now make proprietary data stored in the file system demo-create-s3access available to applications that work with Amazon S3, while my file data continues to reside in the FSx for NetApp ONTAP file system and remains accessible through the file protocols.
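For programmatic access, a sketch along the following lines could list the files exposed through the access point. It assumes the access point alias can be passed wherever an S3 bucket name is expected and that listing is among the supported object operations for your access point. The list_keys helper is hypothetical, and the client is passed in as a parameter so the function works with any object that provides list_objects_v2, such as a boto3 S3 client.

```python
def list_keys(s3_client, access_point_alias, prefix=""):
    """Collect object keys visible through an S3 Access Point alias.

    s3_client is expected to behave like a boto3 S3 client; the alias is
    supplied in place of a bucket name (an assumption of this sketch).
    """
    keys = []
    kwargs = {"Bucket": access_point_alias, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # Results are paginated; continue from where the last page ended.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

With boto3 installed and credentials configured, this could be called as list_keys(boto3.client("s3"), alias), where alias is the access point alias shown in the console.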

For the walkthrough in this post, I’ll integrate with Quick Suite.

Integrating decades of enterprise file data with the latest AI powered BI tools on AWS
In the Quick Suite Console, in the left navigation pane, I choose Connections, then select Integrations. Before you begin, make sure that you have the correct permissions to the Amazon S3 AWS resource. You can control the AWS resources that Quick Suite can access by following the Amazon Quick Suite user guide.


After I’ve selected the Amazon S3 integration, I enter my Amazon S3 Access Point alias as the S3 bucket URL, leave the rest of the information as default, then choose Create and continue.


I finalize the process by providing the Name and Description of the knowledge base, then choosing Create.


After the knowledge base has been created, it’s automatically synchronized and is now available for interaction.


I want to learn more about the AWS European Sovereign Cloud, so I’ve updated the file system (accessed through the S3 Access Point my-s3-accesspoin-iyytkgz83djdjj7abn3u711supfgkuse1b-ext-s3alias) with the AWS whitepaper on this topic. In the chat in Amazon Quick Suite, I start by asking the first question: “do we have any documentation on the europe sovereignty cloud?” To answer my question, the chat agent accesses and analyzes various types of data sources I have permission to use, including uploaded files in my current conversation, spaces I have access to, knowledge bases from my integrations, and more.

When I verify the source, I see that the document I uploaded to my file system is listed as one of the sources.

Other use cases of Amazon S3 Access Points for Amazon FSx for NetApp ONTAP
Earlier, we looked at use cases such as connecting an organization’s proprietary file data to Amazon Quick Suite for advanced business intelligence. Additionally, Amazon S3 Access Points for Amazon FSx for NetApp ONTAP can be used to seamlessly integrate enterprise file data with comprehensive analytics services, such as Amazon Athena for serverless SQL queries or AWS Glue for ETL processing, to name a few.

Amazon S3 Access Points for Amazon FSx for NetApp ONTAP are also suitable for data access from cloud-native serverless compute workloads and containerized microservices that require flexible access to shared enterprise datasets, such as configuration files, reference data, content libraries, model artifacts, and application assets.

Now available
You can get started today using the Amazon FSx console, AWS CLI, or AWS SDK to attach Amazon S3 Access Points to your Amazon FSx for NetApp ONTAP file systems. The feature is available in the following AWS Regions: Africa (Cape Town), Asia Pacific (Hong Kong, Hyderabad, Jakarta, Melbourne, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Canada (Central, Calgary), Europe (Frankfurt, Ireland, London, Milan, Paris, Spain, Stockholm, Zurich), Israel (Tel Aviv), Middle East (Bahrain, UAE), South America (Sao Paulo), US East (N. Virginia, Ohio), and US West (N. California, Oregon). You’re billed by Amazon S3 for the requests and data transfer costs through your S3 Access Point, in addition to your standard Amazon FSx charges. Learn more on the Amazon FSx for NetApp ONTAP pricing page.

PS: Writing a blog post at AWS is always a team effort, even when you see only one name under the post title. In this case, I want to thank Luke Miller, for his expertise and generous help with technical guidance, which made this overview possible and comprehensive.

Veliswa Boya.

Introducing AWS Lambda Managed Instances: Serverless simplicity with EC2 flexibility

Post Syndicated from Micah Walter original https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/

Today, we’re announcing AWS Lambda Managed Instances, a new capability you can use to run AWS Lambda functions on your Amazon Elastic Compute Cloud (Amazon EC2) compute while maintaining serverless operational simplicity. This enhancement addresses a key customer need: accessing specialized compute options and optimizing costs for steady-state workloads without sacrificing the serverless development experience you know and love.

Although Lambda eliminates infrastructure management, some workloads require specialized hardware, such as specific CPU architectures, or cost optimizations from Amazon EC2 purchasing commitments. This tension forces many teams to manage infrastructure themselves, sacrificing the serverless benefits of Lambda only to access the compute options or pricing models they need. This often leads to a significant architectural shift and greater operational responsibility.

Lambda Managed Instances
You can use Lambda Managed Instances to define how your Lambda functions run on EC2 instances. Amazon Web Services (AWS) handles setting up and managing these instances in your account. You get access to the latest generation of Amazon EC2 instances, and AWS handles all the operational complexity—instance lifecycle management, OS patching, load balancing, and auto scaling. This means you can select compute profiles optimized for your specific workload requirements, like high-bandwidth networking for data-intensive applications, without taking on the operational burden of managing Amazon EC2 infrastructure.

Each execution environment can process multiple requests rather than handling just one request at a time. This can significantly reduce compute consumption, because your code can efficiently share resources across concurrent requests instead of spinning up separate execution environments for each invocation. Lambda Managed Instances provides access to Amazon EC2 commitment-based pricing models such as Compute Savings Plans and Reserved Instances, which can provide up to a 72% discount over Amazon EC2 On-Demand pricing. This offers significant cost savings for steady-state workloads while maintaining the familiar Lambda programming model.

Let’s try it out
To take Lambda Managed Instances for a spin, I first need to create a Capacity provider. As shown in the following image, there is a new tab for creating these in the navigation pane under Additional resources.

Lambda Managed Instances Console

Creating a Capacity provider is where I specify the virtual private cloud (VPC), subnet configuration and security groups. With a capacity provider configuration, I can also tell Lambda where to provision and manage the instances.

I can also specify the EC2 instance types I’d like to include or exclude, or I can choose to include all instance types for high diversity. Additionally, I can specify a few controls related to auto scaling, including the Maximum vCPU count and whether I want to use Auto scaling or a CPU policy.

After I have my capacity provider configured, I can choose it through its Amazon Resource Name (ARN) when I go to create a new Lambda function. Here I can also select the memory allocation I want along with a memory-to-vCPU ratio.

Working with Lambda Managed Instances
Now that we’ve seen the basic setup, let’s explore how Lambda Managed Instances works in more detail. The feature organizes EC2 instances into capacity providers that you configure through the Lambda console, AWS Command Line Interface (AWS CLI), or infrastructure as code (IaC) tools such as AWS CloudFormation, AWS Serverless Application Model (AWS SAM), AWS Cloud Development Kit (AWS CDK) and Terraform. Each capacity provider defines the compute characteristics you need, including instance type, networking configuration, and scaling parameters.

When creating a capacity provider, you can choose from the latest generation of EC2 instances to match your workload requirements. For cost-optimized general-purpose compute, you could choose AWS Graviton4 based instances that deliver excellent price performance. If you’re not sure which instance type to select, AWS Lambda provides optimized defaults that balance performance and cost based on your function configuration.

After creating a capacity provider, you attach your Lambda functions to it through a straightforward configuration change. Before attaching a function, you should review your code for programming patterns that can cause issues in multiconcurrency environments, such as writing to or reading from file paths that aren’t unique per request or using shared memory spaces and variables across invocations.
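As an illustration of the file path concern, the sketch below gives each request its own scratch file instead of a fixed path that concurrent requests would overwrite. The handler and its event shape are hypothetical, and threads are used here only to simulate overlapping requests; how a given runtime actually implements multiconcurrency may differ.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def handler(event):
    """Process a payload through a scratch file that is unique per request."""
    # mkstemp creates a new, uniquely named file, so requests never collide.
    fd, path = tempfile.mkstemp(prefix="request-", suffix=".txt")
    try:
        with os.fdopen(fd, "w") as scratch:
            scratch.write(event["payload"])
        with open(path) as scratch:
            return scratch.read().upper()
    finally:
        os.remove(path)  # clean up so scratch files don't accumulate


if __name__ == "__main__":
    events = [{"payload": f"order-{n}"} for n in range(20)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(handler, events))
    print(results[:3])
```

Had the handler written to a single fixed path, two overlapping requests could read each other's data. The same reasoning applies to module-level variables that are mutated per request.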

Lambda automatically routes requests to preprovisioned execution environments on the instances, eliminating cold starts that can affect first-request latency. Each execution environment can handle multiple concurrent requests through the multiconcurrency feature, maximizing resource utilization across your functions. When additional capacity is needed during traffic increases, AWS automatically launches new instances within tens of seconds and adds them to your capacity provider. The capacity provider can absorb traffic spikes of up to 50% without needing to scale by default, but built-in circuit breakers protect your compute resources during extreme traffic surges by temporarily throttling requests with 429 status codes if the capacity provider reaches maximum provisioned capacity and additional capacity is still being spun up.

The operational and architectural model remains serverless throughout this process. AWS handles instance provisioning, OS patching, security updates, load balancing across instances, and automatic scaling based on demand. AWS automatically applies security patches and bug fixes to operating system and runtime components, often without disrupting running applications. Additionally, instances have a maximum 14-day lifetime to align with industry security and compliance standards. You don’t need to write automatic scaling policies, configure load balancers, or manage instance lifecycle yourself, and your function code, event source integrations, AWS Identity and Access Management (AWS IAM) permissions, and Amazon CloudWatch monitoring remain unchanged.

Now available
You can start using Lambda Managed Instances today through the Lambda console, AWS CLI, or AWS SDKs. The feature is available in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland) Regions. For Regional availability and future roadmap, visit the AWS Capabilities by Region. Learn more about it in the AWS Lambda documentation.

Pricing for Lambda Managed Instances has three components. First, you pay standard Lambda request charges of $0.20 per million invocations. Second, you pay standard Amazon EC2 instance charges for the compute capacity provisioned. Your existing Amazon EC2 pricing agreements, including Compute Savings Plans and Reserved Instances, can be applied to these instance charges to reduce costs for steady-state workloads. Third, you pay a compute management fee of 15% calculated on the EC2 on-demand instance price to cover AWS’s operational management of your instances. Note that unlike traditional Lambda functions, you are not charged separately for execution duration per request. The multiconcurrency feature helps further optimize costs by reducing the total compute time required to process your requests.
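The three components can be combined into a rough estimate, as in the sketch below. The request price and the 15% management fee come from the description above; the hourly instance price, instance hours, and discount in the example are hypothetical placeholders, and estimate_cost is an illustrative helper rather than an official calculator.

```python
REQUEST_PRICE_PER_MILLION = 0.20  # USD, standard Lambda request charge
MANAGEMENT_FEE_RATE = 0.15        # applied to the EC2 On-Demand price


def estimate_cost(invocations, instance_hours, on_demand_hourly_price,
                  discount=0.0):
    """Estimate charges in USD; discount models a Savings Plan or RI rate."""
    request_charge = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    on_demand_charge = instance_hours * on_demand_hourly_price
    # Discounts reduce the EC2 instance charge itself.
    ec2_charge = on_demand_charge * (1 - discount)
    # The management fee is calculated on the undiscounted On-Demand price.
    management_fee = on_demand_charge * MANAGEMENT_FEE_RATE
    return {
        "request_charge": request_charge,
        "ec2_charge": ec2_charge,
        "management_fee": management_fee,
        "total": request_charge + ec2_charge + management_fee,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 10M invocations, 100 instance hours at $1.00/hour,
    # with a 30% commitment discount.
    print(estimate_cost(10_000_000, 100, 1.00, discount=0.30))
```

Because the management fee is based on the On-Demand price, a commitment discount lowers the instance charge but leaves the fee unchanged.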

The initial release supports the latest versions of Node.js, Java, .NET and Python runtimes, with support for other languages coming soon. The feature integrates with existing Lambda workflows including function versioning, aliases, Amazon CloudWatch Lambda Insights, AWS AppConfig extensions, and deployment tools like AWS SAM and AWS CDK. You can migrate existing Lambda functions to Lambda Managed Instances without changing your function code (as long as it has been validated to be thread safe for multiconcurrency), making it easy to adopt this capability for workloads that would benefit from specialized compute or cost optimization.

Lambda Managed Instances represents a significant expansion of Lambda’s capabilities, which means you can run a broader range of workloads while preserving the serverless operational model. Whether you’re optimizing costs for high-traffic applications, or accessing the latest processor architectures like Graviton4, this new capability provides the flexibility you need without operational complexity. We’re excited to see what you build with Lambda Managed Instances.

Announcing Amazon EKS Capabilities for workload orchestration and cloud resource management

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/announcing-amazon-eks-capabilities-for-workload-orchestration-and-cloud-resource-management/

Today, we’re announcing Amazon Elastic Kubernetes Service (Amazon EKS) Capabilities, an extensible set of Kubernetes-native solutions that streamline workload orchestration, Amazon Web Services (AWS) cloud resource management, and Kubernetes resource composition and orchestration. These fully managed, integrated platform capabilities include open source Kubernetes solutions that many customers are using today, such as Argo CD, AWS Controllers for Kubernetes, and Kube Resource Orchestrator.

With EKS Capabilities, you can build and scale Kubernetes applications without managing complex solution infrastructure. Unlike typical in-cluster installations, these capabilities actually run in EKS service-owned accounts that are fully abstracted from customers.

With AWS managing infrastructure scaling, patching, and updates of these cluster capabilities, you get enterprise reliability and security without needing to maintain and manage the underlying components.

Here are the capabilities available at launch:

  • Argo CD – This is a declarative GitOps tool for Kubernetes that provides continuous deployment (CD) capabilities. It’s broadly adopted, with more than 45% of Kubernetes end-users reporting production or planned production use in the 2024 Cloud Native Computing Foundation (CNCF) Survey.
  • AWS Controllers for Kubernetes (ACK) – ACK is highly popular with enterprise platform teams in production environments. ACK provides custom resources for Kubernetes that enable the management of AWS Cloud resources directly from within your clusters.
  • Kube Resource Orchestrator (KRO) – KRO provides a streamlined way to create and manage custom resources in Kubernetes. With KRO, platform teams can create reusable resource bundles that abstract away complexity while remaining native to the Kubernetes ecosystem.

With these capabilities, you can accelerate and scale your Kubernetes use, relying on their opinionated but flexible features to build for scale right from the start. EKS Capabilities is designed to offer a set of foundational cluster capabilities that layer seamlessly with each other, providing integrated features for continuous deployment, resource orchestration, and composition. You can focus on managing and shipping software without needing to spend time and resources building and managing these foundational platform components.

How it works
Platform engineers and cluster administrators can set up EKS Capabilities to offload the work of building and managing custom solutions for common foundational services, so they can focus on the more differentiated features that matter to your business.

Your application developers primarily work with EKS Capabilities as they do other Kubernetes features. They apply declarative configuration to create Kubernetes resources using familiar tools such as kubectl, or through automation that goes from Git commit to running code.
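
To make "declarative configuration" concrete, here is a minimal sketch that builds an Argo CD Application manifest and writes it out as JSON, which kubectl accepts alongside YAML. The `apiVersion`, `kind`, and `spec` fields follow the open source Argo CD Application resource. The application name, repository URL, path, and namespaces are hypothetical, and the destination settings expected by the managed capability may differ from the in-cluster default shown here.

```python
import json


def build_argocd_application(name, repo_url, path, target_namespace):
    """Return an Argo CD Application manifest as a plain dictionary."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},  # namespace is an assumption
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": "HEAD",
                "path": path,
            },
            "destination": {
                # In-cluster address used by open source Argo CD; assumed here.
                "server": "https://kubernetes.default.svc",
                "namespace": target_namespace,
            },
            # Keep the cluster in sync with Git automatically.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# Hypothetical repository and names, for illustration only.
manifest = build_argocd_application(
    name="sample-app",
    repo_url="https://example.com/platform/sample-app.git",
    path="deploy/overlays/prod",
    target_namespace="sample-app",
)
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file such as `application.json` and running `kubectl apply -f application.json` is one way a developer could submit it. In a GitOps flow, the same manifest would typically live in the repository instead.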

Get started with EKS Capabilities
To enable EKS Capabilities, you can use the EKS console, AWS Command Line Interface (AWS CLI), eksctl, or other preferred tools. In the EKS console, choose Create capabilities in the Capabilities tab on your existing EKS cluster. EKS Capabilities are AWS resources, and they can be tagged, managed, and deleted.

You can select one or more capabilities to work together. I checked all three capabilities: ArgoCD, ACK, and KRO. However, these capabilities are completely independent and you can pick and choose which capabilities you want enabled on your clusters.

Now you can configure selected capabilities. You should create AWS Identity and Access Management (AWS IAM) roles to enable EKS to operate these capabilities within your cluster. Please note you cannot modify the capability name, namespace, authentication region, or AWS IAM Identity Center instance after creating the capability. Choose Next and review the settings and enable capabilities.

Now you can see and manage created capabilities. Select ArgoCD to update configuration of the capability.

You can see the details of the ArgoCD capability. Choose Edit to change configuration settings or Monitor ArgoCD to show the health status of the capability for the current EKS cluster.

Choose Go to Argo UI to visualize and monitor deployment status and application health.

To learn more about how to set up and use each capability in detail, visit Getting started with EKS Capabilities in the Amazon EKS User Guide.

Things to know
Here are key considerations to know about this feature:

  • Permissions – EKS Capabilities are cluster-scoped administrator resources, and resource permissions are configured through AWS IAM. For some capabilities, there is additional configuration for single sign-on. For example, Argo CD single sign-on configuration is enabled directly in EKS with a direct integration with IAM Identity Center.
  • Upgrades – EKS automatically updates cluster capabilities you enable and their related dependencies. It automatically analyzes for breaking changes, patches and updates components as needed, and informs you of conflicts or issues through the EKS cluster insights.
  • Adoptions – ACK provides resource adoption features that enable migration of existing AWS resources into ACK management. ACK also provides read-only resources, which can help facilitate a step-wise migration from resources provisioned with Terraform or AWS CloudFormation into EKS Capabilities.

Now available
Amazon EKS Capabilities are now available in commercial AWS Regions. For Regional availability and future roadmap, visit the AWS Capabilities by Region. There are no upfront commitments or minimum fees, and you only pay for the EKS Capabilities and resources that you use. To learn more, visit the EKS pricing page.

Give it a try in the Amazon EKS console and send feedback to AWS re:Post for EKS or through your usual AWS Support contacts.

Channy

Performance benefits of new Amazon EC2 R8a memory-optimized instances

Post Syndicated from Tyler Jones original https://aws.amazon.com/blogs/compute/performance-benefits-of-new-amazon-ec2-r8a-memory-optimized-instances/

Recently we announced the availability of Amazon Elastic Compute Cloud (Amazon EC2) R8a instances, the latest addition to the AMD memory-optimized instance family. These instances are powered by the 5th Generation AMD EPYC (codename Turin) processors with a maximum frequency of 4.5 GHz. In this post I take these instances for a spin and benchmark MySQL later on, but first I discuss the top things you should know about these instances.

Notable characteristics of R8a instances

Each vCPU on an R8a instance corresponds to a physical CPU core (something we started on 7th generation AMD instances). This means that there is no simultaneous multi-threading (SMT). Each vCPU is mapped to a dedicated physical core, so you get more predictable and consistent performance because there’s no resource sharing or potential interference between threads. This is particularly crucial for performance-sensitive workloads where consistent latency is essential. When evaluating and adopting R8a instances, make sure that you’re re-evaluating your thresholds for CPU usage. You can likely squeeze more out of each instance’s CPU without impacting any of your workload’s SLA metrics.

R8a instances feature sizes of up to 192 vCPU with 1,536 GiB RAM. The following table shows the detailed specs:

Instance size | vCPU | Memory (GiB) | Instance storage | Network bandwidth (Gbps) | EBS bandwidth (Gbps)
r8a.medium | 1 | 8 | EBS Only | Up to 12.5 | Up to 10
r8a.large | 2 | 16 | EBS Only | Up to 12.5 | Up to 10
r8a.xlarge | 4 | 32 | EBS Only | Up to 12.5 | Up to 10
r8a.2xlarge | 8 | 64 | EBS Only | Up to 15 | Up to 10
r8a.4xlarge | 16 | 128 | EBS Only | Up to 15 | Up to 10
r8a.8xlarge | 32 | 256 | EBS Only | 15 | 10
r8a.12xlarge | 48 | 384 | EBS Only | 22.5 | 15
r8a.16xlarge | 64 | 512 | EBS Only | 30 | 20
r8a.24xlarge | 96 | 768 | EBS Only | 40 | 30
r8a.48xlarge | 192 | 1536 | EBS Only | 75 | 60
r8a.metal-24xl | 96 | 768 | EBS Only | 40 | 30
r8a.metal-48xl | 192 | 1536 | EBS Only | 75 | 60

Testing MySQL performance using HammerDB

R8a instances are a great choice for MySQL databases, so I thought that would be a great place to showcase some of these instances’ capabilities. To test MySQL, I used a series of scripts written by my colleagues to track MySQL performance across software versions and different EC2 instances. These scripts are stored in the repro-collection repository, which is an open source, extensible framework for performance testing that addresses real-world workloads rather than micro-benchmarks. It is built to provide a performance measurement reference usable across multiple organizations, and it’s currently centered on MySQL and actively used in discussions with Linux Kernel developers and maintainers. Furthermore, it helps track any performance impacts created by code changes to MySQL. The scripts contained in this repository set up a MySQL database to be tested, and a load generator running the HammerDB benchmark.

For this benchmark, I used an r6a.24xlarge instance for the load generator, and r6a.xlarge, r7a.xlarge, and r8a.xlarge instances for the MySQL database server, all deployed in the same AWS Availability Zone (AZ). I chose a single AZ setup to minimize any latency variability from crossing multiple AZs. This is not meant to be a production-like setup, and I highly recommend using multiple AZs for production workloads. Each MySQL instance was tested separately using the same HammerDB load generator. Each test was run three times, and the results were averaged across the three runs. A diagram of the architecture is shown in the following figure:

Performance testing architecture showing r6a/r7a/r8a instance types with HammerDB load generator executing 9 test runs

HammerDB overall results

R8a instances show great results in the HammerDB benchmark for MySQL databases. For HammerDB’s overall score category, R8a instances outscored R7a instances by 55% and outscored R6a instances by 74%.

Performance comparison chart showing r6a, r7a, and r8a instance scores

HammerDB transactions per minute test

R8a instances also showed a notable improvement in this category. When compared to previous generation R7a instances, R8a outperformed R7a by 32%. When compared to R6a instances, R8a outperformed by 63%.

 Performance comparison showing r6a (91,105), r7a (112,686), and r8a (148,478) transactions per minute
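
The percentages quoted for this test can be reproduced from the transactions-per-minute values in the figure caption. The short sketch below does the arithmetic. The helper name is invented for this example and the data is limited to the three published averages.

```python
# Average transactions per minute, taken from the figure caption.
tpm = {"r6a": 91_105, "r7a": 112_686, "r8a": 148_478}


def percent_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


gain_vs_r7a = percent_gain(tpm["r8a"], tpm["r7a"])
gain_vs_r6a = percent_gain(tpm["r8a"], tpm["r6a"])

print(f"R8a vs R7a: +{gain_vs_r7a:.0f}%")  # R8a vs R7a: +32%
print(f"R8a vs R6a: +{gain_vs_r6a:.0f}%")  # R8a vs R6a: +63%
```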

HammerDB P99 latency results

R8a instances showed improvement in P99 latency results, showing the efficiency gains driven by the new 5th Generation AMD EPYC CPUs and higher memory bandwidth. R8a shows a 14% latency reduction when compared to R7a, and a 25% latency reduction when compared to R6a.

P99 latency comparison showing decrease from 39.93ms (r6a) to 30.02ms (r8a) across instance generations
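
For latency, lower is better, so the comparison is expressed as a reduction relative to the older generation. The sketch below checks the R6a figure using the two values given in the caption. The R7a latency is not stated in the text, so it is not computed here.

```python
# P99 latency in milliseconds, taken from the figure caption.
p99_ms = {"r6a": 39.93, "r8a": 30.02}


def percent_reduction(old, new):
    """How much smaller `new` is than `old`, in percent of `old`."""
    return (old - new) / old * 100


reduction_vs_r6a = percent_reduction(p99_ms["r6a"], p99_ms["r8a"])
print(f"P99 latency reduction vs R6a: {reduction_vs_r6a:.1f}%")
```

The result is about 24.8%, which rounds to the 25% quoted above.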

Conclusion

Built on the AWS Nitro System using sixth generation Nitro Cards, R8a instances are ideal for high performance, memory-intensive workloads, such as SQL and NoSQL databases, as demonstrated by the benchmarking shown in this post, as well as distributed web scale in-memory caches, in-memory databases, real-time big data analytics, and Electronic Design Automation (EDA) applications. R8a instances offer 12 sizes, including 2 bare metal sizes. Amazon EC2 R8a instances are SAP-certified and provide 38% more SAPS when compared to R7a instances. If you’re still running 6th generation R6a instances, then I highly encourage you to migrate to the 8th generation instances to use their clear price performance benefits. Staying on modern infrastructure is a great way to drive down costs and provide more features for your customers, and there are clear gains to be had based on the testing shown in this post.

Start optimizing your high performance memory intensive workloads today by migrating to R8a instances. Visit the Amazon EC2 R8a instances page to learn more and get started on your upgrades to use the increased price performance of R8a instances today!

The attendee’s guide to hybrid cloud and edge computing at AWS re:Invent 2025

Post Syndicated from Rachel Zheng original https://aws.amazon.com/blogs/compute/the-attendees-guide-to-hybrid-cloud-and-edge-computing-at-aws-reinvent-2025/

AWS re:Invent 2025 returns to Las Vegas, Nevada, from December 1–5, 2025. This year, we’re offering a comprehensive lineup of sessions and booth activities to help you build resilient, performant, and scalable applications wherever you need them—in the cloud, on premises, or at the edge.

Session types

These sessions are available in the following formats. Most of the sessions are under the topic of Hybrid Cloud and Multicloud (HMC) in the event catalog. If you plan to attend in person, lightning talks and theater sessions are walk-up only. For all other session types, you can reserve your seat through the event portal (login required) or join as a walk-up based on availability.

  • Breakout sessions – Lecture-style 60-minute presentations led by AWS experts and customers.
  • Lightning talks – 20-minute content on specific topics. Each Hybrid Cloud lightning talk features a real-world customer implementation.
  • Chalk talks – 60-minute interactive sessions where AWS experts lead discussions and whiteboard in real time.
  • Code talks – 60-minute sessions featuring coding demonstrations and technical implementations.
  • Workshops – Hands-on, 120-minute sessions where you work directly with AWS services in a guided environment.
  • Theater sessions – 15-minute quick sessions on the Expo floor, typically featuring partner solutions.

Our Hybrid Cloud sessions explore how you can extend AWS infrastructure, services, and tools to distributed locations for low latency, data residency, and local processing needs. These sessions focus on AWS Local Zones, AWS Outposts, and AWS Dedicated Local Zones. We’ve curated the following sessions by theme to help you navigate re:Invent and find content most relevant to your needs.

Leadership session

HMC202 | Breakout session | AWS wherever you need it: From the cloud to the edge

Wednesday, Dec 3 | 2:30PM – 3:30PM PST

Wynn | Convention Promenade | Lafite 7 | Content Hub | Turquoise Theater

Presented by our engineering and product management leaders, Spencer Dillard and Madhura Kale, this session provides an overview of all our latest innovations for hybrid cloud and edge computing, and how they help you address infrastructure requirements in digital sovereignty, generative AI with local data processing needs, and migration and modernization with on-premises dependencies.

Theme 1: Generative AI and agentic AI with local data processing

As you scale generative AI implementations from pilots to production, you need to balance speed of innovation with data sovereignty requirements, low-latency edge inference needs, and space, power, and cost efficiency. In the following sessions, you will learn how to address these challenges.

HMC308 | Breakout session | Build generative and agentic AI applications on-premises and at the edge

Thursday, Dec 4 | 2:30PM – 3:30PM PST

Wynn | Upper Convention Promenade | Cristal 7

This session shares reference architectures, best practices, and demos for running small language models (SLMs), Retrieval Augmented Generation (RAG), and agentic AI with AWS Hybrid and Edge services. Gain insights into strategies for model selection and optimization.

HMC324 | Lightning talk | BCC: Hybrid architecture for generative AI to meet regulatory needs

Monday, Dec 1 | 10:00AM – 10:20AM PST

Mandalay Bay | Oceanside C | Content Hub | Lightning Theater

Learn how Bank CenterCredit (BCC) implemented two generative AI use cases: anonymizing personally identifiable information (PII) in customer service calls with Outposts before sending it to the parent AWS Region for foundation model (FM) fine-tuning, and implementing local RAG with regulated data to improve HR efficiency.

HMC311 | Chalk talk | Developing end-to-end SLM pipelines from the cloud to the edge

Thursday, Dec 4 | 11:30AM – 12:30PM PST

MGM | Level 3 | Chairman’s 362

This session presents a comprehensive approach to deploying SLMs to Local Zones and Outposts using Amazon SageMaker and Amazon EKS. Learn how to deliver domain-specific, fine-tuned SLMs from Regions to edge locations for low-latency inference.

HMC312 | Chalk talk | Implement RAG while meeting data residency requirements

Wednesday, Dec 3 | 5:30PM – 6:30PM PST

Mandalay Bay | Level 3 South | South Seas A

This session explores how to implement RAG with on-premises and edge data to help you meet data residency and digital sovereignty needs.

HMC317 | Code talk | Implement Agentic AI at the edge for industrial automation

Monday, Dec 1 | 10:30AM – 11:30AM PST

Mandalay Bay | Level 3 South | Jasmine H

Manufacturing and industrial operations demand real-time, intelligent decision-making in low-connectivity environments. Learn how to deploy SmolVLMs (small vision language models) and AI agents to automate site operations using Outposts and Strands Agents.

HMC302 | Workshop | Implementing agentic AI solutions on-premises and at the edge

Wednesday, Dec 3 | 4:00PM – 6:00PM PST

MGM | Level 3 | Chairman’s 368

In this workshop, learn how to extend Amazon Bedrock AgentCore to Outposts and Local Zones to build distributed agentic applications using Model Context Protocol (MCP) and agent-to-agent (A2A) communication with on-premises data.

HMC305 | Workshop | Low-latency SLM deployment: Optimizing inference on AWS Hybrid and Edge services

Monday, Dec 1 | 8:00AM – 10:00AM PST

MGM | Level 1 | Grand 117

This workshop demonstrates a fully local deployment approach for running SLMs at the edge using Local Zones and Outposts. The implementation focuses on achieving low-latency inference and enabling data sovereignty compliance through RAG within local infrastructure.

Theme 2: Migration and modernization with on-premises or edge dependencies

Certain workloads need to stay on-premises or closer to end users. These can be driven by data residency and digital sovereignty requirements or the need to access legacy on-premises systems at a low latency. When a Region is not close enough to meet these needs, AWS brings AWS infrastructure and AWS services closer to where you need them to accelerate migration and modernization.

HMC309 | Breakout session | Migrating your VMware workloads with on-premises dependencies

Thursday, Dec 4 | 11:30AM – 12:30PM PST

Caesars Forum | Level 1 | Summit 212 | Content Hub | Orange Theater

Learn how AWS can help you migrate VMware-based workloads while addressing data residency requirements and latency-sensitive application interdependencies. Gain insights from a real-world implementation at Caesars Entertainment.

HMC217 | Lightning talk | Rivian: Modernize mission-critical manufacturing applications with AWS

Wednesday, Dec 3 | 2:30PM – 2:50PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Manufacturing applications like Ignition, SCADA, MES, and robotic control require low-latency connectivity to on-premises manufacturing equipment. Learn how Rivian is modernizing mission-critical and latency-sensitive manufacturing applications with Outposts.

HMC313 | Chalk talk | Extend Amazon EKS clusters for on-premises and edge use cases

Wednesday, Dec 3 | 4:00PM – 5:00PM PST

MGM | Level 3 | Chairman’s 356

Dive deep into strategies for modernizing distributed applications with Amazon EKS across Regions, Local Zones, and Outposts.

HMC303 | Workshop | Migrate and modernize VMware workloads with on-premises dependencies

Tuesday, Dec 2 | 12:30PM – 2:30PM PST

Wynn | Convention Promenade | Margaux 2

This workshop guides you through migrating VMware and other on-premises applications to Outposts while modernizing them through containerization.

Theater session | Deploying robust disaster recovery for mission-critical workloads

Wednesday, Dec 3 | 1:00PM – 1:15PM PST

The Venetian | Level 2 | Expo Hall B | NetApp Booth (#1039)

With Outposts third-party storage integration, you can modernize right inside your data centers while leveraging your investment in on-premises storage solutions. Join this session to learn how you can implement a robust disaster recovery solution for mission-critical workloads using Amazon FSx for NetApp ONTAP and on-premises ONTAP with Outposts. Learn how to perform real-time SnapMirror replication between Regions and on-premises environments and monitor replication status and RPO metrics.

Theme 3: Data residency and digital sovereignty

As organizations scale innovative solutions globally, they need to navigate complex digital sovereignty requirements. Learn how AWS Hybrid and Edge services help you adopt a consistent approach to security, monitoring, management, and auditing across jurisdictions while meeting regulatory obligations.

HMC310 | Breakout session | Digital sovereignty and data residency with AWS Hybrid and Edge services

Tuesday, Dec 2 | 4:30PM – 5:30PM PST

Caesars Forum | Level 1 | Summit 212 | Content Hub | Yellow Theater

This session examines best practices for data residency, security controls, and operational consistency across distributed locations. Hear how AWS helped Geidea, a leading fintech company, accelerate business expansion in the Middle East while meeting country-specific data residency requirements.

HMC214 | Lightning talk | DraftKings: Meeting gaming regulations at Super Bowl scale with AWS Local Zones

Wednesday, Dec 3 | 2:00PM – 2:20PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Learn how DraftKings built a scalable edge strategy using Regions and Local Zones to meet Federal Wire Act requirements while accelerating expansion into 26 US states.

HMC316 | Chalk talk | Address digital sovereignty with hybrid cloud solutions

Monday, Dec 1 | 1:30PM – 2:30PM PST

Mandalay Bay | Lower Level North | South Pacific D

Learn how to choose the best AWS infrastructure option for your sovereign needs and architect applications for data residency and resiliency. Discover how to implement security controls to regulate how data can be stored, processed, and transferred, and how to prevent unauthorized data access.

Theme 4: Optimizing for low and ultra-low latency

Systems like online ticketing, real-time threat detection, manufacturing control, and financial trading require network latency ranging from double-digit milliseconds to low double-digit microseconds. The Hybrid Cloud track discusses how AWS brings cloud services closer to end users and data generation points to satisfy various latency profiles.

SPF206 | Breakout session | Ticketmaster: Enhancing live event experience for fans with AWS Local Zones

Thursday, Dec 4 | 2:30PM – 3:30PM PST

Caesars Forum | Level 1 | Sports Forum | Mainstage

Discover how Ticketmaster delivers superior live event experiences by bringing its online ticket sales platform closer to fans using Local Zones.

HMC216 | Lightning talk | LSEG: Modernizing critical financial systems with multicloud and hybrid cloud

Tuesday, Dec 2 | 3:30PM – 3:50PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Learn how LSEG is transforming its global PriceStream FX trading platform, implementing a sophisticated architecture for ultra-low latency with managed services. Additionally, explore how LSEG is modernizing clearing systems as critical national infrastructure, balancing regulatory compliance, operational excellence, and business continuity with strict RPO/RTO requirements.

HMC215 | Lightning talk | AWS Local Zones – Sophos’ new edge in the global race against cyber-attacks

Tuesday, Dec 2 | 3:00PM – 3:20PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Discover how AWS Hybrid and Edge solutions transform how organizations deliver low-latency applications at the edge. Learn how Local Zones extends Regions and services closer to population centers, fitting use cases from media streaming to real-time gaming and financial trading.

HMC213 | Lightning talk | AWS Local Zones & Outposts: Verifone’s Global Edge Computing Strategy

Tuesday, Dec 2 | 11:30AM – 11:50AM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Hear from Verifone on how it transformed its global payment ecosystem, managing 35 million terminals with innovative edge computing. Dive into strategies for deploying point-of-sale (POS) software using multi-tier architectures.

HMC402 | Chalk talk | Meet ultra-low latency and high throughput needs with AWS Outposts

Wednesday, Dec 3 | 10:00AM – 11:00AM PST

Mandalay Bay | Level 2 South | Lagoon G

Dive deep into the new category of accelerated networking Amazon EC2 instances on Outposts, purpose-built for modernizing ultra-low latency and high-throughput mission-critical workloads.

Theme 5: Architecture considerations for hybrid cloud

Running applications outside of Regions requires special architectural considerations to address space, power, and networking constraints. We will share best practices and implementation guidance on improving security, resiliency, and availability.

HMC327 | Lightning talk | Nasdaq: Build resilient infrastructure for global financial services

Tuesday, Dec 2 | 11:00AM – 11:20AM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

This session discusses how Nasdaq modernizes its mission-critical capital markets infrastructure while upholding the highest level of resiliency. Learn how Nasdaq integrates Outposts and multi-Region deployments into its core trading and surrounding systems, balancing cloud flexibility with performance and reliability.

HMC328 | Lightning talk | Build resilient and low-latency hybrid telecom infrastructure at scale

Thursday, Dec 4 | 5:00PM – 5:20PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Discover how Liberty Latin America (LLA) built telecom infrastructure in a hybrid cloud architecture for millions of subscribers. Learn how LLA created a resilient, low-latency network with over 180 VPCs.

HMC314 | Chalk talk | Deploying for resilience: HA/DR strategies for AWS Outposts and AWS Local Zones

Monday, Dec 1 | 1:30PM – 2:30PM PST

MGM | Level 1 | Boulevard 169

In this chalk talk, learn how to plan and implement resilient deployments to deliver high availability and disaster recovery, especially for business-critical or mission-critical workloads.

HMC315 | Chalk talk | Deep dive on AWS hybrid and edge networking architectures

Tuesday, Dec 2 | 1:30PM – 2:30PM PST

MGM | Level 1 | Boulevard 156

This chalk talk covers patterns such as private vs. public connectivity, service link and on-premises connectivity, and designing resilient Multi-AZ architecture across Outposts and Local Zones.

HMC403 | Code talk | Build and optimize edge architecture for resiliency with AI

Wednesday, Dec 3 | 2:30PM – 3:30PM PST

MGM | Level 3 | Chairman’s 356

This live coding session explores how to automate edge infrastructure operations with AI. Discover how to build resilient architectures with the latest Outposts and Local Zones APIs and Strands Agents.

HMC301 | Workshop | Build and operate resilient and performant distributed applications [REPEAT]

HMC301-R: Monday, Dec 1 | 3:00PM – 5:00PM PST | MGM | Grand 117

HMC301-R1: Thursday, Dec 4 | 3:00PM – 5:00PM PST | MGM | Premier 318

This workshop explores how to design and implement applications for multi-geo operations while meeting data residency and performance requirements. You will learn how to design fault-tolerant, latency-sensitive applications across distributed locations with limited hardware resources.

Activities in the Expo

Beyond sessions, join us in the re:Invent Expo (The Venetian, Level 2, Hall B) to meet with AWS experts and learn through interactive demos.

AWS Village (Booth #750)

Connect with AWS experts in the Hybrid Cloud kiosk and AWS Global Infrastructure kiosk to discuss your hybrid cloud and edge computing needs. Watch demos on the following topics and ask questions:

  • Consistent Amazon EKS experience across distributed locations and how it can simplify GPU resource orchestration
  • How to optimize and deploy SLMs locally for AI inference with low latency or data residency needs
  • How to build agentic AI workflows at the edge for manufacturing automation

Stop by the Migration & Modernization area of the AWS Village to see hardware innovations inside the latest generation of Outposts up close and personal.

AWS for Industries Pavilion (Booth #111)

Explore how AWS Hybrid and Edge services unlock new use cases and improve operational efficiency across multiple industries through immersive experiences:

  • AWS for Telecom – Modernizing telecom infrastructure while meeting data sovereignty and performance requirements
  • AWS for Public Sector – Accelerating tactical edge deployments to improve mission outcomes
  • AWS for Automotive – Advancing global R&D of embedded hardware and software

Partner booths

Discover how AWS Hybrid and Edge services and AWS Partner solutions unlock additional use cases through demos:

  • Pure Storage (Booth #1756) – Simplifying cloud migration with on-premises dependencies through Outposts integration with Pure Storage FlashArray
  • Intel (Booth #1010) – Powering physical AI and agentic systems for real-time operations in manufacturing
  • Seagate (Booth #159) – Facilitating data ingestion and pre-processing at the edge

Ready for re:Invent 2025?

We hope this guide to hybrid cloud and edge computing helps you maximize your learning experience at re:Invent. Not able to attend in person? Register for the virtual-only event offered at no additional cost to livestream keynotes and innovation talks, and access on-demand session content. See you in Las Vegas or virtually!

Optimize unused capacity with Amazon EC2 interruptible capacity reservations

Post Syndicated from Shubham Sarin original https://aws.amazon.com/blogs/compute/optimize-unused-capacity-with-amazon-ec2-interruptible-capacity-reservations/

Organizations running critical workloads on Amazon Elastic Compute Cloud (Amazon EC2) reserve compute capacity using On-Demand Capacity Reservations (ODCR) to have availability when needed. However, reserved capacity can intermittently sit idle during off-peak periods, between deployments, or when workloads scale down. This unused capacity represents a missed opportunity for cost optimization and resource efficiency across the organization.

Amazon EC2 now offers interruptible ODCRs, a new capability that lets you make unused compute capacity temporarily available to other workloads while maintaining control to reclaim it. This feature helps you optimize reservations and reduce costs across your AWS organization.

In this post, we explore how this works through a practical customer scenario.

Customer scenario: Maximizing reservation utilization across

Consider a financial services company where the trading platform team maintains a large fleet of r7i.4xlarge instances reserved around-the-clock for critical blue/green deployments. During off-peak trading hours and weekends, a significant portion of this reserved capacity sits idle. Meanwhile, the data analytics team regularly runs batch processing jobs for risk modeling—workloads that could benefit from additional compute capacity but don’t require the same availability guarantees as the trading platform.

Previously, sharing this capacity meant losing control over when it could be reclaimed, creating operational challenges when the trading platform needed to scale up quickly during market volatility. Interruptible ODCRs solve this problem by giving the reservation owner control to reclaim capacity when needed for critical operations.

In the following sections, we walk through the key steps to configure capacity sharing, launch instances, and reclaim capacity. The high-level steps are:

  1. Set up capacity sharing
  2. Discover available capacity and launch instances
  3. Reclaim capacity and handle interruptions

Step 1: Set up capacity sharing

The trading platform team begins by identifying unused capacity patterns through Amazon EC2 Capacity Manager. They determine that approximately 60% of their reserved r7i.4xlarge capacity remains unused during overnight hours and weekends.

Create interruptible ODCR

For prerequisites, see Interruptible Capacity Reservations for capacity owners in the Amazon EC2 User Guide.

To repurpose this idle capacity, the trading platform team creates an interruptible ODCR using the AWS Management Console, SDK, or AWS Command Line Interface (AWS CLI). To use the console, they complete the following steps:

  1. On the Amazon EC2 console, choose Capacity Reservations in the navigation pane.
  2. Select the source ODCR and choose Create interruptible reservation.
  3. For Instances to allocate, enter how many instances to allocate (out of a 100-instance reservation). For this example, we allocate 60 instances.
  4. Choose Create interruptible reservation.

This configuration withdraws 60 instances from their original reservation and creates a new ODCR with interruptible configuration. The original reservation now shows 40 instances, and the new interruptible reservation shows 60.

Create interruptible on-demand capacity reservation

Share resources across the organization

With the interruptible reservation created, the reservation-owning team uses AWS Resource Access Manager (AWS RAM), a service that helps you securely share AWS resources across accounts and organizations, to share the newly created ODCR with additional accounts in their organization. When sharing your ODCR, you specify which consumer account IDs in your organization will get access to the interruptible ODCR. Alternatively, you can share the ODCR with your entire organization or an organizational unit (OU). When sharing is complete, the specified accounts get access to the interruptible ODCR capacity, establishing a setup like the one illustrated in the following diagram.

Share resources across the organization

Sharing with all accounts in the organization at once requires organization-wide sharing to be enabled in your AWS Organizations setup. If organization-wide sharing isn’t enabled, you can still share with individual accounts by listing each account ID.
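The sharing step can also be scripted. The following is a minimal sketch, assuming the AWS SDK for Python (Boto3) and its AWS RAM `create_resource_share` call; the share name, the reservation ARN, and the account IDs are placeholders, and the helper names are hypothetical. The request is built in a separate function so it can be inspected before any call is made.

```python
from typing import Any


def build_share_request(reservation_arn: str, principals: list[str]) -> dict[str, Any]:
    """Build keyword arguments for the AWS RAM CreateResourceShare call."""
    if not principals:
        raise ValueError("at least one principal (account ID, OU ARN, or organization ARN) is required")
    return {
        "name": "interruptible-odcr-share",  # hypothetical share name
        "resourceArns": [reservation_arn],
        "principals": principals,
        # Keep the share inside the organization.
        "allowExternalPrincipals": False,
    }


def share_reservation(ram_client: Any, reservation_arn: str, principals: list[str]) -> str:
    """Create the share and return its ARN. ram_client is expected to be boto3.client("ram")."""
    response = ram_client.create_resource_share(
        **build_share_request(reservation_arn, principals)
    )
    return response["resourceShare"]["resourceShareArn"]
```

In a real account, you would call `share_reservation(boto3.client("ram"), reservation_arn, ["<consumer-account-id>"])` from the reservation owner's account.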

Step 2: Discover available capacity and launch instances

After the reservation owner (the trading platform team) shares their reservation, the capacity consumer (data analytics team) needs to find the capacity in their account and launch into it. In this section, we walk through the interruptible ODCR discovery and launch process.

Discover available capacity

The data analytics team, running batch processing jobs in a separate AWS account, can now find the shared interruptible capacity in their account using the console, SDK, or AWS CLI. To use the console, they complete the following steps:

  1. On the Amazon EC2 console, choose Capacity Reservations in the navigation pane.
  2. Choose the ODCR to view its details page.

The interruptible reservation appears with a clear indication that it’s interruptible, showing the instance type (r7i.4xlarge), Availability Zone, and available capacity.

Discover available capacity

Configure Auto Scaling groups for interruptible capacity

To use this capacity for their batch processing workloads, the analytics team creates a new launch template specifically designed for interruptible capacity. The key configuration element is setting the new market-type parameter and targeting the interruptible ODCR.

In the launch template, specify the following:

  • Instance type: r7i.4xlarge (matching the shared capacity)
  • Capacity reservation specification: Targeted
  • Capacity reservation ID: Enter the ID of the shared interruptible ODCR
  • Market type: Use the type interruptible-capacity-reservation
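As a rough illustration, the settings above could be expressed as launch template data for the EC2 `CreateLaunchTemplate` API. This is a sketch under assumptions: the placement of the market type under `InstanceMarketOptions.MarketType` is inferred from the existing launch template structure and should be checked against the current EC2 API reference, and the image ID and reservation ID are placeholders.

```python
from typing import Any


def build_launch_template_data(reservation_id: str, image_id: str) -> dict[str, Any]:
    """Return a LaunchTemplateData dict that targets one interruptible ODCR."""
    return {
        "ImageId": image_id,
        "InstanceType": "r7i.4xlarge",  # must match the shared capacity
        "CapacityReservationSpecification": {
            # Targeted: launch only into the named reservation.
            "CapacityReservationTarget": {"CapacityReservationId": reservation_id}
        },
        # Assumption: the market type named in the post is set in this field.
        "InstanceMarketOptions": {"MarketType": "interruptible-capacity-reservation"},
    }
```

The resulting dictionary could be passed as the `LaunchTemplateData` argument of Boto3's `ec2.create_launch_template`.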

Next, create an Auto Scaling group that uses this launch template. The group is configured as follows:

  • Minimum size: 0 (to avoid unnecessary costs when capacity isn’t needed)
  • Maximum size: 40 (within the available shared capacity)
  • Desired capacity: Set based on job queue length
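One way to derive the desired capacity from the job queue length is sketched below. The `jobs_per_instance` ratio is a hypothetical tuning value that does not come from the post; the bounds mirror the minimum and maximum sizes listed above.

```python
import math


def desired_capacity(queued_jobs: int, jobs_per_instance: int,
                     min_size: int = 0, max_size: int = 40) -> int:
    """Scale with the queue, clamped to the Auto Scaling group's size limits."""
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    if queued_jobs < 0:
        raise ValueError("queued_jobs cannot be negative")
    needed = math.ceil(queued_jobs / jobs_per_instance)
    return max(min_size, min(max_size, needed))
```

The result could then be applied with Boto3's `autoscaling.set_desired_capacity`, for example from a scheduled function that reads the queue depth.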

Launch instances into interruptible capacity

When the analytics team’s batch processing jobs trigger scaling events, the Auto Scaling group launches instances that automatically target the shared interruptible ODCR. These instances launch immediately if capacity is available, providing the team with access to reserved capacity for their fault-tolerant workloads. The instances appear on the Amazon EC2 console with their instance lifecycle as interruptible-capacity-reservation and the ODCR ID in which they’re running. This provides clear indication that they’re running on interruptible capacity, helping with monitoring and cost allocation.

Step 3: Reclaim capacity and handle interruptions

In this section, we review how the capacity owner (the trading platform team) can reclaim their capacity when needed for their critical operations and how the capacity consumer can gracefully handle such interruptions.

Trigger reclamation

When market volatility increases and the trading platform needs to scale up quickly, the platform team initiates capacity reclamation through the console, SDK, or AWS CLI. To use the console, they complete the following steps:

  1. On the Amazon EC2 console, choose Capacity Reservations in the navigation pane.
  2. Choose the ODCR to view its details page.
  3. Choose Edit Interruptible Allocation.
  4. Specify how many instances are needed back (in this case, all 60 instances for maximum trading capacity).
  5. Choose Update, then choose Confirm.

Edit interruptible allocation

The reclamation process can also be automated using AWS Lambda functions triggered by Amazon CloudWatch alarms or scheduled events, providing proactive capacity management based on predictable usage patterns.

Consumer notification and graceful shutdown

After the owner triggers capacity reclamation, consuming instances receive a 2-minute instance interruption warning notice through Amazon EventBridge. The analytics team has configured their batch processing applications to listen for these events. Their applications receive this 2-minute warning and immediately begin checkpointing their current work, saving intermediate results to Amazon Simple Storage Service (Amazon S3), and gracefully shutting down. For EventBridge notification details, refer to the Monitor interruptible Capacity Reservations with EventBridge section in the EC2 Capacity Reservations User Guide.
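The checkpoint-and-stop behavior can be sketched as follows. This is a simplified illustration and not the analytics team's actual code: it assumes that whatever component receives the EventBridge notice sets a `threading.Event`, and that `save_checkpoint` is a callback you supply (for example, one that writes to Amazon S3).

```python
import threading
from typing import Callable, Iterable


def run_batch(items: Iterable[int],
              process: Callable[[int], int],
              save_checkpoint: Callable[[int, list[int]], None],
              interrupted: threading.Event) -> list[int]:
    """Process items in order; on interruption, checkpoint and stop early."""
    results: list[int] = []
    for index, item in enumerate(items):
        if interrupted.is_set():
            # index is the position of the first unprocessed item.
            save_checkpoint(index, results)
            return results
        results.append(process(item))
    # Normal completion: record that everything was processed.
    save_checkpoint(len(results), results)
    return results
```

Checking the flag between items keeps each unit of work atomic, so individual items should be small enough to finish well within the 2-minute notice.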

Automatic capacity restoration to source ODCR

After the 2-minute notice period, Amazon EC2 starts shutting down the consuming instances. After the instances are successfully shut down, Amazon EC2 restores the capacity to the trading platform’s original ODCR. The trading platform can then launch their critical workloads into the same ODCR, resulting in minimal delay for their scaling requirements. The reservation owner can track their capacity reclamation status through the console or API. On the Amazon EC2 console, the ODCR details page shows the current instance allocation, target instance allocation, and request status. When current and target counts match, the status changes to Active, confirming completion.

After the reservation owner requests their capacity back, the capacity reclamation process can take a few minutes, so reservation owners should account for this delay when planning critical activities. This is because Amazon EC2 provides a 2-minute warning to the consumer instances, followed by the instance shutdown period.

Billing and cost considerations

The billing model for interruptible ODCRs follows a clear usage-based approach that aligns costs with consumption:

  • Reservation owner (trading platform team) – Pays EC2 On-Demand rates for unused capacity in the interruptible ODCR, just like any standard ODCR. For example, when the analytics team uses 30 out of 60 available instances, the trading platform pays for the remaining 30 unused instances.
  • Consumer (analytics team) – Pays EC2 On-Demand rates only for the instances they actually launch and use. For example, when they use 30 instances for 4 hours, they’re charged for 30 × 4 = 120 instance-hours at the standard r7i.4xlarge On-Demand rate.
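The split described above can be written out as a small calculation. The function names are illustrative, and the hourly rate is left as a caller-supplied parameter because the post doesn't state one.

```python
def owner_billed_instances(allocated: int, in_use_by_consumers: int) -> int:
    """Instances the reservation owner pays for: the unused part of the allocation."""
    if not 0 <= in_use_by_consumers <= allocated:
        raise ValueError("in-use count must be between 0 and the allocation")
    return allocated - in_use_by_consumers


def consumer_instance_hours(instances: int, hours: float) -> float:
    """Instance-hours billed to the consumer for the instances they launched."""
    return instances * hours


def cost(instance_hours: float, hourly_rate: float) -> float:
    """Multiply instance-hours by an On-Demand hourly rate supplied by the caller."""
    return instance_hours * hourly_rate
```

With the numbers from the example, `owner_billed_instances(60, 30)` gives 30 instances and `consumer_instance_hours(30, 4)` gives 120 instance-hours.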

Conclusion

Amazon EC2 interruptible ODCR helps organizations optimize compute spending while maintaining operational control. Through capacity reclamation mechanisms, teams can achieve better resource utilization without compromising availability guarantees. In this post, we showed how this capability addresses real operational challenges through an example use case—enabling a trading platform to maintain their critical capacity guarantees while helping other teams access high-quality compute resources for their workloads. The predictable interruption model creates a sustainable approach to capacity sharing that benefits the entire organization.

To get started with interruptible capacity reservations, refer to the EC2 Capacity Reservations User Guide. To learn more about using EC2 Auto Scaling, refer to the Interruptible Capacity Reservations with EC2 Auto Scaling guide. Refer to the AWS RAM User Guide to learn more about sharing resources across your organization and the Amazon EventBridge User Guide to learn more about handling interruption notifications in your applications.

Build production-ready applications without infrastructure complexity using Amazon ECS Express Mode

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/build-production-ready-applications-without-infrastructure-complexity-using-amazon-ecs-express-mode/

Deploying containerized applications to production requires navigating hundreds of configuration parameters across load balancers, auto scaling policies, networking, and security groups. This overhead delays time to market and diverts focus from core application development.

Today, I’m excited to announce Amazon ECS Express Mode, a new capability from Amazon Elastic Container Service (Amazon ECS) that helps you launch highly available, scalable containerized applications with a single command. ECS Express Mode automates infrastructure setup including domains, networking, load balancing, and auto scaling through simplified APIs. This means you can focus on building applications while deploying with confidence using Amazon Web Services (AWS) best practices. Furthermore, when your applications evolve and require advanced features, you can seamlessly configure and access the full capabilities of the resources, including Amazon ECS.

You can get started with Amazon ECS Express Mode by navigating to the Amazon ECS console.

Amazon ECS Express Mode provides a simplified interface to the Amazon ECS service resource with new integrations for creating commonly used resources across AWS. ECS Express Mode automatically provisions and configures ECS clusters, task definitions, Application Load Balancers, auto scaling policies, and Amazon Route 53 domains from a single entry point.

Getting started with ECS Express Mode
Let me walk you through how to use Amazon ECS Express Mode. I’ll focus on the console experience, which provides the quickest way to deploy your containerized application.

For this example, I’m using a simple container image application running on Python with the Flask framework. Here’s the Dockerfile of my demo, which I have pushed to an Amazon Elastic Container Registry (Amazon ECR) repository:


# Build stage
FROM python:3.6-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt gunicorn

# Runtime stage
FROM python:3.6-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY app.py .
ENV PATH=/root/.local/bin:$PATH
EXPOSE 80
CMD ["gunicorn", "--bind", "0.0.0.0:80", "app:app"]

On the Express Mode page, I choose Create. The interface is streamlined: I specify my container image URI from Amazon ECR, then select my task execution role and infrastructure role. If you don’t already have these roles, choose Create new role in the dropdown to have one created for you from the AWS Identity and Access Management (IAM) managed policy.

If I want to customize the deployment, I can expand the Additional configurations section to define my cluster, container port, health check path, or environment variables.

In this section, I can also adjust CPU, memory, or scaling policies.

Setting up logs in Amazon CloudWatch Logs is something I always configure so I can troubleshoot my applications if needed. When I’m happy with the configurations, I choose Create.

After I choose Create, Express Mode automatically provisions a complete application stack, including an Amazon ECS service with AWS Fargate tasks, Application Load Balancer with health checks, auto scaling policies based on CPU utilization, security groups and networking configuration, and a custom domain with an AWS provided URL. I can also follow the progress in Timeline view on the Resources tab.

If I need to do a programmatic deployment, the same result can be achieved with a single AWS Command Line Interface (AWS CLI) command:

aws ecs create-express-gateway-service \
--image [ACCOUNT_ID].ecr.us-west-2.amazonaws.com/myapp:latest \
--execution-role-arn arn:aws:iam::[ACCOUNT_ID]:role/[IAM_ROLE] \
--infrastructure-role-arn arn:aws:iam::[ACCOUNT_ID]:role/[IAM_ROLE]

After it’s complete, I can see my application URL in the console and access my running application immediately.

After the application is created, I can see its details in the Amazon ECS console by visiting the cluster I specified (or the default cluster if I didn’t specify one). From there, I can monitor performance, view logs, and manage the deployment.

When I need to update my application with a new container version, I can return to the console, select my Express service, and choose Update. I can use the interface to specify a new image URI or adjust resource allocations.

Alternatively, I can use the AWS CLI for updates:

aws ecs update-express-gateway-service \
  --service-arn arn:aws:ecs:us-west-2:[ACCOUNT_ID]:service/[CLUSTER_NAME]/[APP_NAME] \
  --primary-container '{
    "image": "[IMAGE_URI]"
  }'

I find the entire experience reduces setup complexity while still giving me access to all the underlying resources when I need more advanced configurations.

Additional things to know
Here are additional things to know about ECS Express Mode:

  • Availability – ECS Express Mode is available in all AWS Regions at launch.
  • Infrastructure as Code support – You can use IaC tools such as AWS CloudFormation, AWS Cloud Development Kit (CDK), or Terraform to deploy your applications using Amazon ECS Express Mode.
  • Pricing – There is no additional charge to use Amazon ECS Express Mode. You pay for AWS resources created to launch and run your application.
  • Application Load Balancer sharing – The ALB created is automatically shared across up to 25 ECS services using host-header based listener rules. This helps distribute the cost of the ALB significantly.

Get started with Amazon ECS Express Mode through the Amazon ECS console. Learn more on the Amazon ECS documentation page.

Happy building!
Donnie

Serverless strategies for streaming LLM responses

Post Syndicated from KyungYong Shim original https://aws.amazon.com/blogs/compute/serverless-strategies-for-streaming-llm-responses/

Modern generative AI applications often need to stream large language model (LLM) outputs to users in real-time. Instead of waiting for a complete response, streaming delivers partial results as they become available, which significantly improves the user experience for chat interfaces and long-running AI tasks. This post compares three serverless approaches to handle Amazon Bedrock LLM streaming on Amazon Web Services (AWS), which helps you choose the best fit for your application.

  1. AWS Lambda function URLs with response streaming
  2. Amazon API Gateway WebSocket APIs
  3. AWS AppSync GraphQL subscriptions

We cover how each option works, the implementation details, authentication with Amazon Cognito, and when to choose one over the others.

Lambda function URLs with response streaming

AWS Lambda function URLs provide a direct HTTP(S) endpoint to invoke your Lambda function. Response streaming allows your function to send incremental chunks of data back to the caller without buffering the entire response. This approach is ideal for forwarding the Amazon Bedrock streamed output, providing a faster user experience. Streaming is supported in Node.js 18+. In Node.js, you wrap your handler with awslambda.streamifyResponse(), which gives the handler a response stream; data written to that stream is sent to the HTTP response immediately.

Architecture

The following figure shows the architecture.

Lambda function URLs with Amazon Bedrock architecture

  1. The client makes a fetch() call to the Lambda function URL.
  2. Lambda invokes InvokeModelWithResponseStream using the AWS SDK for JavaScript.
  3. As tokens arrive from Amazon Bedrock, they are written to the response stream.

Implementation steps

  1. Create a streaming Lambda function: Use a Node.js 18 or later runtime (necessary for native streaming). Install the AWS SDK to call Amazon Bedrock. In the handler code, wrap the function with awslambda.streamifyResponse and stream the model output. For example, in Node.js you might do the following:
    // Assumes the AWS SDK for JavaScript v3 is available to the function.
    const { BedrockRuntimeClient, InvokeModelWithResponseStreamCommand } = require("@aws-sdk/client-bedrock-runtime");

    const bedrock = new BedrockRuntimeClient({ region: "us-east-1" });
    
    // Please consider adding more details when you use it for your application.
    exports.handler = awslambda.streamifyResponse(async (event, responseStream) => 
    {
        // 1. Parse input (function URLs deliver the request body as a string)
        const { prompt } = JSON.parse(event.body ?? "{}");
        // 2. Call Amazon Bedrock with streaming (using AWS SDK for Amazon Bedrock).
        //    The request body schema is model-specific; { prompt } is a placeholder.
        const command = new InvokeModelWithResponseStreamCommand({
            modelId: "YOUR_MODEL_ID",
            contentType: "application/json",
            body: JSON.stringify({ prompt })
        });
        const response = await bedrock.send(command);
        // 3. Stream Bedrock output to the client as it arrives
        for await (const item of response.body) {
            if (item.chunk?.bytes) {
                // Each chunk carries model-specific JSON; forward it as UTF-8 text
                responseStream.write(Buffer.from(item.chunk.bytes).toString("utf-8"));
            }
        }
        // 4. End stream when done
        responseStream.end();
    });
    

  2. This code snippet uses the Amazon Bedrock SDK’s async iterable to read the event stream of tokens and writes each to the responseStream.
  3. Configure the Lambda role: the execution role must allow the Amazon Bedrock invocation (such as bedrock:InvokeModelWithResponseStream on the LLM model Amazon Resource Name (ARN)).
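On the client side, streamed bytes can arrive split at arbitrary boundaries, including in the middle of a multi-byte UTF-8 character. The sketch below shows one way a Python client might decode chunks incrementally; the source of the chunks (any HTTP client reading the function URL in streaming mode) is left as an assumption.

```python
import codecs
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text as it arrives, buffering incomplete UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Decoding each chunk independently with `bytes.decode` would fail whenever a character straddles two chunks, which is why the incremental decoder is used here.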

Authentication with Amazon Cognito

Lambda function URLs support two auth types: NONE (public) or AWS_IAM. Native Amazon Cognito user pool token authentication isn’t supported, so you need to implement one of the following approaches.

  1. JWT verification in Lambda: Allow public access and verify a valid JWT from Amazon Cognito in the request header within your Lambda code. This necessitates development effort.
    // Initialize Cognito JWT Verifier
    const { CognitoJwtVerifier } = require('aws-jwt-verify');
    
    const jwtVerifier = CognitoJwtVerifier.create({
      userPoolId: USER_POOL_ID,
      tokenUse: 'id',
      clientId: USER_POOL_CLIENT_ID
    });
    
    // Verify JWT token from Cognito
    async function verifyToken(token) {
      try {
        if (!token) throw new Error('No authorization token provided');
        
        // Remove 'Bearer ' prefix if present
        if (token.startsWith('Bearer ')) {
          token = token.slice(7);
        }
    
        // Verify the token using Cognito JWT Verifier
        const payload = await jwtVerifier.verify(token);
        logger.info(`Verified token for user: ${payload.sub}`);
        
        return payload;
      } catch (error) {
        logger.error(`Token verification failed: ${error.message}`);
        throw new Error(`Invalid token: ${error.message}`);
      }
    }
    
    //...
    
        // Verify authentication
        let userId;
        try {
          // Function URLs deliver header names in lowercase
          const authHeader = event.headers?.authorization;
          const payload = await verifyToken(authHeader);
          userId = payload.sub;
          logger.info(`Authenticated user: ${userId}`);
        } catch (error) {
          responseStream.write(`data: ${JSON.stringify({ type: 'error', error: 'Unauthorized', message: error.message })}\n\n`);
          responseStream.end(); // close the stream so the client isn't left waiting
          return;
        }
    

  2. IAM authorization with Amazon Cognito identity: Use AWS credentials obtained from Amazon Cognito. This setup is more complex, especially for web apps, and is potentially overkill for a single function.

Pros and cons of Lambda function URLs

Pros:

  • Simplicity: No API Gateway or other services are needed, which minimizes operational overhead.
  • Low latency, high throughput: The response is delivered directly from Lambda to the client. This yields excellent Time To First Byte (TTFB) performance, with no intermediate buffering.
  • Simple implementation: For Node.js developers, enabling streaming is as simple as wrapping the handler and writing to a stream. This is ideal for quick prototypes or single function microservices.
  • Lower cost for low concurrent usage: You pay only for Lambda execution time. There’s no persistent connection cost, unlike with WebSocket or AWS AppSync. If invocations are infrequent or short, then this could be cost-efficient.

Cons:

  • Limited runtime support: Native streaming is only supported in Node.js.
  • No built-in user pool auth: Unlike API Gateway or AWS AppSync, Lambda URLs don’t directly support Amazon Cognito user pool authorizers. You must handle auth either through AWS Identity and Access Management (IAM) or manual token validation, adding some development effort and potential security pitfalls if done incorrectly.
  • Error handling complexity: Streaming makes error propagation trickier. If an error occurs mid-stream, then you need to decide how to inform the client.
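One way to signal a mid-stream error is to frame every message as a small JSON event, so that an error is just another event type the client can recognize. The sketch below follows the `data: {...}` framing used in the authentication snippet above; the `token` event type and the field names other than `type` and `error` are hypothetical.

```python
import json
from typing import Any


def sse_frame(payload: dict[str, Any]) -> str:
    """Serialize one event as a server-sent-events style frame."""
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse(buffer: str) -> tuple[list[dict[str, Any]], str]:
    """Split complete frames from the buffer; return (events, leftover text)."""
    *complete, leftover = buffer.split("\n\n")
    events = [
        json.loads(frame[len("data: "):])
        for frame in complete
        if frame.startswith("data: ")
    ]
    return events, leftover
```

A client would append each received chunk to its buffer, call `parse_sse`, handle events whose `type` is `error`, and keep the leftover text for the next chunk.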

API Gateway WebSocket for streaming

API Gateway WebSocket APIs establish persistent, stateful connections between clients and your backend. This is ideal for real-time applications needing server-initiated messages. The client connects once, sends a prompt to Amazon Bedrock through the WebSocket, and the server pushes the streamed response back over the same connection.

Architecture

The following figure shows the architecture.

API Gateway WebSocket with Amazon Bedrock architecture

  1. The client connects through the WebSocket URL, and the ConnectHandler stores the connectionId.
  2. The client sends a prompt through a custom route to the LLMHandler.
  3. Lambda, acting as the LLMHandler, invokes Amazon Bedrock and streams the response back through the WebSocket.
  4. The client disconnects, and the DisconnectHandler removes the connectionId.

Implementation steps

  1. Create a WebSocket API in API Gateway with routes
    1. $connect: Connected to ConnectHandler Lambda.
    2. $disconnect: Connected to DisconnectHandler Lambda.
    3. stream (custom route): Prompt messages go to the LLMHandler Lambda.
  2. Create Lambda Authorizer
    1. Receives the connection request with token in query string.
    2. Validates the JWT token against Amazon Cognito.
    3. Returns Allow/Deny policy for the connection.
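      def lambda_handler(event, context):
          # Extract token from query string (the key can be present but null)
          token = (event.get("queryStringParameters") or {}).get("token", "")
          
          # Validate JWT token against Cognito. validate_token is a helper you
          # implement; here it returns the user ID ("sub" claim) or None.
          user_id = validate_token(token)
          
          # WebSocket API authorizers return an IAM policy
          return {
              "principalId": user_id or "unauthorized",
              "policyDocument": {
                  "Version": "2012-10-17",
                  "Statement": [{
                      "Action": "execute-api:Invoke",
                      "Effect": "Allow" if user_id else "Deny",
                      "Resource": event["methodArn"]
                  }]
              },
              # Optionally include context that other handlers can access
              "context": {
                  "userId": user_id or ""
              }
          }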
      

  3. Create Connection Handler
    1. Connection Lambda runs after successful authorization.
    2. Receives the new connection’s unique connectionId.
    3. Store connection info in Amazon DynamoDB (optional).
    4. Returns 200 status to complete the connection.
      def lambda_handler(event, context):
          # Extract connectionId
          connection_id = event.get("requestContext", {}).get("connectionId")
          
          # Optionally store in DynamoDB
          # dynamodb.put_item(...)
          
          # Connection established successfully
          return {"statusCode": 200}
      

  4. Create Disconnect Handler
    1. Disconnect Lambda is triggered automatically when clients disconnect.
    2. Receives the terminated connection’s connectionId.
    3. Cleans up any stored connection data.
    4. Returns 200 status.
      def lambda_handler(event, context):
          # Extract connectionId
          connection_id = event.get("requestContext", {}).get("connectionId")
          
          # Optionally remove from DynamoDB
          # dynamodb.delete_item(...)
          
          # Disconnection handled successfully
          return {"statusCode": 200}
      

  5. Create LLM Handler
      1. Receives messages sent to the stream route.
      2. Extracts prompt from the message body.
      3. Calls Amazon Bedrock model with streaming response.
      4. Streams tokens back to the client using the connection ID.
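        import json
        import boto3
        
        # Created once per execution environment and reused across invocations
        bedrock_client = boto3.client("bedrock-runtime")
        
        def lambda_handler(event, context):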
            # Extract connectionId and domain details for sending responses
            connection_id = event["requestContext"]["connectionId"]
            domain = event["requestContext"]["domainName"]
            stage = event["requestContext"]["stage"]
            
            # Parse message body to get the prompt
            body = json.loads(event.get("body", "{}"))
            prompt = body.get("prompt", "")
            
            # Create API Gateway management client for sending responses
            api_client = boto3.client(
                'apigatewaymanagementapi',
                endpoint_url=f'https://{domain}/{stage}'
            )
            
            # Call Amazon Bedrock with streaming response
            response = bedrock_client.invoke_model_with_response_stream(...)
            
            # Stream tokens back to client
            for chunk in response["body"]:
                # Extract token from chunk
                token = process_chunk(chunk)
                
                # Send token directly back through the WebSocket
                api_client.post_to_connection(
                    ConnectionId=connection_id,
                    Data=json.dumps({"token": token, "isComplete": False})
                )
            
            # Send completion message
            api_client.post_to_connection(
                ConnectionId=connection_id,
                Data=json.dumps({"token": "", "isComplete": True})
            )
            
            return {"statusCode": 200}
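The handler above relies on a `process_chunk` helper that isn't shown. A possible sketch follows, assuming the Boto3 event shape (`{"chunk": {"bytes": ...}}`) and, as a labeled assumption, a model that emits Anthropic Messages-style streaming events; other models use different payload schemas, so the extraction logic would need to change accordingly.

```python
import json


def process_chunk(event: dict) -> str:
    """Return the text carried by one streamed event, or "" if there is none."""
    raw = event.get("chunk", {}).get("bytes")
    if not raw:
        return ""
    payload = json.loads(raw)
    # Assumption: Anthropic Messages-style events, where text arrives in
    # "content_block_delta" events under delta.text.
    if payload.get("type") == "content_block_delta":
        return payload.get("delta", {}).get("text", "")
    return ""
```

Because some events carry no text, the caller may want to skip `post_to_connection` when the helper returns an empty string.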
        

Authentication with Amazon Cognito

Securing a WebSocket API with Amazon Cognito needs a bit more work. API Gateway WebSocket doesn’t have a built-in Amazon Cognito User Pool authorizer, so you can use one of the following options:

  1. Lambda authorizer with JWT authentication: API Gateway invokes your Lambda authorizer upon connection, validating the Amazon Cognito JWT (passed as a query parameter). The Lambda generates an IAM policy granting access and returns it.
  2. IAM authentication for WebSockets: Clients sign requests with SigV4 using AWS credentials from an Amazon Cognito Identity Pool. API Gateway evaluates the request against IAM policies.

Pros and cons of API Gateway WebSocket APIs

Pros:

  • Bidirectional real-time communication: WebSockets are ideal for applications where the server needs to push data such as the LLM’s response without explicit requests.
  • Persistent connection for multi-turn conversations: After the initial handshake, the same connection can be reused for subsequent prompts and responses, avoiding repeated setup latency. This is great for a chat UI where the user asks multiple questions in one session.
  • Scalability: API Gateway is a managed service that can handle 500 connections/second and 10,000 requests/second across APIs, which can be increased by request.

Cons:

  • Higher development complexity: Compared to the simplicity of a direct Lambda function URL, a WebSocket API involves multiple Lambda functions and coordination to manage the connection state.
  • Custom auth implementation: There is no built-in Amazon Cognito user pool integration, so you must implement a Lambda authorizer.
  • Timeout management: The API Gateway integration timeout is 29 seconds, so your Lambda function should return the response promptly.

AWS AppSync GraphQL subscription

AWS AppSync is a fully managed GraphQL service that streamlines building real-time APIs. It handles WebSocket connections and client fan-out automatically. Clients subscribe to a GraphQL subscription, and a Lambda resolver pushes the Amazon Bedrock streamed tokens back.

Architecture

The following figure shows the architecture.

AWS AppSync GraphQL subscription with Amazon Bedrock architecture

  1. Client calls a startStream mutation. AppSync invokes the Request Lambda.
  2. The Request Lambda immediately returns a unique sessionId and sends the processing task to an Amazon Simple Queue Service (Amazon SQS) queue.
  3. Client uses the sessionId to subscribe to an onTokenReceived GraphQL subscription.
  4. The Processing Lambda (triggered by Amazon SQS) invokes Amazon Bedrock and, for each token, calls a publishToken mutation in AWS AppSync.
  5. AWS AppSync automatically pushes the token to all clients subscribed with the matching sessionId.

Implementation steps

  1. Design the GraphQL Schema: define types and operations.
    type StreamResponse {
      sessionId: String!
      status: String!
      message: String
      timestamp: String!
      error: String
    }
    
    type TokenEvent {
      sessionId: String!
      token: String!
      isComplete: Boolean!
      timestamp: String!
    }
    
    type Mutation {
      startStream(prompt: String!): StreamResponse!
      publishToken(sessionId: String!, token: String!, isComplete: Boolean!): TokenEvent!
    }
    
    type Subscription {
      onTokenReceived(sessionId: String!): TokenEvent
        @aws_subscribe(mutations: ["publishToken"])
    }

  2. Create the Request Handler (Request Lambda)
    1. Receives the GraphQL mutation with the prompt.
    2. Generates a unique session ID.
    3. Sends the prompt and session ID to the SQS queue.
    4. Returns the session ID to the client immediately.
      import datetime
      import json
      import uuid
      import boto3
      
      sqs_client = boto3.client("sqs")
      
      def lambda_handler(event, context):
          # Extract prompt from GraphQL event
          prompt = event["arguments"]["prompt"]
          
          # Generate unique session ID
          session_id = str(uuid.uuid4())
          
          # Send message to SQS queue
          sqs_client.send_message(
              QueueUrl="your-sqs-queue-url",
              MessageBody=json.dumps({
                  "prompt": prompt,
                  "sessionId": session_id
              })
          )
          
          # Return session ID to client
          return {
              "sessionId": session_id,
              "status": "streaming_started",
              # utcnow() is deprecated in recent Python versions; use an aware UTC time
              "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat()
          }
      

  3. Create the Processing Handler (Processing Lambda)
    1. It is triggered by Amazon SQS messages.
    2. It calls Amazon Bedrock with streaming enabled.
    3. For each token generated, it calls the AppSync publishToken mutation.
      import json
      import boto3
      
      bedrock_client = boto3.client("bedrock-runtime")
      
      def lambda_handler(event, context):
          # Process SQS event records
          for record in event["Records"]:
              body = json.loads(record["body"])
              prompt = body["prompt"]
              session_id = body["sessionId"]
              
              # Call Amazon Bedrock with streaming
              response = bedrock_client.invoke_model_with_response_stream(...)
              
              # Process streaming response
              for chunk in response["body"]:
                  # Extract token from chunk
                  token = process_chunk(chunk)
                  
                  # Publish token to AppSync
                  publish_token_to_appsync(
                      session_id=session_id,
                      token=token,
                      is_complete=False
                  )
              
              # Send completion token
              publish_token_to_appsync(
                  session_id=session_id,
                  token="",
                  is_complete=True
              )
      

  4. Configure GraphQL Resolvers
    1. StartStream resolver: Connect to the Request Lambda.
    2. PublishToken resolver: Trigger subscription with a NONE data source.
  5. Client subscription setup
    1. Make a startStream mutation.
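      // Assumes an Apollo-style client, where the result is nested under
      // data.<mutation field>; the field name startStream is an assumption
      const { data } = await client.mutate({
        mutation: START_STREAM,
        variables: { prompt }
      });
      const { sessionId } = data.startStream;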
      

    2. Subscribe to receive tokens.
      client.subscribe({
        query: ON_TOKEN_RECEIVED,
        variables: { sessionId }
      }).subscribe({
        next: ({ data }) => {
          if (data.onTokenReceived.isComplete) {
            // Handle completion
          } else {
            // Append token to UI
            appendToken(data.onTokenReceived.token);
          }
        }
      });
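
The Processing Lambda above relies on a process_chunk helper that is not shown. A minimal sketch of what it could look like follows, assuming the model streams events in the Anthropic Claude Messages format (text fragments arrive in content_block_delta events). Other model families use different payload shapes, so treat the field names here as an assumption to adapt, not as the helper used in the sample.

```python
import json


def process_chunk(event):
    """Return the text fragment carried by one response-stream event, or "" if none."""
    # Each streamed event wraps a JSON document in event["chunk"]["bytes"]
    payload = event.get("chunk", {}).get("bytes")
    if not payload:
        return ""
    message = json.loads(payload)
    # Assumption: Anthropic Claude Messages streaming format
    if message.get("type") == "content_block_delta":
        return message.get("delta", {}).get("text", "")
    # Other event types (message_start, message_stop, ...) carry no text
    return ""
```

Returning an empty string for non-text events keeps the caller simple. The caller can skip publishing when the token is empty to avoid sending no-op mutations.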
      

Authentication with Amazon Cognito

AWS AppSync integrates seamlessly with Amazon Cognito User Pools. When you set the API’s authorization mode to Amazon Cognito User Pool, every GraphQL operation must carry a valid JWT. This is the most developer-friendly option for authentication. AWS AppSync handles the handshake and token refresh.

Pros and cons of AWS AppSync subscriptions

Pros:

  • Fully managed real-time protocol: You don’t deal with raw WebSockets or connection IDs at all. AWS AppSync automatically establishes and maintains a secure WebSocket for subscriptions (no need for a connect or disconnect Lambda).
  • Streamlined authentication: Built-in support for Amazon Cognito User Pool tokens means that you can secure the API without writing custom authorizers.

Cons:

  • Potential overhead and complexity: For a simple case (one prompt, one stream), introducing GraphQL and AWS AppSync might be over-engineering if your app doesn’t use GraphQL for other use cases.
  • 30-second resolver limit: AWS AppSync has a 30-second limit for mutation resolvers, so you need to design the initial request to start the process and return immediately, then rely on a subscription to stream the results progressively without blocking the user.

Conclusion

The Amazon Bedrock streaming interface unlocks fluid, low-latency LLM experiences. You can use the right AWS serverless architecture to deliver streamed responses in a secure, scalable, and cost-effective way.

  • Lambda function URLs with streaming: Direct, single-user applications and prototypes.
  • API Gateway WebSocket: Multi-turn conversations, collaborative applications.
  • AppSync: Complex applications already using GraphQL.

Each method is serverless, production-ready, and fully integrated with Amazon Cognito for secure access control. AWS provides the flexibility to design high-quality AI user experiences at scale.

Refer to GitHub sample source code for more details.

Comparative table

Feature | Lambda function URLs | API Gateway WebSocket APIs | AppSync GraphQL subscriptions
Complexity | Lowest | Medium | High
Real-time focus | Limited | Strong | Strong
Authentication | Needs custom logic | Needs custom logic | Built-in Amazon Cognito support
Scalability | Good | Good | Excellent
GraphQL support | None | None | Native
Use cases | Q&A | Chatbots, real-time apps | Complex apps, multi-user scenarios
Cost | Pay per invocation | Connection time and Lambda execution | Request/connection-based pricing

 

Optimize latency-sensitive workloads with Amazon EC2 detailed NVMe statistics

Post Syndicated from Sanjeev Malladi original https://aws.amazon.com/blogs/compute/optimize-latency-sensitive-workloads-with-amazon-ec2-detailed-nvme-statistics/

Amazon Elastic Compute Cloud (Amazon EC2) instances with locally attached NVMe storage can provide the performance needed for workloads demanding ultra-low latency and high I/O throughput. High-performance workloads, from high-frequency trading applications and in-memory databases to real-time analytics engines and AI/ML inference, need comprehensive performance tracking. Operating system tools like iostat and sar provide valuable system-level insights, and Amazon CloudWatch offers important disk IOPS and throughput measurements, but high-performance workloads can benefit from even more detailed visibility into instance store performance.

For latency-sensitive applications where every millisecond counts, enhanced performance monitoring tools provide deep visibility into storage systems, so your teams can track and analyze behavior at 1-second granularity. This detailed insight can help your organization detect bottlenecks quickly, fine-tune application performance, and deliver reliable service.

In this post, we discuss how you can use Amazon EC2 detailed performance statistics for instance store NVMe volumes, a set of new metrics with per-second granularity, to get real-time visibility into your locally attached storage performance. These statistics are similar to the Amazon EBS detailed performance statistics, providing a consistent monitoring experience across both storage types. You can read these statistics directly from the NVMe devices attached to your Amazon EC2 instance using nvme-cli, or use the CloudWatch agent to monitor I/O performance at the storage level. We also provide examples of how to use these statistics to identify performance bottlenecks.

Feature overview

Amazon EC2 Nitro-based instances with locally attached NVMe instance storage now offer 11 comprehensive metrics at per-second granularity. These metrics, similar to EBS volume metrics, include queue length measurements, IOPS, throughput data, and IO latency histograms for the locally attached NVMe instance storage. They also include IO size-specific latency histograms, which provide even more detailed insight into the performance patterns of the local NVMe instance storage. The metrics are collected and presented separately for each individual NVMe volume available on an instance.

The statistics are presented in three main formats:

    1. Cumulative counters that track IO operations, throughput, and read/write times
    2. Real-time queue length, displaying the current value at the time of your query
    3. Latency histograms visualizing the distribution of IO operations across different latency ranges by displaying both cumulative view and IO size-specific distributions

Prerequisites

To access detailed performance statistics for local instance storage, complete the following steps:

    1. Launch a new Amazon EC2 Nitro instance or use an existing one, then connect to it using SSH or your preferred connection method.
    2. Identify the NVMe device associated with the local storage that you want to query for performance statistics. For example, you can run the following nvme-cli command to list all NVMe devices on the instance.
      $ sudo nvme list

      The following is an example output of the list command that lists the NVMe devices on the instance and their volume Serial Numbers (SN; masked in the below output for privacy). In this demonstration, consider that the local storage used by your application is /dev/nvme1n1.

      Terminal output showing five NVMe devices: one EBS volume and four EC2 instance storage volumes with 3.75TB capacity each

    3. If you are using Amazon Linux 2023 version 2023.8.20250915 (or later) or Amazon Linux 2 version 2.0.20251014.0 (or later), you can proceed to Step 4 because nvme-cli will use the latest version. If you are using an earlier Amazon Linux version, update nvme-cli using the following command, where 2023.8.20250915 can be replaced with the latest Amazon Linux 2023 version:
      $ sudo dnf upgrade --releasever=2023.8.20250915
    4. Run the nvme-cli, with the correct permissions, and pass the device as a parameter. You can use --help to get details on the command usage:
      $ sudo nvme amzn stats --help

      Example output:
      Command help output for 'nvme amzn stats' showing usage syntax and format options
      If you prefer output in a JSON format, you can provide the -o json parameter to the command.

      $ sudo nvme amzn stats /dev/nvme1n1 -o json

      The following output (without the -o json parameter) shows cumulative read/write operations, read/write bytes, total processing time (read and write, in microseconds), and the duration (in microseconds) for which the application attempted to exceed the instance’s IOPS/throughput limits.
      Storage performance metrics showing read operations count, total bytes, and timing statistics for an EC2 NVMe volume
      It also displays read/write I/O latency histograms, with each row representing completed I/O operations within a specific bin of time (in microseconds).
      Read latency distribution histogram showing operation counts across different microsecond ranges, with peak activity in 2048-4096 range
      Write latency distribution histogram showing zero operations across all time ranges, indicating no write activity
      If you want to view the latency histograms across 5 different IO size bands, (0, 512 Byte], (512B, 4KiB], (4KiB, 8KiB], (8KiB, 32KiB], and (32KiB, MAX], you can provide the --details or -d parameter to the command:

      $ sudo nvme amzn stats -d /dev/nvme1n1

      The following image is an excerpt of the above command’s output, showing the additional latency histograms (read and write) of the 5 different IO bands.
      Dual read/write I/O latency histogram analyzing small block operations from 0-512 bytes with peak at 4096-8192 range
      Performance analysis histogram showing I/O patterns for 512-4K blocks with significant activity in 512-1024 range
      Dual histogram showing I/O latency patterns for 4K-8K block operations with concentrated activity at 4096-8192
      Performance analysis histogram displaying I/O patterns for 8K-32K blocks with peak activity in 4096-8192 range
      Comprehensive I/O latency histogram analyzing largest block sizes from 32K to maximum with concentrated activity in 4096-8192

You can run the stats command at a per second granularity. You can also write scripts to pull the stats at a desired interval (every second or any other duration) with each subsequent output reflecting the updated cumulative totals for the metrics. Calculating the difference in the statistics across the last two outputs allows you to derive insight into the instance storage profile during the interval. Below is a sample script you can use to pull the stats at a default interval of 1 second or at your desired interval.

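#!/bin/bash
# Polling interval in seconds; defaults to 1 if no argument is given
INTERVAL=${1:-1}
while true; do
	echo "=== $(date) ==="
	# Print the cumulative statistics; stop the loop if the command fails
	sudo nvme amzn stats /dev/nvme1n1 || break
	echo
	sleep "$INTERVAL"
done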

You can save this script, make it executable, and run it at the default 1-second interval or pass a custom interval as an argument. For example, if you saved the script as nvme_stats.sh, the following commands make it executable and run it at the default 1-second interval (assuming nvme_stats.sh is in your current directory).

chmod +x nvme_stats.sh
./nvme_stats.sh

If, for instance, you want the output every 5 seconds, you can use the following command (after making the script executable):

./nvme_stats.sh 5
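
The shell loop prints raw cumulative totals. If you would rather see per-interval differences directly, a Python variant could poll the JSON output and subtract consecutive snapshots. The following is only a sketch. It assumes the -o json output is a single JSON object and makes no assumption about its field names: it subtracts every numeric top-level field, so a point-in-time value such as queue length will also show up as a difference and should be read from the raw snapshot instead. The names read_stats, counter_deltas, and watch are illustrative and not part of nvme-cli.

```python
import json
import subprocess
import time


def read_stats(device):
    """Run `nvme amzn stats <device> -o json` and return the parsed output."""
    completed = subprocess.run(
        ["sudo", "nvme", "amzn", "stats", device, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def counter_deltas(previous, current):
    """Subtract every numeric top-level field present in both snapshots."""
    return {
        key: current[key] - previous[key]
        for key in current
        if key in previous
        and isinstance(current[key], (int, float))
        and isinstance(previous[key], (int, float))
        and not isinstance(current[key], bool)
    }


def watch(device="/dev/nvme1n1", interval=1.0):
    """Print the change in each counter once per interval until interrupted."""
    previous = read_stats(device)
    while True:
        time.sleep(interval)
        current = read_stats(device)
        print(json.dumps(counter_deltas(previous, current)))
        previous = current
```

Calling watch("/dev/nvme1n1", 5) on the instance would print one line of differences every 5 seconds. Nested values such as histograms are skipped here and would need their own per-bin subtraction.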

You can also use the CloudWatch agent to collect these statistics and publish them to CloudWatch. This gives you historical tracking, trend visualization through dashboards, and performance-based alerts that send automated notifications and that you can correlate with application metrics.

Deriving insights from the Amazon EC2 instance store NVMe detailed performance statistics

Similar to EBS detailed performance statistics, you can use Amazon EC2 instance store NVMe statistics to troubleshoot various workload performance issues. As mentioned in the preceding section, you can also use the detailed statistics to view I/O latency histograms to observe the spread of I/O latency within the period. You can use the read/write operations and time spent statistics to calculate the average latency. The detailed statistics show the average latency at per-second granularity.
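
To make the average latency calculation concrete: because the operation counts and time-spent values are cumulative, the average latency over an interval is the change in time spent divided by the change in completed operations. The sketch below assumes you have already read those two counters from two snapshots. The function name and the sample numbers are illustrative, not taken from real output.

```python
def average_latency_us(ops_before, time_us_before, ops_after, time_us_after):
    """Mean per-I/O latency (microseconds) between two cumulative snapshots."""
    ops = ops_after - ops_before
    if ops <= 0:
        # No I/O completed in the interval, so there is no latency to report
        return None
    return (time_us_after - time_us_before) / ops


# Illustrative numbers: 50,000 reads completed and 1,500,000 us were spent
# on reads between the two snapshots
read_latency = average_latency_us(1_000_000, 30_000_000, 1_050_000, 31_500_000)
print(read_latency)  # 30.0
```

Running the same calculation separately on the read and write counters shows whether only one direction is affected, as in Scenario 1 below.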

The next two example scenarios demonstrate key performance analysis using the statistics. In Scenario 1, we will use the EC2 Instance Local Storage Performance Exceeded (us) metric to check if I/O demands exceed instance storage capabilities, helping with instance right-sizing for sufficient I/O application performance. In Scenario 2, we will use IO-size specific histograms (using --details) to diagnose how large block writes affect subsequent read performance – an issue typically hidden by traditional monitoring tools’ aggregated metrics across all IO sizes.

Scenario 1: Identifying when applications exceed instance storage performance limits

Understanding whether your application’s I/O demands exceed your instance store volumes’ capabilities is important for performance troubleshooting. When applications generate I/O workloads that consistently attempt to exceed the IOPS and throughput limits of specific Amazon EC2 instance types, you’ll experience increased latency and degraded performance. The EC2 Instance Local Storage Performance Exceeded (us) metric helps identify these scenarios by showing the duration (in microseconds) when workloads exceeded supported instance performance. A non-zero value or increasing count between snapshots indicates your current instance size or type may not provide sufficient I/O performance for your application.

The following section shows how to identify if an application is sending more IOPS than the instance’s local storage can support.

The example scenario: An application on an i3en.xlarge instance shows elevated write latency of >1ms. You want to determine if the application’s workload is exceeding the instance’s NVMe volume supported performance.

    1. Select the instance you want to analyze – Identify the instance that runs the application experiencing elevated latency.
    2. Identify the NVMe device – Use the following nvme-cli command, and identify the NVMe device associated with that instance storage.
      $ sudo nvme list

      Example scenario: We ran the list command and identified /dev/nvme1n1 as the NVMe device associated with the i3en.xlarge instance running the application. The application is currently seeing elevated write latency of >1ms, while read latency remains at its normal level of <50us, so this is the device we want to analyze.

    3. Collect statistics for the device at a single point in time or at desired intervals – Collect the detailed performance statistics using the nvme-cli command, or use the sample script provided in the previous section to capture statistics at the desired intervals, if needed.
      $ sudo nvme amzn stats /dev/nvme1n1

      Example scenario: We choose to collect the statistics only once after noticing elevated write latency of the application.

    4. Analyze the statistics to check if the application demands more than the supported performance of the instance storage – Confirm the existence of overall I/O latency degradation by comparing two sets of read/write I/O latency histograms taken some time apart. Example scenario: The following output shows the read IO histograms of the NVMe local instance storage taken 40 seconds apart, with no read IO latency issues (normal read latency for this workload is < 50 us).

      Metric captured at time T:
      AWS EC2 storage performance histogram showing read latency distribution, peak at 16-32 microsecond bucket
      Metric captured at time T+40s:
      AWS EC2 storage performance data showing increased read latency concentration in 16-32 microsecond bucket
      The following output shows Write IO histogram taken 40 seconds apart. We can discern that many write IOs fall into the 1ms – 2ms latency range, which is not expected for this application.
      Metric captured at time T:
      AWS EC2 storage write performance data showing majority of operations between 1-2ms latency
      Metric captured at time T+40s:
      AWS EC2 storage performance metrics showing increased write operations clustered in 1-2ms latency range

    5. Analyze the EC2 Instance Local Storage Performance Exceeded (us) metric, which shows the total time (in microseconds) during which IOPS requests exceeded volume limits. Ideally, the increase in this metric between two snapshots should be minimal, because any increase above 0 indicates that the workload demanded more IOPS than the volume could deliver. Example scenario: Comparing metrics 40 seconds apart shows that for more than 34 seconds, the application’s IOPS demands surpassed the IOPS supported by the local instance storage. This explains the elevated write latency: I/O requests beyond what the underlying storage can physically handle are queued, which increases wait times. The i3en.xlarge instance chosen to run this application therefore cannot meet the application’s performance requirements, which suggests either upgrading to a larger instance size or re-evaluating the instance type itself.
      Metric captured at time T:
      EC2 Instance Local Storage Performance exceeded output of nvme-cli for the described scenario at time T
      Metric captured at time T+40s:
      EC2 Instance Local Storage Performance exceeded output of nvme-cli for the described scenario at time T+40 with increased count of metric

It’s important to choose the right instance size to avoid performance bottlenecks in your application. Refer to the Amazon EC2 instance documentation for more information on the different instances and their storage size.
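
The comparison in step 5 comes down to one subtraction and one division: the change in the EC2 Instance Local Storage Performance Exceeded (us) value between two snapshots, divided by the length of the interval. The following sketch shows that arithmetic. The function name and the counter values are made up for illustration and are not taken from the screenshots in this post.

```python
def exceeded_fraction(exceeded_us_before, exceeded_us_after, interval_seconds):
    """Share of the interval during which demand exceeded the supported limits."""
    delta_us = exceeded_us_after - exceeded_us_before
    return delta_us / (interval_seconds * 1_000_000)


# Hypothetical counter values from two snapshots taken 40 seconds apart
fraction = exceeded_fraction(2_000_000, 36_500_000, 40)
print(f"{fraction:.1%} of the interval was spent over the limit")
```

A result near 0 means the instance storage kept up with demand. A result close to 1, as in this hypothetical case, means the workload was over the limit for most of the interval.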

Scenario 2: Identifying the block size causing elevated latency in your applications

Many storage performance issues arise from complex interactions between read and write operations with different I/O sizes, which traditional system-level monitoring tools like iostat or sar cannot effectively diagnose because they aggregate metrics across all I/O sizes. EC2 instance store NVMe detailed performance statistics solve this by providing I/O size-specific latency histograms through the --details option in NVMe CLI. These histograms show latency data for different I/O size ranges: (0, 512 Byte], (512B, 4KiB], (4KiB, 8KiB], (8KiB, 32KiB], and (32KiB, MAX]. This allows a more precise correlation between application workload patterns and I/O size-specific latency metrics for targeted optimizations.

In this example scenario, your application performs small reads (typically <=4KiB, like metadata read) followed by large writes (>=32KiB) and shows unexpectedly high read latency. This common issue occurs when large writes impact subsequent read operations’ performance, creating a cascading effect on overall I/O performance.

    1. Gather read and write IO latency by size ranges – Use the NVMe CLI with the --details option to gather read and write IO latency by size ranges:
      $ sudo nvme amzn stats /dev/nvme1n1 --details

    2. Confirm existence of overall IO latency degradation – In the example scenario, examining overall IO latency, both read (left) and write (right) operations are showing higher than expected latency.
      NVMe storage read latency histogram highlighting concentrated IO operations in 4K-16K microsecond range
      NVMe storage write latency histogram highlighting concentrated IO operations in 8-32K microsecond range
    3. Examine the output for patterns across different IO size bands – Analyzing latency by operation sizes shows small read operations (512 bytes to 4K), typically fast, are experiencing unexpected latency spikes while large writes (32K+) show significant delays. Small reads should theoretically maintain good performance regardless of other I/O activities.
      NVMe storage read/write latency histogram highlighting concentrated IO operations in 8-16K microsecond range for IO band of 512 - 4K
      NVMe storage read/write latency histogram highlighting concentrated IO operations in 8-16K microsecond range in IO band 32K and above
      The observed pattern indicates that the backed-up large write operations create system-wide congestion, affecting I/O operations of all types and sizes. Despite the storage system’s capability to handle small reads efficiently, the queued large writes slow down both read and write operations at the application level.

Based on this analysis, we can implement several targeted optimizations to the application, like using smaller block sizes for write operations when possible, or batching smaller writes instead of performing large single writes.
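
The band-by-band comparison in this scenario can also be done numerically. The sketch below assumes you have already copied the cumulative histogram of one IO size band from two snapshots into dictionaries keyed by (lower, upper) latency bounds in microseconds. That data layout, the function names, and the sample counts are assumptions for illustration, not the nvme-cli output format.

```python
def interval_histogram(before, after):
    """Per-bin I/O counts for the interval between two cumulative histograms."""
    return {bin_range: after[bin_range] - before.get(bin_range, 0) for bin_range in after}


def slow_share(before, after, threshold_us):
    """Fraction of I/Os in the interval that landed in bins at or above threshold_us."""
    hist = interval_histogram(before, after)
    total = sum(hist.values())
    if total == 0:
        return 0.0
    slow = sum(count for (lower_us, _upper_us), count in hist.items()
               if lower_us >= threshold_us)
    return slow / total


# Hypothetical read histogram for the (512B, 4KiB] band at two points in time
small_reads_before = {(16, 32): 1000, (8192, 16384): 0}
small_reads_after = {(16, 32): 1100, (8192, 16384): 300}
print(slow_share(small_reads_before, small_reads_after, threshold_us=1000))  # 0.75
```

Computing this share for each band and each direction makes it easier to see which I/O sizes are slow and whether the slowdown coincides with large writes.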

Clean up

If you created an Amazon EC2 instance with NVMe volume for this exercise, then terminate and delete the appropriate instance to avoid future costs.

Conclusion

Amazon EC2 detailed performance statistics for instance store NVMe volumes provide real-time, sub-minute storage performance monitoring, similar to the detailed performance statistics available for Amazon EBS volumes. This offers a consistent monitoring experience across both storage types, with additional IO size-based latency histograms for instance storage that support better optimization of I/O patterns and more effective troubleshooting.

To learn more about Amazon EC2 instance store NVMe volumes, optimization techniques for latency-sensitive workloads or other Amazon EC2 related topics, visit the Amazon EC2 documentation page or explore our other AWS Storage Blog posts on performance optimization.

We’d love to hear how you’re using these statistics to enhance your workloads, or if you have any questions, in the comments section below.

Streamlined multi-tenant application development with tenant isolation mode in AWS Lambda

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/streamlined-multi-tenant-application-development-with-tenant-isolation-mode-in-aws-lambda/

Multi-tenant applications often require strict isolation when processing tenant-specific code or data. Examples include software-as-a-service (SaaS) platforms for workflow automation or code execution where customers need to ensure that execution environments used for individual tenants or end users remain completely separate from one another. Traditionally, developers have addressed these requirements by deploying separate Lambda functions for each tenant or implementing custom isolation logic within shared functions, which increased architectural and operational complexity.

Today, AWS Lambda introduces a new tenant isolation mode that extends the existing isolation capabilities in Lambda. Lambda already provides isolation at the function level, and this new mode extends isolation to the individual tenant or end-user level within a single function. This built-in capability processes function invocations in separate execution environments for each tenant, enabling you to meet strict isolation requirements without additional implementation effort to manage tenant-specific resources within function code.

Here’s how you can enable tenant isolation mode in the AWS Lambda console:

When using the new tenant isolation capability, Lambda associates function execution environments with customer-specified tenant identifiers. This means that execution environments for a particular tenant aren’t used to serve invocation requests from other tenants invoking the same Lambda function.

The feature addresses strict security requirements for SaaS providers processing sensitive data or running untrusted tenant code. You maintain the pay-per-use and performance characteristics of AWS Lambda while gaining execution environment isolation. Additionally, this approach delivers the security benefits of per-tenant infrastructure without the operational overhead of managing dedicated Lambda functions for individual tenants, which can quickly grow as customers adopt your application.

Getting started with AWS Lambda tenant isolation
Let me walk you through how to configure and use tenant isolation for a multi-tenant application.

First, on the Create function page in the AWS Lambda console, I choose the Author from scratch option.

Then, under Additional configurations, I select Enable under Tenant isolation mode. Note that tenant isolation mode can only be set during function creation and can’t be modified for existing Lambda functions.

Next, I write Python code to demonstrate this capability. I can access the tenant identifier in my function code through the context object. Here’s the full Python code:

import json
import os
from datetime import datetime

def lambda_handler(event, context):
    tenant_id = context.tenant_id
    file_path = '/tmp/tenant_data.json'

    # Read existing data or initialize
    if os.path.exists(file_path):
        with open(file_path, 'r') as f:
            data = json.load(f)
    else:
        data = {
            'tenant_id': tenant_id,
            'request_count': 0,
            'first_request': datetime.utcnow().isoformat(),
            'requests': []
        }

    # Increment counter and add request info
    data['request_count'] += 1
    data['requests'].append({
        'request_number': data['request_count'],
        'timestamp': datetime.utcnow().isoformat()
    })

    # Write updated data back to file
    with open(file_path, 'w') as f:
        json.dump(data, f, indent=2)

    # Return file contents to show isolation
    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': f'File contents for {tenant_id} (isolated per tenant)',
            'file_data': data
        })
    }

When I’m finished, I choose Deploy. Now, I need to test this capability by choosing Test. I can see on the Create new test event panel that there’s a new setting called Tenant ID.

If I try to invoke this function without a tenant ID, I’ll get the following error: “Add a valid tenant ID in your request and try again.”

Let me try to test this function with a tenant ID called tenant-A.

I can see the function ran successfully and returned request_count: 1. I’ll invoke this function again to get request_count: 2.

Now, let me try to test this function with a tenant ID called tenant-B.

The last invocation returned request_count: 1 because I never invoked this function with tenant-B. Each tenant’s invocations will use separate execution environments, isolating the cached data, global variables, and any files stored in /tmp.
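
The behavior observed in these three test invocations can be modeled locally. The following toy model is not how Lambda is implemented and uses no AWS API. It only illustrates the rule that state in an execution environment is visible solely to invocations carrying the same tenant ID. The classes ExecutionEnvironment and ToyLambda are hypothetical, and the model keeps exactly one environment per tenant, whereas the real service can create and recycle several.

```python
class ExecutionEnvironment:
    """Stands in for one execution environment and its private /tmp storage."""

    def __init__(self, tenant_id):
        self.tenant_id = tenant_id
        self.tmp = {}


class ToyLambda:
    """Routes each invocation to an environment reserved for its tenant."""

    def __init__(self, handler):
        self.handler = handler
        self.environments = {}

    def invoke(self, tenant_id):
        if not tenant_id:
            raise ValueError("Add a valid tenant ID in your request and try again.")
        if tenant_id not in self.environments:
            # First request for this tenant: start from a fresh environment
            self.environments[tenant_id] = ExecutionEnvironment(tenant_id)
        return self.handler(self.environments[tenant_id])


def count_requests(env):
    """Handler that keeps a counter in the environment's private storage."""
    env.tmp["request_count"] = env.tmp.get("request_count", 0) + 1
    return env.tmp["request_count"]


function = ToyLambda(count_requests)
results = [function.invoke("tenant-A"), function.invoke("tenant-A"), function.invoke("tenant-B")]
print(results)  # [1, 2, 1]
```

The counter for tenant-B starts at 1 because its environment never sees the data written during the tenant-A invocations, which matches the console test above.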

This capability transforms how I approach multi-tenant serverless architecture. Instead of wrestling with complex isolation patterns or managing hundreds of tenant-specific Lambda functions, I let AWS Lambda automatically handle the isolation. This keeps each tenant’s data separate from every other tenant’s, giving me confidence in the security and separation of my multi-tenant application.

Additional things to know
Here’s a list of additional things you need to know:

  • Performance — Same-tenant invocations can still benefit from warm execution environment reuse for optimal performance.
  • Pricing — You’re charged when Lambda creates a new tenant-aware execution environment, with the price depending on the amount of memory you allocate to your function and the CPU architecture you use. For more details, view AWS Lambda pricing.
  • Availability — Available now in all commercial AWS Regions except Asia Pacific (New Zealand), AWS GovCloud (US), and China Regions.

This launch simplifies building multi-tenant applications on AWS Lambda, such as SaaS platforms for workflow automation or code execution. Learn more about how to configure tenant isolation for your next multi-tenant Lambda function in the AWS Lambda Developer Guide.

Happy building!
Donnie

Monitor network performance and traffic across your EKS clusters with Container Network Observability

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/monitor-network-performance-and-traffic-across-your-eks-clusters-with-container-network-observability/

Organizations are increasingly expanding their Kubernetes footprint by deploying microservices to incrementally innovate and deliver business value faster. This growth places increased reliance on the network, giving platform teams exponentially complex challenges in monitoring network performance and traffic patterns in EKS. As a result, organizations struggle to maintain operational efficiency as their container environments scale, often delaying application delivery and increasing operational costs.

Today, I’m excited to announce Container Network Observability in Amazon Elastic Kubernetes Service (Amazon EKS), a comprehensive set of network observability features that you can use to better measure network performance in your system and dynamically visualize the landscape and behavior of network traffic in EKS.

Here’s a quick look at Container Network Observability in Amazon EKS:

Container Network Observability in EKS addresses observability challenges by providing enhanced visibility of workload traffic. It offers performance insights into network flows within the cluster and those with cluster-external destinations. This makes your EKS cluster network environment more observable while providing built-in capabilities for more precise troubleshooting and investigative efforts.

Getting started with Container Network Observability in EKS

I can enable this new feature for a new or existing EKS cluster. For a new EKS cluster, during the Configure observability setup, I navigate to the Configure network observability section. Here, I select Edit container network observability. I can see there are three included features: Service map, Flow table, and Performance metric endpoint, which are enabled by Amazon CloudWatch Network Flow Monitor.

On the next page, I need to install the AWS Network Flow Monitor Agent.

After it’s enabled, I can navigate to my EKS cluster and select Monitor cluster.

This will bring me to my cluster observability dashboard. Then, I select the Network tab.


Comprehensive observability features
Container Network Observability in EKS provides several key features, including performance metrics, service map, and flow table with three views: AWS service view, cluster view, and external view.

With Performance metrics, you can now scrape network-related system metrics for pods and worker nodes directly from the Network Flow Monitor agent and send them to your preferred monitoring destination. Available metrics include ingress/egress flow counts, packet counts, bytes transferred, and various allowance exceeded counters for bandwidth, packets per second, and connection tracking limits. The following screenshot shows an example of how you can use Amazon Managed Grafana to visualize the performance metrics scraped using Prometheus.


With the Service map feature, you can dynamically visualize intercommunication between workloads in your cluster, making it straightforward to understand your application topology with a quick look. The service map helps you quickly identify performance issues by highlighting key metrics such as retransmissions, retransmission timeouts, and data transferred for network flows between communicating pods.

Let me show you how this works with a sample e-commerce application. The service map provides both high-level and detailed views of your microservices architecture. In this e-commerce example, we can see three core microservices working together: the GraphQL service acts as an API gateway, orchestrating requests between the frontend and backend services.

When a customer browses products or places an order, the GraphQL service coordinates communication with both the products service (for catalog data, pricing, and inventory) and the orders service (for order processing and management). This architecture allows each service to scale independently while maintaining clear separation of concerns.

For deeper troubleshooting, you can expand the view to see individual pod instances and their communication patterns. The detailed view reveals the complexity of microservices communication. Here, you can see multiple pod instances for each service and the network of connections between them.

This granular visibility is crucial for identifying issues like uneven load distribution, pod-to-pod communication bottlenecks, or when specific pod instances are experiencing higher latency. For example, if one GraphQL pod is making disproportionately more calls to a particular products pod, you can quickly spot this pattern and investigate potential causes.

Flow table – Monitor the top talkers across Kubernetes workloads in your cluster from three different perspectives, each providing unique insights into your network traffic patterns:

  • The AWS service view shows which workloads generate the most traffic to Amazon Web Services (AWS) services such as Amazon DynamoDB and Amazon Simple Storage Service (Amazon S3), so you can optimize data access patterns and identify potential cost optimization opportunities.
  • The Cluster view reveals the heaviest communicators within your cluster (east-west traffic), so you can spot chatty microservices that might benefit from optimization or colocation strategies.
  • The External view identifies workloads with the highest traffic to destinations outside AWS (internet or on premises), which is useful for security monitoring and bandwidth management.

The flow table provides detailed metrics and filtering capabilities to analyze network traffic patterns. In this example, the flow table displays Cluster view traffic between our e-commerce services. It shows the orders pod communicating with multiple products pods, along with the amount of data transferred in each flow. This pattern suggests the orders service is making frequent product lookups during order processing.

The filtering capabilities are useful for troubleshooting: for example, you can focus on traffic from a specific orders pod. This granular filtering helps you quickly isolate communication patterns when investigating performance issues. For instance, if customers are experiencing slow checkout times, you can filter to see if the orders service is making too many calls to the products service, or if there are network bottlenecks between specific pod instances.
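
The filter-then-rank workflow described here can be sketched in a few lines of Python. This is only an illustration of the idea, assuming flow rows have been copied out of the console or another export. The `Flow` fields and pod names are hypothetical and do not mirror an actual Network Flow Monitor schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    destination: str
    bytes_transferred: int
    retransmissions: int


# Hypothetical flow table rows for the sample e-commerce application.
flows = [
    Flow("orders-7c9f", "products-1", 4_200_000, 3),
    Flow("orders-7c9f", "products-2", 1_100_000, 0),
    Flow("orders-5d2a", "products-1", 800_000, 12),
    Flow("graphql-a", "orders-7c9f", 2_500_000, 1),
]


def top_talkers(rows, source=None, limit=3):
    """Keep rows from `source` (all rows if None) and rank them by bytes transferred."""
    selected = [row for row in rows if source is None or row.source == source]
    return sorted(selected, key=lambda row: row.bytes_transferred, reverse=True)[:limit]


for flow in top_talkers(flows, source="orders-7c9f"):
    print(f"{flow.source} -> {flow.destination}: "
          f"{flow.bytes_transferred} bytes, {flow.retransmissions} retransmissions")
```

Sorting on `retransmissions` instead of `bytes_transferred` would surface the flows most likely to point at a network bottleneck rather than the busiest ones.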

Additional things to know
Here are key points to note about Container Network Observability in EKS:

  • Pricing – For network monitoring, you pay standard Amazon CloudWatch Network Flow Monitor pricing.
  • Availability – Container Network Observability in EKS is available in all commercial AWS Regions where Amazon CloudWatch Network Flow Monitor is available.
  • Export metrics to your preferred monitoring solution – Metrics are available in OpenMetrics format, compatible with Prometheus and Grafana. For configuration details, refer to the Network Flow Monitor documentation.
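
Because the metrics are exposed in the OpenMetrics text format, they can be inspected with very little code. The following Python sketch parses a small sample payload and lists pods whose allowance-exceeded counter is above zero. The metric names, label names, and values are hypothetical placeholders. Check the Network Flow Monitor documentation for the real names. The parser is deliberately simplified and does not handle escaped quotes in label values.

```python
import re

# Sample payload in the OpenMetrics text format. Metric and label names are
# hypothetical placeholders, not the agent's real metric names.
SAMPLE = '''\
# TYPE example_ingress_packets counter
example_ingress_packets_total{pod="orders-7c9f"} 1200
example_ingress_packets_total{pod="products-1"} 3400
# TYPE example_bw_allowance_exceeded counter
example_bw_allowance_exceeded_total{pod="orders-7c9f"} 0
example_bw_allowance_exceeded_total{pod="products-1"} 7
# EOF
'''

LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric name, labels dict, float value) for each sample line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, metadata, and the EOF marker
        match = LINE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            yield match.group("name"), labels, float(match.group("value"))


def pods_over_allowance(text, metric):
    """Return the sorted pod names whose counter for `metric` is above zero."""
    return sorted(
        labels["pod"]
        for name, labels, value in parse_samples(text)
        if name == metric and value > 0 and "pod" in labels
    )


print(pods_over_allowance(SAMPLE, "example_bw_allowance_exceeded_total"))
```

In practice, Prometheus does this parsing for you when it scrapes the agent. A script like this is mainly useful for a quick manual check of what an endpoint returns.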

Get started with Container Network Observability in Amazon EKS today to improve network observability in your cluster.

Happy building!
Donnie

Accelerate large-scale AI applications with the new Amazon EC2 P6-B300 instances

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/accelerate-large-scale-ai-applications-with-the-new-amazon-ec2-p6-b300-instances/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P6-B300 instances, our next-generation GPU platform accelerated by NVIDIA Blackwell Ultra GPUs. These instances deliver 2 times more networking bandwidth and 1.5 times more GPU memory than previous-generation instances, creating a balanced platform for large-scale AI applications.

With these improvements, P6-B300 instances are ideal for training and serving large-scale AI models, particularly those employing sophisticated techniques such as Mixture of Experts (MoE) and multimodal processing. For organizations working with trillion-parameter models and requiring distributed training across thousands of GPUs, these instances provide the perfect balance of compute, memory, and networking capabilities.

Improvements made compared to predecessors
The P6-B300 instances deliver 6.4Tbps Elastic Fabric Adapter (EFA) networking bandwidth, supporting efficient communication across large GPU clusters. These instances feature 2.1TB of GPU memory, allowing large models to reside within a single NVLink domain, which significantly reduces model sharding and communication overhead. When combined with EFA networking and the advanced virtualization and security capabilities of the AWS Nitro System, these instances provide unprecedented speed, scale, and security for AI workloads.

The specs for the EC2 P6-B300 instances are as follows.

  • Instance size: P6-B300.48xlarge
  • vCPUs: 192
  • System memory: 4TB
  • GPUs: 8x B300 GPU
  • GPU memory: 2144GB HBM3e
  • GPU-GPU interconnect: 1800 GB/s
  • EFA network bandwidth: 6.4 Tbps
  • ENA bandwidth: 300 Gbps
  • EBS bandwidth: 100 Gbps
  • Local storage: 8x 3.84TB
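
As a rough back-of-the-envelope reading of the specs above, the following Python sketch derives the memory per GPU and an upper bound on how many model parameters fit in GPU memory. It assumes decimal gigabytes and that memory holds weights only. Real workloads also need room for activations, KV cache, and optimizer state, so treat the result as a ceiling rather than a sizing guide.

```python
# Figures taken from the specs above.
TOTAL_GPU_MEMORY_GB = 2144
GPU_COUNT = 8


def per_gpu_memory_gb(total_gb=TOTAL_GPU_MEMORY_GB, gpus=GPU_COUNT):
    """GPU memory available on each GPU, in GB."""
    return total_gb / gpus


def max_params_billions(total_gb=TOTAL_GPU_MEMORY_GB, bytes_per_param=2):
    """Upper bound on parameter count, in billions, if memory held weights only.

    GB divided by bytes per parameter gives billions of parameters
    (assuming 1 GB = 10**9 bytes).
    """
    return total_gb / bytes_per_param


print(f"Memory per GPU: {per_gpu_memory_gb():.0f} GB")
print(f"Weights-only ceiling at 2 bytes/param: {max_params_billions():.0f}B parameters")
print(f"Weights-only ceiling at 1 byte/param: {max_params_billions(bytes_per_param=1):.0f}B parameters")
```

At 2 bytes per parameter (for example, 16-bit weights), the ceiling is a little over one trillion parameters on a single instance, which is consistent with the trillion-parameter positioning described earlier.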

Good to know
In terms of persistent storage, AI workloads primarily use a combination of high performance persistent storage options such as Amazon FSx for Lustre, Amazon S3 Express One Zone, and Amazon Elastic Block Store (Amazon EBS), depending on price performance considerations. For illustration, the dedicated 300Gbps Elastic Network Adapter (ENA) networking on P6-B300 enables high-throughput hot storage access with S3 Express One Zone, supporting large-scale training workloads. If you’re using FSx for Lustre, you can now use EFA with GPUDirect Storage (GDS) to achieve up to 1.2Tbps of throughput to the Lustre file system on the P6-B300 instances to quickly load your models.
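
To put these throughput figures in perspective, this small Python sketch estimates the ideal transfer time for a model checkpoint at the two quoted link speeds. The checkpoint size is a hypothetical example, and the result ignores protocol overhead, file system behavior, and contention, so actual load times will be longer.

```python
def transfer_seconds(size_gb, link_gbps):
    """Ideal time in seconds to move `size_gb` gigabytes over a `link_gbps` link."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return size_gb * 8 / link_gbps  # 8 bits per byte


# Hypothetical checkpoint large enough to fill all GPU memory on one instance.
CHECKPOINT_GB = 2144

links = [
    ("FSx for Lustre over EFA with GDS (up to 1.2 Tbps)", 1200),
    ("ENA (300 Gbps)", 300),
]

for label, gbps in links:
    print(f"{label}: {transfer_seconds(CHECKPOINT_GB, gbps):.1f} s")
```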

Available now
The P6-B300 instances are now available through Amazon EC2 Capacity Blocks for ML and Savings Plan in the US West (Oregon) AWS Region.
For on-demand reservation of P6-B300 instances, please reach out to your account manager. As usual with Amazon EC2, you pay only for what you use. For more information, refer to Amazon EC2 Pricing. Check out the full collection of accelerated computing instances to help you start migrating your applications.

To learn more, visit our Amazon EC2 P6-B300 instances page. Send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

– Veliswa

The attendee’s guide to the AWS re:Invent 2025 Compute track

Post Syndicated from Mai Kulkarni original https://aws.amazon.com/blogs/compute/the-attendees-guide-to-the-aws-reinvent-2025-compute-track/

From December 1st to December 5th, Amazon Web Services (AWS) will hold its annual premier learning event: re:Invent. At this event, attendees can become stronger and more proficient in any area of AWS technology through a variety of experiences: large keynotes given by AWS leaders, smaller innovation talks, interactive working sessions given by AWS experts, and fun activities such as live music and games at re:Play.

There are more than 2,000 learning sessions that focus on specific topics at various skill levels, and the compute team has created 76 unique sessions for you to choose from. We are here to help you pick the sessions that best fit your needs. Even if you cannot join in person, you can catch up with many of the sessions on demand and even watch the keynote and innovation sessions live.

The basics: Session types

If you can join us, then remember that we offer several types of sessions that can help maximize your learning in a variety of AWS topics.

re:Invent attendees can also choose to attend chalk talks, builder sessions, workshops, or code talk sessions. Each of these is a live, interactive session that is not recorded.

  • Breakout sessions: Attendees join a lecture-style, 60-minute informative session presented by AWS experts, customers, or partners. These sessions are recorded and uploaded to the AWS Events YouTube channel a few days later.
  • Chalk-talk sessions: Attendees interact with presenters, asking questions and using a whiteboard during the session.
  • Builder sessions: Attendees participate in a one-hour session and build something.
  • Workshop sessions: Attendees join a two-hour interactive session where they work in a small team to solve a real problem using AWS services.
  • Code talk sessions: Attendees participate in engaging code-focused sessions where an expert leads a live coding session.
  • Lightning talk sessions: Attendees watch a 20-minute demo dedicated to either a specific service or customer story (located in the Venetian Expo Hall or Mandalay Bay Level 2 South).

Getting started with Amazon EC2

The foundation of compute in AWS is Amazon Elastic Compute Cloud (Amazon EC2). Amazon EC2 offers the broadest and deepest compute platform, with over 1,000 instance types and a choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload. We’ve created the following sessions to help you implement and manage your workloads on EC2.

CMP356 | How well do you know EC2

EC2 offers 1000+ instance types with diverse processors, accelerators, and the AWS Nitro System. Options include cost-effective Spot Instances and Savings Plans. Learn how to optimize workload-instance matching for better performance and savings.

CMP343 | Select and launch the right instance for your workload and budget

Explore the newest EC2 instances featuring Intel Xeon Scalable (Granite Rapids), AMD EPYC (Turin), and AWS Graviton processors. Learn how to choose the optimal instance type for your workload and budget requirements.

CMP305 | Assembling the Complete AI Stack: Optimizing your AI hardware on AWS

Learn how to optimize your AI infrastructure on AWS: Choose the right processors, accelerators, storage, and pricing models for your workloads. Get practical guidance on GPU selection, vector databases, and building cost-effective, scalable AI platforms.

CMP332 | Mastering EC2 Image Builder: From basics to advanced techniques

Hands-on session: Build an automated image pipeline with AWS experts. Learn the basics and advanced features such as multi-account distribution and continuous integration/continuous delivery (CI/CD) integration in 60 minutes.

CMP331 | Managing Amazon EC2 capacity and availability

Learn how to optimize EC2 costs and capacity using different reservation models, such as On-Demand, Capacity Blocks for machine learning (ML), and capacity reservations, to improve efficiency and availability.

CMP330 | Use Auto Scaling to proactively scale and optimize EC2 workloads

Learn how to harness the latest features of EC2 Auto Scaling to optimize your cloud resources. This hands-on workshop covers predictive scaling, dynamic scaling, and warm pools to automatically manage capacity based on demand. This is perfect for those wanting to improve application availability while reducing costs. Bring your laptop for practical exercises.

Learn about AWS Compute innovations

AWS has invested years into designing custom silicon optimized for the cloud to deliver the best price performance for a wide range of applications and workloads using AWS services. Learn more about the AWS Nitro System, processors at AWS, and ML chips.

CMP316 | Deep Dive into the AWS Nitro System

Explore the architecture behind the groundbreaking AWS Nitro System: the custom hardware and security components driving modern EC2 instances. Learn how this innovative platform enables unprecedented compute, storage, and networking capabilities, and discover the latest advances making new cloud possibilities reality.

CMP307 | AWS Graviton: The best price performance for your AWS workloads

Explore how AWS Graviton processors deliver superior performance and energy efficiency in EC2. Learn optimization best practices, common use cases, and customer success stories to accelerate your AWS Graviton adoption journey.

CMP336 | Optimize network and Amazon EBS intensive workloads on Amazon EC2 instances

Discover how to get the most from network-optimized and Amazon Elastic Block Store (Amazon EBS)-optimized EC2 instances for high-performance workloads. Learn to use new AWS Graviton and Intel instances for security appliances, databases, and network-intensive applications. Get practical insights into the latest networking and storage technologies to optimize your EC2 workload performance.

CMP315 | Maximizing EC2 Local NVMe Storage: Enhanced NVMe Metrics and Kubernetes Integration

Learn to optimize data-intensive workloads using AWS Nitro SSDs. Explore new performance metrics (latency, IOPS, throughput) and best practices for monitoring and tuning application performance.

CMP407 | Innovating with AWS confidential computing: An integrated approach

Learn how AWS confidential computing (Nitro System, Enclaves, TPM) protects sensitive data during processing. Explore solutions for secure data handling across CPU, GPU, and AI workloads.

CMP302 | Accelerating engineering: Cross-industry HPC cloud transformations

Discover how AWS high performance computing (HPC) transformed engineering and product development across industries. Learn how customers used cloud HPC to revolutionize their design processes to reduce time-to-market and increase innovation efficiency. Observe how HPC instances, Elastic Fabric Adapter (EFA), Amazon FSx for Lustre, and AWS ParallelCluster accelerate global R&D innovation.

Optimize your compute costs

At AWS, we focus on delivering the best possible cost structure for our customers. Frugality is one of our founding leadership principles. Cost-effective design continues to shape everything we do, from how we develop products to how we run our operations. Come learn new ways to optimize your compute costs through AWS services, tools, and optimization strategies in the following sessions:

CMP347 | The Frugal Architect in a chaotic world

Discover the practical implementation of Werner Vogels’ Frugal Architect principles through a hands-on exploration of AWS Graviton, EC2 Spot, Karpenter, and AI tools. Watch as we optimize a shopping cart using AI and flame graphs, demonstrating how to build efficient systems without compromising quality. Learn to combine Karpenter’s intelligent scaling, the performance benefits of AWS Graviton, and AI-driven analysis to create systems that are faster, leaner, and more cost-effective by design.

CMP349 | 5-Star customer service: Duolingo’s path to compute savings

Learn how Duolingo partnered with their AWS Technical Account Manager to transform their cloud spending. Discover their successful transition to AWS Graviton processors, from initial cost analysis through enterprise-wide implementation. Observe how the AWS customer-focused approach delivered significant savings and business value for Duolingo.

CMP337 | Optimizing EC2: Hands-on strategies for cost-effective performance

Get hands-on with advanced EC2 instance optimization in this technical workshop. Learn to analyze workloads, measure performance metrics, and master benchmarking tools through guided exercises. Walk away with practical strategies to choose and tune EC2 instances for your specific application needs. Perfect for architects and developers looking to maximize their AWS infrastructure performance.

CMP314 | Data-driven EC2 optimization: Efficiency, metrics, and sustainability

Join this chalk talk to discover how metric-driven decisions can transform your EC2 fleet optimization. Through real-world scenarios, learn to analyze workload data, choose optimal instance types, and fine-tune capacity for your specific needs. We explore practical approaches to balance cost, performance, and sustainability using AWS-native tools, providing you with actionable strategies that you can implement immediately.

CMP412 | EC2 Flex instances: Get the latest generation performance at lower costs

Explore how EC2 Flex instances deliver the latest generation performance at reduced costs. Learn about optimal workload types, architectural design, and implementation strategies. Discover practical approaches to adoption and performance monitoring to maximize your EC2 Flex instance benefits.

Maximize your workload’s performance

Your workload’s performance matters beyond just cost because it directly impacts the quality, efficiency, and effectiveness of your compute solution. It can significantly influence customer satisfaction, business growth, and overall productivity. A cheaper option with poor performance can lead to long-term financial losses through lost customers, engineering rework, and a damaged reputation. We have several sessions that help you optimize your workload’s performance.

CMP333 | Maximizing EC2 performance: A hands-on guide to instance optimization

Live coding session: Learn to optimize EC2 performance using Amazon CloudWatch and APerf. Observe real-world examples of workload analysis and code optimization across different instance types and programming languages.

CMP351 | Building for efficiency and reliability with performance testing on AWS

Learn performance testing strategies on AWS to optimize costs, identify bottlenecks, and improve reliability. Discover how to measure system behavior under various loads to inform architecture and instance selection decisions.

CMP405 | Everything you’ve wanted to know about performance on EC2 instances

Explore compute optimization techniques in this code talk. Learn about memory topology, hardware counters, hyperthreading effects, and methods for accurate performance testing and latency optimization.

Customer experience and applications with AI and ML

ML has been evolving for decades and has reached an inflection point, with generative AI applications capturing widespread attention and imagination. Learn about generative AI infrastructure at Amazon or get hands-on experience building ML applications through our ML-focused sessions, such as the following:

CMP201 | Architecting solution patterns for GPU-accelerated HPC and AI/ML

Interactive discussion on GPU-accelerated HPC and AI/ML architecture. Explore EC2 GPU instance families, architectural tradeoffs, and cost optimization strategies. Share your challenges and learn how to build scalable GPU solutions on AWS.

CMP403 | Build, scale, and optimize agentic AI on CPUs with AWS Graviton

Hands-on workshop: Build cost-efficient AI applications on AWS Graviton. Deploy large language model (LLM) inference, multi-agent systems, and vector databases using Amazon Elastic Kubernetes Service (Amazon EKS) and Karpenter. Create a chat app showcasing the performance benefits of AWS Graviton.

CMP346 | Supercharge ML and inference on Apple Silicon with EC2 Mac

Learn to optimize ML workloads on EC2 Mac instances with Apple silicon. Explore Apple Neural Engine, Core ML, and efficient PyTorch/TensorFlow deployment for iOS and cloud ML applications.

CMP338 | Protect privacy in generative AI applications using AWS Confidential Computing

Build three secure generative AI applications while learning to protect sensitive data in prompts, augmented sources, and model weights. Practice implementing AWS Confidential Computing features in EC2 to mitigate common security threats. Get hands-on experience using both open source models and Amazon Bedrock to create privacy-first AI solutions.

CMP410 | Secure generative AI using trusted execution environments

Hands-on session: Build a secure AI environment using Nitro TPM-enabled EC2 instances. Deploy an LLM with cryptographic attestation and learn to protect sensitive data using trusted execution environments.

Accelerate your AWS Graviton adoption journey

AWS Graviton processors are server processors custom designed by AWS. They deliver the best price performance for your cloud workloads running in AWS and help you reduce your carbon footprint. Ready to realize up to 40% better price performance for your workloads? We have curated the following sessions to help you accelerate your AWS Graviton adoption:

CMP329 | Learnings from developers adopting AWS Graviton at scale

Learn how the custom-designed AWS Graviton processors deliver optimal price-performance across diverse workloads: from microservices to HPC. Engage with AWS experts to explore adoption strategies, best practices, and real customer success stories for scaling AWS Graviton in production.

CMP352 | Unlock cost efficiency with AWS Graviton Savings Dashboard

Discover how the enhanced AWS Graviton Savings Dashboard provides deeper analytics for workload modernization, enabling up to 40% better price performance. Learn to use advanced features for granular workload analysis and streamlined migration planning. This lightning talk shows you how to transform efficiency insights into actionable strategies for measurable cloud cost savings.

CMP326 | Java modernization and performance optimization GameDay

Hands-on workshop: Use Amazon Q Developer to modernize Java applications from v8 to v21. Practice automated code analysis, performance benchmarking, and cost optimization across different instances. Laptop needed.

CMP335 | Optimize .NET TCO with agentic AI powered AWS Transform and AWS Graviton

Hands-on workshop: Use agentic AI to accelerate the migration of Windows-based .NET applications to .NET Core running on Linux with AWS Graviton for 40% better price performance. Learn code analysis, automated transformations, and CI/CD updates. For .NET developers/architects. Laptop needed.

Optimizing your container-based workloads

Maximizing the efficiency of container-based workloads is crucial for modern cloud applications. Whether you’re running microservices, web applications, or high-performance computing tasks, optimizing your container infrastructure can significantly impact both performance and cost. In this track, we’ve assembled essential sessions focused on using AWS Graviton processors and modernization tools to enhance your containerized applications. From real-world adoption stories to hands-on workshops, these sessions can help you achieve better price performance while maintaining operational excellence. Join us to explore the following:

CMP310 | Boost Amazon EKS efficiency: Amazon EKS Auto Mode, AWS Graviton, and EC2 Spot

Explore how Amazon EKS Auto Mode streamlines Kubernetes operations by removing infrastructure management complexity. Learn to optimize costs using AWS Graviton and EC2 Spot, with practical examples for building more efficient, cost-effective container environments.

CMP311 | Build once, run everywhere: Multi-architecture in your CI/CD pipelines

Learn to build multi-architecture containers for x86 and AWS Graviton processors. Observe how to optimize web applications for both platforms and integrate with CI/CD systems such as ArgoCD, GitLab, and GitHub.

CMP348 | Using Amazon Q to cost optimize your containerized workloads

Learn to achieve 40% better price-performance by migrating containerized workloads to AWS Graviton using Amazon EKS and Karpenter. Use Amazon Q to accelerate x86-to-Graviton migration, implement multi-architecture CI/CD pipelines, and optimize deployment strategies.

Quantum computing

Quantum computing is moving from theoretical possibility to practical reality, offering groundbreaking potential across industries. As organizations prepare for this technology, AWS provides the tools and infrastructure needed to explore quantum applications today. Through Amazon Braket, our managed quantum computing service, we’re making quantum experimentation accessible to enterprises, researchers, and developers alike. Whether you’re interested in drug discovery, optimization problems, or cybersecurity, this track offers a comprehensive journey from quantum basics to advanced hybrid solutions. Join industry leaders, such as AstraZeneca and Accenture, to discover how quantum computing is already delivering value and how you can begin your quantum journey:

CMP202 | Amazon Braket: Get hands-on with quantum computing

Get started with quantum computing in this practical workshop. Learn to implement quantum algorithms and run circuits on gate-based devices using Amazon Braket. Explore the quantum algorithm library of AWS through hands-on exercises. Bring your laptop to begin your quantum journey.

CMP209 | Amazon Braket hubs: Accelerating R&D in national quantum initiatives

Learn how AWS supports quantum computing research hubs worldwide, helping create secure environments and providing access to cutting-edge quantum technologies for researchers and startups.

CMP411 | Quantum computing with Amazon Braket: From exploration to enterprise

Explore quantum computing with Amazon Braket, featuring the AWS strategy and AstraZeneca’s drug discovery research. Learn how to combine quantum and classical workloads and prepare for future quantum technologies.

CMP205 | Q-CTRL Fire Opal on Amazon Braket: Quantum solutions from security to finance

Learn how organizations use Q-CTRL and Amazon Braket for quantum computing breakthroughs. Observe how Accenture Federal Services achieved 3x better network security detection using Q-CTRL’s optimizer, and explore quantum-classical solutions for various industries.

CMP304 | Architectures for hybrid quantum-classical workflows at scale

Learn to build hybrid quantum-classical computing solutions using Amazon Braket with AWS services (AWS Batch, AWS ParallelCluster) and GPU-accelerated instances. Explore architectures integrating CPUs, GPUs, and quantum processors using NVIDIA CUDA-Q.

Check out workload-specific sessions

EC2 offers the broadest and deepest compute platform to help you best match the needs of your workload. Join sessions focused on your specific workload to learn about how you can use AWS solutions to accelerate your innovations.

CMP207 | Startup to scale: Powering business growth with Amazon Lightsail

Get started in the cloud with just a few clicks with Amazon Lightsail. Discover how it can support your business at any stage of growth. Whether you’re launching your first cloud workload, migrating existing applications, or managing services for your customers, learn proven approaches for success. We explore how customers are using Lightsail today, including cost optimization and best practices for efficient scaling.

CMP320 | Full stack web apps on EC2: Using AWS Elastic Beanstalk with Amazon Q

Accelerate your cloud journey with AWS Elastic Beanstalk and Amazon Q. Learn how Elastic Beanstalk streamlines deployment and maintenance of full stack web applications on EC2 with automated infrastructure provisioning, while Amazon Q enhances your Elastic Beanstalk experience with natural language commands, intelligent troubleshooting guidance, and deployment best practices recommendations. This is perfect for teams ready to focus on building exceptional applications instead of managing infrastructure.

CMP334 | Modernize Apple platform development with AWS and EC2 Mac

Explore how EC2 Mac instances enable scalable, cost-effective macOS workloads on AWS. Learn about the latest features and hear a customer success story showcasing optimized Apple development workflows in the cloud.

CMP341 | SAP workloads on memory optimized Amazon EC2 instances

Discover how the memory-optimized instances (R, X, U) of EC2 revolutionize SAP HANA deployments, eliminating traditional infrastructure compromises. Learn from SAP’s experience managing RISE with SAP on AWS, and explore how high-memory instances can transform your SAP operations.

CMP319 | Exploring the spectrum of architecture patterns for 3D rendering

Explore the complete rendering toolkit of AWS for 3D and spatial applications: from GPU-powered EC2 instances to distributed rendering with Deadline Cloud and real-time GameLift Streams. Learn practical architecture patterns and cost optimization strategies to scale your rendering pipeline for games, architectural visualization, and AR/VR experiences.

CMP321 | Generative AI storyboarding: From Sketch to 3D Scene with generative AI on AWS

Learn to create visual content using Amazon Bedrock: convert sketches to storyboards, generate 2D/3D assets, and compose scenes. Explore AI-assisted workflows for film, games, and UI design while maintaining artistic control.

CMP211 | Hybrid science: AI + physics simulations for climate and life sciences

Explore how to combine AI with physics simulations using AWS services (such as AWS Batch, AWS ParallelCluster, Amazon FSx, EFA). Learn real-world patterns for integrating AI and simulation workflows in climate, weather, and healthcare applications.

CMP345 | Accelerate drug discovery R&D at scale with AWS

Interactive session on how top pharma companies use AWS for drug discovery R&D. Explore solutions for imaging, molecular simulation, and AI-driven research, with focus on managing large-scale data and diverse compute needs.

CMP350 | Accelerating vehicle innovation: ML and HPC best practices

Learn how Toyota and Deloitte transformed automotive engineering by migrating HPC and ML workloads to AWS. Using NVIDIA GPUs and EC2 HPC instances, they dramatically reduced development cycles. You can gain practical insights for your own high-performance computing initiatives.

CMP401 | Accelerating semiconductor design, simulation, and verification on AWS

This session covers the latest compute and storage innovations such as the new generation of EC2 instances powered by custom Intel Xeon Scalable processors (Granite Rapids), AMD EPYC processors (Turin), and AWS Graviton, and new features of Amazon FSx for NetApp ONTAP.

CMP406 | HPC infrastructure for financial services using AWS Batch and AWS CDK

Hands-on session: Build HPC infrastructure using AWS Cloud Development Kit (AWS CDK). Deploy AWS Batch for financial risk analysis workloads. This is suitable for HPC experts new to AWS and AWS developers new to HPC.

CMP204 | Quantum computing: Accelerating pharma innovation

Explore how Merck Sharp & Dohme partners with MathWorks and AWS to revolutionize pharmaceutical development through quantum computing. Using MATLAB and Amazon Braket, they implement QAOA for optimizing drug production and enhancing cancer diagnostics.

Ready to unlock new possibilities?

The AWS Compute team looks forward to seeing you in Las Vegas. Come meet us at the Compute Booth in the Expo and check out our various EC2 demos. And if you’re looking for more session recommendations, check out more re:Invent attendee guides curated by experts.

Secure EKS clusters with the new support for Amazon EKS in AWS Backup

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/secure-eks-clusters-with-the-new-support-for-amazon-eks-in-aws-backup/

Today, we’re announcing support for Amazon EKS in AWS Backup to provide the capability to secure Kubernetes applications using the same centralized platform you trust for your other Amazon Web Services (AWS) services. This integration eliminates the complexity of protecting containerized applications while providing enterprise-grade backup capabilities for both cluster configurations and application data. AWS Backup is a fully managed service to centralize and automate data protection across AWS and on-premises workloads. Amazon Elastic Kubernetes Service (Amazon EKS) is a fully managed Kubernetes service to manage availability and scalability of the Kubernetes clusters. With this new capability, you can centrally manage and automate data protection across your Amazon EKS environments alongside other AWS services.

Until now, customers relied on custom solutions or third-party tools to back up their EKS clusters, which required complex scripting and maintenance for each cluster. Support for Amazon EKS in AWS Backup eliminates this overhead by providing a single, centralized, policy-driven solution that protects both EKS clusters (Kubernetes deployments and resources) and stateful data (stored in Amazon Elastic Block Store (Amazon EBS), Amazon Elastic File System (Amazon EFS), and Amazon Simple Storage Service (Amazon S3) only) without the need to manage custom scripts across clusters. For restores, customers previously had to restore their EKS backups to a target EKS cluster, either the source cluster or a new one, which meant provisioning the EKS cluster infrastructure before the restore. With this new capability, customers can also choose to create a new EKS cluster based on the previous cluster’s configuration settings during a restore and restore to it, with AWS Backup managing the provisioning of the EKS cluster on the customer’s behalf.

This support includes policy-based automation for protecting single or multiple EKS clusters. This single data protection policy provides a consistent experience across all services AWS Backup supports. It allows creation of immutable backups to prevent malicious or inadvertent changes, helping customers meet their regulatory compliance needs. In case there is a customer data loss or cluster downtime event, customers can easily recover their EKS cluster data from encrypted, immutable backups using an easy-to-use interface and maintain business continuity of running their EKS clusters at scale.

How it works
Here’s how I set up support for on-demand backup of my EKS cluster in AWS Backup. First, I’ll show a walkthrough of the backup process, then demonstrate a restore of the EKS cluster.

Backup
In the AWS Backup console, in the left navigation pane, I choose Settings and then Configure resources to opt in to enable protection of EKS clusters in AWS Backup.

Now that I’ve enabled Amazon EKS, in Protected resources I choose Create on-demand backup to create a backup for my already existing EKS cluster floral-electro-unicorn.

Enabling EKS in Settings ensures that it shows up as a Resource type when I create on-demand backup for the EKS cluster. I proceed to select the EKS resource type and the cluster.

I leave the rest of the information as default, then select Choose an IAM role to select a role (test-eks-backup) that I’ve created and customized with the necessary permissions for AWS Backup to assume when creating and managing backups on my behalf. I choose Create on-demand backup to finalize the process.
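
The same on-demand backup can also be requested through the API. The following is a minimal sketch, assuming the boto3 `start_backup_job` operation of the AWS Backup client. The Region, account ID, and vault name are placeholders that do not come from this post; the cluster and role names are the ones used in the walkthrough.

```python
def eks_cluster_arn(region: str, account_id: str, cluster_name: str) -> str:
    """Build the ARN that identifies an EKS cluster."""
    return f"arn:aws:eks:{region}:{account_id}:cluster/{cluster_name}"


def build_backup_job_request(region, account_id, cluster_name, role_name, vault_name):
    """Assemble keyword arguments for an on-demand backup request."""
    return {
        "BackupVaultName": vault_name,
        "ResourceArn": eks_cluster_arn(region, account_id, cluster_name),
        "IamRoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
    }


def start_on_demand_backup(request: dict) -> str:
    """Submit the request; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    response = boto3.client("backup").start_backup_job(**request)
    return response["BackupJobId"]


# Placeholder Region, account ID, and vault name (assumptions for illustration).
request = build_backup_job_request(
    region="us-east-1",
    account_id="111122223333",
    cluster_name="floral-electro-unicorn",
    role_name="test-eks-backup",
    vault_name="Default",
)
print(request)
```

Keeping the request construction separate from the API call makes it possible to review or log the parameters before any job is started.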


The job is initiated, and it will start running to back up both the EKS cluster state and the persistent volumes. If Amazon S3 buckets are attached to the backup, you’ll need to add the additional Amazon S3 backup permissions AWSBackupServiceRolePolicyForS3Backup to your role. This policy contains the permissions necessary for AWS Backup to back up any Amazon S3 bucket, including access to all objects in a bucket and any associated AWS KMS key.


The job completes successfully, and the EKS cluster floral-electro-unicorn is now backed up by AWS Backup.


Restore
Using the AWS Backup Console, I choose the EKS backup composite recovery point to start the process of restoring the EKS cluster backups, then choose Restore.


I choose Restore full EKS cluster to restore the full EKS backup. To restore to an existing cluster, I Choose an existing cluster then select the cluster from the drop-down list. I choose the Default order as the order in which individual Kubernetes resources will be restored.

I then configure the restore for the persistent storage resources that will be restored alongside my EKS cluster.


Next, I Choose an IAM role to execute the restore action. The Protected resource tags checkbox is selected by default and I’ll leave it as is, then choose Next.

I review all the information before I finalize the process by choosing Restore, to start the job.


Selecting the drop-down arrow gives details of the restore status for both the EKS cluster state and the attached persistent volumes. In this walkthrough, all the individual recovery points are restored successfully. If portions of a backup fail, it’s still possible to individually restore the persistent stores (for example, Amazon EBS volumes) and cluster configuration settings that were backed up successfully. However, in that case it’s not possible to restore the full EKS backup. The successfully backed up resources are available for restore, listed as nested recovery points under the EKS cluster recovery point. If there’s a partial failure, a notification identifies the portions that failed.
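
The partial-failure rule described above can be summarized in a few lines. This is only an illustrative model of the behavior, not AWS Backup code; the `NestedRecoveryPoint` type and the resource labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NestedRecoveryPoint:
    resource: str    # hypothetical label, e.g. "cluster-state" or "ebs-volume-1"
    succeeded: bool  # whether this part of the composite backup completed


def restore_options(children: List[NestedRecoveryPoint]) -> Dict[str, object]:
    """Report which restore paths are open for a composite EKS recovery point."""
    restorable = [c.resource for c in children if c.succeeded]
    failed = [c.resource for c in children if not c.succeeded]
    return {
        # A full cluster restore needs every nested recovery point to have succeeded.
        "full_cluster_restore": bool(children) and not failed,
        # Successful parts can still be restored one by one.
        "individual_restores": restorable,
        "failed": failed,
    }


partial = restore_options([
    NestedRecoveryPoint("cluster-state", True),
    NestedRecoveryPoint("ebs-volume-1", True),
    NestedRecoveryPoint("efs-filesystem-1", False),
])
print(partial)
```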


Benefits
Here are some of the benefits provided by the support for Amazon EKS in AWS Backup:

  • A fully managed multi-cluster backup experience, removing the overhead associated with managing custom scripts and third-party solutions.
  • Centralized, policy-based backup management that simplifies backup lifecycle management and makes it seamless to back up and recover your application data across AWS services, including EKS.
  • The ability to store and organize your backups with backup vaults. You assign policies to the backup vaults to grant access to users to create backup plans and on-demand backups but limit their ability to delete recovery points after they’re created.
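
As an illustration of the last point, a vault access policy that blocks deletion of recovery points might look like the following sketch. It assumes the boto3 `put_backup_vault_access_policy` operation; the vault name is hypothetical, and the blanket `Principal` of `*` is a simplification that a real policy would usually narrow.

```python
import json


def deny_delete_policy() -> str:
    """Return a vault access policy document that denies recovery point deletion."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Principal": "*",  # simplification: applies to every principal
                "Action": ["backup:DeleteRecoveryPoint"],
                "Resource": "*",
            }
        ],
    }
    return json.dumps(policy)


def apply_policy(vault_name: str, policy_json: str) -> None:
    """Attach the policy to a vault; needs boto3 and credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without it

    boto3.client("backup").put_backup_vault_access_policy(
        BackupVaultName=vault_name, Policy=policy_json
    )


policy_json = deny_delete_policy()
print(policy_json)
```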

Good to know
The following are some helpful facts to know:

  • Use either the AWS Backup Console, API, or AWS Command Line Interface (AWS CLI) to protect EKS clusters using AWS Backup. Alternatively, you can create an on-demand backup of the cluster after it has been created.
  • You can create secondary copies of your EKS backups across different accounts and AWS Regions to minimize risk of accidental deletion.
  • Restoration of EKS backups is available using the AWS Backup Console, API, or AWS CLI.
  • Restoring to an existing cluster will not overwrite the Kubernetes version or any existing data, because restores are non-destructive. Instead, only the delta between the backup and the source resource is restored.
  • Namespaces can only be restored to an existing cluster to ensure a successful restore as Kubernetes resources may be scoped at the cluster level.
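
One way to picture a non-destructive restore is as a set difference: objects that already exist in the target cluster are left alone, and only the missing ones are created. The sketch below models that reading; it is an interpretation of the behavior described above, not AWS Backup code, and the object identifiers are hypothetical.

```python
from typing import List, Set, Tuple

# A Kubernetes object identified by (kind, namespace, name).
ObjectKey = Tuple[str, str, str]


def plan_non_destructive_restore(
    backup_objects: Set[ObjectKey], cluster_objects: Set[ObjectKey]
) -> Tuple[List[ObjectKey], List[ObjectKey]]:
    """Split backed-up objects into those to create and those to leave untouched."""
    to_create = sorted(backup_objects - cluster_objects)
    left_untouched = sorted(backup_objects & cluster_objects)
    return to_create, left_untouched


backup = {
    ("Deployment", "shop", "frontend"),
    ("Deployment", "shop", "cart"),
    ("ConfigMap", "shop", "settings"),
}
cluster = {("Deployment", "shop", "frontend")}

to_create, left_untouched = plan_non_destructive_restore(backup, cluster)
print("create:", to_create)
print("keep as is:", left_untouched)
```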

Voice of the customer

Srikanth Rajan, Sr. Director of Engineering at Salesforce said “Losing a Kubernetes control plane because of software bugs or unintended cluster deletion can be catastrophic without a solid backup and restore plan. That’s why it’s exciting to see AWS rolling out the new EKS Backup and Restore feature, it’s a big step forward in closing a critical resiliency gap for Kubernetes platforms.”

Now available
Support for Amazon EKS in AWS Backup is available today in all AWS commercial Regions (except China) and in the AWS GovCloud (US) Regions where AWS Backup and Amazon EKS are available. Check the full Region list for future updates.

To learn more, check out the AWS Backup product page and the AWS Backup pricing page.

Try out this capability for protecting your EKS clusters in AWS Backup and let us know what you think by sending feedback to AWS re:Post for AWS Backup or through your usual AWS Support contacts.

Veliswa.

BASF Digital Farming builds a STAC-based solution on Amazon EKS

Post Syndicated from Kevin S. Ridolfi original https://aws.amazon.com/blogs/architecture/basf-digital-farming-builds-a-stac-based-solution-on-amazon-eks/

This post was co-written with Frederic Haase and Julian Blau with BASF Digital Farming GmbH.

At xarvio – BASF Digital Farming, our mission is to empower farmers around the world with cutting-edge digital agronomic decision-making tools. Central to this mission is our crop optimization platform, xarvio FIELD MANAGER, which delivers actionable insights through a range of geospatial assets, including satellite imagery, drone data, and application maps from sprayers.

In this post, we show you how we built a scalable geospatial data solution on AWS to efficiently catalog, manage, and visualize both raster and vector datasets through the web. We walk you through our solution based on the SpatioTemporal Asset Catalog (STAC) specification and the open source eoAPI ecosystem, detailing the solution architecture, key technologies, and lessons learned during deployment. This builds upon a previous post on efficient satellite imagery ingestion using AWS Serverless, extending our discussion to the full lifecycle of geospatial data management at scale.

Requirements for our geospatial data solution

BASF Digital Farming’s xarvio FIELD MANAGER platform operates at exceptional scale in the geospatial data ecosystem, processing hundreds of millions of satellite images that translate into STAC items, which further decompose into billions of individual geospatial artifacts. Unlike traditional satellite data providers such as the European Space Agency (ESA), which work with predictable, structured data flows, we operate in an inherently dynamic agricultural environment where we ingest near-daily satellite imagery per field from a diverse array of sensors and providers globally. Our mission to support farmers worldwide with advanced digital agronomic decision advice demands a reliable, cloud-based infrastructure capable of handling this massive data velocity and volume and applying advanced quality assurance processes, including cloud detection and anomaly detection algorithms. The platform’s true value emerges through our machine learning (ML) pipelines that transform raw satellite data into actionable insights. For example, accurately estimating absolute biomass indicators such as Leaf Area Index (LAI) helps farmers make precise, data-driven agronomic decisions that optimize crop yield and resource utilization across fields worldwide.

STAC and eoAPI ecosystem

To efficiently manage our growing archive of geospatial data, we adopted the SpatioTemporal Asset Catalog (STAC) specification, an open standard that provides a common language to describe and catalog raster and vector datasets. With STAC, we can standardize metadata across diverse sources like satellite imagery, UAV datasets, and prescription maps, making it straightforward to search, filter, and retrieve assets across our platform. We built our platform using the eoAPI ecosystem, an integrated suite of open source tools designed to handle the full lifecycle of geospatial data on the cloud. At its core is pgSTAC, which provides a performant PostGIS-backed STAC API implementation. With pgSTAC, we can index millions of STAC items efficiently, with support for spatial, temporal, and attribute-based filtering at scale. On top of that, we use Tiles in PostGIS (TiPG) to serve tiled vector data directly from our PostGIS database. This enables real-time visualization of field boundaries, management zones, and application histories as lightweight Mapbox Vector Tiles (MVT), without requiring an external tile server. For raster assets, including satellite and drone imagery, we rely on TiTiler, a modern dynamic tile server built for Cloud Optimized GeoTIFFs (COGs). With TiTiler, we can stream imagery on demand as WMTS or XYZ tiles, perform dynamic rendering (such as NDVI or false color composites), and integrate seamlessly into web maps and mobile apps.
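
To make the spatial and temporal filtering concrete, the following sketch builds the JSON body of a STAC API item search using the core `collections`, `bbox`, `datetime`, and `limit` fields. The collection name and coordinates are hypothetical and do not describe the xarvio catalog.

```python
from datetime import datetime, timezone
from typing import Dict, Sequence


def _rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_stac_search(
    collections: Sequence[str],
    bbox: Sequence[float],
    start: datetime,
    end: datetime,
    limit: int = 100,
) -> Dict[str, object]:
    """Build a request body for a STAC API item search (POST /search)."""
    west, south, east, north = bbox
    if south > north:
        raise ValueError("bbox must be ordered west, south, east, north")
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        "datetime": f"{_rfc3339(start)}/{_rfc3339(end)}",  # closed interval
        "limit": limit,
    }


# Hypothetical collection and area of interest.
body = build_stac_search(
    collections=["field-imagery"],
    bbox=[8.40, 49.40, 8.50, 49.50],
    start=datetime(2025, 5, 1, tzinfo=timezone.utc),
    end=datetime(2025, 5, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(body)
```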

Solution overview

The following architecture diagram shows how we implemented our geospatial data platform on AWS. In this section, we explain each component of the architecture and how they work together to process millions of satellite images and geospatial assets daily. The solution uses Amazon Elastic Kubernetes Service (Amazon EKS) as the core computing platform, with Amazon Simple Storage Service (Amazon S3) for storage and Amazon Relational Database Service (Amazon RDS) for metadata management. We break down the architecture into four main layers: core services, storage, database, and ingestion.

A detailed AWS Cloud architecture visualization showcasing a complete geospatial data processing system across four distinct layers. The database layer features an EKS Cluster managing STAC, raster, and vector services, all connected to Amazon RDS through a proxy instance. The client layer supports both desktop and mobile access via Amazon API Gateway. The ingestion layer processes geospatial data streams through a STAC ingestor, feeding into a robust storage layer utilizing Cloud Optimized GeoTIFF and FlatGeobuf technologies. The architecture emphasizes scalability and efficient spatial data handling through PostgreSQL with pgstac extension, enabling seamless integration of various geospatial services and data formats.

Core services layer

The solution uses an EKS cluster hosting three key services:

  • stac-service – Implements the STAC API specification to catalog and serve metadata for both raster and vector datasets
  • raster-service – Powered by TiTiler, this service dynamically renders and tiles cloud-optimized raster data (for example, COGs) for seamless integration into web and mobile maps
  • vector-service – Built with TiPG, this component serves vector data (for example, boundaries or application zones) as tiled MVT layers directly from the database or from Amazon S3

These services are containerized and orchestrated within Kubernetes, allowing for high availability, modular separation, and simplified continuous integration and delivery (CI/CD) workflows.
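
Both tile services address map tiles by zoom level and x/y index. The sketch below shows the standard Web Mercator (XYZ) conversion from a longitude and latitude to a tile index, which is the arithmetic a map client performs before requesting a tile. The printed path layout is hypothetical and is not the actual route of TiTiler or TiPG.

```python
import math
from typing import Tuple

MAX_MERCATOR_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the XYZ tile (x, y) containing a point at the given zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    lat = max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or the latitude limits stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


zoom = 1
x, y = lonlat_to_tile(0.0, 0.0, zoom)
print(f"/tiles/{zoom}/{x}/{y}")  # hypothetical path layout
```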

KEDA-based automatic scaling

We use Kubernetes Event-Driven Autoscaling (KEDA) to scale our platform services dynamically based on real-time workloads. With KEDA, we can scale individual pods based on precise event-driven metrics such as the STAC ingestion queue depth or visualization request load. This supports responsive performance during peak activity while maintaining lean resource usage during idle periods, aligning perfectly with our need for elasticity in a data-intensive, variable-load environment.
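
The scaling decision for a queue-driven workload can be approximated as follows. This is a simplified model of the proportional rule used by the Kubernetes Horizontal Pod Autoscaler, which KEDA feeds with external metrics; the target of 100 queued items per pod and the replica bounds are hypothetical values, not our production settings.

```python
import math


def desired_replicas(
    queue_depth: int, target_per_replica: int, min_replicas: int, max_replicas: int
) -> int:
    """Estimate the replica count for a given queue depth, bounded by min and max."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


for depth in (0, 950, 5000):
    print(depth, "->", desired_replicas(depth, target_per_replica=100,
                                        min_replicas=1, max_replicas=20))
```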

Geospatial asset storage layer

The platform stores all raw and processed geospatial assets in S3 buckets, optimized for performance and durability. This layer holds COGs for raster imagery and FlatGeobuf or similar formats for vector data. These formats are chosen for their support of streaming access, indexing, and cloud-based performance.

Database layer

The metadata backbone of the system is a PostgreSQL database hosted on Amazon RDS, extended with the pgSTAC plugin. This setup enables efficient indexing and querying of millions of STAC items and collections. An RDS proxy sits in front of the database, providing connection pooling and resiliency, especially under bursty or concurrent access patterns common in geospatial applications.
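
A rough calculation shows why pooling matters when pods scale out. Without a proxy, each pod typically holds its own connection pool, so the number of database connections grows with the pod count. All numbers below are hypothetical.

```python
def direct_connections(pods: int, pool_size_per_pod: int) -> int:
    """Connections opened against the database when every pod keeps its own pool."""
    return pods * pool_size_per_pod


def exceeds_limit(pods: int, pool_size_per_pod: int, max_connections: int) -> bool:
    """True if the pods together would exhaust the database connection limit."""
    return direct_connections(pods, pool_size_per_pod) > max_connections


# Hypothetical: 10 connections per pod against a limit of 200 connections.
for pods in (10, 20, 40):
    print(pods, "pods ->", direct_connections(pods, 10), "connections,",
          "over limit" if exceeds_limit(pods, 10, 200) else "within limit")
```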

Ingestion layer

An independent ingestion component handles batch or streaming geospatial data inputs. This component processes satellite imagery, drone data, or prescription maps and pushes relevant metadata into the STAC API and storage assets into Amazon S3. The ingestion engine is decoupled from serving infrastructure, enabling asynchronous and large-scale data loading.
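
The metadata pushed for each asset is a STAC item, which is a GeoJSON Feature with a few additional fields. The following sketch builds a minimal item for a single COG; the identifier, bucket, and asset key are hypothetical, and a production item would normally also carry a collection reference and links.

```python
from typing import Dict, Sequence

COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def make_stac_item(
    item_id: str, bbox: Sequence[float], acquired: str, cog_href: str
) -> Dict[str, object]:
    """Build a minimal STAC item describing one Cloud Optimized GeoTIFF."""
    west, south, east, north = bbox
    footprint = {
        "type": "Polygon",
        # Closed ring, counter-clockwise, derived from the bounding box.
        "coordinates": [[
            [west, south], [east, south], [east, north], [west, north], [west, south],
        ]],
    }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": footprint,
        "bbox": [west, south, east, north],
        "properties": {"datetime": acquired},
        "links": [],
        "assets": {
            "visual": {"href": cog_href, "type": COG_MEDIA_TYPE, "roles": ["data"]}
        },
    }


item = make_stac_item(
    item_id="field-123-2025-05-01",                       # hypothetical identifier
    bbox=[8.40, 49.40, 8.50, 49.50],
    acquired="2025-05-01T10:15:00Z",
    cog_href="s3://example-bucket/field-123/2025-05-01.tif",  # hypothetical location
)
print(item["id"], item["bbox"])
```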

Amazon API Gateway and clients

Public access to the platform is handled through Amazon API Gateway, allowing clients—whether browser-based or mobile—to interact securely with the services. The API gateway provides a unified entrypoint and is used for applying rate limiting, authorization, and routing policies.

Solution benefits

The solution offers the following benefits:

  • Rapid onboarding with STAC standardization – By aligning with the STAC specification, we’ve significantly reduced the time to onboard new data domains like sprayer application maps. Compared to previous approaches in our legacy system, metadata modeling and integration are now both standardized and automated, so we can expose new geospatial data products to clients in days instead of weeks or months.
  • Optimized storage with COGs and Amazon S3 – Storing raster and vector assets in Amazon S3 using cloud-optimized formats (such as COGs for imagery or FlatGeobuf for vectors) reduces storage costs while enabling low-latency, streaming access. This avoids the need for preprocessing or extract, transform, and load (ETL)-heavy pipelines and simplifies client delivery.
  • Large-scale ingestion with a batch STAC ingestor – Our custom STAC ingestor supports both real-time and batch-mode operations. This has made it possible to onboard satellite constellations, drone imagery, and historical datasets in bulk without disrupting running services. The ingestion service uses optimized database ingestion functions, capable of ingesting thousands of items per second, providing high-throughput and reliable data integration at scale.
  • PostgreSQL, pgSTAC, and Amazon RDS Proxy for a scalable metadata backbone – With pgSTAC and Amazon RDS Proxy, we benefit from advanced spatial-temporal querying while making sure database connection management is handled gracefully, even under high concurrency. This combination offers reliability without compromising performance.
  • Scalable deployment with Amazon EKS – Hosting the solution on Amazon EKS provides full control over deployments, resource tuning, and service orchestration. Combined with automatic scaling, we dynamically adjust compute capacity based on demand, facilitating resilience and cost-efficiency.

Learnings

As part of building this solution, we learned the following:

  • RDS Proxy is essential for automatically scaled environments – Given our use of automatic scaling pods in Amazon EKS, we found that RDS Proxy is critical. It handles connection pooling efficiently and protects the underlying PostgreSQL database from connection exhaustion during sudden scale-up events. Without it, we encountered spiky load failures and blocked connections during high-ingest periods.
  • Batch STAC ingestor is a core component – Our custom STAC ingestor proved to be an indispensable piece of the system. It interfaces directly with pgSTAC to perform large-scale, automated ingestions of geospatial metadata from streams and archives. Without this tool, onboarding data providers or processing legacy imagery at scale would have been labor-intensive and error-prone.
  • COGs are non-negotiable – For fast, scalable visualization of large raster datasets, COGs are essential, particularly if raster datasets exceed several gigabytes. They enable efficient HTTP range requests, alleviate the need for preprocessing, and work seamlessly with TiTiler for real-time tile rendering. Non-COG formats led to noticeably slower performance and weren’t suitable for cloud-based visualization.
  • Serverless-compliant, optimized for Amazon EKS (for now) – Although the architecture is designed to be serverless-compatible, we opted for an Amazon EKS first approach due to the nature of our other application landscape. Components like TiTiler and TiPG benefit from persistent, memory-tuned environments that are harder to achieve in a serverless runtime. However, the solution remains modular and stateless by design, and certain subsystems (such as ingestion triggers, notifications, or monitoring) are already candidates for future serverless migration to further improve elasticity and reduce operational overhead.
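
The HTTP range requests mentioned in the COG learning work by asking the object store for a byte window instead of the whole file. The sketch below only constructs the header; the 16 KiB initial read is a hypothetical size, and actual readers choose their own.

```python
from typing import Dict


def range_header(offset: int, length: int) -> Dict[str, str]:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


# Hypothetical first read covering the file header, then one tile further in.
print(range_header(0, 16 * 1024))
print(range_header(1_048_576, 65_536))
```

A reader first fetches the header, which lists where each internal tile starts, and then requests only the tiles that intersect the area being rendered.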

Conclusion

BASF Digital Farming GmbH has successfully implemented a STAC-based geospatial data platform on Amazon EKS, enabling efficient management and visualization of satellite imagery, drone data, and application maps. This architecture helps us onboard new data sources within weeks rather than months. The new platform also processes twice as much data in a single day while cutting costs by 50%, thanks to reduced data handling through the STAC schema and the efficiencies of automatic scaling. By adopting the STAC standard, the architecture improves data discoverability, reduces search latency, and supports more efficient analytic workflows.

Organizations looking to build similar geospatial data solutions can use AWS services like Amazon EKS, Amazon S3, and Amazon RDS along with open source tools like STAC and eoAPI to create scalable, cost-effective solutions. Learn more about building containerized applications on AWS at Containers on AWS.

AWS Weekly Roundup: Kiro waitlist, EBS Volume Clones, EC2 Capacity Manager, and more (October 20, 2025)

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-kiro-waitlist-ebs-volume-clones-ec2-capacity-manager-and-more-october-20-2025/

I’ve been inspired by all the activities that tech communities around the world have been hosting and participating in throughout the year. Here in the southern hemisphere we’re starting to dream about our upcoming summer breaks and closing out on some of the activities we’ve initiated this year. The tech community in South Africa is participating in Amazon Q Developer coding challenges that my colleagues and I are hosting throughout this month as a fun way to wind down activities for the year. The first one was hosted in Johannesburg last Friday with Durban and Cape Town coming up next.

Last week’s launches
These are the launches from last week that caught my attention:

Additional updates
I thought these projects, blog posts, and news items were also interesting:

Upcoming AWS events
Keep a look out and be sure to sign up for these upcoming events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — AWS flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities.

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here for upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Veliswa.

New general-purpose Amazon EC2 M8a instances are now available

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/new-general-purpose-amazon-ec2-m8a-instances-are-now-available/

Today, we’re announcing the availability of Amazon Elastic Compute Cloud (Amazon EC2) M8a instances, the latest addition to the general-purpose M instance family. These instances are powered by the 5th Generation AMD EPYC (codename Turin) processors with a maximum frequency of 4.5GHz. Customers can expect up to 30% higher performance and up to 19% better price performance compared to M7a instances. They also provide higher memory bandwidth, improved networking and storage throughput, and flexible configuration options for a broad set of general-purpose workloads.

Improvements in M8a
M8a instances deliver up to 30% better performance per vCPU compared to M7a instances, making them ideal for applications that benefit from high performance and high throughput, such as financial applications, gaming, rendering, application servers, simulation modeling, midsize data stores, application development environments, and caching fleets.

They provide 45% more memory bandwidth compared to M7a instances, accelerating in-memory databases, distributed caches, and real-time analytics.

For workloads with high I/O requirements, M8a instances provide up to 75 Gbps of networking bandwidth and 60 Gbps of Amazon Elastic Block Store (Amazon EBS) bandwidth, a 50% improvement over the previous generation. These enhancements support modern applications that rely on rapid data transfer and low-latency network communication.

Each vCPU on an M8a instance corresponds to a physical CPU core, meaning there is no simultaneous multithreading (SMT). In application benchmarks, M8a instances delivered up to 60% faster performance for GroovyJVM and up to 39% faster performance for Cassandra compared to M7a instances.

M8a instances support instance bandwidth configuration (IBC), which provides flexibility to allocate resources between networking and EBS bandwidth. This gives customers the flexibility to scale network or EBS bandwidth by up to 25% and improve database performance, query processing, and logging speeds.

M8a is available in ten virtualized sizes and two bare metal options (metal-24xl and metal-48xl), providing deployment choices that scale from small applications to large enterprise workloads. All of these improvements are built on the AWS Nitro System, which delivers low virtualization overhead, consistent performance, and advanced security across all instance sizes. These instances are built using the latest sixth generation AWS Nitro Cards, which offload and accelerate I/O for functions, increasing overall system performance.

M8a instances feature sizes of up to 192 vCPU with 768GiB RAM. Here are the detailed specs:

| M8a | vCPUs | Memory (GiB) | Network bandwidth (Gbps) | EBS bandwidth (Gbps) |
| --- | --- | --- | --- | --- |
| medium | 1 | 4 | Up to 12.5 | Up to 10 |
| large | 2 | 8 | Up to 12.5 | Up to 10 |
| xlarge | 4 | 16 | Up to 12.5 | Up to 10 |
| 2xlarge | 8 | 32 | Up to 15 | Up to 10 |
| 4xlarge | 16 | 64 | Up to 15 | Up to 10 |
| 8xlarge | 32 | 128 | 15 | 10 |
| 12xlarge | 48 | 192 | 22.5 | 15 |
| 16xlarge | 64 | 256 | 30 | 20 |
| 24xlarge | 96 | 384 | 40 | 30 |
| 48xlarge | 192 | 768 | 75 | 60 |
| metal-24xl | 96 | 384 | 40 | 30 |
| metal-48xl | 192 | 768 | 75 | 60 |

For a complete list of instance sizes and specifications, refer to the Amazon EC2 M8a instances page.
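
As a small illustration of using these specifications, the following sketch picks the smallest virtualized M8a size that satisfies a vCPU and memory requirement. The data is copied from the table above; the helper itself is hypothetical and ignores bandwidth, pricing, and the bare metal options.

```python
from typing import List, Optional, Tuple

# (size, vCPUs, memory in GiB) for the virtualized sizes, smallest first.
M8A_SIZES: List[Tuple[str, int, int]] = [
    ("medium", 1, 4),
    ("large", 2, 8),
    ("xlarge", 4, 16),
    ("2xlarge", 8, 32),
    ("4xlarge", 16, 64),
    ("8xlarge", 32, 128),
    ("12xlarge", 48, 192),
    ("16xlarge", 64, 256),
    ("24xlarge", 96, 384),
    ("48xlarge", 192, 768),
]


def smallest_fit(vcpus_needed: int, memory_gib_needed: int) -> Optional[str]:
    """Return the smallest M8a instance type meeting both requirements, if any."""
    for size, vcpus, memory_gib in M8A_SIZES:
        if vcpus >= vcpus_needed and memory_gib >= memory_gib_needed:
            return f"m8a.{size}"
    return None  # nothing in the family is large enough


print(smallest_fit(20, 100))
```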

When to use M8a instances
M8a is a strong fit for general-purpose applications that need a balance of compute, memory, and networking. M8a instances are ideal for web and application hosting, microservices architectures, and databases where predictable performance and efficient scaling are important.

These instances are SAP certified and also well suited for enterprise workloads such as financial applications and enterprise resource planning (ERP) systems. They’re equally effective for in-memory caching and customer relationship management (CRM), in addition to development and test environments that require cost efficiency and flexibility. With this versatility, M8a supports a wide spectrum of workloads while helping customers improve price performance.

Now available
Amazon EC2 M8a instances are available today in the US East (Ohio), US West (Oregon), and Europe (Spain) AWS Regions. M8a instances can be purchased as On-Demand, Savings Plans, and Spot Instances. M8a instances are also available on Dedicated Hosts. To learn more, visit the Amazon EC2 Pricing page.

To learn more, visit the Amazon EC2 M8a instances page and send feedback to AWS re:Post for EC2 or through your usual AWS support contacts.

Betty

DISA STIG for Amazon Linux 2023 is now available

Post Syndicated from Mahak Arora original https://aws.amazon.com/blogs/compute/disa-stig-for-amazon-linux-2023-is-now-available/

Today, we announce the availability of a Security Technical Implementation Guide (STIG) for Amazon Linux 2023 (AL2023), developed through collaboration between Amazon Web Services (AWS) and the Defense Information Systems Agency (DISA). The STIG guidelines are important for U.S. Department of Defense (DOD) and Federal customers needing strict security compliance derived from the National Institute of Standards and Technology (NIST) 800-53 and related documents. This new technical implementation guide provides detailed Operating System (OS) security hardening configurations for organizations deploying AL2023 in DOD environments and other agencies requiring DISA STIG alignment. The AL2023 STIG provides customers with access to an OS guide that complies with stringent government security standards. This guide for implementing STIG configurations will streamline security processes for organizations seeking robust cybersecurity controls, whether they need to maintain DOD compliance or are voluntarily adopting these security best practices to enhance their security posture.

Implementing the AL2023 DISA STIG with AWS

AWS Systems Manager (SSM) and EC2 Image Builder offer native solutions for implementing the AL2023 DISA STIG configurations in your environment. Customers with existing AL2023 EC2 workloads can use AWS Systems Manager to streamline the STIG implementation. Customers who want to build STIG-compliant AL2023 EC2 images for deployment can use EC2 Image Builder to automate the application of the AL2023 DISA STIG.

Customers can use EC2 Image Builder to enhance and streamline their implementation of the AL2023 DISA STIG. This integrated approach significantly reduces the operational overhead traditionally associated with maintaining STIG compliance, so customers can focus on their core missions while maintaining the highest security standards. Customers can use the existing EC2 Image Builder Linux hardening components, which now support AL2023 Category I, II, and III findings, to automatically create STIG-compliant AL2023 EC2 images with minimal manual intervention. This automation significantly reduces the time and effort typically needed for security hardening implementations. The EC2 Image Builder Linux hardening component extends its proven capabilities to AL2023, providing the same streamlined security configuration process available for other Linux distributions. For more information, refer to the Image Builder documentation.

Automating the STIG for Existing Fleets via Systems Manager

For existing AL2023 EC2 instances, you can use AWS managed SSM command documents to automate the implementation of the STIG configurations. These command documents can be run through the SSM console, API, or AWS Command Line Interface (AWS CLI). The key mechanism here is the AWS managed Systems Manager command document, which contains the predefined STIG configurations. By running these command documents through Systems Manager, customers can systematically deploy and maintain AL2023 STIG configurations across their fleet of EC2 instances. This produces consistent security baselines that meet government and enterprise requirements. This solution is particularly effective for environments with existing AL2023 EC2 instances because it allows customers to implement STIG controls without rebuilding or redeploying instances. For more information about the command document, refer to Apply STIG settings with Systems Manager in the EC2 User Guide.
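
Running a command document through the API could look like the following sketch, which uses the boto3 `send_command` operation of the Systems Manager client. The document name `AWSEC2-ConfigureSTIG`, its `Level` parameter, and the tag used for targeting are assumptions made for illustration; confirm the exact document name and parameters in the EC2 User Guide before use.

```python
from typing import Dict, List


def build_stig_command(tag_key: str, tag_value: str, level: str) -> Dict[str, object]:
    """Assemble keyword arguments for running a STIG command document on tagged instances."""
    return {
        "DocumentName": "AWSEC2-ConfigureSTIG",  # assumed document name
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"Level": [level]},        # assumed parameter name and value
        "Comment": "Apply STIG settings to AL2023 instances",
    }


def run_command(request: Dict[str, object]) -> str:
    """Send the command; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without it

    response = boto3.client("ssm").send_command(**request)
    return response["Command"]["CommandId"]


# Hypothetical tag that marks the instances to harden.
stig_request = build_stig_command("Environment", "al2023-fleet", "High")
print(stig_request)
```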

The AL2023 STIG represents the continued commitment of Amazon Linux to providing customers with the security tools and guidance they need to succeed in highly regulated environments. Amazon Linux, in collaboration with DISA, is providing customers with access to authoritative, government-validated security configurations that meet the most demanding compliance requirements.

Ready to implement the AL2023 STIG in your environment? Explore our comprehensive documentation and begin streamlining your security compliance journey today. To learn more about STIG hardening for your EC2 instances, refer to STIG compliance for your EC2 instance. For the STIG settings that are applied to EC2 Linux instances, refer to STIG settings for EC2 Linux instances. To apply STIG settings to your AL2023 EC2 instance, download the AL2023 DISA STIG.