Tag Archives: announcements

Celebrating 10 years of Amazon Aurora innovation

2025-08-15 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/celebrating-10-years-of-amazon-aurora-innovation/

Ten years ago, we announced the general availability of Amazon Aurora, a database that combined the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.

As Jeff described it in its launch blog post: “With storage replicated both within and across three Availability Zones, along with an update model driven by quorum writes, Amazon Aurora is designed to deliver high performance and 99.99% availability while easily and efficiently scaling to up to 64 TiB of storage.”

When we started developing Aurora over a decade ago, we made a fundamental architectural decision that would change the database landscape forever: we decoupled storage from compute. This novel approach enabled Aurora to deliver the performance and availability of commercial databases at one-tenth the cost.

This is one of the reasons why hundreds of thousands of AWS customers choose Aurora as their relational database.

Today, I’m excited to invite you to join us for a livestream event on August 21, 2025, to celebrate a decade of Aurora database innovation.

A brief look back at the past
Throughout the evolution of Aurora, we’ve focused on four core innovation themes: security as our top priority, scalability to meet growing workloads, predictable pricing for better cost management, and multi-Region capabilities for global applications. Let me walk you through some key milestones in the Aurora journey.

We previewed Aurora at re:Invent 2014, and made it generally available in July 2015. At launch, we presented Aurora as “a new cost-effective MySQL-compatible database engine.”

In June 2016, we introduced reader endpoints and cross-Region read replicas, followed by AWS Lambda integration and the ability to load tables directly from Amazon S3 in October. We added database cloning and export to Amazon S3 capabilities in June 2017 and full compatibility with PostgreSQL in October that year.

The journey continued with the serverless preview in November 2017, which became generally available in August 2018. Global Database launched in November 2018 for cross-Region disaster recovery. We introduced blue/green deployments to simplify database updates, and optimized read instances to improve query performance.

In 2023, we added vector capabilities with pgvector for similarity search for Aurora PostgreSQL, and Aurora I/O-Optimized to provide predictable pricing with up to 40 percent cost savings for I/O-intensive applications. We launched Aurora zero-ETL integration with Amazon Redshift which enables near real-time analytics and ML using Amazon Redshift on petabytes of transactional data from Aurora by removing the need for you to build and maintain complex data pipelines that perform extract, transform, and load (ETL) operations. This year we added Aurora MySQL zero-ETL integration with Amazon Sagemaker, enabling near real-time access of your data in the lakehouse architecture of SageMaker to run a broad range of analytics.

In 2024, we made it as effortless as just one click to select Aurora PostgreSQL as a vector store for Amazon Bedrock Knowledge Bases and launched Aurora PostgreSQL Limitless Database, a serverless horizontal scaling (sharding) capability.

To simplify scaling for customers, we also increased the maximum storage to 128 TiB in September 2020, allowing many applications to operate within a single instance. Last month, we’ve further simplified scaling by doubling the maximum storage to 256 TiB, with no upfront provisioning required and pay-as-you-go pricing based on actual storage used. This enables even more customers to run their growing workloads without the complexity of managing multiple instances while maintaining cost efficiency.

Most recently, at re:Invent 2024, we announced Amazon Aurora DSQL, which became generally available in May 2025. Aurora DSQL represents our latest innovation in distributed SQL databases, offering active-active high availability and multi-Region strong consistency. It’s the fastest serverless distributed SQL database for always available applications, effortlessly scaling to meet any workload demand with zero infrastructure management.

Aurora DSQL builds on our original architectural principles of separation of storage and compute, taking them further with independent scaling of reads, writes, compute, and storage. It provides 99.99% single-Region and 99.999% multi-Region availability, with strong consistency across all Regional endpoints.

And in June, we launched Model Context Protocol (MCP) servers for Aurora, so you can integrate your AI agents with your data sources and services.

Let’s celebrate 10 years of innovation
By attending the August 21 livestream event, you’ll hear from Aurora technical leaders and founders, including Swami Sivasubramanian, Ganapathy (G2) Krishnamoorthy, Yan Leshinsky, Grant McAlister, and Raman Mittal. You’ll learn directly from the architects who pioneered the separation of compute and storage in cloud databases, with technical insights into Aurora architecture and scaling capabilities. You’ll also get a glimpse into the future of database technology as Aurora engineers share their vision and discuss the complex challenges they’re working to solve on behalf of customers.

The event also offers practical demonstrations that show you how to implement key features. You’ll see how to build AI-powered applications using pgvector, understand cost optimization with the new Aurora DSQL pricing model, and learn how to achieve multi-Region strong consistency for global applications.

The interactive format includes Q&A opportunities with Aurora experts, so you’ll be able to get your specific technical questions answered. You can also receive AWS credits to test new Aurora capabilities.

If you’re interested in agentic AI, you’ll particularly benefit from the sessions on MCP servers, Strands Agents, and how to integrate Strands Agents with Aurora DSQL, which demonstrate how to safely integrate AI capabilities with your Aurora databases while maintaining control over database access.

Whether you’re running mission-critical workloads or building new applications, these sessions will help you understand how to use the latest Aurora features.

To the next decade of Aurora innovation!

— seb

Spring 2025 PCI 3DS compliance package available now

2025-08-14 Will Black

Post Syndicated from Will Black original https://aws.amazon.com/blogs/security/spring-2025-pci-3ds-compliance-package-available-now/

Amazon Web Services (AWS) is pleased to announce the successful completion of our annual audit to renew our Payment Card Industry Three Domain Secure (PCI 3DS) certification. As part of this renewal, we have expanded the scope to include three additional AWS services and three additional AWS Regions:

Newly added AWS services:

Newly added AWS Regions:

Asia Pacific (Thailand)
Asia Pacific (Malaysia)
Mexico (Central)

This certification allows customers to use these services while maintaining PCI 3DS compliance, enabling innovation without compromising security. The full list of services can be found on the AWS Services in Scope by Compliance Program page.

The PCI 3DS compliance package includes two key components:

Attestation of Compliance (AOC) – demonstrates that AWS was successfully validated against the PCI 3DS standard.
AWS Responsibility Summary – provides guidance to help AWS customers understand their responsibility in developing and operating a highly secure environment on AWS for handling payment card data.

AWS was evaluated by Coalfire, a third-party Qualified Security Assessor (QSA).

This refreshed certification offers customers greater flexibility in deploying regulated workloads while reducing compliance overhead. Customers can access the PCI 3DS reports through AWS Artifact. This self-service portal provides on-demand access to AWS compliance reports, streamlining audit processes.

To learn more about our PCI programs and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Compliance Support page.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

177 AWS services achieve HITRUST certification

2025-08-13 Mark Weech

Post Syndicated from Mark Weech original https://aws.amazon.com/blogs/security/177-aws-services-achieve-hitrust-certification/

Amazon Web Services (AWS) is excited to announce that 177 AWS services have achieved HITRUST certification for the 2025 assessment cycle, including the following five services which were certified for the first time:

The full list of AWS services, which a third-party assessor audited and certified under the HITRUST CSF, is now available on our Services in Scope by Compliance Program page. Customers can view and download our 2025 HITRUST certification on demand through AWS Artifact.

AWS HITRUST certification is available for customer inheritance

As an added benefit to our customers, organizations no longer have to assess inherited controls for their HITRUST validated assessment because AWS already has. You can deploy business solutions to the AWS Cloud and inherit our HITRUST certification, provided that you use only in-scope services and properly apply the controls detailed on the HITRUST website according to the AWS Shared Responsibility Model.

Our HITRUST certification is based on the version 11.5.1 control framework, so you can inherit the latest controls and related scoring, knowing that AWS has attested to the latest framework standards available. Leading organizations in a variety of industries have adopted HITRUST CSF as part of their approach to security and privacy. For more information, see the HITRUST website.

As always, we value your feedback and questions and are committed to helping you achieve and maintain the highest standard of security and compliance. Feel free to contact the team through AWS Compliance Support. If you have feedback about this post, submit comments in the Comments section below.

Meet our newest AWS Heroes — August 2025

2025-08-13 Taylor Jacobsen

Post Syndicated from Taylor Jacobsen original https://aws.amazon.com/blogs/aws/meet-our-newest-aws-heroes-august-2025/

We are excited to announce the latest cohort of AWS Heroes, recognized for their exceptional contributions and technical leadership. These passionate individuals represent diverse regions and technical specialties, demonstrating notable expertise and dedication to knowledge sharing within the AWS community. From AI and machine learning to serverless architectures and security, our new Heroes showcase the breadth of cloud innovation while fostering inclusive and engaging technical communities. Join us in welcoming these community leaders who are helping to shape the future of cloud computing and inspiring the next generation of AWS builders.

Kristine Armiyants – Masis, Armenia

Community Hero Kristine Armiyants is a software engineer and cloud support engineer who transitioned into technology from a background in finance, having earned an MBA before becoming self-taught in software development. As the founder and leader of AWS User Group Armenia for over 2.5 years, she has transformed the local tech landscape by organizing Armenia’s first AWS Community Day, scaling it from 320 to 440+ attendees, and leading a team that brings international-scale events to her country. Through her technical articles in Armenian, hands-on workshops, and “no-filter” blog series, she makes cloud knowledge more accessible while mentoring new user group organizers and early-career engineers. Her dedication to community building has resulted in five new AWS Community Builders from Armenia, demonstrating her commitment to creating inclusive spaces for learning and growth in the AWS community.

Nadia Reyhani – Perth, Australia

Machine Learning Hero Nadia Reyhani is an AI Product Engineer who integrates DevOps best practices with machine learning systems. She is a former AWS Community Builder and regularly presents at AWS events on building scalable AI solutions using Amazon SageMaker and Bedrock. As a Women in Digital Ambassador, she combines technical expertise with advocacy, creating inclusive spaces for underrepresented groups in cloud and AI technologies.

Raphael Manke – Karlsruhe, Germany

DevTools Hero Raphael Manke is a Senior Product Engineer at Dash0 and the creator of the unofficial AWS re:Invent planner, which is used to help build a schedule for the event. With a decade of AWS experience, he specializes in serverless technologies and DevTools that streamline cloud development. As the organizer of the AWS User Group in Karlsruhe and a former AWS Community Builder, he actively contributes to product enhancement through public speaking and direct collaboration with AWS service teams. His commitment to the AWS community spans from local user group leadership to providing valuable feedback to service teams.

Rowan Udell – Brisbane, Australia

Security Hero Rowan Udell is an independent AWS security consultant specializing in AWS Identity and Access Management (IAM). He has been sharing AWS security expertise for over a decade through books, blog posts, meet-ups, workshops, and conference presentations. Rowan has taken part in many AWS community programs, was an AWS Community Builder for four years, and is part of the AWS Community Day Australia Organizing Committee. A frequent speaker at AWS events including Sydney Summit and other community meetups, Rowan is known for transforming complex security concepts into simple, practical, and workable solutions for businesses securing their AWS environments.

Sangwoon (Chris) Park – Seoul, Korea

Serverless Hero Sangwoon (Chris) Park leads development at RECON Labs, an AI startup specializing in AI-driven 3D content generation. He is a former AWS Community Builder and the creator of “AWS Classroom” YouTube channel, and he shares practical serverless architecture knowledge with the AWS community. Chris hosts monthly AWS Classroom Meetups and the AWS KRUG Serverless Small Group, actively promoting serverless technologies through community events and educational content.

Toshal Khawale – Pune, India

Community Hero Toshal Khawale is an experienced technology leader with over 22 years of expertise in engineering and AWS cloud technology, holding 12 AWS certifications that demonstrate his cloud knowledge. As a Managing Director at PwC, Toshal guides organizations through cloud transformation, digital innovation, and application modernization initiatives, having led numerous large-scale AWS migrations and generative AI implementations. He was an AWS Community Builder for six years and continues to serve as the AWS User Group Pune Leader, actively fostering community engagement and knowledge sharing. Through his roles as a mentor, frequent speaker, and advocate, Toshal helps organizations maximize their AWS investments while staying at the forefront of cloud technology trends.

Learn More

Visit the AWS Heroes webpage if you’d like to learn more about the AWS Heroes program, or to connect with a Hero near you.

— Taylor

Introducing AWS Cloud Control API MCP Server: Natural Language Infrastructure Management on AWS

2025-08-13 Kevon Mayers

Post Syndicated from Kevon Mayers original https://aws.amazon.com/blogs/devops/introducing-aws-cloud-control-api-mcp-server-natural-language-infrastructure-management-on-aws/

Today, we’re officially announcing the AWS Cloud Control API (CCAPI) MCP Server. This MCP server transforms AWS infrastructure management by allowing developers to create, read, update, delete, and list resources using natural language. As part of the awslabs/mcp project, this new and innovative tool serves as a bridge between natural language commands and AWS infrastructure deployment and management. This MCP server is powered by the AWS Cloud Control API – a standardized API that allows CRUDL (Create/Read/Update/Delete/List) operations to be performed against AWS and third party resources using a single endpoint.

Key Features:

Leverages AWS Cloud Control API for CRUDL operations for more than 1,200 AWS resources
Enables LLM-powered agents and developers to manage infrastructure with natural language prompts
Provides the option to output Infrastructure as Code (IaC) templates for infrastructure it will create, allowing to still be used with existing CI/CD pipelines
Integrates with AWS Pricing API to provide cost estimates for the infrastructure it will create
Applies security best practices automatically using Checkov

Why Use CCAPI MCP Server?

Simplified Infrastructure Management: No more wrestling with complex templates or documentation
Increased Developer Productivity: Focus on what you need, not how to configure it
Reduced Learning Curve: Onboard new team members faster with natural language commands
LLM Integration: Perfect companion for AI-assisted development workflows

The CCAPI MCP Server transforms infrastructure management by enabling natural language interactions for AWS resource operations. Bridging natural language commands with AWS infrastructure deployment and management, this MCP Server allows developers to manage cloud infrastructure through conversational inputs such as:

Can you create a new s3 bucket for me?or
Find all of my EC2 instances and tell me which one have an instance type that is not t2.large

This significantly reduces configuration overhead and accelerates onboarding for new team members, directly translates developer intent into cloud infrastructure.

Let’s see it in action.

Creating and Managing Cloud Infrastructure

Prerequisites

uv package manager installed
Python 3.x.x installed
AWS credentials with appropriate permissions. The MCP server supports multiple ways to define these credentials. See the MCP documentation for more information. Using dynamic credentials such as one provided via SSO is recommended. For more information on configuring AWS credentials, see the AWS CLI documentation.
An MCP Host application installed that supports MCP Clients and MCP Servers (e.g. Amazon Q Developer, Claude Desktop, Cursor, etc.). To follow this blog install Amazon Q Developer for CLI (CLI) as described in the installation instructions

Integration with Developer Tools

To start using the CCAPI MCP server, you will need to set up your server configuration which is typically in a file named mcp.json. For this blog we will focus on using the CCAPI MCP server with Amazon Q Developer. Note that for other MCP Host applications the path to the mcp configuration file may differ. You will need to create the file if it does not already exist in the directory.

1. Global Configuration: ~/.aws/amazon/mcp.json – Applies to all workspaces

2. Workspace Configuration: .amazonq/mcp.json – Specific to the current workspace

More information can be found in the Amazon Q Developer User Guide.

Configuration file structure

The MCP configuration file uses a JSON format with the following structure:

mcp.json

{
  "mcpServers": {
    "server-name": {
      "command": "command-to-run",
      "args": ["arg1", "arg1",],
      "env": {
        "ENV_VAR1": "value1",
        "ENV_VAR2": "value2",
      },
    }
  }
}

Here is mcp.json with the CCAPI MCP Server configuration:

{
  "mcpServers": {
   "awslabs.ccapi-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.ccapi-mcp-server@latest"
      ],
      "env": {
        "AWS_PROFILE": "your named AWS profile",
	"DEFAULT_TAGS": “enabled”,
	"SECURITY_SCANNING": “enabled”,
	"FASTMCP_LOG_LEVEL": “ERROR”
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}

Important

Ensure you correctly set your AWS credentials in the MCP server config. It is essential that you properly configure these credentials, as the MCP server uses their associated permissions when invoking the AWS Cloud Control API for CRUDL operations in your AWS account. The server supports multiple methods of consuming these credentials such as AWS profiles, Environment Variables, SSO tokens, etc. You can see some of this in the aws_client.py file. See these docs on using named profiles for more information.

Read Only Mode

If you would like to prevent the MCP server from performing mutating actions (e.g. Create/Update/Delete Resource), you can specify the --readonly flag as demonstrated below:

{
  "mcpServers": {
   "awslabs.ccapi-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.ccapi-mcp-server@latest",
        “--readonly”"
      ],
      "env": {
        "AWS_PROFILE": "your named AWS profile",
	"DEFAULT_TAGS": “enabled”,
	"SECURITY_SCANNING": “enabled”,
	"FASTMCP_LOG_LEVEL": “ERROR”
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}

More information on the configuration and tools the CCAPI MCP server provides can be found in the AWS CloudFormation MCP Server documentation.

Security Considerations

Ensure the IAM credentials include permissions for Cloud Control API actions (List, Get, Create, Update, Delete). See the AWS CCAPI API documentation for more info
Follow IAM least privilege principles
Enable AWS CloudTrail auditing
Consider running in read-only mode with --readonly flag for safer operations

Example Use Case: Creating an S3 Bucket with KMS Encryption

IMPORTANT: Ensure you have satisfied all prerequisites before attempting these commands.

1. With the mcp.json file correctly set, try to run a sample prompt. In your terminal, run q chat to start using Amazon Q in the CLI.

Q CLI Initial Load of Cloud Control API MCP Server 2. This will start initializing the MCP servers in the background, allowing you to immediately start using Q Chat even if they are still loading. As a note, if these have not finished loading, your prompts will be handled without using any MCP servers. To check the status of the servers, run /mcp

3. Once that you have validated that the MCP server was loaded successfully, try a sample command. Simply tell Amazon Q : Create an S3 bucket with versioning and encrypt it using a new KMS key

Amazon Q will use the server to automatically:

Fetch your current environment variables
Use those to fetch your current AWS session info
Create code that defines what is in your prompt
Explain the code that was generated
Run security analysis against the code that was generated (if enabled)
Explain the results of the security analysis
Validate the configuration against AWS Cloud Control API schemas (which use CloudFormation Resource Provider Schemas as their foundation) and IAM policies. This validation ensures compliance with Cloud Control API requirements, which is essential for resource creation
Create the resources directly through Cloud Control API

Note: While CloudFormation schemas are referenced in the validation step, this solution uses Cloud Control API for resource management, not CloudFormation. The schemas are used because they define the standardized resource properties that Cloud Control API expects.

4. First, Amazon Q will mention that it needs to check the environment variables to find information related to the AWS session information. It will inform you about the specific tool it aims to use and will ask for permission. Select y to accept and allow actions.

5. Next, Amazon Q will ask to use get_aws_session_info() to fetch information about the AWS session it should use for subsequent actions. It will use the relevant values from the environment variables defined in the MCP configuration file (e.g. ~/.aws/amazon/mcp.json)

6.Amazon Q will then display the AWS account ID and region it will use to deploy resources. To start, it will use generate_infrastructure_code() to generate the resource properties for a KMS key that will be sent to Cloud Control API. These properties mirror the structure defined in AWS CloudFormation Resource Provider Schemas (which Cloud Control API uses as its foundation), allowing for security validation through Checkov before deployment. The key will be configured following security best practices, with a key policy scoped to only allow usage within the AWS account.

7. Once that Amazon Q has generated the code for the resource, it will run then use the explain() tool to explain the infrastructure code that was generated. Note that default tags MANAGED_BY, MCP_SERVER_SOURCE_CODE, and MCP_SERVER_VERSION are added for all resources managed by the CCAPI MCP server. These tags provide for ease of identification of infrastructure that is being managed by the MCP server. They are configurable and you optionally can disable them, but we highly recommend adding tags to ensure you have visibility into infrastructure that is being managed by the CCAPI MCP server.

8. It will then attempt to use the run_checkov() tool to inspect the security of the code. This tool is triggered because SECURITY_SCANNING was set to enabled in your server configuration file.

9. After Checkov has run, it will then attempt to use the explain() tool again to explain the security findings from the Checkov run. If there were no security issues, it will attempt to proceed. If there were security issues, you will be asked how you’d like to proceed, and Amazon Q will recommend necessary fixes. By default, the checks that passed will only give a minimal summary. If you’d like to get more information, just ask for more details.

10. The next tool that Amazon Q will use is the create_resource() tool. This tool will attempt to create the resource using the AWS Cloud Control API, and then use the get_resource_request_status() tool to check the status of the creation. This tool uses the request token to identify the request that was submitted to the Cloud Control API and uses this to fetch its status information.

11. Amazon Q will continue using the CCAPI MCP server tools as needed until it finishes creation of both the S3 Bucket and KMS Key and will output a summary.

12. Now, ask Amazon Q to make a change potentially negatively affecting security, for example by allowing the S3 bucket to be publicly accessible. While this configuration is generally advised against, sometimes it is necessary – such as when you want to use the S3 bucket for public website hosting. Amazon Q will respond letting you know that what you are asking for is not the best practice, and explain why. However, since this could be a valid request depending on your use case, it will prompt you to confirm.

13. The CCAPI MCP server also has integrations with the AWS Pricing API, so you can even ask for the estimated cost of what it has deployed.

14. Lastly, ask Amazon Q to create a CloudFormation template of what it has created so far so you can either have a backup, or if you want to redeploy something similar, you will have a template to work off. It will use the create_template() tool to accomplish this task.

Note: The create_template() tool comes with predefined settings:

Outputs YAML format by default (can be JSON)
Sets DeletionPolicy to RETAIN
Sets UpdateReplacePolicy to RETAIN
Allows optional parameters for template ID, file saving location, and region specification

For more information, review the tool in the source code.

15. Try one more dangerous operation, attempting to delete all resources within an AWS account. The security checks block this attempt and suggest other alternatives.

16. Finally, ask Amazon Q to just delete what it has created. This time it will use the get_resource() tool to get information about the existing resources it created, the explain() tool to explain the changes that will be made, and finally the delete_resource() tool to delete the resources.

After successfully deleting the resources, it will provide a final summary.

Sample Prompts for Easy Start

Sample Prompt	What It Does
“Create a VPC with private and public subnets”	Sets up a complete network environment
“List all my EC2 instances”	Shows running instances across your account
“Create a serverless API for my application”	Deploys API Gateway with Lambda integration
“Set up a load-balanced web application”	Creates ALB with target groups and instances

Conclusion

The AWS Cloud Control API MCP Server represents a significant advancement in AWS infrastructure management, making operations on cloud resources easy to express and access through natural language. Whether you’re streamlining operations, experimenting with LLM-based development, or onboarding new team members, whether you are using Amazon Q Developer in CLI or any other MCP Host application (such as Claude Desktop or Cursor), the CCAPI MCP servet and its tools offer a truly intuitive way to interact with AWS.

Authors

The Amazon SageMaker lakehouse architecture now automates optimization configuration of Apache Iceberg tables on Amazon S3

2025-08-09 Tomohiro Tanaka

Post Syndicated from Tomohiro Tanaka original https://aws.amazon.com/blogs/big-data/the-amazon-sagemaker-lakehouse-architecture-now-automates-optimization-configuration-of-apache-iceberg-tables-on-amazon-s3/

As organizations increasingly adopt Apache Iceberg tables for their data lake architectures on Amazon Web Services (AWS), maintaining these tables becomes crucial for long-term success. Without proper maintenance, Iceberg tables can face several challenges: degraded query performance, unnecessary retention of old data that should be removed, and a decline in storage cost efficiency. These issues can significantly impact the effectiveness and economics of your data lake. Regular table maintenance operations help ensure your Iceberg tables remain high performing, compliant with data retention policies, and cost-effective for production workloads. To help you manage your Iceberg tables at scale, AWS Glue automated those Iceberg table maintenance operations: compaction with sort and z-order and snapshots expiration and orphan data management. After the launch of the feature, many customers have enabled automated table optimization through AWS Glue Data Catalog to reduce operational burden.

The Amazon SageMaker lakehouse architecture now automates optimization of Iceberg tables stored in Amazon S3 with catalog-level configuration, optimizing storage in your Iceberg tables and improving query performance. Previously, optimizing Iceberg tables in AWS Glue Data Catalog required updating configurations for each table individually. Now, you can enable automatic optimization for new Iceberg tables with one-time Data Catalog configuration. Once enabled, for any new table or updated table, Data Catalog continuously optimizes tables by compacting small files, removing snapshots, and unreferenced files that are no longer needed.

This post demonstrates an end-to-end flow to enable catalog level table optimization setting.

Prerequisites

The following prerequisites are required to use the new catalog-level table optimizations:

An active AWS account.
A data lake administrator to configure the table optimizations at the catalog level. To create the data lake administrator, refer to Set up AWS Lake Formation.
An AWS Identity and Access Management (IAM) role for the table optimizations to access Iceberg tables. For the instructions, refer to Catalog level table optimization prerequisites.

Enable table optimizations at the catalog level

The data lake administrator can enable the catalog-level table optimization on the AWS Lake Formation console. Complete the following steps:

On the AWS Lake Formation console, choose Catalogs in the navigation pane.
Select the catalog to be enabled with catalog-level table optimizations.
Choose Table optimizations tab, and choose Edit in Table optimizations, as shown in the following screenshot.

In Optimization options, select Compaction, Snapshot retention, and Orphan file deletion, as shown in the following screenshot.

enable-optimizations

Select an IAM role. Refer to Table optimization prerequisites for permissions.
Choose Grant required permissions.
Choose I acknowledge that expired data will be deleted as part of the optimizers.

After you enable the table optimizations at the catalog level, the configuration is displayed on the AWS Lake Formation console, as shown in the following screenshot.

optimizations-configuration

When you select an Iceberg table registered in the catalog, you can confirm that the table optimizations configuration is inherited from the table view because Configuration source shows catalog, as shown in the following screenshot.

catalog-level-optimizations

The table optimizations history is displayed on the table view. The following result shows one of the compaction runs by the table optimizations.

binpack-compaction-result

The catalog-level table optimizations for all databases and Iceberg tables are now enabled.

Customize setting of table optimizations at both the catalog and table-level

Although the catalog-level optimization applies common settings across all databases and Iceberg tables in your catalog, you might want to apply different strategies for specific Iceberg tables. You can use AWS Glue Data Catalog to enable both catalog-level and table-level optimizations based on specific table characteristics and access patterns. For example, in addition to configuring the catalog-level compaction with the bin-pack strategy for general-purpose Iceberg tables, you can apply the sort strategy at the table-level to tables with frequent range queries on timestamp columns.

This section shows configuring catalog-level and table-specific optimizations through a practical scenario. Imagine a real-time analytics table with frequent write operations that generates more orphan files due to constant metadata updates. Users also run selective queries filtering specific columns, which makes sort-order strategy preferable. Complete the following steps:

Select another Iceberg table in the same catalog as before to configure the table-level optimizations on the AWS Lake Formation console. At this point, the catalog-level table optimizations are configured for this table.
Choose Edit in Optimization configuration, as shown in the following screenshot.

In Optimization options, choose Compaction, Snapshot retention, and Orphan file deletion.
In Optimization configuration, choose Customize settings.
Select the same IAM role.
In Compaction configuration, select Sort, as shown in the following screenshot. Also configure 80 files to Minimum input files, which is a threshold of the number of files to trigger the compaction. To configure Sort, a sort order needs to be defined in your Iceberg table. You can define the sort order with Spark SQL such as ALTER TABLE db.tbl WRITE ORDERED BY <columns>.

sort-config

In Snapshot retention configuration and Snapshot deletion run rate, select Specify a custom value in hours. Then, configure 12 hours to the interval between two deletion job runs, as shown in the following screenshot.

snapshot-retention

In Orphan file deletion configuration, configure 1 day to Files under the provided Table Location with a creation time older than this number of days will be deleted if they are no longer referenced by the Apache Iceberg Table metadata.

orphan-deletion

Choose Grant required permissions.
Choose I acknowledge that expired data will be deleted as part of the optimizers.
Choose Save.
The Table optimization tab on the AWS Lake Formation console displays the custom setting of table optimizers. In Compaction, Compaction strategy is configured to sort and Minimum input files is also configured to 80 files. In Snapshot retention, Snapshot deletion run rate is configured to 12 hours. In Orphan file deletion, Orphan files will be deleted after is configured to 1 days, as shown in the following screenshot.

new-table-level-optimizations

The compaction history shows sort as its table-level compaction strategy even if the strategy in the catalog-level is configured to binpack, as shown in the following screenshot.

sort-compaction-result

In this scenario, the table-specific optimizations are configured along with the catalog-level optimizations. Combining the table and catalog-level optimizations means you can more flexibly manage your Iceberg table data deletions and compactions.

Conclusion

In this post, we demonstrated how to enable and manage using Amazon SageMaker lakehouse architecture with AWS Glue Data Catalog’s catalog-level table optimization feature for Iceberg tables. This enhancement significantly simplifies the management of Iceberg tables because you can enable automated maintenance operations across all tables with a single setting. Instead of configuring optimization settings for individual tables, you can now maintain your entire data lake more efficiently, reducing operational overhead while ensuring consistent optimization policies. We recommend enabling catalog-level table optimization to help you maintain a well-organized, high-performing, and cost-effective data lake while freeing up your teams to focus on deriving value from your data.

Try out this feature for your own use case and share your feedback and questions in the comments. To learn more about AWS Glue Data Catalog table optimizer, visit Optimizing Iceberg tables.

Acknowledgment: A special thanks to everyone who contributed to the development and launch of catalog level optimization: Siddharth Padmanabhan Ramanarayanan, Dhrithi Chidananda, Noella Jiang, Sangeet Lohariwala, Shyam Rathi, Anuj Jigneshkumar Vakil, and Jeremy Song.

About the authors

Tomohiro Tanaka is a Senior Cloud Support Engineer at Amazon Web Services (AWS). He’s passionate about helping customers use Apache Iceberg for their data lakes on AWS. In his free time, he enjoys a coffee break with his colleagues and making coffee at home.

Noritaka Sekiyama is a Principal Big Data Architect with AWS Analytics services. He’s responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.

Sandeep Adwankar is a Senior Product Manager at Amazon Web Services (AWS). Based in the California Bay Area, he works with customers around the globe to translate business and technical requirements into products customers can use to improve how they manage, secure, and access data.

Siddharth Padmanabhan Ramanarayanan is a Senior Software Engineer on the AWS Glue and AWS Lake Formation team, where he focuses on building scalable distributed systems for data analytics workloads. He is passionate about helping customers optimize their cloud infrastructure for performance and cost efficiency.

Improve email deliverability with tenant management in Amazon SES

2025-08-07 Satya S Tripathy

Post Syndicated from Satya S Tripathy original https://aws.amazon.com/blogs/messaging-and-targeting/improve-email-deliverability-with-tenant-management-in-amazon-ses/

Amazon Simple Email Service (Amazon SES) serves diverse industries—from ecommerce services to financial institutions to marketing technology product providers—helping organizations manage their email communication needs. Many businesses face the challenge of sending emails not just for themselves, but on behalf of their downstream customers or across various business divisions. These scenarios, commonly known as multi-tenant email sending practices, require careful architectural consideration. For example, a marketing service might need to send promotional emails for hundreds of retail clients, or an enterprise IT team might manage email communications across multiple business units (BUs). These clients and BUs are also identified as tenants. To successfully implement multi-tenancy in Amazon SES, customers usually develop an architecture pattern within Amazon SES that accomplishes critical objectives to efficiently handle the email sending needs of thousands of downstream tenants while maintaining isolated email sending reputations for each tenant. This isolation is crucial for protecting each customer’s deliverability metrics and to prevent issues with one tenant from impacting others.

Amazon SES customers can achieve multi-tenancy through isolated configuration sets for sending emails, but traditionally, reputation management and enforcement occur at the account level. To address this, Amazon SES now offers tenant management capabilities that enable tenant isolation and reputation management at the individual tenant level. This new feature provides greater control and flexibility for organizations managing multiple tenants within a single Amazon SES account, allowing each tenant to maintain its own sending reputation independently.

In this post, you will learn about the newly released tenant management feature that helps customers manage individual tenant onboardings and manage their reputations in isolation. This feature helps organizations create and manage up to 10,000 isolated tenants within a single AWS account (which can be increased 300,000 on explicit request), each with independent configurations and reputation metrics. You will discover how these capabilities maintain email deliverability through automated tenant-level controls, real-time monitoring, and customizable sending policies.

Whether you’re a service provider sending emails on behalf of multiple customers or an enterprise coordinating various BUs or lines of business (LOBs), this new feature offers sophisticated workflows to identify reputation-based findings and pause individual tenant sending to protect other tenants’ reputations. These enhancements are available globally across AWS Regions where Amazon SES is offered, representing a significant advancement in email deliverability management at scale.

Use cases

Following use cases can easily achieved though Amazon SES tenant management feature.

Onboard multiple brands from different BUs with different domains
Separate marketing and transaction tenants
Support independent software vendor (ISV) customers’ requirement to segregate email sending reputation of their customers
Domain management using configuration sets.
Track individual customers’ email sending reputations and control their email sending processes

Multi-tenant email operation challenges

Businesses rely on email as a critical communication channel. However, managing email operations for multiple tenants (customers or business units) has historically presented significant challenges such as:

Lack of isolation: Without proper tenant isolation, poor sending practices by one tenant could negatively impact the email deliverability of others, potentially jeopardizing your entire email sending operation.
Limited visibility: Understanding per-tenant email performance metrics and managing reputation independently has been difficult, if not impossible.
Scalability constraints: Many organizations struggle to scale their email operations because of account-level limitations on resources such as identities and configuration sets.
Inadequate control: The inability to set tenant-specific limits and configurations has made it challenging to prevent individual tenants from impacting others or exceeding allocated resources.
Complex monitoring: Building custom solutions for monitoring tenant activity often leads to inconsistent and inefficient workflows.

Benefits of tenant management capability

The Amazon SES tenant management feature provides a comprehensive solution for organizations managing email sending at scale on behalf of their customers or LOBs (called tenants). This capability is particularly valuable for software as a service (SaaS) providers, email service providers, and enterprises managing email operations across multiple clients or departments while separating tenants from each other.

Through tenant management, organizations can effectively manage email streams and reputation independently and maintain oversight of their various email operations. This new functionality transforms how organizations use Amazon SES, enabling them to handle complex, multi-faceted email operations with greater control and visibility at the tenant level with the following key capabilities.

Isolate tenant resources and reputation: Tenant management provides dedicated resource isolation that protects your email reputation across different customers and lines of business. Each tenant (customers or LOBs) will have their own dedicated set of resources such as email sending IPs, domain, and identifiers in DomainKeys Identified Mail (DKIM) signed headers, which are observed by mailbox providers within your Amazon SES account. Tenant management delivers granular control over resource allocation. You can assign tenant-specific or shared sending identities based on your organizational requirements. Each tenant can receive dedicated SMTP or API credentials that provide secure access to their allocated resources. You can configure tenant-level IP pools that separate sending traffic and maintain distinct reputation profiles for each tenant. You can use tenant management to manage tenant-specific configuration sets that define how emails are processed and tracked for each tenant. You can associate email templates with specific tenants, confirms that branded communications remain properly segmented and controlled. This isolation helps ensure that one tenant’s actions cannot affect the reputation or performance of other tenants. When a tenant experiences delivery issues or reputation problems, these challenges remain contained within their dedicated resources. This approach maintains fairness across all tenants and establishes clear individual accountability for email practices.

Monitor tenant-specific metrics in real time: You can access specific reputation metrics for each tenant, including raw bounce rates and complaint rates that directly impact sender reputation. You can use this system to set up tenant-level event destinations though the respective configuration set mapped to the tenant for detailed tracking and analysis. With this, you gain access to detailed tenant-level events that track performance, engagement, and compliance metrics for each tenant individually. You can also establish automated enforcement policies based on configurable thresholds that align with your business requirements. When tenant reputation findings are detected or when tenant status changes occur, you can receive real-time alerts through Amazon EventBridge.
Scale to hundreds of thousands of tenants: The system is designed to handle massive scale (starting at 10 thousand tenants per account and can increase to as many as 300 thousand tenants), allowing you to grow your business or expand your email operations without worrying about infrastructure limitations. Whether you’re managing dozens or hundreds of thousands of tenants, the system will adapt to your needs.
Automate tenant management workflows: You can set up automated processes for onboarding new tenants, applying policies, and managing tenant lifecycles. With this system, you can use API and console interfaces to create, modify, and delete tenants and have the flexibility to pause or resume sending capabilities as required. This automation reduces manual overhead for consistent application of your email sending standards across all tenants.
Take targeted enforcement actions to maintain high deliverability: If issues arise with a specific tenant, you can take precise actions—such as suspending sending privileges or applying stricter reputation policies—without affecting other tenants. This capability helps maintain overall high deliverability rates for your entire operation.

These features collectively represent an advancement in email management capabilities, so organizations can offer more sophisticated, scalable, and reliable email services to their clients or internal departments while maintaining strict control over reputation and compliance.

How tenant management works

You can use the tenant management feature from Amazon SES to segment your email sending operations effectively. You can use the system to create multiple tenants within a single Amazon SES account, with each tenant having its own dedicated resources. These resources include essential components such as sending identities, SMTP credentials, configuration sets, and dedicated IP pools. What makes this architecture particularly flexible is the ability to share common resources across tenants, such as IP pools and configuration sets, enabling optimal resource utilization while maintaining operational separation. The following diagram illustrates the preceding information in detail.

Prerequisites

To get started with tenant management, you need:

An AWS account
Verified sending identities within Amazon SES (domains or email addresses)
Configuration sets for email settings
A clear understanding of your tenant structure based on your business requirements

How to set up tenant management and its key considerations

Setting up a multi-tenant system in Amazon SES requires careful configuration of three key components: IP pools, domain verification, and configuration sets. By following the set-up procedure, each tenant will have isolated resources, proper tracking, and monitoring capabilities. Using the AWS Management Console for Amazon SES or the Amazon SES APIs, you can create a robust email sending infrastructure that maintains high deliverability while keeping each tenant’s reputation separate.

IP pool configuration

IP pool configuration is a fundamental step to send email communications using Amazon SES. Begin your multi-tenant setup by establishing dedicated IP pools or managed IP pools for each customer though a configuration set. First, access the Amazon SES console and navigate to the Dedicated IP pools section. Create a new Standard dedicated IP pool, giving it a name that clearly identifies the customer. Through AWS Support, request the specific number of IP addresses needed based on your customer’s sending volume—typically one IP per 50,000 daily emails. After the IPs are provisioned, assign them to the appropriate pool. Then, map the IP pool with the configuration set mapped to the tenant. For IP warm-up, you have two options: enable the automatic warm-up schedule, which gradually increases sending volume over 45 days, or disable it to implement your own custom warm-up plan. Monitor the warm-up progress closely to help ensure optimal delivery rates.

Domain verification process

After setting up the IP pool, proceed with domain verification to establish your customer’s sending identity. Navigate to the “verified Identities” (verified identities are the domains or email ids those you have already whitelisted with Amazon SES) section in the Amazon SES console and create a new domain identity using your customer’s domain name. Amazon SES will provide DKIM records that need to be added to the domain’s DNS settings. Work with your customer to implement these records correctly in their DNS configuration. The verification process typically takes 24–72 hours to complete. During this time, regularly check the verification status in the Amazon SES console to make sure the process completes successfully.

Authentication and authorization for tenants

In addition to restricting email sending to specific identities and configurations, you can restrict email sending permissions by tenant using AWS Identity and Access Management (IAM) user or role policies. The following policy demonstrates these restrictions by allowing emails only when the tenant Amazon Resource Name (ARN) is arn:aws:ses:us-east-1:111122223333:tenant/testTenant1/tn-e08a68010000a3e4c67bcd990910, the identity is arn:aws:ses:us-east-1:111122223333:identity/example.com and the configuration-set is arn:aws:ses:us-east-1:111122223333:configuration-set/testTenant1.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ses:SendRawEmail",
      "Resource": [
        "arn:aws:ses:us-east-1:111122223333:identity/example.com",
        "arn:aws:ses:us-east-1:111122223333:configuration-set/TestTenant1"
      ],
  "Condition": 
{ "StringEquals": 
{ "ses:TenantName": "testTenant1" }
}
    }
  ]
}

Set up a configuration set

The final step involves creating and configuring the configuration set, which manages tracking and monitoring. Start by creating a new configuration set under configuration set section in the Amazon SES console, naming it to match your customer’s identification. Configure the custom tracking domain and enable appropriate tracking settings for opens and clicks. Link this configuration to the previously created IP pool. Next, set up event destinations to monitor email performance—this can include Amazon CloudWatch metrics, Amazon Data Firehose, or Amazon Simple Notification Service (Amazon SNS) topics. In CloudWatch, create alarms for critical metrics such as bounce rates (recommended threshold: 5%) and complaint rates (recommended threshold: 0.1%). Set up notification systems to alert your team when these thresholds are breached, so you can quickly respond to any delivery issues.

Sample CLI commands

To start using tenant management, you can use the console, AWS Command Line Interface (AWS CLI), or AWS SDKs. The following are basic examples of creating and configuring a tenant using the AWS CLI:

Following states a life cycle of the tenant management procedure starting from creating a tenant to deleting it in case you want to remove the tenant. Make sure that you are using AWS CLI version to 2.28.0 or later. See AWS CLI install and update instructions if necessary.

Create a new tenant

aws sesv2 create-tenant --tenant-name testTenant1 --region us-east-1
Note that “-–region” value is optional

Assign a sending identity to the tenant (domain or email ID)

aws sesv2 create-tenant-resource-association --tenant-name testTenant1 --resource-arn arn:aws:ses:us-east-1:111122223333:identity/example.com

Add a configuration set to the tenant

aws sesv2 create-tenant-resource-association --tenant-name testTenant1 --resource-arn arn:aws:ses:us-east-1:111122223333:configuration-set/test1

The assumption here is that the selected configuration set already has an IP-Pool associated.

Get tenant information through get-tenants or list-tenants

aws sesv2 get-tenant --tenant-name testTenant1 --region us-east-1 

aws sesv2 list-tenants --region us-east-1 <List all the tenants with their ARN>

You can use get-tenant or List-tenants to get information about a specific tenant, including the tenant’s name, ID, ARN, creation timestamp, tags, and sending status or list-tenants to list all tenants associated with your account

List resources of a tenant

aws sesv2 list-tenant-resources --tenant-name testTenant1

Send email using tenant

aws sesv2 send-email --from-email-address "[email protected]" --destination "[email protected] " --configuration-set-name test1   --content '{"Simple":{"Subject":{"Data":"Your email subject","Charset":"UTF-8"},"Body":{"Text":{"Data":"This is the plain text version.","Charset":"UTF-8"},"Html":{"Data":"<html><body><h1>This is the HTML version</h1><p>With formatted content.</p></body></html>","Charset":"UTF-8"}}}}' --tenant-name testTenant1

To change the reputation policy from standard to strict (Standard policy is applied by default)

aws sesv2 update-reputation-entity-policy --reputation-entity-type RESOURCE --reputation-entity-reference arn:aws:ses:us-east-1: 111122223333:tenant/ testTenant1/tn-145f7885b000074362bfa074ec4e1 --reputation-entity-policy arn:aws:ses:us-east-1:aws:reputation-policy/strict

Disable sending for a tenant (to temporarily disable or pause a tenant)

aws sesv2 update-reputation-entity-customer-managed-status  —reputation-entity-type RESOURCE —reputation-entity-reference arn:aws:ses:us-east-1:111122223333:tenant/testTenant1/tn-145f7885b000074362bfa074ec4e1 —sending-status DISABLED

Delete the tenant (remove the tenant completely from the Amazon SES account)

aws sesv2 delete-tenant --tenant-name testTenant1

Send emails using SMTP

The X-SES-TENANT header is utilized by AWS to manage emails across multiple tenants. You can specify the tenant name by including it in the X-SES-TENANT field. This approach allows for better organization and routing of emails based on tenant information. To implement this, you can add the X-SES-TENANT header when sending emails using SMTP. The following Python code demonstrates how to include this header in your email sending process::

<Pseudo code>
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

def send_email(smtp_server, port, username, password, from_email, to_email, subject, body, config_set=None):
    msg = MIMEMultipart()
    msg['From'] = from_email
    msg['To'] = to_email
    msg['Subject'] = subject
    msg['X-SES-TENANT'] = 'test1'
    if config_set:
        msg['X-SES-CONFIGURATION-SET'] = config_set
    msg.attach(MIMEText(body, 'plain'))

    with smtplib.SMTP(smtp_server, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)

# Example usage:
send_email('email-smtp.us-east-1.amazonaws.com', 587, 'YOUR_SMTP_USERNAME', 'YOUR_SMTP_PASSWORD','[email protected]', '[email protected]', 'Test Subject', 'Hello World', 'test1')

Email event feedback loop management for tenants

Receiving email events or using a feedback loop is important to monitor the email sending practices followed by the tenants. Tenant management provides reputation management capabilities for multi-tenant environments, so organizations can maintain granular control over email sending practices across their tenant base. You can automatically monitor and enforce reputation-based policies at the tenant level, so that problematic email sending behavior from one tenant doesn’t impact the deliverability of others. When reputation issues are detected, Amazon SES can automatically pause sending for the affected tenant while allowing other tenants to continue their email operations unimpeded.Organizations can now implement precise enforcement mechanisms through automated reputation findings that provide early detection of potential deliverability issues. Tenant isolation uses machine learning models and signal-based detection to identify problematic patterns in email sending behaviour. When issues are detected, Amazon SES automatically notifies the parent account and can trigger predetermined actions based on customizable thresholds. This granular control helps maintain strong deliverability rates across the entire email sending infrastructure while isolating and addressing issues at the tenant level.

Enforcement data and patterns

Unlike other communication channels that are governed by a patchwork of national laws, bulk email delivery is subject to requirements dictated by a handful of large inbox providers. Google, Yahoo, Microsoft, and several others set deliverability targets, leaving compliance up to the sender or service providers such as Amazon SES. Amazon SES, in turn, expects its direct customers, including multi-tenant providers, to monitor key signals of enforcement. If any of the tenants send rogue emails, Amazon SES expects the AWS customer to monitor key enforcement signals and take appropriate actions such as pausing or stopping the rogue tenants. Signals for enforcement and trust indicators are essential components of our email reputation management system. These signals are various data points and behaviours we monitor to assess the trustworthiness of email senders. Trust indicators, derived from these signals, provide a measure of a sender’s reputation and adherence to best practices. Amazon SES uses a combination of pre-send signals (such as account vetting and configuration) and post-send signals (including delivery success rates, bounce rates, and recipient engagement) to calculate reputation findings. These findings then inform automated enforcement actions and manual reviews, helping to ensure that our service maintains high deliverability standards while protecting recipients from unwanted or malicious emails. By continuously refining our signal analysis and enforcement processes, we strive to create a reliable and secure email ecosystem for all users.

Administrating tenant isolation and reputation management

When managing multiple tenants sending email through your SES account, you’ll want to monitor sending behaviour and reputation. Amazon SES provides a comprehensive monitoring system through reputation findings, which alert you when tenants exhibit concerning sending patterns. These findings appear in your dashboard and are delivered as events through Amazon EventBridge default event bus, letting you know immediately when issues arise.

As an email deferability administrator, your daily monitoring routine needs to include reviewing the tenant management dashboard where you can see all your tenants’ status at a glance. Pay particular attention to any reputation findings, which come in two levels—low and high severity. These findings indicate when tenants exceed acceptable thresholds for metrics like bounce rates or complaint rates. You can configure reputation policies to automatically pause tenant sending when these thresholds are breached, with options for standard enforcement (pausing on high severity findings) or strict enforcement (pausing on low severity findings).

When a tenant is paused, either automatically or manually, you’ll need to investigate the cause. The reputation findings provide detailed information about what triggered the pause, such as elevated bounce rates or complaint rates. After addressing the underlying issues with the tenant, you can reinstate their sending capabilities. During reinstatement, the tenant can continue sending while you monitor their metrics to verify that they return to healthy levels. After their metrics improve, the tenant will automatically transition back to a normal enabled status.

Available metrics and data points

These core reputation metrics are released by Amazon SES and can be routed to EventBridge. The event feedback loop will contain the tenant name and ID to enable tracking of tenant-specific bounce rates

Complaint rates per tenant
Third-party specific complaint rates
Spamhaus IP listing status
Email volume pattern

For ongoing management, you have full control over tenant resources and configurations. You can assign or remove sending identities and configuration sets as needed, adjust reputation policies, and manually pause sending if you observe concerning patterns. By using this combination of automated monitoring, clear reputation signals, and flexible management tools, you can maintain control over your tenants while preventing individual tenant issues from affecting your overall account reputation. The key is to stay proactive in monitoring the dashboard and reputation findings, and to act quickly when issues arise.

Conclusion

We’re excited to see how our customers will use the tenant management feature to transform their email operations, boost efficiency, and create better experiences for their users. To get started with tenant isolation simply visit the Amazon SES console or see Tenants in the Amazon SES Developer Guide. You can find details about pricing on the Amazon SES pricing page. We’re committed to improving tenant isolation and management based on your feedback and needs, and we look forward to bringing you even more powerful and flexible email management capabilities in the future. Start exploring multi-tenant management today with Amazon SES.

About the authors

Improving network observability with new AWS Outposts racks network metrics

2025-08-06 Adam Duffield

Post Syndicated from Adam Duffield original https://aws.amazon.com/blogs/compute/improving-network-observability-with-new-aws-outposts-racks-network-metrics/

With AWS Outposts racks, you can extend AWS infrastructure, services, APIs, and tools to on-premises locations. Providing performant, stable, and resilient network connections to both the parent AWS Region as well as the local network is essential to maintaining uninterrupted service.

The release of two new Amazon CloudWatch metrics, VifConnectionStatus and VifBgpSessionState, gives you greater visibility into the operational status of the Outpost network connections. In this post, we discuss how to use these metrics to quickly identify network disruptions, using additional data points that can help reduce time to resolution.

Outposts network connectivity overview

When connecting an Outposts rack to your chosen data center location, network connections are made between the Outpost Networking Devices (ONDs) and Customer Network Devices (CNDs). These network connections support both the Service Link connectivity back to the chosen anchor Region and connectivity to the on-premises local network through the Local Gateway. First-generation Outposts racks include a minimum of two network devices to provide resilience, with second-generation Outposts racks including four network devices.

Virtual interfaces (VIFs) are used to establish IP network connectivity between the Outpost and CNDs, using Border Gateway Protocol (BGP) for dynamic routing. You can view the details for these VIFs on the Outposts console by choosing Link aggregation groups (LAGs) in the navigation pane and drilling down to find the specific service link and local gateway VIF information. For each connection between an OND and CND, two BGP sessions are established: one to support service link traffic and the other to support local gateway traffic.

The following diagram shows an example of this connectivity for a first-generation Outposts rack.

Figure 1: First-Generation Outposts Rack network connections

In this configuration, a total of four VIFs are configured into two link aggregation groups (LAGs): one on each OND for the service link and local gateway VIFs.

Understanding the new CloudWatch metrics for Outposts

Observability into the operational status of Outposts rack, including the status and performance of network connectivity, is important for you to be able to quickly identify and investigate potential issues. With the addition of the VifConnectionStatus and VifBgpSessionState Outposts metrics in CloudWatch, you have greater visibility into the connection status of the Outposts rack to your CNDs. The VifConnectionStatus metric is provided on a per-VIF level, available for both the local gateway and service link VIFs. It provides an indication on the status of the VIF using two possible values:

A value of 1 indicates that the VIF is successfully connected to the CND with established BGP sessions and able to transmit traffic
A value of 0 indicates that the VIF is not in an operational state due to an underlying issue

The VifBgpSessionState metric goes deeper into the BGP connectivity status between each Outposts VIF and CND. A BGP session can be in one of multiple states, each providing insight into where a potential issue might be. To reflect this, the CloudWatch metric value shown relates to the following BGP states:

IDLE – The initial state; the ONDs are waiting for a start event
Connect – The Outposts rack is waiting for the TCP connection to be complete
Active – The Outposts rack is trying to initiate a TCP connection
OpenSent – The router has sent an OPEN message and is waiting for a response
OpenConfirm – The router has received an OPEN message and is waiting for a KEEPALIVE response
Established – The BGP connection is fully established and the ONDs and CNDs can exchange routing information

With these metrics now available in CloudWatch, you can configure Amazon CloudWatch alarms to alert when the metric values indicate potential issues. You can combine existing CloudWatch metrics for Outposts racks with these new metrics to give additional context and visibility into network connectivity status.

Using CloudWatch metrics to investigate Outposts network connectivity issues

In the event of network connectivity issues, it’s important to understand how to use these metrics to assist with investigations and understand potential causes when seeing network impairment. To start with, the Configuration state of the VIFs should be checked. For each VIF, there are four possible states:

Pending – A VIF is in this state from the time that it is created within a VIF group until the VIF becomes active on the OND
Available – A VIF is active on ONDs
Deleting – A VIF is in this state immediately after requesting deletion
Deleted – A VIF is deleted

To check the state of an individual VIF on the Outposts console, choose Networking followed by Link aggregation groups (LAGS) in the navigation pane. The service link and local gateway VIFs associated with a specific LAG are shown, and when you choose a specific LAG, the configuration state of the associated VIFs are visible.

Figure 2: AWS Outposts console showing VIF configuration details

You can also retrieve these details programmatically. For example, use the following AWS Command Line Interface (AWS CLI) command to specifically check the configuration state of a service link VIF with ID sl-vif-087faf21db43ba723:

aws ec2 describe-service-link-virtual-interfaces \
--service-link-virtual-interface-id sl-vif-087faf21db43ba723
{
    "ServiceLinkVirtualInterfaces": [
        {
            "ServiceLinkVirtualInterfaceId": "sl-vif-087faf21db43ba723",
            "ServiceLinkVirtualInterfaceArn": "arn:aws:ec2:us-west-2:111122223333:service-link-virtual-interface/sl-vif-087faf21db43ba723",
            "OutpostId": "op-07f6f537e0607d3f1",
            "OutpostArn": "arn:aws:outposts:us-west-2:111122223333:outpost/op-07f6f537e0607d3f1",
            "OwnerId": "280066404755",
            "LocalAddress": "XX.XX.XX.XX/XX",
            "PeerAddress": " XX.XX.XX.XX/XX ",
            "PeerBgpAsn": 65000,
            "Vlan": 2006,
            "OutpostLagId": "op-lag-03782b844d7da1afc",
            "Tags": [],
            "ConfigurationState": "available"
        }
    ]
}

After confirming the Configuration state, you can use the VifConnectionStatus metric to determine the network connectivity status of individual VIFs. When operating and processing traffic in a healthy state, the value of this metric is 1. If this value changes to 0, it indicates a connectivity problem for that VIF between the Outpost and CNDs.

To further understand the potential cause of the VifConnectionStatus value, you can use the VifBgpSessionState metric. Under normal operational status, this metric value is 6, indicating that the BGP session is established and traffic can be sent and received. However, if this metric value changes to 1–5, then it is indicative of an issue. To start investigating the cause of this, you should review VIF configuration both on the Outposts console and programmatically. This includes the values set on the OND for VLAN, local and peer addresses, and BGP ASN. These values can be validated against the configuration on your on-premises CNDs if required. Furthermore, you can use the VifBgpSessionState metric value to determine the potential cause:

If the value is 1, validate the values for BGP ASN and peer addresses
If the value is 2, this might indicate port or IP address issues
If the value is 3, this might indicate BGP version mismatches
If the value is 4 or 5, this refers to networking path problems

By using a combination of these metrics, you can gain a clearer understanding of the potential network issue without having to engage with AWS or third-party support teams.

You can view and query these metrics on the CloudWatch console. In the navigation pane, choose All metrics, followed by Outposts under the AWS namespaces section. The Outposts namespace can only be viewed by the Outposts owner account, unless CloudWatch cross-account observability is configured. The new VifConnectionStatus and VifBgpSessionState metrics can be found under the OutpostsID, VirtualInterfaceGroupId, VirtualInterfaceId dimension.

Figure 3: Amazon CloudWatch metrics for AWS Outposts

For more information on working with metrics, see Metrics in Amazon CloudWatch. For creating alerts based upon these new metrics and their values, refer to Using Amazon CloudWatch alarms.

The resilient design of using multiple ONDs for both service link and local gateway traffic allows workloads to continue to run in the event of connectivity issues for single VIFs. For example, a single service link VIF might report as being down, but the remaining service link VIFs might be unaffected and remain available. In this scenario, the service link itself would remain functional and connected, albeit with potentially lower resilience and capacity. This can be validated throught the ConnectedStatus metric which would have a value of 1.

Conclusion

This post provided details on the newly released CloudWatch metrics for Outposts racks, VifConnectionStatus and VifBgpSessionState, and how you can use them to investigate potential connectivity issues. For more information on Outposts rack networking patterns, see the Networking section of the Outposts High Availability Design and Architecture Considerations whitepaper. For more information about additional CloudWatch metrics that are available, check out the CloudWatch metrics for AWS Outposts documentation for second-generation Outposts racks and first-generation Outposts racks.

Reach out to your AWS account team, or fill out this form to learn more about observability for Outposts.

Minimize AI hallucinations and deliver up to 99% verification accuracy with Automated Reasoning checks: Now available

2025-08-06 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/minimize-ai-hallucinations-and-deliver-up-to-99-verification-accuracy-with-automated-reasoning-checks-now-available/

Today, I’m happy to share that Automated Reasoning checks, a new Amazon Bedrock Guardrails policy that we previewed during AWS re:Invent, is now generally available. Automated Reasoning checks helps you validate the accuracy of content generated by foundation models (FMs) against a domain knowledge. This can help prevent factual errors due to AI hallucinations. The policy uses mathematical logic and formal verification techniques to validate accuracy, providing definitive rules and parameters against which AI responses are checked for accuracy.

This approach is fundamentally different from probabilistic reasoning methods which deal with uncertainty by assigning probabilities to outcomes. In fact, Automated Reasoning checks delivers up to 99% verification accuracy, providing provable assurance in detecting AI hallucinations while also assisting with ambiguity detection when the output of a model is open to more than one interpretation.

With general availability, you get the following new features:

Support for large documents in a single build, up to 80K tokens – Process extensive documentation; we found this can add up to 100 pages of content
Simplified policy validation – Save your validation tests and run them repeatedly, making it easier to maintain and verify your policies over time
Automated scenario generation – Create test scenarios automatically from your definitions, saving time and effort while helping make coverage more comprehensive
Enhanced policy feedback – Provide natural language suggestions for policy changes, simplifying the way you can improve your policies
Customizable validation settings – Adjust confidence score thresholds to match your specific needs, giving you more control over validation strictness

Let’s see how this works in practice.

Creating Automated Reasoning checks in Amazon Bedrock Guardrails
To use Automated Reasoning checks, you first encode rules from your knowledge domain into an Automated Reasoning policy, then use the policy to validate generated content. For this scenario, I’m going to create a mortgage approval policy to safeguard an AI assistant evaluating who can qualify for a mortgage. It is important that the predictions of the AI system do not deviate from the rules and guidelines established for mortgage approval. These rules and guidelines are captured in a policy document written in natural language.

In the Amazon Bedrock console, I choose Automated Reasoning from the navigation pane to create a policy.

I enter name and description of the policy and upload the PDF of the policy document. The name and description are just metadata and do not contribute in building the Automated Reasoning policy. I describe the source content to add context on how it should be translated into formal logic. For example, I explain how I plan to use the policy in my application, including sample Q&A from the AI assistant.

When the policy is ready, I land on the overview page, showing the policy details and a summary of the tests and definitions. I choose Definitions from the dropdown to examine the Automated Reasoning policy, made of rules, variables, and types that have been created to translate the natural language policy into formal logic.

The Rules describe how variables in the policy are related and are used when evaluating the generated content. For example, in this case, which are the thresholds to apply and how some of the decisions are taken. For traceability, each rule has its own unique ID.

The Variables represent the main concepts at play in the original natural language documents. Each variable is involved in one or more rules. Variables allow complex structures to be easier to understand. For this scenario, some of the rules need to look at the down payment or at the credit score.

Custom Types are created for variables that are neither boolean nor numeric. For example, for variables that can only assume a limited number of values. In this case, there are two type of mortgage described in the policy, insured and conventional.

Now we can assess the quality of the initial Automated Reasoning policy through testing. I choose Tests from the dropdown. Here I can manually enter a test, consisting of input (optional) and output, such as a question and its possible answer from the interaction of a customer with the AI assistant. I then set the expected result from the Automated Reasoning check. The expected result can be valid (the answer is correct), invalid (the answer is not correct), or satisfiable (the answer could be true or false depending on specific assumptions). I can also assign a confidence threshold for the translation of the query/content pair from natural language to logic.

Before I enter tests manually, I use the option to automatically generate a scenario from the definitions. This is the easiest way to validate a policy and (unless you’re an expert in logic) should be the first step after the creation of the policy.

For each generated scenario, I provide an expected validation to say if it is something that can happen (satisfiable) or not (invalid). If not, I can add an annotation that can then be used to update the definitions. For a more advanced understanding of the generated scenario, I can show the formal logic representation of a test using SMT-LIB syntax.

After using the generate scenario option, I enter a few tests manually. For these tests, I set different expected results: some are valid, because they follow the policy, some are invalid, because they flout the policy, and some are satisfiable, because their result depends on specific assumptions.

Then, I choose Validate all tests to see the results. All tests passed in this case. Now, when I update the policy, I can use these tests to validate that the changes didn’t introduce errors.

For each test, I can look at the findings. If a test doesn’t pass, I can look at the rules that created the contradiction that made the test fail and go against the expected result. Using this information, I can understand if I should add an annotation, to improve the policy, or correct the test.

Now that I’m satisfied with the tests, I can create a new Amazon Bedrock guardrail (or update an existing one) to use up to two Automated Reasoning policies to check the validity of the responses of the AI assistant. All six policies offered by Guardrails are modular, and can be used together or separately. For example, Automated Reasoning checks can be used with other safeguards such as content filtering and contextual grounding checks. The guardrail can be applied to models served by Amazon Bedrock or with any third-party model (such as OpenAI and Google Gemini) via the ApplyGuardrail API. I can also use the guardrail with an agent framework such as Strands Agents, including agents deployed using Amazon Bedrock AgentCore.

Now that we saw how to set up a policy, let’s look at how Automated Reasoning checks are used in practice.

Customer case study – Utility outage management systems
When the lights go out, every minute counts. That’s why utility companies are turning to AI solutions to improve their outage management systems. We collaborated on a solution in this space together with PwC. Using Automated Reasoning checks, utilities can streamline operations through:

Automated protocol generation – Creates standardized procedures that meet regulatory requirements
Real-time plan validation – Ensures response plans comply with established policies
Structured workflow creation – Develops severity-based workflows with defined response targets

At its core, this solution combines intelligent policy management with optimized response protocols. Automated Reasoning checks are used to assess AI-generated responses. When a response is found to be invalid or satisfiable, the result of the Automated Reasoning check is used to rewrite or enhance the answer.

This approach demonstrates how AI can transform traditional utility operations, making them more efficient, reliable, and responsive to customer needs. By combining mathematical precision with practical requirements, this solution sets a new standard for outage management in the utility sector. The result is faster response times, improved accuracy, and better outcomes for both utilities and their customers.

In the words of Matt Wood, PwC’s Global and US Commercial Technology and Innovation Officer:

“At PwC, we’re helping clients move from AI pilot to production with confidence—especially in highly regulated industries where the cost of a misstep is measured in more than dollars. Our collaboration with AWS on Automated Reasoning checks is a breakthrough in responsible AI: mathematically assessed safeguards, now embedded directly into Amazon Bedrock Guardrails. We’re proud to be AWS’s launch collaborator, bringing this innovation to life across sectors like pharma, utilities, and cloud compliance—where trust isn’t a feature, it’s a requirement.”

Things to know
Automated Reasoning checks in Amazon Bedrock Guardrails is generally available today in the following AWS Regions: US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt, Ireland, Paris).

With Automated Reasoning checks, you pay based on the amount of text processed. For more information, see Amazon Bedrock pricing.

To learn more, and build secure and safe AI applications, see the technical documentation and the GitHub code samples. Follow this link for direct access to the Amazon Bedrock console.

The videos in this playlist include an introduction to Automated Reasoning checks, a deep dive presentation, and hands-on tutorials to create, test, and refine a policy.

— Danilo

OpenAI open weight models now available on AWS

2025-08-06 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/openai-open-weight-models-now-available-on-aws/

AWS is committed to bringing you the most advanced foundation models (FMs) in the industry, continuously expanding our selection to include groundbreaking models from leading AI innovators so that you always have access to the latest advancements to drive your business forward.

Today, I am happy to announce the availability of two new OpenAI models with open weights in Amazon Bedrock and Amazon SageMaker JumpStart. OpenAI gpt-oss-120b and gpt-oss-20b models are designed for text generation and reasoning tasks, offering developers and organizations new options to build AI applications with complete control over their infrastructure and data.

These open weight models excel at coding, scientific analysis, and mathematical reasoning, with performance comparable to leading alternatives. Both models support a 128K context window and provide adjustable reasoning levels (low/medium/high) to match your specific use case requirements. The models support external tools to enhance their capabilities and can be used in an agentic workflow, for example, using a framework like Strands Agents.

With Amazon Bedrock and Amazon SageMaker JumpStart, AWS gives you the freedom to innovate with access to hundreds of FMs from leading AI companies, including OpenAI open weight models. With our comprehensive selection of models, you can match your AI workloads to the perfect model every time.

Through Amazon Bedrock, you can seamlessly experiment with different models, mix and match capabilities, and switch between providers without rewriting code—turning model choice into a strategic advantage that helps you continuously evolve your AI strategy as new innovations emerge. At launch, these new models are available in Bedrock via an OpenAI compatible endpoint. You can point the OpenAI SDK to this endpoint or use the Bedrock InvokeModel and Converse API.

With SageMaker JumpStart, you can quickly evaluate, compare, and customize models for your use case. You can then deploy the original or the customized model in production with the SageMaker AI console or using the SageMaker Python SDK.

Let’s see how these work in practice.

Getting started with OpenAI open weight models in Amazon Bedrock
In the Amazon Bedrock console, I choose Model access from the Configure and learn section of the navigation pane. Then, I navigate to the two listed OpenAI models on this page and request access.

Console screenshot

Now that I have access, I use the Chat/Test playground to test and evaluate the models. I select OpenAI as the category and then the gpt-oss-120b model.

Console screenshot

Using this model, I run the following sample prompt:

A family has $5,000 to save for their vacation next year. They can place the money in a savings account earning 2% interest annually or in a certificate of deposit earning 4% interest annually but with no access to the funds until the vacation. If they need $1,000 for emergency expenses during the year, how should they divide their money between the two options to maximize their vacation fund?

This prompt generates an output that includes the chain of thought used to produce the result.

I can use these models with the OpenAI SDK by configuring the API endpoint (base URL) and using an Amazon Bedrock API key for authentication. For example, I set this environment variables to use the US West (Oregon) AWS Region endpoint (us-west-2) and my Amazon Bedrock API key:

export OPENAI_API_KEY="<my-bedrock-api-key>"
export OPENAI_BASE_URL="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

Now I invoke the model using the OpenAI Python SDK.

client = OpenAI()

response = client.chat.completion.create(
    messages=[{
        "role": "user",
        "content": "Hello, how are you?"
    }],
    model="openai.gpt-oss-120b-1:0",
    stream=True
)

for item in response:
    print(item)

To build an AI agent, I can choose any framework that supports the Amazon Bedrock API or the OpenAI API. For example, here’s the starting code for Strands Agents using the Amazon Bedrock API:

from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator

model = BedrockModel(
    model_id="openai.gpt-oss-120b-1:0"
)
agent = Agent(
    model=model,
    tools=[calculator]
)

agent("Tell me the square root of 42 ^ 3")

I save the code (app.py file), install the dependencies, and run the agent locally:

pip install strands-agents strands-agents-tools
python app.py

When I am satisfied with the agent, I can deploy in production using the capabilities offered by Amazon Bedrock AgentCore, including a fully managed serverless runtime and memory and identity management.

Getting started with OpenAI open weight models in Amazon SageMaker JumpStart
In the Amazon SageMaker AI console, you can use OpenAI open weight models in the SageMaker Studio. The first time I do this, I need to set up a SageMaker domain. There are options to set it up for a single user (simpler) or an organization. For these tests, I use a single user setup.

In the SageMaker JumpStart model view, I have access to a detailed description of the gpt-oss-120b or gpt-oss-20b model.

I choose the gpt-oss-20b model and then deploy the model. In the next steps, I select the instance type and the initial instance count. After a few minutes, the deployment creates an endpoint that I can then invoke in SageMaker Studio and using any AWS SDKs.

To learn more, visit GPT OSS models from OpenAI are now available on SageMaker JumpStart in the AWS Artificial Intelligence Blog.

Things to know
The new OpenAI open weight models are now available in Amazon Bedrock in the US West (Oregon) AWS Region, while Amazon SageMaker JumpStart supports these models in US East (Ohio, N. Virginia) and Asia Pacific (Mumbai, Tokyo).

Each model comes equipped with full chain-of-thought output capabilities, providing you with detailed visibility into the model’s reasoning process. This transparency is particularly valuable for applications requiring high levels of interpretability and validation. These models give you the freedom to modify, adapt, and customize them to your specific needs. This flexibility allows you to fine-tune the models for your unique use cases, integrate them into your existing workflows, and even build upon them to create new, specialized models tailored to your industry or application.

Security and safety are built into the core of these models, with comprehensive evaluation processes and safety measures in place. The models maintain compatibility with the standard GPT-4 tokenizer.

Both models can be used in your preferred environment, whether that’s through the serverless experience of Amazon Bedrock or the extensive machine learning (ML) development capabilities of SageMaker JumpStart. For information about the costs associated with using these models and services, visit the Amazon Bedrock pricing and Amazon SageMaker AI pricing pages.

To learn more, see the parameters for the models and the chat completions API in the Amazon Bedrock documentation.

Get started today with OpenAI open weight models on AWS in the Amazon Bedrock console or in Amazon SageMaker AI console.

– Danilo

Introducing Amazon Elastic VMware Service for running VMware Cloud Foundation on AWS

2025-08-05 Micah Walter

Post Syndicated from Micah Walter original https://aws.amazon.com/blogs/aws/introducing-amazon-elastic-vmware-service-for-running-vmware-cloud-foundation-on-aws/

Today, we’re announcing the general availability of Amazon Elastic VMware Service (Amazon EVS), a new AWS service that lets you run VMware Cloud Foundation (VCF) environments directly within your Amazon Virtual Private Cloud (Amazon VPC). With Amazon EVS, you can deploy fully functional VCF environments in just hours using a guided workflow, while running your VMware workloads on qualified Amazon Elastic Compute Cloud (Amazon EC2) bare metal instances and seamlessly integrating with AWS services such as Amazon FSx for NetApp ONTAP.

Many organizations running VMware workloads on premises want to move to the cloud to benefit from improved scalability, reliability, and access to cloud services, but migrating these workloads often requires substantial changes to applications and infrastructure configurations. Amazon EVS lets customers continue using their existing VMware expertise and tools without having to re-architect applications or change established practices, thereby simplifying the migration process while providing access to AWS’s scale, reliability, and broad set of services.

With Amazon EVS, you can run VMware workloads directly in your Amazon VPC. This gives you full control over your environments while being on AWS infrastructure. You can extend your on-premises networks and migrate workloads without changing IP addresses or operational runbooks, reducing complexity and risk.

Key capabilities and features

Amazon EVS delivers a comprehensive set of capabilities designed to streamline your VMware workload migration and management experience. The service enables seamless workload migration without the need for replatforming or changing hypervisors, which means you can maintain your existing infrastructure investments while moving to AWS. Through an intuitive, guided workflow on the AWS Management Console, you can efficiently provision and configure your EVS environments, significantly reducing the complexity to migrate your workloads to AWS.

With Amazon EVS, you can deploy a fully functional VCF environment running on AWS in a few hours. This process eliminates many of the manual steps and potential configuration errors that often occur during traditional deployments. Furthermore, with Amazon EVS you can optimize your virtualization stack on AWS. Given the VCF environment runs inside your VPC, you have full full administrative access to the environment and the associated management appliances. You also have the ability to integrate third-party solutions, from external storage such as Amazon FSx for NetApp ONTAP or Pure Cloud Block Store or backup solutions such as Veeam Backup and Replication.

The service also gives you the ability to self-manage or work with AWS Partners to build, manage, and operate your environments. This provides you with flexibility to match your approach with your overall goals.

Setting up a new VCF environment

Organizations can streamline their setup process by ensuring they have all the necessary pre-requisites in place ahead of creating a new VCF environment. These prerequisites include having an active AWS account, configuring the appropriate AWS Identity and Access Management (IAM) permissions, and setting up a Amazon VPC with sufficient CIDR space and two Route Server endpoints, with each endpoint having its own peer. Additionally, customers will need to have their VMware Cloud Foundation license keys ready, secure Amazon EC2 capacity reservations specifically for i4i.metal instances, and prepare their VLAN subnet information planning.

To help ensure a smooth deployment process, we’ve provided a Getting started hub, which you can access from the EVS homepage as well as a comprehensive guide in our documentation. By following these preparation steps, you can avoid potential setup delays and ensure a successful environment creation.

Screenshots of EVS onboarding

Let’s walk through the process of setting up a new VCF environment using Amazon EVS.

Screenshots of EVS onboarding

You will need to provide your Site ID, which is allocated by Broadcom when purchasing VCF licenses, along with your license keys. To ensure a successful initial deployment, you should verify you have sufficient licensing coverage for a minimum of 256 cores. This translates to at least four i4i.metal instances, with each instance providing 64 physical cores.

This licensing requirement helps you maintain optimal performance and ensures your environment meets the necessary infrastructure specifications. By confirming these requirements upfront, you can avoid potential deployment delays and ensure a smooth setup process.

Screenshots of EVS onboarding

Once you have provided all the required details, you will be prompted to specify your host details. These are the underlying Amazon EC2 instances that your VCF environment will get deployed in.

Screenshots of EVS onboarding

Once you have filled out details for each of your host instances, you will need to configure your networking and management appliance DNS details. For further information on how to create a new VCF environment on Amazon EVS, follow the documentation here.

Screenshots of EVS onboarding

After you have created your VCF environment, you will be able to look over all of the host and configuration details through the AWS Console.

Additional things to know

Amazon EVS currently supports VCF version 5.2.1 and runs on i4i.metal instances. Future releases will expand VCF versions, licensing options, and more instance type support to provide even more ﬂexibility for your deployments.

Amazon EVS provides flexible storage options. Your Amazon EVS local Instance storage is powered by VMware’s vSAN solution, which pools local disks across multiple ESXi hosts into a single distributed datastore. To scale your storage, you can leverage external Network File System (NFS) or iSCSI-based storage solutions. For example, Amazon FSx for NetApp ONTAP is particularly well-suited for use as an NFS datastore or shared block storage over iSCSI.

Additionally, Amazon EVS makes connecting your on-premises environments to AWS simple. You can connect from on-premises vSphere environment into Amazon EVS using a Direct Connect connection or a VPN that terminates into a transit gateway. Amazon EVS also manages the underlying connectivity from your VLAN subnets into your VMs.

AWS provides comprehensive support for all AWS services deployed by Amazon EVS, handling direct customer support while engaging with Broadcom for advanced support needs. Customers must maintain AWS Business Support on accounts running the service.

Availability and pricing

Amazon EVS is now generally available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), Europe (Ireland), and Asia Pacific (Tokyo) AWS Regions, with additional Regions coming soon. Pricing is based on the Amazon EC2 instances and AWS resources you use, with no minimum fees or upfront commitments.

To learn more, visit the Amazon EVS product page.

Integrate scientific data management and analytics with the next generation of Amazon SageMaker, Part 1

2025-08-05 Nadeem Bulsara

Post Syndicated from Nadeem Bulsara original https://aws.amazon.com/blogs/big-data/integrate-scientific-data-management-and-analytics-with-the-next-generation-of-amazon-sagemaker-part-1/

Our customers tell us that scientists are increasingly spending more time managing data-related challenges than focusing on science. The primary reason for this challenge is that scientific data comes in many types and is siloed across systems, groups, and stages, and scientists struggle to efficiently discover, access, share, and analyze datasets across silos. This fragmentation creates lengthy cycles full of manual interventions, leading to inefficiencies. Mapping data sources and negotiating access across silos can take 4–6 weeks, integrating datasets can extend to months, and fully connecting data from source to tooling can take years, if ever achieved. These data challenges reduce lab productivity and slow down scientific innovation, which decrease drug and product pipeline throughput, and ultimately delay time-to-market. The solution lies in breaking down data silos by creating digital environments that help scientists efficiently connect disparate datasets and analytical tools, so they can conduct iterative hypothesis and product testing without technology friction.

Part 1 of this series shows an example project in drug target identification where two groups of scientists need to collaborate as they integrate no-code knowledge searching, scientific data management, and sophisticated analytics. In this example, a computational biology team begins by mining the scientific literature on a knowledge search GUI. Next, they navigate to a data catalog to find and access relevant datasets, which they share with the data scientist team to run analytics with sophisticated tools (see the following figure). Although the end-to-end journey illustrates the benefits to a target identification example, the underlying data challenges and technology solution apply to any life sciences use case requiring the integration of data management and analytics. Details of the implementation and technical solution will be discussed in Part 2 of the series.

A flow diagram with a dark background starting with Scientific data. It shows people with stock images as example personas that use the data to derive insights.

Example use case

A computational biologist has been tasked with identifying a target for Non-Alcoholic Fatty Liver Disease (NAFLD). A typical question from the biologist might be “Can I find genes associated with NAFLD and do we have a patient cohort with variants in those genes?” The solution we designed for this use case involves three simple steps:

Search the scientific literature through a no-code interface to identify genomic variants associated with NAFLD.
Search an internal data catalog with natural language:
- Find datasets of interest, such as multi-omics and clinical data for patients associated with NAFLD.
- Request access to the relevant datasets.
Share relevant datasets with a data scientist collaborator for deeper analysis.

In designing this solution, we focused on the following features:

Providing no-code scientists with point-and-click and natural-language interfaces
Reducing silos with data findability, governance automation, and seamless collaboration
Providing technical personas with the sophisticated tools and environments they prefer

Solution overview

This solution uses the next generation of Amazon SageMaker, including Amazon SageMaker Unified Studio, an integrated data and AI development environment. SageMaker Unified Studio offers capabilities for data processing, SQL analytics, model development, and generative AI application development, built on existing AWS services. The next generation of SageMaker also includes Amazon SageMaker Catalog, which is built on Amazon DataZone, a data management service designed to streamline data discovery, data cataloging, data sharing, and governance. Your organization can have a single secure data hub where everyone in the organization can find, access, and collaborate on data across AWS, on premises, and even third-party sources.

SageMaker Catalog supports certain system asset types, such as tables from Amazon Redshift, tables from AWS Glue, and object collections from Amazon Simple Storage Service (Amazon S3). It also offers the ability to support custom asset types, which gives users flexibility to catalog data that can’t be categorized as a system asset type. For asset type S3ObjectCollectionType, see Implement a custom subscription workflow for unmanaged Amazon S3 assets published with Amazon DataZone. SageMaker Catalog also offers the ability to support custom asset types, which gives users flexibility to catalog data that can’t be categorized as a system asset type. For this example use case, we used AWS HealthOmics variant stores to store and allow querying of genomic variant data. This example lists HealthOmics variant stores as a custom asset type within the catalog. Details of the implementation and technical solution for access management will be discussed in Part 2 of the series.

In the example use case, a computational biologist, in order to identify a target for NAFLD, relies heavily on diverse datasets from multiple sources (genomic sequences, gene expression data, clinical records, and more). This data comes from both internal sources (first-party) and external partners or public databases (third-party). Multiple teams are responsible for collecting and processing this data before making it available to computational biologists, researchers, data scientists, and bioinformaticians within the organization.

In this solution, users (data engineers, data scientists, bioinformaticians, computational biologists) log in to a project-based environment from SageMaker Unified Studio with a preconfigured authentication method. A typical workflow involves the following steps:

Data stewards as authorized members of projects publish data assets into the SageMaker catalog.
Data consumers as authorized members of projects seeking to analyze data for their scientific needs find and discover available data assets of interest from the SageMaker catalog.
Data consumers request to subscribe to the relevant discovered data assets.
Data producers review and decide to approve or reject the subscription request.
Data consumers access and analyze the data using preconfigured tools from SageMaker Unified Studio.

The following diagram illustrates the solution architecture and workflow.

architecture diagram

In the following sections, we explore each step of the workflow in more detail.

Step 1: Data producers publish data assets

As shown in the preceding workflow diagram, data producers can use SageMaker Catalog to publish their datasets as data assets or data products with appropriate business (such as source, license, vendor, study identifier), scientific (such as disease name, cohort information, data modality, assay type), or technical (file types, data formats, file sizes) metadata. In our example use case, the data producers publish clinical data as AWS Glue tables and genomic variant data as a table within the HealthOmics variant store. Additionally, data producers can use AI-based recommendations to automatically populate descriptors, making it straightforward for consumers to find and understand its use.

Step 2: Data consumers find relevant datasets

Data consumers, such as data scientists and bioinformaticians, can log in to SageMaker Unified Studio and navigate to SageMaker Catalog to search for the appropriate data assets and products, such as “NAFLD Variants” or “NAFLD Clinical.” They can also find data assets or products using metadata filters such as study identifiers or disease names to discover the possible datasets associated with a study or disease.

Step 3: Data consumers subscribe to required data assets or products

After the data consumers see a data asset or data product of interest (for example, the clinical and genomics data for NAFLD), they can subscribe to them. Data consumers can also optionally include a comment in the subscription request to add more context to the request. This initiates the subscription workflow based on the asset type.

Step 4: Data producers review and approve the subscription request

Data producers get notified of subscription requests and review if access should be granted and approve accordingly. The response can optionally include a comment for reasoning and traceability. In addition, data producers can limit access to certain rows and columns to protect controlled data.

Step 5: Data consumers access the subscribed data assets or products

Upon approval from the data producer, the data consumer gets access to those data assets and can use them in the appropriate environments configured within their project. For example, data scientists can open a workspace with a JupyterLab notebook already available within SageMaker Unified Studio. Subsequently, the data scientist can start analyzing the tabular clinical and variant data that was just approved for access.

Conclusion

The next generation of SageMaker transforms how scientists work with data by creating an integrated data and analytics environment. In this unified environment, data producers are empowered to publish datasets with rich metadata. Data consumers are able to use the catalog within SageMaker Unified Studio to search for their required datasets, either using free text or using metadata and business glossary filters. Data consumers can subscribe to data securely, tap into powerful search capabilities using free text or metadata filters, and access essential analysis tools (Amazon Athena, JupyterLab IDE, Amazon EMR) directly. The result is a unified digital workspace that reduces communication bottlenecks, speeds up scientific cycles, and removes technical barriers. Scientists can now focus on what matters most—testing hypotheses and products, and scaling scientific innovation to production—within a unified, powerful platform. This streamlined approach accelerates data-driven science, enabling research institutions, pharmaceutical companies, and clinical laboratories to innovate more efficiently. For example, data scientists can launch a space with a JupyterLab notebook preinstalled.

Consider using the next generation of SageMaker to increase productivity within your organization. Contact your account representatives or an AWS Representative to learn how we can help accelerate your projects and your business.

About the authors

Nadeem Bulsara is a Principal Solutions Architect at AWS specializing in Genomics and Life Sciences. He brings his 13+ years of Bioinformatics, Software Engineering, and Cloud Development skills as well as experience in research and clinical genomics and multi-omics to help Healthcare and Life Sciences organizations globally. He is motivated by the industry’s mission to enable people to have a long and healthy life.

Chaitanya Vejendla is a Senior Solutions Architect specialized in DataLake & Analytics primarily working for Healthcare and Life Sciences industry division at AWS. Chaitanya is responsible for helping life sciences organizations and healthcare companies in developing modern data strategies, deploy data governance and analytical applications, electronic medical records, devices, and AI/ML-based applications, while educating customers about how to build secure, scalable, and cost-effective AWS solutions. His expertise spans across data analytics, data governance, AI, ML, big data, and healthcare-related technologies.

Dr. Mileidy Giraldo has over 20 years of experience bridging bioinformatics, research, and industry technology strategy. She specializes in making technology accessible for organizations in the life sciences sector. In her current role as WW Lead for Life Sciences Strategy and Lab of the Future at AWS, she helps biotechs, biopharma, and diagnostics organizations design Data & AI-driven initiatives that modernize labs and help scientists unlock the full value of their data.

Chris Clark is a Senior Solutions Architect focused on helping Life Science customers leverage AWS technology to advance their operational capabilities. With 20+ years of hands-on experience in life sciences manufacturing and supply chain, he combines deep industry knowledge with his AWS expertise to guide his customers. When he’s not working to solve customer challenges, he enjoys cycling and building and repairing things in his workshop.

Nick Furr is a Specialist Solutions Architect at AWS, supporting Data & Analytics for Healthcare and Life Sciences. He helps providers, payers, and life sciences organizations build secure, scalable data platforms to drive innovation and improve outcomes. His work focuses on modernizing data strategies through cloud analytics, governed data processing, and machine learning for use cases like clinical research and population health.

Subrat Das is a Principal Solutions Architect for Global Healthcare and Life Sciences accounts at AWS. He is passionate about modernizing and architecting complex customers workloads. When he’s not working on technology solutions, he enjoys long hikes and traveling around the world.

AWS Weekly Roundup: Amazon DocumentDB, AWS Lambda, Amazon EC2, and more (August 4, 2025)

2025-08-04 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-documentdb-aws-lambda-amazon-ec2-and-more-august-4-2025/

This week brings an array of innovations spanning from generative AI capabilities to enhancements of foundational services. Whether you’re building AI-powered applications, managing databases, or optimizing your cloud infrastructure, these updates help build more advanced, robust, and flexible applications.

Last week’s launches
Here are the launches that got my attention this week:

Amazon DocumentDB – Amazon DocumentDB Serverless is now available offering an on-demand, fully managed MongoDB API-compatible document database service. Read more in Channy’s post.
Amazon Q Developer CLI – You can now create custom agents to help you customize the CLI agent to be more effective when performing specialized tasks such as code reviews and troubleshooting. More info in this blog.
Amazon Bedrock Data Automation – Now supports DOC/DOCX files for document processing and H.265 encoded video files for video processing, making it easier to build multimodal data analysis pipelines.
Amazon DynamoDB – Introduced the Amazon DynamoDB data modeling Model Context Protocol (MCP) tool, providing a structured, natural-language-driven workflow to translate application requirements into DynamoDB data models.
AWS Lambda – Response streaming now supports a default maximum response payload size of 200 MB, 10 times higher than before. Lambda response streaming helps you build applications that progressively stream response payloads back to clients, improving performance for latency sensitive workloads by reducing time to first byte (TTFB) performance.
Powertools for AWS – Introducing v2 of Powertools for AWS Lambda (Java), a developer toolkit that helps you implement serverless best practices and directly translates AWS Well-Architected recommendations.
Amazon SNS – Now supports three additional message filtering operators: wildcard matching, anything-but wildcard matching, and anything-but prefix matching. SNS now also supports message group IDs in standard topics, enabling fair queue functionality for subscribed Amazon SQS standard queues.
Amazon CloudFront – Now offers two capabilities to enhance origin timeout controls: a response completion timeout and support for custom response timeout values for Amazon S3 origins. These capabilities give you more control over how to handle slow or unresponsive origins.
Amazon EC2 – You are now able to force terminate EC2 instances that are stuck in the shutting-down state.
Amazon EC2 Auto Scaling – You can now use AWS Lambda functions as notification targets for EC2 Auto Scaling lifecycle hooks. For example, you can use this to trigger custom actions when an instance enters a wait state.
Amazon SES – You can now provision isolated tenants within a single SES account and apply automated reputation policies to manage email sending.
AWS Management Console – You can now view your AWS Applications in the Service menu in the console navigation bar. With this view you can see all your Applications and choose an Application to see all its associated resources.
Amazon Connect – Amazon Connect UI builder now features an updated user interface to reduce the complexity to build structured workflows. It also simplified forecast editing with a new UI experience that improves planning accuracy. The Contact Control Panel now features an updated and more intuitive user interface. Amazon Connect also introduced new actions and workflows into the agent workspace. These actions are powered by third-party applications running in the background.
AWS Clean Rooms – Now publishes events to Amazon EventBridge for status changes in a Clean Rooms collaboration, further simplifying how companies and their partners analyze and collaborate on their collective datasets without revealing or copying one another’s underlying data.
AWS Entity Resolution – Introduced rule-based fuzzy matching using Levenshtein Distance, Cosine Similarity, and Soundex algorithms to help resolve consumer records across fragmented, inconsistent, and often incomplete datasets.

Additional updates
Here are some additional projects, blog posts, and news items that I found interesting:

Amazon Strands Agents SDK: A technical deep dive into agent architectures and observability – Nice overview to build single and multi-agent architectures.
Build dynamic web research agents with the Strands Agents SDK and Tavily – Showing how easy it is to add a new tool.
Structured outputs with Amazon Nova: A guide for builders – Good tips implemented on top of native tool use with constrained decoding.
Automate the creation of handout notes using Amazon Bedrock Data Automation – A solution to build an automated, serverless solution to transform webinar recordings into comprehensive handouts.
Build modern serverless solutions following best practices using Amazon Q Developer CLI and MCP – Adding the AWS Serverless MCP server.
Introducing Amazon Bedrock AgentCore Browser Tool – More info on this tool that enables AI agents to interact seamlessly with websites.
Introducing Amazon Bedrock AgentCore Code Interpreter – A fully managed service that enables AI agents to securely execute code in isolated sandbox environments.

Upcoming AWS events
Check your calendars so that you can sign up for these upcoming events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — AWS’s flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities.

AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Mexico City (August 6) and Jakarta (August 7).

AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Australia (August 15), Adria (September 5), Baltic (September 10), and Aotearoa (September 18).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Danilo

Introducing Amazon Application Recovery Controller Region switch: A multi-Region application recovery service

2025-08-01 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/introducing-amazon-application-recovery-controller-region-switch-a-multi-region-application-recovery-service/

As a developer advocate at AWS, I’ve worked with many enterprise organizations who operate critical applications across multiple AWS Regions. A key concern they often share is the lack of confidence in their Region failover strategy—whether it will work when needed, whether all dependencies have been identified, and whether their teams have practiced the procedures enough. Traditional approaches often leave them uncertain about their readiness for Regional switch.

Today, I’m excited to announce Amazon Application Recovery Controller (ARC) Region switch, a fully managed, highly available capability that enables organizations to plan, practice, and orchestrate Region switches with confidence, eliminating the uncertainty around cross-Region recovery operations. Region switch helps you orchestrate recovery for your multi-Region applications on AWS. It gives you a centralized solution to coordinate and automate recovery tasks across AWS services and accounts when you need to switch your application’s operations from one AWS Region to another.

Many customers deploy business-critical applications across multiple AWS Regions to meet their availability requirements. When an operational event impacts an application in one Region, switching operations to another Region involves coordinating multiple steps across different AWS services, such as compute, databases, and DNS. This coordination typically requires building and maintaining complex scripts that need regular testing and updates as applications evolve. Additionally, orchestrating and tracking the progress of Region switches across multiple applications and providing evidence of successful recovery for compliance purposes often involves manual data gathering.

Region switch is built on a Regional data plane architecture, where Region switch plans are executed from the Region being activated. This design eliminates dependencies on the impacted Region during the switch, providing a more resilient recovery process since the execution is independent of the Region you’re switching from.

Building a recovery plan with ARC Region switch
With ARC Region switch, you can create recovery plans that define the specific steps needed to switch your application between Regions. Each plan contains execution blocks that represent actions on AWS resources. At launch, Region switch supports nine types of execution blocks:

ARC Region switch plan execution block–let you orchestrate the order in which multiple applications switch to the Region you want to activate by referencing other Region switch plans.
Amazon EC2 Auto Scaling execution block–Scales Amazon EC2 compute resources in your target Region by matching a specified percentage of your source Region’s capacity.
ARC routing controls execution block–Changes routing control states to redirect traffic using DNS health checks.
Amazon Aurora global database execution block–Performs database failover with potential data loss or switchover with zero data loss for Aurora Global Database.
Manual approval execution block–Adds approval checkpoints in your recovery workflow where team members can review and approve before proceeding.
Custom Action AWS Lambda execution block–Adds custom recovery steps by executing Lambda functions in either the activating or deactivating Region.
Amazon Route 53 health check execution block–Let you to specify which Regions your application’s traffic will be redirected to during failover. When executing your Region switch plan, the Amazon Route 53 health check state is updated and traffic is redirected based on your DNS configuration.
Amazon Elastic Kubernetes Service (Amazon EKS) resource scaling execution block–Scales Kubernetes pods in your target Region during recovery by matching a specified percentage of your source Region’s capacity.
Amazon Elastic Container Service (Amazon ECS) resource scaling execution block–Scales ECS tasks in your target Region by matching a specified percentage of your source Region’s capacity.

Region switch continually validates your plans by checking resource configurations and AWS Identity and Access Management (IAM) permissions every 30 minutes. During execution, Region switch monitors the progress of each step and provides detailed logs. You can view execution status through the Region switch dashboard and at the bottom of the execution details page.

To help you balance cost and reliability, Region switch offers flexibility in how you prepare your standby resources. You can configure the desired percentage of compute capacity to target in your destination Region during recovery using Region switch scaling execution blocks. For critical applications expecting surge traffic during recovery, you might choose to scale beyond 100 percent capacity, and setting a lower percentage can help achieve faster overall execution times. However, it’s important to note that using one of the scaling execution blocks does not guarantee capacity, and actual resource availability depends on the capacity in the destination Region at the time of recovery. To facilitate the best possible outcomes, we recommend regularly testing your recovery plans and maintaining appropriate Service Quotas in your standby Regions.

ARC Region switch includes a global dashboard you can use to monitor the status of Region switch plans across your enterprise and Regions. Additionally, there’s a Regional executions dashboard that only displays executions within the current console Region. This dashboard is designed to be highly available across each Region so it can be used during operational events.

Region switch allows resources to be hosted in an account that is separate from the account that contains the Region switch plan. If the plan uses resources from an account that is different from the account that hosts the plan, then Region switch uses the executionRole to assume the crossAccountRole to access those resources. Additionally, Region switch plans can be centralized and shared across multiple accounts using AWS Resource Access Manager (AWS RAM), enabling efficient management of recovery plans across your organization.

Let’s see how it works
Let me show you how to create and execute a Region switch plan. There are three parts in this demo. First, I create a Region switch plan. Then, I define a workflow. Finally, I configure the triggers.

Step 1: Create a plan

I navigate to the Application Recovery Controller section of the AWS Management Console. I choose Region switch in the left navigation menu. Then, I choose Create Region switch plan.

After I give a name to my plan, I specify a Multi-Region recovery approach (active/passive or active/active). In Active/Passive mode, two application replicas are deployed into two Regions, with traffic routed into the active Region only. The replica in the passive Region can be activated by executing the Region switch plan.

Then, I select the Primary Region and Standby Region. Optionally, I can enter a Desired recovery time objective (RTO). The service will use this value to provide insight into how long Region switch plan executions take in relation to my desired RTO.

I enter the Plan execution IAM role. This is the role that allows Region switch to call AWS services during execution. I make sure the role I choose has permissions to be invoked by the service and contains the minimum set of permissions allowing ARC to operate. Refer to the IAM permissions section of the documentation for the details.

Step 2: Create a workflow

When the two Plan evaluation status notifications are green, I create a workflow. I choose Build workflows to get started.

Plans enable you to build specific workflows that will recover your applications using Region switch execution blocks. You can build workflows with execution blocks that run sequentially or in parallel to orchestrate the order in which multiple applications or resources recover into the activating Region. A plan is made up of these workflows that allow you to activate or deactivate a specific Region.

For this demo, I use the graphical editor to create the workflow. But you can also define the workflow in JSON. This format is better suited for automation or when you want to store your workflow definition in a source code management system (SCMS) and your infrastructure as code (IaC) tools, such as AWS CloudFormation.

I can alternate between the Design and the Code views by selecting the corresponding tab next to the Workflow builder title. The JSON view is read-only. I designed the workflow with the graphical editor and I copied the JSON equivalent to store it alongside my IaC project files.

Region switch launches an evaluation to validate your recovery strategy every 30 minutes. It regularly checks that all actions defined in your workflows will succeed when executed. This proactive validation assesses various elements, including IAM permissions and resource states across accounts and Regions. By continually monitoring these dependencies, Region switch helps ensure your recovery plans remain viable and identifies potential issues before they impact your actual switch operations.

However, just as an untested backup is not a reliable backup, an untested recovery plan cannot be considered truly validated. While continuous evaluation provides a strong foundation, we strongly recommend regularly executing your plans in test scenarios to verify their effectiveness, understand actual recovery times, and ensure your teams are familiar with the recovery procedures. This hands-on testing is essential for maintaining confidence in your disaster recovery strategy.

Step 3: Create a trigger

A trigger defines the conditions to activate the workflows just created. It’s expressed as a set of CloudWatch alarms. Alarm-based triggers are optional. You can also use Region switch with manual triggers.

From the Region switch page in the console, I choose the Triggers tab and choose Add triggers.

For each Region defined in my plan, I choose Add trigger to define the triggers that will activate the Region.Finally, I choose the alarms and their state (OK or Alarm) that Region switch will use to trigger the activation of the Region.

I’m now ready to test the execution of the plan to switch Regions using Region switch. It’s important to execute the plan from the Region I’m activating (the target Region of the workflow) and use the data plane in that specific Region.

Here is how to execute a plan using the AWS Command Line Interface (AWS CLI):

aws arc-region-switch start-plan-execution \
--plan-arn arn:aws:arc-region-switch::111122223333:plan/resource-id \
--target-region us-west-2 \
--action activate

Pricing and availability
Region switch is available in all commercial AWS Regions at $70 per month per plan. Each plan can include up to 100 execution blocks, or you can create parent plans to orchestrate up to 25 child plans.

Having seen firsthand the engineering effort that goes into building and maintaining multi-Region recovery solutions, I’m thrilled to see how Region switch will help automate this process for our customers. To get started with ARC Region switch, visit the ARC console and create your first Region switch plan. For more information about Region switch, visit the Amazon Application Recovery Controller (ARC) documentation. You can also reach out to your AWS account team with questions about using Region switch for your multi-Region applications.

I look forward to hearing about how you use Region switch to strengthen your multi-Region applications’ resilience.

— seb

Introducing v2 of Powertools for AWS Lambda (Java)

2025-08-01 Philipp Page

Post Syndicated from Philipp Page original https://aws.amazon.com/blogs/compute/introducing-v2-of-powertools-for-aws-lambda-java/

Modern applications increasingly rely on Serverless technologies such as Amazon Web Services (AWS) Lambda to provide scalability, cost efficiency, and agility. The Serverless Applications Lens for the AWS Well-Architected Framework focuses on how to design, deploy, and architect your Serverless applications to overcome some of these challenges.

Powertools for AWS Lambda is a developer toolkit that helps you implement Serverless best practices and directly translates AWS Well-Architected recommendations into actionable, developer friendly utilities. Following the community’s continued successful adoption of Powertools for AWS in Python, Java, TypeScript, and .NET, this post announces the general availability of Powertools for AWS Lambda (Java) v2 coming with major performance improvements, enhanced core utilities, and a brand-new Kafka utility.

Powertools for AWS (Java) v2 provides three updated core utilities:

Logging: A re-designed Java idiomatic logging module providing structured logging that streamlines log aggregation and analysis.
Metrics: An improved metrics experience allowing custom metrics collection using CloudWatch Embedded Metric Format (EMF).
Tracing: An annotation-based way to collect distributed tracing data with AWS X-Ray to visualize and analyze request flows.

Along with the updated core utilities, v2 of the developer toolkit adds two brand new features:

GraalVM native image support: Native image support for GraalVM across all core utilities reducing Lambda cold start times up to 75.61% (p95).
Kafka utility: This new utility integrates with Amazon Managed Streaming for Apache Kafka (Amazon MSK) and self-managed Kafka event sources on Lambda and allows developers to deserialize directly into Kafka native types such as ConsumerRecords.

Learn more about how to migrate to v2 in our upgrade guide.

Getting started using Powertools for AWS Lambda (Java) v2

Powertools for AWS Lambda (Java) v2 is readily accessible as a Java package on Maven Central and integrates with popular build tools such as Maven and Gradle. This post focuses on Maven-based implementation samples to help you get started quickly. Gradle examples are available for all utilities in the documentation and the examples repository.

The toolkit is compatible with Java 11 and newer versions, making sure you can use modern Java features while building Serverless applications. Examples on how to install each utility are outlined in each section of the post and complete configuration examples are also available in the Powertools documentation.

Logging

The Logging utility helps implement structured logging when running on Lambda while still using familiar Java logging libraries such as slf4j, log4j, and logback. v2 of Logging allows you to do the following:

Output structured JSON logs enriched with Lambda context
Choose the logging backend of your choice among log4j2 and logback
Add structured arguments to logs that get serialized into arbitrarily nested JSON objects
Add global log keys using the slf4j default Mapped Diagnostic Context (MDC)

To add the logging utility to your project, include it as a dependency in your Java Maven project. The following example shows how to add the log4j2 logging backend to your application:

<!-- In the dependencies section -->
<dependency>
    <groupId>software.amazon.lambda</groupId>
    <artifactId>powertools-logging-log4j</artifactId>
    <!-- Alternatively, if you wish to use the logback backend
    <artifactId>powertools-logging-logback</artifactId> 
    -->
    <version>2.1.1</version>
</dependency>
<!-- In the build plugins section -->
<plugin>
    <groupId>dev.aspectj</groupId>
    <artifactId>aspectj-maven-plugin</artifactId>
    <configuration>
        <aspectLibraries>
            <aspectLibrary>
                <groupId>software.amazon.lambda</groupId>
                <artifactId>powertools-logging</artifactId>
                <version>2.1.1</version>
            </aspectLibrary>
        </aspectLibraries>
    </configuration>
</plugin>

Create a custom JsonTemplateLayout appender in your log4j2.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <Appenders>
        <Console name="JsonAppender" target="SYSTEM_OUT">
            <JsonTemplateLayout eventTemplateUri="classpath:LambdaJsonLayout.json" />
        </Console>
    </Appenders>
    <Loggers>
        <Logger name="JsonLogger" level="INFO" additivity="false">
            <AppenderRef ref="JsonAppender"/>
        </Logger>
        <Root level="info">
            <AppenderRef ref="JsonAppender"/>
        </Root>
    </Loggers>
</Configuration>

To add structured logging to your functions, apply the @Logging annotation to your Lambda handler and use the familiar slf4j Java API when writing log statements. This allows you to adopt the logging utility without major code refactoring. Powertools handles routing to the correct logging backend for you. The following example shows how to add global log keys using MDC, and add a structured entry argument to your log message:

public class App implements RequestHandler<SQSEvent, String> {
    private static final Logger log = LoggerFactory.getLogger(App.class);

    @Logging
    public String handleRequest(final SQSEvent input, final Context context) {
        // Add a global log key using Mapped Diagnostic Context MDC
        MDC.put("myCustomKey", "willBeLoggedForAllLogStatements");

        // Log a message with a structured argument (any JSON serializable Object)
        log.info("My message", entry("anotherCustomKey", Map.of("nested", "object")));

        // ... return response
    }
}

Lambda sends the following JSON-formatted output to Amazon CloudWatch Logs (note how the Java Map gets auto-serialized into a JSON object):

{
  "level": "INFO",
  "message": "My message",
  "cold_start": true,
  "function_arn": "arn:aws:lambda:us-east-1:012345678912:function:AppFunction",
  "function_memory_size": 512,
  "function_name": "AppFunction",
  "function_request_id": "0150a2a4-c5aa-4277-9345-17bad039f6c0",
  "function_version": "$LATEST",
  "sampling_rate": 0.1,
  "service": "powertools-java-sample",
  "timestamp": "2025-05-20T08:35:28.565Z",
  "myCustomKey": "willBeLoggedForAllLogStatements",
  "anotherCustomKey": {
    "nested": "object"
  }
}

Metrics

CloudWatch offers essential built-in service metrics for monitoring application throughput, error rates, and resource usage. Users also need to capture workload specific custom metrics relevant to their business use-case following AWS Well-Architected best-practices.

Powertools for AWS (Java) enables you to create custom metrics asynchronously by outputting metrics in CloudWatch EMF directly to standard output—an approach that needs no other configuration. The Lambda service sends the EMF formatted metrics to CloudWatch on your behalf.

The Metrics utility allows you to:

Create custom metrics asynchronously using CloudWatch EMF
Reduce latency by avoiding synchronous metric publishing
Automatically track cold starts in a custom CloudWatch metric
Avoid manually validating your output against the EMF specification
Keep you code clean by avoiding manual flushing to standard output

To add the Metrics utility to your project, add the following Maven dependency:

<!-- In the dependencies section -->
<dependency>
    <groupId>software.amazon.lambda</groupId>
    <artifactId>powertools-metrics</artifactId>
    <version>2.1.1</version>
</dependency>
<!-- In the build plugins section -->
<plugin>
    <groupId>dev.aspectj</groupId>
    <artifactId>aspectj-maven-plugin</artifactId>
    <configuration>
        <aspectLibraries>
            <aspectLibrary>
                <groupId>software.amazon.lambda</groupId>
                <artifactId>powertools-metrics</artifactId>
                <version>2.1.1</version>
            </aspectLibrary>
        </aspectLibraries>
    </configuration>
</plugin>

To add custom metrics to your Lambda function, place the @FlushMetrics annotation on your Lambda handler. The library takes care of validating and flushing your metrics to standard output before the Lambda function terminates. The following example shows how you can automatically capture a cold start metric and emit your own custom metrics:

public class App implements RequestHandler<SQSEvent, String> {
    private static final Logger log = LoggerFactory.getLogger(App.class);
    private static final Metrics metrics = MetricsFactory.getMetricsInstance();

    // This configures a default namespace and service dimension for all metrics
    @FlushMetrics(namespace = "ServerlessAirline", service = "payment", captureColdStart = true)
    public String handleRequest(final SQSEvent input, final Context context) {
        // The Metrics instance is a singleton
        metrics.addMetric("CustomMetric1", 1, MetricUnit.COUNT);

        // Publish metrics with non-default configuration options
        DimensionSet dimensionSet = new DimensionSet();
        dimensionSet.addDimension("Service", "AnotherService");
        metrics.flushSingleMetric("CustomMetric2", 1, MetricUnit.COUNT, "AnotherNamespace", dimensionSet);

        // ... return response
    }
}

AWS CloudWatch Metrics Graph View of metrics generated by Metrics utility example.

Figure 1. AWS CloudWatch Metrics Graph View

Tracing

The Tracing utility provides an annotation-based integration with X-Ray for distributed tracing with minimal configuration. Tracing allows you to:

Gain visibility into your own methods calls and AWS service interactions visualized in the X-Ray console
Automatically capture method responses and errors
Automatically capture Lambda cold start information as part of your traces
Add custom metadata to traces for more context and debugging information
Enable or disable tracing features through environment variables without code changes

To add the Tracing utility to your project, add the following Maven dependency:

<!-- In the dependencies section -->
<dependency>
    <groupId>software.amazon.lambda</groupId>
    <artifactId>powertools-tracing</artifactId>
    <version>2.1.1</version>
</dependency>
<!-- In the build plugins section -->
<plugin>
    <groupId>dev.aspectj</groupId>
    <artifactId>aspectj-maven-plugin</artifactId>
    <configuration>
        <aspectLibraries>
            <aspectLibrary>
                <groupId>software.amazon.lambda</groupId>
                <artifactId>powertools-tracing</artifactId>
                <version>2.1.1</version>
            </aspectLibrary>
        </aspectLibraries>
    </configuration>
</plugin>

To enable tracing in your Lambda function, annotate your Lambda handler and your custom methods that you want to trace with the @Tracing annotation. Each annotation maps to a sub-segment of your main Lambda handler in X-Ray and becomes visible in the console.

public class App implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {
    private static final Logger log = LoggerFactory.getLogger(App.class);

    @Tracing
    public APIGatewayProxyResponseEvent handleRequest(final APIGatewayProxyRequestEvent input, final Context context) {
        // ... business logic
        
        // Get calling IP with tracing
        String location = getCallingIp("https://checkip.amazonaws.com");

        // ... return response
    }

    @Tracing(segmentName = "Location service")
    private String getCallingIp(String address) {
        // Implementation to get IP address
        log.info("Retrieving caller IP address");
        
        // Add custom metadata to current sub-segment
        URL url = new URL(address);
        putMetadata("getCallingIp", address);
        
        // ...
        return "127.0.0.1";
    }
}

The X-Ray console displays a generated service map when traffic begins flowing through your application. Applying the Tracing annotation to your Lambda function handler method or any other methods in the execution chain provides you with comprehensive visibility into the traffic patterns throughout your application. The following figure shows how the custom metadata added in the example is associated with the custom sub-segment.

Picture showing the generated traces in the AWS X-Ray console. Shows the custom named Location service trace along with its metadata as a JSON object.

Figure 2. AWS X-Ray waterfall trace view

Reducing Lambda cold start duration

A key feature in Powertools for AWS Lambda (Java) v2 is GraalVM native image support for all core utilities. Compiling your Lambda functions to native executables allows you to significantly reduce cold start times and memory usage. Using Powertools v2 with GraalVM allows you to reduce cold starts up to 75.61% (p95) compared to using the managed Java runtime. The following benchmark compares the cold start times of an application using all core utilities (logging, metrics, tracing) on the managed java21 runtime as compared to the Lambda provided.al2023 runtime running a GraalVM compiled native image (go to the supported Lambda runtimes):

Environment	p95 (ms)	Min (ms)	Avg (ms)	Max (ms)	Max Memory (MB)	N
Powertools for AWS (Java) v2: JVM	1682.92	1224.55	1224.55	2229.81	205.04	234
Powertools for AWS (Java) v2: GraalVM	542.86	404.92	504.77	752.85	93.46	369

This improvement is particularly valuable for latency-sensitive applications and functions that scale frequently. Check out a full working example on GitHub.

Lambda MSK Event Source Mapping Integration

The new Kafka utility introduced with Powertools for AWS Lambda (Java) v2 streamlines working with the Lambda MSK Event Source Mapping (ESM) and self-managed Kafka event sources. It provides a familiar experience for developers working with Apache Kafka by allowing direct conversion from Lambda events to Kafka’s native types. The key features include:

Direct deserialization into Kafka ConsumerRecords<K, V> objects while using the Lambda-native RequestHandler interface
Support for deserializing JSON, Avro, and Protobuf encoded records for key and value fields with and without usage of a Schema Registry when producing the messages

To add the Kafka utility to your project, include the powertools-kafka library as a Maven dependency in your pom.xml:

<!-- In the dependencies section -->
<dependency>
    <groupId>software.amazon.lambda</groupId>
    <artifactId>powertools-kafka</artifactId>
    <version>2.1.1</version>
</dependency>
<!-- Kafka clients dependency - compatibility works for >= 3.0.0 -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>4.0.0</version>
</dependency>

Use the @Deserialization annotation on your Lambda handler to deserialize messages as native Kafka ConsumerRecords. Make sure to specify the deserializer type. The following example shows how to deserialize Avro encoded record values with String keys. As in a regular Lambda handler, declare the input type to your function in the RequestHandler generic parameters and the utility discovers the deserialization types automatically. The AvroProduct class in the following example is an auto-generated Java class using the Java org.apache.avro.avro library.

public class App implements RequestHandler<ConsumerRecords<String, AvroProduct>, Void> {
    private static final Logger log = LoggerFactory.getLogger(App.class);

    @Deserialization(type = DeserializationType.KAFKA_AVRO)
    public Void handleRequest(ConsumerRecords<String, AvroProduct> consumerRecords, Context context) {
        log.info("Deserialized {} records.", consumerRecords.records().size()); 

        // ... Business logic 
        
        return null;
    }
}

Conclusion

Powertools for AWS Lambda (Java) v2 represents the next evolution in the toolkit for building robust, observable, and high-performing Serverless applications. Throughout this post, we’ve explored the enhanced core observability utilities with their new features, the performance gains through GraalVM native image support, and the new Kafka utility that supports using familiar Kafka patterns when working on Lambda.

Powertools also offers more utilities to handle common Serverless design patterns. Each utility is designed with the same principles of clarity and minimal overhead.To learn more:

Visit the documentation for detailed guides and examples
Try the sample applications
Join the community on GitHub to share your experience and get help

Your next Serverless application awaits with Powertools for AWS Lambda (Java) v2. We would love to hear your feedback!

Overcome development disarray with Amazon Q Developer CLI custom agents

2025-07-31 Brian Beach

Post Syndicated from Brian Beach original https://aws.amazon.com/blogs/devops/overcome-development-disarray-with-amazon-q-developer-cli-custom-agents/

As a developer who has embraced the power of the Model Context Protocol (MCP)to enhance my workflows, I’m thrilled to see the addition of custom agents in the Amazon Q Developer CLI. This new feature takes the capabilities I’ve come to rely on to a whole new level, allowing me to seamlessly manage different development contexts and easily switch between them.

In my previous post, I discussed how MCP servers have revolutionized the way I interact with AWS services, databases, and other essential tools. MCP integration in Amazon Q Developer allows me to query my database schemas, automate infrastructure deployments, and so much more. However, as I started juggling multiple projects, each with their own unique tech stacks and requirements, I found myself needing a more structured approach to managing these diverse development environments.

Enter custom agents. With this new feature, I can now create and use a custom agent by bringing together specific tools, prompt, context and tool permissions for tasks appropriate for the stage of development. In this post I will explain how to configure a cusom agent for front-end and back-end development. Allowing me to easily optimize Amazon Q Developer for each task.

Background

Imagine that I am working on a multi-tier web application. The application has a React front-end written in Typescript and a FastAPI back-end written in Python. In addition to me, the team includes a designer that uses Figma, and the database administrator that manages a PostgreSQL database. There are subtle differences in how I communicate with the designer and the database administrator. For example, when I discuss a “table” with the designer, I’m likely referring to an HTML table and how the page is structured. However, when I discuss a table with the database administrator, I’m likely talking about a SQL table and how data is stored.

In the past, I had both the Figma Dev Mode MCP server and Amazon Aurora PostgreSQL MCP server configured in my environment. While this allowed me to easily work on either the front-end or back-end code, it introduced some challenges. If I asked Amazon Q Developer “how many tables do I have?” Amazon Q Developer would have to guess if I was talking about HTML tables or SQL tables. If the question is about HTML, it should use the Figma server. If the question is about SQL, it should use the Aurora server. This is not a technical limitation, it’s a language limitation. Just as I have to adjust my assumptions to talk with the designer and database administrator, Amazon Q Developer has to make the same adjustments.

Enter Amazon Q Developer CLI custom agents. Custom agents allow me to optimize Q Developer’s configuration for each scenario. Let’s walk through my front-end and back-end configuration to understand the impact.

Front-end agent

My front-end custom agent is optimized for front-end web development using React and Figma. The following code example is the configuration for my front-end agent stored in ~/.aws/amazonq/agents/front-end.json. Let’s discuss the major sections of the configuration.

mcpServers – Here I have configured the Figma Dev Mode MCP Server. This simply communicates with the Figma Web Design App installed locally. Note that this replaces the MCP configuration that was stored in ~/.aws/amazonq/mcp.json
tools and allowedTools – These two sections are related, so I will discuss them together. tools defines the tools are available to Amazon Q Developer while allowedTools defines which tools are trusted. In other words, Q Developer is able to use all configured tools, and it does not have to ask my permission to use fs_read, fs_write, and @Figma. @Figma allows Amazon Q Developer to use all Figma tools without asking for permission. More on this in the next section.
resources – Here I have configured the files that should be added to the context. I have included the README.md (stored in the project folder) and my own preferences for React (stored in my profile). You can read more in the context management section of the user guide.
hooks – In addition to the resources, I have also included a hook. This hook will run a command and inject it into the context at runtime. In the example, I am adding the current git status. You can read more in the context hooks section of the user guide.

{
  "description": "Optimized for front-end web development using React and Figma",
  "mcpServers": {
    "Figma": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://127.0.0.1:3845/sse"
      ]
    }
  },
  "tools": ["*"],
  "allowedTools": [
    "fs_read",
    "fs_write",
    "report_issues",
    "@Figma"
  ],
  "resources": [
    "file://README.md",
    "file://~/.aws/amazonq/react-preferences.md"
  ],
  "hooks": {
    "agentSpawn": [
      {
        "command": "git status"
      }
    ]
  }
}

Back-end agent

My back-end custom agent is optimized for back-end development with Python and PostgreSQL. The following code example is the configuration for my back-end agent stored in ~/.aws/amazonq/agents/back-end.json. Rather than describing the sections, as I did earlier, I will focus on the differences between the front-end and back-end.

mcpServers – Here I have configured the Amazon Aurora PostgreSQL MCP Server. This allows Amazon Q Developer to query my dev database to learn about the schema. Notice that I have configured a read-only connection to ensure that I don’t accidentally update the database.
tools and allowedTools – Once again, I have enabled Amazon Q Developer to use all tools. However, notice that I am more restrictive about what tools are trusted. Amazon Q Developer will need to ask permission to use fs_write or @PostgreSQL/run_query. Notice that I can allow the entire MCP server as I did with Figma or specific tools as I did here.
resources – Again, I have included the README.md (stored in the project folder) and my own preferences for Python and SQL (both stored in my profile). Note that I can also use glob patterns here. For example, file://.amazonq/rules/**/*.md would include the rules created by the Amazon Q Developer IDE plugins.
hooks – Finally, I have also included the hook for the front-end and back-end. However, I could have included project specific options such as npm run for the front-end and pip freeze for the back-end.

{
  "description": "Optimized for back-end development with Python and PostgreSQL",
  "mcpServers": {
    "PostgreSQL": {
      "command": "uvx",
      "args": [
        "awslabs.postgres-mcp-server@latest",
        "--resource_arn", "arn:aws:rds:us-east-1:xxxxxxxxxxxx:cluster:xxxxxx",
        "--secret_arn", "arn:aws:secretsmanager:us-east-1:xxxxxxxxxxxx:secret:rds!cluster-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx-xxxxxx",
        "--database", "dev",
        "--region", "us-east-1",
        "--readonly", "True"
      ]
    }
  },
  "tools": ["*"],
  "allowedTools": [
    "fs_read",
    "report_issues",
    "@PostgreSQL/get_table_schema"
  ],
  "resources": [
    "file://README.md",
    "file://~/.aws/amazonq/python-preferences.md",
    "file://~/.aws/amazonq/sql-preferences.md"
  ],
  "hooks": {
    "agentSpawn": [
      {
        "command": "git status"
      }
    ]
  }
}

Using custom agents

The real power of agents becomes evident when I need to switch between these different development contexts. I can now simply run q chat --agent front-end when I am working on React and Figma or q chat --agent back-end when I am working with Python and SQL. Amazon Q Developer will configure the correct agent with all my preferences.

In the following image, you can see the configuration in the Amazon Q Developer CLI. Notice that the front-end agent has an additional tool called Figma while the back-end agent has an additional tool called PostgreSQL. In addition, the front-end agent trusts fs_write and all of the Figma tools while the back-end agent will ask permission to use fs_write and only trusts one of the two PostgreSQL tools.

Similarly, let’s look at the context configuration in both the front-end and back-end agents. In the following image, I have included my React preferences for front-end development, and both Python and SQL preferences for back-end development.

A split terminal view showing the output of "/context show" command for both front-end and back-end environments. The front-end agent shows matches for "~/.aws/amazonq/react-preferences.md" and "README.md", while the back-end agent shows matches for "~/.aws/amazonq/python-preferences.md", "~/.aws/amazonq/sql-preferences.md", and "README.md". Each file is marked with "(1 match)" in green text.

As you can see, custom agents allow me to optimize the Amazon Q Developer CLI for each task. Of course, front-end and back-end agents are just an example. You might have a developer and testing agents, data science and analytics agents, etc. Custom agents allow you to tailor the configuration to most any task.

Conclusion

Amazon Q Developer CLI custom agents represent a significant improvement in managing complex development environments. By allowing developers to seamlessly switch between different contexts, they eliminate the cognitive overhead of manually reconfiguring tools and permissions for different tasks. Ready to streamline your development workflow? Get started with Amazon Q Developer today.

Amazon DocumentDB Serverless is now available

2025-07-31 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/amazon-documentdb-serverless-is-now-available/

Today, we’re announcing the general availability of Amazon DocumentDB Serverless, a new configuration for Amazon DocumentDB (with MongoDB compatibility) that automatically scales compute and memory based on your application’s demand. Amazon DocumentDB Serverless simplifies database management with no upfront commitments or additional costs, offering up to 90 percent cost savings compared to provisioning for peak capacity.

With Amazon DocumentDB Serverless, you can use the same MongoDB compatible-APIs and capabilities as Amazon DocumentDB, including read replicas, Performance Insights, I/O optimized, and integrations with other Amazon Web Services (AWS) services.

Amazon DocumentDB Serverless introduces a new database configuration measured in a DocumentDB Capacity Unit (DCU), a combination of approximately 2 gibibytes (GiB) of memory, corresponding CPU, and networking. It continually tracks utilization of resources such as CPU, memory, and network coming from database operations performed by your application.

Amazon DocumentDB Serverless automatically scales DCUs up or down to meet demand without disrupting database availability. Switching from provisioned instances to serverless in an existing cluster is as straightforward as adding or changing the instance type. This transition doesn’t require any data migration. To learn more, visit How Amazon DocumentDB Serverless works.

Some key use cases and advantages of Amazon DocumentDB Serverless include:

Variable workloads – With Amazon DocumentDB Serverless, you can handle sudden traﬃc spikes such as periodic promotional events, development and testing environments, and new applications where usage might ramp up quickly. You can also build agentic AI applications that benefit from built-in vector search for Amazon DocumentDB and serverless adaptability to handle dynamically invoked agentic AI workflows.
Multi-tenant workloads – You can use Amazon DocumentDB Serverless to manage individual database capacity across the entire database fleet. You don’t need to manage hundreds or thousands of databases for enterprises applications or multi-tenant environments of a software as a service (SaaS) vendor.
Mixed-use workloads – You can balance read and write capacity in workloads that periodically experience spikes in query traffic, such as online transaction processing (OLTP) applications. By specifying promotion tiers for Amazon DocumentDB Serverless instances in a cluster, you can configure your cluster so that the reader instances can scale independently of the writer instance to handle the additional load.

For steady workloads, Amazon DocumentDB provisioned instances are more suitable. You can select an instance class that offers a predefined amount of memory, CPU power, and I/O bandwidth. If your workload changes when using provisioned instances, you should manually modify the instance class of your writer and readers. Optionally, you can add serverless instances to an existing provisioned Amazon DocumentDB cluster at any time.

Amazon DocumentDB Serverless in action
To get started with Amazon DocumentDB Serverless, go to the Amazon DocumentDB console. In the left navigation pane, choose Clusters and Create.

On the Create Amazon DocumentDB cluster page, choose Instance-based cluster type and then Serverless instance configuration. You can choose minimum and maximum capacity DCUs. Amazon DocumentDB Serverless is supported starting with Amazon DocumentDB 5.0.0 and higher with a capacity range of 0.5–256 DCUs.

If you use features such as auditing and Performance Insights, consider adding DCUs for each feature. To learn more, visit Amazon DocumentDB Serverless scaling configuration.

To add a serverless instance to an existing provisioned cluster, choose Add instances on the Actions menu when you choose the provisioned cluster. If you use a cluster with an earlier version such as 3.6 or 4.0, you should first upgrade the cluster to the supported engine version (5.0).

On the Add instances page, choose Serverless in the DB instance class section for each new serverless instance you want to create. To add another instance, choose Add instance and continue adding instances until you have reached the desired number of new instances. Choose Create.

You can perform a failover operation to make a DocumentDB Serverless instance the cluster writer. Also, you can convert any remaining provisioned Amazon DocumentDB instances to DocumentDB Serverless instances by changing an instance’s class or removing them from the cluster by deleting an Amazon DocumentDB instance.

Now, you can connect to your Amazon DocumentDB cluster using AWS CloudShell. Choose Connect to cluster, and you can see the AWS CloudShell Run command screen. Enter a unique name in New environment name and choose Create and run.

When prompted, enter the password for the Amazon DocumentDB cluster. You’re successfully connected to your Amazon DocumentDB cluster, and you can run a few queries to get familiar with using a document database.

To learn more, visit Creating a cluster that uses Amazon DocumentDB Serverless and Managing Amazon DocumentDB Serverless in the AWS documentation.

Now available
Amazon DocumentDB Serverless is now available starting with Amazon DocumentDB 5.0 for both new and existing clusters. You only pay a flat rate per second of DCU usage. To learn more about pricing details and Regional availability, visit the Amazon DocumentDB pricing page.

Give these new features a try in the Amazon DocumentDB console and send feedback to AWS re:Post for Amazon DocumentDB or through your usual AWS Support contacts.

— Channy

AWS Weekly Roundup: SQS fair queues, CloudWatch generative AI observability, and more (July 28, 2025)

2025-07-28 Micah Walter

Post Syndicated from Micah Walter original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-sqs-fair-queues-cloudwatch-generative-ai-observability-and-more-july-28-2025/

To be honest, I’m still recovering from the AWS Summit in New York, doing my best to level up on launches like Amazon Bedrock AgentCore (Preview) and Amazon Simple Storage Service (S3) Vectors. There’s a lot of new stuff to learn!

Meanwhile, it’s been an exciting week for AWS builders focused on reliability and observability. The standout announcement has to be Amazon SQS fair queues, which tackles one of the most persistent challenges in multi-tenant architectures: the “noisy neighbor” problem. If you’ve ever dealt with one tenant’s message processing overwhelming shared infrastructure and affecting other tenants, you’ll appreciate how this feature enables more balanced message distribution across your applications.

On the AI front, we’re also seeing AWS continue to enhance our observability capabilities with the preview launch of Amazon CloudWatch generative AI observability. This brings AI-powered insights directly into your monitoring workflows, helping you understand infrastructure and application performance patterns in new ways. And for those managing Amazon Connect environments, the addition of AWS CloudFormation for message template attachments makes it easier to programmatically deploy and manage email campaign assets across different environments.

Last week’s launches

Amazon SQS Fair Queues — AWS launched Amazon SQS fair queues to help mitigate the “noisy neighbor” problem in multi-tenant systems, enabling more balanced message processing and improved application resilience across shared infrastructure.
Amazon CloudWatch Generative AI Observability (Preview) — AWS launched a preview of Amazon CloudWatch generative AI observability, enabling users to gain AI-powered insights into their cloud infrastructure and application performance through advanced monitoring and analysis capabilities.
Amazon Connect CloudFormation Support for Message Template Attachments —AWS has expanded the capabilities of Amazon Connect by introducing support for AWS CloudFormation for Outbound Campaign message template attachments, enabling customers to programmatically manage and deploy email campaign attachments across different environments.
Amazon Connect Forecast Editing — Amazon Connect introduces a new forecast editing UI that allows contact center planners to quickly adjust forecasts by percentage or exact values across specific date ranges, queues, and channels for more responsive workforce planning.
Bloom Filters for Amazon ElastiCache — Amazon ElastiCache now supports Bloom filters in version 8.1 for Valkey, offering a space-efficient way to quickly check if an item is in a set with over 98% memory efficiency compared to traditional sets.
Amazon EC2 Skip OS Shutdown Option — AWS has introduced a new option for Amazon EC2 that allows customers to skip the graceful operating system shutdown when stopping or terminating instances, enabling faster application recovery and instance state transitions.
AWS HealthOmics Git Repository Integration — AWS HealthOmics now supports direct Git repository integration for workflow creation, allowing researchers to seamlessly pull workflow definitions from GitHub, GitLab, and Bitbucket repositories while enabling version control and reproducibility.
AWS Organizations Tag Policies Wildcard Support — AWS Organizations now supports a wildcard statement (ALL_SUPPORTED) in Tag Policies, allowing users to apply tagging rules to all supported resource types for a given AWS service in a single line, simplifying policy creation and reducing complexity.

Blogs of note

Beyond IAM Access Keys: Modern Authentication Approaches — AWS recommends moving beyond traditional IAM access keys to more secure authentication methods, reducing risks of credential exposure and unauthorized access by leveraging modern, more robust approaches to identity management.

Upcoming AWS events

AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Singapore (August 2), Australia (August 15), Adria (September 5), Baltic (September 10), and Aotearoa (September 18).

New AWS whitepaper: AWS User Guide to Financial Services Regulations and Guidelines in Australia

2025-07-25 Julian Busic

Post Syndicated from Julian Busic original https://aws.amazon.com/blogs/security/new-aws-whitepaper-aws-user-guide-to-financial-services-regulations-and-guidelines-in-australia/

Amazon Web Services (AWS) has released substantial updates to its AWS User Guide to Financial Services Regulations and Guidelines in Australia to help financial services customers in Australia accelerate their use of AWS.

The updates reflect the Australian Prudential Regulation Authority’s (APRA) publication of the Prudential Standard CPS 230 Operational Risk Management (CPS 230), which became effective from July 1, 2025. It also reflects that APRA rescinded its 2018 information paper “Outsourcing Involving Cloud Computing Services” in February 2025.

The updated whitepaper continues our efforts to help AWS customers navigate APRA’s regulatory expectations in a shared responsibility environment. It is intended for APRA-regulated institutions that are looking to run workloads on AWS and is particularly useful for leadership, governance, security, risk, and compliance teams that need to understand APRA requirements and guidance.

The whitepaper summarizes APRA’s requirements and guidance related to operational risk management and information security. It also gives APRA-regulated institutions information they can use to commence their due diligence and assess how to implement the appropriate programs for their use of AWS.

As the regulatory environment continues to evolve, we’ll provide further updates through the AWS Security Blog and the AWS Compliance page. You can find more information on cloud-related regulatory compliance at the AWS Compliance Center. You can also reach out to your AWS account manager for help finding the resources you need.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

New whitepaper available: AICPA SOC 2 Compliance Guide on AWS

2025-07-23 Abdul Javid

Post Syndicated from Abdul Javid original https://aws.amazon.com/blogs/security/new-whitepaper-available-aicpa-soc-2-compliance-guide-on-aws/

We’re excited to announce the release of our latest whitepaper, AICPA SOC 2 Compliance Guide on AWS, which provides in-depth guidance on implementing and maintaining SOC 2-aligned controls using AWS services.

Building and operating cloud-native services in alignment with the AICPA’s Trust Services Criteria requires thoughtful planning and robust implementation. This new whitepaper helps cloud architects, security and compliance teams, and DevOps professionals design environments that meet SOC 2 requirements while leveraging AWS’s shared responsibility model.

What’s inside the whitepaper:

Overview of the SOC 2 framework—including Common Criteria (CC 1–CC 9) and category-specific criteria (Security, Availability, Processing Integrity, Confidentiality, Privacy)
Mapping of each Trust Services Criterion to AWS services and constructs
Guidance on implementing complementary user entity controls (CUECs)
Strategies for evidence collection, documentation, and audit procedures
Risk and governance for executives
Best practices for automating compliance and preparing for SOC 2 readiness assessments

Download AICPA SOC 2 Compliance Guide on AWS.

For further assistance, contact AWS Security Assurance Services.

If you have feedback about this post, submit comments in the Comments section below.

AWS HITRUST certification is available for customer inheritance

Kristine Armiyants – Masis, Armenia

Nadia Reyhani – Perth, Australia

Raphael Manke – Karlsruhe, Germany

Rowan Udell – Brisbane, Australia

Sangwoon (Chris) Park – Seoul, Korea

Toshal Khawale – Pune, India

Learn More

Key Features:

Why Use CCAPI MCP Server?

Creating and Managing Cloud Infrastructure

Prerequisites

Integration with Developer Tools

Configuration file structure

Important

Read Only Mode

Security Considerations

Example Use Case: Creating an S3 Bucket with KMS Encryption

Sample Prompts for Easy Start

Conclusion

Authors

Prerequisites

Enable table optimizations at the catalog level

Customize setting of table optimizations at both the catalog and table-level

Conclusion

About the authors

Use cases

Multi-tenant email operation challenges

Benefits of tenant management capability

How tenant management works

Prerequisites

How to set up tenant management and its key considerations

IP pool configuration

Domain verification process

Authentication and authorization for tenants

Set up a configuration set

Sample CLI commands

Send emails using SMTP

Email event feedback loop management for tenants

Enforcement data and patterns

Administrating tenant isolation and reputation management

Available metrics and data points

Conclusion

About the authors

Outposts network connectivity overview

Understanding the new CloudWatch metrics for Outposts

Using CloudWatch metrics to investigate Outposts network connectivity issues

Conclusion

Example use case

Solution overview

Step 1: Data producers publish data assets

Step 2: Data consumers find relevant datasets

Step 3: Data consumers subscribe to required data assets or products

Step 4: Data producers review and approve the subscription request

Step 5: Data consumers access the subscribed data assets or products

Conclusion

About the authors

Getting started using Powertools for AWS Lambda (Java) v2

Logging

Metrics

Tracing

Reducing Lambda cold start duration

Lambda MSK Event Source Mapping Integration

Conclusion

Background

Front-end agent

Back-end agent

Using custom agents

Conclusion

What’s inside the whitepaper:

The collective thoughts of the interwebz