Tag Archives: announcements

Accelerate autonomous incident resolutions using the Datadog MCP server and AWS DevOps agent (in preview)

Post Syndicated from Nina Chen original https://aws.amazon.com/blogs/devops/accelerate-autonomous-incident-resolutions-using-the-datadog-mcp-server-and-aws-devops-agent-in-preview/

This post was co-written with Omri Sass (Director of Product Management), Cansu Berkem (Director of Product Management), and Mohammad Jama (Product Marketing Manager) from Datadog.

On-call engineers spend hours manually investigating incidents across multiple observability tools, logs, and monitoring systems. This process delays incident resolution and impacts business operations, especially when teams need to correlate data across different monitoring platforms. AWS DevOps Agent (in preview) is a frontier agent that resolves and proactively prevents incidents, continuously improving reliability and performance of applications in AWS, multicloud, and hybrid environments. Frontier agents represent a new class of AI agents that are autonomous, massively scalable, and work for hours or days without constant intervention. AWS DevOps Agent offers built-in integration with Datadog Model Context Protocol (MCP) Server, enabling you to access the untapped insights in your data by connecting directly to Datadog’s monitoring solutions. DevOps Agent maps your application resources and correlates telemetry, code, and deployment data to reduce MTTR (Mean Time To Resolution) and drive operational excellence.

You can use this integration to collect and analyze Datadog logs, metrics, and traces, correlating this data across AWS services. When incidents occur, AWS DevOps Agent identifies issues and provides mitigation plans which engineers can then implement. Engineers can monitor automated investigations through a central dashboard and engage with the agent through interactive chat at any time. Using this integration, engineers are able to reduce mean time to resolution (MTTR) from hours to minutes, while maintaining full visibility into automated actions.

How Datadog MCP and AWS DevOps Agent work together

The integration between Datadog MCP Server and AWS DevOps Agent connects your monitoring data with automated incident response. Datadog MCP Server acts as a central access point for your monitoring data. It securely connects to Datadog through a standardized protocol, allowing AWS DevOps Agent to query logs, metrics, and traces during investigations. The service uses OAuth 2.0 authentication and supports multiple regions to help maintain data sovereignty requirements.

AWS DevOps Agent learns your resources and relationships while correlating data from both AWS services and Datadog. It analyzes Amazon CloudWatch logs and metrics, deployment data, and code alongside Datadog telemetry to build a complete picture of the incident. This combined view helps identify root causes faster than examining each data source separately. Security considerations are built into every interaction. All interactions between AWS DevOps Agent and Datadog MCP Server uses authentication, authorization, encryption, and logging for audit purposes. While the service currently only runs in us-east-1, it can monitor and analyze applications deployed across any AWS Region in customer accounts globally.

Setting up and using AWS DevOps Agent with Datadog

In this section, we will guide you through the steps required to enable Datadog MCP Server in your AWS DevOps Agent account and configure it for incident resolution.

Pre-requisites

For this walkthrough, you should have access to and understanding of the following:

  • An AWS account with permissions to create AWS IAM (Identity and Access Management) roles:
    • Agent Space role – for basic service operations
    • Agent Space web app role – for using the Agent Space web app functionality
    •  (Optional) Secondary source account roles if monitoring multiple AWS accounts. Refer to the DevOps Agent user guide for the details on setting up these roles.
  • A Datadog account
  • Access to Datadog MCP Server (in preview)

Setting up Datadog in the AWS DevOps Agent console

Start the setup in the AWS DevOps Agent console by connecting your Datadog MCP Server. Navigate to Settings, select the Datadog integration panel, and choose “Register.” Enter your Datadog MCP Server details when prompted (you can learn more about requesting access to this server in their documentation). AWS DevOps Agent validates the connection and displays a confirmation message.

This is the configuration in AWS DevOps Agent for Datadog MCP Server Details with three input fields: Server Name (with example 'my-datadog-server'), Endpoint URL (showing 'https://mcp.datadog.com/api/unstable/mcp-server/mcp'), and an optional Description field. The form includes navigation steps at the top and Cancel/Next buttons at the bottom. The interface has a dark theme with blue accents.Figure 1: Setting up Datadog MCP Server in AWS DevOps Agent Console

Create an AWS DevOps Agent Agent Space

Next, create an Agent Space in your primary AWS account. This requires an AWS IAM role that grants AWS DevOps Agent access to your AWS resources. After creating your Agent Space, add Datadog MCP Server as a telemetry source to enable comprehensive incident investigation.

To create your Agent Space, start by accessing the AWS DevOps Agent console in us-east-1. Choose the “Create Agent Space” button and provide a meaningful name and description for your space. After submitting the form, you’ll need to configure the required IAM roles, which can be done through either the automated creation process or manual setup.

This is the configuration for creating an AWS DevOps Agent AgentSpace. The screen shows the option to create a DevOps Agents, with areas to give agent details, resource access, and more. The interface is dark blue theme. Figure 2: Creating a AWS DevOps Agent in Agent Space

Your Agent Space topology can be initialized using either AWS CloudFormation stacks or AWS Tags as starting points to identify your application components. Once the basic setup is complete, you can enhance your Agent Space configuration by adding Secondary source accounts for multi-account monitoring and configuring integrations with services like SIM ticketing system, Pipelines (where GitFarm packages and CloudFormation Stacks are located), Slack, and most importantly for our use case, Telemetry with the Datadog MCP Server.

This is a page that has options for adding telemetry source (datadog) in agent space. Here, there is a pop-up to add source association. The selected source here to add is Datadog. Figure 3: Add additional telemetry sources for AWS DevOps Agent to investigate

From here, we can launch the Agent Space web app to begin the investigation.

Real-World example: Resolving API Gateway errors

Let’s walk through how AWS DevOps Agent and Datadog work together to resolve a production incident. In this scenario, Datadog detects a spike in Amazon API Gateway 5XX errors affecting downstream services.

This is a sample monitor view of sample 5XX errors in Datadog. There is a monitor of Amazon API Gateway pulled up. On the right, there is a monitor showing "Your 5XX Errors" with over 220 errors. Figure 4: Sample API Gateway errors in Datadog

Investigating 5XX errors from API Gateway Incident with the Datadog MCP Server and AWS DevOps Agent

When the alert triggers, AWS DevOps Agent automatically analyzes both Datadog metrics and API Gateway logs. Through the investigation chat interface, an engineer guides AWS DevOps Agent to examine the API Gateway configuration. The agent correlates API Gateway and AWS Lambda execution logs, quickly identifying error patterns.

This is a view in AWS DevOps Agent to allow for investigating an incident with AWS DevOps Agent and Datadog MCPFigure 4: Investigating an incident with AWS DevOps Agent and Datadog MCP

Resolving and prevention

AWS DevOps Agent helps identify potential misconfigurations in the Lambda and Amazon DynamoDB integration and implements immediate fixes. The agent documents all findings and actions in an incident record, backed by telemetry from both Datadog and AWS services. After resolution, AWS DevOps Agent generates a detailed analysis report with specific recommendations to prevent similar incidents. Teams can review and implement these suggestions through the Prevention feature in the AWS DevOps Agent web app.

This view show the investigation summary produced by AWS DevOps Agent. Here, we see the root cause for this sample incident. The root cause head line states that "1. DynamoDB table name misconfiguration - typo in environment variable". There is a longer description explaining this under it. The background for this view is plain white. Figure 5: Investigation summary produced by AWS DevOps Agent

Clean up

When you’re done using the integration, you can clean up your resources by following these steps:

  1. Delete your Agent Space from the AWS DevOps Agent console
  2. Remove the Datadog MCP Server connection from your settings
  3. Delete the IAM roles created for the Agent Space
  4. (Optional) If you created additional source account roles, remove those as well

Conclusion

The integration between Datadog MCP Server and AWS DevOps Agent reduces incident resolution time by automatically correlating data across your monitoring tools. Instead of manually switching between Datadog and AWS dashboards during incidents, teams can now get an AI-powered investigation that identifies root causes and suggests fixes. Early adopters report significant improvements in their incident response. Resolution times drop from hours to minutes, while on-call teams spend less time gathering data. Teams also see more consistent incident responses and improved root cause analysis through comprehensive data correlation. To learn more, check out the AWS DevOps Agent product page.

Datadog is an AWS Specialization Partner and AWS Marketplace Seller that has been building integrations with AWS services for over a decade, amassing a growing catalog of 100+ AWS and 1000+ built-in integrations. This new AWS DevOps Agent and Datadog MCP Server integration builds upon Datadog’s strong track record of AWS partnership success. If you’re not already using Datadog, you can get started with a 14-day free trial via the AWS Marketplace.

Sujatha Kuppuraju

Sujatha Kuppuraju is a Principal Solutions Architect at AWS, specializing in Cloud and, Generative AI Security. She collaborates with software companies’ leadership teams to architect secure, scalable solutions on AWS and guide strategic product development. Leveraging her expertise in cloud architecture and emerging technologies, Sujatha helps organizations optimize offerings, maintain robust security, and bring innovative products to market in an evolving tech landscape.

DhilipVenkatesh Uvarajan

DhilipVenkatesh Uvarajan is as an Enterprise Support Lead TAM within AWS Enterprise Support, specializing in Independent Software Vendors (ISVs) across the United States. In this role, Dhilip provides strategic technical guidance to help customers innovate, optimize their AWS architecture, and ensure the seamless operation of their business-critical applications on the AWS cloud. Beyond his professional endeavors, Dhilip is passionate about AI and Robotics, often exploring innovative projects in his spare time.

Nina Chen

Nina Chen is a Customer Solutions Manager at AWS specializing in leading software companies to leverage the power of the AWS cloud to accelerate their product innovation and growth. With over 4 years of experience working in the strategic Independent Software Vendor (ISV) vertical, Nina enjoys guiding ISV partners through their cloud transformation journeys, helping them optimize their cloud infrastructure, driving product innovation, and delivering exceptional customer experiences.

Omri Sass

Omri Sass is a Director of Product Management at Datadog, where he’s overseen the development and launch of a multitude of products and capabilities including Bits AI SRE and updog.ai. He is a keen advocate for good user experience and doing what’s right by users.

Cansu Berkem

Cansu Berkem is a Director of Product Management at Datadog, overseeing the company’s end-to-end incident response experience, including Incident Management, On-Call, Automations, and Bits AI SRE. Her products help engineers resolve incidents faster through AI-driven workflows, powered by Bits AI SRE as an autonomous incident investigator and supported by integration-rich incident management and paging flows.

Mohammad Jama

Mohammad Jama is a Product Marketing Manager at Datadog. He leads go-to-market for Datadog’s AWS integrations, working closely with product, marketing, and sales to help companies observe and secure their hybrid and AWS environments.

Amazon Bedrock adds reinforcement fine-tuning simplifying how developers build smarter, more accurate AI models

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/improve-model-accuracy-with-reinforcement-fine-tuning-in-amazon-bedrock/

Organizations face a challenging trade-off when adapting AI models to their specific business needs: settle for generic models that produce average results, or tackle the complexity and expense of advanced model customization. Traditional approaches force a choice between poor performance with smaller models or the high costs of deploying larger model variants and managing complex infrastructure. Reinforcement fine-tuning is an advanced technique that trains models using feedback instead of massive labeled datasets, but implementing it typically requires specialized ML expertise, complicated infrastructure, and significant investment—with no guarantee of achieving the accuracy needed for specific use cases.

Today, we’re announcing reinforcement fine-tuning in Amazon Bedrock, a new model customization capability that creates smarter, more cost-effective models that learn from feedback and deliver higher-quality outputs for specific business needs. Reinforcement fine-tuning uses a feedback-driven approach where models improve iteratively based on reward signals, delivering 66% accuracy gains on average over base models.

Amazon Bedrock automates the reinforcement fine-tuning workflow, making this advanced model customization technique accessible to everyday developers without requiring deep machine learning (ML) expertise or large labeled datasets.

How reinforcement fine-tuning works
Reinforcement fine-tuning is built on top of reinforcement learning principles to address a common challenge: getting models to consistently produce outputs that align with business requirements and user preferences.

While traditional fine-tuning requires large, labeled datasets and expensive human annotation, reinforcement fine-tuning takes a different approach. Instead of learning from fixed examples, it uses reward functions to evaluate and judge which responses are considered good for particular business use cases. This teaches models to understand what makes a quality response without requiring massive amounts of pre-labeled training data, making advanced model customization in Amazon Bedrock more accessible and cost-effective.

Here are the benefits of using reinforcement fine-tuning in Amazon Bedrock:

  • Ease of use – Amazon Bedrock automates much of the complexity, making reinforcement fine-tuning more accessible to developers building AI applications. Models can be trained using existing API logs in Amazon Bedrock or by uploading datasets as training data, eliminating the need for labeled datasets or infrastructure setup.
  • Better model performance – Reinforcement fine-tuning improves model accuracy by 66% on average over base models, enabling optimization for price and performance by training smaller, faster, and more efficient model variants. This works with Amazon Nova 2 Lite model, improving quality and price performance for specific business needs, with support for additional models coming soon.
  • Security – Data remains within the secure AWS environment throughout the entire customization process, mitigating security and compliance concerns.

The capability supports two complementary approaches to provide flexibility for optimizing models:

  • Reinforcement Learning with Verifiable Rewards (RLVR) uses rule-based graders for objective tasks like code generation or math reasoning.
  • Reinforcement Learning from AI Feedback (RLAIF) employs AI-based judges for subjective tasks like instruction following or content moderation.

Getting started with reinforcement fine-tuning in Amazon Bedrock
Let’s walk through creating a reinforcement fine-tuning job.

First, I access the Amazon Bedrock console. Then, I navigate to the Custom models page. I choose Create and then choose Reinforcement fine-tuning job.

I start by entering the name of this customization job and then select my base model. At launch, reinforcement fine-tuning supports Amazon Nova 2 Lite, with support for additional models coming soon.

Next, I need to provide training data. I can use my stored invocation logs directly, eliminating the need to upload separate datasets. I can also upload new JSONL files or select existing datasets from Amazon Simple Storage Service (Amazon S3). Reinforcement fine-tuning automatically validates my training dataset and supports the OpenAI Chat Completions data format. If I provide invocation logs in the Amazon Bedrock invoke or converse format, Amazon Bedrock automatically converts them to the Chat Completions format.

The reward function setup is where I define what constitutes a good response. I have two options here. For objective tasks, I can select Custom code and write custom Python code that gets executed through AWS Lambda functions. For more subjective evaluations, I can select Model as judge to use foundation models (FMs) as judges by providing evaluation instructions.

Here, I select Custom code, and I create a new Lambda function or use an existing one as a reward function. I can start with one of the provided templates and customize it for my specific needs.

I can optionally modify default hyperparameters like learning rate, batch size, and epochs.

For enhanced security, I can configure virtual private cloud (VPC) settings and AWS Key Management Service (AWS KMS) encryption to meet my organization’s compliance requirements. Then, I choose Create to start the model customization job.

During the training process, I can monitor real-time metrics to understand how the model is learning. The training metrics dashboard shows key performance indicators including reward scores, loss curves, and accuracy improvements over time. These metrics help me understand whether the model is converging properly and if the reward function is effectively guiding the learning process.

When the reinforcement fine-tuning job is completed, I can see the final job status on the Model details page.

Once the job is completed, I can deploy the model with a single click. I select Set up inference, then choose Deploy for on-demand.

Here, I provide a few details for my model.

After deployment, I can quickly evaluate the model’s performance using the Amazon Bedrock playground. This helps me to test the fine-tuned model with sample prompts and compare its responses against the base model to validate the improvements. I select Test in playground.

The playground provides an intuitive interface for rapid testing and iteration, helping me confirm that the model meets my quality requirements before integrating it into production applications.

Interactive demo
Learn more by navigating an interactive demo of Amazon Bedrock reinforcement fine-tuning in action.

Additional things to know
Here are key points to note:

  • Templates — There are seven ready-to-use reward function templates covering common use cases for both objective and subjective tasks.
  • Pricing — To learn more about pricing, refer to the Amazon Bedrock pricing page.
  • Security — Training data and custom models remain private and aren’t used to improve FMs for public use. It supports VPC and AWS KMS encryption for enhanced security.

Get started with reinforcement fine-tuning by visiting the reinforcement fine-tuning documentation and by accessing the Amazon Bedrock console.

Happy building!
Donnie

Announcing replication support and Intelligent-Tiering for Amazon S3 Tables

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/announcing-replication-support-and-intelligent-tiering-for-amazon-s3-tables/

Today, we’re announcing two new capabilities for Amazon S3 Tables: support for the new Intelligent-Tiering storage class that automatically optimizes costs based on access patterns, and replication support to automatically maintain consistent Apache Iceberg table replicas across AWS Regions and accounts without manual sync.

Organizations working with tabular data face two common challenges. First, they need to manually manage storage costs as their datasets grow and access patterns change over time. Second, when maintaining replicas of Iceberg tables across Regions or accounts, they must build and maintain complex architectures to track updates, manage object replication, and handle metadata transformations.

S3 Tables Intelligent-Tiering storage class
With the S3 Tables Intelligent-Tiering storage class, data is automatically tiered to the most cost-effective access tier based on access patterns. Data is stored in three low-latency tiers: Frequent Access, Infrequent Access (40% lower cost than Frequent Access), and Archive Instant Access (68% lower cost compared to Infrequent Access). After 30 days without access, data moves to Infrequent Access, and after 90 days, it moves to Archive Instant Access. This happens without changes to your applications or impact on performance.

Table maintenance activities, including compaction, snapshot expiration, and unreferenced file removal, operate without affecting the data’s access tiers. Compaction automatically processes only data in the Frequent Access tier, optimizing performance for actively queried data while reducing maintenance costs by skipping colder files in lower-cost tiers.

By default, all existing tables use the Standard storage class. When creating new tables, you can specify Intelligent-Tiering as the storage class, or you can rely on the default storage class configured at the table bucket level. You can set Intelligent-Tiering as the default storage class for your table bucket to automatically store tables in Intelligent-Tiering when no storage class is specified during creation.

Let me show you how it works
You can use the AWS Command Line Interface (AWS CLI) and the put-table-bucket-storage-class and get-table-bucket-storage-class commands to change or verify the storage tier of your S3 table bucket.

# Change the storage class
aws s3tables put-table-bucket-storage-class \
   --table-bucket-arn $TABLE_BUCKET_ARN  \
   --storage-class-configuration storageClass=INTELLIGENT_TIERING

# Verify the storage class
aws s3tables get-table-bucket-storage-class \
   --table-bucket-arn $TABLE_BUCKET_ARN  \

{ "storageClassConfiguration":
   { 
      "storageClass": "INTELLIGENT_TIERING"
   }
}

S3 Tables replication support
The new S3 Tables replication support helps you maintain consistent read replicas of your tables across AWS Regions and accounts. You specify the destination table bucket and the service creates read-only replica tables. It replicates all updates chronologically while preserving parent-child snapshot relationships. Table replication helps you build global datasets to minimize query latency for geographically distributed teams, meet compliance requirements, and provide data protection.

You can now easily create replica tables that deliver similar query performance as their source tables. Replica tables are updated within minutes of source table updates and support independent encryption and retention policies from their source tables. Replica tables can be queried using Amazon SageMaker Unified Studio or any Iceberg-compatible engine including DuckDB, PyIceberg, Apache Spark, and Trino.

You can create and maintain replicas of your tables through the AWS Management Console or APIs and AWS SDKs. You specify one or more destination table buckets to replicate your source tables. When you turn on replication, S3 Tables automatically creates read-only replica tables in your destination table buckets, backfills them with the latest state of the source table, and continually monitors for new updates to keep replicas in sync. This helps you meet time-travel and audit requirements while maintaining multiple replicas of your data.

Let me show you how it works
To show you how it works, I proceed in three steps. First, I create an S3 table bucket, create an Iceberg table, and populate it with data. Second, I configure the replication. Third, I connect to the replicated table and query the data to show you that changes are replicated.

For this demo, the S3 team kindly gave me access to an Amazon EMR cluster already provisioned. You can follow the Amazon EMR documentation to create your own cluster. They also created two S3 table buckets, a source and a destination for the replication. Again, the S3 Tables documentation will help you to get started.

I take a note of the two S3 Tables bucket Amazon Resource Names (ARNs). In this demo, I refer to these as the environment variables SOURCE_TABLE_ARN and DEST_TABLE_ARN.

First step: Prepare the source database

I start a terminal, connect to the EMR cluster, start a Spark session, create a table, and insert a row of data. The commands I use in this demo are documented in Accessing tables using the Amazon S3 Tables Iceberg REST endpoint.

sudo spark-shell \
--packages "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160" \
--master "local[*]" \
--conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
--conf "spark.sql.defaultCatalog=spark_catalog" \
--conf "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog" \
--conf "spark.sql.catalog.spark_catalog.type=rest" \
--conf "spark.sql.catalog.spark_catalog.uri=https://s3tables.us-east-1.amazonaws.com/iceberg" \
--conf "spark.sql.catalog.spark_catalog.warehouse=arn:aws:s3tables:us-east-1:012345678901:bucket/aws-news-blog-test" \
--conf "spark.sql.catalog.spark_catalog.rest.sigv4-enabled=true" \
--conf "spark.sql.catalog.spark_catalog.rest.signing-name=s3tables" \
--conf "spark.sql.catalog.spark_catalog.rest.signing-region=us-east-1" \
--conf "spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO" \
--conf "spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialProvider" \
--conf "spark.sql.catalog.spark_catalog.rest-metrics-reporting-enabled=false"

spark.sql("""
CREATE TABLE s3tablesbucket.test.aws_news_blog (
customer_id STRING,
address STRING
) USING iceberg
""")

spark.sql("INSERT INTO s3tablesbucket.test.aws_news_blog VALUES ('cust1', 'val1')")

spark.sql("SELECT * FROM s3tablesbucket.test.aws_news_blog LIMIT 10").show()
+-----------+-------+
|customer_id|address|
+-----------+-------+
|      cust1|   val1|
+-----------+-------+

So far, so good.

Second step: Configure the replication for S3 Tables

Now, I use the CLI on my laptop to configure the S3 table bucket replication.

Before doing so, I create an AWS Identity and Access Management (IAM) policy to authorize the replication service to access my S3 table bucket and encryption keys. Refer to the S3 Tables replication documentation for the details. The permissions I used for this demo are:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*",
                "s3tables:*",
                "kms:DescribeKey",
                "kms:GenerateDataKey",
                "kms:Decrypt"
            ],
            "Resource": "*"
        }
    ]
}

After having created this IAM policy, I can now proceed and configure the replication:

aws s3tables-replication put-table-replication \
--table-arn ${SOURCE_TABLE_ARN} \
--configuration  '{
    "role": "arn:aws:iam::<MY_ACCOUNT_NUMBER>:role/S3TableReplicationManualTestingRole", 
    "rules":[
        {
            "destinations": [
                {
                    "destinationTableBucketARN": "${DST_TABLE_ARN}"
                }]
        }
    ]

The replication starts automatically. Updates are typically replicated within minutes. The time it takes to complete depends on the volume of data in the source table.

Third step: Connect to the replicated table and query the data

Now, I connect to the EMR cluster again, and I start a second Spark session. This time, I use the destination table.

S3 Tables replication - destination table

To verify the replication works, I insert a second row of data on the source table.

spark.sql("INSERT INTO s3tablesbucket.test.aws_news_blog VALUES ('cust2', 'val2')")

I wait a few minutes for the replication to trigger. I follow the status of the replication with the get-table-replication-status command.

aws s3tables-replication get-table-replication-status \
--table-arn ${SOURCE_TABLE_ARN} \
{
    "sourceTableArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test/table/e0fce724-b758-4ee6-85f7-ca8bce556b41",
    "destinations": [
        {
            "replicationStatus": "pending",
            "destinationTableBucketArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test-dst",
            "destinationTableArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test-dst/table/5e3fb799-10dc-470d-a380-1a16d6716db0",
            "lastSuccessfulReplicatedUpdate": {
                "metadataLocation": "s3://e0fce724-b758-4ee6-8-i9tkzok34kum8fy6jpex5jn68cwf4use1b-s3alias/e0fce724-b758-4ee6-85f7-ca8bce556b41/metadata/00001-40a15eb3-d72d-43fe-a1cf-84b4b3934e4c.metadata.json",
                "timestamp": "2025-11-14T12:58:18.140281+00:00"
            }
        }
    ]
}

When replication status shows ready, I connect to the EMR cluster and I query the destination table. Without surprise, I see the new row of data.

S3 Tables replication - target table is up to date

Additional things to know
Here are a couple of additional points to pay attention to:

  • Replication for S3 Tables supports both Apache Iceberg V2 and V3 table formats, giving you flexibility in your table format choice.
  • You can configure replication at the table bucket level, making it straightforward to replicate all tables under that bucket without individual table configurations.
  • Your replica tables maintain the storage class you choose for your destination tables, which means you can optimize for your specific cost and performance needs.
  • Any Iceberg-compatible catalog can directly query your replica tables without additional coordination—they only need to point to the replica table location. This gives you flexibility in choosing query engines and tools.

Pricing and availability
You can track your storage usage by access tier through AWS Cost and Usage Reports and Amazon CloudWatch metrics. For replication monitoring, AWS CloudTrail logs provide events for each replicated object.

There are no additional charges to configure Intelligent-Tiering. You only pay for storage costs in each tier. Your tables continue to work as before, with automatic cost optimization based on your access patterns.

For S3 Tables replication, you pay the S3 Tables charges for storage in the destination table, for replication PUT requests, for table updates (commits), and for object monitoring on the replicated data. For cross-Region table replication, you also pay for inter-Region data transfer out from Amazon S3 to the destination Region based on the Region pair.

As usual, refer to the Amazon S3 pricing page for the details.

Both capabilities are available today in all AWS Regions where S3 Tables are supported.

To learn more about these new capabilities, visit the Amazon S3 Tables documentation or try them in the Amazon S3 console today. Share your feedback through AWS re:Post for Amazon S3 or through your AWS Support contacts.

— seb

Amazon Bedrock AgentCore adds quality evaluations and policy controls for deploying trusted AI agents

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-bedrock-agentcore-adds-quality-evaluations-and-policy-controls-for-deploying-trusted-ai-agents/

Today, we’re announcing new capabilities in Amazon Bedrock AgentCore to further remove barriers holding AI agents back from production. Organizations across industries are already building on AgentCore, the most advanced agentic platform to build, deploy, and operate highly capable agents securely at any scale. In just 5 months since preview, the AgentCore SDK has been downloaded over 2 million times. For example:

  • PGA TOUR, a pioneer and innovation leader in sports has built a multi-agent content generation system to create articles for their digital platforms. The new solution, built on AgentCore, enables the PGA TOUR to provide comprehensive coverage for every player in the field, by increasing content writing speed by 1,000 percent while achieving a 95 percent reduction in costs.
  • Independent software vendors (ISVs) like Workday are building the software of the future on AgentCore. AgentCore Code Interpreter provides Workday Planning Agent with secure data protection and essential features for financial data exploration. Users can analyze financial and operational data through natural language queries, making financial planning intuitive and self-driven. This capability reduces time spent on routine planning analysis by 30 percent, saving approximately 100 hours per month.
  • Grupo Elfa, a Brazilian distributor and retailer, relies on AgentCore Observability for complete audit traceability and real-time metrics of their agents, transforming their reactive processes into proactive operations. Using this unified platform, their sales team can handle thousands of daily price quotes while the organization maintains full visibility of agent decisions, helping achieve 100 percent traceability of agent decisions and interactions, and reduced problem resolution time by 50 percent.

As organizations scale their agent deployments, they face challenges around implementing the right boundaries and quality checks to confidently deploy agents. The autonomy that makes agents powerful also makes them hard to confidently deploy at scale, as they might access sensitive data inappropriately, make unauthorized decisions, or take unexpected actions. Development teams must balance enabling agent autonomy while ensuring they operate within acceptable boundaries and with the quality you require to put them in front of customers and employees.

The new capabilities available today take the guesswork out of this process and help you build and deploy trusted AI agents with confidence:

  • Policy in AgentCore (Preview) – Defines clear boundaries for agent actions by intercepting AgentCore Gateway tool calls before they run using policies with fine-grained permissions.
  • AgentCore Evaluations (Preview) – Monitors the quality of your agents based on real-world behavior using built-in evaluators for dimensions such as correctness and helpfulness, plus custom evaluators for business-specific requirements.

We’re also introducing features that expand what agents can do:

  • Episodic functionality in AgentCore Memory – A new long-term strategy that helps agents learn from experiences and adapt solutions across similar situations for improved consistency and performance in similar future tasks.
  • Bidirectional streaming in AgentCore Runtime – Deploys voice agents where both users and agents can speak simultaneously following a natural conversation flow.

Policy in AgentCore for precise agent control
Policy gives you control over the actions agents can take and are applied outside of the agent’s reasoning loop, treating agents as autonomous actors whose decisions require verification before reaching tools, systems, or data. It integrates with AgentCore Gateway to intercept tool calls as they happen, processing requests while maintaining operational speed, so workflows remain fast and responsive.

You can create policies using natural language or directly use Cedar—an open source policy language for fine-grained permissions—simplifying the process to set up, understand, and audit rules without writing custom code. This approach makes policy creation accessible to development, security, and compliance teams who can create, understand, and audit rules without specialized coding knowledge.

The policies operate independently of how the agent was built or which model it uses. You can define which tools and data agents can access—whether they are APIs, AWS Lambda functions, Model Context Protocol (MCP) servers, or third-party services—what actions they can perform, and under what conditions.

Teams can define clear policies once and apply them consistently across their organization. With policies in place, developers gain the freedom to create innovative agentic experiences, and organizations can deploy their agents to act autonomously while knowing they’ll stay within defined boundaries and compliance requirements.

Using Policy in AgentCore
You can start by creating a policy engine in the new Policy section of the AgentCore console and associate it with one or more AgentCore gateways.

A policy engine is a collection of policies that are evaluated at the gateway endpoint. When associating a gateway with a policy engine, you can choose whether to enforce the result of the policy—effectively permitting or denying access to a tool call—or to only emit logs. Using logs helps you test and validate a policy before enabling it in production.

Then, you can define the policies to apply to have granular control over access to the tools offered by the associated AgentCore gateways.

Amazon Bedrock AgentCore Policy console

To create a policy, you can start with a natural language description (that should include information of the authentication claims to use) or directly edit Cedar code.

Amazon Bedrock AgentCore Policy add

Natural language-based policy authoring provides a more accessible way for you to create fine-grained policies. Instead of writing formal policy code, you can describe rules in plain English. The system interprets your intent, generates candidate policies, validates them against the tool schema, and uses automated reasoning to check safety conditions—identifying prompts that are overly permissive, overly restrictive, or contain conditions that can never be satisfied.

Unlike generic large language model (LLM) translations, this feature understands the structure of your tools and generates policies that are both syntactically correct and semantically aligned with your intent, while flagging rules that cannot be enforced. It is also available as a Model Context Protocol (MCP) server, so you can author and validate policies directly in your preferred AI-assisted coding environment as part of your normal development workflow. This approach reduces onboarding time and helps you write high-quality authorization rules without needing Cedar expertise.

The following sample policy uses information from the OAuth claims in the JWT token used to authenticate to an AgentCore gateway (for the role) and the arguments passed to the tool call (context.input) to validate access to the tool processing a refund. Only an authenticated user with the refund-agent role can access the tool but for amounts (context.input.amount) lower than $200 USD.

permit(
  principal is AgentCore::OAuthUser,
  action == AgentCore::Action::"RefundTool__process_refund",
  resource == AgentCore::Gateway::"<GATEWAY_ARN>"
)
when {
  principal.hasTag("role") &&
  principal.getTag("role") == "refund-agent" &&
  context.input.amount < 200
};

AgentCore Evaluations for continuous, real-time quality intelligence
AgentCore Evaluations is a fully managed service that helps you continuously monitor and analyze agent performance based on real-world behavior. With AgentCore Evaluations, you can use built-in evaluators for common quality dimensions such as correctness, helpfulness, tool selection accuracy, safety, goal success rate, and context relevance. You can also create custom model-based scoring systems configured with your choice of prompt and model for business-tailored scoring while the service samples live agent interactions and scores them continuously.

All results from AgentCore Evaluations are visualized in Amazon CloudWatch alongside AgentCore Observability insights, providing one place for unified monitoring. You can also set up alerts and alarms on the evaluation scores to proactively monitor agent quality and respond when metrics fall outside acceptable thresholds.

You can use AgentCore Evaluations during the testing phase where you can check an agent against the baseline before deployment to stop faulty versions from reaching users, and in production for continuous improvement of your agents. When quality metrics drop below defined thresholds—such as a customer service agent satisfaction declining or politeness scores dropping by more than 10 percent over an 8-hour period—the system triggers immediate alerts, helping to detect and address quality issues faster.

Using AgentCore Evaluations
You can create an online evaluation in the new Evaluations section of the AgentCore console. You can use as data source an AgentCore agent endpoint or a CloudWatch log group used by an external agent. For example, I use here the same sample customer support agent I shared when we introduced AgentCore in preview.

Amazon Bedrock AgentCore Evaluations source

Then, you can select the evaluators to use, including custom evaluators that you can define starting from the existing templates or build from scratch.

Amazon Bedrock AgentCore Evaluations source

For example, for a customer support agent, you can select metrics such as:

  • Correctness – Evaluates whether the information in the agent’s response is factually accurate
  • Faithfulness – Evaluates whether information in the response is supported by provided context/sources
  • Helpfulness – Evaluates from user’s perspective how useful and valuable the agent’s response is
  • Harmfulness – Evaluates whether the response contains harmful content
  • Stereotyping – Detects content that makes generalizations about individuals or groups

The evaluators for tool selection and tool parameter accuracy can help you understand if an agent is choosing the right tool for a task and extracting the correct parameters from the user queries.

To complete the creation of the evaluation, you can choose the sampling rate and optional filters. For permissions, you can create a new AWS Identity and Access Management (IAM) service role or pass an existing one.

Amazon Bedrock AgentCore Evaluations create

The results are published, as they are evaluated, on Amazon CloudWatch in the AgentCore Observability dashboard. You can choose any of the bar chart sections to see the corresponding traces and gain deeper insight into the requests and responses behind that specific evaluation.

Amazon AgentCore Evaluations results

Because the results are in CloudWatch, you can use all of its feature to create, for example, alarms and automations.

Creating custom evaluators in AgentCore Evaluations
Custom evaluators allow you to define business-specific quality metrics tailored to your agent’s unique requirements. To create a custom evaluator, you provide the model to use as a judge, including inference parameters such as temperature and max output tokens, and a tailored prompt with the judging instructions. You can start from the prompt used by one of the built-in evaluators or enter a new one.

AgentCore Evaluations create custom evaluator

Then, you define the scale to produce in output. It can be either numeric values or custom text labels that you define. Finally, you configure whether the evaluation is computed by the model on single traces, full sessions, or for each tool call.

AgentCore Evaluations custom evaluator scale

AgentCore Memory episodic functionality for experience-based learning
AgentCore Memory, a fully managed service that gives AI agents the ability to remember past interactions, now includes a new long-term memory strategy that gives agents the ability to learn from past experiences and apply those lessons to provide more helpful assistance in future interactions.

Consider booking travel with an agent: over time, the agent learns from your booking patterns—such as the fact that you often need to move flights to later times when traveling for work due to client meetings. When you start your next booking involving client meetings, the agent proactively suggests flexible return options based on these learned patterns. Just like an experienced assistant who learns your specific travel habits, agents with episodic memory can now recognize and adapt to your individual needs.

When you enable the new episodic functionality, AgentCore Memory captures structured episodes that record the context, reasoning process, actions taken, and outcomes of agent interactions, while a reflection agent analyzes these episodes to extract broader insights and patterns. When facing similar tasks, agents can retrieve these learnings to improve decision-making consistency and reduce processing time. This reduces the need for custom instructions by including in the agent context only the specific learnings an agent needs to complete a task instead of a long list of all possible suggestions.

AgentCore Runtime bidirectional streaming for more natural conversations
With AgentCore Runtime, you can deploy agentic applications with few lines of code. To simplify deploying conversational experiences that feel natural and responsive, AgentCore Runtime now supports bidirectional streaming. This capability enables voice agents to listen and adapt while users speak, so that people can interrupt agents mid-response and have the agent immediately adjust to the new context—without waiting for the agent to finish its current output. Rather than traditional turn-based interaction where users must wait for complete responses, bidirectional streaming creates flowing, natural conversations where agents dynamically change their response based on what the user is saying.

Building these conversational experiences from the ground up requires significant engineering effort to handle the complex flow of simultaneous communication. Bidirectional streaming simplifies this by managing the infrastructure needed for agents to process input while generating output, handling interruptions gracefully, and maintaining context throughout dynamic conversation shifts. You can now deploy agents that naturally adapt to the fluid nature of human conversation—supporting mid-thought interruptions, context switches, and clarifications without losing the thread of the interaction.

Things to know
Amazon Bedrock AgentCore, including the preview of Policy, is available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland) AWS Regions . The preview of AgentCore Evaluations is available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Frankfurt) Regions. For Regional availability and future roadmap, visit AWS Capabilities by Region.

With AgentCore, you pay for what you use with no upfront commitments. For detailed pricing information, visit the Amazon Bedrock pricing page. AgentCore is also a part of the AWS Free Tier that new AWS customers can use to get started at no cost and explore key AWS services.

These new features work with any open source framework such as CrewAI, LangGraph, LlamaIndex, and Strands Agents, and with any foundation model. AgentCore services can be used together or independently, and you can get started using your favorite AI-assisted development environment with the AgentCore open source MCP server.

To learn more and get started quickly, visit the AgentCore Developer Guide.

Danilo

Introducing Database Savings Plans for AWS Databases

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/introducing-database-savings-plans-for-aws-databases/

Since Amazon Web Services (AWS) introduced Savings Plans, customers have been able to lower the cost of running sustained workloads while maintaining the flexibility to manage usage across accounts, resource types, and AWS Regions. Today, we’re extending this flexible pricing model to AWS managed database services with the launch of Database Savings Plans, which help customers reduce database costs by up to 35% when they commit to a consistent amount of usage ($/hour) over a 1-year term. Savings automatically apply each hour to eligible usage across supported database services, and any additional usage beyond the commitment is billed at on-demand rates.

As organizations build and manage data-driven and AI applications, they often use different database services, engines and deployment types, including instance-based and serverless options, to meet evolving business needs. Database Savings Plans provide the flexibility to choose how workloads run while maintaining cost efficiency. If customers are in the middle of a migration or modernization effort, they can switch database engines and adjust deployment types, such as from provisioned to serverless as part of ongoing cost optimization, while continuing to receive discounted rates. If a customer’s business expands globally, they can also shift usage across AWS Regions and continue to benefit from the same commitment. By applying a consistent hourly commitment, customers can maintain predictable spend even as usage patterns evolve and analyze coverage and utilization using familiar cost management tools.

New Savings Plans
Each plan defines where pricing applies, the range of available discounts, and the level of flexibility provided across supported database engines, instance families, sizes, deployment options, or AWS Regions.

The hourly commitment automatically applies to all eligible usage regardless of Region, with support for Amazon Aurora, Amazon Relational Database Service (Amazon RDS), Amazon DynamoDB, Amazon ElastiCache, Amazon DocumentDB (with MongoDB compatibility), Amazon Neptune, Amazon Keyspaces (for Apache Cassandra), Amazon Timestream, and AWS Database Migration Service (AWS DMS). As new eligible database offerings, instance types, or Regions become available, Savings Plans will automatically apply to that usage.

Discounts vary by deployment model and service type. Serverless deployments provide up to 35% savings compared to on-demand rates. Provisioned instances across supported database services deliver up to 20% savings. For Amazon DynamoDB and Amazon Keyspaces, on-demand throughput workloads receive up to 18% savings, and provisioned capacity offers up to 12%. Together, these savings help customers optimize costs while maintaining consistent coverage for database usage. To learn more about the pricing and eligible usage, visit the Database Savings Plans pricing page.

Purchasing Database Savings Plans
AWS Billing and Cost Management Console helps you choose Savings Plans and guides you through the purchase process. You can get started from the AWS Management Console or use the AWS Command Line Interface (AWS CLI) and the API. There are two ways to evaluate Database Savings Plans purchases, in the Recommendations view and in the Purchase Analyzer.

Recommendations – are automatically generated from your recent on-demand usage. To reach the Recommendations view in the Billing and Cost Management console, choose Savings and Commitments, Savings Plans, and Recommendations in the navigation pane. In the Recommendations view, select Database Savings Plans and configure the Recommendation options. AWS Savings Plans recommendations analyze your historical on-demand usage to identify the hourly commitment that delivers the highest overall savings.

The Purchase Analyzer – is designed for modeling custom commitment levels. If you want to purchase a different amount than the recommended commitment on the Purchase Analyzer page, select Database Savings Plans and configure Lookback period and Hourly commitment to simulate alternative commitment levels and see the projected impact on Cost, Coverage, and Utilization.

This way is preferred if your purchasing strategy includes smaller, incremental commitments over time or if you expect future usage changes that could affect your ideal purchase amount.

After reviewing the recommendations or running simulations in Savings Plans Recommendations or Savings Plans Purchase Analyzer, choose Add to cart to proceed with your chosen commitment. If you prefer to purchase directly, you can also navigate to the Purchase Savings Plans page. The console updates estimated discounts and coverage in real time as you adjust each setting, so you can evaluate the impact before completing your order.

You can learn more about how to choose and purchase Database Saving Plans by visiting the Savings Plans User Guide documents.

Now available
Database Savings Plans are available in all AWS Regions outside of China. Give them a try and start shaping your database strategy with more flexibility and predictable costs.

– Betty

Amazon S3 Vectors now generally available with increased scale and performance

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-s3-vectors-now-generally-available-with-increased-scale-and-performance/

Today, I’m excited to announce that Amazon S3 Vectors is now generally available with significantly increased scale and production-grade performance capabilities. S3 Vectors is the first cloud object storage with native support to store and query vector data. You can use it to help you reduce the total cost of storing and querying vectors by up to 90% when compared to specialized vector database solutions.

Since we announced the preview of S3 Vectors in July, I’ve been impressed by how quickly you adopted this new capability to store and query vector data. In just over four months, you created over 250,000 vector indexes and ingested more than 40 billion vectors, performing over 1 billion queries (as of November 28th).

You can now store and search across up to 2 billion vectors in a single index, that’s up to 20 trillion vectors in a vector bucket and a 40x increase from 50 million per index during preview. This means that you can consolidate your entire vector dataset into one index, removing the need to shard across multiple smaller indexes or implement complex query federation logic.

Query performance has been optimized. Infrequent queries continue to return results in under one second, with more frequent queries now resulting in latencies around 100ms or less, making it well-suited for interactive applications such as conversational AI and multi-agent workflows. You can also retrieve up to 100 search results per query, up from 30 previously, providing more comprehensive context for retrieval augmented generation (RAG) applications.

The write performance has also improved substantially, with support for up to 1,000 PUT transactions per second when streaming single-vector updates into your indexes, delivering significantly higher write throughput for small batch sizes. This higher throughput supports workloads where new data must be immediately searchable, helping you ingest small data corpora quickly or handle many concurrent sources writing simultaneously to the same index.

The fully serverless architecture removes infrastructure overhead—there’s no infrastructure to set up or resources to provision. You pay for what you use as you store and query vectors. This AI-ready storage provides you with quick access to any amount of vector data to support your complete AI development lifecycle, from initial experimentation and prototyping through to large-scale production deployments. S3 Vectors now provides the scale and performance needed for production workloads across AI agents, inference, semantic search, and RAG applications.

Two key integrations that were launched in preview are now generally available. You can use S3 Vectors as a vector storage engine for Amazon Bedrock Knowledge Base. In particular, you can use it to build RAG applications with production-grade scale and performance. Moreover, S3 Vectors integration with Amazon OpenSearch is now generally available, so that you can use S3 Vectors as your vector storage layer while using OpenSearch for search and analytics capabilities.

You can now use S3 Vectors in 14 AWS Regions, expanding from five AWS Regions during the preview.

Let’s see how it works
In this post, I demonstrate how to use S3 Vectors through the AWS Console and CLI.

First, I create an S3 Vector bucket and an index.

echo "Creating S3 Vector bucket..."
aws s3vectors create-vector-bucket \
    --vector-bucket-name "$BUCKET_NAME"

echo "Creating vector index..."
aws s3vectors create-index \
    --vector-bucket-name "$BUCKET_NAME" \
    --index-name "$INDEX_NAME" \
    --data-type "float32" \
    --dimension "$DIMENSIONS" \
    --distance-metric "$DISTANCE_METRIC" \
    --metadata-configuration "nonFilterableMetadataKeys=AMAZON_BEDROCK_TEXT,AMAZON_BEDROCK_METADATA"

The dimension metric must match the dimension of the model used to compute the vectors. The distance metric indicates to the algorithm to compute the distance between vectors. S3 Vectors supports cosine and euclidian distances.

I can also use the console to create the bucket. We’ve added the capability to configure encryption parameters at creation time. By default, indexes use the bucket-level encryption, but I can override bucket-level encryption at the index level with a custom AWS Key Management Service (AWS KMS) key.

I also can add tags for the vector bucket and vector index. Tags at the vector index help with access control and cost allocation.

S3 Vector console - create

And I can now manage Properties and Permissions directly in the console.

S3 Vector console - properties

S3 Vector console - create

Similarly, I define Non-filterable metadata and I configure Encryption parameters for the vector index.

S3 Vector console - create index

Next, I create and store the embeddings (vectors). For this demo, I ingest my constant companion: the AWS Style Guide. This is an 800-page document that describes how to write posts, technical documentation, and articles at AWS.

I use Amazon Bedrock Knowledge Bases to ingest the PDF document stored on a general purpose S3 bucket. Amazon Bedrock Knowledge Bases reads the document and splits it in pieces called chunks. Then, it computes the embeddings for each chunk with the Amazon Titan Text Embeddings model and it stores the vectors and their metadata on my newly created vector bucket. The detailed steps for that process are out of the scope of this post, but you can read the instructions in the documentation.

When querying vectors, you can store up to 50 metadata keys per vector, with up to 10 marked as non-filterable. You can use the filterable metadata keys to filter query results based on specific attributes. Therefore, you can combine vector similarity search with metadata conditions to narrow down results. You can also store more non-filterable metadata for larger contextual information. Amazon Bedrock Knowledge Bases computes and stores the vectors. It also adds large metadata (the chunk of the original text). I exclude this metadata from the searchable index.

There are other methods to ingest your vectors. You can try the S3 Vectors Embed CLI, a command line tool that helps you generate embeddings using Amazon Bedrock and store them in S3 Vectors through direct commands. You can also use S3 Vectors as a vector storage engine for OpenSearch.

Now I’m ready to query my vector index. Let’s imagine I wonder how to write “open source”. Is it “open-source”, with a hyphen, or “open source” without a hyphen? Should I use uppercase or not? I want to search the relevant sections of the AWS Style Guide relative to “open source.”

# 1. Create embedding request
echo '{"inputText":"Should I write open source or open-source"}' | base64 | tr -d '\n' > body_encoded.txt

# 2. Compute the embeddings with Amazon Titan Embed model
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-embed-text-v2:0 \
  --body "$(cat body_encoded.txt)" \
  embedding.json

# Search the S3 Vectors index for similar chunks
vector_array=$(cat embedding.json | jq '.embedding') && \
aws s3vectors query-vectors \
  --index-arn "$S3_VECTOR_INDEX_ARN" \
  --query-vector "{\"float32\": $vector_array}" \
  --top-k 3 \
  --return-metadata \
  --return-distance | jq -r '.vectors[] | "Distance: \(.distance) | Source: \(.metadata."x-amz-bedrock-kb-source-uri" | split("/")[-1]) | Text: \(.metadata.AMAZON_BEDROCK_TEXT[0:100])..."'

The first result shows this JSON:

        {
            "key": "348e0113-4521-4982-aecd-0ee786fa4d1d",
            "metadata": {
                "x-amz-bedrock-kb-data-source-id": "0SZY6GYPVS",
                "x-amz-bedrock-kb-source-uri": "s3://sst-aws-docs/awsstyleguide.pdf",
                "AMAZON_BEDROCK_METADATA": "{\"createDate\":\"2025-10-21T07:49:38Z\",\"modifiedDate\":\"2025-10-23T17:41:58Z\",\"source\":{\"sourceLocation\":\"s3://sst-aws-docs/awsstyleguide.pdf\"",
                "AMAZON_BEDROCK_TEXT": "[redacted] open source (adj., n.) Two words. Use open source as an adjective (for example, open source software), or as a noun (for example, the code throughout this tutorial is open source). Don't use open-source, opensource, or OpenSource. [redacted]",
                "x-amz-bedrock-kb-document-page-number": 98.0
            },
            "distance": 0.63120436668396
        }

It finds the relevant section in the AWS Style Guide. I must write “open source” without a hyphen. It even retrieved the page number in the original document to help me cross-check the suggestion with the relevant paragraph in the source document.

One more thing
S3 Vectors has also expanded its integration capabilities. You can now use AWS CloudFormation to deploy and manage your vector resources, AWS PrivateLink for private network connectivity, and resource tagging for cost allocation and access control.

Pricing and availability
S3 Vectors is now available in 14 AWS Regions, adding Asia Pacific (Mumbai, Seoul, Singapore, Tokyo), Canada (Central), and Europe (Ireland, London, Paris, Stockholm) to the existing five Regions from preview (US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Frankfurt))

Amazon S3 Vectors pricing is based on three dimensions. PUT pricing is calculated based on the logical GB of vectors you upload, where each vector includes its logical vector data, metadata, and key. Storage costs are determined by the total logical storage across your indexes. Query charges include a per-API charge plus a $/TB charge based on your index size (excluding non-filterable metadata). As your index scales beyond 100,000 vectors, you benefit from lower $/TB pricing. As usual, the Amazon S3 pricing page has the details.

To get started with S3 Vectors, visit the Amazon S3 console. You can create vector indexes, start storing your embeddings, and begin building scalable AI applications. For more information, check out the Amazon S3 User Guide or the AWS CLI Command Reference.

I look forward to seeing what you build with these new capabilities. Please share your feedback through AWS re:Post or your usual AWS Support contacts.

— seb

AWS DevOps Agent helps you accelerate incident response and improve system reliability (preview)

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-devops-agent-helps-you-accelerate-incident-response-and-improve-system-reliability-preview/

Today, we’re announcing the public preview of AWS DevOps Agent, a frontier agent that helps you respond to incidents, identify root causes, and prevent future issues through systematic analysis of past incidents and operational patterns.

Frontier agents represent a new class of AI agents that are autonomous, massively scalable, and work for hours or days without constant intervention.

When production incidents occur, on-call engineers face significant pressure to quickly identify root causes while managing stakeholder communications. They must analyze data across multiple monitoring tools, review recent deployments, and coordinate response teams. After service restoration, teams often lack bandwidth to transform incident learnings into systematic improvements.

AWS DevOps Agent is your always-on, autonomous on-call engineer. When issues arise, it automatically correlates data across your operational toolchain, from metrics and logs to recent code deployments in GitHub or GitLab. It identifies probable root causes and recommends targeted mitigations, helping reduce mean time to resolution. The agent also manages incident coordination, using Slack channels for stakeholder updates and maintaining detailed investigation timelines.

To get started, you connect AWS DevOps Agent to your existing tools through the AWS Management Console. The agent works with popular services such as Amazon CloudWatch, Datadog, Dynatrace, New Relic, and Splunk for observability data, while integrating with GitHub Actions and GitLab CI/CD to track deployments and their impact on your cloud resources. Through the bring your own (BYO) Model Context Protocol (MCP) server capability, you can also integrate additional tools such as your organization’s custom tools, specialized platforms or open source observability solutions, such as Grafana and Prometheus into your investigations.

The agent acts as a virtual team member and can be configured to automatically respond to incidents from your ticketing systems. It includes built-in support for ServiceNow, and through configurable webhooks, can respond to events from other incident management tools like PagerDuty. As investigations progress, the agent updates tickets and relevant Slack channels with its findings. All of this is powered by an intelligent application topology the agent builds—a comprehensive map of your system components and their interactions, including deployment history that helps identify potential deployment-related causes during investigations.

Let me show you how it works
To show you how it works, I deployed a straigthforward AWS Lambda function that intentionally generates errors when invoked. I deployed it in an AWS CloudFormation stack.

Step 1: Create an Agent Space

An Agent Space defines the scope of what AWS DevOps Agent can access as it performs tasks.

You can organize Agent Spaces based on your operational model. Some teams align an Agent Space with a single application, others create one per on-call team managing multiple services, and some organizations use a centralized approach. For this demonstration, I’ll show you how to create an Agent Space for a single application. This setup helps isolate investigations and resources for that specific application, making it easier to track and analyze incidents within its context.

In the AWS DevOps Agent section of the AWS Management Console, I select Create Agent Space, enter a name for this space and create the AWS Identity and Access Management (IAM) roles it uses to introspect AWS resources in my or others’ AWS accounts.

AWS DevOps Agent - Create an Agent SpaceFor this demo, I choose to enable the AWS DevOps Agent web app; more about this later. This can be done at a later stage.

When ready, I choose Create.

AWS DevOps Agent - Enable Web AppAfter it has been created, I choose the Topology tab.

This view shows the key resources, entities, and relationships AWS DevOps Agent has selected as a foundation for performing its tasks efficiently. It doesn’t represent everything AWS DevOps Agent can access or see, only what the Agent considers most relevant right now. By default, the Topology includes the AWS resources that are contained in my account. As your agent completes more tasks, it will discover and add new resources to this list.

AWS DevOps Agent - Topology

Step 2: Configure the AWS DevOps web app for the operators

The AWS DevOps Agent web app provides a web interface for on-call engineers to manually trigger investigations, view investigation details including relevant topology elements, steer investigations, and ask questions about an investigation.

I can access the web app directly from my Agent Space in the AWS console by choosing the Operator access link. Alternatively, I can use AWS IAM Identity Center to configure user access for my team. IAM Identity Center lets me manage users and groups directly or connect to an identity provider (IdP), providing a centralized way to control who can access the AWS DevOps Agent web app.

AWS DevOps Agent - web app access

At this stage, I have an Agent Space all set up to focus investigations and resources for this specific application, and I’ve enabled the DevOps team to initiate investigations using the web app.

Now that the one-time setup for this application is done, I start invoking the faulty Lambda function. It generates errors at each invocation. The CloudWatch alarm associated with the Lambda errors count turns on to ALARM state. In real life, you might receive an alert from external services, such as ServiceNow. You can configure AWS DevOps Agent to automatically start investigations when receiving such alerts.

For this demo, I manually start the investigation by selecting Start Investigation.

You can also choose from several preconfigured starting points to quickly begin your investigation: Latest alarm to investigate your most recent triggered alarm and analyze the underlying metrics and logs to determine the root cause, High CPU usage to investigate high CPU utilization metrics across your compute resources and identify which processes or services are consuming excessive resources, or Error rate spike to investigate the recent increase in application error rates by analyzing metrics, application logs, and identifying the source of failures.

AWS DevOps Agent - web app

I enter some information, such as Investigation details, Investigation starting point, the Date and time of the incident, the AWS Account ID for the incident.

- web app - start investigation

In the AWS DevOps Agent web app, you can watch the investigation unfold in real time. The agent identifies the application stack. It correlates metrics from CloudWatch, examines logs from CloudWatch Logs or external sources, such as Splunk, reviews recent code changes from GitHub, and analyzes traces from AWS X-Ray.

- web app - application stack

It identifies the error patterns and provides a detailed investigation summary. In the context of this demo, the investigation reveals that these are intentional test exceptions, shows the timeline of function invocations leading to the alarm, and even suggests monitoring improvements for error handling.

The agent uses a dedicated incident channel in Slack, notifies on-call teams if needed, and provides real-time status updates to stakeholders. Through the investigation chat interface, you can interact directly with the agent by asking clarifying questions such as “which logs did you analyze?” or steering the investigation by providing additional context, such as “focus on these specific log groups and rerun your analysis.” If you need expert assistance, you can create an AWS Support case with a single click, automatically populating it with the agent’s findings, and engage with AWS Support experts directly through the investigation chat window.

For this demo, the AWS DevOps Agent correctly identified manual activities in the Lambda console to invoke a function that intentionally triggers errors 😇.

- web app - root cause

Beyond incident response, AWS DevOps Agent analyzes my recent incidents to identify high-impact improvements that prevent future issues.

During active incidents, the agent offers immediate mitigation plans through its incident mitigations tab to help restore service quickly. Mitigation plans consist of specs that provide detailed implementation guidance for developers and agentic development tools like Kiro.

For longer-term resilience, it identifies potential enhancements by examining gaps in observability, infrastructure configurations, and deployment pipeline. My straightforward demo that triggered intentional errors was not enough to generate relevant recommendations though.

AWS DevOps Agent - web app - recommendations

For example, it might detect that a critical service lacks multi-AZ deployment and comprehensive monitoring. The agent then creates detailed recommendations with implementation guidance, considering factors like operational impact and implementation complexity. In an upcoming quick follow-up release, the agent will expand its analysis to include code bugs and testing coverage improvements.

Availability
You can try AWS DevOps Agent today in the US East (N. Virginia) Region. Although the agent itself runs in US East (N. Virginia) (us-east-1), it can monitor applications deployed in any Region, across multiple AWS accounts.

During the preview period, you can use AWS DevOps Agent at no charge, but there will be a limit on the number of agent task hours per month.

As someone who has spent countless nights debugging production issues, I’m particularly excited about how AWS DevOps Agent combines deep operational insights with practical, actionable recommendations. The service helps teams move from reactive firefighting to proactive system improvement.

To learn more and sign up for the preview, visit AWS DevOps Agent. I look forward to hearing how AWS DevOps Agent helps improve your operational efficiency.

— seb

Accelerate AI development using Amazon SageMaker AI with serverless MLflow

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/accelerate-ai-development-using-amazon-sagemaker-ai-with-serverless-mlflow/

Since we announced Amazon SageMaker AI with MLflow in June 2024, our customers have been using MLflow tracking servers to manage their machine learning (ML) and AI experimentation workflows. Building on this foundation, we’re continuing to evolve the MLflow experience to make experimentation even more accessible.

Today, I’m excited to announce that Amazon SageMaker AI with MLflow now includes a serverless capability that eliminates infrastructure management. This new MLflow capability transforms experiment tracking into an immediate, on-demand experience with automatic scaling that removes the need for capacity planning.

The shift to zero-infrastructure management fundamentally changes how teams approach AI experimentation—ideas can be tested immediately without infrastructure planning, enabling more iterative and exploratory development workflows.

Getting started with Amazon SageMaker AI and MLflow
Let me walk you through creating your first serverless MLflow instance.

I navigate to Amazon SageMaker AI Studio console and select the MLflow application. The term MLflow Apps replaces the previous MLflow tracking servers terminology, reflecting the simplified, application-focused approach.

Here, I can see there’s already a default MLflow App created. This simplified MLflow experience makes it more straightforward for me to start doing experiments.

I choose Create MLflow App, and enter a name. Here, I have both an AWS Identity and Access Management (IAM) role and Amazon Simple Service (Amazon S3) bucket are already been configured. I only need to modify them in Advanced settings if needed.

Here’s where the first major improvement becomes apparent—the creation process completes in approximately 2 minutes. This immediate availability enables rapid experimentation without infrastructure planning delays, eliminating the wait time that previously interrupted experimentation workflows.

After it’s created, I receive an MLflow Amazon Resource Name (ARN) for connecting from notebooks. The simplified management means no server sizing decisions or capacity planning required. I no longer need to choose between different configurations or manage infrastructure capacity, which means I can focus entirely on experimentation. You can learn how to use MLflow SDK at Integrate MLflow with your environment in the Amazon SageMaker Developer Guide.

With MLflow 3.4 support, I can now access new capabilities for generative AI development. MLflow Tracing captures detailed execution paths, inputs, outputs, and metadata throughout the development lifecycle, enabling efficient debugging across distributed AI systems.

This new capability also introduces cross-domain access and cross-account access through AWS Resource Access Manager (AWS RAM) share. This enhanced collaboration means that teams across different AWS domains and accounts can share MLflow instances securely, breaking down organizational silos.

Better together: Pipelines integration
Amazon SageMaker Pipelines is integrated with MLflow. SageMaker Pipelines is a serverless workflow orchestration service purpose-built for machine learning operations (MLOps) and large language model operations (LLMOps) automation—the practices of deploying, monitoring, and managing ML and LLM models in production. You can easily build, execute, and monitor repeatable end-to-end AI workflows with an intuitive drag-and-drop UI or the Python SDK.

From a pipeline, a default MLflow App will be created if one doesn’t already exist. The experiment name can be defined and metrics, parameters, and artifacts are logged to the MLflow App as defined in your code. SageMaker AI with MLflow is also integrated with familiar SageMaker AI model development capabilities like SageMaker AI JumpStart and Model Registry, enabling end-to-end workflow automation from data preparation through model fine-tuning.

Things to know
Here are key points to note:

  • Pricing – The new serverless MLflow capability is offered at no additional cost. Note there are service limits that apply.
  • Availability – This capability is available in the following AWS Regions: US East (N. Virginia, Ohio), US West (N.California, Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, Stockholm), South America (São Paulo).
  • Automatic upgrades: MLflow in-place version upgrades happen automatically, providing access to the latest features without manual migration work or compatibility concerns. The service currently supports MLflow 3.4, providing access to the latest capabilities including enhanced tracing features.
  • Migration support – You can use the open source MLflow export-import tool available at mlflow-export-import to help migrate from existing Tracking Servers, whether they’re from SageMaker AI, self-hosted, or otherwise to serverless MLflow (MLflow Apps).

Get started with serverless MLflow by visiting Amazon SageMaker AI Studio and creating your first MLflow App. Serverless MLflow is also supported in SageMaker Unified Studio for additional workflow flexibility.

Happy experimenting!
Donnie

Introducing Amazon Nova 2 Lite, a fast, cost-effective reasoning model

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-2-lite-a-fast-cost-effective-reasoning-model/

Today, we’re releasing Amazon Nova 2 Lite, a fast, cost-effective reasoning model for everyday workloads. Available in Amazon Bedrock, the model offers industry-leading price performance and helps enterprises and developers build capable, reliable, and efficient agentic-AI applications. For organizations who need AI that truly understands their domain, Nova 2 Lite is the best model to use with Nova Forge to build their own frontier intelligence.

Nova 2 Lite supports extended thinking, including step-by-step reasoning and task decomposition, before providing a response or taking action. Extended thinking is off by default to deliver fast, cost-optimized responses, but when deeper analysis is needed, you can turn it on and choose from three thinking budget levels: low, medium, or high, giving you control over the speed, intelligence, and cost tradeoff.

Nova 2 Lite supports text, image, video, document as input and offers a one million-token context window, enabling expanded reasoning and richer in-context learning. In addition, Nova 2 Lite can be customized for your specific business needs. The model also includes access to two built-in tools: web grounding and a code interpreter. Web grounding retrieves publicly available information with citations, while the code interpreter allows the model to run and evaluate code within the same workflow.

Amazon Nova 2 Lite demonstrates strong performance across diverse evaluation benchmarks. The model excels in core intelligence across multiple domains including instruction following, math, and video understanding with temporal reasoning. For agentic workflows, Nova 2 Lite shows reliable function calling for task automation and precise UI interaction capabilities. The model also demonstrates strong code generation and practical software engineering problem-solving abilities.

Amazon Nova 2 Lite benchmarks

Nova 2 Lite is built to meet your company’s needs
Nova 2 Lite can be used for a broad range of your everyday AI tasks. It offers the best combination of price, performance, and speed. Early customers are using Nova 2 Lite for customer service chatbots, document processing, and business process automation.

Nova 2 Lite can help support workloads across many different use cases:

  • Business applications – Automate business process workflow, intelligent document processing (IDP), customer support, and web search to improve productivity and outcomes
  • Software engineering – Generate code, debugging, refactoring, and migrating systems to accelerate development and increase efficiency
  • Business intelligence and research – Use long-horizon reasoning and web grounding to analyze internal and external sources to uncover insights, and make informed decisions

For specific requirements, Nova 2 Lite is also available for customization on both Amazon Bedrock and Amazon SageMaker AI.

Using Amazon Nova 2 Lite
In the Amazon Bedrock console, you can use the Chat/Text playground to quickly test the new model with your prompts. To integrate the model into your applications, you can use any AWS SDKs with the Amazon Bedrock InvokeModel and Converse API. Here’s a sample invocation using the AWS SDK for Python (Boto3).

import boto3

AWS_REGION="us-east-1"
MODEL_ID="global.amazon.nova-2-lite-v1:0"
MAX_REASONING_EFFORT="low" # low, medium, high

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Enable extended thinking for complex problem-solving
response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=[{
        "role": "user",
        "content": [{"text": "I need to optimize a logistics network with 5 warehouses, 12 distribution centers, and 200 retail locations. The goal is to minimize total transportation costs while ensuring no location is more than 50 miles from a distribution center. What approach should I take?"}]
    }],
    additionalModelRequestFields={
        "reasoningConfig": {
            "type": "enabled", # enabled, disabled (default)
            "maxReasoningEffort": MAX_REASONING_EFFORT
        }
    }
)

# The response will contain reasoning blocks followed by the final answer
for block in response["output"]["message"]["content"]:
    if "reasoningContent" in block:
        reasoning_text = block["reasoningContent"]["reasoningText"]["text"]
        print(f"Nova's thinking process:\n{reasoning_text}\n")
    elif "text" in block:
        print(f"Final recommendation:\n{block['text']}")

You can also use the new model with agentic frameworks that supports Amazon Bedrock and deploy the agents using Amazon Bedrock AgentCore. In this way, you can build agents for a broad range of tasks. Here’s the sample code for an interactive multi-agent system using the Strands Agents SDK. The agents have access to multiple tools, including read and write file access and the possibility to run shell commands.

from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator, editor, file_read, file_write, shell, http_request, graph, swarm, use_agent, think

AWS_REGION="us-east-1"
MODEL_ID="global.amazon.nova-2-lite-v1:0"
MAX_REASONING_EFFORT="low" # low, medium, high

SYSTEM_PROMPT = (
    "You are a helpful assistant. "
    "Follow the instructions from the user. "
    "To help you with your tasks, you can dynamically create specialized agents and orchestrate complex workflows."
)

bedrock_model = BedrockModel(
    region_name=AWS_REGION,
    model_id=MODEL_ID,
    additional_request_fields={
        "reasoningConfig": {
            "type": "enabled", # enabled, disabled (default)
            "maxReasoningEffort": MAX_REASONING_EFFORT
        }
    }
)

agent = Agent(
    model=bedrock_model,
    system_prompt=SYSTEM_PROMPT,
    tools=[calculator, editor, file_read, file_write, shell, http_request, graph, swarm, use_agent, think]
)

while True:
    try:
        prompt = input("\nEnter your question (or 'quit' to exit): ").strip()
        if prompt.lower() in ['quit', 'exit', 'q']:
            break
        if len(prompt) > 0:
            agent(prompt)
    except KeyboardInterrupt:
        break
    except EOFError:
        break

print("\nGoodbye!")

Things to know
Amazon Nova 2 Lite is now available in Amazon Bedrock via global cross-Region inference in multiple locations. For Regional availability and future roadmap, visit AWS Capabilities by Region.

Nova 2 Lite includes built-in safety controls to promote responsible AI use, with content moderation capabilities that help maintain appropriate outputs across a wide range of applications.

To understand the costs, see Amazon Bedrock pricing. To learn more, visit the Amazon Nova User Guide.

Start building with Nova 2 Lite today. To experiment with the new model, visit the Amazon Nova interactive website. Try the model in the Amazon Bedrock console, and share your feedback on AWS re:Post.

Danilo

New AWS Security Agent secures applications proactively from design to deployment (preview)

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/new-aws-security-agent-secures-applications-proactively-from-design-to-deployment-preview/

Today, we’re announcing AWS Security Agent in preview, a frontier agent that proactively secures your applications throughout the development lifecycle. It conducts automated application security reviews tailored to your organizational requirements and delivers context-aware penetration testing on demand. By continuously validating application security from design to deployment, it helps prevent vulnerabilities early in development.

Static application security testing (SAST) tools examine code without runtime context, whereas dynamic application security testing (DAST) tools assess running applications without application-level context. Both types of tools are one-dimensional because they don’t understand your application context. They don’t understand how your application is designed, what security threats it faces, and where and how it runs. This forces security teams to manually review everything, creating delays. Penetration testing is even slower—you either wait weeks for an external vendor or your internal security team to find time. When every application requires a manual security review and penetration test, the backlog grows quickly. Applications wait weeks or months for security validation before they can launch. This creates a gap between the frequency of software releases and the frequency of security evaluations. Security is not applied to the entire portfolio of applications, leaving customers exposed and knowingly shipping vulnerable code to meet deadlines. Over 60 percent of organizations update web applications weekly or more often, while nearly 75 percent test web applications monthly or less often. A 2025 report from Checkmarx found that 81 percent of organizations knowingly deploy vulnerable code to meet delivery deadlines.

AWS Security Agent is context-aware—it understands your entire application. It understands your application design, your code, and your specific security requirements. It continuously scans for security violations automatically and runs penetration tests on-demand instantly without scheduling. The penetration testing agent creates a customized attack plan informed by the context it has learned from your security requirements, design documents, and source code, and dynamically adapts as it runs based on what it discovers, such as endpoints, status and error codes, and credentials. This helps surface deeper, more sophisticated vulnerabilities before production, ensuring your application is secure before it launches without delays or surprises.

“SmugMug is excited to add AWS Security Agent to our automated security portfolio. AWS Security Agent transforms our security ROI by enabling pen test assessments that complete in hours rather than days, at a fraction of manual testing costs. We can now assess our services more frequently, dramatically decreasing the time to identify and address issues earlier in the software development lifecycle.” says Erik Giberti, Sr. Director of Product Engineering at SmugMug.

Get started with AWS Security Agent
AWS Security Agent provides design security review, code security review, and on-demand penetration testing capabilities. Design and code review check organizational security requirements that you define, and penetration testing learns application context from source code and specifications to identify vulnerabilities. To get started, navigate to the AWS Security Agent console. The console landing page provides an overview of how AWS Security Agent delivers continuous security assessment across your development lifecycle.

The Get started with AWS Security Agent panel on the right side of the landing page guides you through initial configuration. Choose Set up AWS Security Agent to create your first agent space and begin performing security reviews on your applications.

Provide an Agent space name to identify which agent you’re interacting with across different security assessments. An agent space is an organizational container that represents a distinct application or project you want to secure. Each agent space has its own testing scope, security configuration, and dedicated web application domain. We recommend creating one agent space per application or project to maintain clear boundaries and organized security assessments. You can optionally add a Description to provide context about the agent space’s purpose for other administrators.

When you create the first agent space in the AWS Management Console, AWS creates the Security Agent Web Application. The Security Agent Web Application is where users conduct design reviews and execute penetration tests within the boundaries established by administrators in the console. Users select which agent space to work in when conducting design reviews or executing penetration tests.

During the setup process, AWS Security Agent offers two options for managing user access to the Security Agent Web Application: Single Sign-On (SSO) with IAM Identity Center, which enables team-wide SSO access by integrating with AWS IAM Identity Center, or IAM users, which allows only AWS Identity and Access Management (IAM) users of this AWS account to access the Security Agent Web Application directly through the console and is best for quick setup or access without SSO configuration. When you choose the SSO option, AWS Security Agent creates an IAM Identity Center instance to provide centralized authentication and user management for AppSec team members who will access design reviews, code reviews, and penetration testing capabilities through the Security Agent Web Application.

The permissions configuration section helps you control how AWS Security Agent accesses other AWS services, APIs, and accounts. You can create a default IAM role that AWS Security Agent will use to access resources, or choose an existing role with appropriate permissions.

After completing the initial configuration, choose Set up AWS Security Agent to create the agent.

After creating an agent space, the agent configuration page displays three capability cards: Design review, Code review, and Penetration testing. While not required to operate the penetration testing, if you plan to use design review or code review capabilities, you can configure which security requirements will guide those assessments. AWS Security Agent includes AWS managed requirements, and you can optionally define custom requirements tailored to your organization. You can also manage which team members have access to the agent.

Security requirements
AWS Security Agent enforces organizational security requirements that you define so that applications comply with your team’s policies and standards. Security requirements specify the controls and policies that your applications must follow during both design and code review phases.

To manage security requirements, navigate to Security requirements in the navigation pane. These requirements are shared across all agent spaces and apply to both design and code reviews.

Managed security requirements are based on industry standards and best practices. These requirements are ready to use, maintained by AWS, and you can enable them instantly without configuration.

When creating a custom security requirement, you specify the control name and description that defines the policy. For example, you might create a requirement called Network Segmentation Strategy Defined that requires designs to define clear network segmentation separating workload components into logical layers based on data sensitivity. Or you might define Short Session Timeouts for Privileged and PII Access to mandate specific timeout durations for administrative and personally identifiable information (PII) access. Another example is Customer-Managed Encryption Keys Required, which requires designs to specify customer managed AWS Key Management Service (AWS KMS) keys rather than AWS managed keys for encrypting sensitive data at rest. AWS Security Agent evaluates designs and code against these enabled requirements, identifying policy violations.

Design security review
The design review capability analyzes architectural documents and product specifications to identify security risks before code is written. AppSec teams upload design documents through the AWS Security Agent console or ingest them from S3 and other connected services. AWS Security Agent assesses compliance with organizational security requirements and provides remediation guidance.

Before conducting design reviews, confirm you’ve configured the security requirements that AWS Security Agent will check. You can get started with AWS managed security requirements or define custom requirements tailored to your organization, as described in the Security requirements section.

To get started with the Design review, choose Admin access under Web app access to access the web app interface. When logged in, choose Create design review. Enter a Design review name to identify the assessment—for example, when assessing a new feature design that extends your application—and upload up to five design files. Choose Start design review to begin the assessment against your enabled security requirements.

After completing a design review, the design review detail page displays the review status, completion date, and files reviewed in the Details section. The Findings summary shows the count of findings across four compliance status categories:

  • Non-compliant – The design violates or inadequately addresses the security requirement.
  • Insufficient data – The uploaded files don’t contain enough information to determine compliance.
  • Compliant – The design meets the security requirement based on the uploaded documentation.
  • Not applicable – The security requirement’s relevance criteria indicate it doesn’t apply to this system design.

The Findings summary section helps you quickly assess which security requirements need attention. Non-compliant findings require updates to your design documents, while Insufficient data findings indicate gaps in the documentation where security teams should work with application teams to gather additional clarity before AWS Security Agent can complete the assessment.

The Files reviewed section displays all uploaded documents with options to search and download the original files.

The Review findings section lists each security requirement evaluated during the review along with its compliance status. In this example, the findings include Network Segmentation Strategy Defined, Customer-Managed Encryption Keys Required, and Short Session Timeouts for Privileged and PII Access. These are the custom security requirements defined earlier in the Security requirements section. You can search for specific security requirements or filter findings by compliance status to focus on items that require action.

When you choose a specific finding, AWS Security Agent displays detailed justification explaining the compliance status and provides recommended remediation steps. This context-aware analysis helps you understand security concerns specific to your design rather than generic security guidance. For designs with noncompliant findings, you can update your documentation to address the security requirements and create a new design review to validate the improvements. You can also choose Clone this design review to create a new assessment based on the current configuration or choose Download report to export the complete findings for sharing with your team.

After validating that your application design meets organizational security requirements, the next step is enforcing those same requirements as developers write code.

Code security review
The code review capability analyzes pull requests in GitHub to identify security vulnerabilities and organizational policy violations. AWS Security Agent detects OWASP Top Ten common vulnerabilities such as SQL injection, cross-site scripting, and inadequate input validation. It also enforces the same organizational security requirements used in design review, implementing code compliance with your team’s policies beyond common vulnerabilities.

When your application checks in new code, AWS Security Agent verifies compliance with organizational security requirements that go beyond common vulnerabilities. For example, if your organization requires audit logs to be retained for only 90 days, AWS Security Agent identifies when code configures a 365-day retention period and comments on the pull request with the specific violation. This catches policy violations that traditional security tools miss because the code is technically functional and secure.

To enable code review, choose Enable code review on the agent configuration page and connect your GitHub repositories. You can enable code review for specific repositories or connect repositories without enabling code review if you want to use them for penetration testing context instead.

For detailed setup instructions, visit the AWS Security Agent documentation.

On-demand penetration testing
The on-demand penetration testing capability executes comprehensive security testing to discover and validate vulnerabilities through multistep attack scenarios. AWS Security Agent systematically discovers the application’s attack surface through reconnaissance and endpoint enumeration, then deploys specialized agents to execute security testing across 13 risk categories, including authentication, authorization, and injection attacks. When provided source code, API specifications, and business documentation, AWS Security Agent builds deeper context about the application’s architecture and business rules to generate more targeted test cases. It adapts testing based on application responses and adjusts attack strategies as it discovers new information during the assessment.

AWS Security Agent tests web applications and APIs against OWASP Top Ten vulnerability types, identifying exploitable issues that static analysis tools miss. For example, while dynamic application security testing (DAST) tools look for direct server-side template injection (SSTI) payloads, AWS Security Agent can combine SSTI attacks with error forcing and debug output analysis to execute more complex exploits. AppSec teams define their testing scope—target URLs, authentication details, threat models, and documentation—the same as they would brief a human penetration tester. Using this understanding, AWS Security Agent develops application context and autonomously executes sophisticated attack chains to discover and validate vulnerabilities. This transforms penetration testing from a periodic bottleneck into a continuous security practice, reducing risk exposure.

To enable penetration testing, choose Enable penetration test on the agent configuration page. You can configure target domains, VPC settings for private endpoints, authentication credentials, and additional context sources such as GitHub repositories or S3 buckets. You must verify ownership of each domain before AWS Security Agent can run penetration testing.

After enabling the capability, create and run penetration tests through the AWS Security Agent Web Application. For detailed setup and configuration instructions, visit the AWS Security Agent documentation.

After creating and running a penetration test, the detail page provides an overview of test execution and results. You can run new tests or modify the configuration from this page. The page displays information about the most recent execution, including start time, status, duration, and a summary of discovered vulnerabilities categorized by severity. You can also view a history of all previous test runs with their findings summaries.

For each run, the detail page provides three tabs. The Penetration test run overview tab displays high-level information about the execution, including duration and overall status. The Penetration test logs tab lists all tasks executed during the penetration test, providing visibility into how AWS Security Agent discovered vulnerabilities, including the security testing actions performed, application responses, and the reasoning behind each test. The Findings tab displays all discovered vulnerabilities with complete details, including descriptions, attack reasoning, steps to reproduce, impact, and remediation guidance.

Join the preview
To get started with AWS Security Agent, visit the AWS Security Agent console and create your first agent to begin automating design reviews, code reviews, and penetration testing across your development lifecycle. During the preview period, AWS Security Agent is free of charge.

AWS Security Agent is available in the US East (N. Virginia) Region.

To learn more, visit the AWS Security Agent product page and technical documentation.

— Esra

AWS Security Hub now generally available with near real-time analytics and risk prioritization

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/aws-security-hub-now-generally-available-with-near-real-time-analytics-and-risk-prioritization/

Today, AWS Security Hub is generally available, transforming how security teams identify and respond to critical security risks across their AWS environments. These new capabilities were first announced in preview at AWS re:Inforce 2025. Security Hub prioritizes your critical security issues and unifies your security operations to help you respond at scale by correlating and enriching signals across multiple AWS security services. Security Hub provides near real-time risk analytics, trends, unified enablement, streamlined pricing, and automated correlation that transforms security signals into actionable insights.

Organizations deploying multiple security tools need to manually correlate signals across different consoles, creating operational overhead that can delay detection and response times. Security teams use various tools for threat detection, vulnerability management, security posture monitoring, and sensitive data discovery, but extracting value from the findings these tools generate requires significant manual effort to understand relationships and determine priority.

Security Hub addresses these challenges through built-in integration that unifies your cloud security operations. Available for individual accounts or entire AWS Organizations accounts, Security Hub automatically aggregates and correlates signals from Amazon GuardDuty, Amazon Inspector, AWS Security Hub Cloud Security Posture Management (AWS Security Hub CSPM), and Amazon Macie, organizing them by threats, exposures, resources, and security coverage. This unified approach reduces manual correlation work, helping you quickly identify critical issues, understand coverage gaps, and prioritize remediation based on severity and impact.

What’s new in general availability
Since the preview announcement, Security Hub has added several new features.

Historical trends
Security Hub includes a Trends feature through the Summary dashboard that provides up to 1 year of historical data for findings and resources across your organization. The Summary dashboard displays an overview of your exposures, threats, resources, and security coverage through customizable widgets that you can add, remove, and arrange based on your operational needs.

The dashboard includes a Trends overview widget that displays period-over-period analysis for day-over-day, week-over-week, and month-over-month comparisons, helping you track whether your security posture is improving or degrading. Trend widgets for Active threat findings, Active exposure findings, and Resource trends provide visualizations of average counts over selectable time periods including 5 days, 30 days, 90 days, 6 months, and 1 year. You can filter these visualizations by severity levels such as critical, high, medium, and low, and hover over specific points in time to review detailed counts.

The Summary dashboard also includes widgets that display current exposure summaries prioritized by severity, threat summaries showing malicious or suspicious activity, and resource inventories organized by type and associated findings.

The Security coverage widget helps you identify gaps in your security service deployment across your organization. This widget tracks which AWS accounts and Regions have security services enabled, helping you understand where you might lack visibility into threats, vulnerabilities, misconfigurations, or sensitive data. The widget displays account coverage across security capabilities including vulnerability management by Amazon Inspector, threat detection by GuardDuty, sensitive data discovery by Amazon Macie, and posture management by AWS Security Hub CSPM. Coverage percentages show which security checks passed or failed across your AWS accounts and Regions where Security Hub is enabled.

You can apply filters to widgets using shared filters that apply across all widgets, finding filters for exposure and threat data, or resource filters for inventory data. You can create and save filter sets using and/or operators to define specific criteria for your security analysis. Dashboard customizations, including saved filter sets and widget layouts, are saved automatically and persist across sessions.

If you configure cross-Region aggregation, the Summary dashboard includes findings from all linked Regions when viewing from your home Region. For delegated administrator accounts in AWS Organizations, data includes findings for both the administrator account and member accounts. Security Hub retains trends data for 1 year from the date findings are generated. After 1 year, trends data is automatically deleted.

Near real-time risk analytics
Security Hub now calculates exposures in near real-time and includes threat correlation from GuardDuty alongside existing vulnerability and misconfiguration analysis. When GuardDuty detects threats, Amazon Inspector identifies vulnerabilities, or AWS Security Hub CSPM discovers misconfigurations, Security Hub automatically correlates these findings and updates associated exposures. This advancement provides immediate feedback on your security posture, helping you quickly identify new exposures and verify that remediation actions have reduced risk as expected.

Security Hub correlates findings across AWS Security Hub CSPM, Amazon Inspector, Amazon Macie, Amazon GuardDuty, and other security services to identify exposures that could lead to security incidents. This correlation helps you understand when multiple security issues combine to create critical risk. Security Hub enriches security signals with context by analyzing resource associations, potential impact, and relationships between signals. For example, if Security Hub identifies an Amazon Simple Storage Service (Amazon S3) bucket containing sensitive data with versioning disabled, Object Lock disabled, and MFA delete disabled, remediating any component triggers automatic calculation, helping you verify remediation effectiveness without waiting for scheduled assessments.

The Exposure page organizes findings by title and severity, helping you focus on critical issues first. The page includes an Overview section with a trends graph that displays the average count of exposure findings over the last 90 days, segmented by severity level. This visualization helps you track changes in your exposure posture over time and identify patterns in security risk.

Exposure findings are grouped by title with expandable rows showing the count of affected resources and overall severity. Each exposure title describes the potential security impact, such as “Potential Data Destruction: S3 bucket with versioning, Object Lock, and MFA delete disabled” or “Potential Remote Execution: EC2 instance is reachable from VPC and has software vulnerabilities.” You can filter exposures using saved filter sets or quick filters based on severity levels including critical, high, medium, and low. The interface also provides filtering by account ID, resource type, and accounts, helping you quickly narrow down exposures relevant to specific parts of your infrastructure.

Security Hub generates exposures as soon as findings are available. For example, when you deploy an Amazon Elastic Compute Cloud (Amazon EC2) instance that is publicly accessible and Amazon Inspector detects a highly exploitable vulnerability while AWS Security Hub CSPM identifies the public accessibility configuration, Security Hub automatically correlates these findings to generate an exposure without waiting for a scheduled assessment. This near-real time correlation helps you identify critical risks in newly deployed resources and take action before they can be exploited.

When you select an exposure finding, the details page displays the exposure type, primary resource, Region, account, age, and creation time. The Overview section shows contributing traits that represent the security issues directly contributing to the exposure scenario. These traits are organized by categories such as Reachability, Vulnerability, Sensitive data, Misconfiguration, and Assumability.

The details page includes a Potential attack path tab that provides a visual graph showing how potential attackers could access and take control of your resources. This visualization displays the relationships between the primary resource (such as an EC2 instance), involved resources (such as VPC, subnet, network interface, security group, AWS Identity and Access Management (IAM) instance profile, IAM role, IAM policy, and volumes), and contributing traits. The graph helps you understand the complete attack surface and identify which security controls need adjustment.

The Traits tab lists all security issues contributing to the exposure, and the Resources tab shows all affected resources. The Remediation section provides prioritized guidance with links to documentation, recommending which traits to address first to reduce risk most effectively. By using this comprehensive view, you can investigate specific exposures, understand the full context of security risks, and track remediation progress as your team addresses vulnerabilities, misconfigurations, and other security gaps across your environment.

Expanded partner integrations
Security Hub supports integration with Jira and ServiceNow for incident management workflows. When viewing a finding, you can create a ticket in your preferred system directly from the AWS Security Hub console with finding details, severity, and recommended remediation steps automatically populated. You can also define automation rules in Security Hub that automatically create tickets in Atlassian’s Jira Service Management and ServiceNow based on criteria you specify, such as severity level, resource type, or finding type. This helps you route critical security issues to your incident response teams without manual intervention.

Security Hub findings are formatted in the Open Cybersecurity Schema Framework (OCSF) schema, an open-source standard that enables security tools to share data seamlessly. Partners who have built integrations with the OCSF format with Security Hub include Cribl, CrowdStrike, Databee, DataDog, Dynatrace, Expel, Graylog, Netskope, Securonix, SentinelOne, Splunk a Cisco company, Sumo Logic, Tines, Upwind Security, Varonis, DTEX, and Zscaler. Additionally, service partners such as Accenture, Caylent, Deloitte, Optiv, PwC, and Wipro can help you adopt Security Hub and the OCSF schema.

Security Hub also supports automated response workflows through Amazon EventBridge. You can create EventBridge rules that identify findings based on criteria you specify and route them to targets such as AWS Lambda functions or AWS Systems Manager Automation runbooks for processing and remediation. This helps you act on findings programmatically without manual intervention.

Now available
If you currently use AWS Security Hub CSPM, Amazon GuardDuty, Amazon Inspector, or Amazon Macie, you can access these capabilities by navigating to the AWS Security Hub console. If you’re a new customer, you can enable Security Hub through the AWS Management Console and configure the security services appropriate for your workloads. Security Hub automatically consumes findings from enabled services, making the findings available in the unified console and creating correlated exposure findings based on the ingested security data.

For Regional availability, visit our AWS Services by Region page. Near real-time exposure calculation and the Trends feature are included at no additional charge. Security Hub uses a streamlined, resource-based pricing model that consolidates charges across integrated AWS security services. The console includes a cost estimator to help you plan and forecast security investments across your AWS accounts and Regions before deployment. For detailed information about capabilities, supported integrations, and pricing, visit the AWS Security Hub product page and technical documentation.

— Esra

Amazon GuardDuty adds Extended Threat Detection for Amazon EC2 and Amazon ECS

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/amazon-guardduty-adds-extended-threat-detection-for-amazon-ec2-and-amazon-ecs/

Today, we’re announcing new enhancements to Amazon GuardDuty Extended Threat Detection with the addition of two attack sequence findings for Amazon Elastic Compute Cloud (Amazon EC2) instances and Amazon Elastic Container Service (Amazon ECS) tasks. These new findings build on the existing Extended Threat Detection capabilities, which already combine sequences involving AWS Identity and Access Management (IAM) credential misuse, unusual Amazon Simple Storage service (Amazon S3) bucket activity, and Amazon Elastic Kubernetes Service (Amazon EKS) cluster compromise. By adding coverage for EC2 instance groups and ECS clusters, this launch expands sequence-level visibility to virtual machine and container environments that support the same application. Together, these capabilities provide a more consistent and unified way to detect multistage activity across diverse Amazon Web Services (AWS) workloads.

Modern cloud environments are dynamic and distributed, often running virtual machines, containers, and serverless workloads at scale. Security teams strive to maintain visibility across these environments and connect related activities that might indicate complex, multistage attack sequences. These sequences can involve multiple steps, such as establishing initial access and persistence, providing missing credentials or performing unexpected data access, that unfold over time and across different sources. GuardDuty Extended Threat Detection automatically links these signals using AI and machine learning (ML) models trained at AWS scale to build a complete picture of the activity and surface high-confidence insights to help customers prioritize response actions. By combining evidence from diverse sources, this analysis produces high-fidelity, unified findings that would otherwise be difficult to infer from individual events.

How it works
Extended Threat Detection analyzes multiple types of security signals, including runtime activity, malware detections, VPC Flow Logs, DNS queries, and AWS CloudTrail events to identify patterns that represent a multistage attack across Amazon EC2 and Amazon ECS workloads. Detection works with the GuardDuty foundational plan, and turning on Runtime Monitoring for EC2 or ECS adds deeper process and network-level telemetry that strengthens signal analysis and increases the completeness of each attack sequence.

The new attack sequence findings combine runtime and other observed behaviors across the environment into a single critical-severity sequence. Each sequence includes an incident summary, a timeline of observed events, mapped MITRE ATT&CK® tactics and techniques, and remediation guidance to help you understand how the activity unfolded and which resources were affected.

EC2 instances and ECS tasks are often created and replaced automatically through Auto Scaling groups, shared launch templates, Amazon Machine Images (AMIs), IAM instance profiles, or cluster-level deployments. Because these resources commonly operate as part of the same application, activity observed across them might originate from a single underlying compromise. The new EC2 and ECS findings analyze these shared attributes and consolidate related signals into one sequence when GuardDuty detects a pattern affecting the group.

When a sequence is detected, the GuardDuty console highlights any critical-severity sequence findings on the Summary page, with the affected EC2 instance group or ECS cluster already identified. Selecting a finding opens a consolidated view that shows how the resources are connected, which signals contributed to the sequence, and how the activity progressed over time, helping you quickly understand the scope of impact across virtual machine and container workloads.

In addition to viewing sequences in the console, you can also see these findings in AWS Security Hub, where they appear on the new exposure dashboards alongside other GuardDuty findings to help you understand your overall security risk in one place. This detailed view establishes the context for interpreting how the analysis brings related signals together into a broader attack sequence.

Together, the analysis model and grouping logic give you a clearer, consolidated view of activity across virtual machine and container workloads, helping you focus on the events that matter instead of investigating numerous individual findings. By unifying related behaviors into a single sequence, Extended Threat Detection helps you assess the full context of an attack path and prioritize the most urgent remediation actions.

Now available
Amazon GuardDuty Extended Threat Detection with expanded coverage for EC2 instances and ECS tasks is now available in all AWS Regions where GuardDuty is offered. You can start using this capability today to detect coordinated, multistage activity across virtual machine and container workloads by combining signals from runtime activity, malware execution, and AWS API activity.

This expansion complements the existing Extended Threat Detection capabilities for Amazon EKS, providing unified visibility into coordinated, multistage activity across your AWS compute environment. To learn more, visit the Amazon GuardDuty product page.

Betty

Introducing Amazon Nova Forge: Build your own frontier models using Nova

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-forge-build-your-own-frontier-models-using-nova/

Organizations are rapidly expanding their use of generative AI across all parts of the business. Applications requiring deep domain expertise or specific business context need models that truly understand their proprietary knowledge, workflows, and unique requirements.

While techniques like prompt engineering and Retrieval Augmented Generation (RAG) work well for many use cases, they have fundamental limitations when it comes to embedding specialized knowledge into a model’s core understanding. Supervised fine-tuning and reinforcement learning help in customizing the model, but they operate too late in the development lifecycle, layering modifications on top of models that are a fully trained, and therefore difficult to steer to specific domains of interest.

When organizations attempt deeper customization through Continued Pre-Training (CPT) using only their proprietary data, they often encounter catastrophic forgetting, where models lose their foundational capabilities as they learn new content. At the same time, the data, compute, and cost needed for training a model from scratch are still a prohibitive barrier for most organizations.

Today, we’re introducing Amazon Nova Forge, a new service to build your own frontier models using Nova. Nova Forge customers can start their development from early model checkpoints, blend their datasets with Amazon Nova-curated training data, and host their custom models securely on AWS. Nova Forge is the easiest and most cost-effective way to build your own frontier model.

Use cases and applications
Nova Forge is designed for organizations with access to proprietary or industry-specific data who want to build AI that truly understands their domain. This includes:

  • Manufacturing and automation – Building models that understand specialized processes, equipment data, and industry-specific workflows
  • Research and development – Creating models trained on proprietary research data and domain-specific knowledge
  • Content and media – Developing models that understand brand voice, content standards, and specific moderation requirements
  • Specialized industries – Training models on industry-specific terminology, regulations, and best practices

Depending on the specific use cases, Nova Forge can be used to add differentiated capabilities, enhance task-specific accuracy, reduce costs, and lower latency.

How Nova Forge works
Nova Forge addresses the limitations of current customization approaches by allowing you to start model development from early checkpoints across pre-training, mid-training, and post-training phases. You can blend your proprietary data with Amazon Nova-curated data throughout all training phases, running training using proven recipes on Amazon SageMaker AI fully managed infrastructure. This data mixing approach significantly reduces catastrophic forgetting compared to training with raw data alone, helping preserve foundational skills—including core intelligence, general instruction following capabilities, and safety benefits—while incorporating your specialized knowledge.

Nova Forge provides the ability to use reward functions in your own environment for reinforcement learning (RL). This allows the model to learn from feedback generated in environments that are representative of your use cases. Beyond single-step evaluations, you can also use your own orchestrator to manage multi-turn rollouts, enabling RL training for complex agent workflows and sequential decision-making tasks. Whether you’re using chemistry tools to score molecular designs, or robotics simulations that reward efficient task completion and penalize collisions, you can connect your proprietary environments directly.

You can also take advantage of the built-in responsible AI toolkit available in Nova Forge to configure the safety and content moderation settings of your model. You can adjust settings to meet your specific business needs in areas like safety, security, and handling of sensitive content.

Getting started with Nova Forge
Nova Forge integrates seamlessly with your existing AWS workflows. You can use the familiar tools and infrastructure in Amazon SageMaker AI to run your training, then import your custom Nova models as private models on Amazon Bedrock. This gives you the same security, consistent APIs, and broader AWS integrations as any model in Amazon Bedrock.

In Amazon SageMaker Studio, you can now build your frontier model with Amazon Nova.

Amazon Nova Forge in the SageMaker AI console

To start building the model, choose which checkpoint to use: pre-trained, mid-trained, or post-trained. You can also upload your dataset here or use existing datasets.

Amazon Nova Forge checkpoints

You can blend your training data by mixing in curated datasets provided by Nova. These datasets, categorized by domain, can help your model to preserve general performance and prevent overfitting or catastrophic forgetting.

Amazon Nova Forge data mixing

Optionally, you can choose to use Reinforcement Fine-Tuning (RFT) to improve factual accuracy and reduce hallucinations in specific domains.

When training completes, import the model into Amazon Bedrock and start using it in your applications.

Things to know
Amazon Nova Forge is available in the US East (N. Virginia) AWS Region. The program includes access to multiple Nova model checkpoints, training recipes to mix proprietary data with Amazon Nova-curated training data, proven training recipes, and integration with Amazon SageMaker AI and Amazon Bedrock.

Learn more in the Amazon Nova User Guide and explore Nova Forge from the Amazon SageMaker AI console.

Organizations interested in expert assistance can also reach out to our Generative AI Innovation Center for additional support with their model development initiatives.

Danilo

AWS Transform announces full-stack Windows modernization capabilities

Post Syndicated from Prasad Rao original https://aws.amazon.com/blogs/aws/aws-transform-announces-full-stack-windows-modernization-capabilities/

Earlier this year in May, we announced the general availability of AWS Transform for .NET, the first agentic AI service for modernizing .NET applications at scale. During the early adoption period of the service, we received valuable feedback indicating that, in addition to .NET application modernization, you would like to modernize SQL Server and legacy UI frameworks. Your applications typically follow a three-tier architecture—presentation tier, application tier, and database tier—and you need a comprehensive solution that can transform all of these tiers in a coordinated way.

Today, based on your feedback, we’re excited to announce AWS Transform for full-stack Windows modernization, to offload complex, tedious modernization work across the Windows application stack. You can now identify application and database dependencies and modernize them in an orchestrated way through a centralized experience.

AWS Transform accelerates full-stack Windows modernization by up to five times across application, UI, database, and deployment layers. Along with porting .NET Framework applications to cross-platform .NET, it migrates SQL Server databases to Amazon Aurora PostgreSQL-Compatible Edition with intelligent stored procedure conversion and dependent application code refactoring. For validation and testing, AWS Transform deploys applications to Amazon Elastic Compute Cloud (Amazon EC2) Linux or Amazon Elastic Container Service (Amazon ECS), and provides customizable AWS CloudFormation templates and deployment configurations for production use. AWS Transform has also added capabilities to modernize ASP.NET Web Forms UI to Blazor.

There is much to explore, so in this post I’ll provide the first look at AWS Transform for full-stack Windows modernization capabilities across all layers.

Create a full-stack Windows modernization transformation job
AWS Transform connects to your source code repositories and database servers, analyzes application and database dependencies, creates modernization waves, and orchestrates full-stack transformations for each wave.

To get started with AWS Transform, I first complete the onboarding steps outlined in the getting started with AWS Transform user guide. After onboarding, I sign in to the AWS Transform console using my credentials and create a job for full-stack Windows modernization.

Create a new job for Windows Modernization
Create a new job by choosing SQL Server Database Modernization

After creating the job, I complete the prerequisites. Then, I configure the database connector for AWS Transform to securely access SQL Server databases running on Amazon EC2 and Amazon Relational Database Service (Amazon RDS). The connector can connect to multiple databases within the same SQL Server instance.

Create new database connector by adding connector name and AWS Account ID

Next, I set up a connector to connect to my source code repositories.

Add a source code connector by adding Connection name, AWS Account ID and Code Connector Arn

Furthermore, I have the option to choose if I would like AWS Transform to deploy the transformed applications. I choose Yes and provide the target AWS account ID and AWS Region for deploying the applications. The deployment option can be configured later as well.

Choose if you would like to deploy transformed apps

After the connectors are set up, AWS Transform connects to the resources and runs the validation to verify IAM roles, network settings, and related AWS resources.

After the successful validation, AWS Transform discovers databases and their associated source code repositories. It identifies dependencies between databases and applications to create waves for transforming related components together. Based on this analysis, AWS Transform creates a wave-based transformation plan.

Start assessment for discovered database and source code repositories

Assessing database and dependent applications
For the assessment, I review the databases and source code repositories discovered by AWS Transform and choose the appropriate branches for code repositories. AWS Transform scans these databases and source code repositories, then presents a list of databases along with their dependent .NET applications and transformation complexity.

Start wave planning of asessed databases and dependent repositories

I choose the target databases and repositories for modernization. AWS Transform analyzes these selections and generates a comprehensive SQL Modernization Assessment Report with a detailed wave plan. I download the report to review the proposed modernization plan. The report includes an executive summary, wave plan, dependencies between databases and code repositories, and complexity analysis.

View SQL Modernization Assessment Report

Wave transformation at scale
The wave plan generated by AWS Transform consists of four steps for each wave. First, it converts the SQL Server schema to PostgreSQL. Second, it migrates the data. Third, it transforms the dependent .NET application code to make it PostgreSQL compatible. Finally, it deploys the application for testing.

Before converting the SQL Server schema, I can either create a new PostgreSQL database or choose an existing one as the target database.

Choose or create target database

After I choose the source and target databases, AWS Transform generates conversion reports for my review. AWS Transform converts the SQL Server schema to PostgreSQL-compatible structures, including tables, indexes, constraints, and stored procedures.

Download Schema conversion reports

For any schema that AWS Transform can’t automatically convert, I can manually address them in the AWS Database Migration Service (AWS DMS) console. Alternatively, I can fix them in my preferred SQL editor and update the target database instance.

After completing schema conversion, I have the option to proceed with data migration, which is an optional step. AWS Transform uses AWS DMS to migrate data from my SQL Server instance to the PostgreSQL database instance. I can choose to perform data migration later, after completing all transformations, or work with test data by loading it into my target database.

Choose if you would like to migrate data

The next step is code transformation. I specify a target branch for AWS Transform to upload the transformed code artifacts. AWS Transform updates the codebase to make the application compatible with the converted PostgreSQL database.

Specify target branch destination for transformed codebase

With this release, AWS Transform for full-stack Windows modernization supports only codebases in .NET 6 or later. For codebases in .NET Framework 3.1+, I first use AWS Transform for .NET to port them to cross-platform .NET. I’ll expand on this in a following section.

After the conversion is completed, I can view the source and target branches along with their code transformation status. I can also download and review the transformation report.

Download transformation report

Modernizing .NET Framework applications with UI layer
One major feature we’re releasing today is the modernization of UI frameworks from ASP.NET Web Forms to Blazor. This is added to existing support for modernizing model-view-controller (MVC) Razor views to ASP.NET Core Razor views.

As mentioned previously, if I have a .NET application in legacy .NET Framework, then I continue using AWS Transform for .NET to port it to cross-platform .NET. For legacy applications with UIs built on ASP.NET Web Forms, AWS Transform now modernizes the UI layer to Blazor along with porting the backend code.

AWS Transform for .NET converts ASP.NET Web Forms projects to Blazor on ASP.NET Core, facilitating the migration of ASP.NET websites to Linux. The UI modernization feature is enabled by default in AWS Transform for .NET on both the AWS Transform web console and Visual Studio extension.

During the modernization process, AWS Transform handles the conversion of ASPX pages, ASCX custom controls, and code-behind files, implementing them as server-side Blazor components rather than web assembly. The following project and file changes are made during the transformation:

From To Description
*.aspx, *.ascx *.razor .aspx pages and .ascx custom controls become .razor files
Web.config appsettings.json Web.config settings become appsettings.json settings
Global.asax Program.cs Global .asax code becomes Program.cs code
*.master *layout.razor Master files become layout.razor files

Image showcasing how the specific project files are transformed

Other new features in AWS Transform for .NET
Along with UI porting, AWS Transform for .NET has added support for more transformation capabilities and enhanced developer experience. These new features include the following:

  • Port to .NET 10 and .NET Standard – AWS Transform now supports porting to .NET 10, the latest Long-Term Support (LTS) release, which was released on November 11, 2025. It also supports porting class libraries to .NET Standard, a formal specification for a set of APIs that are common across all .NET implementations. Furthermore, AWS Transform is now available with AWS Toolkit for Visual Studio 2026.
  • Editable transformation report – After the assessment is complete, you can now view and customize the transformation plan based on your specific requirements and preferences. For example, you can update package replacement details.
  • Real-time transformation updates with estimated remaining time – Depending on the size and complexity of the codebase, AWS Transform can take some time to complete the porting. You can now track transformation updates in real-time along with the estimated remaining time.
  • Next steps markdown – After the transformation is complete, AWS Transform now generates a next steps markdown file with the remaining tasks to complete the porting. You can use this as a revised plan to repeat the transformation with AWS Transform or use AI code-companions to complete the porting.

Things to know
Some more things to know are:

  • AWS Regions – AWS Transform for full-stack Windows modernization is generally available today in the US East (N. Virginia) Region. For Regional availability and future roadmap, visit the AWS Capabilities by Region.
  • Pricing – Currently, there is no added charge for Windows modernization features of AWS Transform. Any resources you create or continue to use in your AWS account using the output of AWS Transform are billed according to their standard pricing. For limits and quotas, refer to the AWS Transform User Guide.
  • SQL Server versions supported – AWS Transform supports the transformation of SQL Server versions from 2008 R2 through 2022, including all editions (Express, Standard, and Enterprise). SQL Server must be hosted on Amazon RDS or Amazon EC2 in the same Region as AWS Transform.
  • Entity Framework versions supported – AWS Transform supports the modernization of Entity Framework versions 6.3 through 6.5 and Entity Framework Core 1.0 through 8.0.
  • Getting started – To get started, visit AWS Transform for full-stack Windows modernization User Guide.

Prasad

Introducing AWS Transform custom: Crush tech debt with AI-powered code modernization

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/introducing-aws-transform-custom-crush-tech-debt-with-ai-powered-code-modernization/

Technical debt is one of the most persistent challenges facing enterprise development teams today. Studies show that organizations spend 20% of their IT budget on technical debt instead of advancing new capabilities. Whether it’s upgrading legacy frameworks, migrating to newer runtime versions, or refactoring outdated code patterns, these essential but repetitive tasks consume valuable developer time that could be spent on innovation.

Today, we’re excited to announce AWS Transform custom, a new agent that fundamentally changes how organizations approach modernization at scale. This intelligent agent combines pre-built transformations for Java, Node.js, and Python upgrades with the ability to define custom transformations. By learning specific transformation patterns and automating them across entire codebases, customers using AWS Transform custom have achieved up to 80% reduction in execution time in many cases, freeing developers to focus on innovation.

You can define transformations using your documentation, natural language descriptions, and code samples. The service then applies these specific patterns consistently across hundreds or thousands of repositories, improving its effectiveness through both explicit feedback and implicit signals like developers’ manual fixes within your transformation projects.

AWS Transform custom offers both CLI and web interfaces to suit different modernization needs. You can use the CLI to define transformations through natural language interactions and execute them on local codebases, either interactively or autonomously. You can also integrate it into code modernization pipelines or workflows, making it ideal for machine-driven automation. Meanwhile, the web interface provides comprehensive campaign management capabilities, helping teams track and coordinate transformation progress across multiple repositories at scale.

Language and framework modernization
AWS Transform supports runtime upgrades without the need to provide additional information, understanding not only the syntax changes required but also the subtle behavioral differences and optimization opportunities that come with newer versions. The same intelligent approach applies to Node.js, Python and Java runtime upgrades, and even extends to infrastructure-level transitions, such as migrating workloads from x86 processors to AWS Graviton.

It also navigates framework modernization with sophistication. When organizations need to update their Spring Boot applications to take advantage of newer features and security patches, AWS Transform custom doesn’t merely update version numbers but understands the cascading effects of dependency changes, configuration updates, and API modifications.

For teams facing more dramatic shifts, such as migrating from Angular to React, AWS Transform custom can learn the patterns of component translation, state management conversion, and routing logic transformation that make such migrations successful.

Infrastructure and enterprise-scale transformations
The challenge of keeping up with evolving APIs and SDKs becomes particularly acute in cloud-based environments where services are continuously improving. AWS Transform custom supports AWS SDK updates across a broad spectrum of programming languages that enterprises use including Java, Python, and JavaScript. The service understands not only the mechanical aspects of API changes, but also recognizes best practices and optimization opportunities available in newer SDK versions.

Infrastructure as Code transformations represent another critical capability, especially as organizations evaluate different tooling strategies. Whether you’re converting AWS Cloud Development Kit (AWS CDK) templates to Terraform for standardization purposes, or updating AWS CloudFormation configurations to access new service features, AWS Transform custom understands the declarative nature of these tools and can maintain the intent and structure of your infrastructure definitions.

Beyond these common scenarios, AWS Transform custom excels at addressing the unique, organization-specific code patterns that accumulate over years of development. Every enterprise has its own architectural conventions, utility libraries, and coding standards that need to evolve over time. It can learn these custom patterns and help refactor them systematically so that institutional knowledge and best practices are applied consistently across the entire application portfolio.

AWS Transform custom is designed with enterprise development workflows in mind, enabling center of excellence teams and system integrators to define and execute organization-wide transformations while application developers focus on reviewing and integrating the transformed code. DevOps engineers can then configure integrations with existing continuous integration and continuous delivery (CI/CD) pipelines and source control systems. It also includes pre-built transformations for Java, Node.js and Python runtime updates which can be particularly useful for AWS Lambda functions, along with transformations for AWS SDK modernization to help teams get started immediately.

Getting started
AWS Transform makes complex code transformations manageable through both pre-built and custom transformation capabilities. Let’s start by exploring how to use an existing transformation to address a common modernization challenge: upgrading AWS Lambda functions due to end-of-life (EOL) runtime support.

For this example, I’ll demonstrate migrating a Python 3.8 Lambda function to Python 3.13, as Python 3.8 reached EOL and is no longer receiving security updates. I’ll use the CLI for this demo, but I encourage you to also explore the web interface’s powerful campaign management capabilities.

First, I use the command atx custom def list to explore the available transformation definitions. You can also access this functionality through a conversational interface by typing only atx instead of issuing the command directly, if you prefer.

This command displays all available transformations, including both AWS-managed defaults and any existing custom transformations created by users in my organization. AWS-managed transformations are identified by the AWS/ prefix, indicating they’re maintained and updated by AWS. In the results, I can see several options such as AWS/java-version-upgrade for Java runtime modernization, AWS/python-boto2-to-boto3-migration for updating Python AWS SDK usage, AWS/nodejs-version-upgrade for Node.js runtime updates.

For my Python 3.8 to 3.13 migration, I’ll use the AWS/python-version-upgrade transformation.

You run a migration by using the atx custom def exec command.  Please consult the documentation for more details about the command and all its options. Here, I run it against my project repository specifying the transformation name. I also add pytest to run unit tests for validation. More importantly, I use the additionalPlanContext section in the  --configuration input to specify which Python version I want to upgrade to. For reference, here’s the command I have for my demo (I’ve used multiple lines and indented it here for clarity):

atx custom def exec 
-p /mnt/c/Users/vasudeve/Documents/Work/Projects/ATX/lambda/todoapilambda 
-n AWS/python-version-upgrade
-C "pytest" 
--configuration 
    "additionalPlanContext= The target Python version to upgrade to is Python 3.13" 
-x -t

AWS Transform then starts the migration process. It analyzes my Lambda function code, identifies Python 3.8-specific patterns, and automatically applies the necessary changes for Python 3.13 compatibility. This includes updating syntax for deprecated features, modifying import statements, and adjusting any version-specific behaviors.

After execution, it provides a comprehensive summary including a report on dependencies updated in requirements.txt with Python 3.13-compatible package versions, instances of deprecated syntax replaced with current equivalents, updated runtime configuration notes for AWS Lambda deployment, suggested test cases to validate the migration, and more. It also provides a body of evidence that serve as proof of success.

The migrated code lives in a local branch so you can review and merge when satisfied. Alternatively, you can keep providing feedback and reiterating until yo’re happy that the migration is fully complete and meets your expectations.

This automated process changes what would typically require hours of manual work into a streamlined, consistent upgrade that maintains code quality while maintaining compatibility with the newer Python runtime.

Creating a new custom transformation
While AWS-managed transformations handle common scenarios effectively, you can also create custom transformations tailored to your organization’s specific needs. Let’s explore how to create a custom transformation to see how AWS Transform learns from your specific requirements.

I type atx to initialize the atx cli and start the process.

The first thing it asks me is if I want to use one of the existing transformations or create a new one. I choose to create a new one. Notice that from here on the whole conversation takes place using natural language, not commands. I typed new one but I could have typed I want to create a new one and it would’ve understood it exactly the same.

It then prompts me to provide more information about the kind of transformation I’d like to perform. For this demo, I’m going to migrate an Angular application, so I type angular 16 to 19 application migration which prompts the CLI to search for all transformations available for this type of migration. In my case, my team has already created and made available a few Angular migrations, so it shows me those. However, it warns me that none of them is an exact match to my specific request for migrating from Angular 16 to 19. It then asks if I’d like to select from one of the existing transformations listed or create a custom one.

I choose to create a custom one by continuing to use natural language and typing create a new one as a command. Again, this could be any variation of that statement provided that you indicate your intentions clearly. It follows by asking me a few questions including whether I have any useful documentation, example code or migration guides that I can provide to help customize the transformation plan.

For this demo, I’m only going to rely on AWS Transform to provide me with good defaults. I type I don't have these details. Follow best practices. and the CLI responds by telling me that it will create a comprehensive transformation definition for migrating Angular 16 to Angular 19.  Of course, I relied on the pre-trained data to generate results based on best practices. As usual, the recommendation is to provide as much information and relevant data as possible at this stage of the process for better results. However, you don’t need to have all the data upfront. You can keep on providing data at any time› as you iterate through the process of creating the custom transformation definition.

The transformation definition is generated as a markup file containing a summary and a comprehensive sequence of implementation steps grouped logically into phases such as premigration preparation, processing and partitioning, static dependency analysis, searching and applying specific transformation rules, and step-by-step migration and iterative validation.

It’s interesting to see that AWS Transform opted for the best practice of doing incremental framework updates creating steps for migrating the application first to 17 then 18 then 19 instead of trying to go directly from 16 to 19 to minimize issues.

Note that the plan includes various stages of testing and verification to confirm that the various phases can be concluded with confidence. At the very end, it also includes a final validation stage listing exit criteria that performs a comprehensive set of tests against all aspects of the application that will be used to accept the migration as successfully complete.

After the transformation definition is created, AWS Transform asks me about what I would like to do next. I can choose to review or modify the transformation definition and I can reiterate through this process as much as I need until I arrive at one that I’m satisfied with. I can also choose to already apply this transformation definition to an Angular codebase. However, first I want to make this transformation available to my team members as well as myself so we can all use it again in the future. So, I choose option 4 to publish this transformation to the registry.

This custom transformation needs a name and a description of its objective which is displayed when users browse the registry. AWS Transforms automatically extracts those from context for me and asks me if I would like to modify them before going ahead. I like the sensible default of “Angular-16-to-19-Migration”, and the objective is clearly stated, so I choose to accept the suggestions and publish it by answering with yes, looks good.

Now that the transformation definition is created and published, I can use it and run it multiple times against any code repository. Let’s apply the transformation to a code repository with a project written in Angular 16. I now choose option 1 from the follow-up prompt and the CLI asks me for the path in my file system to the application that I want to migrate and, optionally, the build command that it should use.

After I provide that information, AWS Transform proceeds to analyze the code base and formulate a thorough step-by-step transformation plan based on the definition created earlier. After it’s done, it creates a JSON file containing the detailed migration plan specifically designed for applying our transformation definition to this code base. Similar to the process of creating the transformation definition, you can review and iterate through this plan as much as you need, providing it with feedback and adjusting it to any specific requirements you might have.

When I’m ready to accept the plan, I can use natural language to tell AWS Transform that we can start the migration process. I type looks good, proceed and watch the progress in my shell as it starts executing the plan and making the changes to my code base one step at a time.

The time it takes will vary depending on the complexity of the application. In my case, it took a few minutes to complete. After it has finished, it provides me with a transformation summary and the status of each one of the exit criteria that were included in the final verification phase of the plan alongside all the evidence to support the reported status. For example, the Application Build – Production criteria was listed as passed and some of the evidence provided included the incremental Git commits, the time that it took to complete the production build, the bundle size, the build output message, and the details about all the output files created.

Conclusion
AWS Transform represents a fundamental shift in how organizations approach code modernization and technical debt. The service helps to transform what was at one time a fragmented, team-by-team effort into a unified, intelligent capability that eliminates knowledge silos, keeping your best practices and institutional knowledge available as scalable assets across the entire organization. This helps to accelerate modernization initiatives while freeing developers to spend more time on innovation and driving business value instead of focusing on repetitive maintenance and modernization tasks.

Things to know

AWS Transform custom is now generally available. Visit the get started guide to start your first transformation campaign or check out the documentation to learn more about setting up custom transformation definitions.

Top announcements of AWS re:Invent 2025

Post Syndicated from AWS News Blog Team original https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2025/

Matt Garman stands on stage at re:Invent 2024We’re rounding up the most exciting and impactful announcements from AWS re:Invent 2025, which takes place November 30-December 4 in Las Vegas. This guide highlights the innovations that will help you build, scale, and transform your business in the cloud.

We’ll update this roundup throughout re:Invent with our curation of the major announcements from each keynote session and more. To see the complete list of all AWS launches, visit What’s New with AWS.

(This post was updated Nov. 30, 2025.)


Analytics

AWS Clean Rooms launches privacy-enhancing dataset generation for ML model training
Train ML models on sensitive collaborative data by generating synthetic datasets that preserve statistical patterns while protecting individual privacy through configurable noise levels and protection against re-identification.

Compute

Introducing AWS Lambda Managed Instances: Serverless simplicity with EC2 flexibility
Run Lambda functions on EC2 compute while maintaining serverless simplicity—enabling access to specialized hardware and cost optimizations through EC2 pricing models, with AWS handling all infrastructure management.

Containers

Announcing Amazon EKS Capabilities for workload orchestration and cloud resource management
Streamline Kubernetes development with fully managed platform capabilities that handle workload orchestration and cloud resource management, eliminating infrastructure maintenance while providing enterprise-grade reliability and security.

Networking & Content Delivery

Introducing Amazon Route 53 Global Resolver for secure anycast DNS resolution (preview)
Simplify hybrid DNS management with a unified service that resolves public and private domains globally through secure, anycast-based resolution while reducing operational overhead and maintaining consistent security controls.

Partner Network

AWS Partner Central now available in AWS Management Console
Access Partner Central directly through the AWS Console to streamline your journey from customer to Partner—manage solutions, opportunities, and marketplace listings in one unified interface with enterprise-grade security.

Security, Identity, & Compliance

Simplify IAM policy creation with IAM Policy Autopilot, a new open source MCP server for builders
Speed up AWS development with an open source tool that analyzes your code to generate valid IAM policies, providing AI coding assistants with up-to-date AWS service knowledge and reliable permission recommendations.

Introducing Amazon Route 53 Global Resolver for secure anycast DNS resolution (preview)

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/introducing-amazon-route-53-global-resolver-for-secure-anycast-dns-resolution-preview/

Today, we’re announcing Amazon Route 53 Global Resolver, a new Amazon Route 53 service that provides secure and reliable DNS resolution globally for queries from anywhere (preview). You can use Global Resolver to resolve DNS queries to public domains on the internet and private domains associated with Route 53 private hosted zones. Route 53 Global Resolver offers network administrators a unified solution to resolve queries from authenticated clients and sources in on-premises data centers, branch offices, and remote locations through globally distributed anycast IP addresses. This service includes built-in security controls including DNS traffic filtering, support for encrypted queries, and centralized logging to help organizations reduce operational overhead while maintaining compliance with security requirements.

Organizations with hybrid deployments face operational complexity when managing DNS resolution across distributed environments. Resolving public internet domains and private application domains often requires maintaining split DNS infrastructure, which increases cost and administrative burden especially when replicating to multiple locations. Network administrators must configure custom forwarding solutions, deploy Route 53 Resolver endpoints for private domain resolution, and implement separate security controls across different locations. Additionally, they must configure and maintain multi-Region failover strategies for Route 53 Resolver endpoints and provide consistent security policy enforcement across all Regions while testing failover scenarios.

Route 53 Global Resolver has key capabilities that address these challenges. The service resolves both public internet domains and Route 53 private hosted zones, eliminating the need for separate split-DNS forwarding. It provides DNS resolution through multiple protocols, including DNS over UDP (Do53), DNS-over-HTTPS (DoH), and DNS-over-TLS (DoT). Each deployment provides a single set of common IPv4 and IPv6 anycast IP addresses that route queries to the nearest AWS Region, reducing latency for distributed client populations.

Route 53 Global Resolver provides integrated security features equivalent to Route 53 Resolver DNS Firewall. Administrators can configure filtering rules using AWS Managed Domain Lists that provide flexible controls with lists classified by DNS threats (malware, spam, phishing) or web content (adult sites, gambling, social networking) that might not be safe for work or create custom domain lists by importing domains from a file. Advanced threat protection detects and blocks domain generation algorithm (DGA) patterns and DNS tunneling attempts. For encrypted DNS traffic, Route 53 Global Resolver supports DoH and DoT protocols to protect queries from unauthorized access during transit.

Route 53 Global Resolver only accepts traffic from known clients that need to authenticate with the Resolver. For Do53, DoT, and DoH connections, administrators can configure IP and CIDR allowlists. For DoH and DoT connections, token-based authentication provides granular access control with customizable expiration periods and revocation capabilities. Administrators can assign tokens to specific client groups or individual devices based on organizational requirements.

Route 53 Global Resolver supports DNSSEC validation to verify the authenticity and integrity of DNS responses from public nameservers. It also includes EDNS Client Subnet support, which forwards client subnet information to enable more accurate geographic-based DNS responses from content delivery networks.

Getting started with Route 53 Global Resolver
This walkthrough shows how to configure Route 53 Global Resolver for an organization with offices on the US East and West coasts that needs to resolve both public domains and private applications hosted in Route 53 private hosted zones. To configure Route 53 Global Resolver, go to the AWS Management Console, choose Global resolvers from the navigation pane, and choose Create global resolver.

In the Resolver details section, enter a Resolver name such as corporate-dns-resolver. Add an optional description like DNS resolver for corporate offices and remote clients. In the Regions section, choose the AWS Regions where you want the resolver to operate, such as US East (N. Virginia) and US West (Oregon). The anycast architecture routes DNS queries from your clients to the nearest selected Region.

After the resolver is created, the console displays the resolver details, including the anycast IPv4 and IPv6 addresses that you will use for DNS queries. You can proceed to create a DNS view by choosing Create DNS view to configure client authentication and DNS query resolution settings.

In the Create DNS view section, enter a DNS view name such as primary-view and optionally add a Description like DNS view for corporate offices. A DNS view helps you create different logical groupings for your clients and sources, and determine the DNS resolution for those groups. This helps you maintain different DNS filtering rules and private hosted zone resolution policies for different clients in your organization.

For DNSSEC validation, choose Enable to verify the authenticity of DNS responses from public DNS servers. For Firewall rules fail open behavior, choose Disable to block DNS queries when firewall rules can’t be evaluated, which provides additional security. For EDNS client subnet, keep Enable selected to forward client location information to DNS servers, which allows content delivery networks to provide more accurate geographic responses. DNS view creation might take a few minutes to become operational.

After the DNS view is created and operational, configure DNS Firewall rules to filter network traffic by choosing Create rule. In the Create DNS Firewall rules section, enter a Rule name such as block-malware-domains and optionally add a description. For Rule configuration type, you can choose Customer managed domain lists, AWS managed domain lists provided by AWS or DNS Firewall Advanced protection.

For this walkthrough, choose AWS managed domain lists. In the Domain lists dropdown, choose one or more AWS managed lists such as Threat – Malware to block known malicious domains. You can leave Query type empty to apply the rule to all DNS query types. In this example, choose A to apply this rule only to IPv4 address queries. In the Rule action section, select Block to prevent DNS resolution for domains that match the selected lists. For Response to send for Block action, keep NODATA selected to indicate that the query was successful but no response is available, then choose Create rules.

The next step is to configure access sources to specify which IP addresses or CIDR blocks are allowed to send DNS queries to the resolver. Navigate to the Access sources tab in the DNS view and then choose Create access source.

In the Access source details section, enter a Rule name such as office-networks to identify the access source. In the CIDR block field, enter the IP address range for your offices to allow queries from that network. For Protocol, select Do53 for standard DNS queries over UDP or choose DoH or DoT if you want to require encrypted DNS connections from clients. After configuring these settings, choose Create access source to allow the specified network to send DNS queries to the resolver.

Next, navigate to the Access tokens tab in the DNS view to create token-based authentication for clients and choose Create access token. In the Access token details section, enter a Token name such as remote-clients-token. For Token expiry, select an expiration period from the dropdown based on your security requirements, such as 365 days for long-term client access, or choose a shorter duration like 30 days or 90 days for tighter access control. After configuring these settings, choose Create access token to generate the token, which clients can use to authenticate DoH and DoT connections to the resolver.

After the access token is created, navigate to the Private hosted zones tab in the DNS view to associate Route 53 private hosted zones with the DNS view so that the resolver can resolve queries for your private application domains. Choose Associate private hosted zone and in the Private hosted zones section, select a private hosted zone from the list that you want the resolver to handle. After selecting the zone, choose Associate to enable the resolver to respond to DNS queries for these private domains from your configured access sources.

With the DNS view configured, firewall rules created, access sources and tokens defined, and private hosted zones associated, the Route 53 Global Resolver setup is complete and ready to handle DNS queries from your configured clients.

After creating your Route 53 Global Resolver, you need to configure your DNS clients to send queries to the resolver’s anycast IP addresses. The configuration method depends on the access control you configured in your DNS view:

  • For IP-based access sources (CIDR blocks) – Configure your source clients to point DNS traffic to the Route 53 Global Resolver anycast IP addresses provided in the resolver details. Global Resolver will only allow access from allowlisted IPs that you have specified in your access sources. You can also associate the access sources to different DNS views to provide more granular DNS resolution views for different sets of IPs.
  • For access token–based authentication – Deploy the tokens on your clients to authenticate DoH and DoT connections with Route 53 Global Resolver. You must also configure your clients to point the DNS traffic to the Route 53 Global Resolver anycast IP addresses provided in the resolver details.

For detailed configuration instructions for your specific operating system and protocol, refer to the technical documentation.

Additional things to know
We’re renaming the existing Route 53 Resolver to Route 53 VPC Resolver. This naming change clarifies the architectural distinction between the two services. VPC Resolver operates Regionally within your VPCs to provide DNS resolution for resources in your Amazon VPC environment. VPC Resolver continues to support inbound and outbound resolver endpoints for hybrid DNS architectures within specific AWS Regions.

Route 53 Global Resolver complements Route 53 VPC Resolver by providing internet-reachable, global and private DNS resolution for on-premises and remote clients without requiring VPC deployment or private connections.

Existing VPC Resolver configurations remain unchanged and continue to function as configured. The renaming affects the service name in the AWS Management Console and documentation, but API operation names remain unchanged. If your architecture requires DNS resolution for resources within your VPCs, continue using VPC Resolver.

Join the preview
Route 53 Global Resolver reduces operational overhead by providing unified DNS resolution for public and private domains through a single managed service. The global anycast architecture improves reliability and reduces latency for distributed clients. Integrated security controls and centralized logging help organizations maintain consistent security policies across all locations while meeting compliance requirements.

To learn more about Amazon Route 53 Global Resolver, visit the Amazon Route 53 documentation.

You can start using Route 53 Global Resolver through the AWS Management Console in US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Frankfurt), Europe (Ireland), Europe (London), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Tokyo), and Asia Pacific (Sydney) Regions.

— Esra

AWS Partner Central now available in AWS Management Console

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-partner-central-now-available-in-aws-management-console/

Today, we’re announcing that AWS Partner Central is now available directly in the AWS Management Console, creating a unified experience that transforms how you engage with AWS as both customers and AWS Partners.

As someone who has worked with countless AWS customers over the years, I’ve observed how organizations evolve in their AWS journey. Many of our most successful Partners began as AWS customers—first using our services to build their own infrastructure and solutions, then expanding to create offerings for others. Seeing this natural progression from customer to Partner, we recognized an opportunity to streamline these traditionally separate experiences into one unified journey.

As AWS evolved, so did the needs of our Partner community. Organizations today operate in multiple capacities: using AWS services for their own infrastructure while simultaneously building and delivering solutions for their customers. Modern businesses need streamlined workflows that support their growth from AWS customer to Partner to AWS Marketplace Seller, with enterprise-grade security features that match how they actually work with AWS today.

A new unified console experience
The integration of AWS Partner Central into the Console represents a fundamental shift in partnership accessibility. For existing AWS customers, such as you, becoming an AWS Partner is now as clear as accessing any other AWS service. The familiar console interface provides direct access to partnership opportunities, program benefits, and AWS Marketplace capabilities without needing separate logins or navigation between different systems.

Getting started as an AWS Partner now takes only a few clicks within your existing console environment. You can discover partnership opportunities, understand program requirements, and begin your Partner journey without leaving the AWS interface you already know and trust.

The console integration creates an intuitive pathway for existing customers to transition into AWS Marketplace Sellers. You can now access AWS Marketplace Seller capabilities alongside your existing AWS services, managing both your infrastructure and AWS Marketplace business from a single interface. Private offer requests and negotiations can be managed directly within AWS Partner Central, and you can manage your AWS Marketplace listings alongside your other AWS activities through streamlined workflows.

Becoming an AWS Partner
The unified console experience provides access to comprehensive partnership benefits designed to accelerate your business growth.

Join the AWS Partner Network (APN) and complete your Partner and AWS Marketplace Seller requirements seamlessly within the same interface. Enroll in Partner Paths that align with your customer solutions to build, market, list, and sell in AWS Marketplace while growing alongside AWS. When you are established, use the Partner programs to differentiate your solution, list in AWS Marketplace to improve your go-to-market discoverability, and build AWS expertise through certifications to drive profitability by capturing new revenue streams. Scale your business by selling or reselling software and professional services in AWS Marketplace, helping you accelerate deals, boost revenue, and expand your customer reach to new geographies, industries, and segments.

Throughout your journey, you can continue using Amazon Q in the console, which provides personalized guidance through AWS Partner Assistant.

Let’s see the new Partner Central console
The new AWS Partner Central is accessible like any other AWS service from the console. Among many new capabilities, it provides four key sections that support Partner operations and business growth within the AWS Partner Network:

1. It helps you sell your solutions

AWS Partner Central - Solutions

You can create and publish solutions that address specific customer needs through AWS Marketplace. Solutions are made up of products such as software as a service (SaaS), Amazon Machine Images (AMI), containers, professional services, AI agents and tools, and more. The solutions management capability guides you through building offerings that include both products you own and those you are authorized to resell. You can craft compelling value propositions and descriptions that clearly communicate your solution benefits to potential buyers browsing AWS Marketplace.

I choose Create solution to start listing a new solution in the AWS Marketplace, as shown in the following figure.

AWS Partner Central - Create solution

2. It helps you update and manage your Partner profile

AWS Partner Central - Manage profile

Your Partner profile showcases your organization’s expertise and capabilities to the AWS community. You control how your business appears to potential customers and Partners by highlighting the industry segments you serve and describing your primary products or services. Profile visibility settings provide you with the option to choose whether your information is public or private.

3. It helps you track opportunities

AWS Partner Central - Track Opportunities

You can manage your pipeline of AWS customers, supporting joint collaborations with AWS on customer engagements. You monitor these prospects using clear status indicators: approved, rejected, draft, and pending approval. The opportunity dashboard shows stages, estimated AWS Monthly Recurring Revenue, and other key metrics that help you understand your pipeline. You can create more opportunities directly within the console and export data for your own reporting and analysis.

4. It provides you with the ability to discover and connect with other Partners

After becoming an AWS Partner, you get access to the AWS Partners network, where you can search for other Partners. You can connect with them to collaborate on sales opportunities and expand your customer outreach.

AWS Partner Central - Discover and Search for partners

You search through available Partners using filters for industry, location, Partner program type, and specialization. The centralized dashboard shows your active connections, pending requests, and connection history, so that you can manage business relationships and identify collaboration opportunities that can expand your reach. Like all other AWS services, these Partner connection capabilities are now available as APIs, which provide automation and integration into your existing workflows.

AWS Partner Central - Manage contact requests

These capabilities work together within the new AWS Partner Central console, accessible directly from the console, helping you transition from AWS customer to successful Partner with enterprise-grade security and streamlined workflows.

The technical foundation: Migrating the identity system
This unified console experience is made possible by our migration to a modern identity system built on AWS Identity and Access Management (IAM). We’ve transitioned from legacy identity infrastructure to IAM Identity Center, providing enterprise-grade security capabilities including single sign-on capabilities and multi-factor authentication. With security as job zero, this migration provides new and existing Partners with the possibility to connect their own identity providers to AWS Partner Central. It provides seamless integration with existing enterprise authentication systems while removing the complexity of managing separate credentials across different services.

One more thing
APIs are the core of what we do at AWS, and AWS Partner Central is no different. You can automate and streamline your co-sell workflows by connecting your business tools to AWS Partner Central. The APIs offered by AWS Partner Central help you accelerate APN benefits—from Account Management (Account API) and Solution Management (Solution API) to co-selling with Opportunity and Leads APIs, and Benefits APIs for faster benefit activation.

You can use these APIs to engage with AWS and grow your Partner business from your own CRM tools.

Get started today
This integration between the console and AWS Partner Central reflects our commitment to reducing complexity and improving the Partner experience. We’re bringing AWS Partner Central into the console to create a more intuitive path for organizations to grow with AWS from initial customer adoption through to full partnership engagement and AWS Marketplace success.

Your journey from AWS customer to successful AWS Partner and AWS Marketplace Seller starts with a few clicks in your console. I encourage you to explore the new unified experience today and discover how AWS Partner Central in the console can accelerate your organization’s growth and success within the AWS community.

Ready to get started? Visit AWS Partner Central in your console to learn more about the AWS Partner Network and discover the partnership path that’s right for your organization.

— seb

Run Apache Spark and Iceberg 4.5x faster than open source Spark with Amazon EMR

Post Syndicated from Atul Payapilly original https://aws.amazon.com/blogs/big-data/run-apache-spark-and-iceberg-4-5x-faster-than-open-source-spark-with-amazon-emr/

This post shows how Amazon EMR 7.12 can make your Apache Spark and Iceberg workloads up to 4.5x faster performance.

The Amazon EMR runtime for Apache Spark provides a high-performance runtime environment with full API compatibility with open source Apache Spark and Apache Iceberg. Amazon EMR on EC2, Amazon EMR Serverless, Amazon EMR on Amazon EKS, Amazon EMR on AWS Outposts and AWS Glue use the optimized runtimes.

Our benchmarks show Amazon EMR 7.12 runs TPC-DS 3 TB workloads 4.5x faster than open source Spark 3.5.6 with Iceberg 1.10.0.

Performance improvements include optimizations for metadata caching, parallel I/O, adaptive query planning, data type handling, and fault tolerance. There were also some Iceberg specific regressions around data scans that we identified and fixed.

These optimizations let you match Parquet performance on Amazon EMR while keeping the key features of Iceberg key features: ACID transactions, time travel, and schema evolution.

Benchmark results compared to open source

To assess the performance of the Spark engine with the Iceberg table format, we performed benchmark tests using the 3 TB TPC-DS dataset, version 2.13, a popular industry standard benchmark. Benchmark tests for the Amazon EMR runtime for Apache Spark and Apache Iceberg were conducted on Amazon EMR 7.12 EC2 clusters compared to open source Apache Spark 3.5.6 and Apache Iceberg 1.10.0 on EC2 clusters.

Note: Our results derived from the TPC-DS dataset are not directly comparable to the official TPC-DS results due to setup differences.

The setup instructions and technical details are available in our GitHub repository. To minimize the influence of external catalogs like AWS Glue and Hive, we used the Hadoop catalog for the Iceberg tables. This uses the underlying file system, specifically Amazon S3, as the catalog. We can define this setup by configuring the property spark.sql.catalog.<catalog_name>.type. The fact tables used the default partitioning by the date column, which vary from 200–2,100 partitions. No precalculated statistics were used for these tables.

We ran a total of 104 SparkSQL queries in 3 sequential rounds, and the average runtime of each query across these rounds was taken for comparison. The average runtime for the 3 rounds on Amazon EMR 7.12 with Iceberg enabled was 0.37 hours, demonstrating a 4.5x speed increase compared to open source Spark 3.5.6 and Iceberg 1.10.0. The following figure presents the total runtimes in seconds.

The following table summarizes the metrics.

Metric Amazon EMR 7.12 on EC2 Amazon EMR 7.5 on EC2 Open source Apache Spark 3.5.6 and Apache Iceberg 1.10.0
Average runtime in seconds 1349.62 1535.62 6113.92
Geometric mean over queries in seconds 7.45910 8.30046 22.31854
Cost* $4.81 $5.47 $17.65

*Detailed cost estimates are discussed later in this post.

The following chart demonstrates the per-query performance improvement of Amazon EMR 7.12 relative to open source Spark 3.5.6 and Iceberg 1.10.0. The extent of the speedup varies from one query to another, with the fastest up to 13.6x faster for q23b, with Amazon EMR outperforming open source Spark with Iceberg tables. The horizontal axis arranges the TPC-DS 3TB benchmark queries in descending order based on the performance improvement seen with Amazon EMR, and the vertical axis depicts the magnitude of this speedup as a ratio.

Cost comparison breakdown

Our benchmark provides the total runtime and geometric mean data to assess the performance of Spark and Iceberg in a complex, real-world decision support scenario. For additional insights, we also examine the cost aspect. We calculate cost estimates using formulas that account for EC2 On-Demand instances, Amazon Elastic Block Store (Amazon EBS), and Amazon EMR expenses.

  • Amazon EC2 cost (includes SSD cost) = number of instances * r5d.4xlarge hourly rate * job runtime in hours
    • 4xlarge hourly rate = $1.152 per hour
  • Root Amazon EBS cost = number of instances * Amazon EBS per GB-hourly rate * root EBS volume size * job runtime in hours
  • Amazon EMR cost = number of instances * r5d.4xlarge Amazon EMR cost * job runtime in hours
    • 4xlarge Amazon EMR cost = $0.27 per hour
  • Total cost = Amazon EC2 cost + root Amazon EBS cost + Amazon EMR cost

The calculations reveal that the Amazon EMR 7.12 benchmark yields a 3.6x cost efficiency improvement over open source Spark 3.5.6 and Iceberg 1.10.0 in running the benchmark job.

Metric Amazon EMR 7.12 Amazon EMR 7.5 Open source Apache Spark 3.5.6 and Apache Iceberg 1.10.0
Runtime in seconds 1349.62 1535.62 6113.92

Number of EC2 instances

(Includes primary node)

9 9 9
Amazon EBS Size 20gb 20gb 20gb

Amazon EC2

(Total runtime cost)

$3.89 $4.42 $17.61
Amazon EBS cost $0.01 $0.01 $0.04
Amazon EMR cost $0.91 $1.04 $0
Total cost $4.81 $5.47 $17.65
Cost savings Amazon EMR 7.12 is 3.6x better Amazon EMR 7.5 is 3.2x better Baseline

In addition to the time-based metrics discussed so far, data from Spark event logs show that Amazon EMR scanned approximately 4.3x less data from Amazon S3 and 5.3x fewer records than the open source version in the TPC-DS 3 TB benchmark. This reduction in Amazon S3 data scanning contributes directly to cost savings for Amazon EMR workloads.

Run open source Apache Spark benchmarks on Apache Iceberg tables

We used separate EC2 clusters, each equipped with 9 r5d.4xlarge instances, for testing both open source Spark 3.5.6 and Amazon EMR 7.12 for Iceberg workload. The primary node was equipped with 16 vCPU and 128 GB of memory, and the 8 worker nodes together had 128 vCPU and 1024 GB of memory. We conducted tests using the Amazon EMR default settings to showcase the typical user experience and minimally adjusted the settings of Spark and Iceberg to maintain a balanced comparison.

The following table summarizes the Amazon EC2 configurations for the primary node and 8 worker nodes of type r5d.4xlarge.

EC2 Instance vCPU Memory (GiB) Instance storage (GB) EBS root volume (GB)
r5d.4xlarge 16 128 2 x 300 NVMe SSD 20 GB

Prerequisites

The following prerequisites are required to run the benchmarking:

  1. Using the instructions in the emr-spark-benchmark GitHub repository, set up the TPC-DS source data in your S3 bucket and on your local computer.
  2. Build the benchmark application following the steps provided in Steps to build spark-benchmark-assembly application and copy the benchmark application to your S3 bucket. Alternatively, copy spark-benchmark-assembly-3.5.6.jar to your S3 bucket.
  3. Create Iceberg tables from the TPC-DS source data. Follow the instructions on GitHub to create Iceberg tables using the Hadoop catalog. For example, the following code uses an Amazon EMR 7.12 cluster with Iceberg enabled to create the tables:
aws emr add-steps --cluster-id <cluster-id> --steps Type=Spark,Name="Create Iceberg Tables",
Args=[--class,com.amazonaws.eks.tpcds.CreateIcebergTables,--conf,spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,
--conf,spark.sql.catalog.hadoop_catalog=org.apache.iceberg.spark.SparkCatalog,
--conf,spark.sql.catalog.hadoop_catalog.type=hadoop,
--conf,spark.sql.catalog.hadoop_catalog.warehouse=s3://<bucket>/<warehouse_path>/,
--conf,spark.sql.catalog.hadoop_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO,
s3://<bucket>/<jar_location>/spark-benchmark-assembly-3.5.6.jar,s3://blogpost-sparkoneks-us-east-1/blog/BLOG_TPCDS-TEST-3T-partitioned/,
/home/hadoop/tpcds-kit/tools,parquet,3000,true,<database_name>,true,true],ActionOnFailure=CONTINUE --region <AWS region>

Note: The Hadoop catalog warehouse location and database name from the preceding step. We use the same Iceberg tables to run benchmarks with Amazon EMR 7.12 and open source Spark.

This benchmark application is built from the branch tpcds-v2.13_iceberg. If you’re building a new benchmark application, switch to the correct branch after downloading the source code from the GitHub repository.

Create and configure a YARN cluster on Amazon EC2

To compare Iceberg performance between Amazon EMR on Amazon EC2 and open source Spark on Amazon EC2, follow the instructions in the emr-spark-benchmark GitHub repository to create an open source Spark cluster on Amazon EC2 using Flintrock with 8 worker nodes.

Based on the cluster selection for this test, the following configurations are used:

Make sure to replace the placeholder <private ip of primary node>, in the yarn-site.xml file, with the primary node’s IP address of your Flintrock cluster.

Run the TPC-DS benchmark with Apache Spark 3.5.6 and Apache Iceberg 1.10.0

Complete the following steps to run the TPC-DS benchmark:

  1. Log in to the open source cluster primary node using flintrock login $CLUSTER_NAME.
  2. Submit your Spark job:
    1. Choose the correct Iceberg catalog warehouse location and database that has the created Iceberg tables.
    2. The results are created in s3://<YOUR_S3_BUCKET>/benchmark_run.
    3. You can track progress in /media/ephemeral0/spark_run.log.
spark-submit \
--master yarn \
--deploy-mode client \
--class com.amazonaws.eks.tpcds.BenchmarkSQL \
--conf spark.driver.cores=4 \
--conf spark.driver.memory=10g \
--conf spark.executor.cores=16 \
--conf spark.executor.memory=100g \
--conf spark.executor.instances=8 \
--conf spark.network.timeout=2000 \
--conf spark.executor.heartbeatInterval=300s \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.shuffle.service.enabled=false \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider \
--conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.jars.packages=org.apache.hadoop:hadoop-aws:3.3.4,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions   \
--conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog    \
--conf spark.sql.catalog.local.type=hadoop  \
--conf spark.sql.catalog.local.warehouse=s3a://<YOUR_S3_BUCKET>/<warehouse_path>/ \
--conf spark.sql.defaultCatalog=local   \
--conf spark.sql.catalog.local.io-impl=org.apache.iceberg.aws.s3.S3FileIO   \
spark-benchmark-assembly-3.5.6.jar   \
s3://<YOUR_S3_BUCKET>/benchmark_run 3000 1 false  \
q1-v2.13,q10-v2.13,q11-v2.13,q12-v2.13,q13-v2.13,q14a-v2.13,q14b-v2.13,q15-v2.13,q16-v2.13,\
q17-v2.13,q18-v2.13,q19-v2.13,q2-v2.13,q20-v2.13,q21-v2.13,q22-v2.13,q23a-v2.13,q23b-v2.13,\
q24a-v2.13,q24b-v2.13,q25-v2.13,q26-v2.13,q27-v2.13,q28-v2.13,q29-v2.13,q3-v2.13,q30-v2.13,\
q31-v2.13,q32-v2.13,q33-v2.13,q34-v2.13,q35-v2.13,q36-v2.13,q37-v2.13,q38-v2.13,q39a-v2.13,\
q39b-v2.13,q4-v2.13,q40-v2.13,q41-v2.13,q42-v2.13,q43-v2.13,q44-v2.13,q45-v2.13,q46-v2.13,\
q47-v2.13,q48-v2.13,q49-v2.13,q5-v2.13,q50-v2.13,q51-v2.13,q52-v2.13,q53-v2.13,q54-v2.13,\
q55-v2.13,q56-v2.13,q57-v2.13,q58-v2.13,q59-v2.13,q6-v2.13,q60-v2.13,q61-v2.13,q62-v2.13,\
q63-v2.13,q64-v2.13,q65-v2.13,q66-v2.13,q67-v2.13,q68-v2.13,q69-v2.13,q7-v2.13,q70-v2.13,\
q71-v2.13,q72-v2.13,q73-v2.13,q74-v2.13,q75-v2.13,q76-v2.13,q77-v2.13,q78-v2.13,q79-v2.13,\
q8-v2.13,q80-v2.13,q81-v2.13,q82-v2.13,q83-v2.13,q84-v2.13,q85-v2.13,q86-v2.13,q87-v2.13,\
q88-v2.13,q89-v2.13,q9-v2.13,q90-v2.13,q91-v2.13,q92-v2.13,q93-v2.13,q94-v2.13,q95-v2.13,\
q96-v2.13,q97-v2.13,q98-v2.13,q99-v2.13,ss_max-v2.13    \
true <database> > /media/ephemeral0/spark_run.log 2>&1 &!

Summarize the results

After the Spark job finishes, retrieve the test result file from the output S3 bucket at s3://<YOUR_S3_BUCKET>/benchmark_run/timestamp=xxxx/summary.csv/xxx.csv. This can be done either through the Amazon S3 console by navigating to the specified bucket location or by using the Amazon Command Line Interface (AWS CLI). The Spark benchmark application organizes the data by creating a timestamp folder and placing a summary file within a folder labeled summary.csv. The output CSV files contain 4 columns without headers:

  • Query name
  • Median time
  • Minimum time
  • Maximum time

With the data from 3 separate test runs with 1 iteration each time, we can calculate the average and geometric mean of the benchmark runtimes.

Run the TPC-DS benchmark with Amazon EMR runtime for Apache Spark

Most of the instructions are similar to Steps to run Spark Benchmarking with a few Iceberg-specific details.

Prerequisites

Complete the following prerequisite steps:

  1. Run aws configure to configure the AWS CLI shell to point to the benchmarking AWS account. Refer to Configure the AWS CLI for instructions.
  2. Upload the benchmark application JAR file to Amazon S3.

Deploy Amazon EMR cluster and run the benchmark job

Complete the following steps to run the benchmark job:

  1. Use the AWS CLI command as shown in Deploy EMR on EC2 Cluster and run benchmark job to deploy an Amazon EMR on EC2 cluster. Make sure to enable Iceberg. See Create an Iceberg cluster for more details. Choose the correct Amazon EMR version, root volume size, and same resource configuration as the open source Flintrock setup. Refer to create-cluster for a detailed description of the AWS CLI options.
  2. Store the cluster ID from the response. We need this for the next step.
  3. Submit the benchmark job in Amazon EMR using add-steps from the AWS CLI:
    1. Replace <cluster ID> with the cluster ID from Step 2.
    2. The benchmark application is at s3://<your-bucket>/spark-benchmark-assembly-3.5.6.jar.
    3. Choose the correct Iceberg catalog warehouse location and database that has the created Iceberg tables. This should be the same as the one used for the open source TPC-DS benchmark run.
    4. The results will be in s3://<your-bucket>/benchmark_run.
aws emr add-steps   --cluster-id <cluster-id>
--steps Type=Spark,Name="SPARK Iceberg EMR TPCDS Benchmark Job",
Args=[--class,com.amazonaws.eks.tpcds.BenchmarkSQL,
--conf,spark.driver.cores=4,
--conf,spark.driver.memory=10g,
--conf,spark.executor.cores=16,
--conf,spark.executor.memory=100g,
--conf,spark.executor.instances=8,
--conf,spark.network.timeout=2000,
--conf,spark.executor.heartbeatInterval=300s,
--conf,spark.dynamicAllocation.enabled=false,
--conf,spark.shuffle.service.enabled=false,
--conf,spark.sql.iceberg.data-prefetch.enabled=true,
--conf,spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,
--conf,spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog,
--conf,spark.sql.catalog.local.type=hadoop,
--conf,spark.sql.catalog.local.warehouse=s3://<your-bucket>/<warehouse-path>,
--conf,spark.sql.defaultCatalog=local,
--conf,spark.sql.catalog.local.io-impl=org.apache.iceberg.aws.s3.S3FileIO,
s3://<your-bucket>/spark-benchmark-assembly-3.5.6.jar,
s3://<your-bucket>/benchmark_run,3000,1,false,
'q1-v2.13\,q10-v2.13\,q11-v2.13\,q12-v2.13\,q13-v2.13\,q14a-v2.13\,q14b-v2.13\,q15-v2.13\,q16-v2.13\,q17-v2.13\,q18-v2.13\,q19-v2.13\,q2-v2.13\,q20-v2.13\,q21-v2.13\,q22-v2.13\,q23a-v2.13\,q23b-v2.13\,q24a-v2.13\,q24b-v2.13\,q25-v2.13\,q26-v2.13\,q27-v2.13\,q28-v2.13\,q29-v2.13\,q3-v2.13\,q30-v2.13\,q31-v2.13\,q32-v2.13\,q33-v2.13\,q34-v2.13\,q35-v2.13\,q36-v2.13\,q37-v2.13\,q38-v2.13\,q39a-v2.13\,q39b-v2.13\,q4-v2.13\,q40-v2.13\,q41-v2.13\,q42-v2.13\,q43-v2.13\,q44-v2.13\,q45-v2.13\,q46-v2.13\,q47-v2.13\,q48-v2.13\,q49-v2.13\,q5-v2.13\,q50-v2.13\,q51-v2.13\,q52-v2.13\,q53-v2.13\,q54-v2.13\,q55-v2.13\,q56-v2.13\,q57-v2.13\,q58-v2.13\,q59-v2.13\,q6-v2.13\,q60-v2.13\,q61-v2.13\,q62-v2.13\,q63-v2.13\,q64-v2.13\,q65-v2.13\,q66-v2.13\,q67-v2.13\,q68-v2.13\,q69-v2.13\,q7-v2.13\,q70-v2.13\,q71-v2.13\,q72-v2.13\,q73-v2.13\,q74-v2.13\,q75-v2.13\,q76-v2.13\,q77-v2.13\,q78-v2.13\,q79-v2.13\,q8-v2.13\,q80-v2.13\,q81-v2.13\,q82-v2.13\,q83-v2.13\,q84-v2.13\,q85-v2.13\,q86-v2.13\,q87-v2.13\,q88-v2.13\,q89-v2.13\,q9-v2.13\,q90-v2.13\,q91-v2.13\,q92-v2.13\,q93-v2.13\,q94-v2.13\,q95-v2.13\,q96-v2.13\,q97-v2.13\,q98-v2.13\,q99-v2.13\,ss_max-v2.13',
true,<database>],ActionOnFailure=CONTINUE --region <aws-region>

Summarize the results

After the step is complete, you can see the summarized benchmark result at s3://<YOUR_S3_BUCKET>/benchmark_run/timestamp=xxxx/summary.csv/xxx.csv in the same way as the previous run and compute the average and geometric mean of the query runtimes.

Clean up

To help prevent future charges, delete the resources you created by following the instructions provided in the Cleanup section of the GitHub repository.

Summary

Amazon EMR optimizes the runtime for Spark when used with Iceberg tables, achieving 4.5x faster performance than open source Apache Spark 3.5.6 and Apache Iceberg 1.10.0 with Amazon EMR 7.12 on TPC-DS 3 TB, v2.13. This represents a significant advancement from Amazon EMR 7.5, which delivered 3.6x faster performance and closes the gap to parquet performance on Amazon EMR so customers can use the benefits of Iceberg without a performance penalty.

We encourage you to keep up to date with the latest Amazon EMR releases to fully benefit from ongoing performance improvements.

To stay informed, subscribe to the RSS feed for the AWS Big Data Blog, where you can find updates on the Amazon EMR runtime for Spark and Iceberg, as well as tips on configuration best practices and tuning recommendations.


About the authors

Atul Felix Payapilly is a software development engineer for Amazon EMR at Amazon Web Services.

Akshaya KP is a software development engineer for Amazon EMR at Amazon Web Services.

Hari Kishore Chaparala is a software development engineer for Amazon EMR at Amazon Web Services.

Giovanni Matteo is the Senior Manager for the Amazon EMR Spark and Iceberg group.

Apache Spark encryption performance improvement with Amazon EMR 7.9

Post Syndicated from Sonu Kumar Singh original https://aws.amazon.com/blogs/big-data/apache-spark-encryption-performance-improvement-with-amazon-emr-7-9/

The Amazon EMR runtime for Apache Spark is a performance-optimized runtime for Apache Spark that is 100% API compatible with open source Apache Spark. With Amazon EMR release 7.9.0, the EMR runtime for Apache Spark introduces significant performance improvements for encrypted workloads, supporting Spark version 3.5.5.

For compliance and security requirements, many customers need to enable Apache Spark’s local storage encryption (spark.io.encryption.enabled = true) in addition to Amazon Simple Storage Service (Amazon S3) encryption (such as server-side encryption (SSE) or AWS Key Management Service (AWS KMS)). This feature encrypts shuffle files, cached data, and other intermediate data written to local disk during Spark operations, protecting sensitive data at rest on Amazon EMR cluster instances.

Industries subject to regulations such as the Health Insurance Portability and Accountability Act (HIPAA) for healthcare, Payment Card Industry Data Security Standard (PCI-DSS) for financial services, General Data Protection Regulation (GDPR) for personal data, and Federal Risk and Authorization Management Program (FedRAMP) for government often require encryption of all data at rest, including temporary files on local storage. While Amazon S3 encryption protects data in object storage, Spark’s I/O encryption secures the intermediate shuffle and spill data that Spark writes to local disk during distributed processing—data that never reaches Amazon S3 but might contain sensitive information extracted from source datasets. Generally, encrypted operations require additional computational overhead that can impact overall job performance.

With the built-in encryption optimizations of Amazon EMR 7.9.0, customers might see significant performance improvements in their Apache Spark applications without requiring any application changes. In our performance benchmark tests, derived from TPC-DS performance tests at 3 TB scale, we observed up to 20% faster performance with the EMR 7.9 optimized Spark runtime compared to Spark without these optimizations. Individual results may vary depending on specific workloads and configurations.

In this post, we analyze the results from our benchmark tests comparing the Amazon EMR 7.9 optimized Spark runtime against Spark 3.5.5 without encryption optimizations. We walk through a detailed cost analysis and provide step-by-step instructions to reproduce the benchmark.

Results observed

To evaluate the performance improvements, we used an open source Spark performance test utility derived from the TPC-DS performance test toolkit. We ran the tests on two nine-node (eight core nodes and one primary node) r5d.4xlarge Amazon EMR 7.9.0 clusters, comparing two configurations:

  • Baseline: EMR 7.9.0 cluster with a bootstrap action installing Spark 3.5.5 without encryption optimizations
  • Optimized: EMR 7.9.0 cluster using the EMR Spark 3.5.5 runtime with encryption optimizations

Both tests used data stored in Amazon Simple Storage Service (Amazon S3). All data processing was configured identically except for the Spark runtime version.

To maintain benchmarking consistency and ensure a consistent, equivalent comparison, we disabled Dynamic Resource Allocation (DRA) in both test configurations. This approach eliminates variability from dynamic scaling and so we can measure pure computational performance improvements.

The following table shows the total job runtime for all queries (in seconds) in the 3 TB query dataset between the baseline and Amazon EMR 7.9 optimized configurations:

Configuration Total runtime (seconds) Geometric mean (seconds) Performance improvement
Baseline (Spark 3.5.5 without optimization) 1,485 10.24
EMR 7.9 (with encryption optimization) 1,176 8.15 20% faster

We observed that our TPC-DS tests with the Amazon EMR 7.9 optimized Spark runtime completed about 20% faster based on total runtime and 20% faster based on geometric mean compared to the baseline configuration.

The encryption optimizations in Amazon EMR 7.9 deliver performance benefits through:

  • Improved shuffle and decryption operations reducing overhead during data exchange without compromising security
  • Better memory management for intermediate results

Cost analysis

The performance improvements of the Amazon EMR 7.9 optimized Spark runtime directly translate to lower costs. We realized an approximately 20% cost savings running the benchmark application with encryption optimizations compared to the baseline configuration, because of reduced hours of EMR, Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Block Store (Amazon EBS) using General Purpose SSD (gp2).

The following table summarizes the cost comparison in the us-east-1 AWS Region:

Configuration Runtime (hours) Estimated cost Total EC2 instances Total vCPU Total memory (GiB) Root device (EBS)
Baseline: Spark 3.5.5 without optimization, 1 primary and 8 core nodes 0.41 $5.28 9 144 1152 64 GiB gp2
Amazon EMR 7.9 with optimization, 1 primary and 8 core nodes 0.33 $4.25 9 144 1152 64 GiB gp2

Cost breakdown

Formulas used:

  • Amazon EMR cost – Number of instances × EMR hourly rate × Runtime hours
  • Amazon EC2 cost – Number of instances × EC2 hourly rate × Runtime hour)
  • Amazon EBS cost(EBS cost per GB per month ÷ hours in a month) × EBS volume size × number of instances × runtime hours

Note: EBS is priced monthly ($0.1 per GB per month), so we divide by 730 hours to convert to an hourly rate. EMR and EC2 are already priced hourly, so no conversion is needed.

Baseline configuration (0.41 hours):

  • Amazon EMR cost – 9 × $0.27 × 0.41 = $1.00
  • Amazon EC2 cost – 9 × $1.152 × 0.41 = $4.25
  • Amazon EBS cost – ($0.1/730 × 64 × 9 × 0.41) = $0.032
  • Total cost – $5.28

EMR 7.9 optimized configuration (0.33 hours):

  • Amazon EMR cost – (9 × $0.27 × 0.33) = $0.80
  • Amazon EC2 cost – (9 × $1.152 × 0.33) = $3.42
  • Amazon EBS cost – ($0.1/730 × 64 × 9 × 0.33) = $0.025
  • Total cost: $4.25

Total cost savings: 20% per benchmark run, which scales linearly with your production workload frequency.

Set up EMR benchmarking

For detailed instructions and scripts, see the companion GitHub repository.

Prerequisites

To set up Amazon EMR benchmarking, start by completing the following prerequisite steps:

  1. Configure your AWS Command Line Interface (AWS CLI) by running aws configure to point to your benchmarking account,
  2. Create an S3 bucket for test data and results.
  3. Copy the TPC-DS 3TB source data from a publicly available dataset to your S3 bucket using the following command:
    aws s3 cp s3://blogpost-sparkoneks-us-east-1/blog/BLOG_TPCDS-TEST-3T-partitioned s3://<YOUR-BUCKET-NAME>/BLOG_TPCDS-TEST-3T-partitioned --recursive

    Replace <YOUR-BUCKET-NAME> with the name of the S3 bucket you created in step 2.

  4. Build or download the benchmark application JAR file (spark-benchmark-assembly-3.3.0.jar)
  5. Ensure you have appropriate AWS Identity Access Management (IAM) roles for EMR cluster creation and Amazon S3 access

Deploy the baseline EMR cluster (without optimization)

Step 1: Launch EMR 7.9.0 cluster with bootstrap action

The baseline configuration uses a bootstrap action to install Spark 3.5.5 without encryption optimizations. We have made the bootstrap script publicly available in an S3 bucket for your convenience.

Create the default Amazon EMR roles:

aws emr create-default-roles

Now create the cluster:

aws emr create-cluster \
  --name "EMR-7.9-Baseline-Spark-3.5.5" \
  --release-label emr-7.9.0 \
  --applications Name=Spark \
  --ec2-attributes SubnetId=<YOUR-SUBNET-ID>,InstanceProfile=EMR_EC2_DefaultRole  \
  --service-role EMR_DefaultRole
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=r5d.4xlarge \
    InstanceGroupType=CORE,InstanceCount=8,InstanceType=r5d.4xlarge \
  --bootstrap-actions \
    Path=s3://spark-ba/install-spark-3-5-5-no-encryption.sh,Name="install spark 3.5.5 without encryption optimization" \
  --use-default-roles \
  --log-uri s3://<YOUR-BUCKET-NAME>/logs/baseline/

Note: The bootstrap script is available in a public S3 bucket at s3://spark-ba/install-spark-3-5-5-no-encryption.sh. This script installs Apache Spark 3.5.5 without the encryption optimizations present in the Amazon EMR runtime.

Step 2: Submit the benchmark job to the baseline cluster

Next submit the Spark job using the following commands:

aws emr add-steps \
  --cluster-id <YOUR-BASELINE-CLUSTER-ID> \  
  --steps 'Type=Spark,Name="EMR-7.9-Baseline-Spark-3.5.5 Step",ActionOnFailure=CONTINUE,Args=["--deploy-mode","client","--conf","spark.io.encryption.enabled=false","--class","com.amazonaws.eks.tpcds.BenchmarkSQL","s3://<YOUR-BUCKET-NAME>/jar/spark-benchmark-assembly-3.3.0.jar","s3:// <YOUR-BUCKET-NAME>/blog/BLOG_TPCDS-TEST-3T-partitioned","s3:// <YOUR-BUCKET-NAME>/blog/BASELINE_TPCDS-TEST-3T-RESULT","/opt/tpcds-kit/tools","parquet","3000","3","false","q1-v2.4,q10-v2.4,q11-v2.4,q12-v2.4,q13-v2.4,q14a-v2.4,q14b-v2.4,q15-v2.4,q16-v2.4,q17-v2.4,q18-v2.4,q19-v2.4,q2-v2.4,q20-v2.4,q21-v2.4,q22-v2.4,q23a-v2.4,q23b-v2.4,q24a-v2.4,q24b-v2.4,q25-v2.4,q26-v2.4,q27-v2.4,q28-v2.4,q29-v2.4,q3-v2.4,q30-v2.4,q31-v2.4,q32-v2.4,q33-v2.4,q34-v2.4,q35-v2.4,q36-v2.4,q37-v2.4,q38-v2.4,q39a-v2.4,q39b-v2.4,q4-v2.4,q40-v2.4,q41-v2.4,q42-v2.4,q43-v2.4,q44-v2.4,q45-v2.4,q46-v2.4,q47-v2.4,q48-v2.4,q49-v2.4,q5-v2.4,q50-v2.4,q51-v2.4,q52-v2.4,q53-v2.4,q54-v2.4,q55-v2.4,q56-v2.4,q57-v2.4,q58-v2.4,q59-v2.4,q6-v2.4,q60-v2.4,q61-v2.4,q62-v2.4,q63-v2.4,q64-v2.4,q65-v2.4,q66-v2.4,q67-v2.4,q68-v2.4,q69-v2.4,q7-v2.4,q70-v2.4,q71-v2.4,q72-v2.4,q73-v2.4,q74-v2.4,q75-v2.4,q76-v2.4,q77-v2.4,q78-v2.4,q79-v2.4,q8-v2.4,q80-v2.4,q81-v2.4,q82-v2.4,q83-v2.4,q84-v2.4,q85-v2.4,q86-v2.4,q87-v2.4,q88-v2.4,q89-v2.4,q9-v2.4,q90-v2.4,q91-v2.4,q92-v2.4,q93-v2.4,q94-v2.4,q95-v2.4,q96-v2.4,q97-v2.4,q98-v2.4,q99-v2.4,ss_max-v2.4","true"]'

Deploy the optimized EMR cluster (with encryption optimization)

Step 1: Launch EMR 7.9.0 cluster with Spark runtime

The optimized configuration uses the EMR 7.9.0 Spark runtime without any bootstrap actions:

aws emr create-cluster \
  --name "EMR-7.9-Optimized-Native-Spark" \
  --release-label emr-7.9.0 \
  --applications Name=Spark \
  --ec2-attributes SubnetId=<YOUR-SUBNET-ID>,InstanceProfile=EMR_EC2_DefaultRole \
  --service-role EMR_DefaultRole
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=r5d.4xlarge \
    InstanceGroupType=CORE,InstanceCount=8,InstanceType=r5d.4xlarge \
  --use-default-roles \
  --log-uri s3://<YOUR-BUCKET-NAME>/logs/optimized/

Example:

aws emr create-cluster \
--name "EMR-7.9-Optimized-Native-Spark" \
--release-label emr-7.9.0 \
--applications Name=Spark \
--ec2-attributes SubnetId=subnet-08a5f71f92bc8a801 \
--instance-groups \
InstanceGroupType=MASTER,InstanceCount=1,InstanceType=r5d.4xlarge \
InstanceGroupType=CORE,InstanceCount=8,InstanceType=r5d.4xlarge \
--bootstrap-actions \
Path=s3://spark-ba/install-spark-3-5-5-no-encryption.sh,Name="install spark 3.5.5 without encryption optimization" \
--use-default-roles \
--log-uri s3://aws-logs-123456789012-us-west-2/elasticmapreduce/

Step 2: Submit the benchmark job to optimized cluster

ext submit the Spark job using the following commands:

aws emr add-steps \
  --cluster-id <YOUR-OPTIMIZED-CLUSTER-ID> \ 
  --steps 'Type=Spark,Name="EMR-7.9-Optimized-Native-Spark Step",ActionOnFailure=CONTINUE,Args=["--deploy-mode","client","--conf","spark.io.encryption.enabled=true","--class","com.amazonaws.eks.tpcds.BenchmarkSQL","s3://<YOUR-BUCKET-NAME>/jar/spark-benchmark-assembly-3.3.0.jar","s3://<YOUR-BUCKET-NAME>/blog/BLOG_TPCDS-TEST-3T-partitioned","s3://<YOUR-BUCKET-NAME>/blog/BASELINE_TPCDS-TEST-3T-RESULT","/opt/tpcds-kit/tools","parquet","3000","3","false","q1-v2.4,q10-v2.4,q11-v2.4,q12-v2.4,q13-v2.4,q14a-v2.4,q14b-v2.4,q15-v2.4,q16-v2.4,q17-v2.4,q18-v2.4,q19-v2.4,q2-v2.4,q20-v2.4,q21-v2.4,q22-v2.4,q23a-v2.4,q23b-v2.4,q24a-v2.4,q24b-v2.4,q25-v2.4,q26-v2.4,q27-v2.4,q28-v2.4,q29-v2.4,q3-v2.4,q30-v2.4,q31-v2.4,q32-v2.4,q33-v2.4,q34-v2.4,q35-v2.4,q36-v2.4,q37-v2.4,q38-v2.4,q39a-v2.4,q39b-v2.4,q4-v2.4,q40-v2.4,q41-v2.4,q42-v2.4,q43-v2.4,q44-v2.4,q45-v2.4,q46-v2.4,q47-v2.4,q48-v2.4,q49-v2.4,q5-v2.4,q50-v2.4,q51-v2.4,q52-v2.4,q53-v2.4,q54-v2.4,q55-v2.4,q56-v2.4,q57-v2.4,q58-v2.4,q59-v2.4,q6-v2.4,q60-v2.4,q61-v2.4,q62-v2.4,q63-v2.4,q64-v2.4,q65-v2.4,q66-v2.4,q67-v2.4,q68-v2.4,q69-v2.4,q7-v2.4,q70-v2.4,q71-v2.4,q72-v2.4,q73-v2.4,q74-v2.4,q75-v2.4,q76-v2.4,q77-v2.4,q78-v2.4,q79-v2.4,q8-v2.4,q80-v2.4,q81-v2.4,q82-v2.4,q83-v2.4,q84-v2.4,q85-v2.4,q86-v2.4,q87-v2.4,q88-v2.4,q89-v2.4,q9-v2.4,q90-v2.4,q91-v2.4,q92-v2.4,q93-v2.4,q94-v2.4,q95-v2.4,q96-v2.4,q97-v2.4,q98-v2.4,q99-v2.4,ss_max-v2.4","true"]'

Benchmark command parameters explained

The Amazon EMR Spark step uses the following parameters:

  • EMR step configuration:
    • Type=Spark: Specifies this is a Spark application step
    • Name=”EMR-7.9-Baseline-Spark-3.5.5″: Human-readable name for the step
    • ActionOnFailure=CONTINUE: Continue with other steps if this one fails
  • Spark submit arguments:
    • –deploy-mode client: Run the driver on the master node (not cluster mode)
    • –class com.amazonaws.eks.tpcds.BenchmarkSQL: Main class for the TPC-DS benchmark
  • Application parameters:
    • JAR file: s3://<YOUR-BUCKET-NAME>/jar/spark-benchmark-assembly-3.3.0.jar
    • Input data: s3://<YOUR-BUCKET-NAME>/blog/BLOG_TPCDS-TEST-3T-partitioned (3 TB TPC-DS dataset)
    • Output location: s3://<YOUR-BUCKET-NAME>/blog/BASELINE_TPCDS-TEST-3T-RESULT (S3 path for results)
    • TPC-DS tools path: /opt/tpcds-kit/tools(local path on EMR nodes)
    • Format: parquet (output format)
    • Scale factor: 3000 (3 TB dataset size)
    • Iterations: 3 (run each query 3 times for averaging)
    • Collect results: false (don’t collect results to driver)
    • Query list: "q1-v2.4,q10-v2.4,...,ss_max-v2.4" (all 104 TPC-DS queries)
    • Final parameter: true (enable detailed logging and metrics)
  • Query coverage:
    • All 104 standard TPC-DS benchmark queries (q1-v2.4 through q99-v2.4)
    • Plus the ss_max-v2.4 query for additional testing
    • Each query runs 3 times to calculate average performance

Summarize the results

  1. Download the test result files from both output S3 locations:
    # Baseline results
    aws s3 cp s3://<YOUR-BUCKET-NAME>/blog/BASELINE_TPCDS-TEST-3T-RESULT/timestamp=xxxx/summary.csv/xxx.csv ./baseline-results.csv
       
    # Optimized results
    aws s3 cp s3://<YOUR-BUCKET-NAME>/blog/OPTIMIZED_TPCDS-TEST-3T-RESULT/timestamp=xxxx/summary.csv/xxx.csv ./optimized-results.csv

  2. The CSV files contain four columns (without headers):
    • Query name
    • Median time (seconds)
    • Minimum time (seconds)
    • Maximum time (seconds)
  3. Calculate performance metrics for comparison:
    • Average time per query: AVERAGE(median, min, max) for each query
    • Total runtime: Sum of all median times
    • Geometric mean: GEOMEAN(average times) across all queries
    • Speedup: Calculate the ratio between baseline and optimized for each query
  4. Create comparison analysis:Speedup = (Baseline Time - Optimized Time) / Baseline Time * 100%

Testing configuration details

The following table summarizes the test environment used for this post:

Parameter Value
EMR release emr-7.9.0 (both configurations)
Baseline Spark version 3.5.5 (installed through bootstrap action)
Baseline bootstrap script s3://spark-ba/install-spark-3-5-5-no-encryption.sh (public)
Optimized spark version Amazon EMR Spark runtime
Cluster size 9 nodes (1 primary and 8 core)
Instance type r5d.4xlarge
vCPUs per node 16
Memory per node 128 GB
Instance storage 600 GB SSD
EBS volume 64 GB gp2 (2 volumes per instance)
Total vCPUs 144 (9 × 16)
Total memory 1152 GB (9 × 128)
Dataset TPC-DS 3TB (Parquet format)
Queries 104 queries (TPC-DS v2.4)
Iterations 3 runs per query
DRA Disabled for consistent benchmarking

Clean up

To avoid incurring future charges, delete the resources you created:

  1. Terminate both EMR clusters:
    aws emr terminate-clusters --cluster-ids <YOUR-BASELINE-CLUSTER-ID> <YOUR-OPTIMIZED-CLUSTER-ID>

  2. Delete S3 test results if no longer needed:
    aws s3 rm s3://<YOUR-BUCKET-NAME>/blog/BASELINE_TPCDS-TEST-3T-RESULT/ --recursive
    aws s3 rm s3://<YOUR-BUCKET-NAME>/blog/OPTIMIZED_TPCDS-TEST-3T-RESULT/ --recursive
    aws s3 rm s3://<YOUR-BUCKET-NAME>/logs/ --recursive

  3. Remove IAM roles if created specifically for testing

Key findings

  • Up to 20% performance improvement using the Amazon EMR 7.9’s Spark runtime with no code changes required
  • 20% cost savings because of reduced runtime
  • Significant gains for shuffle-heavy, join-intensive workloads
  • 100% API compatibility with open source Apache Spark
  • Simple migration from custom Spark builds to EMR runtime
  • Easy benchmarking using publicly available bootstrap scripts

Conclusion

You can run your Apache Spark workloads up to 20% faster and at lower cost without making any changes to your applications by using the Amazon EMR 7.9.0 optimized Spark runtime. This improvement is achieved through numerous optimizations in the EMR Spark runtime, including enhanced encryption handling, improved data serialization, and optimized shuffle operations.

To learn more about Amazon EMR 7.9 and best practices, see the EMR documentation. For configuration guidance and tuning advice, subscribe to the AWS Big Data Blog.

Related resources:

If you’re running Spark workloads on Amazon EMR today, we encourage you to test the EMR 7.9 Spark runtime with your production workloads and measure the improvements specific to your use case.


About the authors

Sonu Kumar Singh

Sonu Kumar Singh

Sonu is a Senior Solutions Architect with more than 13 years of experience, with a specialization in Analytics and Healthcare domain. He has been instrumental in catalyzing transformative shifts in organizations by enabling data-driven decision-making thereby fueling innovation and growth. He enjoys it when something he designed or created brings a positive impact.

Roshin Babu

Roshin Babu

Roshin is a Sr. Specialist Solutions architect at AWS, where he collaborates with the sales team to support public sector clients. His role focuses on developing innovative solutions that solve complex business challenges while driving increased adoption of AWS analytics services. When he’s not working, Roshin is passionate about exploring new destinations, discovering great food, and enjoying soccer both as a player and fan.Polaris Jhandi

Polaris Jhandi

Polaris Jhandi

Polaris is a Cloud Application Architect with AWS Professional Services. He has a background in AI/ML and big data. He is currently working with customers to migrate their legacy mainframe applications to the AWS Cloud.Zheng Yuan

Zheng Yuan

Zheng Yuan

Zheng is a Software Engineer on the Amazon EMR Spark team, where he focuses on improving the performance of the Spark execution engine across various use cases.