Tag Archives: news

Welcoming Ryan Smith as the New Managing Editor at STH

2025-12-04 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/welcoming-ryan-smith-as-the-new-managing-editor-at-sth/

Take a moment to welcome Ryan Smith as STH’s new Managing Editor marking a major milestone in the site’s history

The post Welcoming Ryan Smith as the New Managing Editor at STH appeared first on ServeTheHome.

AI Data Center Markets are so Big that Micron is Exiting its Crucial Consumer Business

2025-12-03 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/ai-data-center-markets-are-so-big-that-micron-is-exiting-its-crucial-consumer-business/

Micron just announced that it is exiting its Crucial consumer business to instead focus on enterprise and commercial segments

The post AI Data Center Markets are so Big that Micron is Exiting its Crucial Consumer Business appeared first on ServeTheHome.

Amazon Bedrock adds reinforcement ﬁne-tuning simplifying how developers build smarter, more accurate AI models

2025-12-03 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/improve-model-accuracy-with-reinforcement-fine-tuning-in-amazon-bedrock/

Organizations face a challenging trade-off when adapting AI models to their specific business needs: settle for generic models that produce average results, or tackle the complexity and expense of advanced model customization. Traditional approaches force a choice between poor performance with smaller models or the high costs of deploying larger model variants and managing complex infrastructure. Reinforcement fine-tuning is an advanced technique that trains models using feedback instead of massive labeled datasets, but implementing it typically requires specialized ML expertise, complicated infrastructure, and significant investment—with no guarantee of achieving the accuracy needed for specific use cases.

Today, we’re announcing reinforcement fine-tuning in Amazon Bedrock, a new model customization capability that creates smarter, more cost-effective models that learn from feedback and deliver higher-quality outputs for specific business needs. Reinforcement fine-tuning uses a feedback-driven approach where models improve iteratively based on reward signals, delivering 66% accuracy gains on average over base models.

Amazon Bedrock automates the reinforcement fine-tuning workflow, making this advanced model customization technique accessible to everyday developers without requiring deep machine learning (ML) expertise or large labeled datasets.

How reinforcement fine-tuning works
Reinforcement fine-tuning is built on top of reinforcement learning principles to address a common challenge: getting models to consistently produce outputs that align with business requirements and user preferences.

While traditional fine-tuning requires large, labeled datasets and expensive human annotation, reinforcement fine-tuning takes a different approach. Instead of learning from fixed examples, it uses reward functions to evaluate and judge which responses are considered good for particular business use cases. This teaches models to understand what makes a quality response without requiring massive amounts of pre-labeled training data, making advanced model customization in Amazon Bedrock more accessible and cost-effective.

Here are the benefits of using reinforcement fine-tuning in Amazon Bedrock:

Ease of use – Amazon Bedrock automates much of the complexity, making reinforcement fine-tuning more accessible to developers building AI applications. Models can be trained using existing API logs in Amazon Bedrock or by uploading datasets as training data, eliminating the need for labeled datasets or infrastructure setup.
Better model performance – Reinforcement fine-tuning improves model accuracy by 66% on average over base models, enabling optimization for price and performance by training smaller, faster, and more efficient model variants. This works with Amazon Nova 2 Lite model, improving quality and price performance for specific business needs, with support for additional models coming soon.
Security – Data remains within the secure AWS environment throughout the entire customization process, mitigating security and compliance concerns.

The capability supports two complementary approaches to provide flexibility for optimizing models:

Reinforcement Learning with Verifiable Rewards (RLVR) uses rule-based graders for objective tasks like code generation or math reasoning.
Reinforcement Learning from AI Feedback (RLAIF) employs AI-based judges for subjective tasks like instruction following or content moderation.

Getting started with reinforcement fine-tuning in Amazon Bedrock
Let’s walk through creating a reinforcement fine-tuning job.

First, I access the Amazon Bedrock console. Then, I navigate to the Custom models page. I choose Create and then choose Reinforcement fine-tuning job.

I start by entering the name of this customization job and then select my base model. At launch, reinforcement fine-tuning supports Amazon Nova 2 Lite, with support for additional models coming soon.

Next, I need to provide training data. I can use my stored invocation logs directly, eliminating the need to upload separate datasets. I can also upload new JSONL files or select existing datasets from Amazon Simple Storage Service (Amazon S3). Reinforcement fine-tuning automatically validates my training dataset and supports the OpenAI Chat Completions data format. If I provide invocation logs in the Amazon Bedrock invoke or converse format, Amazon Bedrock automatically converts them to the Chat Completions format.

The reward function setup is where I define what constitutes a good response. I have two options here. For objective tasks, I can select Custom code and write custom Python code that gets executed through AWS Lambda functions. For more subjective evaluations, I can select Model as judge to use foundation models (FMs) as judges by providing evaluation instructions.

Here, I select Custom code, and I create a new Lambda function or use an existing one as a reward function. I can start with one of the provided templates and customize it for my specific needs.

I can optionally modify default hyperparameters like learning rate, batch size, and epochs.

For enhanced security, I can configure virtual private cloud (VPC) settings and AWS Key Management Service (AWS KMS) encryption to meet my organization’s compliance requirements. Then, I choose Create to start the model customization job.

During the training process, I can monitor real-time metrics to understand how the model is learning. The training metrics dashboard shows key performance indicators including reward scores, loss curves, and accuracy improvements over time. These metrics help me understand whether the model is converging properly and if the reward function is effectively guiding the learning process.

When the reinforcement fine-tuning job is completed, I can see the final job status on the Model details page.

Once the job is completed, I can deploy the model with a single click. I select Set up inference, then choose Deploy for on-demand.

Here, I provide a few details for my model.

After deployment, I can quickly evaluate the model’s performance using the Amazon Bedrock playground. This helps me to test the fine-tuned model with sample prompts and compare its responses against the base model to validate the improvements. I select Test in playground.

The playground provides an intuitive interface for rapid testing and iteration, helping me confirm that the model meets my quality requirements before integrating it into production applications.

Interactive demo
Learn more by navigating an interactive demo of Amazon Bedrock reinforcement fine-tuning in action.

Additional things to know
Here are key points to note:

Templates — There are seven ready-to-use reward function templates covering common use cases for both objective and subjective tasks.
Pricing — To learn more about pricing, refer to the Amazon Bedrock pricing page.
Security — Training data and custom models remain private and aren’t used to improve FMs for public use. It supports VPC and AWS KMS encryption for enhanced security.

Get started with reinforcement fine-tuning by visiting the reinforcement fine-tuning documentation and by accessing the Amazon Bedrock console.

Happy building!
— Donnie

New serverless customization in Amazon SageMaker AI accelerates model fine-tuning

2025-12-03 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-serverless-customization-in-amazon-sagemaker-ai-accelerates-model-fine-tuning/

Today, I’m happy to announce new serverless customization in Amazon SageMaker AI for popular AI models, such as Amazon Nova, DeepSeek, GPT-OSS, Llama, and Qwen. The new customization capability provides an easy-to-use interface for the latest fine-tuning techniques like reinforcement learning, so you can accelerate the AI model customization process from months to days.

With a few clicks, you can seamlessly select a model and customization technique, and handle model evaluation and deployment—all entirely serverless so you can focus on model tuning rather than managing infrastructure. When you choose serverless customization, SageMaker AI automatically selects and provisions the appropriate compute resources based on the model and data size.

Getting started with serverless model customization
You can get started customizing models in Amazon SageMaker Studio. Choose Models in the left navigation pane and check out your favorite AI models to be customized.

Customize with UI
You can customize AI models in a only few clicks. In the Customize model dropdown list for a specific model such as Meta Llama 3.1 8B Instruct, choose Customize with UI.

You can select a customization technique used to adapt the base model to your use case. SageMaker AI supports Supervised Fine-Tuning and the latest model customization techniques including Direct Preference Optimization, Reinforcement Learning from Verifiable Rewards (RLVR), and Reinforcement Learning from AI Feedback (RLAIF). Each technique optimizes models in different ways, with selection influenced by factors such as dataset size and quality, available computational resources, task at hand, desired accuracy levels, and deployment constraints.

Upload or select a training dataset to match the format required by the customization technique selected. Use the values of batch size, learning rate, and number of epochs recommended by the technique selected. You can conﬁgure advanced settings such as hyperparameters, a newly introduced serverless MLﬂow application for experiment tracking, and network and storage volume encryption. Choose Submit to get started on your model training job.

After your training job is complete, you can see the models you created in the My Models tab. Choose View details in one of your models.

By choosing Continue customization, you can continue to customize your model by adjusting hyperparameters or training with different techniques. By choosing Evaluate, you can evaluate your customized model to see how it performs compared to the base model.

When you complete both jobs, you can choose either the SageMaker or Bedrock in the Deploy dropdown list to deploy your model.

You can choose Amazon Bedrock for serverless inference. Choose Bedrock and the model name to deploy the model into Amazon Bedrock. To find your deployed models, choose Imported models in the Bedrock console.

You can also deploy your model to a SageMaker AI inference endpoint if you want to control your deployment resources such as an instance type and instance count. After the SageMaker AI deployment is In service, you can use this endpoint to perform inference. In the Playground tab, you can test your customized model with a single prompt or chat mode.

With the serverless MLﬂow capability, you can automatically log all critical experiment metrics without modifying code and access rich visualizations for further analysis.

Customize with code
When you choose customizing with code, you can see a sample notebook to fine-tune or deploy AI models. If you want to edit the sample notebook, open it in JupyterLab. Alternatively, you can deploy the model immediately by choosing Deploy.

You can choose the Amazon Bedrock or SageMaker AI endpoint by selecting the deployment resources either from Amazon SageMaker Inference or Amazon SageMaker Hyperpod.

When you choose Deploy on the bottom right of the page, it will be redirected back to the model detail page. After the SageMaker AI deployment is in service, you can use this endpoint to perform inference.

Okay, you’ve seen how to streamline the model customization in the SageMaker AI. You can now choose your favorite way. To learn more, visit the Amazon SageMaker AI Developer Guide.

Now available
New serverless AI model customization in Amazon SageMaker AI is now available in US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland) Regions. You only pay for the tokens processed during training and inference. To learn more details, visit Amazon SageMaker AI pricing page.

Give it a try in Amazon SageMaker Studio and send feedback to AWS re:Post for SageMaker or through your usual AWS Support contacts.

— Channy

Introducing checkpointless and elastic training on Amazon SageMaker HyperPod

2025-12-03 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/introducing-checkpointless-and-elastic-training-on-amazon-sagemaker-hyperpod/

Today, we’re announcing two new AI model training features within Amazon SageMaker HyperPod: checkpointless training, an approach that mitigates the need for traditional checkpoint-based recovery by enabling peer-to-peer state recovery, and elastic training, enabling AI workloads to automatically scale based on resource availability.

Checkpointless training – Checkpointless training eliminates disruptive checkpoint-restart cycles, maintaining forward training momentum despite failures, reducing recovery time from hours to minutes. Accelerate your AI model development, reclaim days from development timelines, and confidently scale training workflows to thousands of AI accelerators.
Elastic training – Elastic training maximizes cluster utilization as training workloads automatically expand to use idle capacity as it becomes available, and contract to yield resources as higher-priority workloads like inference volumes peak. Save hours of engineering time per week spent reconfiguring training jobs based on compute availability.

Rather than spending time managing training infrastructure, these new training techniques mean that your team can concentrate entirely on enhancing model performance, ultimately getting your AI models to market faster. By eliminating the traditional checkpoint dependencies and fully utilizing available capacity, you can significantly reduce model training completion times.

Checkpointless training: How it works
Traditional checkpoint-based recovery has these sequential job stages: 1) job termination and restart, 2) process discovery and network setup, 3) checkpoint retrieval, 4) data loader initialization, and 5) training loop resumption. When failures occur, each stage can become a bottleneck and training recovery can take up to an hour on self-managed training clusters. The entire cluster must wait for every single stage to complete before training can resume. This can lead to the entire training cluster sitting idle during recovery operations, which increases costs and extends the time to market.

Checkpointless training removes this bottleneck entirely by maintaining continuous model state preservation across the training cluster. When failures occur, the system instantly recovers by using healthy peers, avoiding the need for a checkpoint-based recovery that requires restarting the entire job. As a result, checkpointless training enables fault recovery in minutes.

Checkpointless training is designed for incremental adoption and built on four core components that work together: 1) collective communications initialization optimizations, 2) memory-mapped data loading that enables caching, 3) in-process recovery, and 4) checkpointless peer-to-peer state replication. These components are orchestrated through the HyperPod training operator that is used to launch the job. Each component optimizes a specific step in the recovery process, and together they enable automatic detection and recovery of infrastructure faults in minutes with zero manual intervention, even with thousands of AI accelerators. You can progressively enable each of these features as your training scales.

The latest Amazon Nova models were trained using this technology on tens of thousands of accelerators. Additionally, based on internal studies on cluster sizes ranging between 16 GPUs to over 2,000 GPUs, checkpointless training showcased significant improvements in recovery times, reducing downtime by over 80% compared to traditional checkpoint-based recovery.

To learn more, visit HyperPod Checkpointless Training in the Amazon SageMaker AI Developer Guide.

Elastic training: How it works
On clusters that run different types of modern AI workloads, accelerator availability can change continuously throughout the day as short-duration training runs complete, inference spikes occur and subside, or resources free up from completed experiments. Despite this dynamic availability of AI accelerators, traditional training workloads remain locked into their initial compute allocation, unable to take advantage of idle accelerators without manual intervention. This rigidity leaves valuable GPU capacity unused and prevents organizations from maximizing their infrastructure investment.

Elastic training transforms how training workloads interact with cluster resources. Training jobs can automatically scale up to utilize available accelerators and gracefully contract when resources are needed elsewhere, all while maintaining training quality.

Workload elasticity is enabled through the HyperPod training operator that orchestrates scaling decisions through integration with the Kubernetes control plane and resource scheduler. It continuously monitors cluster state through three primary channels: pod lifecycle events, node availability changes, and resource scheduler priority signals. This comprehensive monitoring enables near-instantaneous detection of scaling opportunities, whether from newly available resources or requests from higher-priority workloads.

The scaling mechanism relies on adding and removing data parallel replicas. When additional compute resources become available, new data parallel replicas join the training job, accelerating throughput. Conversely, during scale-down events (for example, when a higher-priority workload requests resources), the system scales down by removing replicas rather than terminating the entire job, allowing training to continue at reduced capacity.

Across different scales, the system preserves the global batch size and adapts learning rates, preventing model convergence from being adversely impacted. This enables workloads to dynamically scale up or down to utilize available AI accelerators without any manual intervention.

You can start elastic training through the HyperPod recipes for publicly available foundation models (FMs) including Llama and GPT-OSS. Additionally, you can modify your PyTorch training scripts to add elastic event handlers, which enable the job to dynamically scale.

To learn more, visit the HyperPod Elastic Training in the Amazon SageMaker AI Developer Guide. To get started, find the HyperPod recipes available in the AWS GitHub repository.

Now available
Both features are available in all the Regions in which Amazon SageMaker HyperPod is available. You can use these training techniques without additional cost. To learn more, visit the SageMaker HyperPod product page and SageMaker AI pricing page.

Give it a try and send feedback to AWS re:Post for SageMaker or through your usual AWS Support contacts.

— Channy

NVIDIA NVLink Fusion Tapped for Future AWS Trainium4 Deployments

2025-12-03 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/nvidia-nvlink-fusion-tapped-for-future-aws-trainium4-deployments/

NVIDIA NVLink Fusion has been tapped for future AWS Trainium4 deployments as AWS selects NVIDIA for scale-up capabilities

The post NVIDIA NVLink Fusion Tapped for Future AWS Trainium4 Deployments appeared first on ServeTheHome.

Announcing replication support and Intelligent-Tiering for Amazon S3 Tables

2025-12-02 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/announcing-replication-support-and-intelligent-tiering-for-amazon-s3-tables/

Today, we’re announcing two new capabilities for Amazon S3 Tables: support for the new Intelligent-Tiering storage class that automatically optimizes costs based on access patterns, and replication support to automatically maintain consistent Apache Iceberg table replicas across AWS Regions and accounts without manual sync.

Organizations working with tabular data face two common challenges. First, they need to manually manage storage costs as their datasets grow and access patterns change over time. Second, when maintaining replicas of Iceberg tables across Regions or accounts, they must build and maintain complex architectures to track updates, manage object replication, and handle metadata transformations.

S3 Tables Intelligent-Tiering storage class
With the S3 Tables Intelligent-Tiering storage class, data is automatically tiered to the most cost-effective access tier based on access patterns. Data is stored in three low-latency tiers: Frequent Access, Infrequent Access (40% lower cost than Frequent Access), and Archive Instant Access (68% lower cost compared to Infrequent Access). After 30 days without access, data moves to Infrequent Access, and after 90 days, it moves to Archive Instant Access. This happens without changes to your applications or impact on performance.

Table maintenance activities, including compaction, snapshot expiration, and unreferenced file removal, operate without affecting the data’s access tiers. Compaction automatically processes only data in the Frequent Access tier, optimizing performance for actively queried data while reducing maintenance costs by skipping colder files in lower-cost tiers.

By default, all existing tables use the Standard storage class. When creating new tables, you can specify Intelligent-Tiering as the storage class, or you can rely on the default storage class configured at the table bucket level. You can set Intelligent-Tiering as the default storage class for your table bucket to automatically store tables in Intelligent-Tiering when no storage class is specified during creation.

Let me show you how it works
You can use the AWS Command Line Interface (AWS CLI) and the put-table-bucket-storage-class and get-table-bucket-storage-class commands to change or verify the storage tier of your S3 table bucket.

# Change the storage class
aws s3tables put-table-bucket-storage-class \
   --table-bucket-arn $TABLE_BUCKET_ARN  \
   --storage-class-configuration storageClass=INTELLIGENT_TIERING

# Verify the storage class
aws s3tables get-table-bucket-storage-class \
   --table-bucket-arn $TABLE_BUCKET_ARN  \

{ "storageClassConfiguration":
   { 
      "storageClass": "INTELLIGENT_TIERING"
   }
}

S3 Tables replication support
The new S3 Tables replication support helps you maintain consistent read replicas of your tables across AWS Regions and accounts. You specify the destination table bucket and the service creates read-only replica tables. It replicates all updates chronologically while preserving parent-child snapshot relationships. Table replication helps you build global datasets to minimize query latency for geographically distributed teams, meet compliance requirements, and provide data protection.

You can now easily create replica tables that deliver similar query performance as their source tables. Replica tables are updated within minutes of source table updates and support independent encryption and retention policies from their source tables. Replica tables can be queried using Amazon SageMaker Unified Studio or any Iceberg-compatible engine including DuckDB, PyIceberg, Apache Spark, and Trino.

You can create and maintain replicas of your tables through the AWS Management Console or APIs and AWS SDKs. You specify one or more destination table buckets to replicate your source tables. When you turn on replication, S3 Tables automatically creates read-only replica tables in your destination table buckets, backfills them with the latest state of the source table, and continually monitors for new updates to keep replicas in sync. This helps you meet time-travel and audit requirements while maintaining multiple replicas of your data.

Let me show you how it works
To show you how it works, I proceed in three steps. First, I create an S3 table bucket, create an Iceberg table, and populate it with data. Second, I configure the replication. Third, I connect to the replicated table and query the data to show you that changes are replicated.

For this demo, the S3 team kindly gave me access to an Amazon EMR cluster already provisioned. You can follow the Amazon EMR documentation to create your own cluster. They also created two S3 table buckets, a source and a destination for the replication. Again, the S3 Tables documentation will help you to get started.

I take a note of the two S3 Tables bucket Amazon Resource Names (ARNs). In this demo, I refer to these as the environment variables SOURCE_TABLE_ARN and DEST_TABLE_ARN.

First step: Prepare the source database

I start a terminal, connect to the EMR cluster, start a Spark session, create a table, and insert a row of data. The commands I use in this demo are documented in Accessing tables using the Amazon S3 Tables Iceberg REST endpoint.

sudo spark-shell \
--packages "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160" \
--master "local[*]" \
--conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
--conf "spark.sql.defaultCatalog=spark_catalog" \
--conf "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog" \
--conf "spark.sql.catalog.spark_catalog.type=rest" \
--conf "spark.sql.catalog.spark_catalog.uri=https://s3tables.us-east-1.amazonaws.com/iceberg" \
--conf "spark.sql.catalog.spark_catalog.warehouse=arn:aws:s3tables:us-east-1:012345678901:bucket/aws-news-blog-test" \
--conf "spark.sql.catalog.spark_catalog.rest.sigv4-enabled=true" \
--conf "spark.sql.catalog.spark_catalog.rest.signing-name=s3tables" \
--conf "spark.sql.catalog.spark_catalog.rest.signing-region=us-east-1" \
--conf "spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO" \
--conf "spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialProvider" \
--conf "spark.sql.catalog.spark_catalog.rest-metrics-reporting-enabled=false"

spark.sql("""
CREATE TABLE s3tablesbucket.test.aws_news_blog (
customer_id STRING,
address STRING
) USING iceberg
""")

spark.sql("INSERT INTO s3tablesbucket.test.aws_news_blog VALUES ('cust1', 'val1')")

spark.sql("SELECT * FROM s3tablesbucket.test.aws_news_blog LIMIT 10").show()
+-----------+-------+
|customer_id|address|
+-----------+-------+
|      cust1|   val1|
+-----------+-------+

So far, so good.

Second step: Configure the replication for S3 Tables

Now, I use the CLI on my laptop to configure the S3 table bucket replication.

Before doing so, I create an AWS Identity and Access Management (IAM) policy to authorize the replication service to access my S3 table bucket and encryption keys. Refer to the S3 Tables replication documentation for the details. The permissions I used for this demo are:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*",
                "s3tables:*",
                "kms:DescribeKey",
                "kms:GenerateDataKey",
                "kms:Decrypt"
            ],
            "Resource": "*"
        }
    ]
}

After having created this IAM policy, I can now proceed and configure the replication:

aws s3tables-replication put-table-replication \
--table-arn ${SOURCE_TABLE_ARN} \
--configuration  '{
    "role": "arn:aws:iam::<MY_ACCOUNT_NUMBER>:role/S3TableReplicationManualTestingRole", 
    "rules":[
        {
            "destinations": [
                {
                    "destinationTableBucketARN": "${DST_TABLE_ARN}"
                }]
        }
    ]

The replication starts automatically. Updates are typically replicated within minutes. The time it takes to complete depends on the volume of data in the source table.

Third step: Connect to the replicated table and query the data

Now, I connect to the EMR cluster again, and I start a second Spark session. This time, I use the destination table.

To verify the replication works, I insert a second row of data on the source table.

spark.sql("INSERT INTO s3tablesbucket.test.aws_news_blog VALUES ('cust2', 'val2')")

I wait a few minutes for the replication to trigger. I follow the status of the replication with the get-table-replication-status command.

aws s3tables-replication get-table-replication-status \
--table-arn ${SOURCE_TABLE_ARN} \
{
    "sourceTableArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test/table/e0fce724-b758-4ee6-85f7-ca8bce556b41",
    "destinations": [
        {
            "replicationStatus": "pending",
            "destinationTableBucketArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test-dst",
            "destinationTableArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test-dst/table/5e3fb799-10dc-470d-a380-1a16d6716db0",
            "lastSuccessfulReplicatedUpdate": {
                "metadataLocation": "s3://e0fce724-b758-4ee6-8-i9tkzok34kum8fy6jpex5jn68cwf4use1b-s3alias/e0fce724-b758-4ee6-85f7-ca8bce556b41/metadata/00001-40a15eb3-d72d-43fe-a1cf-84b4b3934e4c.metadata.json",
                "timestamp": "2025-11-14T12:58:18.140281+00:00"
            }
        }
    ]
}

When replication status shows ready, I connect to the EMR cluster and I query the destination table. Without surprise, I see the new row of data.

Additional things to know
Here are a couple of additional points to pay attention to:

Replication for S3 Tables supports both Apache Iceberg V2 and V3 table formats, giving you flexibility in your table format choice.
You can configure replication at the table bucket level, making it straightforward to replicate all tables under that bucket without individual table configurations.
Your replica tables maintain the storage class you choose for your destination tables, which means you can optimize for your specific cost and performance needs.
Any Iceberg-compatible catalog can directly query your replica tables without additional coordination—they only need to point to the replica table location. This gives you flexibility in choosing query engines and tools.

Pricing and availability
You can track your storage usage by access tier through AWS Cost and Usage Reports and Amazon CloudWatch metrics. For replication monitoring, AWS CloudTrail logs provide events for each replicated object.

There are no additional charges to configure Intelligent-Tiering. You only pay for storage costs in each tier. Your tables continue to work as before, with automatic cost optimization based on your access patterns.

For S3 Tables replication, you pay the S3 Tables charges for storage in the destination table, for replication PUT requests, for table updates (commits), and for object monitoring on the replicated data. For cross-Region table replication, you also pay for inter-Region data transfer out from Amazon S3 to the destination Region based on the Region pair.

As usual, refer to the Amazon S3 pricing page for the details.

Both capabilities are available today in all AWS Regions where S3 Tables are supported.

To learn more about these new capabilities, visit the Amazon S3 Tables documentation or try them in the Amazon S3 console today. Share your feedback through AWS re:Post for Amazon S3 or through your AWS Support contacts.

— seb

Amazon Bedrock AgentCore adds quality evaluations and policy controls for deploying trusted AI agents

2025-12-02 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-bedrock-agentcore-adds-quality-evaluations-and-policy-controls-for-deploying-trusted-ai-agents/

Today, we’re announcing new capabilities in Amazon Bedrock AgentCore to further remove barriers holding AI agents back from production. Organizations across industries are already building on AgentCore, the most advanced agentic platform to build, deploy, and operate highly capable agents securely at any scale. In just 5 months since preview, the AgentCore SDK has been downloaded over 2 million times. For example:

PGA TOUR, a pioneer and innovation leader in sports has built a multi-agent content generation system to create articles for their digital platforms. The new solution, built on AgentCore, enables the PGA TOUR to provide comprehensive coverage for every player in the field, by increasing content writing speed by 1,000 percent while achieving a 95 percent reduction in costs.
Independent software vendors (ISVs) like Workday are building the software of the future on AgentCore. AgentCore Code Interpreter provides Workday Planning Agent with secure data protection and essential features for financial data exploration. Users can analyze financial and operational data through natural language queries, making financial planning intuitive and self-driven. This capability reduces time spent on routine planning analysis by 30 percent, saving approximately 100 hours per month.
Grupo Elfa, a Brazilian distributor and retailer, relies on AgentCore Observability for complete audit traceability and real-time metrics of their agents, transforming their reactive processes into proactive operations. Using this unified platform, their sales team can handle thousands of daily price quotes while the organization maintains full visibility of agent decisions, helping achieve 100 percent traceability of agent decisions and interactions, and reduced problem resolution time by 50 percent.

As organizations scale their agent deployments, they face challenges around implementing the right boundaries and quality checks to confidently deploy agents. The autonomy that makes agents powerful also makes them hard to confidently deploy at scale, as they might access sensitive data inappropriately, make unauthorized decisions, or take unexpected actions. Development teams must balance enabling agent autonomy while ensuring they operate within acceptable boundaries and with the quality you require to put them in front of customers and employees.

The new capabilities available today take the guesswork out of this process and help you build and deploy trusted AI agents with confidence:

Policy in AgentCore (Preview) – Defines clear boundaries for agent actions by intercepting AgentCore Gateway tool calls before they run using policies with fine-grained permissions.
AgentCore Evaluations (Preview) – Monitors the quality of your agents based on real-world behavior using built-in evaluators for dimensions such as correctness and helpfulness, plus custom evaluators for business-specific requirements.

We’re also introducing features that expand what agents can do:

Episodic functionality in AgentCore Memory – A new long-term strategy that helps agents learn from experiences and adapt solutions across similar situations for improved consistency and performance in similar future tasks.
Bidirectional streaming in AgentCore Runtime – Deploys voice agents where both users and agents can speak simultaneously following a natural conversation flow.

Policy in AgentCore for precise agent control
Policy gives you control over the actions agents can take and are applied outside of the agent’s reasoning loop, treating agents as autonomous actors whose decisions require verification before reaching tools, systems, or data. It integrates with AgentCore Gateway to intercept tool calls as they happen, processing requests while maintaining operational speed, so workflows remain fast and responsive.

You can create policies using natural language or directly use Cedar—an open source policy language for fine-grained permissions—simplifying the process to set up, understand, and audit rules without writing custom code. This approach makes policy creation accessible to development, security, and compliance teams who can create, understand, and audit rules without specialized coding knowledge.

The policies operate independently of how the agent was built or which model it uses. You can define which tools and data agents can access—whether they are APIs, AWS Lambda functions, Model Context Protocol (MCP) servers, or third-party services—what actions they can perform, and under what conditions.

Teams can define clear policies once and apply them consistently across their organization. With policies in place, developers gain the freedom to create innovative agentic experiences, and organizations can deploy their agents to act autonomously while knowing they’ll stay within defined boundaries and compliance requirements.

Using Policy in AgentCore
You can start by creating a policy engine in the new Policy section of the AgentCore console and associate it with one or more AgentCore gateways.

A policy engine is a collection of policies that are evaluated at the gateway endpoint. When associating a gateway with a policy engine, you can choose whether to enforce the result of the policy—effectively permitting or denying access to a tool call—or to only emit logs. Using logs helps you test and validate a policy before enabling it in production.

Then, you can define the policies to apply to have granular control over access to the tools offered by the associated AgentCore gateways.

To create a policy, you can start with a natural language description (that should include information of the authentication claims to use) or directly edit Cedar code.

Natural language-based policy authoring provides a more accessible way for you to create fine-grained policies. Instead of writing formal policy code, you can describe rules in plain English. The system interprets your intent, generates candidate policies, validates them against the tool schema, and uses automated reasoning to check safety conditions—identifying prompts that are overly permissive, overly restrictive, or contain conditions that can never be satisfied.

Unlike generic large language model (LLM) translations, this feature understands the structure of your tools and generates policies that are both syntactically correct and semantically aligned with your intent, while flagging rules that cannot be enforced. It is also available as a Model Context Protocol (MCP) server, so you can author and validate policies directly in your preferred AI-assisted coding environment as part of your normal development workflow. This approach reduces onboarding time and helps you write high-quality authorization rules without needing Cedar expertise.

The following sample policy uses information from the OAuth claims in the JWT token used to authenticate to an AgentCore gateway (for the role) and the arguments passed to the tool call (context.input) to validate access to the tool processing a refund. Only an authenticated user with the refund-agent role can access the tool but for amounts (context.input.amount) lower than $200 USD.

permit(
  principal is AgentCore::OAuthUser,
  action == AgentCore::Action::"RefundTool__process_refund",
  resource == AgentCore::Gateway::"<GATEWAY_ARN>"
)
when {
  principal.hasTag("role") &&
  principal.getTag("role") == "refund-agent" &&
  context.input.amount < 200
};

AgentCore Evaluations for continuous, real-time quality intelligence
AgentCore Evaluations is a fully managed service that helps you continuously monitor and analyze agent performance based on real-world behavior. With AgentCore Evaluations, you can use built-in evaluators for common quality dimensions such as correctness, helpfulness, tool selection accuracy, safety, goal success rate, and context relevance. You can also create custom model-based scoring systems configured with your choice of prompt and model for business-tailored scoring while the service samples live agent interactions and scores them continuously.

All results from AgentCore Evaluations are visualized in Amazon CloudWatch alongside AgentCore Observability insights, providing one place for unified monitoring. You can also set up alerts and alarms on the evaluation scores to proactively monitor agent quality and respond when metrics fall outside acceptable thresholds.

You can use AgentCore Evaluations during the testing phase where you can check an agent against the baseline before deployment to stop faulty versions from reaching users, and in production for continuous improvement of your agents. When quality metrics drop below defined thresholds—such as a customer service agent satisfaction declining or politeness scores dropping by more than 10 percent over an 8-hour period—the system triggers immediate alerts, helping to detect and address quality issues faster.

Using AgentCore Evaluations
You can create an online evaluation in the new Evaluations section of the AgentCore console. You can use as data source an AgentCore agent endpoint or a CloudWatch log group used by an external agent. For example, I use here the same sample customer support agent I shared when we introduced AgentCore in preview.

Then, you can select the evaluators to use, including custom evaluators that you can define starting from the existing templates or build from scratch.

For example, for a customer support agent, you can select metrics such as:

Correctness – Evaluates whether the information in the agent’s response is factually accurate
Faithfulness – Evaluates whether information in the response is supported by provided context/sources
Helpfulness – Evaluates from user’s perspective how useful and valuable the agent’s response is
Harmfulness – Evaluates whether the response contains harmful content
Stereotyping – Detects content that makes generalizations about individuals or groups

The evaluators for tool selection and tool parameter accuracy can help you understand if an agent is choosing the right tool for a task and extracting the correct parameters from the user queries.

To complete the creation of the evaluation, you can choose the sampling rate and optional filters. For permissions, you can create a new AWS Identity and Access Management (IAM) service role or pass an existing one.

The results are published, as they are evaluated, on Amazon CloudWatch in the AgentCore Observability dashboard. You can choose any of the bar chart sections to see the corresponding traces and gain deeper insight into the requests and responses behind that specific evaluation.

Because the results are in CloudWatch, you can use all of its feature to create, for example, alarms and automations.

Creating custom evaluators in AgentCore Evaluations
Custom evaluators allow you to define business-specific quality metrics tailored to your agent’s unique requirements. To create a custom evaluator, you provide the model to use as a judge, including inference parameters such as temperature and max output tokens, and a tailored prompt with the judging instructions. You can start from the prompt used by one of the built-in evaluators or enter a new one.

Then, you define the scale to produce in output. It can be either numeric values or custom text labels that you define. Finally, you configure whether the evaluation is computed by the model on single traces, full sessions, or for each tool call.

AgentCore Memory episodic functionality for experience-based learning
AgentCore Memory, a fully managed service that gives AI agents the ability to remember past interactions, now includes a new long-term memory strategy that gives agents the ability to learn from past experiences and apply those lessons to provide more helpful assistance in future interactions.

Consider booking travel with an agent: over time, the agent learns from your booking patterns—such as the fact that you often need to move flights to later times when traveling for work due to client meetings. When you start your next booking involving client meetings, the agent proactively suggests flexible return options based on these learned patterns. Just like an experienced assistant who learns your specific travel habits, agents with episodic memory can now recognize and adapt to your individual needs.

When you enable the new episodic functionality, AgentCore Memory captures structured episodes that record the context, reasoning process, actions taken, and outcomes of agent interactions, while a reflection agent analyzes these episodes to extract broader insights and patterns. When facing similar tasks, agents can retrieve these learnings to improve decision-making consistency and reduce processing time. This reduces the need for custom instructions by including in the agent context only the specific learnings an agent needs to complete a task instead of a long list of all possible suggestions.

AgentCore Runtime bidirectional streaming for more natural conversations
With AgentCore Runtime, you can deploy agentic applications with few lines of code. To simplify deploying conversational experiences that feel natural and responsive, AgentCore Runtime now supports bidirectional streaming. This capability enables voice agents to listen and adapt while users speak, so that people can interrupt agents mid-response and have the agent immediately adjust to the new context—without waiting for the agent to finish its current output. Rather than traditional turn-based interaction where users must wait for complete responses, bidirectional streaming creates flowing, natural conversations where agents dynamically change their response based on what the user is saying.

Building these conversational experiences from the ground up requires significant engineering effort to handle the complex flow of simultaneous communication. Bidirectional streaming simplifies this by managing the infrastructure needed for agents to process input while generating output, handling interruptions gracefully, and maintaining context throughout dynamic conversation shifts. You can now deploy agents that naturally adapt to the fluid nature of human conversation—supporting mid-thought interruptions, context switches, and clarifications without losing the thread of the interaction.

Things to know
Amazon Bedrock AgentCore, including the preview of Policy, is available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland) AWS Regions . The preview of AgentCore Evaluations is available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Frankfurt) Regions. For Regional availability and future roadmap, visit AWS Capabilities by Region.

With AgentCore, you pay for what you use with no upfront commitments. For detailed pricing information, visit the Amazon Bedrock pricing page. AgentCore is also a part of the AWS Free Tier that new AWS customers can use to get started at no cost and explore key AWS services.

These new features work with any open source framework such as CrewAI, LangGraph, LlamaIndex, and Strands Agents, and with any foundation model. AgentCore services can be used together or independently, and you can get started using your favorite AI-assisted development environment with the AgentCore open source MCP server.

To learn more and get started quickly, visit the AgentCore Developer Guide.

— Danilo

Build multi-step applications and AI workflows with AWS Lambda durable functions

2025-12-02 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/build-multi-step-applications-and-ai-workflows-with-aws-lambda-durable-functions/

Modern applications increasingly require complex and long-running coordination between services, such as multi-step payment processing, AI agent orchestration, or approval processes awaiting human decisions. Building these traditionally required significant effort to implement state management, handle failures, and integrate multiple infrastructure services.

Starting today, you can use AWS Lambda durable functions to build reliable multi-step applications directly within the familiar AWS Lambda experience. Durable functions are regular Lambda functions with the same event handler and integrations you already know. You write sequential code in your preferred programming language, and durable functions track progress, automatically retry on failures, and suspend execution for up to one year at defined points, without paying for idle compute during waits.

AWS Lambda durable functions use a checkpoint and replay mechanism, known as durable execution, to deliver these capabilities. After enabling a function for durable execution, you add the new open source durable execution SDK to your function code. You then use SDK primitives like “steps” to add automatic checkpointing and retries to your business logic and “waits” to efficiently suspend execution without compute charges. When execution terminates unexpectedly, Lambda resumes from the last checkpoint, replaying your event handler from the beginning while skipping completed operations.

Getting started with AWS Lambda durable functions
Let me walk you through how to use durable functions.

First, I create a new Lambda function in the console and select Author from scratch. In the Durable execution section, I select Enable. Note that, durable function setting can only be set during function creation and currently can’t be modified for existing Lambda functions.

After I create my Lambda durable function, I can get started with the provided code.

Lambda durable functions introduces two core primitives that handle state management and recovery:

Steps—The context.step() method adds automatic retries and checkpointing to your business logic. After a step is completed, it will be skipped during replay.
Wait—The context.wait() method pauses execution for a specified duration, terminating the function, suspending and resuming execution without compute charges.

Additionally, Lambda durable functions provides other operations for more complex patterns: create_callback() creates a callback that you can use to await results for external events like API responses or human approvals, wait_for_condition() pauses until a specific condition is met like polling a REST API for process completion, and parallel() or map() operations for advanced concurrency use cases.

Building a production-ready order processing workflow
Now let’s expand the default example to build a production-ready order processing workflow. This demonstrates how to use callbacks for external approvals, handle errors properly, and configure retry strategies. I keep the code intentionally concise to focus on these core concepts. In a full implementation, you could enhance the validation step with Amazon Bedrock to add AI-powered order analysis.

Here’s how the order processing workflow works:

First, validate_order() checks order data to ensure all required fields are present.
Next, send_for_approval() sends the order for external human approval and waits for a callback response, suspending execution without compute charges.
Then, process_order() completes order processing.
Throughout the workflow, try-catch error handling distinguishes between terminal errors that stop execution immediately and recoverable errors inside steps that trigger automatic retries.

Here’s the complete order processing workflow with step definitions and the main handler:

import random
from aws_durable_execution_sdk_python import (
    DurableContext,
    StepContext,
    durable_execution,
    durable_step,
)
from aws_durable_execution_sdk_python.config import (
    Duration,
    StepConfig,
    CallbackConfig,
)
from aws_durable_execution_sdk_python.retries import (
    RetryStrategyConfig,
    create_retry_strategy,
)


@durable_step
def validate_order(step_context: StepContext, order_id: str) -> dict:
    """Validates order data using AI."""
    step_context.logger.info(f"Validating order: {order_id}")
    # In production: calls Amazon Bedrock to validate order completeness and accuracy
    return {"order_id": order_id, "status": "validated"}


@durable_step
def send_for_approval(step_context: StepContext, callback_id: str, order_id: str) -> dict:
    """Sends order for approval using the provided callback token."""
    step_context.logger.info(f"Sending order {order_id} for approval with callback_id: {callback_id}")
    
    # In production: send callback_id to external approval system
    # The external system will call Lambda SendDurableExecutionCallbackSuccess or
    # SendDurableExecutionCallbackFailure APIs with this callback_id when approval is complete
    
    return {
        "order_id": order_id,
        "callback_id": callback_id,
        "status": "sent_for_approval"
    }


@durable_step
def process_order(step_context: StepContext, order_id: str) -> dict:
    """Processes the order with retry logic for transient failures."""
    step_context.logger.info(f"Processing order: {order_id}")
    # Simulate flaky API that sometimes fails
    if random.random() > 0.4:
        step_context.logger.info("Processing failed, will retry")
        raise Exception("Processing failed")
    return {
        "order_id": order_id,
        "status": "processed",
        "timestamp": "2025-11-27T10:00:00Z",
    }


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> dict:
    try:
        order_id = event.get("order_id")
        
        # Step 1: Validate the order
        validated = context.step(validate_order(order_id))
        if validated["status"] != "validated":
            raise Exception("Validation failed")  # Terminal error - stops execution
        context.logger.info(f"Order validated: {validated}")
        
        # Step 2: Create callback
        callback = context.create_callback(
            name="awaiting-approval",
            config=CallbackConfig(timeout=Duration.from_minutes(3))
        )
        context.logger.info(f"Created callback with id: {callback.callback_id}")
        
        # Step 3: Send for approval with the callback_id
        approval_request = context.step(send_for_approval(callback.callback_id, order_id))
        context.logger.info(f"Approval request sent: {approval_request}")
        
        # Step 4: Wait for the callback result
        # This blocks until external system calls SendDurableExecutionCallbackSuccess or SendDurableExecutionCallbackFailure
        approval_result = callback.result()
        context.logger.info(f"Approval received: {approval_result}")
        
        # Step 5: Process the order with custom retry strategy
        retry_config = RetryStrategyConfig(max_attempts=3, backoff_rate=2.0)
        processed = context.step(
            process_order(order_id),
            config=StepConfig(retry_strategy=create_retry_strategy(retry_config)),
        )
        if processed["status"] != "processed":
            raise Exception("Processing failed")  # Terminal error
        
        context.logger.info(f"Order successfully processed: {processed}")
        return processed
        
    except Exception as error:
        context.logger.error(f"Error processing order: {error}")
        raise error  # Re-raise to fail the execution

This code demonstrates several important concepts:

Error handling—The try-catch block handles terminal errors. When an unhandled exception is thrown outside of a step (like the validation check), it terminates the execution immediately. This is useful when there’s no point in retrying, such as invalid order data.
Step retries—Inside the process_order step, exceptions trigger automatic retries based on the default (step 1) or configured RetryStrategy (step 5). This handles transient failures like temporary API unavailability.
Logging—I use context.logger for the main handler and step_context.logger inside steps. The context logger suppresses duplicate logs during replay.

Now I create a test event with order_id and invoke the function asynchronously to start the order workflow. I navigate to the Test tab and fill in the optional Durable execution name to identify this execution. Note that, durable functions provides built-in idempotency. If I invoke the function twice with the same execution name, the second invocation returns the existing execution result instead of creating a duplicate.

I can monitor the execution by navigating to the Durable executions tab in the Lambda console:

Here I can see each step’s status and timing. The execution shows CallbackStarted followed by InvocationCompleted, which indicates the function has terminated and execution is suspended to avoid idle charges while waiting for the approval callback.

I can now complete the callback directly from the console by choosing Send success or Send failure, or programmatically using the Lambda API.

I choose Send success.

After the callback completes, the execution resumes and processes the order. If the process_order step fails due to the simulated flaky API, it automatically retries based on the configured strategy. Once all retries succeed, the execution completes successfully.

Monitoring executions with Amazon EventBridge
You can also monitor durable function executions using Amazon EventBridge. Lambda automatically sends execution status change events to the default event bus, allowing you to build downstream workflows, send notifications, or integrate with other AWS services.

To receive these events, create an EventBridge rule on the default event bus with this pattern:

{
  "source": ["aws.lambda"],
  "detail-type": ["Durable Execution Status Change"]
}

Things to know
Here are key points to note:

Availability—Lambda durable functions are now available in US East (Ohio) AWS Region. For the latest Region availability, visit the AWS Capabilities by Region page.
Programming language support—At launch, AWS Lambda durable functions supports JavaScript/TypeScript (Node.js 22/24) and Python (3.13/3.14). We recommend bundling the durable execution SDK with your function code using your preferred package manager. The SDKs are fast-moving, so you can easily update dependencies as new features become available.
Using Lambda versions—When deploying durable functions to production, use Lambda versions to ensure replay always happens on the same code version. If you update your function code while an execution is suspended, replay will use the version that started the execution, preventing inconsistencies from code changes during long-running workflows.
Testing your durable functions—You can test durable functions locally without AWS credentials using the separate testing SDK with pytest integration and the AWS Serverless Application Model (AWS SAM) command line interface (CLI) for more complex integration testing.
Open source SDKs—The durable execution SDKs are open source for JavaScript/TypeScript and Python. You can review the source code, contribute improvements, and stay updated with the latest features.
Pricing—To learn more on AWS Lambda durable functions pricing, refer to the AWS Lambda pricing page.

Get started with AWS Lambda durable functions by visiting the AWS Lambda console. To learn more, refer to AWS Lambda durable functions documentation page.

Happy building!

— Donnie

Introducing Database Savings Plans for AWS Databases

2025-12-02 Betty Zheng (郑予彬)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/introducing-database-savings-plans-for-aws-databases/

Since Amazon Web Services (AWS) introduced Savings Plans, customers have been able to lower the cost of running sustained workloads while maintaining the flexibility to manage usage across accounts, resource types, and AWS Regions. Today, we’re extending this flexible pricing model to AWS managed database services with the launch of Database Savings Plans, which help customers reduce database costs by up to 35% when they commit to a consistent amount of usage ($/hour) over a 1-year term. Savings automatically apply each hour to eligible usage across supported database services, and any additional usage beyond the commitment is billed at on-demand rates.

As organizations build and manage data-driven and AI applications, they often use different database services, engines and deployment types, including instance-based and serverless options, to meet evolving business needs. Database Savings Plans provide the flexibility to choose how workloads run while maintaining cost efficiency. If customers are in the middle of a migration or modernization effort, they can switch database engines and adjust deployment types, such as from provisioned to serverless as part of ongoing cost optimization, while continuing to receive discounted rates. If a customer’s business expands globally, they can also shift usage across AWS Regions and continue to benefit from the same commitment. By applying a consistent hourly commitment, customers can maintain predictable spend even as usage patterns evolve and analyze coverage and utilization using familiar cost management tools.

New Savings Plans
Each plan defines where pricing applies, the range of available discounts, and the level of flexibility provided across supported database engines, instance families, sizes, deployment options, or AWS Regions.

The hourly commitment automatically applies to all eligible usage regardless of Region, with support for Amazon Aurora, Amazon Relational Database Service (Amazon RDS), Amazon DynamoDB, Amazon ElastiCache, Amazon DocumentDB (with MongoDB compatibility), Amazon Neptune, Amazon Keyspaces (for Apache Cassandra), Amazon Timestream, and AWS Database Migration Service (AWS DMS). As new eligible database offerings, instance types, or Regions become available, Savings Plans will automatically apply to that usage.

Discounts vary by deployment model and service type. Serverless deployments provide up to 35% savings compared to on-demand rates. Provisioned instances across supported database services deliver up to 20% savings. For Amazon DynamoDB and Amazon Keyspaces, on-demand throughput workloads receive up to 18% savings, and provisioned capacity offers up to 12%. Together, these savings help customers optimize costs while maintaining consistent coverage for database usage. To learn more about the pricing and eligible usage, visit the Database Savings Plans pricing page.

Purchasing Database Savings Plans
AWS Billing and Cost Management Console helps you choose Savings Plans and guides you through the purchase process. You can get started from the AWS Management Console or use the AWS Command Line Interface (AWS CLI) and the API. There are two ways to evaluate Database Savings Plans purchases, in the Recommendations view and in the Purchase Analyzer.

Recommendations – are automatically generated from your recent on-demand usage. To reach the Recommendations view in the Billing and Cost Management console, choose Savings and Commitments, Savings Plans, and Recommendations in the navigation pane. In the Recommendations view, select Database Savings Plans and configure the Recommendation options. AWS Savings Plans recommendations analyze your historical on-demand usage to identify the hourly commitment that delivers the highest overall savings.

The Purchase Analyzer – is designed for modeling custom commitment levels. If you want to purchase a different amount than the recommended commitment on the Purchase Analyzer page, select Database Savings Plans and configure Lookback period and Hourly commitment to simulate alternative commitment levels and see the projected impact on Cost, Coverage, and Utilization.

This way is preferred if your purchasing strategy includes smaller, incremental commitments over time or if you expect future usage changes that could affect your ideal purchase amount.

After reviewing the recommendations or running simulations in Savings Plans Recommendations or Savings Plans Purchase Analyzer, choose Add to cart to proceed with your chosen commitment. If you prefer to purchase directly, you can also navigate to the Purchase Savings Plans page. The console updates estimated discounts and coverage in real time as you adjust each setting, so you can evaluate the impact before completing your order.

You can learn more about how to choose and purchase Database Saving Plans by visiting the Savings Plans User Guide documents.

Now available
Database Savings Plans are available in all AWS Regions outside of China. Give them a try and start shaping your database strategy with more flexibility and predictable costs.

– Betty

Amazon CloudWatch introduces unified data management and analytics for operations, security, and compliance

2025-12-02 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/amazon-cloudwatch-introduces-unified-data-management-and-analytics-for-operations-security-and-compliance/

Today we’re expanding Amazon CloudWatch capabilities to unify and manage log data across operational, security, and compliance use cases with flexible and powerful analytics in one place and with reduced data duplication and costs.

This enhancement means that CloudWatch can automatically normalize and process data to offer consistency across sources with built-in support for Open Cybersecurity Schema Framework (OCSF) and Open Telemetry (OTel) formats, so you can focus on analytics and insights. CloudWatch also introduces Apache Iceberg compatible access to your data through Amazon Simple Storage Service (Amazon S3) Tables, so that you can run analytics, not only locally but also using Amazon Athena, Amazon SageMaker Unified Studio, or any other Iceberg-compatible tool.

You can also correlate your operational data in CloudWatch with other business data from your preferred tools to correlate with other data. This unified approach streamlines management and provides comprehensive correlation across security, operational, and business use cases.

Here are the detailed enhancements:

Streamline data ingestion and normalization – CloudWatch automatically collects AWS vended logs across accounts and AWS Regions, integrating with AWS Organizations from AWS services including AWS CloudTrail, Amazon Virtual Private Cloud (Amazon VPC) Flow Logs, AWS WAF access logs, Amazon Route 53 resolver logs, and pre-built connectors for third-party sources such as endpoint (CrowdStrike, SentinelOne), identity (Okta, Entra ID), cloud security (Wiz), network security (Zscaler, Palo Alto Networks), productivity and collaboration (Microsoft Office 365, Windows Event Logs, and GitHub), along with IT service manager with ServiceNow CMBD. To normalize and process your data as they are being ingested, CloudWatch offers managed OCSF conversion for various AWS and third-party data sources and other processors such ad Grok for custom parsing, field-level operations, and string manipulations.
Reduce costly log data management – CloudWatch consolidates log management into a single service with built-in governance capabilities without storing and maintaining multiple copies of the same data across different tools and data stores. The unified data store of CloudWatch eliminates the need for complex ETL pipelines and reduces your operational costs and management overhead needed to maintain multiple separate data stores and tools.
Discover business insights from log data – You can run queries in CloudWatch using natural language queries and popular query languages such as LogsQL, PPL, and SQL through a single interface, or query your data using your preferred analytics tools through Apache Iceberg-compatible tables. The new Facets interface gives you intuitive filtering by source, application, account, region, and log type, which you can use to run queries across log groups of multiple AWS accounts and Regions with intelligent parameter inference.

In the next sections we explore the new log management and analytics features of the CloudWatch Logs!

1. Data discovery and management by data sources and types

You can see a high-level overview of logs and all data sources with a new Logs Management View in the CloudWatch console. To get started, go to the CloudWatch console and choose Log Management under the Logs menu in the left navigation pane. In the Summary tab, you can observe your logs data sources and types, insights into how your log groups are doing across ingestion, and anomalies.

Choose the Data sources tab to find and manage your log data by data sources, types, and fields. CloudWatch ingests and automatically categorizes data sources by AWS services, third-party, or custom sources such as application logs.

Choose the Data source actions to integrate S3 Tables to make future logs for selected data sources. You have the flexibility to analyze the logs through Athena and Amazon Redshift and other query engines such as Spark using Iceberg compatible access patterns. With this integration, logs from CloudWatch are available in a read-only aws-cloudwatch S3 Tables bucket.

When you choose a specific data source such as CloudTrail data, you can view the details of the data source that includes information regarding data format, pipeline, facets/field indexes, S3 Tables association, and the number of logs with that data source. You can observe all log groups included in this data source and type and edit a source/type field index policy using the new schema support.

To learn more about how to manage your data sources and index policy, visit Data sources in the Amazon CloudWatch Logs User Guide.

2. Ingestion and transformation using CloudWatch pipelines

You can create pipelines to streamline collecting, transforming, and routing telemetry and security data while standardizing data formats to optimize observability and security data management. The new pipeline feature of CloudWatch connects data from a catalogue of data sources, so that you can add and configure pipeline processors from a library to parse, enrich, and standardize data.

In the Pipeline tab, choose Add pipeline. It shows you the pipeline configuration wizard. This wizard guides you through five steps where you can choose the data source and other source details such as log source types, configure destination, configure up to 19 processors to perform an action on your data (such as filtering, transforming, or enriching), and finally review and deploy the pipeline.

You also have the option to create pipelines through the new Ingestion experience in CloudWatch. To learn more about how to set up and manage the pipelines, visit Pipelines in the Amazon CloudWatch Logs User Guide.

3. Enhanced analytics and querying based on data sources

You can enhance analytics with support for Facets and querying based on data sources. Facets enable interactive exploration and drill-down into logs and their values are automatically extracted based on the selected time period.

Choose the Facets tab in the Log Insights under the Logs menu in the left navigation pane. You can view available facets and values that appear in the panel. Choose one or more facets and values to interactively explore your data. I choose Facets regarding a VPC Flow Logs group and action, query to list the five most frequent patterns in my VPC Flow Logs through the AI query generator, and get the result patterns.

You can save your query with the selected Facets and values that you have specified. When you next choose your saved query, the logs to be queried have the pre-specified facets and values. To learn more about Facet management, visit Facets in the CloudWatch Logs User Guide.

As I previously noted, you can integrate data sources into S3 Tables and query together. For example, using a Query Editor in Athena, you can query correlates network traffic with AWS API activity from a specific IP range (174.163.137.*) by joining VPC Flow Logs with CloudTrail logs based on matching source IP addresses.

This type of integrated search is particularly valuable for security monitoring, incident investigation, and suspicious behavior detection. You can view if an IP that’s making network connections is also performing sensitive AWS operations such as creating users, modifying security groups, or accessing data.

To learn more, visit S3 Tables integration with CloudWatch in the CloudWatch Logs User Guide.

Now available
New log management features of Amazon CloudWatch are available today in all AWS Regions except the AWS GovCloud (US) Regions and China Regions. For Regional availability and future roadmap, visit the AWS Capabilities by Region. There are no upfront commitments or minimum fees, and you pay for the usage of existing CloudWatch Logs for data ingestion, storage, and queries. To learn more, visit the CloudWatch pricing page.

Give it a try in the CloudWatch console. To learn more, visit the CloudWatch product page and send feedback to AWS re:Post for CloudWatch Logs or through your usual AWS Support contacts.

— Channy

New and enhanced AWS Support plans add AI capabilities to expert guidance

2025-12-02 Matheus Guimaraes

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/new-and-enhanced-aws-support-plans-add-ai-capabilities-to-expert-guidance/

Today, we’re announcing a fundamental shift in how AWS Support helps customers move from reactive problem-solving to proactive issue prevention. This evolution introduces new Support plans that combine AI-powered capabilities with Amazon Web Services (AWS) expertise. The new and enhanced plans help you identify and address potential issues before they impact your business operations, helping you to operate and optimize your cloud workloads more effectively.

The portfolio includes three plans designed to match different operational needs. Each plan offers distinct capabilities, with higher tiers including all the capabilities of lower tiers plus additional features and enhanced service levels. Let’s have a look at them.

New and enhanced AWS Support paid plans
Business Support+ transforms the developer, startup, and small business experience by providing intelligent assistance powered by AI. You can choose to engage directly with AWS experts or start with AI-powered contextual recommendations that seamlessly transition to AWS experts when needed. AWS experts respond in within 30 minutes for critical cases(twice as fast as before), maintaining previous context and saving you from having to repeat yourself.

With a low-cost monthly subscription, this plan delivers advanced operational capabilities through a combination of AI-powered tools and AWS expertise. The plan provides personalized recommendations to help optimize your workloads based on your specific environment, while maintaining seamless access to AWS experts for technical support when needed.

Enterprise Support builds on our established support model, this tier accelerates innovation and cloud operations success through intelligent operations and AI-powered trusted human guidance. Your designated technical account manager (TAM) combines deep AWS knowledge with data-driven insights from your environment to help identify optimization opportunities and potential risks before they impact your operations. The plan also offers access to AWS Security Incident Response at no additional fee, a comprehensive service that centralizes tracking, storage, and management of security events while providing automated monitoring and investigation capabilities to strengthen your security posture.

Through AI-powered assistance and continuous monitoring of your AWS environment, this tier helps you achieve new levels of scale in your operations. With up to 15-minute response times for production-critical issues and support engineers who receive personalized context delivered by AI agents, this tier enables faster and more personalized resolution while maintaining operational excellence. Additionally, you also get access to interactive programs and hands-on workshops to foster continuous technical growth.

Unified Operations Support delivers our highest level of context-aware Support through an expanded team of AWS experts. Your core team comprised of a Technical Account Manager, a Domain Engineer, and a designated Senior Billing and Account Specialist is complemented by on-demand experts in migration, incident management, and security. These designated experts understand your unique environment and operational history, providing guidance through your preferred collaboration channels while combining their architectural knowledge with AI-powered insights.

Through comprehensive around-the-clock monitoring and AI-powered automation, this tier strengthens your mission-critical operations with proactive risk identification and contextual guidance. When critical incidents occur, you receive 5-minute response times with technical recommendations provided by Support engineers who understand your workloads. The team conducts systematic application reviews, helps validate operational readiness, and supports business-critical events, which means you can focus on innovation while maintaining the highest levels of operational excellence.

Transforming your cloud operations
AWS Support is evolving to help you build, operate, and optimize your cloud infrastructure more effectively. We maintain context of your account’s support history and previous cases, configuration, and previous cases, so our AI-powered capabilities and AWS experts can deliver more relevant and effective solutions tailored to your specific environment.

Support plan capabilities will continuously evolve to add comprehensive visibility into your infrastructure, delivering actionable insights across performance, security, and cost dimensions with clear evaluation of business impact and cost benefits. This combination of AI-powered tools and AWS expertise represents a fundamental shift from reactive to proactive operations, helping you prevent issues before they impact your business.

Subscribers of AWS Developer Support, AWS Business Support (classic), and AWS Enterprise On-Ramp Support plans can continue to receive their current level of support through January 1, 2027. You can transition to one of the new and enhanced plans at any time before then by visiting the AWS Management Console or by reaching out to your AWS account team. Customers subscribed to AWS Enterprise Support can begin using the new features of this plan at any time.

Things to know
Business Support+, Enterprise Support, and Unified Operations are available in all commercial AWS Regions. Existing customers can continue their current plans or explore the new offerings for enhanced performance and efficiency.

Business Support+ starts at $29 per month, a 71% savings over the previous Business Support monthly minimum. Enterprise Support starts at $5,000 per month, a 67% savings over the previous Enterprise Support minimum price. Unified Operations, designed for organizations with mission-critical workloads and including a designated team of AWS experts, starts at $50,000 a month. All new Support plans use pricing tiers, which rewards higher usage with lower marginal prices for Support.

For critical cases, AWS Support provides different target response times across the plans. Business Support+ offers a 30-minute response time, Enterprise Support responds within 15 minutes, and Unified Operations Support delivers the fastest response time at 5 minutes.

To learn more about AWS Support plans and features, visit the AWS Support page or sign in to the AWS Management Console.

For hands-on guidance with AWS Support features, schedule a consultation with your account team.

Amazon OpenSearch Service improves vector database performance and cost with GPU acceleration and auto-optimization

2025-12-02 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/amazon-opensearch-service-improves-vector-database-performance-and-cost-with-gpu-acceleration-and-auto-optimization/

Today we’re announcing serverless GPU acceleration and auto-optimization for vector index in Amazon OpenSearch Service that helps you build large-scale vector databases faster with lower costs and automatically optimize vector indexes for optimal trade-offs between search quality, speed, and cost.

Here are the new capabilities introduced today:

GPU acceleration – You can build vector databases up to 10 times faster at a quarter of the indexing cost when compared to non-GPU acceleration, and you can create billion-scale vector databases in under an hour. With significant gains in cost saving and speed, you get an advantage in time-to-market, innovation velocity, and adoption of vector search at scale.
Auto-optimization – You can find the best balance between search latency, quality, and memory requirements for your vector field without needing vector expertise. This optimization helps you achieve better cost-savings and recall rates when compared to default index configurations, while manual index tuning can take weeks to complete.

You can use these capabilities to build vector databases faster and more cost-effectively on OpenSearch Service. You can use them to power generative AI applications, search product catalogs and knowledge bases, and more. You can enable GPU acceleration and auto-optimization when you create a new OpenSearch domain or collection, as well as update an existing domain or collection.

Let’s go through how it works!

GPU acceleration for vector index
When you enable GPU acceleration on your OpenSearch Service domain or Serverless collection, OpenSearch Service automatically detects opportunities to accelerate your vector indexing workloads. This acceleration helps build the vector data structures in your OpenSearch Service domain or Serverless collection.

You don’t need to provision the GPU instances, manage their usage or pay for idle time. OpenSearch Service securely isolates your accelerated workloads to your domain’s or collection’s Amazon Virtual Private Cloud (Amazon VPC) within your account. You pay only for useful processing through the OpenSearch Compute Units (OCU) – Vector Acceleration pricing.

To enable GPU acceleration, go to the OpenSearch Service console and choose Enable GPU Acceleration in the Advanced features section when you create or update your OpenSearch Service domain or Serverless collection.

You can use the following AWS Command Line Interface (AWS CLI) command to enable GPU acceleration for an existing OpenSearch Service domain.

$ aws opensearch update-domain-config \
    --domain-name my-domain \
    --aiml-options '{"ServerlessVectorAcceleration": {"Enabled": true}}'

You can create a vector index optimized for GPU processing. This example index stores 768-dimensional vectors for text embeddings by enabling index.knn.remote_index_build.enabled.

PUT my-vector-index
{
    "settings": {
        "index.knn": true,
        "index.knn.remote_index_build.enabled": true
    },
    "mappings": {
        "properties": {
        "vector_field": {
        "type": "knn_vector",
        "dimension": 768,
      },
      "text": {
        "type": "text"
      }
    }
  }
}

Now you can add vector data and optimize your index using standard OpenSearch Service operations using the bulk API. The GPU acceleration is automatically applied to indexing and force-merge operations.

POST my-vector-index/_bulk
{"index": {"_id": "1"}}
{"vector_field": [0.1, 0.2, 0.3, ...], "text": "Sample document 1"}
{"index": {"_id": "2"}}
{"vector_field": [0.4, 0.5, 0.6, ...], "text": "Sample document 2"}

We ran index build benchmarks and observed speed gains from GPU acceleration ranging between 6.4 to 13.8 times. Stay tuned for more benchmarks and further details in upcoming posts.

To learn more, visit GPU acceleration for vector indexing in the Amazon OpenSearch Service Developer Guide.

Auto-optimizing vector databases
You can use the new vector ingestion feature to ingest documents from Amazon Simple Storage Service (Amazon S3), generate vector embeddings, optimize indexes automatically, and build large-scale vector indexes in minutes. During the ingestion, auto-optimization generates recommendations based on your vector fields and indexes of your OpenSearch Service domain or Serverless collection. You can choose one of these recommendations to quickly ingest and index your vector dataset instead of manually configuring these mappings.

To get started, choose Vector ingestion under the Ingestion menu in the left navigation pane of OpenSearch Service console.

You can create a new vector ingestion job with the following steps:

Prepare dataset – Prepare OpenSearch Service parquet documents in an S3 bucket and choose a domain or collection for your destination.
Configure index and automate optimizations – Auto-optimize your vector fields or manually configure them.
Ingest and accelerate indexing – Use OpenSearch ingestion pipelines to load data from Amazon S3 into OpenSearch Service. Build large vector indexes up to 10 times faster at a quarter of the cost.

In Step 2, configure your vector index with auto-optimize vector field. Auto-optimize is currently limited to one vector field. Further index mappings can be input after the auto-optimization job has completed.

Your vector field optimization settings depend on your use case. For example, if you need high search quality (recall rate) and don’t need faster responses, then choose Modest for the Latency requirements (p90) and more than or equal to 0.9 for the Acceptable search quality (recall). When you create a job, it starts to ingest vector data and auto-optimize vector index. The processing time depends on the vector dimensionality.

To learn more, visit Auto-optimize vector index in the OpenSearch Service Developer Guide.

Now available
GPU acceleration in Amazon OpenSearch Service is now available in the US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Europe (Ireland) Regions. Auto-optimization in OpenSearch Service is now available in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Paciﬁc (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland) Regions.

OpenSearch Service separately charges for used OCU – Vector Acceleration only to index your vector databases. For more information, visitOpenSearch Service pricing page.

Give it a try and send feedback to the AWS re:Post for Amazon OpenSearch Service or through your usual AWS Support contacts.

— Channy

Amazon S3 Vectors now generally available with increased scale and performance

2025-12-02 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-s3-vectors-now-generally-available-with-increased-scale-and-performance/

Today, I’m excited to announce that Amazon S3 Vectors is now generally available with significantly increased scale and production-grade performance capabilities. S3 Vectors is the first cloud object storage with native support to store and query vector data. You can use it to help you reduce the total cost of storing and querying vectors by up to 90% when compared to specialized vector database solutions.

Since we announced the preview of S3 Vectors in July, I’ve been impressed by how quickly you adopted this new capability to store and query vector data. In just over four months, you created over 250,000 vector indexes and ingested more than 40 billion vectors, performing over 1 billion queries (as of November 28th).

You can now store and search across up to 2 billion vectors in a single index, that’s up to 20 trillion vectors in a vector bucket and a 40x increase from 50 million per index during preview. This means that you can consolidate your entire vector dataset into one index, removing the need to shard across multiple smaller indexes or implement complex query federation logic.

Query performance has been optimized. Infrequent queries continue to return results in under one second, with more frequent queries now resulting in latencies around 100ms or less, making it well-suited for interactive applications such as conversational AI and multi-agent workflows. You can also retrieve up to 100 search results per query, up from 30 previously, providing more comprehensive context for retrieval augmented generation (RAG) applications.

The write performance has also improved substantially, with support for up to 1,000 PUT transactions per second when streaming single-vector updates into your indexes, delivering significantly higher write throughput for small batch sizes. This higher throughput supports workloads where new data must be immediately searchable, helping you ingest small data corpora quickly or handle many concurrent sources writing simultaneously to the same index.

The fully serverless architecture removes infrastructure overhead—there’s no infrastructure to set up or resources to provision. You pay for what you use as you store and query vectors. This AI-ready storage provides you with quick access to any amount of vector data to support your complete AI development lifecycle, from initial experimentation and prototyping through to large-scale production deployments. S3 Vectors now provides the scale and performance needed for production workloads across AI agents, inference, semantic search, and RAG applications.

Two key integrations that were launched in preview are now generally available. You can use S3 Vectors as a vector storage engine for Amazon Bedrock Knowledge Base. In particular, you can use it to build RAG applications with production-grade scale and performance. Moreover, S3 Vectors integration with Amazon OpenSearch is now generally available, so that you can use S3 Vectors as your vector storage layer while using OpenSearch for search and analytics capabilities.

You can now use S3 Vectors in 14 AWS Regions, expanding from five AWS Regions during the preview.

Let’s see how it works
In this post, I demonstrate how to use S3 Vectors through the AWS Console and CLI.

First, I create an S3 Vector bucket and an index.

echo "Creating S3 Vector bucket..."
aws s3vectors create-vector-bucket \
    --vector-bucket-name "$BUCKET_NAME"

echo "Creating vector index..."
aws s3vectors create-index \
    --vector-bucket-name "$BUCKET_NAME" \
    --index-name "$INDEX_NAME" \
    --data-type "float32" \
    --dimension "$DIMENSIONS" \
    --distance-metric "$DISTANCE_METRIC" \
    --metadata-configuration "nonFilterableMetadataKeys=AMAZON_BEDROCK_TEXT,AMAZON_BEDROCK_METADATA"

The dimension metric must match the dimension of the model used to compute the vectors. The distance metric indicates to the algorithm to compute the distance between vectors. S3 Vectors supports cosine and euclidian distances.

I can also use the console to create the bucket. We’ve added the capability to configure encryption parameters at creation time. By default, indexes use the bucket-level encryption, but I can override bucket-level encryption at the index level with a custom AWS Key Management Service (AWS KMS) key.

I also can add tags for the vector bucket and vector index. Tags at the vector index help with access control and cost allocation.

And I can now manage Properties and Permissions directly in the console.

Similarly, I define Non-filterable metadata and I configure Encryption parameters for the vector index.

Next, I create and store the embeddings (vectors). For this demo, I ingest my constant companion: the AWS Style Guide. This is an 800-page document that describes how to write posts, technical documentation, and articles at AWS.

I use Amazon Bedrock Knowledge Bases to ingest the PDF document stored on a general purpose S3 bucket. Amazon Bedrock Knowledge Bases reads the document and splits it in pieces called chunks. Then, it computes the embeddings for each chunk with the Amazon Titan Text Embeddings model and it stores the vectors and their metadata on my newly created vector bucket. The detailed steps for that process are out of the scope of this post, but you can read the instructions in the documentation.

When querying vectors, you can store up to 50 metadata keys per vector, with up to 10 marked as non-filterable. You can use the filterable metadata keys to filter query results based on specific attributes. Therefore, you can combine vector similarity search with metadata conditions to narrow down results. You can also store more non-filterable metadata for larger contextual information. Amazon Bedrock Knowledge Bases computes and stores the vectors. It also adds large metadata (the chunk of the original text). I exclude this metadata from the searchable index.

There are other methods to ingest your vectors. You can try the S3 Vectors Embed CLI, a command line tool that helps you generate embeddings using Amazon Bedrock and store them in S3 Vectors through direct commands. You can also use S3 Vectors as a vector storage engine for OpenSearch.

Now I’m ready to query my vector index. Let’s imagine I wonder how to write “open source”. Is it “open-source”, with a hyphen, or “open source” without a hyphen? Should I use uppercase or not? I want to search the relevant sections of the AWS Style Guide relative to “open source.”

# 1. Create embedding request
echo '{"inputText":"Should I write open source or open-source"}' | base64 | tr -d '\n' > body_encoded.txt

# 2. Compute the embeddings with Amazon Titan Embed model
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-embed-text-v2:0 \
  --body "$(cat body_encoded.txt)" \
  embedding.json

# Search the S3 Vectors index for similar chunks
vector_array=$(cat embedding.json | jq '.embedding') && \
aws s3vectors query-vectors \
  --index-arn "$S3_VECTOR_INDEX_ARN" \
  --query-vector "{\"float32\": $vector_array}" \
  --top-k 3 \
  --return-metadata \
  --return-distance | jq -r '.vectors[] | "Distance: \(.distance) | Source: \(.metadata."x-amz-bedrock-kb-source-uri" | split("/")[-1]) | Text: \(.metadata.AMAZON_BEDROCK_TEXT[0:100])..."'

The first result shows this JSON:

        {
            "key": "348e0113-4521-4982-aecd-0ee786fa4d1d",
            "metadata": {
                "x-amz-bedrock-kb-data-source-id": "0SZY6GYPVS",
                "x-amz-bedrock-kb-source-uri": "s3://sst-aws-docs/awsstyleguide.pdf",
                "AMAZON_BEDROCK_METADATA": "{\"createDate\":\"2025-10-21T07:49:38Z\",\"modifiedDate\":\"2025-10-23T17:41:58Z\",\"source\":{\"sourceLocation\":\"s3://sst-aws-docs/awsstyleguide.pdf\"",
                "AMAZON_BEDROCK_TEXT": "[redacted] open source (adj., n.) Two words. Use open source as an adjective (for example, open source software), or as a noun (for example, the code throughout this tutorial is open source). Don't use open-source, opensource, or OpenSource. [redacted]",
                "x-amz-bedrock-kb-document-page-number": 98.0
            },
            "distance": 0.63120436668396
        }

It finds the relevant section in the AWS Style Guide. I must write “open source” without a hyphen. It even retrieved the page number in the original document to help me cross-check the suggestion with the relevant paragraph in the source document.

One more thing
S3 Vectors has also expanded its integration capabilities. You can now use AWS CloudFormation to deploy and manage your vector resources, AWS PrivateLink for private network connectivity, and resource tagging for cost allocation and access control.

Pricing and availability
S3 Vectors is now available in 14 AWS Regions, adding Asia Pacific (Mumbai, Seoul, Singapore, Tokyo), Canada (Central), and Europe (Ireland, London, Paris, Stockholm) to the existing five Regions from preview (US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Frankfurt))

Amazon S3 Vectors pricing is based on three dimensions. PUT pricing is calculated based on the logical GB of vectors you upload, where each vector includes its logical vector data, metadata, and key. Storage costs are determined by the total logical storage across your indexes. Query charges include a per-API charge plus a $/TB charge based on your index size (excluding non-filterable metadata). As your index scales beyond 100,000 vectors, you benefit from lower $/TB pricing. As usual, the Amazon S3 pricing page has the details.

To get started with S3 Vectors, visit the Amazon S3 console. You can create vector indexes, start storing your embeddings, and begin building scalable AI applications. For more information, check out the Amazon S3 User Guide or the AWS CLI Command Reference.

I look forward to seeing what you build with these new capabilities. Please share your feedback through AWS re:Post or your usual AWS Support contacts.

— seb

Amazon Bedrock adds 18 fully managed open weight models, including the new Mistral Large 3 and Ministral 3 models

2025-12-02 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/amazon-bedrock-adds-fully-managed-open-weight-models/

Today, we’re announcing the general availability of an additional 18 fully managed open weight models in Amazon Bedrock from Google, Moonshot AI, MiniMax AI, Mistral AI, NVIDIA, OpenAI, and Qwen, including the new Mistral Large 3 and Ministral 3 3B, 8B, and 14B models.

With this launch, Amazon Bedrock now provides nearly 100 serverless models, offering a broad and deep range of models from leading AI companies, so customers can choose the precise capabilities that best serve their unique needs. By closely monitoring both customer needs and technological advancements, we regularly expand our curated selection of models based on customer needs and technological advancements to include promising new models alongside established industry favorites.

This ongoing expansion of high-performing and differentiated model offerings helps customers stay at the forefront of AI innovation. You can access these models on Amazon Bedrock through the unified API, evaluate, switch, and adopt new models without rewriting applications or changing infrastructure.

New Mistral AI models
These four Mistral AI models are now available first on Amazon Bedrock, each optimized for different performance and cost requirements:

Mistral Large 3 – This open weight model is optimized for long-context, multimodal, and instruction reliability. It excels in long document understanding, agentic and tool use workflows, enterprise knowledge work, coding assistance, advanced workloads such as math and coding tasks, multilingual analysis and processing, and multimodal reasoning with vision.
Ministral 3 3B – The smallest in the Ministral 3 family is edge-optimized for single GPU deployment with strong language and vision capabilities. It shows robust performance in image captioning, text classification, real-time translation, data extraction, short content generation, and lightweight real-time applications on edge or low-resource devices.
Ministral 3 8B – The best-in-class Ministral 3 model for text and vision is edge-optimized for single GPU deployment with high performance and minimal footprint. This model is ideal for chat interfaces in constrained environments, image and document description and understanding, specialized agentic use cases, and balanced performance for local or embedded systems.
Ministral 3 14B – The most capable Ministral 3 model delivers state-of the-art text and vision performance optimized for single GPU deployment. You can use advanced local agentic use cases and private AI deployments where advanced capabilities meet practical hardware constraints.

More open weight model options
You can use these open weight models for a wide range of use cases across industries:

Model provider	Model name	Description	Use cases
Google	Gemma 3 4B	Efficient text and image model that runs locally on laptops. Multilingual support for on-device AI applications.	On-device AI for mobile and edge applications, privacy-sensitive local inference, multilingual chat assistants, image captioning and description, and lightweight content generation.
	Gemma 3 12B	Balanced text and image model for workstations. Multi-language understanding with local deployment for privacy-sensitive applications.	Workstation-based AI applications; local deployment for enterprises; multilingual document processing, image analysis and Q&A; and privacy-compliant AI assistants.
	Gemma 3 27B	Powerful text and image model for enterprise applications. Multi-language support with local deployment for privacy and control.	Enterprise local deployment, high-performance multimodal applications, advanced image understanding, multilingual customer service, and data-sensitive AI workflows.
Moonshot AI	Kimi K2 Thinking	Deep reasoning model that thinks while using tools. Handles research, coding and complex workflows requiring hundreds of sequential actions.	Complex coding projects requiring planning, multistep workflows, data analysis and computation, and long-form content creation with research.
MiniMax AI	MiniMax M2	Built for coding agents and automation. Excels at multi-file edits, terminal operations and executing long tool-calling chains efficiently.	Coding agents and integrated development environment (IDE) integration, multi-file code editing, terminal automation and DevOps, long-chain tool orchestration, and agentic software development.
Mistral AI	Magistral Small 1.2	Excels at math, coding, multilingual tasks, and multimodal reasoning with vision capabilities for efficient local deployment.	Math and coding tasks, multilingual analysis and processing, and multimodal reasoning with vision.
	Voxtral Mini 1.0	Advanced audio understanding model with transcription, multilingual support, Q&A, summarization, and function-calling.	Voice-controlled applications, fast speech-to-text conversion, and offline voice assistants.
	Voxtral Small 1.0	Features state-of-the-art audio input with best-in-class text performance; excels at speech transcription, translation, and understanding.	Enterprise speech transcription, multilingual customer service, and audio content summarization.
NVIDIA	NVIDIA Nemotron Nano 2 9B	High efficiency LLM with hybrid transformer Mamba design, excelling in reasoning and agentic tasks.	Reasoning, tool calling, math, coding, and instruction following.
NVIDIA	NVIDIA Nemotron Nano 2 VL 12B	Advanced multimodal reasoning model for video understanding and document intelligence, powering Retrieval-Augmented Generation (RAG) and multimodal agentic applications.	Multi-image and video understanding, visual Q&A, and summarization.
OpenAI	gpt-oss-safeguard-20b	Content safety model that applies your custom policies. Classifies harmful content with explanations for trust and safety workflows.	Content moderation and safety classification, custom policy enforcement, user-generated content filtering, trust and safety workflows, and automated content triage.
OpenAI	gpt-oss-safeguard-120b	Larger content safety model for complex moderation. Applies custom policies with detailed reasoning for enterprise trust and safety teams.	Enterprise content moderation at scale, complex policy interpretation, multilayered safety classification, regulatory compliance checking, high-stakes content review.
Qwen	Qwen3-Next-80B-A3B	Fast inference with hybrid attention for ultra-long documents. Optimized for RAG pipelines, tool use & agentic workflows with quick responses.	RAG pipelines with long documents, agentic workflows with tool calling, code generation and software development, multi-turn conversations with extended context, multilingual content generation.
Qwen	Qwen3-VL-235B-A22B	Understands images and video. Extracts text from documents, converts screenshots to working code, and automates clicking through interfaces.	Extracting text from images and PDFs, converting UI designs or screenshots to working code, automating clicks and navigation in applications, video analysis and understanding, reading charts and diagrams.

When implementing publicly available models, give careful consideration to data privacy requirements in your production environments, check for bias in output, and monitor your results for data security, responsible AI, and model evaluation.

You can access the enterprise-grade security features of Amazon Bedrock and implement safeguards customized to your application requirements and responsible AI policies with Amazon Bedrock Guardrails. You can also evaluate and compare models to identify the optimal models for your use cases by using Amazon Bedrock model evaluation tools.

To get started, you can quickly test these models with a few prompts in the playground of the Amazon Bedrock console or use any AWS SDKs to include access to the Bedrock InvokeModel and Converse APIs. You can also use these models with any agentic framework that supports Amazon Bedrock and deploy the agents using Amazon Bedrock AgentCore and Strands Agents. To learn more, visit Code examples for Amazon Bedrock using AWS SDKs in the Amazon Bedrock User Guide.

Now available
Check the full Region list for availability and future updates of new models or search your model name in the AWS CloudFormation resources tab of AWS Capabilities by Region. To learn more, check out the Amazon Bedrock product page and the Amazon Bedrock pricing page.

Give these models a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Channy

Introducing Amazon EC2 X8aedz instances powered by 5th Gen AMD EPYC processors for memory-intensive workloads

2025-12-02 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/introducing-amazon-ec2-x8aedz-instances-powered-by-5th-gen-amd-epyc-processors-for-memory-intensive-workloads/

Today, we’re announcing the availability of new memory-optimized, high-frequency Amazon Elastic Compute Cloud (Amazon EC2) X8aedz instances powered by a 5th Gen AMD EPYC processor. These instances offer the highest CPU frequency, 5GHz in the cloud. They deliver up to two times higher compute performance and 31% price-performance compared to previous generation X2iezn instances.

X8aedz instances are ideal for electronic design automation (EDA) workloads, such as physical layout and physical verification jobs, and relational databases that benefit from high single-threaded processor performance and a large memory footprint. The combination of 5 GHz processors and local NVMe storage enables faster processing of memory-intensive backend EDA workloads such as floor planning, logic placement, clock tree synthesis (CTS), routing, and power/signal integrity analysis. The high memory-to-vCPU ratio of 32:1 makes these instances particularly effective for applications with vCPU-based licensing models.

Let me explain the instance type naming: The “a” suffix indicates an AMD processor, “e” denotes extended memory in the memory-optimized instance family, “d” represents local NVMe-based SSDs physically connected to the host server, and “z” indicates high-frequency processors.

X8aedz instances
X8aedz instances are available in eight sizes ranging from 2–96 vCPUs with 64–3,072 GiB of memory, including two bare metal sizes. X8aedz instances feature up to 75 Gbps of network bandwidth with support for the Elastic Fabric Adapter (EFA), up to 60 Gbps of throughput to the Amazon Elastic Block Store (Amazon EBS), and up to 8 TB of local NVMe SSD storage.

Here are the specs for X8aedz instances:

Instance name	vCPUs	Memory (GiB)	NVMe SSD storage (GB)	Network bandwidth (Gbps)	EBS bandwidth (Gbps)
x8aedz.large	2	64	158	Up to 18.75	Up to 15
x8aedz.xlarge	4	128	316	Up to 18.75	Up to 15
x8aedz.3xlarge	12	384	950	Up to 18.75	Up to 15
x8aedz.6xlarge	24	768	1,900	18.75	15
x8aedz.12xlarge	48	1,536	3,800	37.5	30
x8aedz.24xlarge	96	3,072	7,600	75	60
x8aedz.metal-12xl	48	1,536	3,800	37.5	30
x8aedz.metal-24xl	96	3,072	7,600	75	60

With the 60 Gbps Amazon EBS bandwidth and up to 8 TB of local NVMe SSD storage, you can achieve faster database response times and reduced latency for EDA operations, ultimately accelerating time-to-market for chip designs. These instances also support the instance bandwidth configuration feature that offers flexibility in allocating resources between network and EBS bandwidth. You can scale network or EBS bandwidth by 25% and improve database (read and write) performance, query processing, and logging speeds.

X8aedz instances use sixth-generation AWS Nitro cards, which offload CPU virtualization, storage, and networking functions to dedicated hardware and software, enhancing performance and security for your workloads.

Now available
Amazon EC2 X8aedz instances are now available in US West (Oregon) and Asia Pacific (Tokyo) AWS Regions, and additional Regions will be coming soon. For Regional availability and future roadmap, search the instance type in the AWS CloudFormation resources tab of the AWS Capabilities by Region.

You can purchase these instances as On-Demand, Savings Plan, Spot Instances, and Dedicated Instances. To learn more, visit the Amazon EC2 Pricing page.

Give X8aedz instances a try in the Amazon EC2 console. To learn more, visit the Amazon EC2 X8aedz instances page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Channy

AWS DevOps Agent helps you accelerate incident response and improve system reliability (preview)

2025-12-02 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-devops-agent-helps-you-accelerate-incident-response-and-improve-system-reliability-preview/

Today, we’re announcing the public preview of AWS DevOps Agent, a frontier agent that helps you respond to incidents, identify root causes, and prevent future issues through systematic analysis of past incidents and operational patterns.

Frontier agents represent a new class of AI agents that are autonomous, massively scalable, and work for hours or days without constant intervention.

When production incidents occur, on-call engineers face significant pressure to quickly identify root causes while managing stakeholder communications. They must analyze data across multiple monitoring tools, review recent deployments, and coordinate response teams. After service restoration, teams often lack bandwidth to transform incident learnings into systematic improvements.

AWS DevOps Agent is your always-on, autonomous on-call engineer. When issues arise, it automatically correlates data across your operational toolchain, from metrics and logs to recent code deployments in GitHub or GitLab. It identifies probable root causes and recommends targeted mitigations, helping reduce mean time to resolution. The agent also manages incident coordination, using Slack channels for stakeholder updates and maintaining detailed investigation timelines.

To get started, you connect AWS DevOps Agent to your existing tools through the AWS Management Console. The agent works with popular services such as Amazon CloudWatch, Datadog, Dynatrace, New Relic, and Splunk for observability data, while integrating with GitHub Actions and GitLab CI/CD to track deployments and their impact on your cloud resources. Through the bring your own (BYO) Model Context Protocol (MCP) server capability, you can also integrate additional tools such as your organization’s custom tools, specialized platforms or open source observability solutions, such as Grafana and Prometheus into your investigations.

The agent acts as a virtual team member and can be configured to automatically respond to incidents from your ticketing systems. It includes built-in support for ServiceNow, and through configurable webhooks, can respond to events from other incident management tools like PagerDuty. As investigations progress, the agent updates tickets and relevant Slack channels with its findings. All of this is powered by an intelligent application topology the agent builds—a comprehensive map of your system components and their interactions, including deployment history that helps identify potential deployment-related causes during investigations.

Let me show you how it works
To show you how it works, I deployed a straigthforward AWS Lambda function that intentionally generates errors when invoked. I deployed it in an AWS CloudFormation stack.

Step 1: Create an Agent Space

An Agent Space defines the scope of what AWS DevOps Agent can access as it performs tasks.

You can organize Agent Spaces based on your operational model. Some teams align an Agent Space with a single application, others create one per on-call team managing multiple services, and some organizations use a centralized approach. For this demonstration, I’ll show you how to create an Agent Space for a single application. This setup helps isolate investigations and resources for that specific application, making it easier to track and analyze incidents within its context.

In the AWS DevOps Agent section of the AWS Management Console, I select Create Agent Space, enter a name for this space and create the AWS Identity and Access Management (IAM) roles it uses to introspect AWS resources in my or others’ AWS accounts.

For this demo, I choose to enable the AWS DevOps Agent web app; more about this later. This can be done at a later stage.

When ready, I choose Create.

After it has been created, I choose the Topology tab.

This view shows the key resources, entities, and relationships AWS DevOps Agent has selected as a foundation for performing its tasks efficiently. It doesn’t represent everything AWS DevOps Agent can access or see, only what the Agent considers most relevant right now. By default, the Topology includes the AWS resources that are contained in my account. As your agent completes more tasks, it will discover and add new resources to this list.

Step 2: Configure the AWS DevOps web app for the operators

The AWS DevOps Agent web app provides a web interface for on-call engineers to manually trigger investigations, view investigation details including relevant topology elements, steer investigations, and ask questions about an investigation.

I can access the web app directly from my Agent Space in the AWS console by choosing the Operator access link. Alternatively, I can use AWS IAM Identity Center to configure user access for my team. IAM Identity Center lets me manage users and groups directly or connect to an identity provider (IdP), providing a centralized way to control who can access the AWS DevOps Agent web app.

At this stage, I have an Agent Space all set up to focus investigations and resources for this speciﬁc application, and I’ve enabled the DevOps team to initiate investigations using the web app.

Now that the one-time setup for this application is done, I start invoking the faulty Lambda function. It generates errors at each invocation. The CloudWatch alarm associated with the Lambda errors count turns on to ALARM state. In real life, you might receive an alert from external services, such as ServiceNow. You can configure AWS DevOps Agent to automatically start investigations when receiving such alerts.

For this demo, I manually start the investigation by selecting Start Investigation.

You can also choose from several preconfigured starting points to quickly begin your investigation: Latest alarm to investigate your most recent triggered alarm and analyze the underlying metrics and logs to determine the root cause, High CPU usage to investigate high CPU utilization metrics across your compute resources and identify which processes or services are consuming excessive resources, or Error rate spike to investigate the recent increase in application error rates by analyzing metrics, application logs, and identifying the source of failures.

I enter some information, such as Investigation details, Investigation starting point, the Date and time of the incident, the AWS Account ID for the incident.

In the AWS DevOps Agent web app, you can watch the investigation unfold in real time. The agent identifies the application stack. It correlates metrics from CloudWatch, examines logs from CloudWatch Logs or external sources, such as Splunk, reviews recent code changes from GitHub, and analyzes traces from AWS X-Ray.

It identifies the error patterns and provides a detailed investigation summary. In the context of this demo, the investigation reveals that these are intentional test exceptions, shows the timeline of function invocations leading to the alarm, and even suggests monitoring improvements for error handling.

The agent uses a dedicated incident channel in Slack, notifies on-call teams if needed, and provides real-time status updates to stakeholders. Through the investigation chat interface, you can interact directly with the agent by asking clarifying questions such as “which logs did you analyze?” or steering the investigation by providing additional context, such as “focus on these specific log groups and rerun your analysis.” If you need expert assistance, you can create an AWS Support case with a single click, automatically populating it with the agent’s findings, and engage with AWS Support experts directly through the investigation chat window.

For this demo, the AWS DevOps Agent correctly identified manual activities in the Lambda console to invoke a function that intentionally triggers errors .

Beyond incident response, AWS DevOps Agent analyzes my recent incidents to identify high-impact improvements that prevent future issues.

During active incidents, the agent offers immediate mitigation plans through its incident mitigations tab to help restore service quickly. Mitigation plans consist of specs that provide detailed implementation guidance for developers and agentic development tools like Kiro.

For longer-term resilience, it identifies potential enhancements by examining gaps in observability, infrastructure configurations, and deployment pipeline. My straightforward demo that triggered intentional errors was not enough to generate relevant recommendations though.

For example, it might detect that a critical service lacks multi-AZ deployment and comprehensive monitoring. The agent then creates detailed recommendations with implementation guidance, considering factors like operational impact and implementation complexity. In an upcoming quick follow-up release, the agent will expand its analysis to include code bugs and testing coverage improvements.

Availability
You can try AWS DevOps Agent today in the US East (N. Virginia) Region. Although the agent itself runs in US East (N. Virginia) (us-east-1), it can monitor applications deployed in any Region, across multiple AWS accounts.

During the preview period, you can use AWS DevOps Agent at no charge, but there will be a limit on the number of agent task hours per month.

As someone who has spent countless nights debugging production issues, I’m particularly excited about how AWS DevOps Agent combines deep operational insights with practical, actionable recommendations. The service helps teams move from reactive firefighting to proactive system improvement.

To learn more and sign up for the preview, visit AWS DevOps Agent. I look forward to hearing how AWS DevOps Agent helps improve your operational efficiency.

— seb

Accelerate AI development using Amazon SageMaker AI with serverless MLflow

2025-12-02 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/accelerate-ai-development-using-amazon-sagemaker-ai-with-serverless-mlflow/

Since we announced Amazon SageMaker AI with MLflow in June 2024, our customers have been using MLflow tracking servers to manage their machine learning (ML) and AI experimentation workflows. Building on this foundation, we’re continuing to evolve the MLflow experience to make experimentation even more accessible.

Today, I’m excited to announce that Amazon SageMaker AI with MLflow now includes a serverless capability that eliminates infrastructure management. This new MLflow capability transforms experiment tracking into an immediate, on-demand experience with automatic scaling that removes the need for capacity planning.

The shift to zero-infrastructure management fundamentally changes how teams approach AI experimentation—ideas can be tested immediately without infrastructure planning, enabling more iterative and exploratory development workflows.

Getting started with Amazon SageMaker AI and MLflow
Let me walk you through creating your first serverless MLflow instance.

I navigate to Amazon SageMaker AI Studio console and select the MLflow application. The term MLflow Apps replaces the previous MLflow tracking servers terminology, reflecting the simplified, application-focused approach.

Here, I can see there’s already a default MLflow App created. This simplified MLflow experience makes it more straightforward for me to start doing experiments.

I choose Create MLflow App, and enter a name. Here, I have both an AWS Identity and Access Management (IAM) role and Amazon Simple Service (Amazon S3) bucket are already been configured. I only need to modify them in Advanced settings if needed.

Here’s where the first major improvement becomes apparent—the creation process completes in approximately 2 minutes. This immediate availability enables rapid experimentation without infrastructure planning delays, eliminating the wait time that previously interrupted experimentation workflows.

After it’s created, I receive an MLflow Amazon Resource Name (ARN) for connecting from notebooks. The simplified management means no server sizing decisions or capacity planning required. I no longer need to choose between different configurations or manage infrastructure capacity, which means I can focus entirely on experimentation. You can learn how to use MLflow SDK at Integrate MLflow with your environment in the Amazon SageMaker Developer Guide.

With MLflow 3.4 support, I can now access new capabilities for generative AI development. MLflow Tracing captures detailed execution paths, inputs, outputs, and metadata throughout the development lifecycle, enabling efficient debugging across distributed AI systems.

This new capability also introduces cross-domain access and cross-account access through AWS Resource Access Manager (AWS RAM) share. This enhanced collaboration means that teams across different AWS domains and accounts can share MLflow instances securely, breaking down organizational silos.

Better together: Pipelines integration
Amazon SageMaker Pipelines is integrated with MLflow. SageMaker Pipelines is a serverless workflow orchestration service purpose-built for machine learning operations (MLOps) and large language model operations (LLMOps) automation—the practices of deploying, monitoring, and managing ML and LLM models in production. You can easily build, execute, and monitor repeatable end-to-end AI workflows with an intuitive drag-and-drop UI or the Python SDK.

From a pipeline, a default MLflow App will be created if one doesn’t already exist. The experiment name can be defined and metrics, parameters, and artifacts are logged to the MLflow App as defined in your code. SageMaker AI with MLflow is also integrated with familiar SageMaker AI model development capabilities like SageMaker AI JumpStart and Model Registry, enabling end-to-end workflow automation from data preparation through model fine-tuning.

Things to know
Here are key points to note:

Pricing – The new serverless MLflow capability is offered at no additional cost. Note there are service limits that apply.
Availability – This capability is available in the following AWS Regions: US East (N. Virginia, Ohio), US West (N.California, Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, Stockholm), South America (São Paulo).
Automatic upgrades: MLflow in-place version upgrades happen automatically, providing access to the latest features without manual migration work or compatibility concerns. The service currently supports MLflow 3.4, providing access to the latest capabilities including enhanced tracing features.
Migration support – You can use the open source MLflow export-import tool available at mlflow-export-import to help migrate from existing Tracking Servers, whether they’re from SageMaker AI, self-hosted, or otherwise to serverless MLflow (MLflow Apps).

Get started with serverless MLflow by visiting Amazon SageMaker AI Studio and creating your first MLflow App. Serverless MLflow is also supported in SageMaker Unified Studio for additional workflow flexibility.

Happy experimenting!
— Donnie

Introducing Amazon Nova 2 Lite, a fast, cost-effective reasoning model

2025-12-02 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-nova-2-lite-a-fast-cost-effective-reasoning-model/

Today, we’re releasing Amazon Nova 2 Lite, a fast, cost-effective reasoning model for everyday workloads. Available in Amazon Bedrock, the model offers industry-leading price performance and helps enterprises and developers build capable, reliable, and efficient agentic-AI applications. For organizations who need AI that truly understands their domain, Nova 2 Lite is the best model to use with Nova Forge to build their own frontier intelligence.

Nova 2 Lite supports extended thinking, including step-by-step reasoning and task decomposition, before providing a response or taking action. Extended thinking is off by default to deliver fast, cost-optimized responses, but when deeper analysis is needed, you can turn it on and choose from three thinking budget levels: low, medium, or high, giving you control over the speed, intelligence, and cost tradeoff.

Nova 2 Lite supports text, image, video, document as input and offers a one million-token context window, enabling expanded reasoning and richer in-context learning. In addition, Nova 2 Lite can be customized for your specific business needs. The model also includes access to two built-in tools: web grounding and a code interpreter. Web grounding retrieves publicly available information with citations, while the code interpreter allows the model to run and evaluate code within the same workflow.

Amazon Nova 2 Lite demonstrates strong performance across diverse evaluation benchmarks. The model excels in core intelligence across multiple domains including instruction following, math, and video understanding with temporal reasoning. For agentic workflows, Nova 2 Lite shows reliable function calling for task automation and precise UI interaction capabilities. The model also demonstrates strong code generation and practical software engineering problem-solving abilities.

Nova 2 Lite is built to meet your company’s needs
Nova 2 Lite can be used for a broad range of your everyday AI tasks. It offers the best combination of price, performance, and speed. Early customers are using Nova 2 Lite for customer service chatbots, document processing, and business process automation.

Nova 2 Lite can help support workloads across many different use cases:

Business applications – Automate business process workflow, intelligent document processing (IDP), customer support, and web search to improve productivity and outcomes
Software engineering – Generate code, debugging, refactoring, and migrating systems to accelerate development and increase efficiency
Business intelligence and research – Use long-horizon reasoning and web grounding to analyze internal and external sources to uncover insights, and make informed decisions

For specific requirements, Nova 2 Lite is also available for customization on both Amazon Bedrock and Amazon SageMaker AI.

Using Amazon Nova 2 Lite
In the Amazon Bedrock console, you can use the Chat/Text playground to quickly test the new model with your prompts. To integrate the model into your applications, you can use any AWS SDKs with the Amazon Bedrock InvokeModel and Converse API. Here’s a sample invocation using the AWS SDK for Python (Boto3).

import boto3

AWS_REGION="us-east-1"
MODEL_ID="global.amazon.nova-2-lite-v1:0"
MAX_REASONING_EFFORT="low" # low, medium, high

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Enable extended thinking for complex problem-solving
response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=[{
        "role": "user",
        "content": [{"text": "I need to optimize a logistics network with 5 warehouses, 12 distribution centers, and 200 retail locations. The goal is to minimize total transportation costs while ensuring no location is more than 50 miles from a distribution center. What approach should I take?"}]
    }],
    additionalModelRequestFields={
        "reasoningConfig": {
            "type": "enabled", # enabled, disabled (default)
            "maxReasoningEffort": MAX_REASONING_EFFORT
        }
    }
)

# The response will contain reasoning blocks followed by the final answer
for block in response["output"]["message"]["content"]:
    if "reasoningContent" in block:
        reasoning_text = block["reasoningContent"]["reasoningText"]["text"]
        print(f"Nova's thinking process:\n{reasoning_text}\n")
    elif "text" in block:
        print(f"Final recommendation:\n{block['text']}")

You can also use the new model with agentic frameworks that supports Amazon Bedrock and deploy the agents using Amazon Bedrock AgentCore. In this way, you can build agents for a broad range of tasks. Here’s the sample code for an interactive multi-agent system using the Strands Agents SDK. The agents have access to multiple tools, including read and write file access and the possibility to run shell commands.

from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator, editor, file_read, file_write, shell, http_request, graph, swarm, use_agent, think

AWS_REGION="us-east-1"
MODEL_ID="global.amazon.nova-2-lite-v1:0"
MAX_REASONING_EFFORT="low" # low, medium, high

SYSTEM_PROMPT = (
    "You are a helpful assistant. "
    "Follow the instructions from the user. "
    "To help you with your tasks, you can dynamically create specialized agents and orchestrate complex workflows."
)

bedrock_model = BedrockModel(
    region_name=AWS_REGION,
    model_id=MODEL_ID,
    additional_request_fields={
        "reasoningConfig": {
            "type": "enabled", # enabled, disabled (default)
            "maxReasoningEffort": MAX_REASONING_EFFORT
        }
    }
)

agent = Agent(
    model=bedrock_model,
    system_prompt=SYSTEM_PROMPT,
    tools=[calculator, editor, file_read, file_write, shell, http_request, graph, swarm, use_agent, think]
)

while True:
    try:
        prompt = input("\nEnter your question (or 'quit' to exit): ").strip()
        if prompt.lower() in ['quit', 'exit', 'q']:
            break
        if len(prompt) > 0:
            agent(prompt)
    except KeyboardInterrupt:
        break
    except EOFError:
        break

print("\nGoodbye!")

Things to know
Amazon Nova 2 Lite is now available in Amazon Bedrock via global cross-Region inference in multiple locations. For Regional availability and future roadmap, visit AWS Capabilities by Region.

Nova 2 Lite includes built-in safety controls to promote responsible AI use, with content moderation capabilities that help maintain appropriate outputs across a wide range of applications.

To understand the costs, see Amazon Bedrock pricing. To learn more, visit the Amazon Nova User Guide.

Start building with Nova 2 Lite today. To experiment with the new model, visit the Amazon Nova interactive website. Try the model in the Amazon Bedrock console, and share your feedback on AWS re:Post.

— Danilo

New AWS Security Agent secures applications proactively from design to deployment (preview)

2025-12-02 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/new-aws-security-agent-secures-applications-proactively-from-design-to-deployment-preview/

Today, we’re announcing AWS Security Agent in preview, a frontier agent that proactively secures your applications throughout the development lifecycle. It conducts automated application security reviews tailored to your organizational requirements and delivers context-aware penetration testing on demand. By continuously validating application security from design to deployment, it helps prevent vulnerabilities early in development.

Static application security testing (SAST) tools examine code without runtime context, whereas dynamic application security testing (DAST) tools assess running applications without application-level context. Both types of tools are one-dimensional because they don’t understand your application context. They don’t understand how your application is designed, what security threats it faces, and where and how it runs. This forces security teams to manually review everything, creating delays. Penetration testing is even slower—you either wait weeks for an external vendor or your internal security team to find time. When every application requires a manual security review and penetration test, the backlog grows quickly. Applications wait weeks or months for security validation before they can launch. This creates a gap between the frequency of software releases and the frequency of security evaluations. Security is not applied to the entire portfolio of applications, leaving customers exposed and knowingly shipping vulnerable code to meet deadlines. Over 60 percent of organizations update web applications weekly or more often, while nearly 75 percent test web applications monthly or less often. A 2025 report from Checkmarx found that 81 percent of organizations knowingly deploy vulnerable code to meet delivery deadlines.

AWS Security Agent is context-aware—it understands your entire application. It understands your application design, your code, and your specific security requirements. It continuously scans for security violations automatically and runs penetration tests on-demand instantly without scheduling. The penetration testing agent creates a customized attack plan informed by the context it has learned from your security requirements, design documents, and source code, and dynamically adapts as it runs based on what it discovers, such as endpoints, status and error codes, and credentials. This helps surface deeper, more sophisticated vulnerabilities before production, ensuring your application is secure before it launches without delays or surprises.

“SmugMug is excited to add AWS Security Agent to our automated security portfolio. AWS Security Agent transforms our security ROI by enabling pen test assessments that complete in hours rather than days, at a fraction of manual testing costs. We can now assess our services more frequently, dramatically decreasing the time to identify and address issues earlier in the software development lifecycle.” says Erik Giberti, Sr. Director of Product Engineering at SmugMug.

Get started with AWS Security Agent
AWS Security Agent provides design security review, code security review, and on-demand penetration testing capabilities. Design and code review check organizational security requirements that you define, and penetration testing learns application context from source code and specifications to identify vulnerabilities. To get started, navigate to the AWS Security Agent console. The console landing page provides an overview of how AWS Security Agent delivers continuous security assessment across your development lifecycle.

The Get started with AWS Security Agent panel on the right side of the landing page guides you through initial configuration. Choose Set up AWS Security Agent to create your first agent space and begin performing security reviews on your applications.

Provide an Agent space name to identify which agent you’re interacting with across different security assessments. An agent space is an organizational container that represents a distinct application or project you want to secure. Each agent space has its own testing scope, security configuration, and dedicated web application domain. We recommend creating one agent space per application or project to maintain clear boundaries and organized security assessments. You can optionally add a Description to provide context about the agent space’s purpose for other administrators.

When you create the first agent space in the AWS Management Console, AWS creates the Security Agent Web Application. The Security Agent Web Application is where users conduct design reviews and execute penetration tests within the boundaries established by administrators in the console. Users select which agent space to work in when conducting design reviews or executing penetration tests.

During the setup process, AWS Security Agent offers two options for managing user access to the Security Agent Web Application: Single Sign-On (SSO) with IAM Identity Center, which enables team-wide SSO access by integrating with AWS IAM Identity Center, or IAM users, which allows only AWS Identity and Access Management (IAM) users of this AWS account to access the Security Agent Web Application directly through the console and is best for quick setup or access without SSO configuration. When you choose the SSO option, AWS Security Agent creates an IAM Identity Center instance to provide centralized authentication and user management for AppSec team members who will access design reviews, code reviews, and penetration testing capabilities through the Security Agent Web Application.

The permissions configuration section helps you control how AWS Security Agent accesses other AWS services, APIs, and accounts. You can create a default IAM role that AWS Security Agent will use to access resources, or choose an existing role with appropriate permissions.

After completing the initial configuration, choose Set up AWS Security Agent to create the agent.

After creating an agent space, the agent configuration page displays three capability cards: Design review, Code review, and Penetration testing. While not required to operate the penetration testing, if you plan to use design review or code review capabilities, you can configure which security requirements will guide those assessments. AWS Security Agent includes AWS managed requirements, and you can optionally define custom requirements tailored to your organization. You can also manage which team members have access to the agent.

Security requirements
AWS Security Agent enforces organizational security requirements that you define so that applications comply with your team’s policies and standards. Security requirements specify the controls and policies that your applications must follow during both design and code review phases.

To manage security requirements, navigate to Security requirements in the navigation pane. These requirements are shared across all agent spaces and apply to both design and code reviews.

Managed security requirements are based on industry standards and best practices. These requirements are ready to use, maintained by AWS, and you can enable them instantly without configuration.

When creating a custom security requirement, you specify the control name and description that defines the policy. For example, you might create a requirement called Network Segmentation Strategy Defined that requires designs to define clear network segmentation separating workload components into logical layers based on data sensitivity. Or you might define Short Session Timeouts for Privileged and PII Access to mandate specific timeout durations for administrative and personally identifiable information (PII) access. Another example is Customer-Managed Encryption Keys Required, which requires designs to specify customer managed AWS Key Management Service (AWS KMS) keys rather than AWS managed keys for encrypting sensitive data at rest. AWS Security Agent evaluates designs and code against these enabled requirements, identifying policy violations.

Design security review
The design review capability analyzes architectural documents and product specifications to identify security risks before code is written. AppSec teams upload design documents through the AWS Security Agent console or ingest them from S3 and other connected services. AWS Security Agent assesses compliance with organizational security requirements and provides remediation guidance.

Before conducting design reviews, confirm you’ve configured the security requirements that AWS Security Agent will check. You can get started with AWS managed security requirements or define custom requirements tailored to your organization, as described in the Security requirements section.

To get started with the Design review, choose Admin access under Web app access to access the web app interface. When logged in, choose Create design review. Enter a Design review name to identify the assessment—for example, when assessing a new feature design that extends your application—and upload up to five design files. Choose Start design review to begin the assessment against your enabled security requirements.

After completing a design review, the design review detail page displays the review status, completion date, and files reviewed in the Details section. The Findings summary shows the count of findings across four compliance status categories:

Non-compliant – The design violates or inadequately addresses the security requirement.
Insufficient data – The uploaded files don’t contain enough information to determine compliance.
Compliant – The design meets the security requirement based on the uploaded documentation.
Not applicable – The security requirement’s relevance criteria indicate it doesn’t apply to this system design.

The Findings summary section helps you quickly assess which security requirements need attention. Non-compliant findings require updates to your design documents, while Insufficient data findings indicate gaps in the documentation where security teams should work with application teams to gather additional clarity before AWS Security Agent can complete the assessment.

The Files reviewed section displays all uploaded documents with options to search and download the original files.

The Review findings section lists each security requirement evaluated during the review along with its compliance status. In this example, the findings include Network Segmentation Strategy Defined, Customer-Managed Encryption Keys Required, and Short Session Timeouts for Privileged and PII Access. These are the custom security requirements defined earlier in the Security requirements section. You can search for specific security requirements or filter findings by compliance status to focus on items that require action.

When you choose a specific finding, AWS Security Agent displays detailed justification explaining the compliance status and provides recommended remediation steps. This context-aware analysis helps you understand security concerns specific to your design rather than generic security guidance. For designs with noncompliant findings, you can update your documentation to address the security requirements and create a new design review to validate the improvements. You can also choose Clone this design review to create a new assessment based on the current configuration or choose Download report to export the complete findings for sharing with your team.

After validating that your application design meets organizational security requirements, the next step is enforcing those same requirements as developers write code.

Code security review
The code review capability analyzes pull requests in GitHub to identify security vulnerabilities and organizational policy violations. AWS Security Agent detects OWASP Top Ten common vulnerabilities such as SQL injection, cross-site scripting, and inadequate input validation. It also enforces the same organizational security requirements used in design review, implementing code compliance with your team’s policies beyond common vulnerabilities.

When your application checks in new code, AWS Security Agent verifies compliance with organizational security requirements that go beyond common vulnerabilities. For example, if your organization requires audit logs to be retained for only 90 days, AWS Security Agent identifies when code configures a 365-day retention period and comments on the pull request with the specific violation. This catches policy violations that traditional security tools miss because the code is technically functional and secure.

To enable code review, choose Enable code review on the agent configuration page and connect your GitHub repositories. You can enable code review for specific repositories or connect repositories without enabling code review if you want to use them for penetration testing context instead.

For detailed setup instructions, visit the AWS Security Agent documentation.

On-demand penetration testing
The on-demand penetration testing capability executes comprehensive security testing to discover and validate vulnerabilities through multistep attack scenarios. AWS Security Agent systematically discovers the application’s attack surface through reconnaissance and endpoint enumeration, then deploys specialized agents to execute security testing across 13 risk categories, including authentication, authorization, and injection attacks. When provided source code, API specifications, and business documentation, AWS Security Agent builds deeper context about the application’s architecture and business rules to generate more targeted test cases. It adapts testing based on application responses and adjusts attack strategies as it discovers new information during the assessment.

AWS Security Agent tests web applications and APIs against OWASP Top Ten vulnerability types, identifying exploitable issues that static analysis tools miss. For example, while dynamic application security testing (DAST) tools look for direct server-side template injection (SSTI) payloads, AWS Security Agent can combine SSTI attacks with error forcing and debug output analysis to execute more complex exploits. AppSec teams define their testing scope—target URLs, authentication details, threat models, and documentation—the same as they would brief a human penetration tester. Using this understanding, AWS Security Agent develops application context and autonomously executes sophisticated attack chains to discover and validate vulnerabilities. This transforms penetration testing from a periodic bottleneck into a continuous security practice, reducing risk exposure.

To enable penetration testing, choose Enable penetration test on the agent configuration page. You can configure target domains, VPC settings for private endpoints, authentication credentials, and additional context sources such as GitHub repositories or S3 buckets. You must verify ownership of each domain before AWS Security Agent can run penetration testing.

After enabling the capability, create and run penetration tests through the AWS Security Agent Web Application. For detailed setup and configuration instructions, visit the AWS Security Agent documentation.

After creating and running a penetration test, the detail page provides an overview of test execution and results. You can run new tests or modify the configuration from this page. The page displays information about the most recent execution, including start time, status, duration, and a summary of discovered vulnerabilities categorized by severity. You can also view a history of all previous test runs with their findings summaries.

For each run, the detail page provides three tabs. The Penetration test run overview tab displays high-level information about the execution, including duration and overall status. The Penetration test logs tab lists all tasks executed during the penetration test, providing visibility into how AWS Security Agent discovered vulnerabilities, including the security testing actions performed, application responses, and the reasoning behind each test. The Findings tab displays all discovered vulnerabilities with complete details, including descriptions, attack reasoning, steps to reproduce, impact, and remediation guidance.

Join the preview
To get started with AWS Security Agent, visit the AWS Security Agent console and create your first agent to begin automating design reviews, code reviews, and penetration testing across your development lifecycle. During the preview period, AWS Security Agent is free of charge.

AWS Security Agent is available in the US East (N. Virginia) Region.

To learn more, visit the AWS Security Agent product page and technical documentation.

— Esra

Noise

Tag Archives: news

Welcoming Ryan Smith as the New Managing Editor at STH

AI Data Center Markets are so Big that Micron is Exiting its Crucial Consumer Business

Amazon Bedrock adds reinforcement ﬁne-tuning simplifying how developers build smarter, more accurate AI models

New serverless customization in Amazon SageMaker AI accelerates model fine-tuning

Introducing checkpointless and elastic training on Amazon SageMaker HyperPod

NVIDIA NVLink Fusion Tapped for Future AWS Trainium4 Deployments

Announcing replication support and Intelligent-Tiering for Amazon S3 Tables

Amazon Bedrock AgentCore adds quality evaluations and policy controls for deploying trusted AI agents

Build multi-step applications and AI workflows with AWS Lambda durable functions

Introducing Database Savings Plans for AWS Databases

Amazon CloudWatch introduces unified data management and analytics for operations, security, and compliance

New and enhanced AWS Support plans add AI capabilities to expert guidance

Amazon OpenSearch Service improves vector database performance and cost with GPU acceleration and auto-optimization

Amazon S3 Vectors now generally available with increased scale and performance

Amazon Bedrock adds 18 fully managed open weight models, including the new Mistral Large 3 and Ministral 3 models

Introducing Amazon EC2 X8aedz instances powered by 5th Gen AMD EPYC processors for memory-intensive workloads

AWS DevOps Agent helps you accelerate incident response and improve system reliability (preview)

Accelerate AI development using Amazon SageMaker AI with serverless MLflow

Introducing Amazon Nova 2 Lite, a fast, cost-effective reasoning model

New AWS Security Agent secures applications proactively from design to deployment (preview)

The collective thoughts of the interwebz