Tag Archives: announcements

Introducing the Amazon OpenSearch Lens for the AWS Well-Architected Framework

2025-11-12 Muslim Abu-Taha

Post Syndicated from Muslim Abu-Taha original https://aws.amazon.com/blogs/big-data/introducing-the-amazon-opensearch-lens-for-the-aws-well-architected-framework/

Earlier this year, we released the Amazon OpenSearch Service Lens, an AWS Well-Architected whitepaper. The AWS Well-Architected Framework provides a consistent approach for evaluating architectures and implementing scalable designs. Using this framework, the Amazon OpenSearch Service Lens outlines how to perform AWS Well-Architected reviews to assess and identify technical risks in your OpenSearch Service deployments.

In this post, we show you how to use the Amazon OpenSearch Service Lens to evaluate your OpenSearch Service workloads against architectural best practices.

Understanding the AWS Well-Architected Framework

At AWS, a well-architected cloud environment is fundamental to helping you achieve your business outcomes. The AWS Well-Architected Framework represents the collective experience of AWS from working with organizations across industries, distilled into a structured approach for evaluating architectures and implementing designs that scale over time. The AWS Well-Architected Framework is built on six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. Using the Framework, cloud architects, system builders, engineers, and developers can build secure, high-performing, resilient, and efficient infrastructure for their applications and workloads.

OpenSearch Service Lens

The OpenSearch Service Lens is a collection of customer-proven design principles and best practices to help you adopt a cloud-native approach to using Amazon OpenSearch Service. These recommendations are based on insights that AWS has gathered from customers, AWS Partners, the community, and our own AWS OpenSearch technical specialist communities.

The OpenSearch Service Lens extends the AWS Well-Architected Framework to help you address critical architectural questions specific to Amazon OpenSearch Service workloads, for example:

How do you size and configure Amazon OpenSearch Service domains for optimal performance?
What data retention and lifecycle management strategies help balance cost and accessibility?
How do you implement security controls that protect sensitive data while maintaining search functionality?
What operational practices ensure reliable search experiences as your data volumes grow?
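The first question, about sizing domains, usually starts with arithmetic. The following Python sketch applies the rule-of-thumb formulas from the Amazon OpenSearch Service sizing guidance, as we understand that guidance:

- Minimum storage is roughly source data × (1 + replicas) × 1.45.
- The number of primary shards is roughly (source data + growth) × 1.1 / target shard size.

The 1.45 and 1.1 factors and the 30 GiB default shard target are assumptions taken from that guidance, not values defined by the lens. Check them against the current documentation before you size a production domain.

```python
import math


def minimum_storage_gib(source_gib: float, replicas: int,
                        overhead_factor: float = 1.45) -> float:
    """Estimate the minimum storage a domain needs.

    overhead_factor (assumed 1.45) folds indexing overhead, reserved OS space,
    and service overhead into a single multiplier.
    """
    if source_gib < 0 or replicas < 0:
        raise ValueError("source_gib and replicas must be non-negative")
    return source_gib * (1 + replicas) * overhead_factor


def primary_shard_count(source_gib: float, growth_gib: float = 0.0,
                        target_shard_gib: float = 30.0,
                        indexing_overhead: float = 0.10) -> int:
    """Estimate primary shards so that each shard lands near target_shard_gib."""
    if target_shard_gib <= 0:
        raise ValueError("target_shard_gib must be positive")
    total_gib = (source_gib + growth_gib) * (1 + indexing_overhead)
    # Always keep at least one primary shard.
    return max(1, math.ceil(total_gib / target_shard_gib))


if __name__ == "__main__":
    # 200 GiB of source data with one replica.
    print(f"{minimum_storage_gib(200, replicas=1):.1f} GiB")  # 580.0 GiB
    print(primary_shard_count(200, growth_gib=100))
```

These estimates are starting points only. Validate them with representative load testing before you choose instance types.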

The OpenSearch Service Lens joins a collection of AWS Well-Architected Lenses that focus on specialized workloads such as the Internet of Things (IoT), games, artificial intelligence (AI) and machine learning (ML), SAP, and serverless technology.

The lens highlights some of the most common areas for assessment and improvement. It is designed to align with and provide insights across the six pillars of the AWS Well-Architected Framework:

Operational excellence focuses on running and monitoring systems to deliver business value and on continually improving processes and procedures. This pillar includes the ability to support development and run workloads effectively, gain insights into their operations, and continuously improve supporting processes to deliver business value.
Security focuses on protecting your data and systems. This pillar addresses implementing fine-grained access control for users and applications, securing domain access through encryption and network controls, detecting and mitigating vulnerabilities, reducing potential attack surfaces, and protecting sensitive data.
Reliability focuses on ensuring that a workload performs its intended function correctly and consistently when it’s expected to. This pillar includes implementing automatic disaster recovery mechanisms, designing multi-Availability Zone deployments for high availability, scaling domain capacity to meet demand, and using automation for operational tasks to reduce human error. It also covers implementing backup and restore strategies, managing cluster state, and setting up monitoring and alerting to maintain service performance and availability.
Performance efficiency focuses on using Amazon OpenSearch Service resources effectively. This includes selecting appropriate instance types and storage options based on your workload requirements, implementing performance monitoring and optimization strategies, and using OpenSearch Service features to reduce operational overhead. It also covers tuning domain configurations, managing data indexing patterns, and optimizing search and analytics queries to achieve the best possible performance while maintaining cost efficiency.
Cost optimization focuses on managing expenses effectively. This topic addresses implementing cost allocation tags to track domain expenses by workload, selecting appropriate instance types and storage options based on your needs, and choosing cost-effective payment options such as Reserved Instances for predictable workloads. It also covers using UltraWarm and cold storage tiers for infrequently accessed data, implementing index lifecycle policies to manage storage costs, and monitoring usage patterns to rightsize domains and optimize performance-to-cost ratios.
Sustainability focuses on minimizing the environmental impacts of running cloud workloads. For OpenSearch Service, this pillar addresses implementing efficient domain sizing strategies, selecting instance types with the best performance-to-energy ratio, optimizing retention policies, and using different storage tiers to reduce the active compute footprint.
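The cost optimization and sustainability pillars both mention index lifecycle policies and storage tiers. In Amazon OpenSearch Service, these are commonly expressed as Index State Management (ISM) policies. The following Python sketch builds a policy body that moves indexes from hot storage to UltraWarm, then to cold storage, and finally deletes them. It also checks that every transition points to a defined state.

The ages (7d, 30d, 90d) and the `timestamp` field name are assumptions for illustration. Confirm the action names (`warm_migration`, `cold_migration`, `cold_delete`) against the ISM documentation for your domain version.

```python
import json


def build_tiering_policy(warm_after: str = "7d", cold_after: str = "30d",
                         delete_after: str = "90d",
                         timestamp_field: str = "timestamp") -> dict:
    """Return an ISM policy body that moves indexes through storage tiers."""
    return {
        "policy": {
            "description": "Hot to UltraWarm to cold to delete (illustrative)",
            "default_state": "hot",
            "states": [
                {"name": "hot", "actions": [],
                 "transitions": [{"state_name": "warm",
                                  "conditions": {"min_index_age": warm_after}}]},
                {"name": "warm", "actions": [{"warm_migration": {}}],
                 "transitions": [{"state_name": "cold",
                                  "conditions": {"min_index_age": cold_after}}]},
                {"name": "cold",
                 "actions": [{"cold_migration": {"timestamp_field": timestamp_field}}],
                 "transitions": [{"state_name": "delete",
                                  "conditions": {"min_index_age": delete_after}}]},
                # Indexes in cold storage are removed with cold_delete.
                {"name": "delete", "actions": [{"cold_delete": {}}], "transitions": []},
            ],
        }
    }


def check_transitions(policy: dict) -> list:
    """Return state names that are referenced but never defined."""
    states = policy["policy"]["states"]
    names = {state["name"] for state in states}
    missing = [t["state_name"] for state in states for t in state["transitions"]
               if t["state_name"] not in names]
    if policy["policy"]["default_state"] not in names:
        missing.append(policy["policy"]["default_state"])
    return missing


if __name__ == "__main__":
    policy = build_tiering_policy()
    assert check_transitions(policy) == []
    print(json.dumps(policy, indent=2))
```

To apply a body like this, you would typically send it to the ISM API, for example with `PUT _plugins/_ism/policies/<policy_id>`.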

By applying this lens to your Amazon OpenSearch Service workloads, you gain insights that go beyond general architectural principles to address the specific characteristics of search and analytics implementations. Whether you are designing a new Amazon OpenSearch Service architecture or optimizing an existing deployment, the OpenSearch Service Lens provides a consistent framework for making architectural decisions aligned with AWS best practices.

Getting started with the OpenSearch Lens

To get started with the Amazon OpenSearch Service Lens, review the six pillars of the AWS Well-Architected Framework: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.

Then, sign in to the AWS Management Console and open the AWS Well-Architected Tool. Navigate to Custom Lenses and import the Amazon OpenSearch Service Lens. After importing the lens, you can use its specialized questionnaire to evaluate your OpenSearch Service workloads against best practices. When you complete the questionnaire, you receive feedback on where your workload can improve.

Next, plan architecture reviews with your team to evaluate your Amazon OpenSearch Service domains using the lens criteria. Document your assessment results, including what works well and where you can improve your deployment. For help understanding the Amazon OpenSearch Service Lens questions, refer to the lens documentation.

If you have an AWS Support plan, you can request help with your architecture review. The OpenSearch Service Lens questions aim to guide your architectural decisions, not test your knowledge. Focus on understanding the architectural principles behind each question. After completing your assessment, create a prioritized improvement plan that addresses findings that could affect your workload performance, data durability, and cost efficiency. For help implementing these improvements, you can work with AWS Professional Services or AWS Partners who specialize in Amazon OpenSearch Service.

Conclusion and next steps

The Amazon OpenSearch Service Lens provides actionable guidance to help you build well-architected search and analytics workloads aligned with your business requirements. Start by accessing the AWS Well-Architected Tool and applying this lens to your OpenSearch Service domains. Make architectural reviews a regular part of your development process. Consider sharing your experiences with the AWS community to help others improve their OpenSearch Service implementations.

You can find more information on AWS Well-Architected Lenses in the AWS Well-Architected Tool User Guide. We encourage you to incorporate this specialized guidance into your architectural reviews and use it to drive continuous improvement in your search and analytics workloads on AWS.

AWS regularly updates the Amazon OpenSearch Service Lens to reflect new service capabilities and architectural best practices. These updates help you take advantage of the latest improvements in Amazon OpenSearch Service while maintaining architectural excellence.

To learn more about Amazon OpenSearch Service, including customer success stories and additional resources, visit the Amazon OpenSearch Service page.

About the authors

Contributors

The authors would like to thank the following people for their invaluable help in developing this new OpenSearch Lens for the AWS Well-Architected Framework: Muslim Abu-Taha, Senior Worldwide Specialist Solutions Architect for Amazon OpenSearch; Shih-Yong Wang, Manager, Solutions Architecture; Ankush Agarwal, Solutions Architect; and Jun-Tin Yeh, Cloud Optimization Success Solutions Architect.

The authors would also like to thank the following people for their contributions to technical reviews: Cedric Pelvet, Principal OpenSearch Solutions Architect; Hajer Bouafif, Senior OpenSearch Solutions Architect; Francisco Losada, OpenSearch Solutions Architect; Bharav Patel, OpenSearch Solutions Architect; and Praveen Prasad, Senior Specialist Technical Account Manager.

Amazon MSK Express brokers now support Intelligent Rebalancing for 180 times faster operation performance

2025-11-11 Swapna Bandla

Post Syndicated from Swapna Bandla original https://aws.amazon.com/blogs/big-data/amazon-msk-express-brokers-now-support-intelligent-rebalancing-for-180-times-faster-operation-performance/

Effective today, all new Amazon Managed Streaming for Apache Kafka (Amazon MSK) Provisioned clusters with Express brokers will support Intelligent Rebalancing at no additional cost. With this new capability you can perform automatic partition balancing operations when scaling Apache Kafka clusters up or down. Intelligent Rebalancing maximizes the capacity utilization of Amazon MSK clusters with Express brokers by optimally rebalancing Kafka resources on them for better performance, eliminating the need to manage partitions independently or by using third-party tools. Intelligent Rebalancing on Amazon MSK Express brokers performs these operations up to 180 times faster compared to Standard brokers.

We launched Amazon MSK Express brokers in November 2024 to reimagine Apache Kafka for ease of use, best-in-class price performance, and predictable availability. Amazon MSK Express brokers are designed to deliver up to three times more throughput per-broker, scale up to 20 times faster, and reduce recovery time by 90 percent as compared to Standard brokers running Apache Kafka. Since launch, we have expanded Amazon MSK Express brokers to additional AWS Regions, instance types, and most recently increased support to 5x more partitions per Express broker, improving price-performance by up to 50% for partition-bound workloads.

With Intelligent Rebalancing, Amazon MSK Express broker clusters are continuously monitored for resource imbalance or overload based on intelligent Amazon MSK defaults to maximize cluster performance. When required, brokers are efficiently scaled without affecting cluster availability for clients to produce and consume data. Customers can now take full advantage of the scaling and performance benefits of Amazon MSK Provisioned clusters with Express brokers while simplifying cluster management operations.

In this post we’ll introduce the Intelligent Rebalancing feature and show an example of how it works to improve operation performance.

When to use Intelligent Rebalancing

With Intelligent Rebalancing, Amazon MSK Express brokers now offer a fully automated solution for managing and scaling Kafka clusters, requiring no additional tools or configuration. Intelligent Rebalancing is enabled by default on all new Amazon MSK Express broker clusters, and we recommend keeping it on. Intelligent Rebalancing uses Amazon MSK best practices to trigger automatic rebalancing in the following situations:

Scaling clusters in and out: When customers add or remove brokers from their Amazon MSK Express broker clusters, Intelligent Rebalancing automatically redistributes partitions to balance resource utilization across the brokers. This keeps the cluster operating at peak performance and makes it possible to scale in or out with a single update operation.
Steady-state rebalancing: Even during normal operations, Intelligent Rebalancing continuously monitors the Amazon MSK Express brokers cluster and triggers rebalancing when it detects resource imbalances or hotspots. For example, if certain brokers become overloaded due to uneven distribution of partitions or skewed traffic patterns, Intelligent Rebalancing will automatically move partitions to less utilized brokers to restore balance.
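To make partition redistribution concrete, the following Python sketch uses a simple greedy heuristic. It repeatedly moves a partition from the most loaded broker to the least loaded one, as long as the move narrows the gap between them. This is not the algorithm Amazon MSK uses, which this post does not describe. The broker IDs, partition names, and per-partition load values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each broker ID maps to a list of (partition, load) pairs.
Assignment = Dict[int, List[Tuple[str, float]]]


def broker_load(assignment: Assignment, broker: int) -> float:
    """Total load of all partitions currently assigned to a broker."""
    return sum(load for _, load in assignment[broker])


def rebalance(assignment: Assignment, max_moves: int = 1000) -> List[Tuple[str, int, int]]:
    """Greedily move partitions from the busiest broker to the idlest one.

    Mutates `assignment` in place and returns the moves as
    (partition, from_broker, to_broker) tuples.
    """
    moves = []
    for _ in range(max_moves):
        busiest = max(assignment, key=lambda b: broker_load(assignment, b))
        idlest = min(assignment, key=lambda b: broker_load(assignment, b))
        gap = broker_load(assignment, busiest) - broker_load(assignment, idlest)
        # Moving a partition only helps if its load is positive and smaller than the gap.
        candidates = [p for p in assignment[busiest] if 0 < p[1] < gap]
        if not candidates:
            break
        # Prefer the partition whose load is closest to half the gap.
        part = min(candidates, key=lambda p: abs(gap / 2 - p[1]))
        assignment[busiest].remove(part)
        assignment[idlest].append(part)
        moves.append((part[0], busiest, idlest))
    return moves


if __name__ == "__main__":
    # Three brokers with six equal partitions each, then scale out to six brokers.
    cluster = {b: [(f"b{b}-p{i}", 1.0) for i in range(6)] for b in (1, 2, 3)}
    cluster.update({4: [], 5: [], 6: []})
    print(len(rebalance(cluster)), {b: broker_load(cluster, b) for b in cluster})
```

Every accepted move strictly reduces the spread of loads, so the loop terminates. Scaling in would also require draining the brokers being removed, which this sketch does not handle.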

How to use Intelligent Rebalancing

To demonstrate the power of Intelligent Rebalancing, let’s run a few tests on an Amazon MSK Express brokers cluster:

Scaling test: We’ll start by creating an Amazon MSK Express brokers cluster with 3 brokers. We’ll then rapidly scale the cluster up to 6 brokers and back down to 3 brokers, simulating a sudden spike in workload. With Intelligent Rebalancing enabled, you’ll see that the rebalancing of partitions is completed within 5-10 minutes, so that the cluster can sustain the increased throughput without any drop in performance.

You can track the current and historical rebalancing operations using the metric RebalanceInProgress. In the picture below, you can also see that the clients on the producer side are not impacted during this rebalancing.

Next, we’ll create an imbalance in the cluster by directing a large portion of the traffic to a single broker. You’ll see that Intelligent Rebalancing detects this imbalance within minutes and automatically redistributes the partitions, restoring the cluster to an optimal state.

Intelligent Rebalancing detects hotspots and automatically redistributes affected partitions across other brokers to optimize resource utilization. Without Intelligent Rebalancing, the resource imbalance would persist, potentially leading to performance issues or requiring manual intervention by the customer.
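Hotspot detection can be pictured as comparing each broker's utilization with the cluster average. The following Python sketch flags brokers whose utilization exceeds the mean by more than a chosen tolerance. The 20 percent tolerance and the sample utilization figures are assumptions. Amazon MSK applies its own internal defaults, which this post does not specify.

```python
from statistics import mean
from typing import Dict, List


def find_hotspots(utilization: Dict[str, float], tolerance: float = 0.20) -> List[str]:
    """Return broker IDs whose utilization exceeds the mean by more than `tolerance`.

    `utilization` maps a broker ID to a utilization figure, such as CPU percent.
    """
    if not utilization:
        return []
    threshold = mean(utilization.values()) * (1 + tolerance)
    return sorted(broker for broker, value in utilization.items() if value > threshold)


if __name__ == "__main__":
    # Skewed traffic: broker-2 receives most of the produce requests.
    sample = {"broker-1": 30.0, "broker-2": 85.0, "broker-3": 32.0}
    print(find_hotspots(sample))  # ['broker-2']
```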

These tests showcase how Intelligent Rebalancing with Amazon MSK Express brokers enables scaling Kafka clusters seamlessly while maintaining consistently high performance, even under varying workload conditions.

Conclusion

Intelligent Rebalancing for Amazon MSK Provisioned clusters with Express brokers is rolling out over the next few weeks in all AWS Regions where Amazon MSK Express brokers are supported. This feature is automatically enabled for all new Amazon MSK Provisioned clusters with Express brokers at no additional cost.

To get started, visit the Amazon MSK console. For more information, see the Amazon MSK Developer Guide.


2025 H1 IRAP report is now available on AWS Artifact for Australian customers

2025-11-10 Patrick Chang

Post Syndicated from Patrick Chang original https://aws.amazon.com/blogs/security/2025-h1-irap-report-is-now-available-on-aws-artifact-for-australian-customers/

Amazon Web Services (AWS) is excited to announce that the latest version of the Information Security Registered Assessors Program (IRAP) report (2025 H1) is now available through AWS Artifact. An independent Australian Signals Directorate (ASD) certified IRAP assessor completed the IRAP assessment of AWS in September 2025.

The new IRAP report includes four additional AWS services that are now assessed at the PROTECTED level under IRAP. This brings the total number of services assessed at the PROTECTED level to 168.

The four newly assessed services are:

For the full list of services, see the IRAP tab on the AWS Services in Scope by Compliance Program page.

We have developed an IRAP documentation pack to help our Australian customers and their partners plan, architect, and assess risk for their workloads when they use AWS Cloud services.

We developed this pack in accordance with the Australian Cyber Security Centre (ACSC) Cloud Security Guidance and Cloud Assessment and Authorisation framework, which addresses guidance within the Australian Government’s Information Security Manual (ISM, March 2025 version), the Department of Home Affairs’ Protective Security Policy Framework (PSPF), and the Digital Transformation Agency’s Secure Cloud Strategy.

The IRAP pack on AWS Artifact also includes newly updated versions of the AWS Consumer Guide and the whitepaper Reference Architectures for ISM PROTECTED Workloads in the AWS Cloud.

Reach out to your AWS representatives to let us know which additional services you would like to see in scope for upcoming IRAP assessments. We strive to bring more services into scope at the PROTECTED level under IRAP to support your requirements.

If you have feedback about this post, submit comments in the Comments section below.

Introducing the Overview of the AWS European Sovereign Cloud whitepaper

2025-11-06 J.D. Bean

Post Syndicated from J.D. Bean original https://aws.amazon.com/blogs/security/introducing-the-overview-of-the-aws-european-sovereign-cloud-whitepaper/

Amazon Web Services (AWS) recently released a new whitepaper, Overview of the AWS European Sovereign Cloud, available in English, German, and French, detailing the planned design and goals of this new infrastructure. The AWS European Sovereign Cloud is a new, independent cloud for Europe, designed to help public sector organizations and customers in highly regulated industries meet their evolving sovereignty and compliance needs. This effort, backed by a €7.8 billion investment in infrastructure, jobs creation, and skills development, will launch its first AWS Region in the State of Brandenburg, Germany by the end of 2025.

This whitepaper provides a broad overview of the AWS European Sovereign Cloud, highlighting how AWS is helping customers achieve their sovereignty requirements while benefiting from access to the full power of AWS.

Key aspects covered in the whitepaper include:

Infrastructure – Dedicated physical infrastructure with multiple Availability Zones, following the established AWS Regional model approach
Logical isolation – Logical separation from existing AWS Regions, with independent billing, account, and identity systems
Operational control – Measures to help assure independent operation of the AWS European Sovereign Cloud, including staffing requirements
Data sovereignty – Design that helps make sure customer content and customer-created metadata remain within EU boundaries unless customers choose otherwise
Corporate governance – A distinct corporate structure under EU law, with EU nationals serving as managing directors and an independent advisory board
Approach to law enforcement requests – The technical, operational, and legal measures implemented to help protect customer data and manage law enforcement requests

The whitepaper describes how these elements work together to deliver sovereign control and operational autonomy of our expansive service portfolio to meet Europe’s digital sovereignty needs. The AWS European Sovereign Cloud will be the only fully featured, independently operated sovereign cloud backed by strong technical controls, sovereign assurances, and legal protections designed to meet the needs of European governments and enterprises. Customers and partners using the AWS European Sovereign Cloud will benefit from the full power of AWS including the same service portfolio, security, availability, performance, architecture, APIs, and innovations such as the AWS Nitro System.

We have already made, and will continue to make, new investments in the design, development, and operation of the AWS European Sovereign Cloud. We are building on the strong foundation that has underpinned AWS services for years, including our long-standing commitment to customer control over data residency, our design principle of strong regional isolation, our deep European engineering roots, and more than a decade of experience operating multiple independent clouds for the most critical and restricted workloads.

For more information about the AWS European Sovereign Cloud, visit AWS European Sovereign Cloud.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Orchestrating big data processing with AWS Step Functions Distributed Map

2025-11-05 Biswanath Mukherjee

Post Syndicated from Biswanath Mukherjee original https://aws.amazon.com/blogs/compute/orchestrating-big-data-processing-with-aws-step-functions-distributed-map/

Developers seek to process and enrich large semi-structured datasets with durably orchestrated, network-based workflows. For example, during quarterly earnings season, finance organizations run thousands of market simulations simultaneously to provide timely insights for scenario planning or risk management. These workloads require coordination between raw datasets and on-premises servers to provide the latest market information.

AWS Step Functions is a visual workflow service capable of orchestrating over 14,000 API actions from over 220 AWS services to build distributed applications. Step Functions Distributed Map now streamlines big data transformation by processing Amazon Athena data manifest and Parquet files directly. With Distributed Map, you can process large-scale datasets by running concurrent iterations across data entries in parallel. In Distributed mode, the Map state processes the items in the dataset in iterations called child workflow executions. You can specify the number of child workflow executions that can run in parallel. Each child workflow execution has its own execution history, separate from that of the parent workflow. By default, Step Functions runs up to 10,000 child workflow executions in parallel.
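The relationship between a parent workflow, its child workflow executions, and a concurrency cap can be illustrated locally in Python. The following code is only an analogy and does not call Step Functions:

- A thread pool's `max_workers` plays the role of `MaxConcurrency`.
- Each child keeps its own history list, mirroring the separate execution history of a child workflow execution.
- The `run_distributed_map` name and the `process_item` callable are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple


def run_distributed_map(items: List[Any], process_item: Callable[[Any], Any],
                        max_concurrency: int = 10) -> Tuple[List[Dict[str, Any]], int]:
    """Run process_item over items with at most max_concurrency children in flight.

    Returns (results, peak): each result carries its own history list, and
    peak is the highest number of children observed running at once.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def child(index: int, item: Any) -> Dict[str, Any]:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        history = [f"ChildStarted index={index}"]
        try:
            output = process_item(item)
            history.append("ChildSucceeded")
        finally:
            with lock:
                in_flight -= 1
        return {"index": index, "output": output, "history": history}

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves input order, like Map state results.
        results = list(pool.map(child, range(len(items)), items))
    return results, peak
```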

Distributed Map can process Amazon Athena data manifest and Parquet files directly, eliminating the need for custom preprocessing. You also now have visibility into your Distributed Map usage with new Amazon CloudWatch metrics: Approximate Open Map Runs Count, Open Map Run Limit, and Approximate Map Runs Backlog Size.

In this post, you’ll learn how to use AWS Step Functions Distributed Map to process Athena data manifest and Parquet files through a step-by-step demonstration.

This post is part of a series of posts about AWS Step Functions Distributed Map:

Processing Amazon S3 objects at scale with AWS Step Functions Distributed Map S3 prefix
Optimizing nested JSON array processing using AWS Step Functions Distributed Map
Orchestrating big data processing with AWS Step Functions Distributed Map

Use case: IoT sensor data processing

You’ll build a sample application that demonstrates processing IoT sensor data in Parquet format using Step Functions Distributed Map. These Parquet data files, and a manifest file that lists them, are exported from Athena. The data contains temperature, humidity, and battery level readings from different devices. The following table shows a sample of the sensor data:

Example IoT sensor data

Your objective is to use the Athena data manifest file to get the list of Parquet files, iterate over the data in those files to detect anomalies, and stream the processed data through Amazon Kinesis Data Firehose to an Amazon S3 bucket for further analysis with Athena queries. A reading is considered anomalous if it meets any of the following criteria:

Low battery conditions: less than 20%
Humidity anomalies: more than 95% or less than 5%
Temperature spikes: more than 35°C or less than -10°C
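These rules translate directly into code. The following Python sketch applies the three thresholds to a single reading. The field names follow the `iotsensordata` table created later in this post (`temperature`, `humidity`, `batterylevel`). The anomaly labels are hypothetical and may differ from those produced by the sample repository's Lambda function.

```python
from typing import Dict, List

# Thresholds taken from the criteria above.
LOW_BATTERY_PCT = 20.0
HUMIDITY_MIN_PCT, HUMIDITY_MAX_PCT = 5.0, 95.0
TEMP_MIN_C, TEMP_MAX_C = -10.0, 35.0


def detect_anomalies(reading: Dict[str, float]) -> List[str]:
    """Return anomaly labels for one sensor reading (empty list if healthy)."""
    anomalies = []
    if reading["batterylevel"] < LOW_BATTERY_PCT:
        anomalies.append("low_battery")
    humidity = reading["humidity"]
    if humidity > HUMIDITY_MAX_PCT or humidity < HUMIDITY_MIN_PCT:
        anomalies.append("humidity_out_of_range")
    temperature = reading["temperature"]
    if temperature > TEMP_MAX_C or temperature < TEMP_MIN_C:
        anomalies.append("temperature_spike")
    return anomalies


if __name__ == "__main__":
    print(detect_anomalies({"temperature": 36.5, "humidity": 50.0, "batterylevel": 15.0}))
    # ['low_battery', 'temperature_spike']
```

The comparisons are strict, so a reading exactly at a threshold, such as a battery level of 20%, is not flagged. This matches the "less than" and "more than" wording of the criteria.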

The following diagram represents the AWS Step Functions state machine:

Parquet files processing workflow

The workflow first runs an Athena query that generates Parquet data files and an Athena manifest file (CSV). The manifest file contains the list of Parquet data files.
Distributed Map processes these Parquet data files in parallel using child workflow executions. You can control the number of child workflow executions that can run in parallel using the MaxConcurrency parameter. See Step Functions service quotas to learn more about concurrency limits.
Each child workflow execution invokes an AWS Lambda function to process the respective Parquet file. The Lambda function processes individual sensor readings, detects anomalies according to the preceding criteria, and returns a summary of the processed sensor data.
The child workflow sends the summary record to an Amazon Kinesis Data Firehose stream, which stores the results in the specified Amazon S3 results bucket.
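The results table defined later uses the OpenX JSON SerDe, which expects one JSON object per line. Each record sent to the Firehose stream should therefore be a single line ending in a newline.

The following Python sketch builds a per-device summary and serializes it that way. Its field names mirror the `iotsensordataanalytics` table columns. Several details are assumptions:

- The simple averages stand in for whatever metrics the sample repository computes.
- The `healthy` status value is hypothetical; only `anomalies_detected` appears in this post.
- The anomaly detector is passed in as a parameter.

```python
import json
from datetime import datetime, timezone
from statistics import mean
from typing import Callable, Dict, List

METRIC_NAMES = ("temperature", "humidity", "batterylevel", "latitude", "longitude")


def summarize_device(device_id: str, readings: List[Dict],
                     detect: Callable[[Dict], List[str]]) -> Dict:
    """Build one summary record for a device from its readings."""
    if not readings:
        raise ValueError("readings must not be empty")
    anomalies = [label for reading in readings for label in detect(reading)]
    now = datetime.now(timezone.utc)
    return {
        "deviceid": device_id,
        "analysisDate": now.date().isoformat(),
        # Latest reading; string comparison works for uniform ISO-8601 timestamps.
        "readingTimestamp": max(str(r.get("timestamp", "")) for r in readings),
        "readingsCount": len(readings),
        # Assumption: simple averages of each metric.
        "metrics": {m: mean(r[m] for r in readings) for m in METRIC_NAMES},
        "anomalies": anomalies,
        "anomalyCount": len(anomalies),
        "healthStatus": "anomalies_detected" if anomalies else "healthy",
        "timestamp": now.isoformat(),
    }


def to_firehose_data(record: Dict) -> bytes:
    """Serialize a record as one newline-terminated JSON line (UTF-8 bytes)."""
    return (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")
```

Without the trailing newline, Firehose can concatenate records into a single line in S3. The JSON SerDe then cannot split them back into rows.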

The following Athena StartQueryExecution state runs an UNLOAD query to generate data files in Parquet format and a manifest file in CSV format. The data files are stored in the S3 location specified in the UNLOAD query, and the manifest file is stored in the S3 bucket configured for the Athena workgroup.

{
  "QueryLanguage": "JSONata",
  "States": {
    "Athena StartQueryExecution": {
      "Type": "Task",
      "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
      "Arguments": {
        "QueryString": "UNLOAD (WRITE_YOUR_SELECT_QUERY_HERE) TO 'S3_URI_FOR_STORING_DATA_OBJECT' WITH (format = 'PARQUET')",
        "WorkGroup": "primary"
      },
      "Output": {
        "ManifestObjectKey": "{% $join([$states.result.QueryExecution.ResultConfiguration.OutputLocation, '-manifest.csv']) %}"
      },
      "Next": "Next State"
    },
    ...
  }
}

The following ItemReader is configured to use a manifest type of “ATHENA_DATA” with “PARQUET” data input.

{
  "QueryLanguage": "JSONata",
  "States": {
    ...
    "Map": {
      ...
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": {
          "ManifestType": "ATHENA_DATA",
          "InputType": "PARQUET"
        },
        "Arguments": {
          "Bucket": "{% $split($substringAfter($states.input.ManifestObjectKey, 's3://'), '/')[0] %}",
          "Key": "{% $substringAfter($substringAfter($states.input.ManifestObjectKey, 's3://'), '/') %}"
        }
      },
      ...
    }
  }
}
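The two JSONata expressions under `Arguments` split the manifest's S3 URI into a bucket name and an object key. The following Python sketch reproduces that logic, together with the `ManifestObjectKey` concatenation, so that you can check the results against a sample URI. The function names and the example URIs are hypothetical. Like JSONata's `$substringAfter`, Python's `str.partition` splits on the first occurrence only.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key').

    Mirrors the JSONata expressions:
      Bucket: $split($substringAfter(uri, 's3://'), '/')[0]
      Key:    $substringAfter($substringAfter(uri, 's3://'), '/')
    """
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def manifest_key(output_location: str) -> str:
    """Mirror the ManifestObjectKey expression: OutputLocation + '-manifest.csv'."""
    return output_location + "-manifest.csv"


if __name__ == "__main__":
    uri = manifest_key("s3://amzn-s3-demo-bucket/athena-results/query-id")
    print(split_s3_uri(uri))
```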

Additional supported InputType options are CSV and JSONL. All objects referenced in a single manifest file must have the same InputType format. You specify the Amazon S3 location of the Athena manifest CSV file under Arguments.

The context object contains information in a JSON structure about your state machine and execution. Your workflows can reference the context object in a JSONata expression with $states.context.

Within a Map state, the context object includes the following data:

"Map": {
   "Item": {
      "Index" : Number,
      "Key"   : "String", // Only valid for JSON objects
      "Value" : "String",
      "Source": "String"
   }
}

For each Map state iteration, Index contains the index number of the array item currently being processed, Key is available only when iterating over JSON objects, Value contains the array item being processed, and Source contains one of the following:

For state input, the value is STATE_DATA.
For Amazon S3 LIST_OBJECTS_V2 with Transformation=NONE, the value is the S3 URI of the bucket. For example: s3://amzn-s3-demo-bucket.
For all other input types, the value is the Amazon S3 URI of the object. For example: s3://amzn-s3-demo-bucket/object-key.

Using this newly introduced Source field in the context object, you can connect the child executions with the source object.
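The rules for the `Source` field can be summarized as a small lookup. The following Python sketch encodes them. The `map_item_source` function and its parameters are hypothetical. They only illustrate which value a child execution would see for each kind of input.

```python
from typing import Optional


def map_item_source(input_kind: str, bucket: Optional[str] = None,
                    key: Optional[str] = None,
                    transformation: str = "NONE") -> str:
    """Return the value Map.Item.Source would hold, following the rules above.

    input_kind is "STATE_DATA" for state input, "LIST_OBJECTS_V2" for an S3
    listing, or any other string for object-based readers (CSV, JSONL,
    Parquet, manifests, and so on).
    """
    if input_kind == "STATE_DATA":
        return "STATE_DATA"
    if bucket is None:
        raise ValueError("bucket is required for S3-based inputs")
    if input_kind == "LIST_OBJECTS_V2" and transformation == "NONE":
        # A plain listing identifies only the bucket.
        return f"s3://{bucket}"
    if key is None:
        raise ValueError("key is required for object-based inputs")
    return f"s3://{bucket}/{key}"
```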

Prerequisites

Access to an AWS account through the AWS Management Console and the AWS Command Line Interface (AWS CLI). The AWS Identity and Access Management (IAM) user that you use must have permissions to make the necessary AWS service calls and manage the AWS resources mentioned in this post. When granting permissions to the IAM user, follow the principle of least privilege.
AWS CLI installed and configured. If you are using long-term credentials such as access keys, see manage access keys for IAM users and secure access keys for best practices.
Git installed
AWS Serverless Application Model (AWS SAM) installed
Python 3.13+ installed

Set up the state machine and sample data

Complete the following steps to deploy the Step Functions state machine.

Clone the GitHub repository in a new folder and navigate to the project root folder.

git clone https://github.com/aws-samples/sample-stepfunctions-athena-manifest-parquet-file-processor.git
cd sample-stepfunctions-athena-manifest-parquet-file-processor

Run the following commands to install the required Python dependencies for the Lambda function.

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt

Build the application.
```
sam build
```
Deploy the application
```
sam deploy --guided
```
Enter the following details:
- Stack name: The CloudFormation stack name (for example, sfn-parquet-file-processor)
- AWS Region: A supported AWS Region (for example, us-east-1)
- Keep the default values for the remaining parameters.
Note the outputs from the AWS SAM deployment. You will use them in the subsequent steps.
Run the following command to generate sample data in CSV format and upload it to an S3 bucket. Replace <IoTDataBucketName> with the value from the sam deploy output.
```
python3 scripts/generate_sample_data.py <IoTDataBucketName>
```
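The following Python sketch shows what generated input of this shape could look like. It writes CSV rows in the column order expected by the `iotsensordata` table defined in the next section, including a header row, because that table skips one header line. It is not the repository's `generate_sample_data.py`. The device IDs, value ranges, and timestamp format are assumptions, and the sketch only builds the CSV text locally rather than uploading it to S3.

```python
import csv
import io
import random
from datetime import datetime, timedelta, timezone

COLUMNS = ["deviceid", "timestamp", "temperature", "humidity",
           "batterylevel", "latitude", "longitude"]


def generate_rows(devices: int = 5, readings_per_device: int = 10, seed: int = 42):
    """Yield synthetic readings; the ranges deliberately cross the anomaly thresholds."""
    rng = random.Random(seed)
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for d in range(devices):
        for i in range(readings_per_device):
            yield {
                "deviceid": f"device-{d:03d}",
                "timestamp": (start + timedelta(minutes=i)).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "temperature": round(rng.uniform(-15, 40), 2),
                "humidity": round(rng.uniform(0, 100), 2),
                "batterylevel": round(rng.uniform(5, 100), 2),
                "latitude": round(rng.uniform(-90, 90), 6),
                "longitude": round(rng.uniform(-180, 180), 6),
            }


def to_csv(rows) -> str:
    """Render rows as comma-delimited text with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(generate_rows(devices=2, readings_per_device=3)))
```

No field contains a comma, so the output suits the table's plain `field.delim = ','` configuration. If you write the text to a file, you could upload it under the `daily-data/` prefix, for example with `aws s3 cp`.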

Create the Athena database and tables

Before you can run queries, you must set up an Athena database and table for your data.

From the Amazon Athena console, navigate to Workgroups and select the workgroup named “primary”. Choose Edit from Actions. In the query result configuration section, select the options as follows:
1. Management of query results – select Customer managed
2. Location of query results – enter s3://<IoTDataBucketName>. Replace <IoTDataBucketName> with the value from the sam deploy output.
3. Choose Save to save the changes to the workgroup.
Select the Query editor tab and run the following command to create the database:
```
CREATE DATABASE `iotsensordata`;
```

Create an Athena table in database iotsensordata that references the S3 bucket containing the raw sensor data. In this case it will be <IoTDataBucketName>. Replace <IoTDataBucketName> with the value from sam deploy output.

CREATE EXTERNAL TABLE IF NOT EXISTS `iotsensordata`.`iotsensordata` 
(`deviceid` string, 
`timestamp` string,
`temperature` double,
`humidity` double,
`batterylevel` double,
`latitude` double,
`longitude` double
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<IoTDataBucketName>/daily-data/'
TBLPROPERTIES (
 'classification' = 'csv',
 'skip.header.line.count' = '1'
);

Create an Athena table in the iotsensordata database that references the S3 bucket holding the analytics results streamed from Kinesis Data Firehose. Replace <IoTAnalyticsResultsBucket> with the value from the sam deploy output, and replace <year> with the current year (for example, 2025).

CREATE EXTERNAL TABLE IF NOT EXISTS iotsensordata.iotsensordataanalytics (
  deviceid string,
  analysisDate string,
  readingTimestamp string,
  readingsCount int,
  metrics struct<temperature: double, humidity: double, batterylevel: double, latitude: double, longitude: double>,
  anomalies array<string>,
  anomalyCount int,
  healthStatus string,
  `timestamp` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'ignore.malformed.json' = 'FALSE',
  'dots.in.keys' = 'FALSE',
  'case.insensitive' = 'TRUE'
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<IoTAnalyticsResultsBucket>/<year>/'
TBLPROPERTIES ('classification' = 'json', 'typeOfData' = 'file');

Start your state machine

Now that you have data ready and Athena set up for queries, start your state machine to retrieve and process the data.

Run the following command to start an execution of the state machine. Replace <StateMachineArn> and <IoTDataBucketName> with the values from the sam deploy output.
```
aws stepfunctions start-execution \
  --state-machine-arn <StateMachineArn> \
  --input '{ "IoTDataBucketName": "<IoTDataBucketName>"}'
```
The state machine includes an Athena StartQueryExecution state that runs an UNLOAD query. The query writes the sensor data as Parquet files, along with a manifest file in CSV format. The manifest has five rows, one for each of the five Parquet files, and the state machine processes all five files in a single Map Run.
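To make the manifest's role concrete, the following is a minimal sketch that turns manifest text into the (bucket, key) pairs a Map Run would iterate over. It assumes the manifest lists one S3 URI per line, optionally quoted; the function name parse_manifest is hypothetical, and Distributed Map reads the manifest natively, so this is only an illustration of its structure.

```python
from urllib.parse import urlparse


def parse_manifest(manifest_text: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for each S3 URI listed in a manifest.

    Assumes one S3 URI per line; surrounding quotes and blank lines are ignored.
    """
    objects = []
    for line in manifest_text.splitlines():
        uri = line.strip().strip('"')
        if not uri:
            continue  # skip blank lines
        parsed = urlparse(uri)
        if parsed.scheme != "s3":
            raise ValueError(f"not an S3 URI: {uri!r}")
        # netloc is the bucket; the path (minus its leading slash) is the key
        objects.append((parsed.netloc, parsed.path.lstrip("/")))
    return objects
```

With five Parquet files listed, the function returns five pairs, matching the five items processed in the single Map Run described above.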
Run the following command to get the details of the execution. Replace <executionArn> with the executionArn value returned by the previous command.
```
aws stepfunctions describe-execution --execution-arn <executionArn>
```
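If you prefer to script the start and status check instead of rerunning describe-execution by hand, the following sketch uses the Step Functions start_execution and describe_execution calls and polls until the execution leaves the RUNNING state. The client is passed in (for example, boto3.client("stepfunctions")); the function name, poll interval, and timeout are illustrative assumptions.

```python
import json
import time


def run_and_wait(sfn, state_machine_arn: str, bucket: str,
                 poll_seconds: float = 5.0, timeout_seconds: float = 900.0) -> str:
    """Start an execution and poll describe_execution until it is no longer RUNNING."""
    started = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"IoTDataBucketName": bucket}),
    )
    execution_arn = started["executionArn"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sfn.describe_execution(executionArn=execution_arn)["status"]
        if status != "RUNNING":
            return status  # for example SUCCEEDED, FAILED, TIMED_OUT, or ABORTED
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{execution_arn} still RUNNING after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage (requires boto3 and credentials):
# import boto3
# print(run_and_wait(boto3.client("stepfunctions"), "<StateMachineArn>", "<IoTDataBucketName>"))
```

Treating any non-RUNNING status as final keeps the loop simple; inspect the returned value before querying Athena.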
After you see the status SUCCEEDED, run the following query in the Athena query editor to check the processed output that Kinesis Data Firehose streamed to the S3 bucket referenced by the iotsensordataanalytics table created in the preceding section.
```
SELECT * FROM iotsensordata.iotsensordataanalytics WHERE anomalycount = 1;
```

If any sensor reading exceeds its threshold, the healthStatus attribute is set to “anomalies_detected”. The workflow produces a summary table of metadata that you can now query for reporting.
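The following sketch shows how a single reading could map onto the anomalies, anomalyCount, and healthStatus columns of the analytics table. The threshold values, the anomaly label format, and the "healthy" status are hypothetical; the post does not state the limits or labels the deployed workflow uses, only the “anomalies_detected” value.

```python
# Hypothetical (low, high) limits; the deployed workflow's actual thresholds are not given.
THRESHOLDS = {
    "temperature": (-10.0, 50.0),
    "humidity": (0.0, 90.0),
    "batterylevel": (20.0, 100.0),
}


def analyze(reading: dict) -> dict:
    """Build an analytics record shaped like the iotsensordataanalytics table."""
    anomalies = []
    for metric, (low, high) in THRESHOLDS.items():
        if not low <= reading[metric] <= high:
            anomalies.append(f"{metric}_out_of_range")  # hypothetical label format
    return {
        "deviceid": reading["deviceid"],
        "metrics": {k: reading[k] for k in
                    ("temperature", "humidity", "batterylevel", "latitude", "longitude")},
        "anomalies": anomalies,
        "anomalyCount": len(anomalies),
        # "anomalies_detected" comes from the post; "healthy" is an assumed default
        "healthStatus": "anomalies_detected" if anomalies else "healthy",
    }
```

A reading with one out-of-range metric yields anomalyCount = 1, which is the condition the preceding Athena query filters on.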

Review workflow performance

Using the following observability metrics, you can review key performance behavior of your data processing workflow.
The AWS/States namespace includes the following new metrics for all Step Functions Map Runs.

OpenMapRunLimit: This is the maximum number of open Map Runs allowed in the AWS account. The default value is 1,000 runs and is a hard limit. For more information, see Quotas related to accounts.
ApproximateOpenMapRunCount: This metric tracks the approximate number of Map Runs currently in progress within an account. Configuring an alarm on this metric using the Maximum statistic with a threshold of 900 or higher can help you take proactive action before reaching the OpenMapRunLimit of 1,000. This metric enables operational teams to implement preventive measures, such as staggering new executions or optimizing workflow concurrency, to maintain system stability and prevent backlog accumulation.
ApproximateMapRunBacklogSize: This metric shows up when the ApproximateOpenMapRunCount has reached 1,000 and there are backlogged Map Runs waiting to be executed. Backlogged Map Runs wait at the MapRunStarted event until the total number of open Map Runs is less than the quota.
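The alarm recommended above can be expressed as arguments to the CloudWatch PutMetricAlarm API. This sketch only builds the keyword arguments so they can be reviewed before use. It assumes ApproximateOpenMapRunCount is an account-level metric published without dimensions; the alarm name and one-minute period are illustrative choices, and you would add AlarmActions (such as an Amazon SNS topic) yourself.

```python
def open_map_run_alarm(threshold: float = 900) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm on ApproximateOpenMapRunCount."""
    return {
        "AlarmName": "ApproximateOpenMapRunCount-high",  # hypothetical name
        "Namespace": "AWS/States",
        "MetricName": "ApproximateOpenMapRunCount",
        "Statistic": "Maximum",           # statistic recommended in the post
        "Period": 60,                     # assumed 1-minute window
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
    }


# Usage (requires boto3 and credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**open_map_run_alarm())
```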

The following graph shows an example of these new metrics, visualized with the Maximum statistic. ApproximateMapRunBacklogSize data appears only after the account starts being throttled on the OpenMapRunLimit quota. OpenMapRunLimit (orange line) is the account hard limit of 1,000, shown as a static line. ApproximateOpenMapRunCount (violet line) is the current number of open Map Runs. ApproximateMapRunBacklogSize (green line) is the number of Map Runs waiting in the backlog. While ApproximateOpenMapRunCount stays below 1,000, the backlog is empty. When the count reaches the OpenMapRunLimit, the backlog starts to build up. As active runs complete, the backlog drains and waiting runs begin executing.

Graphed metrics from Amazon CloudWatch

Clean up

To avoid costs, remove all resources created for this post once you’re done. From the Athena query editor, run the following commands:

DROP TABLE `iotsensordata`.`iotsensordata`;
DROP TABLE `iotsensordata`.`iotsensordataanalytics`;
DROP DATABASE `iotsensordata`;

Run the following commands from the AWS CLI, replacing the <placeholder> values, to delete the resources you deployed for this post’s solution:

aws s3 rm s3://<IoTDataBucketName> --recursive
aws s3 rm s3://<IoTAnalyticsResultsBucketName> --recursive
sam delete

Conclusion

With this update, Distributed Map supports additional data inputs, so you can orchestrate large-scale analytics and ETL workflows. You can now process Amazon Athena data manifest and Parquet files directly, eliminating the need for custom pre-processing. You also have visibility into your Distributed Map usage with the following metrics: ApproximateOpenMapRunCount, OpenMapRunLimit, and ApproximateMapRunBacklogSize.

New input sources for Distributed Map are available in all commercial AWS Regions where AWS Step Functions is available. For a complete list of AWS Regions where Step Functions is available, see the AWS Region Table. The improved observability of your Distributed Map usage with new metrics is available in all AWS Regions. To get started, you can use the Distributed Map mode today in the AWS Step Functions console. To learn more, visit the Step Functions developer guide.

For more serverless learning resources, visit Serverless Land.

Optimizing nested JSON array processing using AWS Step Functions Distributed Map

2025-11-05 Biswanath Mukherjee

Post Syndicated from Biswanath Mukherjee original https://aws.amazon.com/blogs/compute/optimizing-nested-json-array-processing-using-aws-step-functions-distributed-map/

When you’re working with large datasets, you’ve likely encountered the challenge of processing complex JSON structures in your automated workflows. You need to preprocess arrays within nested JSON objects before you can run parallel processing on them. Extracting data used to require custom code and extra processing steps, delaying you from building your core application logic.

With AWS Step Functions Distributed Map, you can process large datasets by running concurrent iterations of workflow steps across data entries. Using the enhanced ItemsPointer feature of Distributed Map, you can extract array data directly from JSON objects stored in Amazon S3. Alternatively, when the JSON object is the state input, you can use Items (JSONata) or ItemsPath (JSONPath). With this enhancement you can point directly to arrays nested within JSON structures, eliminating the need for custom preprocessing of your data. With ItemsPointer, Items, and ItemsPath, you can select nested array data and simplify your workflows.

In this post, we explore how to optimize processing array data embedded within complex JSON structures using AWS Step Functions Distributed Map. You’ll learn how to use ItemsPointer to reduce the complexity of your state machine definitions, create more flexible workflow designs, and streamline your data processing pipelines—all without writing additional transformation code or AWS Lambda functions.

This post is part of a series of posts about AWS Step Functions Distributed Map:

Processing Amazon S3 objects at scale with AWS Step Functions Distributed Map S3 prefix
Optimizing nested JSON array processing using AWS Step Functions Distributed Map
Orchestrating big data processing with AWS Step Functions Distributed Map

Use case: e-commerce product data enrichment

In this e-commerce use case, you build a sample application that processes product inventory data using AWS Step Functions Distributed Map. The application receives a JSON file from an upstream application containing an array of product information. The Step Functions workflow reads the JSON file from an S3 bucket and iterates over the array to enrich each product.

The following diagram presents the AWS Step Functions state machine.

JSON array processing workflow

The JSON array is processed using the following workflow:

The state machine reads the product-updates.json file from an input S3 bucket. The file contains a JSON array of products.
The Distributed Map state selects the JSON array node using ItemsPointer and iterates over the array.
For each of the items within the array, the state machine invokes a Lambda function for data enrichment. The Lambda function adds product stock and price information to the product data.
The state machine saves the updated product data in an Amazon DynamoDB table.
Finally, the state machine uploads the execution metadata into an output S3 bucket. See limits related to state machine executions and task executions.
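The enrichment and storage steps above can be sketched as follows. This is not the repository's Lambda function: the CATALOG lookup, its default values, and the function names are hypothetical stand-ins for whatever source supplies price and stock. The to_dynamodb_item helper produces the low-level attribute format (ProductId, lastUpdated, stock, price) shown in the example scan output later in this post.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical price/stock source standing in for the sample's real data.
CATALOG = {"PROD-002": {"price": Decimal("199.99"), "stock": 42}}
DEFAULTS = {"price": Decimal("0"), "stock": 0}


def handler(event: dict, context=None) -> dict:
    """Enrich one Distributed Map item with price and stock if they are missing."""
    enriched = dict(event)
    source = CATALOG.get(event["productId"], DEFAULTS)
    for field in ("price", "stock"):
        enriched.setdefault(field, source[field])  # keep values already present
    # ISO 8601 UTC timestamp with milliseconds, e.g. 2025-10-07T20:33:34.507Z
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    enriched["lastUpdated"] = now.replace("+00:00", "Z")
    return enriched


def to_dynamodb_item(product: dict) -> dict:
    """Convert an enriched product into DynamoDB's low-level attribute format."""
    return {
        "ProductId": {"S": product["productId"]},
        "lastUpdated": {"S": product["lastUpdated"]},
        "stock": {"N": str(product["stock"])},
        "price": {"N": str(product["price"])},
    }
```

Numbers are sent as strings in the "N" type, which is how DynamoDB represents numeric attributes on the wire.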

You can configure MaxConcurrency to specify how many child workflow executions in a Distributed Map can run in parallel. If you don't specify it, Step Functions doesn't limit concurrency and runs up to 10,000 parallel child workflow executions.

You can read a JSON file from an S3 bucket using ItemReader and its sub-fields. If the JSON file contains a nested object structure, you can select the node that holds your dataset with ItemsPointer. For example, consider the following input JSON file:

{
  "version": "2024.1",
  "timestamp": "2025-09-26T10:49:36.646197",
  "productUpdates": {
    "items": [
      {
        "productId": "PROD-001",
        "name": "Wireless Headphones",
        "price": 79.99,
        "stock": 150,
        "category": "Electronics"
      },
      {
        "productId": "PROD-002",
        "name": "Smart Watch",
        "category": "Electronics"
      },
      …
    ]
  }
}

The following JSONata-based workflow configuration extracts a nested list of products from productUpdates/items:

"ItemReader": {
   "Resource": "arn:aws:states:::s3:getObject",
   "ReaderConfig": {
      "InputType": "JSON",
      "ItemsPointer": "/productUpdates/items"
   },
   "Arguments": {
      "Bucket": "amzn-s3-demo-bucket",
      "Key": "updates/product-updates.json"
   }
}

For JSONPath-based workflow note that Arguments is replaced with Parameters:

"ItemReader": {
   "Resource": "arn:aws:states:::s3:getObject",
   "ReaderConfig": {
      "InputType": "JSON",
      "ItemsPointer": "/productUpdates/items"
   },
   "Parameters": {
      "Bucket": "amzn-s3-demo-bucket",
      "Key": "updates/product-updates.json"
   }
}

The ItemReader field is not needed when your dataset is JSON data from a previous step. ItemsPointer applies only when the input JSON object is read from an S3 bucket. If you pass JSON as the state input to a Distributed Map, use the ItemsPath (JSONPath) or Items (JSONata) field to specify the location in the input of the JSON array or object to iterate over.
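The ItemsPointer value "/productUpdates/items" appears to use JSON Pointer syntax (RFC 6901), where each "/"-separated token selects an object member or array index. The following sketch resolves such a pointer in plain Python to show which node the Distributed Map iterates over. It illustrates the pointer syntax only; Step Functions performs this selection for you, and resolve_pointer is a hypothetical helper.

```python
def resolve_pointer(document, pointer: str):
    """Resolve an RFC 6901 JSON Pointer such as "/productUpdates/items"."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("JSON Pointer must be empty or start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" becomes "/" first, then "~0" becomes "~"
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

For the sample file, resolve_pointer(doc, "/productUpdates/items") returns the products array, the same node that the JSONPath expression $.productUpdates.items would select if the document were passed as state input.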

Prerequisites

To use Step Functions Distributed Map, verify you have:

Access to an AWS account through the AWS Management Console and the AWS Command Line Interface (AWS CLI). The AWS Identity and Access Management (IAM) user that you use must have permissions to make the necessary AWS service calls and manage the AWS resources mentioned in this post. When granting permissions to the IAM user, follow the principle of least privilege.
AWS CLI installed and configured. If you are using long-term credentials such as access keys, see Manage access keys for IAM users and Secure access keys for best practices.
Git installed
AWS Serverless Application Model (AWS SAM) installed
Python 3.13+ installed

Set up and run the workflow

Run the following steps to deploy the Step Functions state machine.

Clone the GitHub repository in a new folder and navigate to the project folder.

git clone https://github.com/aws-samples/sample-stepfunctions-json-array-processor.git
cd sample-stepfunctions-json-array-processor

Run the following command to deploy the application.
```
sam deploy --guided
```
Enter the following details:
- Stack name: Stack name for CloudFormation (for example, stepfunctions-json-array-processor)
- AWS Region: A supported AWS Region (for example, us-east-1)
- Accept all other default values.
The outputs from sam deploy are used in the subsequent steps.
Run the following command to generate a product-updates.json file containing a nested JSON array of sample products and upload it to the input S3 bucket. Replace <InputBucketName> with the value from the sam deploy output.
```
python3 scripts/generate_sample_data.py <InputBucketName>
```
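The repository's generate_sample_data.py script is not reproduced in this post. As a rough, hypothetical equivalent, the following sketch builds a document with the same nested shape as the example input (version, timestamp, and a productUpdates.items array). The product names, count, and category are illustrative, and the upload key in the comment is an assumption.

```python
import json
from datetime import datetime


def build_product_updates(count: int = 5) -> str:
    """Build a product-updates document with a nested productUpdates.items array."""
    items = [
        {"productId": f"PROD-{i:03d}", "name": f"Sample Product {i}", "category": "Electronics"}
        for i in range(1, count + 1)
    ]
    document = {
        "version": "2024.1",
        "timestamp": datetime.now().isoformat(),
        "productUpdates": {"items": items},
    }
    return json.dumps(document, indent=2)


# Upload (requires boto3 and credentials; the object key is an assumption):
# import boto3
# boto3.client("s3").put_object(Bucket="<InputBucketName>",
#                               Key="product-updates.json",
#                               Body=build_product_updates())
```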
Run the following command to start an execution of the Step Functions workflow. Replace <StateMachineArn> with the value from the sam deploy output.
```
aws stepfunctions start-execution \
  --state-machine-arn <StateMachineArn> \
  --input '{}'
```
The state machine reads the input product-updates.json file, invokes a Lambda function that adds price and stock information to every product in the array, and writes each enriched product to the database. The execution metadata is also uploaded to the results bucket.

Monitor and verify results

Run the following steps to monitor and verify the test results.

Run the following command to get the details of the execution. Replace <executionArn> with the executionArn value returned by the start-execution command.
```
aws stepfunctions describe-execution --execution-arn <executionArn>
```
Wait until the status shows SUCCEEDED.
Run the following command to validate the processed output in the product catalog DynamoDB table. Replace <ProductCatalogTableName> with the value from the sam deploy output.
```
aws dynamodb scan --table-name <ProductCatalogTableName>
```

Check that the DynamoDB table contains the enriched product data including price and stock attributes. Example output:

{
    "Items": [
        {
            "ProductId": {
                "S": "PROD-005"
            },
            "lastUpdated": {
                "S": "2025-10-07T20:33:34.507Z"
            },
            "stock": {
                "N": "129"
            },
            "price": {
                "N": "139.25"
            }
        },
        {
            "ProductId": {
                "S": "PROD-003"
            },
            "lastUpdated": {
                "S": "2025-10-07T20:33:34.576Z"
            },
            "stock": {
                "N": "471"
            },
            "price": {
                "N": "40.92"
            }
        },
	      …
    ],
    "Count": 5,
    "ScannedCount": 5,
    "ConsumedCapacity": null
}
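To check the scan output programmatically rather than by eye, the following sketch converts the low-level attribute values shown above ({"S": ...} and {"N": ...}) into Python values and reports any product missing price or stock. It handles only the S and N types that appear in this example; the function names are hypothetical. For larger tables, a scan can be paginated through LastEvaluatedKey, which this sketch does not follow.

```python
from decimal import Decimal


def from_attribute_value(value: dict):
    """Convert a low-level DynamoDB attribute value of type S or N."""
    if "S" in value:
        return value["S"]
    if "N" in value:
        return Decimal(value["N"])  # Decimal avoids float rounding for prices
    raise ValueError(f"unsupported attribute type: {sorted(value)}")


def missing_enrichment(scan_response: dict) -> list[str]:
    """Return ProductIds whose items lack a price or stock attribute."""
    missing = []
    for item in scan_response["Items"]:
        plain = {name: from_attribute_value(v) for name, v in item.items()}
        if "price" not in plain or "stock" not in plain:
            missing.append(plain["ProductId"])
    return missing


# Usage: save the CLI output (aws dynamodb scan ... > scan.json), then
# import json; print(missing_enrichment(json.load(open("scan.json"))))
```

An empty list means every scanned product carries the enriched attributes.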

Clean up

To avoid costs, remove all resources you’ve created while following along with this post.

Run the following commands, replacing the <placeholder> values, to delete the resources you deployed for this post’s solution:

aws s3 rm s3://<InputBucketName> --recursive
aws s3 rm s3://<ResultBucketName> --recursive
sam delete

Conclusion

In this post, you learned how to use Step Functions Distributed Map to extract array data natively from JSON objects stored in an S3 bucket. By removing custom data extraction code, you can simplify the processing of your large-scale parallel workloads. With ItemsPointer you can extract array data from JSON files stored in an S3 bucket, and with Items (JSONata) or ItemsPath (JSONPath) you can extract arrays from complex JSON state input, adding flexibility to your workflow designs.

New input sources for Distributed Map are available in all commercial AWS Regions where AWS Step Functions is available. For a complete list of AWS Regions where Step Functions is available, see the AWS Region Table. To get started, you can use the Distributed Map mode today in the AWS Step Functions console. To learn more, visit the Step Functions developer guide.

For more serverless learning resources, visit Serverless Land.

Enhanced search with match highlights and explanations in Amazon SageMaker

2025-11-05 Ramesh H Singh

Post Syndicated from Ramesh H Singh original https://aws.amazon.com/blogs/big-data/enhanced-search-with-match-highlights-and-explanations-in-amazon-sagemaker/

Amazon SageMaker now enhances search results in Amazon SageMaker Unified Studio with additional context that improves transparency and interpretability. Users can see which metadata fields matched their query and understand why each result appears, increasing clarity and trust in data discovery. The capability introduces inline highlighting for matched terms and an explanation panel that details where and how each match occurred across metadata fields such as name, description, glossary, and schema. Enhanced search results reduce the time spent evaluating irrelevant assets by presenting match evidence directly in the search results, so users can quickly validate relevance without analyzing individual assets.

In this post, we demonstrate how to use enhanced search in Amazon SageMaker.

Search results with context

Text matches include keyword match, begins with, synonyms, and semantically related text. Enhanced search displays search result text matches in these locations:

Search result: Text matches in each search result’s name, description, and glossary terms are highlighted.
About this result panel: A new About this result panel is displayed to the right of the highlighted search result. The panel displays the text matches for the result item’s searchable content including name, description, glossary terms, metadata, business names, and table schema. The list of unique text match values is displayed at the top of the panel for quick reference.

Data catalogs contain thousands of datasets, models, and projects. Without transparency, users can’t tell why certain results appear or whether to trust their ordering. Users need evidence of why each result is relevant.

Enhanced search with match explanations improves catalog search in four key ways:
1) transparency is increased because users can see why a result appeared and gain trust,
2) efficiency improves since highlights and explanations reduce time spent opening irrelevant assets,
3) governance is supported by showing where and how terms matched, aiding audit and compliance processes, and
4) consistency is reinforced by revealing glossary and semantic relationships, which reduces misunderstanding and improves collaboration across teams.

How enhanced search works

When a user enters a query, the system searches across multiple fields like name, description, glossary terms, metadata, business names and table schema. With enhanced search transparency, each search result includes the list of text matches that were the basis for including the result, including the field that contained the text match, and a portion of the field’s text value before and after the text match, to provide context. The UI uses this information to display the returned text with the text match highlighted.

For example, a steward searches for “revenue forecasting,” and an asset is returned with the name “Sales Forecasting Dataset Q2” and a description that contains “projected sales figures.” The word sales is highlighted in the name and description, in both the search result and the text matches panel, because sales is a synonym for revenue. The About this result panel also shows that forecast was matched in the schema field name sales_forecast_q2.
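The UI behavior described above, highlighting a matched term and showing some text before and after it, can be sketched as follows. The matching itself (keyword, begins with, synonym, semantic) happens in the service; this sketch assumes the matched terms and offsets are already known and only shows the presentation step. The function names and the <mark> tags are hypothetical choices.

```python
import re


def highlight(text: str, terms: list[str],
              open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap each case-insensitive occurrence of a matched term in highlight tags."""
    if not terms:
        return text
    # Longer terms first so "sales forecast" wins over "sales" when both match.
    alternatives = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)


def snippet(text: str, start: int, end: int, context: int = 20) -> str:
    """Return the matched span text[start:end] plus up to `context` characters on each side."""
    return text[max(0, start - context):end + context]
```

With the steward example, highlight("Sales Forecasting Dataset Q2", ["sales"]) returns "<mark>Sales</mark> Forecasting Dataset Q2"; deciding that sales is a synonym for revenue is left to the search service.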

Solution overview

In this section, we demonstrate how to use the enhanced search features. Our example is a marketing campaign that needs user preference data. Although we have multiple datasets about users, we show how enhanced search simplifies finding the right one.

Prerequisites

To test this solution, you need an Amazon SageMaker Unified Studio domain set up, with domain owner or domain unit owner privileges. You also need an existing project to publish assets and catalog assets. For instructions to create these, see the Getting started guide.

In this example we created a project named Data_publish and loaded data from the Amazon Redshift sample database. To ingest the sample data to SageMaker Catalog and generate business metadata, see Create an Amazon SageMaker Unified Studio data source for Amazon Redshift in the project catalog.

Asset discovery with explainable search

To find assets with explainable search:

Log in to SageMaker Unified Studio.
Enter the search text user-data. This view shows search results, but we want more details on each of these datasets, so press Enter to go to full search.
In full search, search results are returned when there are text matches based on keyword search, starts with, synonym, and semantic search. Text matches are highlighted within the searchable content that is shown for each result: in the name, description, and glossary terms.
To further enhance the discovery experience and find the right asset, you can look at the About this result panel on the right and see the other text matches, for example, in the summary, table name, data source database name, or column business name, to better understand why the result was included.
After examining the search results and text match explanations, we identified the asset named Media Audience Preferences and Engagement as the right asset for the campaign and selected it for analysis.

Conclusion

Enhanced search transparency in Amazon SageMaker Unified Studio transforms data discovery by providing clear visibility into why assets appear in search results. The inline highlighting and detailed match explanations help users quickly identify relevant datasets while building trust in the data catalog. By showing exactly which metadata fields matched their queries, users spend less time evaluating irrelevant assets and more time analyzing the right data for their projects.

Enhanced search is now available in AWS Regions where Amazon SageMaker is supported.

To learn more about Amazon SageMaker, see the Amazon SageMaker documentation.


New whitepaper available – AI for Security and Security for AI: Navigating Opportunities and Challenges

2025-11-04 Debashis Das

Post Syndicated from Debashis Das original https://aws.amazon.com/blogs/security/new-whitepaper-available-ai-for-security-and-security-for-ai-navigating-opportunities-and-challenges/

The emergence of AI as a transformative force is changing the way organizations approach security. While AI technologies can augment human expertise and increase the efficiency of security operations, they also introduce risks ranging from lower technical barriers for threat actors to inaccurate outputs.

As AI adoption accelerates alongside cyber threats and a growing patchwork of regulations, adapting security and compliance strategies is critical.

The World Economic Forum Global Cybersecurity Outlook 2025 reveals that 66% of organizations expect AI to significantly impact cybersecurity.

We’re excited to share a whitepaper we recently authored with SANS Institute called AI for Security and Security for AI: Navigating Opportunities and Challenges. The whitepaper explores the use of AI systems through three interconnected lenses: securing generative AI applications, using generative AI to strengthen overall security posture in the cloud, and protecting against generative AI-powered threats. Key considerations include the following:

Understanding generative AI and AI agents
Scoping generative AI use cases
Using key concepts to help architect generative AI solutions
Verifying large language model (LLM) outputs with automated reasoning
Implementing responsible AI practices throughout the AI lifecycle
Scaling security best practices
Balancing AI automation with human oversight

Effectively using generative AI technologies to enhance your security posture while reducing associated risks is an iterative process that is different for every organization. The whitepaper details key action items that can help set you on the right path. We encourage you to download it, and gain insight into how you can address generative AI security with a multi-layered strategy that meaningfully improves your technical and business outcomes. We look forward to your feedback, and to continuing the journey together.

Download AI for Security and Security for AI: Navigating Opportunities and Challenges.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Amazon Kinesis Data Streams launches On-demand Advantage for instant throughput increases and streaming at scale

2025-11-04 Pratik Patel

Post Syndicated from Pratik Patel original https://aws.amazon.com/blogs/big-data/amazon-kinesis-data-streams-launches-on-demand-advantage-for-instant-throughput-increases-and-streaming-at-scale/

Today, AWS announced the new Amazon Kinesis Data Streams On-demand Advantage mode, which includes warm throughput capability and an updated pricing structure. With this feature you can enable instant scaling for traffic surges while optimizing costs for consistent streaming workloads. On-demand Advantage mode is a cost-effective way to stream with Kinesis Data Streams for use cases that ingest at least 10 MiB/s in aggregate or have hundreds of data streams in an AWS Region.

In this post, we explore this new feature, including key use cases, configuration options, pricing considerations, and best practices for optimal performance.

Real-world use cases

As streaming data volumes grow and use cases evolve, you might face two common challenges with your streaming workloads:

Challenge 1: Preparing for traffic spikes

Many businesses experience predictable but significant traffic surges during events like product launches, content releases, or holiday sales. Using an on-demand capacity mode, you have to complete several steps when preparing for traffic spikes:

Transition to provisioned mode
Manually estimate and increase shards based on anticipated peak demand
Wait for scaling operations to finish
Subsequently return to on-demand mode

This mode-switching process was time consuming, required careful planning, and added operational complexity. Customers had to accept this operational burden, overprovision capacity well in advance, or risk throttling during critical business periods when data ingestion reliability matters most.

Challenge 2: Cost optimization for consistent workloads

Organizations with large, consistent streaming workloads want to optimize costs without sacrificing the simplicity and scalability available with on-demand streams. On-demand capacity mode serves well for fluctuating data traffic, yet customers desired a more economical approach to handle high-volume streaming workloads.

On-demand Advantage directly addresses both challenges by providing the capability to warm on-demand streams and a new pricing structure. With the new On-demand Advantage mode, there is no longer a fixed, per-stream charge, and throughput usage is priced at a lower rate. The only requirement is that the account commits to streaming with at least 25 MiB/s of data ingest and 25 MiB/s of data retrieval usage.

This launch improves data streaming across multiple industries:

Online gaming companies can now prepare their streams for game launches without the cumbersome process of switching between modes and manually calculating shard requirements
Media and entertainment providers can support smooth data ingestion during major content releases and live events
E-commerce services can handle holiday sales traffic while optimizing costs for their baseline workloads.

By combining instant scaling with cost efficiency, you can confidently manage both predictable traffic surges and consistent streaming volumes without compromising on performance or budget.

How it works

The key features of On-demand Advantage mode are warm throughput and committed-usage pricing.

Warm throughput

With the warm throughput feature, available once you’ve enabled On-demand Advantage mode, you can configure your Kinesis Data Streams on-demand streams to have instantly available throughput capacity of up to 10 GiB/s. This means you can proactively prepare on-demand streams for expected peak traffic events without the cumbersome process of switching to provisioned mode and manually calculating shard requirements. Key benefits include:

The ability to prepare for peak events so you can handle traffic surges smoothly
Alleviation of the need to build custom scaling solutions
The capability to continue scaling automatically beyond warm throughput if needed, up to 10 GiB/s or 10 million events per second
No additional fee for maintaining warm capacity

Committed-usage pricing

When you’ve enabled On-demand Advantage mode, billing for on-demand streams switches to a new structure that removes the stream hour charge and offers a discount of at least 60% on throughput usage. Based on US East (N. Virginia) pricing, data ingested is priced 60% lower, data retrieval is priced 60% lower, Enhanced Fan-Out data retrieval is 68% lower, and extended retention is priced 77% lower. In return, you commit to stream 25 MiB/s for at least 24 hours. If actual usage is lower, you’re still charged for the minimum 25 MiB/s of throughput at the discounted price. Overall, these discounts make On-demand Advantage more cost-effective for use cases that ingest at least 10 MiB/s in aggregate, fan out to more than two consumer applications, or have hundreds of data streams in an AWS Region.
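The 10 MiB/s figure can be reproduced with a small worked calculation using only two numbers from the paragraph above: the 60% ingest discount and the 25 MiB/s minimum. This sketch compares ingest charges alone, in units of the standard per-MiB ingest price, and deliberately ignores the removed stream hour charge, retrieval, and retention, so it is a simplification rather than a billing calculator.

```python
INGEST_DISCOUNT = 0.60           # from the post: data ingested is priced 60% lower
MIN_COMMITMENT_MIB_PER_S = 25.0  # from the post: minimum committed throughput


def relative_ingest_cost(actual_mib_per_s: float) -> tuple[float, float]:
    """Return (standard, advantage) ingest cost per second, in standard-price units.

    Usage below the commitment is billed as if it were the committed 25 MiB/s.
    """
    standard = actual_mib_per_s
    billed = max(actual_mib_per_s, MIN_COMMITMENT_MIB_PER_S)
    advantage = billed * (1 - INGEST_DISCOUNT)
    return standard, advantage
```

Below 25 MiB/s, the Advantage cost is fixed at 25 × 0.4 = 10 units, so the two modes break even at 10 MiB/s of ingest. At 30 MiB/s, for example, the comparison is 30 units versus 12 units.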

Getting started

Follow these steps to start using On-demand Advantage mode.

Enabling On-demand Advantage mode

To start using the On-demand Advantage mode:

Using the AWS Management Console

Navigate to the Kinesis Data Streams console
Navigate to the Account Settings tab
Choose Edit billing mode
Select the On-demand Advantage option
Select the checkbox, I acknowledge this change cannot be reverted for 24 hours
Choose Save changes

Using the AWS CLI

You can run the following CLI command to enable the minimum throughput billing commitment:

aws kinesis update-account-settings \
--minimum-throughput-billing-commitment Status=ENABLED

Using the AWS SDK

You can use the SDK to enable the minimum throughput billing commitment. The following Python example shows how to do it:

import boto3

client = boto3.client('kinesis')
response = client.update_account_settings(
    MinimumThroughputBillingCommitment={"Status": "ENABLED"}
)

Once enabled, your account is committed to this pricing mode for a minimum of 24 hours, after which you can opt out as needed.

Configuring warm throughput

To start using warm throughput for Kinesis Data Streams On-demand:

Using the AWS Management Console

Navigate to the Kinesis Data Streams console
Select your stream and go to the Configuration tab
Choose Edit next to Warm Throughput
Set your desired warm throughput (up to 10 GiB/s)
Save your changes

Using the AWS CLI

You can run the following CLI command to enable the warm throughput:

aws kinesis update-stream-warm-throughput \
  --stream-name MyStream \
  --warm-throughput-mi-bps 1000

Using the AWS SDK

You can use the SDK to enable warm throughput. The following Python example shows how to do it:

import boto3

client = boto3.client('kinesis')
response = client.update_stream_warm_throughput(
    StreamName='MyStream',
    WarmThroughputMiBps=1000
)

You can also create a new on-demand stream with warm throughput using the existing CreateStream API, or set warm throughput when converting a data stream from provisioned to On-demand Advantage mode.

Throttling and best practices for optimal performance

When working with warm throughput, it’s important to understand how capacity is managed. Each stream can instantly handle traffic up to the configured warm throughput level and will automatically scale beyond that as needed.

For optimal performance with warm throughput:

Use a uniformly distributed partition key strategy to spread records evenly across shards and avoid hotspots. Choose partition keys carefully: you can ingest at most 1 MiB/s of data per partition key, regardless of the configured warm throughput.
Monitor throughput metrics to adjust warm throughput settings based on actual usage patterns.
Implement backoff and retry logic in producer applications to handle potential throttling.
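Retry logic for throttled writes can be sketched as follows. This is a minimal example, not an official pattern: `put_with_backoff`, `ThrottledError`, and the delay values are hypothetical choices. With boto3 you would pass the Kinesis client's modeled `ProvisionedThroughputExceededException` as the retryable exception. botocore's built-in retry settings (`botocore.config.Config(retries=...)`) may also be sufficient on their own.

```python
import random
import time
from typing import Any, Callable, Tuple, Type


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error such as
    Kinesis ProvisionedThroughputExceededException."""


def put_with_backoff(
    send: Callable[[Any], Any],
    record: Any,
    retryable: Tuple[Type[BaseException], ...] = (ThrottledError,),
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send(record), retrying throttled attempts with capped exponential
    backoff and full jitter. Re-raises the last error after max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send(record)
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the throttling error
            # Exponential growth (0.1 s, 0.2 s, 0.4 s, ...) capped at max_delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter spreads retries from many producers over time
            sleep(random.uniform(0, delay))
```

In a producer, `send` could be a lambda wrapping `kinesis.put_record(StreamName=..., Data=..., PartitionKey=...)`, with `retryable=(kinesis.exceptions.ProvisionedThroughputExceededException,)`. Jitter matters because many producers throttled at the same moment would otherwise retry in lockstep.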

For cost optimization with committed usage pricing:

Analyze your daily throughput to verify it is at least 10 MiB/s.
Consider consolidating streams across your organization to maximize the benefit of the discount for on-demand streams.
Use Enhanced Fan-Out consumers for cost-effective data retrieval in applications that need dedicated read throughput. In On-demand Advantage mode, Enhanced Fan-Out data retrievals cost 68% less.
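To check the 10 MiB/s figure above, you can convert a day of ingest into an average rate. This sketch assumes you have already retrieved the daily Sum of each stream's `IncomingBytes` CloudWatch metric (for example, with an 86,400-second period). The function names are hypothetical, and summing across streams assumes the threshold is evaluated on combined throughput, which you should confirm in the Kinesis pricing documentation. For reference, 10 MiB/s sustained for a full day is 864,000 MiB (about 844 GiB).

```python
from typing import Iterable

SECONDS_PER_DAY = 86_400
BYTES_PER_MIB = 1024 * 1024


def average_mib_per_second(daily_incoming_bytes: float) -> float:
    """Convert one day's total ingested bytes (for example, the daily Sum
    of IncomingBytes) into an average rate in MiB/s."""
    return daily_incoming_bytes / SECONDS_PER_DAY / BYTES_PER_MIB


def combined_average_mib_per_second(per_stream_daily_bytes: Iterable[float]) -> float:
    """Average rate across several streams, assuming (hypothetically) that
    the commitment is evaluated on their combined throughput."""
    return average_mib_per_second(sum(per_stream_daily_bytes))


def meets_commitment(avg_mib_per_second: float, threshold_mib_per_second: float = 10.0) -> bool:
    """True if the average daily rate reaches the threshold named in this post."""
    return avg_mib_per_second >= threshold_mib_per_second
```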

Warm throughput in action

To demonstrate how warm throughput behaves, we enabled committed pricing in an AWS account and created two on-demand streams: “KDS-OD-STANDARD” and “KDS-OD-WARM-TP”. The “KDS-OD-WARM-TP” stream was configured with 100 MiB/second warm throughput, while “KDS-OD-STANDARD” remained as a regular on-demand stream without warm throughput, as demonstrated in the following screenshot.

In our experiment, we initially simulated approximately 2 MiB/second traffic ingest for both “KDS-OD-STANDARD” and “KDS-OD-WARM-TP” streams. We used a UUID as a partition key so that traffic was evenly distributed across the shards of the Kinesis data streams, helping prevent potential hotspots that might skew our results. After establishing this baseline, we increased the ingest traffic to around 28 MiB/second within 10 minutes. We then further escalated the traffic to exceed 60 MiB/second within 15 minutes of the initial increase, as illustrated in the following screenshot.
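Kinesis maps each partition key to a 128-bit hash key using MD5, and each shard owns a range of hash keys. The following sketch shows why random UUID keys spread load evenly while a single repeated key does not. It simplifies by assuming the shards split the hash space into equal ranges and that the key is hashed as UTF-8 bytes. On-demand streams manage shards for you, and real ranges need not be equal. All helper names are hypothetical.

```python
import hashlib
import uuid
from collections import Counter
from typing import Iterable, List

HASH_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit integers


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key (assumed UTF-8) as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the owning shard, assuming shard_count equal hash ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


def distribution(keys: Iterable[str], shard_count: int) -> Counter:
    """Count how many records land on each shard."""
    return Counter(shard_for(k, shard_count) for k in keys)


def uuid_keys(n: int) -> List[str]:
    """Random UUID partition keys, as used in the experiment above."""
    return [str(uuid.uuid4()) for _ in range(n)]
```

With `uuid_keys`, counts per shard come out close to equal. With one constant key, every record lands on the same shard and is capped by the 1 MiB/s per-partition-key limit, no matter how much warm throughput is configured.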

The following graph shows the ThrottledRecords CloudWatch metric for both streams. The warm throughput-enabled stream (“KDS-OD-WARM-TP”) did not encounter throttling during either traffic spike, because it had 100 MiB/second of warm throughput configured. In contrast, the standard on-demand stream (“KDS-OD-STANDARD”) experienced throttling when we increased traffic by 14x initially and by roughly 2x later, before eventually scaling to bring throttles back to zero. This experiment demonstrates that you can use warm throughput to prepare for peak usage in advance and avoid throttling during sudden traffic increases.

Conclusion

As we outlined in this post, the new Amazon Kinesis Data Streams On-demand Advantage mode provides significant benefits for organizations of different sizes:

Instant scaling for predictable traffic surges without overprovisioning.
Cost optimization for consistent streaming workloads with at least 60% discount.
Simplified operations with no need to switch between different capacity modes.
Enhanced flexibility to handle both expected and unexpected traffic patterns.

With these enhancements you can build and operate real-time streaming applications at many scales. Kinesis Data Streams now provides the ideal combination of scalability, performance, and cost-efficiency.

To learn more about these new features, visit the Amazon Kinesis Data Streams documentation.


AWS Weekly Roundup: Project Rainier online, Amazon Nova, Amazon Bedrock, and more (November 3, 2025)

2025-11-03 Betty Zheng (郑予彬)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-project-rainier-online-amazon-nova-amazon-bedrock-and-more-november-3-2025/

Last week I met Jeff Barr at the AWS Shenzhen Community Day. Jeff shared stories about how builders around the world are experimenting with generative AI and encouraged local developers to keep pushing ideas into real prototypes. Many attendees stayed after the sessions to discuss model grounding, evaluation, and how to bring generative AI into real applications.

Community builders showcased creative Kiro-themed demos, AI-powered IoT projects, and student-led experiments. It was inspiring to see new developers, students, and long-time Amazon Web Services (AWS) community leaders connecting over shared curiosity and excitement for generative AI innovation.

Project Rainier, one of the world’s most powerful operational AI supercomputers, is now online. Built by AWS in close collaboration with Anthropic, Project Rainier brings nearly 500,000 AWS custom-designed Trainium2 chips into service using a new Amazon Elastic Compute Cloud (Amazon EC2) UltraServer and EC2 UltraCluster architecture designed for high-bandwidth, low-latency model training at hyperscale.

Anthropic is already training and running inference for Claude on Project Rainier, and is expected to scale to more than one million Trainium2 chips across direct usage and Amazon Bedrock by the end of 2025. For architecture details, deployment insights, and behind-the-scenes video of an UltraServer coming online, refer to the full announcement, AWS activates Project Rainier.

Last week’s launches
Here are the launches that got my attention this week:

Amazon Nova – Adds Web Grounding as a new built-in tool for real-time, citation-based web retrieval, and introduces Multimodal Embeddings, a state-of-the-art model that produces unified cross-modal vectors, improving accuracy for Retrieval Augmented Generation (RAG) and semantic search. Both capabilities are available in Amazon Bedrock.
Amazon Bedrock – TwelveLabs’ Marengo Embed 3.0 is now available for long-form, video-native multimodal embeddings across video, images, audio, and text with improved domain accuracy. Stability AI Image Services added four new tools: Outpaint, Fast Upscale, Conservative Upscale, and Creative Upscale for high-resolution upscaling, outpainting, and controlled variations.
Model Context Protocol (MCP) Proxy for AWS – Now generally available as a client-side proxy that connects MCP clients to remote AWS-hosted MCP servers using SigV4 authentication. It works with tools like Amazon Q Developer CLI, Kiro, Cursor, and Strands Agents, and provides safety controls such as read-only mode, retry logic, and logging. The Proxy is open source. You can visit the AWS GitHub repository to view the installation and configuration options and start connecting with remote AWS MCP servers.
Amazon Elastic Container Service (Amazon ECS) – Now supports built-in linear and canary deployment strategies, providing gradual traffic shifting, canary testing with small production slices, deployment bake times for safe rollback, and Amazon CloudWatch alarm-based automated rollbacks.
Amazon DocumentDB – Adds a new query planner in Amazon DocumentDB 5.0 that delivers up to 10 times faster query performance with more optimal index plans and support for $ne, $nin, and nested $elemMatch, and can be enabled through cluster parameter groups without downtime.
Amazon Elastic Block Store (Amazon EBS) – You can now use new per-volume CloudWatch metrics, VolumeAvgIOPS and VolumeAvgThroughput, to get minute-level visibility into average IOPS and throughput for EBS volumes on AWS Nitro-based instances. These metrics help monitor performance trends, troubleshoot bottlenecks, and optimize provisioned capacity.
Amazon Kinesis Data Streams – You can now send individual records up to 10 MiB, a tenfold increase from the previous limit, helping support larger Internet of Things (IoT), change data capture (CDC), and AI-generated payloads.
Amazon SageMaker – Unified Studio search results now provide additional search context, showing matched metadata fields and ranking rationale to improve transparency and relevance in data discovery.

Additional updates
Here are some additional projects, blog posts, and news items that I found interesting:

Building production-ready 3D pipelines with AWS VAMS and 4D Pipeline – A reference architecture for creating scalable, cloud-based 3D asset pipelines using AWS Visual Asset Management System (VAMS) and 4D Pipeline, supporting ingest, validation, collaborative review, and distribution across games, visual effects (VFX), and digital twins.
Amazon Location Service introduces new API key restrictions – You can now create granular security policies with bundle IDs to restrict API access to specific mobile applications, improving access control and strengthening application-level security across location-based workloads.
AWS Clean Rooms launches advanced SQL configurations – A performance enhancement for Spark SQL workloads that supports runtime customization of Spark properties and compute sizes, plus table caching for faster and more cost-efficient processing of large analytical queries.
AWS Serverless MCP Server adds event source mappings (ESM) tools – A capability for event-driven serverless applications that supports configuration, performance tuning, and troubleshooting of AWS Lambda event source mappings, including AWS Serverless Application Model (AWS SAM) template generation and diagnostic insights.
AWS IoT Greengrass releases an AI agent context pack – A development accelerator for cloud-connected edge applications that provides ready-to-use instructions, examples, and templates, helping teams integrate generative AI tools such as Amazon Q for faster software creation, testing, and fleet-wide deployment. It’s available as open source on the GitHub repository.
AWS Step Functions introduces a new metrics dashboard – You can now view usage, billing, and performance metrics at the state-machine level for standard and express workflows in a single console view, improving visibility and troubleshooting for distributed applications.

Upcoming AWS events
Check your calendars so that you can sign up for these upcoming events:

AWS Builder Loft – A community tech space in San Francisco where you can learn from expert sessions, join hands-on workshops, explore AI and emerging technologies, and collaborate with other builders to accelerate your ideas. Browse the upcoming sessions and join the events that interest you.
AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by experienced AWS users and industry leaders from around the world: Hong Kong (November 2), Abuja (November 8), Cameroon (November 8), and Spain (November 15).
AWS Skills Center Seattle 4th Anniversary Celebration – A free, public event on November 20 with a keynote, learner panels, recruiter insights, raffles, and virtual participation options.

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse upcoming in-person events, developer-focused events, and events for startups.

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Betty

Introducing AWS Lambda event source mapping tools in the AWS Serverless MCP Server

2025-10-30 Ben Freiberg

Post Syndicated from Ben Freiberg original https://aws.amazon.com/blogs/compute/introducing-aws-lambda-event-source-mapping-tools-in-the-aws-serverless-mcp-server/

Modern serverless applications increasingly rely on event-driven architectures, where AWS Lambda functions process events from various sources like Amazon Kinesis, Amazon DynamoDB Streams, Amazon Simple Queue Service (Amazon SQS), Amazon Managed Streaming for Apache Kafka (Amazon MSK), and self-managed Apache Kafka.

Although event source mappings (ESM) offer a powerful mechanism for integrating AWS Lambda with stream and queue-based sources, configuring them to align with high-level architectural goals can sometimes involve navigating a broad set of options and parameters. Achieving an optimal configuration typically requires mapping developer intent to several technical settings, which can introduce inefficiencies or operational overhead.

In May 2025, AWS launched the AWS Serverless MCP Server, which provides AI-powered assistance for serverless application development, including infrastructure provisioning, deployment automation, and architectural guidance. Building on this foundation, AWS is now expanding the Serverless MCP Server to include specialized ESM tools.

These new dedicated tools in the AWS Serverless Model Context Protocol (MCP) Server combine the power of AI assistance with ESM expertise to enhance how developers build and manage event-driven serverless applications using Lambda. The new ESM tools provide contextual guidance specific to ESM configuration that address the challenges of event-driven development.

This post describes how the new tools under Serverless MCP Server work with AI coding assistants to streamline event source mapping management. Learn how to use this solution to accelerate your event-driven development workflow and build robust, high-performing applications more efficiently.

Overview

An event source mapping is a Lambda resource that reads items from stream and queue-based services and invokes a function with batches of records. Within an event source mapping, resources called event pollers actively poll for new messages and invoke functions. Using ESMs, AWS Lambda functions can automatically consume events from various sources without requiring custom polling infrastructure. Lambda handles the complexity of scaling, batching, filtering, and error handling, helping developers focus on business logic.

Navigating ESM configurations

Configuring these mappings optimally, especially for virtual private cloud (VPC)-based sources like Apache Kafka, requires additional understanding of networking, permissions, and performance tuning.

When working with event source mappings, developers need to address several technical considerations. For Kafka event sources, whether VPC-based Amazon Managed Streaming for Apache Kafka or self-managed Apache Kafka, configuration involves networking setup so that Lambda can access Kafka topics. Developers must manage bootstrap servers, AWS Identity and Access Management (IAM) permissions, and topic access settings, while also handling authentication, including SASL/SCRAM credentials, mTLS certificate management, and Kafka ACL permissions.

Developers need to know how to translate performance requirements, such as processing 1,000 events per second, into specific ESM parameter configurations. Depending on the stream source, this involves determining appropriate batch sizes, parallelization factors, and retry policies while managing iterator age, offset lag and potential timeout issues. Additionally, developers need visibility into configuration effectiveness and other diagnostic information to optimize resource allocation and ensure reliable event processing.
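As a worked example of that translation, Little's law gives a first estimate: the number of events in flight equals the arrival rate times the processing time per event. At 1,000 events per second and 50 ms per event, about 1,000 × 0.05 = 50 events are in flight. If each invocation processes its batch sequentially, that means roughly 50 concurrent invocations. For Kinesis or DynamoDB Streams, Lambda processes up to shards × ParallelizationFactor batches at once (factor 1 to 10), so 10 shards would need a factor of about 5. This is a rough sizing sketch with hypothetical helper names. It ignores batching windows, invocation overhead, and per-shard throughput limits.

```python
import math
from typing import Tuple


def required_concurrency(events_per_second: float, seconds_per_event: float) -> int:
    """Little's law: events in flight = arrival rate x processing time per event.
    Assumes each invocation handles its batch sequentially, so this also
    approximates the number of concurrent invocations needed."""
    return math.ceil(events_per_second * seconds_per_event)


def kinesis_parallelization_factor(
    events_per_second: float, seconds_per_event: float, shards: int
) -> Tuple[int, bool]:
    """Smallest ParallelizationFactor (1-10) giving enough concurrent batches.
    Returns (factor, sufficient); sufficient is False when even 10 is too low,
    which suggests adding shards or speeding up per-event processing."""
    needed = required_concurrency(events_per_second, seconds_per_event)
    factor = max(1, math.ceil(needed / shards))
    return min(factor, 10), factor <= 10
```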

Dedicated event source mapping tools

The new ESM tools in the open source AWS Serverless MCP Server address these challenges by providing AI assistants with proven knowledge of event source mapping patterns and best practices. These tools guide developers through the entire ESM lifecycle, from initial setup to optimization and troubleshooting. They also translate a developer’s high-level goals, such as desired throughput, latency, or reliability requirements, into detailed technical configuration. The new tools cover all areas of event source mapping management:

Setup and configuration: Developers initialize new event source mapping configurations using AWS Serverless Application Model (AWS SAM) templates, select appropriate event source settings, and configure networking requirements for VPC-based sources like Amazon MSK.
Optimization and tuning: As applications evolve, the tools assist with fine-tuning ESM parameters like batch size, batching window, retry policies, and parallelization factors based on performance goals and telemetry data.
Troubleshooting and diagnostics: Specialized tools diagnose ESM connectivity issues, analyze Amazon CloudWatch Logs and metrics, and recommend solutions for common problems like VPC misconfigurations or permission errors.

Event source mapping tools in action

This example walks you through a scenario of creating, optimizing, and troubleshooting an event source mapping for Amazon MSK to demonstrate the capabilities of the new ESM tools.

Prerequisites and installation

To get started, download or update the AWS Serverless MCP Server from GitHub or the Python Package Index (PyPI) and follow the installation instructions. You can use this MCP server with any AI coding assistant of your choice, such as Amazon Q Developer, Cursor, Cline, or Kiro.

Add the following code to your MCP client configuration:

{
  "mcpServers": {
    "awslabs.aws-serverless-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.aws-serverless-mcp-server@latest"
      ],
      "env": { 
        "AWS_PROFILE": "your-aws-profile",
        "AWS_REGION": "us-east-1",
        "FASTMCP_LOG_LEVEL": "ERROR"
      }
    }
  }
}

The Serverless MCP Server incorporates built-in guardrails to ensure secure and controlled development. By default, the server operates in a read-only mode, allowing only non-mutating actions. With this safety-first approach, you can explore ESM capabilities and architectural patterns while preventing unintended changes to your applications or infrastructure.

Creating and configuring an event source mapping

Imagine you want to set up a Lambda function to process events from an Amazon MSK cluster. Start by prompting your AI assistant:

Create a new Kafka cluster and a VPC named <your-vpc-name> in <your-aws-region>. The cluster should be in the VPC’s private subnets. Then, create a Lambda function to consume from the stream within the same VPC cluster. Prefix all created resources with <your-prefix>.

The agent uses the esm_guidance tool to receive tailored guidance based on your use case and performance requirements. The tool analyzes your intent and provides step-by-step instructions for setting up the ESM with optimal configurations.

In addition to deployment and initialization scripts and supporting documentation, the tool generates properly configured IAM policies and security group rules for accessing the cluster. The assistant then validates the ESM parameters against AWS limits and best practices.

Next, you want to understand the networking requirements:

My Kafka cluster is in a VPC. What networking configuration do I need for Lambda to access it?

The Serverless MCP Server provides specialized guidance for VPC-based Kafka configurations through the esm_guidance tool with guidance_type="networking". This guidance covers subnet requirements, security group rules, and NAT gateway setup, and it validates your network topology for reliable connectivity.
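To make the security group piece concrete: Amazon MSK brokers listen on different default ports depending on client authentication (9092 plaintext, 9094 TLS, 9096 SASL/SCRAM, 9098 IAM). The Lambda documentation for MSK event sources describes allowing that traffic within the cluster's security group. This sketch builds a self-referencing ingress rule in the `IpPermissions` shape accepted by the boto3 EC2 `authorize_security_group_ingress` call. The helper name is hypothetical. It covers only the security group rule, not subnets, NAT gateways, or VPC endpoints.

```python
from typing import Dict, List

# Default Amazon MSK broker ports by client authentication method
MSK_BROKER_PORTS: Dict[str, int] = {
    "plaintext": 9092,
    "tls": 9094,
    "sasl_scram": 9096,
    "iam": 9098,
}


def self_referencing_ingress(security_group_id: str, auth_method: str) -> List[dict]:
    """IpPermissions allowing members of security_group_id to reach the
    broker port for auth_method over TCP."""
    try:
        port = MSK_BROKER_PORTS[auth_method]
    except KeyError:
        raise ValueError(f"unknown auth method: {auth_method!r}") from None
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Referencing the group itself lets any member talk to any other
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
    ]
```

Usage might look like `ec2.authorize_security_group_ingress(GroupId=sg_id, IpPermissions=self_referencing_ingress(sg_id, "iam"))`, where `ec2` is a boto3 EC2 client.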

Optimizing event source mapping performance

After your ESM is running, you notice that processing latency is higher than expected. You can ask for optimization guidance:

I have an ESM with UUID <your-esm-uuid> in <your-aws-region>. My target throughput is between 10 MB/s and 100 MB/s. Please update my ESM configuration to meet these throughput requirements while optimizing the cost of the event pollers.

The server uses the esm_optimize tool to analyze your current configuration and provide optimization recommendations. The tool supports three main actions:

Analysis mode: (action="analyze") Analyzes configuration tradeoffs for your optimization targets (throughput, latency, cost, failure rate)
Validation mode: (action="validate") Validates your ESM configuration against AWS limits and event source restrictions
Template generation: (action="generate_template") Creates updated AWS SAM templates with optimized configurations
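To illustrate the kind of checks a validation step performs, here is a hypothetical local validator. It does not reproduce esm_optimize's actual logic. The limits reflect the Lambda documentation at the time of writing:
- BatchSize up to 10,000 for standard SQS, Kinesis, DynamoDB Streams, and Kafka, and up to 10 for SQS FIFO.
- A batching window of 0 to 300 seconds.
- A batching window of at least 1 second when a standard SQS BatchSize exceeds 10.
- ParallelizationFactor of 1 to 10, for stream sources only.
- SQS MaximumConcurrency between 2 and 1,000.

Verify these against current documentation before relying on them.

```python
from typing import Any, Dict, List

# Maximum BatchSize per source type (verify against current Lambda docs)
MAX_BATCH_SIZE: Dict[str, int] = {
    "sqs": 10_000,       # SQS standard queue
    "sqs_fifo": 10,      # SQS FIFO queue
    "kinesis": 10_000,
    "dynamodb": 10_000,
    "kafka": 10_000,     # Amazon MSK or self-managed Apache Kafka
}


def validate_esm(source: str, config: Dict[str, Any]) -> List[str]:
    """Return the problems found in an ESM config dict (empty list if none)."""
    if source not in MAX_BATCH_SIZE:
        return [f"unsupported source type: {source!r}"]
    problems: List[str] = []

    batch = config.get("BatchSize")
    if batch is not None and not 1 <= batch <= MAX_BATCH_SIZE[source]:
        problems.append(f"BatchSize must be 1-{MAX_BATCH_SIZE[source]} for {source}")

    window = config.get("MaximumBatchingWindowInSeconds", 0)
    if not 0 <= window <= 300:
        problems.append("MaximumBatchingWindowInSeconds must be 0-300")

    # Standard SQS queues need a batching window once BatchSize exceeds 10
    if source == "sqs" and batch is not None and batch > 10 and window < 1:
        problems.append("SQS BatchSize > 10 requires MaximumBatchingWindowInSeconds >= 1")

    factor = config.get("ParallelizationFactor")
    if factor is not None:
        if source not in ("kinesis", "dynamodb"):
            problems.append("ParallelizationFactor applies only to Kinesis and DynamoDB Streams")
        elif not 1 <= factor <= 10:
            problems.append("ParallelizationFactor must be 1-10")

    max_conc = config.get("ScalingConfig", {}).get("MaximumConcurrency")
    if max_conc is not None:
        if source not in ("sqs", "sqs_fifo"):
            problems.append("ScalingConfig.MaximumConcurrency applies only to SQS")
        elif not 2 <= max_conc <= 1000:
            problems.append("MaximumConcurrency must be 2-1000")
    return problems
```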

You can use this tool to get guidance on your event source mapping configurations for Amazon SQS, Amazon Kinesis Data Streams, and Amazon DynamoDB Streams. Here are two examples:

I have a Kinesis stream with 100 shards receiving 100 MB/s of data. My Lambda function processes each record in about 50ms. Currently, my ESM has ParallelizationFactor=1 and BatchSize=100, but I’m seeing high iterator age (over 60 seconds) during peak times. How should I optimize my ESM configuration to reduce processing latency and handle the throughput?

I have an SQS standard queue that receives 50,000 messages per hour during peak times. Each message takes about 2 seconds to process. My current ESM configuration has BatchSize=10 and no ScalingConfig set. I’m seeing message delays during peak hours. How should I optimize my ESM configuration for better throughput while keeping costs reasonable?
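For the SQS scenario above, a back-of-the-envelope estimate helps frame the tool's recommendation. 50,000 messages per hour is about 13.9 messages per second. At 2 seconds per message, roughly 28 messages are in flight at once (Little's law). If each invocation processes its batch sequentially, that means about 28 concurrent invocations. This sketch turns the estimate into keyword arguments for the boto3 Lambda `update_event_source_mapping` call. The 1.5x headroom and the helper name are hypothetical choices. The 2 to 1,000 range for `MaximumConcurrency` and the batching-window requirement for standard-queue batches larger than 10 reflect the Lambda documentation at the time of writing.

```python
import math
from typing import Any, Dict


def sqs_scaling_update(
    esm_uuid: str,
    messages_per_hour: float,
    seconds_per_message: float,
    batch_size: int = 10,
    headroom: float = 1.5,
) -> Dict[str, Any]:
    """Build kwargs for lambda_client.update_event_source_mapping(**kwargs)."""
    rate = messages_per_hour / 3600          # messages per second
    in_flight = rate * seconds_per_message   # Little's law estimate
    # MaximumConcurrency must stay within 2-1000
    max_concurrency = min(1000, max(2, math.ceil(in_flight * headroom)))
    kwargs: Dict[str, Any] = {
        "UUID": esm_uuid,
        "BatchSize": batch_size,
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }
    if batch_size > 10:
        # Standard queues require a batching window of at least 1 second here
        kwargs["MaximumBatchingWindowInSeconds"] = 1
    return kwargs
```

Note that `MaximumConcurrency` is a ceiling, not a guarantee. If delays persist, also check the function's reserved concurrency and the account's concurrency quota.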

The tool generates updated AWS Serverless Application Model (AWS SAM) templates with the recommended configurations, making it easy to apply the changes through your deployment pipeline. However, it always requires explicit user confirmation before any deployment.

Troubleshooting event source mapping issues

When an issue arises, the ESM tools provide diagnostic capabilities. For example, if your ESM stops processing events:

I have a cluster called <your-kafka-cluster-name> and a consumer Lambda function named <your-lambda-function-name> in <your-aws-region>. Please investigate why my ESM (UUID: <your-esm-uuid>) trigger is not working and provide updated configurations to resolve the issue.

The server uses the esm_kafka_troubleshoot tool to provide comprehensive troubleshooting for Apache Kafka clusters. The tool supports two main modes:

Diagnostic mode: (issue_type="diagnosis") Analyzes your ESM status and provides diagnostic indicators. This helps identify whether timeouts occur before or after reaching Kafka brokers. It categorizes issues into specific types for targeted resolution.
Resolution mode: Provides step-by-step resolution guidance for specific issues.

The tool automatically detects your event source type and provides tailored guidance. It validates VPC connectivity, examines IAM permissions, checks security group configurations, and analyzes CloudWatch Logs to provide a detailed diagnosis report with specific remediation steps.
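A basic manual counterpart to the connectivity check is to test whether a TCP connection to each broker succeeds. Run it from a host in the same subnets and security group that the ESM uses. This sketch is not how esm_kafka_troubleshoot works internally, and both helper names are hypothetical. It separates two failure modes. A timeout means traffic likely never reached the broker, which often points at security group, route table, or network ACL rules. A refusal means the host answered but nothing was listening on that port. This loosely mirrors the "before or after reaching Kafka brokers" distinction.

```python
import socket
from typing import Dict, Iterable, List


def parse_bootstrap(bootstrap: str) -> List[str]:
    """Split a comma-separated bootstrap broker string into host:port entries."""
    return [b.strip() for b in bootstrap.split(",") if b.strip()]


def check_brokers(brokers: Iterable[str], timeout: float = 3.0) -> Dict[str, str]:
    """Attempt a TCP connection to each host:port entry and classify the outcome."""
    results: Dict[str, str] = {}
    for broker in brokers:
        entry = broker.strip()
        host, _, port = entry.rpartition(":")
        try:
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[entry] = "reachable"
        except socket.timeout:
            # No answer at all: packets were likely dropped before the broker
            results[entry] = "timeout"
        except ConnectionRefusedError:
            # Host answered, but nothing is listening on this port
            results[entry] = "refused"
        except OSError as exc:
            # DNS failures, unreachable networks, and similar errors
            results[entry] = f"error: {exc}"
    return results
```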

Key benefits

The event source mapping tools in the AWS Serverless MCP Server provide unique advantages over traditional event source mapping configuration approaches:

AI-powered configuration translation: The tools translate high-level developer intent (such as process 1,000 events per second) into specific ESM parameters like batch size, parallelization factor, and batching window.
Complete infrastructure-as-code generation: Unlike generic AWS CLI tools that provide individual commands, ESM tools generate complete AWS SAM templates, initialization scripts, cleanup scripts, and validation scripts for end-to-end automation.
Proactive network validation: For VPC-based event sources like Amazon MSK or self-managed Kafka, the tools validate network topology, security group rules, and connectivity before deployment, preventing common silent failures.
Context-aware troubleshooting: The diagnostic tools correlate ESM status, CloudWatch metrics, VPC configuration, and IAM permissions to provide comprehensive root cause analysis with specific remediation steps.

New tools available in the Serverless MCP Server

The event source mapping tools are designed to minimize trust permission prompts by using a small set of primary tools that internally call specialized functions. The tools can be classified into three main categories:

esm_guidance: This tool provides comprehensive guidance on creating and configuring event source mappings for all event sources (DynamoDB, Kinesis, Kafka, SQS). It handles setup, networking guidance, and troubleshooting based on the guidance_type parameter. The tool automatically generates AWS SAM templates, IAM policies, and security group configurations.
esm_optimize: This advanced optimization tool analyzes configuration tradeoffs, validates ESM settings, and generates AWS SAM templates for performance tuning. It supports three actions:
- analyze: Provides configuration tradeoff analysis for failure rate, latency, throughput, and cost optimization
- validate: Validates ESM configurations against AWS limits and event source restrictions
- generate_template: Creates AWS SAM templates with optimized configurations
esm_kafka_troubleshoot: This specialized troubleshooting tool for Kafka ESM issues supports both Amazon MSK and self-managed Apache Kafka clusters. It also provides diagnostic capabilities and step-by-step resolution guidance for connectivity, authentication, and network issues.

The primary tools internally call specialized helper functions that generate IAM policies, security groups, and scaling and concurrency configurations, and that validate configurations.

Visit the Serverless MCP Server documentation for the full list of tools and resources.

Best practices and considerations

When building event-driven applications with the AWS Serverless MCP Server, start by using its guidance tools for architectural decisions. The server helps you choose appropriate event sources, understand networking requirements, and configure optimal settings based on your performance goals.

For Kafka-based ESMs, pay special attention to VPC configuration. Use the server’s network troubleshooting tools to validate connectivity before deployment. The server can detect common issues like missing NAT gateways, incorrect security group rules, or subnet routing problems.

Monitor your event source mappings continuously using the server’s diagnostic tools. Set up alerts for key metrics like iterator age, error rates, and throttling. The server can help you interpret these metrics and recommend configuration adjustments to maintain optimal performance.
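For the iterator age alert, here is a sketch of the keyword arguments for the boto3 CloudWatch `put_metric_alarm` call. Lambda publishes `IteratorAge` (in milliseconds) in the `AWS/Lambda` namespace for stream sources such as Kinesis and DynamoDB Streams. For Kafka sources, you would watch `OffsetLag` instead. The 60-second threshold, the five one-minute evaluation periods, and the helper name are hypothetical choices. Add `AlarmActions`, such as an SNS topic ARN, to actually receive notifications.

```python
from typing import Any, Dict


def iterator_age_alarm(
    function_name: str,
    threshold_ms: float = 60_000,
    period_s: int = 60,
    evaluation_periods: int = 5,
) -> Dict[str, Any]:
    """Build kwargs for cloudwatch_client.put_metric_alarm(**kwargs) that fire
    when the function falls behind its stream by more than threshold_ms."""
    return {
        "AlarmName": f"{function_name}-iterator-age",
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",  # Alert on the oldest record, not the average
        "Period": period_s,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # No data usually means no traffic
    }
```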

Conclusion

The new event source mapping tools in the open-source AWS Serverless MCP Server simplify event source mapping management throughout the development lifecycle, from initial setup to ongoing optimization and troubleshooting. By combining AI assistance with ESM expertise, these tools help developers build and deploy event-driven applications more efficiently while avoiding common configuration pitfalls.

As organizations continue to adopt event-driven serverless computing, tools that simplify ESM management and accelerate delivery become increasingly valuable.

To get started, visit the GitHub repository and explore the documentation. Share your experiences and suggestions through the GitHub repository to improve the MCP server’s capabilities and help shape the future of AI-assisted event-driven development.

For more serverless learning resources, visit Serverless Land.

Build more accurate AI applications with Amazon Nova Web Grounding

2025-10-29 Matheus Guimaraes

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/build-more-accurate-ai-applications-with-amazon-nova-web-grounding/

Imagine building AI applications that deliver accurate, current information without the complexity of developing intricate data retrieval systems. Today, we’re excited to announce the general availability of Web Grounding, a new built-in tool for Nova models on Amazon Bedrock.

Web Grounding provides developers with a turnkey Retrieval Augmented Generation (RAG) option that allows the Amazon Nova foundation models to intelligently decide when to retrieve and incorporate relevant up-to-date information based on the context of the prompt. This helps to ground the model output by incorporating cited public sources as context, aiming to reduce hallucinations and improve accuracy.

When should developers use Web Grounding?

Developers should consider using Web Grounding when building applications that require access to current, factual information or need to provide well-cited responses. The capability is particularly valuable across a range of applications, from knowledge-based chat assistants providing up-to-date information about products and services, to content generation tools requiring fact-checking and source verification. It’s also ideal for research assistants that need to synthesize information from multiple current sources, as well as customer support applications where accuracy and verifiability are crucial.

Web Grounding is especially useful when you need to reduce hallucinations in your AI applications or when your use case requires transparent source attribution. Because it automatically handles the retrieval and integration of information, it’s an efficient solution for developers who want to focus on building their applications rather than managing complex RAG implementations.

Getting started
Web Grounding seamlessly integrates with supported Amazon Nova models to handle information retrieval and processing during inference. This eliminates the need to build and maintain complex RAG pipelines, while also providing source attributions that verify the origin of information.

Let’s see an example of asking a question to Nova Premier using Python to call the Amazon Bedrock Converse API with Web Grounding enabled.

First, I create an Amazon Bedrock client using the AWS SDK for Python (Boto3) in the usual way. As a good practice, I use a session, which groups configuration and makes it reusable, and then create a Bedrock Runtime client from it.

import boto3

# A session groups configuration (such as the Region) so it can be reused
session = boto3.Session(region_name='us-east-1')
client = session.client('bedrock-runtime')

I then prepare the Amazon Bedrock Converse API payload. It includes a “role” parameter set to “user”, indicating that the message comes from our application’s user (compared to “assistant” for AI-generated responses).

For this demo, I chose the question “What are the current AWS Regions and their locations?” This was selected intentionally because it requires current information, making it useful to demonstrate how Amazon Nova can automatically invoke searches using Web Grounding when it determines that up-to-date knowledge is needed.

# Prepare the conversation in the format expected by Bedrock
question = "What are the current AWS regions and their locations?"
conversation = [
    {
        "role": "user",  # Indicates this message is from the user
        "content": [{"text": question}],  # The actual question text
    }
]

First, let’s see what the output is without Web Grounding. I make a call to Amazon Bedrock Converse API.

# Make the API call to Bedrock
model_id = "us.amazon.nova-premier-v1:0"
response = client.converse(
    modelId=model_id,  # Which AI model to use
    messages=conversation,  # The conversation history (just our question in this case)
)
print(response['output']['message']['content'][0]['text'])

I get a list of all the current AWS Regions and their locations.

Now let’s use Web Grounding. I make a similar call to the Amazon Bedrock Converse API, but declare nova_grounding as one of the tools available to the model.

model_id = "us.amazon.nova-premier-v1:0"
response = client.converse(
    modelId=model_id,
    messages=conversation,
    toolConfig={
        "tools": [
            {
                "systemTool": {
                    "name": "nova_grounding"  # Enables the model to search real-time information
                }
            }
        ]
    },
)

After processing the response, I can see that the model used Web Grounding to access up-to-date information. The output includes reasoning traces that show the model’s thought process, including where it automatically queried external sources. The content returned by these external calls appears as [HIDDEN], a standard practice in AI systems that both protects sensitive information and helps manage output size.

Additionally, the output also includes citationsContent objects containing information about the sources queried by Web Grounding.
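To work with a grounded response programmatically, you can walk the content blocks instead of reading only the first one. This sketch assumes only what this post shows: content blocks carry either a `text` key or a `citationsContent` key. It deliberately leaves each `citationsContent` object uninspected rather than guess at its fields; consult the Converse API reference for that structure. `split_converse_content` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def split_converse_content(response: Dict[str, Any]) -> Tuple[str, List[Dict[str, Any]]]:
    """Concatenate the text blocks of a Converse response and collect any
    citationsContent blocks unchanged for later inspection."""
    text_parts: List[str] = []
    citations: List[Dict[str, Any]] = []
    for block in response["output"]["message"]["content"]:
        if "text" in block:
            text_parts.append(block["text"])
        elif "citationsContent" in block:
            citations.append(block["citationsContent"])
        # Other block types (for example, reasoning or tool blocks) are skipped
    return "".join(text_parts), citations
```

Calling `text, citations = split_converse_content(response)` on the grounded response gives you the answer text and the raw source attributions separately.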

Finally, I can see the list of AWS Regions. It finishes with a message right at the end stating that “These are the most current and active AWS regions globally.”

Web Grounding represents a significant step forward in making AI applications more reliable and current with minimum effort. Whether you’re building customer service chat assistants that need to provide up-to-date accurate information, developing research applications that analyze and synthesize information from multiple sources, or creating travel applications that deliver the latest details about destinations and accommodations, Web Grounding can help you deliver more accurate and relevant responses to your users with a convenient turnkey solution that is straightforward to configure and use.

Things to know
Amazon Nova Web Grounding is available today in US East (N. Virginia). Web Grounding will also launch soon in US East (Ohio) and US West (Oregon).

Web Grounding incurs additional cost. Refer to the Amazon Bedrock pricing page for more details.

Currently, you can use Web Grounding only with Nova Premier, but support for other Nova models will be added soon.

If you haven’t used Amazon Nova before or are looking to go deeper, try this self-paced online workshop where you can learn how to effectively use Amazon Nova foundation models and related features for text, image, and video processing through hands-on exercises.

Matheus Guimaraes | @codingmatheus

Amazon Kinesis Data Streams now supports 10x larger record sizes: Simplifying real-time data processing

2025-10-28 Sumant Nemmani

Post Syndicated from Sumant Nemmani original https://aws.amazon.com/blogs/big-data/amazon-kinesis-data-streams-now-supports-10x-larger-record-sizes-simplifying-real-time-data-processing/

Today, AWS announced that Amazon Kinesis Data Streams now supports record sizes up to 10MiB – a tenfold increase from the previous limit. With this launch, you can now publish intermittent larger data payloads on your data streams while continuing to use existing Kinesis Data Streams APIs in your applications without additional effort. This launch is accompanied by a 2x increase in the maximum PutRecords request size from 5MiB to 10MiB, simplifying data pipelines and reducing operational overhead for IoT analytics, change data capture, and generative AI workloads.

In this post, we explore Amazon Kinesis Data Streams large record support, including key use cases, configuration of maximum record sizes, throttling considerations, and best practices for optimal performance.

Real-world use cases

As data volumes grow and use cases evolve, we’ve seen increasing demand for supporting larger record sizes in streaming workloads. Previously, when you needed to process records larger than 1MiB, you had two options:

Split large records into multiple smaller records in producer applications and reassemble them in consumer applications
Store large records in Amazon Simple Storage Service (Amazon S3) and send only metadata through Kinesis Data Streams

Both these approaches are useful, but they add complexity to data pipelines, requiring additional code, increasing operational overhead, and complicating error handling and debugging, particularly when customers need to stream large records intermittently.

This enhancement improves ease of use and reduces operational overhead for customers handling intermittent large payloads across various industries and use cases. In the IoT analytics domain, connected vehicles and industrial equipment are generating increasing volumes of sensor telemetry data, and individual telemetry records occasionally exceed the previous 1MiB limit in Kinesis. This required customers to implement complex workarounds, such as splitting large records into multiple smaller ones or storing the large records separately and sending only metadata through Kinesis. Similarly, database change data capture (CDC) pipelines can produce large transaction records, especially during bulk operations or schema changes. In the machine learning and generative AI space, workflows increasingly require the ingestion of larger payloads to support richer feature sets and multi-modal data types like audio and images. Raising the Kinesis record size limit from 1MiB to 10MiB reduces the need for these complex workarounds, simplifying data pipelines and reducing operational overhead for customers in IoT, CDC, and advanced analytics use cases. Customers can now more easily ingest and process these intermittent large records using the same familiar Kinesis APIs.

How it works

To start processing larger records:

Update your stream’s maximum record size limit (maxRecordSize) through the AWS Management Console, AWS CLI, or AWS SDKs.
Continue using the same PutRecord and PutRecords APIs for producers.
Continue using the same GetRecords or SubscribeToShard APIs for consumers.

Your stream will be in Updating status for a few seconds before being ready to ingest larger records.
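Because the maximum PutRecords request size also doubles to 10MiB, a producer that batches records can size its batches against the new limit. The following is a minimal sketch, assuming that partition-key bytes count toward the request size and that a single PutRecords call still accepts at most 500 records; batch_put_records_entries and entry_size are hypothetical names.

```python
from typing import Iterable, Iterator

MAX_REQUEST_BYTES = 10 * 1024 * 1024  # 10MiB PutRecords request limit described in this post
MAX_RECORDS_PER_REQUEST = 500          # assumed per-request record count limit


def entry_size(entry: dict) -> int:
    # Assumption: data bytes plus UTF-8 partition key bytes count toward the limit
    return len(entry["Data"]) + len(entry["PartitionKey"].encode("utf-8"))


def batch_put_records_entries(
    entries: Iterable[dict],
    max_bytes: int = MAX_REQUEST_BYTES,
    max_count: int = MAX_RECORDS_PER_REQUEST,
) -> Iterator[list[dict]]:
    """Group {"Data": bytes, "PartitionKey": str} entries into PutRecords-sized batches.

    An entry larger than max_bytes is placed in a batch by itself; it is not split.
    """
    batch: list[dict] = []
    batch_bytes = 0
    for entry in entries:
        size = entry_size(entry)
        # Start a new batch if adding this entry would exceed either limit
        if batch and (batch_bytes + size > max_bytes or len(batch) >= max_count):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(entry)
        batch_bytes += size
    if batch:
        yield batch


# Usage (requires AWS credentials):
# client = boto3.client("kinesis")
# for records in batch_put_records_entries(entries):
#     result = client.put_records(StreamARN=stream_arn, Records=records)
#     # Check result["FailedRecordCount"] and retry the failed entries
```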

Getting started

To start processing larger records with Kinesis Data Streams, you can update the maximum record size by using the AWS Management Console, CLI or SDK.

Using the AWS Management Console:

Navigate to the Kinesis Data Streams console.
Choose your stream and select the Configuration tab.
Choose Edit (next to Maximum record size).
Set your desired maximum record size (up to 10MiB).
Save your changes.

Note: This setting only adjusts the maximum record size for this Kinesis data stream. Before increasing this limit, verify that all downstream applications can handle larger records.

Most common consumers, such as the Kinesis Client Library (version 2.x and later), Amazon Data Firehose delivery to Amazon S3, and AWS Lambda, support processing records larger than 1MiB. To learn more, refer to the Amazon Kinesis Data Streams documentation for large records.

You can also update this setting using the AWS CLI:

aws kinesis update-max-record-size \
--stream-arn <stream-arn> \
--max-record-size-in-ki-b 5000

Or using the AWS SDK:

import boto3

client = boto3.client('kinesis')
response = client.update_max_record_size(
    StreamARN='arn:aws:kinesis:us-west-2:123456789012:stream/my-stream',
    MaxRecordSizeInKiB=5000
)

Throttling and best practices for optimal performance

Individual shard throughput limits of 1MiB/s for writes and 2MiB/s for reads remain unchanged with support for larger record sizes. To work with large records, it helps to understand how throttling works. In a stream, each shard has a write throughput capacity of 1MiB per second. To accommodate large records, each shard can temporarily burst up to 10MiB/s, eventually averaging out to 1MiB per second. To help visualize this behavior, think of each shard as having a capacity tank that refills at 1MiB per second. After you send a large record (for example, a 10MiB record), the tank begins refilling immediately, allowing you to send smaller records as capacity becomes available. This capacity to support large records is continuously replenished. How quickly capacity for the next large record becomes available depends on the size of the large records, the size of the baseline records, the overall traffic pattern, and your chosen partition key strategy. When you process large records, each shard continues to process baseline traffic while using its burst capacity to handle these larger payloads.
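The capacity-tank analogy behaves like a token bucket. The following is a simplified sketch of that analogy, not the actual Kinesis throttling algorithm: it assumes a 10MiB tank that starts full and refills at 1MiB/s, and it ignores partition keys, shard scaling, and the traffic-dependent factors just mentioned. The function name simulate_shard is hypothetical.

```python
def simulate_shard(arrivals, refill_mib_per_s=1.0, capacity_mib=10.0):
    """Simplified token-bucket model of one shard's write capacity.

    arrivals: list of (arrival_time_seconds, record_size_mib), sorted by time.
    Returns a list of booleans: True if accepted, False if throttled.
    """
    tokens = capacity_mib  # assumption: the tank starts full
    last_time = 0.0
    accepted = []
    for arrival_time, size_mib in arrivals:
        # Refill continuously since the previous arrival, capped at tank capacity
        tokens = min(capacity_mib, tokens + (arrival_time - last_time) * refill_mib_per_s)
        last_time = arrival_time
        if size_mib <= tokens:
            tokens -= size_mib
            accepted.append(True)
        else:
            accepted.append(False)  # throttled; no capacity is consumed
    return accepted


# A 10MiB record drains the tank; a small record succeeds once it has partly refilled,
# but a second 10MiB record two seconds later is throttled.
print(simulate_shard([(0.0, 10.0), (0.1, 0.5), (1.0, 0.5), (2.0, 10.0)]))
# [True, False, True, False]
```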

To illustrate how Kinesis Data Streams handles different proportions of large records, let’s examine the results of a simple test. For our test configuration, we set up a producer that sends data to an on-demand stream (which defaults to 4 shards) at a rate of 50 records per second. The baseline records are 10KiB in size, while large records are 2MiB each. We ran multiple test cases, progressively increasing the proportion of large records from 1% to 5% of total stream traffic, along with a baseline case containing no large records. To keep testing conditions consistent, we distributed the large records uniformly over time; for example, in the 1% scenario, we sent one large record for every 100 baseline records. The following graph shows the results:

In the graph, horizontal annotations indicate peaks in throttling occurrences. The baseline scenario, represented by the blue line, shows minimal throttling. As the proportion of large records increases from 1% to 5%, the rate at which the stream throttles writes increases, with a notable acceleration in throttling events between the 2% and 5% scenarios. This test demonstrates how Kinesis Data Streams handles an increasing proportion of large records.
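A back-of-the-envelope calculation helps explain why throttling accelerates beyond 2%. The following sketch computes the average write load per shard for each test case from the stated parameters (50 records/s, 10KiB baseline records, 2MiB large records, 4 shards). It assumes records are spread evenly across the initial shards and ignores on-demand scaling and burst capacity, so it is arithmetic on the test setup, not a reproduction of the measured results.

```python
BASELINE_KIB = 10           # baseline record size from the test
LARGE_KIB = 2 * 1024        # 2MiB large record
RECORDS_PER_SEC = 50
SHARDS = 4                  # on-demand default, assumed constant here
SHARD_WRITE_LIMIT_KIB = 1024  # 1MiB/s per-shard write limit


def avg_kib_per_shard(large_fraction):
    """Average write load per shard, assuming an even spread across shards."""
    per_record = (1 - large_fraction) * BASELINE_KIB + large_fraction * LARGE_KIB
    return RECORDS_PER_SEC * per_record / SHARDS


for pct in range(0, 6):
    load = avg_kib_per_shard(pct / 100)
    status = "over" if load > SHARD_WRITE_LIMIT_KIB else "within"
    print(f"{pct}% large: {load:7.1f} KiB/s per shard ({status} 1MiB/s)")
```

Under these assumptions, the average load stays below 1MiB/s per shard through the 3% case and exceeds it at 4% and 5%, which is consistent with the sharper increase in throttling between 2% and 5%.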

We recommend maintaining large records at 1-2% of your total record count for optimal performance. In production environments, actual stream behavior varies based on three key factors: the size of baseline records, the size of large records, and the frequency at which large records appear in the stream. We recommend that you test with your own traffic pattern to determine the specific behavior.

With on-demand streams, when incoming traffic exceeds 500 KB/s per shard, Kinesis splits the shard within 15 minutes and redistributes the parent shard’s hash key values evenly across the child shards. This automatic scaling increases the number of shards, so large records can be distributed across more shards, depending on the partition key strategy employed.

For optimal performance with large records:

Use a random partition key strategy to distribute large records evenly across shards.
Implement backoff and retry logic in producer applications.
Monitor shard-level metrics to identify potential bottlenecks.
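The first two recommendations can be combined in a small producer helper. The following is a minimal sketch, assuming put_fn behaves like a Kinesis put_record call bound to a stream (accepting Data= and PartitionKey= keyword arguments) and that the caller supplies the exception types to treat as throttling. The name put_with_backoff and the delay values are illustrative choices, not service recommendations.

```python
import random
import time
import uuid


def put_with_backoff(put_fn, data, retryable, max_attempts=5,
                     base_delay_s=0.1, max_delay_s=5.0, sleep=time.sleep):
    """Call put_fn with a random partition key, retrying throttled calls.

    retryable: tuple of exception types that indicate throttling.
    """
    # A random partition key spreads large records across shards
    partition_key = uuid.uuid4().hex
    for attempt in range(1, max_attempts + 1):
        try:
            return put_fn(Data=data, PartitionKey=partition_key)
        except retryable:
            if attempt == max_attempts:
                raise
            # Exponential backoff with full jitter, capped at max_delay_s
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


# Usage (requires AWS credentials):
# client = boto3.client("kinesis")
# put_with_backoff(
#     lambda **kwargs: client.put_record(StreamARN=stream_arn, **kwargs),
#     payload,
#     retryable=(client.exceptions.ProvisionedThroughputExceededException,),
# )
```

The partition key is generated once per record so that retries target the same shard; jitter keeps many producers from retrying in lockstep.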

If you need to stream large records continuously, consider using Amazon S3 to store the payloads and sending only metadata references through the stream. Refer to Processing large records with Amazon Kinesis Data Streams for more information.
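The S3 reference approach can be sketched as an encode/decode pair. This is a minimal sketch under stated assumptions: put_object and get_object behave like boto3's s3.put_object(Bucket=, Key=, Body=) and s3.get_object(Bucket=, Key=), and the one-byte INLINE/POINTER markers are a hypothetical convention so consumers can tell inline payloads from references. It is not the approach described in the referenced documentation.

```python
import json
import uuid

INLINE = b"I"   # hypothetical marker: payload follows inline
POINTER = b"P"  # hypothetical marker: JSON reference to an S3 object follows


def encode_record(payload, max_record_bytes, put_object, bucket, key_prefix="large-records/"):
    """Return record data: the payload inline, or a pointer after storing it in S3."""
    if len(payload) + len(INLINE) <= max_record_bytes:
        return INLINE + payload
    key = f"{key_prefix}{uuid.uuid4().hex}"
    put_object(Bucket=bucket, Key=key, Body=payload)
    pointer = {"bucket": bucket, "key": key, "size": len(payload)}
    return POINTER + json.dumps(pointer).encode("utf-8")


def decode_record(data, get_object):
    """Inverse of encode_record: fetch the payload from S3 when given a pointer."""
    if data[:1] == INLINE:
        return data[1:]
    pointer = json.loads(data[1:].decode("utf-8"))
    return get_object(Bucket=pointer["bucket"], Key=pointer["key"])["Body"].read()
```

With boto3, a producer would pass s3.put_object and a consumer s3.get_object; the S3 objects then need their own lifecycle policy so they don't outlive the stream's retention period.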

Conclusion

Amazon Kinesis Data Streams now supports record sizes up to 10MiB, a tenfold increase from the previous 1MiB limit. This enhancement simplifies data pipelines for IoT analytics, change data capture, and AI/ML workloads by eliminating the need for complex workarounds. You can continue using existing Kinesis Data Streams APIs without additional code changes and benefit from increased flexibility in handling intermittent large payloads.

For optimal performance, we recommend maintaining large records at 1-2% of total record count.
For best results with large records, implement a uniformly distributed partition key strategy to evenly distribute records across shards, include backoff and retry logic in producer applications, and monitor shard-level metrics to identify potential bottlenecks.
Before increasing the maximum record size, verify that all downstream applications and consumers can handle larger records.

We’re excited to see how you’ll leverage this capability to build more powerful and efficient streaming applications. To learn more, visit the Amazon Kinesis Data Streams documentation.


Amazon Nova Multimodal Embeddings: State-of-the-art embedding model for agentic RAG and semantic search

2025-10-28 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-nova-multimodal-embeddings-now-available-in-amazon-bedrock/

Today, we’re introducing Amazon Nova Multimodal Embeddings, a state-of-the-art multimodal embedding model for agentic retrieval-augmented generation (RAG) and semantic search applications, available in Amazon Bedrock. It is the first unified embedding model that supports text, documents, images, video, and audio through a single model to enable crossmodal retrieval with leading accuracy.

Embedding models convert textual, visual, and audio inputs into numerical representations called embeddings. These embeddings capture the meaning of the input in a way that AI systems can compare, search, and analyze, powering use cases such as semantic search and RAG.

Organizations are increasingly seeking solutions to unlock insights from the growing volume of unstructured data spread across text, image, document, video, and audio content. For example, an organization might have product images, brochures that contain infographics and text, and user-uploaded video clips. Embedding models can unlock value from unstructured data; however, traditional models are typically specialized to handle one content type. This limitation drives customers to either build complex crossmodal embedding solutions or restrict themselves to use cases focused on a single content type. The problem also applies to mixed-modality content such as documents with interleaved text and images, or video with visual, audio, and textual elements, where existing models struggle to capture crossmodal relationships effectively.

Nova Multimodal Embeddings supports a unified semantic space for text, documents, images, video, and audio for use cases such as crossmodal search across mixed-modality content, searching with a reference image, and retrieving visual documents.

Evaluating Amazon Nova Multimodal Embeddings performance
We evaluated the model on a broad range of benchmarks, and it delivers leading accuracy out-of-the-box as described in the following table.

Nova Multimodal Embeddings supports a context length of up to 8K tokens, text in up to 200 languages, and accepts inputs via synchronous and asynchronous APIs. Additionally, it supports segmentation (also known as “chunking”) to partition long-form text, video, or audio content into manageable segments, generating embeddings for each portion. Lastly, the model offers four output embedding dimensions, trained using Matryoshka Representation Learning (MRL) that enables low-latency end-to-end retrieval with minimal accuracy changes.

Let’s see how the new model can be used in practice.

Using Amazon Nova Multimodal Embeddings
Getting started with Nova Multimodal Embeddings follows the same pattern as other models in Amazon Bedrock. The model accepts text, documents, images, video, or audio as input and returns numerical embeddings that you can use for semantic search, similarity comparison, or RAG.

Here’s a practical example using the AWS SDK for Python (Boto3) that shows how to create embeddings from different content types and store them for later retrieval. For simplicity, I’ll use Amazon S3 Vectors, cost-optimized storage with native support for storing and querying vectors at any scale, to store and search the embeddings.

Let’s start with the fundamentals: converting text into embeddings. This example shows how to transform a simple text description into a numerical representation that captures its semantic meaning. These embeddings can later be compared with embeddings from documents, images, videos, or audio to find related content.

To make the code easy to follow, I’ll show a section of the script at a time. The full script is included at the end of this walkthrough.

import json
import base64
import time
import boto3

MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
EMBEDDING_DIMENSION = 3072

# Initialize Amazon Bedrock Runtime client
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

print(f"Generating text embedding with {MODEL_ID} ...")

# Text to embed
text = "Amazon Nova is a multimodal foundation model"

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": text},
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")

Now we’ll process visual content in the same embedding space, using a photo.jpg file in the same folder as the script. This demonstrates the power of multimodality: Nova Multimodal Embeddings can capture both textual and visual context in a single embedding that provides enhanced understanding of the input.

Nova Multimodal Embeddings can generate embeddings that are optimized for how they are being used. When indexing for a search or retrieval use case, embeddingPurpose can be set to GENERIC_INDEX. For the query step, embeddingPurpose can be set depending on the type of item to be retrieved. For example, when retrieving documents, embeddingPurpose can be set to DOCUMENT_RETRIEVAL.

# Read and encode image
print(f"Generating image embedding with {MODEL_ID} ...")

with open("photo.jpg", "rb") as f:
    image_bytes = base64.b64encode(f.read()).decode("utf-8")

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "image": {
            "format": "jpeg",
            "source": {"bytes": image_bytes}
        },
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")
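Because embeddingPurpose is the main field that changes between indexing and querying, a small helper can keep the rest of a text request consistent. This is a sketch that assumes only the purpose values and dimensions named in this post (GENERIC_INDEX, GENERIC_RETRIEVAL, DOCUMENT_RETRIEVAL; 3,072, 1,024, 384, and 256); text_embedding_request is a hypothetical name.

```python
SUPPORTED_DIMENSIONS = (3072, 1024, 384, 256)


def text_embedding_request(text, purpose, dimension=3072):
    """Build a SINGLE_EMBEDDING request body for text, mirroring the examples in this post."""
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"embeddingDimension must be one of {SUPPORTED_DIMENSIONS}")
    return {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": purpose,
            "embeddingDimension": dimension,
            "text": {"truncationMode": "END", "value": text},
        },
    }


# Index-time and query-time bodies differ only in embeddingPurpose;
# both must use the same dimension for their vectors to be comparable.
index_body = text_embedding_request("Amazon Nova is a multimodal foundation model", "GENERIC_INDEX")
query_body = text_embedding_request("foundation models", "GENERIC_RETRIEVAL")

# Usage (requires AWS credentials):
# response = bedrock_runtime.invoke_model(
#     body=json.dumps(query_body), modelId=MODEL_ID, contentType="application/json"
# )
```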

To process video content, I use the asynchronous API. That’s a requirement for videos that are larger than 25MB when encoded as Base64. First, I upload a local video to an S3 bucket in the same AWS Region.

aws s3 cp presentation.mp4 s3://my-video-bucket/videos/

This example shows how to extract embeddings from both visual and audio components of a video file. The segmentation feature breaks longer videos into manageable chunks, making it practical to search through hours of content efficiently.

# Initialize Amazon S3 client
s3 = boto3.client("s3", region_name="us-east-1")

print(f"Generating video embedding with {MODEL_ID} ...")

# Amazon S3 URIs
S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"
S3_EMBEDDING_DESTINATION_URI = "s3://my-embedding-destination-bucket/embeddings-output/"

# Create async embedding job for video with audio
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {
                "s3Location": {"uri": S3_VIDEO_URI}
            },
            "segmentationConfig": {
                "durationSeconds": 15  # Segment into 15-second chunks
            },
        },
    },
}

response = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": S3_EMBEDDING_DESTINATION_URI
        }
    },
)

invocation_arn = response["invocationArn"]
print(f"Async job started: {invocation_arn}")

# Poll until job completes
print("\nPolling for job completion...")
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = job["status"]
    print(f"Status: {status}")

    if status != "InProgress":
        break
    time.sleep(15)

# Check if job completed successfully
if status == "Completed":
    output_s3_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    print(f"\nSuccess! Embeddings at: {output_s3_uri}")

    # Parse S3 URI to get bucket and prefix
    s3_uri_parts = output_s3_uri[5:].split("/", 1)  # Remove "s3://" prefix
    bucket = s3_uri_parts[0]
    prefix = s3_uri_parts[1] if len(s3_uri_parts) > 1 else ""

    # AUDIO_VIDEO_COMBINED mode outputs to embedding-audio-video.jsonl
    # The output_s3_uri already includes the job ID, so just append the filename
    embeddings_key = f"{prefix}/embedding-audio-video.jsonl".lstrip("/")

    print(f"Reading embeddings from: s3://{bucket}/{embeddings_key}")

    # Read and parse JSONL file
    response = s3.get_object(Bucket=bucket, Key=embeddings_key)
    content = response['Body'].read().decode('utf-8')

    embeddings = []
    for line in content.strip().split('\n'):
        if line:
            embeddings.append(json.loads(line))

    print(f"\nFound {len(embeddings)} video segments:")
    for i, segment in enumerate(embeddings):
        print(f"  Segment {i}: {segment.get('startTime', 0):.1f}s - {segment.get('endTime', 0):.1f}s")
        print(f"    Embedding dimension: {len(segment.get('embedding', []))}")
else:
    print(f"\nJob failed: {job.get('failureMessage', 'Unknown error')}")
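The polling loop above runs until the job leaves InProgress, with no upper bound. The following is a minimal sketch of the same loop with a timeout, assuming get_async_invoke behaves like bedrock_runtime.get_async_invoke (accepting invocationArn= and returning a dict with a status field). The one-hour default timeout and the name wait_for_async_invoke are assumptions for illustration.

```python
import time


def wait_for_async_invoke(get_async_invoke, invocation_arn, poll_seconds=15,
                          timeout_seconds=3600, sleep=time.sleep, clock=time.monotonic):
    """Poll an async invocation until it leaves InProgress, or raise TimeoutError."""
    deadline = clock() + timeout_seconds
    while True:
        job = get_async_invoke(invocationArn=invocation_arn)
        if job["status"] != "InProgress":
            return job  # e.g. "Completed" or "Failed"
        if clock() >= deadline:
            raise TimeoutError(f"{invocation_arn} still InProgress after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage, replacing the while loop in the video example (requires AWS credentials):
# job = wait_for_async_invoke(bedrock_runtime.get_async_invoke, invocation_arn)
# status = job["status"]
```

Injecting sleep and clock keeps the helper easy to test without waiting on a real job.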

With our embeddings generated, we need a place to store and search them efficiently. This example demonstrates setting up a vector store using Amazon S3 Vectors, which provides the infrastructure needed for similarity search at scale. Think of this as creating a searchable index where semantically similar content naturally clusters together. When adding an embedding to the index, I use the metadata to specify the original format and the content being indexed.

# Initialize Amazon S3 Vectors client
s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Configuration
VECTOR_BUCKET = "my-vector-store"
INDEX_NAME = "embeddings"

# Create vector bucket and index (if they don't exist)
try:
    s3vectors.get_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Vector bucket {VECTOR_BUCKET} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Created vector bucket: {VECTOR_BUCKET}")

try:
    s3vectors.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
    print(f"Vector index {INDEX_NAME} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_index(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        dimension=EMBEDDING_DIMENSION,
        dataType="float32",
        distanceMetric="cosine"
    )
    print(f"Created index: {INDEX_NAME}")

texts = [
    "Machine learning on AWS",
    "Amazon Bedrock provides foundation models",
    "S3 Vectors enables semantic search"
]

print(f"\nGenerating embeddings for {len(texts)} texts...")

# Generate embeddings using Amazon Nova for each text
vectors = []
for text in texts:
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingPurpose": "GENERIC_INDEX",
                "embeddingDimension": EMBEDDING_DIMENSION,
                "text": {"truncationMode": "END", "value": text}
            }
        }),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    vectors.append({
        "key": f"text:{text[:50]}",  # Unique identifier
        "data": {"float32": embedding},
        "metadata": {"type": "text", "content": text}
    })
    print(f"  ✓ Generated embedding for: {text}")

# Add all vectors to store in a single call
s3vectors.put_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    vectors=vectors
)

print(f"\nSuccessfully added {len(vectors)} vectors to the store in one put_vectors call!")

This final example demonstrates the capability of searching across different content types with a single query, finding the most similar content regardless of whether it originated from text, images, videos, or audio. The distance scores help you understand how closely related the results are to your original query.

# Text to query
query_text = "foundation models"  

print(f"\nGenerating embeddings for query '{query_text}' ...")

# Generate embeddings
response = bedrock_runtime.invoke_model(
    body=json.dumps({
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_RETRIEVAL",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": query_text}
        }
    }),
    modelId=MODEL_ID,
    accept="application/json",
    contentType="application/json"
)

response_body = json.loads(response["body"].read())
query_embedding = response_body["embeddings"][0]["embedding"]

print(f"Searching for similar embeddings...\n")

# Search for top 5 most similar vectors
response = s3vectors.query_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=5,
    returnDistance=True,
    returnMetadata=True
)

# Display results
print(f"Found {len(response['vectors'])} results:\n")
for i, result in enumerate(response["vectors"], 1):
    print(f"{i}. {result['key']}")
    print(f"   Distance: {result['distance']:.4f}")
    if result.get("metadata"):
        print(f"   Metadata: {result['metadata']}")
    print()
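For small collections or sanity checks, the ranking that query_vectors returns can be approximated locally. The following sketch assumes the index's cosine distance is 1 minus cosine similarity (an assumption; confirm against the S3 Vectors documentation) and uses exhaustive search, whereas a managed vector store may use approximate search. brute_force_top_k is a hypothetical name.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def brute_force_top_k(query, vectors, k=5):
    """Exhaustively rank {key: vector} by cosine distance to query, closest first."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


print(brute_force_top_k([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, k=2))
```

With real data, the vectors would be the embeddings generated earlier, for example query_embedding compared against the embedding values stored in vectors.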

Crossmodal search is one of the key advantages of multimodal embeddings. With crossmodal search, you can query with text and find relevant images. You can also search for videos using text descriptions, find audio clips that match certain topics, or discover documents based on their visual and textual content. For your reference, the full script with all previous examples merged together is here:

import json
import base64
import time
import boto3

MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
EMBEDDING_DIMENSION = 3072

# Initialize Amazon Bedrock Runtime client
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

print(f"Generating text embedding with {MODEL_ID} ...")

# Text to embed
text = "Amazon Nova is a multimodal foundation model"

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": text},
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")

# Read and encode image
print(f"Generating image embedding with {MODEL_ID} ...")

with open("photo.jpg", "rb") as f:
    image_bytes = base64.b64encode(f.read()).decode("utf-8")

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "image": {
            "format": "jpeg",
            "source": {"bytes": image_bytes}
        },
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")

# Initialize Amazon S3 client
s3 = boto3.client("s3", region_name="us-east-1")

print(f"Generating video embedding with {MODEL_ID} ...")

# Amazon S3 URIs
S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"

# Amazon S3 output bucket and location
S3_EMBEDDING_DESTINATION_URI = "s3://my-video-bucket/embeddings-output/"

# Create async embedding job for video with audio
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {
                "s3Location": {"uri": S3_VIDEO_URI}
            },
            "segmentationConfig": {
                "durationSeconds": 15  # Segment into 15-second chunks
            },
        },
    },
}

response = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": S3_EMBEDDING_DESTINATION_URI
        }
    },
)

invocation_arn = response["invocationArn"]
print(f"Async job started: {invocation_arn}")

# Poll until job completes
print("\nPolling for job completion...")
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = job["status"]
    print(f"Status: {status}")

    if status != "InProgress":
        break
    time.sleep(15)

# Check if job completed successfully
if status == "Completed":
    output_s3_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    print(f"\nSuccess! Embeddings at: {output_s3_uri}")

    # Parse S3 URI to get bucket and prefix
    s3_uri_parts = output_s3_uri[5:].split("/", 1)  # Remove "s3://" prefix
    bucket = s3_uri_parts[0]
    prefix = s3_uri_parts[1] if len(s3_uri_parts) > 1 else ""

    # AUDIO_VIDEO_COMBINED mode outputs to embedding-audio-video.jsonl
    # The output_s3_uri already includes the job ID, so just append the filename
    embeddings_key = f"{prefix}/embedding-audio-video.jsonl".lstrip("/")

    print(f"Reading embeddings from: s3://{bucket}/{embeddings_key}")

    # Read and parse JSONL file
    response = s3.get_object(Bucket=bucket, Key=embeddings_key)
    content = response['Body'].read().decode('utf-8')

    embeddings = []
    for line in content.strip().split('\n'):
        if line:
            embeddings.append(json.loads(line))

    print(f"\nFound {len(embeddings)} video segments:")
    for i, segment in enumerate(embeddings):
        print(f"  Segment {i}: {segment.get('startTime', 0):.1f}s - {segment.get('endTime', 0):.1f}s")
        print(f"    Embedding dimension: {len(segment.get('embedding', []))}")
else:
    print(f"\nJob failed: {job.get('failureMessage', 'Unknown error')}")

# Initialize Amazon S3 Vectors client
s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Configuration
VECTOR_BUCKET = "my-vector-store"
INDEX_NAME = "embeddings"

# Create vector bucket and index (if they don't exist)
try:
    s3vectors.get_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Vector bucket {VECTOR_BUCKET} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Created vector bucket: {VECTOR_BUCKET}")

try:
    s3vectors.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
    print(f"Vector index {INDEX_NAME} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_index(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        dimension=EMBEDDING_DIMENSION,
        dataType="float32",
        distanceMetric="cosine"
    )
    print(f"Created index: {INDEX_NAME}")

texts = [
    "Machine learning on AWS",
    "Amazon Bedrock provides foundation models",
    "S3 Vectors enables semantic search"
]

print(f"\nGenerating embeddings for {len(texts)} texts...")

# Generate embeddings using Amazon Nova for each text
vectors = []
for text in texts:
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingPurpose": "GENERIC_INDEX",
                "embeddingDimension": EMBEDDING_DIMENSION,
                "text": {"truncationMode": "END", "value": text}
            }
        }),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    vectors.append({
        "key": f"text:{text[:50]}",  # Unique identifier
        "data": {"float32": embedding},
        "metadata": {"type": "text", "content": text}
    })
    print(f"  ✓ Generated embedding for: {text}")

# Add all vectors to store in a single call
s3vectors.put_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    vectors=vectors
)

print(f"\nSuccessfully added {len(vectors)} vectors to the store in one put_vectors call!")

# Text to query
query_text = "foundation models"  

print(f"\nGenerating embeddings for query '{query_text}' ...")

# Generate embeddings
response = bedrock_runtime.invoke_model(
    body=json.dumps({
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_RETRIEVAL",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": query_text}
        }
    }),
    modelId=MODEL_ID,
    accept="application/json",
    contentType="application/json"
)

response_body = json.loads(response["body"].read())
query_embedding = response_body["embeddings"][0]["embedding"]

print(f"Searching for similar embeddings...\n")

# Search for top 5 most similar vectors
response = s3vectors.query_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=5,
    returnDistance=True,
    returnMetadata=True
)

# Display results
print(f"Found {len(response['vectors'])} results:\n")
for i, result in enumerate(response["vectors"], 1):
    print(f"{i}. {result['key']}")
    print(f"   Distance: {result['distance']:.4f}")
    if result.get("metadata"):
        print(f"   Metadata: {result['metadata']}")
    print()

For production applications, you can store embeddings in any vector database. Amazon OpenSearch Service offers native integration with Nova Multimodal Embeddings at launch, making it straightforward to build scalable search applications. As the preceding examples show, Amazon S3 Vectors provides a simple way to store and query embeddings with your application data.
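The query in the example above returns a `distance` for each match because the index was created with `distanceMetric="cosine"`. As a rough sketch, assuming the service follows the common convention that cosine distance equals 1 minus cosine similarity, the following pure-Python function shows why a smaller distance means a closer match. The helper name `cosine_distance` is hypothetical and is not part of any AWS SDK.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = [1.0, 0.0]
    print(cosine_distance(query, [2.0, 0.0]))   # same direction: 0.0
    print(cosine_distance(query, [0.0, 1.0]))   # orthogonal: 1.0
    print(cosine_distance(query, [-1.0, 0.0]))  # opposite direction: 2.0
```

Because only direction matters, `[2.0, 0.0]` is a perfect match for `[1.0, 0.0]` even though its magnitude differs.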

Things to know
Nova Multimodal Embeddings offers four output dimension options: 3,072, 1,024, 384, and 256. Larger dimensions provide more detailed representations but require more storage and computation. Smaller dimensions offer a practical balance between retrieval performance and resource efficiency. This flexibility helps you optimize for your specific application and cost requirements.
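To make the storage side of this trade-off concrete: a `float32` value occupies 4 bytes, so the raw size of one embedding is its dimension times 4. This sketch counts only the raw vector payload; it ignores metadata, keys, and any index overhead a vector store adds, and the one-million-vector count is an arbitrary example.

```python
# Raw storage for float32 embeddings: 4 bytes per dimension.
# Ignores metadata, keys, and index overhead added by any vector store.
BYTES_PER_FLOAT32 = 4
DIMENSIONS = [3072, 1024, 384, 256]  # Nova Multimodal Embeddings output options


def raw_vector_bytes(dimension: int, count: int = 1) -> int:
    """Bytes needed to hold `count` float32 vectors of the given dimension."""
    return dimension * BYTES_PER_FLOAT32 * count


if __name__ == "__main__":
    for dim in DIMENSIONS:
        per_vector = raw_vector_bytes(dim)
        per_million_gib = raw_vector_bytes(dim, 1_000_000) / 2**30
        print(f"{dim:>5} dims: {per_vector:>6} bytes/vector, "
              f"{per_million_gib:.2f} GiB per million vectors")
```

Dropping from 3,072 to 256 dimensions cuts the raw payload per vector from 12,288 bytes to 1,024 bytes, a factor of 12.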

The model handles substantial context lengths. For text inputs, it can process up to 8,192 tokens at once. Video and audio inputs support segments of up to 30 seconds, and the model can segment longer files. This segmentation capability is particularly useful when working with large media files—the model splits them into manageable pieces and creates embeddings for each segment.
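The following sketch shows the arithmetic of fixed-length segmentation, assuming segments of at most 30 seconds laid end to end without overlap. It is an illustration only: `segment_bounds` is a hypothetical helper, and the model's actual segmentation behavior and options are documented in the Amazon Nova User Guide.

```python
from typing import List, Tuple

MAX_SEGMENT_SECONDS = 30.0  # per-segment limit for audio and video inputs


def segment_bounds(duration_s: float,
                   segment_s: float = MAX_SEGMENT_SECONDS) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive, non-overlapping (start, end) segments."""
    duration_s = float(duration_s)
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration and segment length must be positive")
    bounds = []
    start = 0.0
    while start < duration_s:
        # The last segment is shorter when the duration isn't a multiple of segment_s.
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # A 95-second clip yields three full segments and one 5-second remainder.
    print(segment_bounds(95))
```

Each segment would then receive its own embedding, so a 95-second file produces four embeddings under these assumptions.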

The model includes responsible AI features built into Amazon Bedrock. Content submitted for embedding goes through Amazon Bedrock content safety filters, and the model includes fairness measures to reduce bias.

As described in the code examples, the model can be invoked through both synchronous and asynchronous APIs. The synchronous API works well for real-time applications where you need immediate responses, such as processing user queries in a search interface. The asynchronous API handles latency-insensitive workloads more efficiently, making it suitable for processing large content such as videos.

Availability and pricing
Amazon Nova Multimodal Embeddings is available today in Amazon Bedrock in the US East (N. Virginia) AWS Region. For detailed pricing information, visit the Amazon Bedrock pricing page.

To learn more, see the Amazon Nova User Guide for comprehensive documentation and the Amazon Nova model cookbook on GitHub for practical code examples.

If you’re using an AI-powered assistant for software development such as Amazon Q Developer or Kiro, you can set up the AWS API MCP Server to help the AI assistant interact with AWS services and resources, and the AWS Knowledge MCP Server to provide up-to-date documentation, code samples, and knowledge about the Regional availability of AWS APIs and CloudFormation resources.

Start building multimodal AI-powered applications with Nova Multimodal Embeddings today, and share your feedback through AWS re:Post for Amazon Bedrock or your usual AWS Support contacts.

— Danilo

Processing Amazon S3 objects at scale with AWS Step Functions Distributed Map S3 prefix

2025-10-25 Biswanath Mukherjee

Post Syndicated from Biswanath Mukherjee original https://aws.amazon.com/blogs/compute/processing-amazon-s3-objects-at-scale-with-aws-step-functions-distributed-map-s3-prefix/

If you’re building large scale enterprise applications, you’ve likely faced the complexities of processing large volumes of data files. Whether you’re analyzing your application logs, processing customer data files, or transforming machine learning datasets, you know the complexity involved in orchestrating workflows. You’ve probably written nested workflows and additional custom code to process objects from Amazon Simple Storage Service (Amazon S3) buckets.

With AWS Step Functions Distributed Map, you can process large-scale datasets by running iterations of workflow steps across data entries in parallel, achieving massive scale with simplified management.

With the new prefix-based iteration feature and LOAD_AND_FLATTEN transformation parameter for Distributed Map, your workflows can now iterate over S3 objects under a specified prefix using the S3 ListObjectsV2 API and process their contents in a single Map state, avoiding nested workflows and reducing operational complexity.

In this post, you’ll learn how to process Amazon S3 objects at scale with the new AWS Step Functions Distributed Map S3 prefix and transformation capabilities.

Use case: Application log processing and summarization

You’ll build a sample Step Functions state machine that processes all the log files under a given S3 prefix using a Distributed Map. The workflow analyzes the log files to build an hourly summary of the INFO, WARNING, and ERROR messages they contain. The following diagram presents the AWS Step Functions state machine:

Log files processing workflow

The state machine iterates over all the log files under the specified S3 prefix using S3 ListObjectsV2 and processes them using AWS Step Functions Distributed Map.
For each log file entry, the state machine puts an hourly ErrorCount metric into Amazon CloudWatch.
The state machine then stores the hourly metric counts in an Amazon DynamoDB table.
The state machine then invokes an AWS Lambda function to perform metrics aggregation.
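The post doesn't show how each log line is classified, so the following is only a hypothetical sketch of the kind of per-hour tally the workflow produces. It assumes a made-up line format, `YYYY-MM-DD HH:MM:SS LEVEL message`; the sample logs generated by the repository's script may use a different layout. The hour key mirrors the `HourKey` attribute format shown in the DynamoDB output later in the post.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

LEVELS = ("INFO", "WARNING", "ERROR")


def hourly_counts(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count log levels per hour.

    Assumes a hypothetical line format: 'YYYY-MM-DD HH:MM:SS LEVEL message'.
    Returns a mapping like {'2025-10-08 13': Counter({'ERROR': 2, ...})}.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 3 or parts[2] not in LEVELS:
            continue  # skip blank or malformed lines
        date, time_of_day, level = parts[0], parts[1], parts[2]
        hour_key = f"{date} {time_of_day[:2]}"
        counts[hour_key][level] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        "2025-10-08 13:01:00 ERROR db timeout",
        "2025-10-08 13:15:42 WARNING slow response",
        "2025-10-08 13:59:59 ERROR retry exhausted",
        "2025-10-08 14:00:03 INFO service healthy",
    ]
    for hour, counter in sorted(hourly_counts(sample).items()):
        print(hour, dict(counter))
```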

The following is an example of the parameters in an ItemReader configured to iterate over the content of S3 objects using S3 ListObjectsV2.

{
  "QueryLanguage": "JSONata",
  "States": {
    ...
    "Map": {
      ...
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "ReaderConfig": {
          // InputType is required if Transformation is LOAD_AND_FLATTEN. Use one of the given values
          "InputType": "CSV | JSON | JSONL | PARQUET",
          // Transformation is OPTIONAL and defaults to NONE if not present
          "Transformation": "NONE | LOAD_AND_FLATTEN"
        },
        "Arguments": {
          "Bucket": "amzn-s3-demo-bucket1",
          "Prefix": "{% $states.input.PrefixKey %}"
        }
      },
      ...
    }
  }
}

With the LOAD_AND_FLATTEN option, your state machine will do the following:

Read the actual content of each object listed by the Amazon S3 ListObjectsV2 call.
Parse the content based on InputType (CSV, JSON, JSONL, Parquet).
Create items from the file contents (rows/records) rather than metadata.
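Conceptually, LOAD_AND_FLATTEN turns a list of objects into a list of records. The sketch below simulates that locally for JSONL content held in memory; the bucket name, object keys, and contents are made up, the helper `load_and_flatten_jsonl` is hypothetical, and in a real workflow Step Functions does this parsing for you. Each record is paired with its object's S3 URI so you can see which file it came from.

```python
import json
from typing import Dict, Iterator, Tuple


def load_and_flatten_jsonl(bucket: str,
                           objects: Dict[str, str]) -> Iterator[Tuple[str, dict]]:
    """Yield (source_uri, record) pairs, one per non-empty JSONL line across all objects.

    `objects` maps object keys to their text content, standing in for the
    objects a ListObjectsV2 call would return under a prefix.
    """
    for key in sorted(objects):  # sorted for a deterministic, key-ordered result
        source = f"s3://{bucket}/{key}"
        for line in objects[key].splitlines():
            if line.strip():
                yield source, json.loads(line)


if __name__ == "__main__":
    objects = {
        "logs/daily/a.jsonl": '{"level": "ERROR"}\n{"level": "INFO"}\n',
        "logs/daily/b.jsonl": '{"level": "WARNING"}\n',
    }
    # Two objects flatten into three items, one per record.
    for source, record in load_and_flatten_jsonl("amzn-s3-demo-bucket1", objects):
        print(source, record)
```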

We recommend including a trailing slash on your prefix. For example, if you select data with a prefix of folder1, your state machine will process both folder1/myData.csv and folder10/myData.csv. Using folder1/ will strictly process only one folder. All of the objects listed by prefix need to be in the same data format. For example, if you set InputType to JSONL, your S3 prefix should contain only JSONL files and not a mix of other types.
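The trailing-slash advice follows from the fact that an S3 prefix is a plain string-prefix match on the key, not a folder boundary. This local sketch mimics that matching rule with `str.startswith`; `keys_under_prefix` is a hypothetical helper and the keys are the example names from the paragraph above.

```python
from typing import Iterable, List


def keys_under_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Mimic S3 prefix filtering: a key matches if it starts with the prefix."""
    return [key for key in keys if key.startswith(prefix)]


KEYS = ["folder1/myData.csv", "folder10/myData.csv", "folder2/myData.csv"]

if __name__ == "__main__":
    print(keys_under_prefix(KEYS, "folder1"))   # matches folder1/ and folder10/
    print(keys_under_prefix(KEYS, "folder1/"))  # matches only folder1/
```

The trailing slash is what keeps `folder10/` out of a run meant for `folder1/`.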

The context object is an internal JSON structure that is available during an execution. The context object contains information about your state machine and execution. Your workflows can reference the context object in a JSONata expression with $states.context.

Within a Map state, the context object includes the following data:

"Map": {
   "Item": {
      "Index" : Number,
      "Key"   : "String", // Only valid for JSON objects
      "Value" : "String",
      "Source": "String"
   }
}

For each Map iteration, the Index contains the index number for the array item that is being currently processed.

A Key is only available when iterating over JSON objects. Value contains the item being processed. For example, for the following input JSON object, Names will be assigned to Key and ["Bob", "Cat"] will be assigned to Value.

{
  "Names": ["Bob", "Cat"]
}

Source contains one of the following:

For state input: STATE_DATA
For Amazon S3 LIST_OBJECTS_V2 with Transformation=NONE, the value will show the S3 URI for the bucket. For example: s3://amzn-s3-demo-bucket1
For all the other input types, the value will be the Amazon S3 URI of the object. For example: s3://amzn-s3-demo-bucket1/object-key

Using LOAD_AND_FLATTEN and the Source field, you can connect child executions to their sources.
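To connect an item back to its source object, for example in a Lambda function that receives the Source value as part of its input, you can split the URI into a bucket and a key. This is a minimal standard-library sketch; `split_s3_uri` is a hypothetical helper, and how you pass Source into a task is up to your state machine definition. It splits on the first `/` instead of using `urllib.parse`, because S3 keys may legally contain `?` or `#`, which a URL parser would treat as query or fragment delimiters.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); key is '' for bucket-only URIs."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme.lower() != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {uri!r}")
    return bucket, key


if __name__ == "__main__":
    print(split_s3_uri("s3://amzn-s3-demo-bucket1/logs/daily/app.log"))
    print(split_s3_uri("s3://amzn-s3-demo-bucket1"))  # Transformation=NONE case
```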

Prerequisites

Access to an AWS account through the AWS Management Console and the AWS Command Line Interface (AWS CLI). The AWS Identity and Access Management (IAM) user that you use must have permissions to make the necessary AWS service calls and manage the AWS resources mentioned in this post. When granting permissions to the IAM user, follow the principle of least privilege.
AWS CLI installed and configured. If you use long-term credentials such as access keys, follow the best practices in Manage access keys for IAM users and Secure access keys.
Git installed.
AWS Serverless Application Model (AWS SAM) installed.
Python 3.13 or later installed.

Set up and run the workflow

Run the following steps to deploy and test the Step Functions state machine.

Clone the GitHub repository in a new folder and navigate to the project folder.

```
git clone https://github.com/aws-samples/sample-stepfunctions-s3-prefix-processor.git
cd sample-stepfunctions-s3-prefix-processor
```

Run the following command to deploy the application.
```
sam deploy --guided
```
Enter the following details:
- Stack name: Stack name for CloudFormation (for example, stepfunctions-s3-prefix-processor)
- AWS Region: A supported AWS Region (for example, us-east-1)
- Accept all other default values.
The outputs from sam deploy are used in the subsequent steps.
Run the following command to generate sample log files.
```
python3 scripts/generate_logs.py
```
Run the following command to upload the log files to the S3 bucket with the /logs/daily prefix. Replace amzn-s3-demo-bucket1 with the value from the sam deploy output.
```
aws s3 sync logs/ s3://amzn-s3-demo-bucket1/logs/ --exclude '*' --include '*.log'
```
Run the following command to execute the Step Functions workflow. Replace the StateMachineArn with the value from the sam deploy output.
```
aws stepfunctions start-execution \
  --state-machine-arn <StateMachineArn> \
  --input '{}'
```
The Step Functions state machine iterates over all the log files under the S3 prefix /logs/daily and processes them in parallel. The workflow updates the metrics in CloudWatch, stores the hourly metric counts in DynamoDB, and then invokes an AWS Lambda function to aggregate the metrics.

Monitor and verify results

Run the following steps to monitor and verify the test results.

Run the following command to get the details of the execution. Replace executionArn with the execution ARN returned by the start-execution command.
```
aws stepfunctions describe-execution --execution-arn <executionArn>
```
When the status shows SUCCEEDED, run the following command to check the processed output in the LogAnalyticsSummaryTableName DynamoDB table. Replace LogAnalyticsSummaryTableName with the value from the sam deploy output.
```
aws dynamodb scan --table-name <LogAnalyticsSummaryTableName>
```

Check that the hourly ERROR, WARNING, and INFO log statistics are saved in the DynamoDB table. The following is a sample output:

{
    "Items": [
        {
            "ProcessingTime": {
                "S": "2025-10-07T23:45:10.790Z"
            },
            "WarningCount": {
                "N": "2"
            },
            "HourOfDay": {
                "S": "13"
            },
            "TotalRecords": {
                "N": "5"
            },
            "ErrorCount": {
                "N": "3"
            },
            "InfoCount": {
                "N": "0"
            },
            "HourKey": {
                "S": "2025-10-08 13"
            }
        },
        {
            "ProcessingTime": {
                "S": "2025-10-07T23:45:07.456Z"
            },
            "WarningCount": {
                "N": "3"
            },
            "HourOfDay": {
                "S": "09"
            },
            "TotalRecords": {
                "N": "6"
            },
            "ErrorCount": {
                "N": "2"
            },
            "InfoCount": {
                "N": "1"
            },
            "HourKey": {
                "S": "2025-10-08 09"
            }
        },
        …
    ],
    "Count": 24,
    "ScannedCount": 24,
    "ConsumedCapacity": null
}

Run the following command to check the output of the Step Functions state machine execution.

```
aws stepfunctions describe-execution --execution-arn <executionArn> --query 'output' --output text
```

The following is a sample output:

{
  "Summary": {
    "date": "2025-10-08",
    "totalErrors": 50,
    "totalWarnings": 41,
    "totalRecords": 133,
    "hourlyBreakdown": {
      "00": {
        "errors": 1,
        "warnings": 3,
        "records": 5
      },
      "01": {
        "errors": 1,
        "warnings": 1,
        "records": 5
      },
      "02": {
        "errors": 2,
        "warnings": 3,
        "records": 5
      },
      "03": {
        "errors": 3,
        "warnings": 2,
        "records": 7
      },
…
…
    "generatedAt": "2025-10-08T05:19:05.603889"
  }
}

The state machine output shows the daily summary of the log files, generated by the Lambda function.
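The Lambda function's source isn't shown in this post, so the following is only a hypothetical sketch of how hourly items in the DynamoDB attribute-value format shown earlier could be rolled up into a summary shaped like the sample output. It handles only the `S` and `N` types that appear in the sample, assumes all items belong to the same day, and omits the `date` and `generatedAt` fields. For general-purpose conversion, boto3's `TypeDeserializer` is the usual choice.

```python
from typing import Any, Dict, List


def from_attribute_value(attr: Dict[str, str]) -> Any:
    """Convert a DynamoDB attribute value, limited to the S and N types."""
    if "S" in attr:
        return attr["S"]
    if "N" in attr:
        number = attr["N"]
        return int(number) if number.lstrip("-").isdigit() else float(number)
    raise ValueError(f"unsupported attribute type: {attr}")


def summarize(items: List[Dict[str, Dict[str, str]]]) -> Dict[str, Any]:
    """Roll hourly items (assumed to share one date) into a daily summary."""
    rows = [{k: from_attribute_value(v) for k, v in item.items()} for item in items]
    breakdown = {}
    for row in sorted(rows, key=lambda r: r["HourOfDay"]):
        breakdown[row["HourOfDay"]] = {
            "errors": row["ErrorCount"],
            "warnings": row["WarningCount"],
            "records": row["TotalRecords"],
        }
    return {
        "totalErrors": sum(h["errors"] for h in breakdown.values()),
        "totalWarnings": sum(h["warnings"] for h in breakdown.values()),
        "totalRecords": sum(h["records"] for h in breakdown.values()),
        "hourlyBreakdown": breakdown,
    }


if __name__ == "__main__":
    # The two items from the sample scan output above.
    items = [
        {"HourOfDay": {"S": "13"}, "ErrorCount": {"N": "3"},
         "WarningCount": {"N": "2"}, "TotalRecords": {"N": "5"}},
        {"HourOfDay": {"S": "09"}, "ErrorCount": {"N": "2"},
         "WarningCount": {"N": "3"}, "TotalRecords": {"N": "6"}},
    ]
    print(summarize(items))
```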

Clean up

To avoid costs, remove all resources created for this post once you’re done. Run the following commands after replacing amzn-s3-demo-bucket1 with your own bucket name to delete the resources you deployed for this post’s solution:

```
aws s3 rm s3://amzn-s3-demo-bucket1 --recursive
sam delete
rm -rf logs/
```

Conclusion

In this post, you learned how AWS Step Functions Distributed Map can use prefix-based iteration with the LOAD_AND_FLATTEN transformation to read and process multiple data objects directly from Amazon S3 buckets. You no longer need one step to process object metadata and another to load the data objects. Loading and flattening in one step is particularly valuable for data processing pipelines, batch operations, and event-driven architectures where objects are continually added to S3 locations. By eliminating the need to maintain object manifests, you can build more resilient, dynamic data processing workflows with less code and fewer moving parts.

New input sources for Distributed Map are available in all commercial AWS Regions where AWS Step Functions is available. To get started, you can use the Distributed Map mode today in the AWS Step Functions console. To learn more, visit the Step Functions developer guide.

For more serverless learning resources, visit Serverless Land.

Introducing AWS RTB Fabric for real-time advertising technology workloads

2025-10-23 Betty Zheng (郑予彬)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/introducing-aws-rtb-fabric-for-real-time-advertising-technology-workloads/

Today, we’re announcing AWS RTB Fabric, a fully managed service purpose-built for real-time bidding (RTB) advertising workloads. The service helps advertising technology (AdTech) companies seamlessly connect with their supply and demand partners, such as Amazon Ads, GumGum, Kargo, MobileFuse, Sovrn, TripleLift, Viant, Yieldmo, and more, to run high-volume, latency-sensitive RTB workloads on Amazon Web Services (AWS) with consistent single-digit millisecond performance and up to 80% lower networking costs compared with standard networking.

AWS RTB Fabric provides a dedicated, high-performance network environment for RTB workloads and partner integrations without requiring colocated, on-premises infrastructure or upfront commitments. The following diagram shows the high-level architecture of RTB Fabric.

AWS RTB Fabric also includes modules, a capability that helps customers bring their own and partner applications securely into the compute environment used for real-time bidding. Modules support containerized applications and foundation models (FMs) that can enhance transaction efficiency and bidding effectiveness. At launch, AWS RTB Fabric includes modules for optimizing traffic management, improving bid efficiency, and increasing bid response rates, all running inline within the service for consistent low-latency execution.

The growth of programmatic advertising has created a need for low-latency, cost-efficient infrastructure to support RTB workloads. AdTech companies process millions of bid requests per second across publishers, supply-side platforms (SSPs), and demand-side platforms (DSPs). These workloads are highly sensitive to latency because most RTB auctions must complete within 200–300 milliseconds and require reliable, high-speed exchange of OpenRTB requests and responses among multiple partners. Many companies have addressed this by deploying infrastructure in colocation data centers near key partners, which reduces latency but adds operational complexity, long provisioning cycles, and high costs. Others have turned to cloud infrastructure to gain elasticity and scale, but they often face complex provisioning, partner-specific connectivity, and long-term commitments to achieve cost efficiency. These gaps add operational overhead and limit agility. AWS RTB Fabric solves these challenges by providing a managed private network built for RTB workloads that delivers consistent performance, simplifies partner onboarding, and achieves predictable cost efficiency without the burden of maintaining colocation or custom networking setups.

Key capabilities
AWS RTB Fabric introduces a managed foundation for running RTB workloads at scale. The service provides the following key capabilities:

Simplified connectivity to AdTech partners – When you register an RTB Fabric gateway, the service automatically generates secure endpoints that can be shared with selected partners. Using the AWS RTB Fabric API, you can create optimized, private connections to exchange RTB traffic securely across different environments. External Links are also available to connect with partners who aren’t using RTB Fabric, such as those operating on premises or in third-party cloud environments. This approach shortens integration time and simplifies collaboration among AdTech participants.
Dedicated network for low-latency advertising transactions – AWS RTB Fabric provides a managed, high-performance network layer optimized for OpenRTB communication. It connects AdTech participants such as SSPs, DSPs, and publishers through private, high-speed links that deliver consistent single-digit millisecond latency. The service automatically optimizes routing paths to maintain predictable performance and reduce networking costs, without requiring manual peering or configuration.
Pricing model aligned with RTB economics – AWS RTB Fabric uses a transaction-based pricing model designed to align with programmatic advertising economics. Customers are billed per billion transactions, providing predictable infrastructure costs that align with how advertising exchanges, SSPs, and DSPs operate.
Built-in traffic management modules – AWS RTB Fabric includes configurable modules that help AdTech workloads operate efficiently and reliably. Modules such as Rate Limiter, OpenRTB Filter, and Error Masking help you control request volume, validate message formats, and manage response handling directly in the network path. These modules execute inline within the AWS RTB Fabric environment, maintaining network-speed performance without adding application-level latency. All configurations are managed through the AWS RTB Fabric API, so you can define and update rules programmatically as your workloads scale.
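Because billing is per billion transactions, estimating volume is simple arithmetic. The sketch below converts a sustained request rate into monthly transactions. The 30-day month and the `price_per_billion` argument are placeholders, not published rates (check the pricing page for actual figures), and the sketch assumes each bid request counts as one transaction, which may not match exactly how the service meters traffic.

```python
SECONDS_PER_DAY = 86_400


def monthly_transactions(requests_per_second: float, days: int = 30) -> float:
    """Transactions over `days` at a constant request rate."""
    return requests_per_second * SECONDS_PER_DAY * days


def monthly_cost(requests_per_second: float, price_per_billion: float,
                 days: int = 30) -> float:
    """Estimated cost for a placeholder price per billion transactions."""
    return monthly_transactions(requests_per_second, days) / 1e9 * price_per_billion


if __name__ == "__main__":
    rps = 1_000_000  # "millions of bid requests per second" scale
    print(f"{monthly_transactions(rps) / 1e9:,.1f} billion transactions per month")
```

At one million requests per second, a 30-day month is 2,592 billion transactions, so the per-billion price is multiplied by 2,592.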

Getting started
Today, you can start building with AWS RTB Fabric using the AWS Management Console, AWS Command Line Interface (AWS CLI), or infrastructure-as-code (IaC) tools such as AWS CloudFormation and Terraform.

The console provides a visual entry point to view and manage RTB gateways and links, as shown on the Dashboard of the AWS RTB Fabric console.

You can also use the AWS CLI to configure gateways, create links, and manage traffic programmatically. When I started building with AWS RTB Fabric, I used the AWS CLI to configure everything from gateway creation to link setup and traffic monitoring. The setup ran inside my Amazon Virtual Private Cloud (Amazon VPC) while AWS managed the low-latency infrastructure that connected workloads.

To begin, I created a requester gateway to send bid requests and a responder gateway to receive and process bid responses. These gateways act as secure communication points within the AWS RTB Fabric.

# Create a requester gateway with required parameters
aws rtbfabric create-requester-gateway \
  --description "My RTB requester gateway" \
  --vpc-id vpc-12345678 \
  --subnet-ids subnet-abc12345 subnet-def67890 \
  --security-group-ids sg-12345678 \
  --client-token "unique-client-token-123"

# Create a responder gateway with required parameters
aws rtbfabric create-responder-gateway \
  --description "My RTB responder gateway" \
  --vpc-id vpc-01f345ad6524a6d7 \
  --subnet-ids subnet-abc12345 subnet-def67890 \
  --security-group-ids sg-12345678 \
  --dns-name responder.example.com \
  --port 443 \
  --protocol HTTPS

After both gateways were active, I created a link from the requester to the responder to establish a private, low-latency communication path for OpenRTB traffic. The link handled routing and load balancing automatically.

# Requester account creating a link from requester gateway to a responder gateway
aws rtbfabric create-link \
  --gateway-id rtb-gw-requester123 \
  --peer-gateway-id rtb-gw-responder456 \
  --log-settings '{"applicationLogs":{"sampling":{"errorLog":10.0,"filterLog":10.0}}}'

# Responder account accepting a link from requester gateway to responder gateway
aws rtbfabric accept-link \
  --gateway-id rtb-gw-responder456 \
  --link-id link-reqtoresplink789 \
  --log-settings '{"applicationLogs":{"sampling":{"errorLog":10.0,"filterLog":10.0}}}'
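Hand-writing nested JSON inside shell quotes is error-prone. One option is to build the argument strings with `json.dumps` and let `shlex.join` quote them for the shell. The field names below are copied from the CLI examples in this post, and the list shape for `--modules` is an assumption; confirm the exact request shapes in the AWS RTB Fabric API reference before relying on them. The helper names are hypothetical.

```python
import json
import shlex


def log_settings_arg(error_log_pct: float, filter_log_pct: float) -> str:
    """JSON for --log-settings, using the field names from the examples above."""
    return json.dumps({
        "applicationLogs": {
            "sampling": {"errorLog": error_log_pct, "filterLog": filter_log_pct}
        }
    }, separators=(",", ":"))


def rate_limiter_modules_arg(tps: int) -> str:
    """JSON for --modules with a single RateLimiter module (list shape assumed)."""
    return json.dumps([
        {"name": "RateLimiter", "moduleParameters": {"rateLimiter": {"tps": tps}}}
    ], separators=(",", ":"))


if __name__ == "__main__":
    cmd = [
        "aws", "rtbfabric", "create-link",
        "--gateway-id", "rtb-gw-requester123",
        "--peer-gateway-id", "rtb-gw-responder456",
        "--log-settings", log_settings_arg(10.0, 10.0),
    ]
    # shlex.join quotes each argument so the command can be pasted into a POSIX shell.
    print(shlex.join(cmd))
```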

I also connected with external partners using External Links, which extended my RTB workloads to on-premises or third-party environments while maintaining the same latency and security characteristics.

# Create an inbound external link endpoint for an external partner to send bid requests to
aws rtbfabric create-inbound-external-link \
  --gateway-id rtb-gw-responder456

# Create an outbound external link for sending bid requests to an external partner
aws rtbfabric create-outbound-external-link \
  --gateway-id rtb-gw-requester123 \
  --public-endpoint "https://my-external-partner-responder.com"

To manage traffic efficiently, I added modules directly into the data path. The Rate Limiter module controlled request volume, and the OpenRTB Filter validated message formats inline at network speed.

# Attach a rate limiting module
aws rtbfabric update-link-module-flow \
  --gateway-id rtb-gw-responder456 \
  --link-id link-reqtoresplink789 \
  --modules '[{"name":"RateLimiter","moduleParameters":{"rateLimiter":{"tps":10000}}}]'
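The post doesn't say how the Rate Limiter module enforces its `tps` limit internally. For intuition only, a token bucket is one common way to cap transactions per second: tokens refill at a fixed rate, each request spends one, and requests that find the bucket empty are rejected. The class below is a generic, single-threaded illustration with an injectable clock for testing, not the module's implementation.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = rate
        self.capacity = capacity if capacity is not None else rate
        self.tokens = self.capacity  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Spend one token if available; False means the request should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]  # a controllable clock makes the behavior deterministic
    bucket = TokenBucket(rate=10_000, clock=lambda: fake_now[0])
    print(sum(bucket.allow() for _ in range(12_000)))  # prints 10000
    fake_now[0] += 0.5  # half a second later, 5,000 tokens have refilled
    print(sum(bucket.allow() for _ in range(6_000)))   # prints 5000
```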

Finally, I used Amazon CloudWatch to monitor throughput, latency, and module performance, and I exported logs to Amazon Simple Storage Service (Amazon S3) for auditing and optimization.

All configurations can also be automated with AWS CloudFormation or Terraform, allowing consistent, repeatable deployment across multiple environments. With RTB Fabric, I could focus on optimizing bidding logic while AWS maintained predictable, single-digit millisecond performance across my AdTech partners.

For more details, refer to the AWS RTB Fabric User Guide.

Now available
AWS RTB Fabric is available today in the following AWS Regions: US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland).

AWS RTB Fabric is continually evolving to address the changing needs of the AdTech industry. The service expands its capabilities to support secure integration of advanced applications and AI-driven optimizations in real-time bidding workflows that help customers simplify operations and improve performance on AWS. To learn more about AWS RTB Fabric, visit the AWS RTB Fabric page.

– Betty

The attendee guide to digital sovereignty sessions at AWS re:Invent 2025

2025-10-21 Brittany Bunch

Post Syndicated from Brittany Bunch original https://aws.amazon.com/blogs/security/the-attendee-guide-to-digital-sovereignty-sessions-at-aws-reinvent-2025/

AWS re:Invent 2025, the premier cloud computing conference hosted by Amazon Web Services (AWS), returns to Las Vegas, Nevada, from December 1–5, 2025. This flagship event brings together the global cloud community for an immersive week of learning, collaboration, and innovation across multiple venues. Whether you’re a cloud expert, business leader, or technology enthusiast, re:Invent offers unparalleled opportunities to explore cutting-edge cloud solutions, engage with AWS experts, and build valuable connections with peers from around the world.

From technical deep dives to strategic business sessions, re:Invent 2025 is your gateway to understanding and using the most advanced cloud technologies. In the Expo, you can visit the Digital Sovereignty and Hybrid Cloud kiosks in the AWS Village to learn about the upcoming AWS European Sovereign Cloud and other digital sovereignty solutions, and get your questions answered by AWS experts.

Join us to discover the latest cloud industry innovations, gain deep technical insights, and learn how to optimize your cloud investments for digital sovereignty. Sessions this year will include comprehensive coverage of the AWS sovereign-by-design approach, including the enhanced security capabilities of the AWS Nitro System, our expanding portfolio of digital sovereignty solutions, and the latest developments of the AWS European Sovereign Cloud. With the growing momentum around digital sovereignty, explore how AWS continues to innovate with sovereign cloud solutions that help customers maintain control over their data while using the full power of the cloud. You can customize your learning path by reserving session seating now by signing in to your attendee portal or the AWS Events mobile app.

Breakout sessions and code talks

To add sessions to your AWS re:Invent agenda and find time and location information, choose the session title link.

Security track

SEC201 | Breakout | AWS European Sovereign Cloud: From concept to reality
Speakers: Colm MacCárthaigh, VP/Distinguished Engineer – EC2 Networking, AWS, and Addy Upreti, Principal Technical Product Manager – EC2 Core Product Management, AWS
Get a firsthand look at the AWS European Sovereign Cloud. Explore this new, independent infrastructure’s dedicated architecture, EU-based operations, and operational controls, along with the governance and legal framework that power this cloud. Learn how this cloud solution is built, operated, and secured entirely within Europe.

Cloud operations track

COP409 | Code Talk | Building Sovereign Cloud Environments
Speakers: Bo Lechangeur, Pr. Delivery Engineer – STCE, AWS, and Randy Domingo, Sr. Software Development Manager – STCE, AWS
As organizations scale their operations globally, they need to meet evolving data residency, security, compliance, and business continuity requirements. This session explores how AWS Control Tower and Landing Zone Accelerator on AWS support key sovereignty requirements, including country-specific compliance frameworks, regional service selection, automated controls for data movement, and cross-border transfers. Through real-world examples, the session demonstrates how organizations can use AWS to implement country-specific security controls, maintain operational consistency across multi-Region deployments, accelerate cloud compliance, and deploy automated security and compliance at scale.

Hybrid cloud and multicloud track

HMC202 | Breakout | AWS wherever you need it: From the cloud to the edge
Speakers: Spencer Dillard, Director, Software Development – EC2 Edge, AWS, Madhura Kale, Senior Manager, Technical Product Management – EC2 Core, AWS
While most workloads can be migrated to the cloud, some remain on-premises or at the edge due to low latency, local data processing, or digital sovereignty needs. In this session, learn how AWS services like AWS Outposts, AWS Local Zones, AWS Dedicated Local Zones, and AWS IoT support hybrid cloud and edge computing workloads such as multiplayer gaming, high-frequency trading, medical imaging, smart manufacturing, and generative AI applications with data residency requirements.

HMC308 | Breakout | Build generative and agentic AI applications on-premises and at the edge
Speakers: Chris McEvilly, Senior Solutions Architect – Hybrid Edge, AWS, Pranav Chachra, Principal Technical Product Manager – EC2 Core, AWS, and Fernando Galves, Senior Solutions Architect – Generative AI, AWS
As customers scale generative AI and agentic AI implementations from pilots to production, they need to balance speed of innovation with data sovereignty requirements, low-latency edge processing needs, and space, power, and cost efficiency. This session explores how to build generative and agentic AI solutions using AWS Local Zones, AWS Outposts, and AWS Dedicated Local Zones. Discover architectural patterns and best practices for deploying foundation models across distributed locations. Learn how to implement Retrieval Augmented Generation (RAG) with locally stored data. Gain insights into strategies for model selection and optimization.

HMC310 | Breakout | Digital sovereignty and data residency with AWS Hybrid and Edge services
Speakers: Mallory Gershenfeld, Senior Technical Product Manager – S3, AWS, Ben Lavasani, Senior Specialist – Hybrid and Edge, AWS, and Majd Aldeen Masriah, Director of Enterprise – Architecture, Geida
Countries around the world are increasingly introducing or updating data residency and digital sovereignty laws that require at least one copy of data, or sometimes all data, to be stored or processed in a specific geographic or sovereign location. These requirements introduce new challenges for customers. This session explores how AWS services, including AWS Dedicated Local Zones, AWS Local Zones, and AWS Outposts, can help you with your digital sovereignty use cases. We’ll examine best practices for data residency, security controls, and operational consistency across deployments at the edge.

Interactive sessions (chalk talks and workshops)

Security track

SEC301 | Chalk Talk | Architecting for Digital Sovereignty: From Foundation to Practice
Speakers: Eric Rose, Principal Security SA – Global Services Security, AWS and Armin Schneider, Digital Sovereignty Specialist SA – Global Services Security Digital Sovereignty
Join this chalk talk that bridges security fundamentals with practical architecture strategies for implementing digital sovereignty in the cloud. Through real-world examples from the United Arab Emirates Cybersecurity Council and the upcoming AWS European Sovereign Cloud, we’ll explore how organizations can use AWS sovereignty features effectively. We’ll cover practical architectural patterns for data residency, operational control, and security measures that help customers maintain full control of their data. Perfect for cloud architects and security teams, this session will show you how to design solutions that balance sovereignty requirements with cloud advantages, illustrated with examples from government and enterprise deployments.

Hybrid cloud and multicloud track

HMC301 | Workshop | Build and operate resilient and performant distributed applications
Speakers: Saravanan Shanmugam, Senior Solutions Architect – Hybrid Edge, AWS and Sedji Gaouaou, Senior Solutions Architect – Networking, AWS
This workshop explores how to design and implement applications for multi-geo operations while meeting data residency and performance requirements. You will learn how to design fault-tolerant, latency-sensitive applications across distributed locations with limited hardware resources. You will also explore distributed hybrid architectures, edge networking implementations, and traffic management solutions that balance regulatory requirements with high availability needs. Learn practical strategies for optimizing performance while maintaining data sovereignty across distributed locations.

HMC302 | Workshop | Implementing agentic AI solutions on-premises and at the edge
Speakers: Fernando Galves, Senior Solutions Architect – Generative AI, AWS and Kyle Palasti, Senior Solutions Architect – Hybrid Edge, AWS
As governments and standards bodies develop data protection and privacy regulations, organizations increasingly need to combine the use of generative AI tooling in the cloud with regulated data that needs to remain on-premises to meet data residency requirements. In this workshop, learn how to extend Amazon Bedrock AgentCore to hybrid and edge services like AWS Outposts and AWS Local Zones to build distributed agentic applications using Model Context Protocol (MCP) and agent-to-agent (A2A) communication with on-premises data for improved model outcomes. Get hands-on with hybrid agentic AI using Amazon Bedrock and Strands Agents while exploring AWS hybrid and edge services.

HMC305 | Workshop | Low-latency SLM deployment: Optimizing inference on AWS Hybrid and Edge Services
Speakers: Leonardo Solano, Principal Solutions Architect – Networking & Hybrid Edge, AWS and Obed Gutierrez, Senior Solutions Architect, AWS
This hands-on workshop demonstrates a fully local deployment approach for running Small Language Models (SLMs) at the edge using AWS Local Zones and AWS Outposts. The implementation focuses on achieving low-latency inference and enabling data sovereignty compliance through Retrieval Augmented Generation (RAG) applications within local infrastructure. Using Amazon Elastic Compute Cloud (Amazon EC2) instances and publicly available models, you will learn how to deploy, optimize, and manage SLMs in edge environments, ensuring the RAG system and language model operate locally to meet strict latency and data residency requirements for production scenarios.

HMC312 | Chalk Talk | Implement RAG while meeting data residency requirements
Speakers: Lakshmi VP, Solutions Architect, AWS and Akshata Ketkar, Senior Product Manager – EC2 Edge, AWS
As governments develop data protection and privacy regulations, organizations increasingly need to leverage generative AI with regulated data that needs to remain on-premises to meet data sovereignty requirements. This session explores how to implement Retrieval Augmented Generation (RAG) with on-premises and edge data. Learn how to extend Amazon Bedrock AgentCore to AWS Outposts and AWS Local Zones for a hybrid RAG architecture, or build a local RAG architecture for more stringent data residency requirements. Discover the latest techniques like reranker models to improve precision without increasing model size, reduce inference cost, and enforce more governance and control over prompt outcomes.

HMC314 | Chalk Talk | Deploying for resilience: HA/DR strategies for AWS Outposts and Local Zones
Speakers: Afaq Khan, Senior Product Manager – EC2 Edge, AWS and Brianna Rosentrater, Senior Solutions Architect – Hybrid Edge, AWS
Critical workloads at the edge demand robust high-availability and disaster recovery strategies. In this chalk talk, learn how to plan and implement resilient deployments using AWS hybrid cloud and edge computing services. We’ll examine how to architect edge infrastructure using AWS Local Zones and AWS Outposts, covering key aspects of networking, compute, and storage redundancy. Through real customer examples and reference architectures, we’ll explore deployment patterns and best practices for maintaining business continuity across failure modes. Join us to learn practical strategies for achieving your RPO/RTO objectives with edge deployments.

HMC403 | Code Talk | Build and optimize edge architectures for resiliency with AI
Speakers: Jesus Federico, Principal Solutions Architect – Generative AI, AWS and Robert Belson, Senior Solutions Architect & Developer Advocate, AWS
This live coding session explores how to automate edge infrastructure operations with AI. Discover how to build truly resilient architectures with the latest AWS Outposts and AWS Local Zones APIs. We’ll walk through real-world code examples for querying Outposts hardware inventory, implementing intelligent resource placement, and automating failover configurations. You’ll learn how Amazon Bedrock can analyze architecture patterns and generate Infrastructure as Code (IaC) recommendations for optimal component distribution. Walk away with practical techniques for API integration, automated health checks, and dynamic resource allocation, plus working code samples and deployment templates for building adaptive, highly available edge solutions.

HMC316 | Chalk Talk | Address digital sovereignty with hybrid cloud solutions
Speakers: Sherry Lin, Principal Product Manager – EC2 Core, AWS and Enrico Liguori, Solutions Architect – Networking, AWS
As organizations scale innovative solutions globally, they need to navigate complex digital sovereignty requirements. This session explores how AWS can help you accelerate global scaling while meeting regulatory obligations. We’ll compare various sovereign infrastructure options with a focus on AWS Local Zones, AWS Dedicated Local Zones, AWS Outposts, and AWS European Sovereign Cloud. Learn how to choose the best option for your sovereign needs and architect applications for data residency and resiliency. Discover how to implement security controls to regulate how data can be stored, processed, and transferred, and how to prevent unauthorized data access.

For a full view of digital sovereignty content, including sessions with partners, explore the AWS re:Invent catalog and filter on the Digital Sovereignty area of interest. Not able to attend in person? Register for the virtual-only pass offered at no additional cost to livestream keynotes and innovation talks, and access on-demand breakout sessions today. See you in Las Vegas or on the livestream!

If you have feedback about this post, submit comments in the Comments section below.

Introducing Amazon EBS Volume Clones: Create instant copies of your EBS volumes

2025-10-15 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/introducing-amazon-ebs-volume-clones-create-instant-copies-of-your-ebs-volumes/

As someone who used to work at Sun Microsystems, where ZFS was invented, I’ve always loved working with storage systems that offer instant volume copies for my development and testing needs.

Today, I’m excited to share that AWS is bringing similar capabilities to Amazon Elastic Block Store (Amazon EBS) with the launch of Amazon EBS Volume Clones, a new capability that lets you create instant point-in-time copies of your EBS volumes within the same Availability Zone.

Many customers need to create copies of their production data to support development and testing activities in a separate nonproduction environment. Until now, this process required taking an EBS snapshot (stored in Amazon Simple Storage Service (Amazon S3)) and then creating a new volume from that snapshot. Although this approach works, its multiple steps add operational overhead.

With Amazon EBS Volume Clones, you can now create copies of your EBS volumes with a single API call or console click. The copied volumes are available within seconds and provide immediate access to your data with single-digit millisecond latency. This makes Volume Clones particularly useful for quickly setting up test environments with production data or creating temporary copies of databases for development purposes.

Let me show you how Volume Clones works
For this post, I created a small Amazon Elastic Compute Cloud (Amazon EC2) instance with an attached volume. I created a file in my home directory on the root file system with the command echo "Hello CopyVolumes" > hello.txt.

To initiate the copy, I open the AWS Management Console in a browser and navigate to EC2, Elastic Block Store, Volumes. I select the volume I want to copy.

Note that, at the time of publication of this post, only encrypted volumes can be copied.

On the Actions menu, I choose the Copy Volume option.

Next, I choose the details of the target volume. I can change the Volume type and adjust the Size, IOPS, and Throughput parameters. I choose Copy volume to start the Volume Clone operation.

The copied volume enters the Creating state and becomes available within seconds. I can then attach it to an EC2 instance and start using it immediately.

Data blocks are copied from the source volume and written to the volume copy in the background. The volume remains in the Initializing state until the process is complete. I can monitor its progress with the describe-volume-status API. The initializing operation doesn’t affect the performance of the source volume. I can continue using it normally during the copy process.
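If you want to script the wait instead of checking by hand, one option is to poll describe-volume-status in a loop. The sketch below assumes that the copy reports its progress as a status detail named initialization-state, and that this detail switches to completed when the background copy finishes. Treat that name as an assumption and confirm it against the describe-volume-status output for your own volume. The function name wait_for_initialization and its parameters are hypothetical.

```shell
# Hypothetical helper: poll describe-volume-status until a volume copy has
# finished initializing.
# ASSUMPTION: the status details contain an entry named "initialization-state"
# whose Status becomes "completed" when the background copy is done.
#
# Usage: wait_for_initialization VOLUME_ID [INTERVAL_SECONDS] [MAX_POLLS]
wait_for_initialization() {
  local vol="$1" interval="${2:-30}" max_polls="${3:-120}" polls=0 state=""
  while [ "$polls" -lt "$max_polls" ]; do
    polls=$((polls + 1))
    # The JMESPath filter selects the assumed detail entry; "| [0]" unwraps it.
    state=$(aws ec2 describe-volume-status --volume-ids "$vol" \
      --query "VolumeStatuses[0].VolumeStatus.Details[?Name=='initialization-state'].Status | [0]" \
      --output text)
    echo "$vol initialization-state: $state (poll $polls of $max_polls)"
    if [ "$state" = "completed" ]; then
      return 0
    fi
    sleep "$interval"
  done
  echo "$vol still not initialized after $max_polls polls" >&2
  return 1
}

# Example: check every 30 seconds, for up to an hour.
# wait_for_initialization vol-09b700e3a23a9b4ad 30 120
```

The MAX_POLLS limit keeps the loop from spinning forever if the assumed detail never appears. In that case, the query prints None and the loop gives up after the last poll.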

I love that the copied volume is available immediately. I don’t need to wait for its initialization to complete. During the initialization phase, my copied volume delivers performance based on the lowest of: a baseline of 3,000 IOPS and 125 MiB/s, the source volume’s provisioned performance, or the copied volume’s provisioned performance.
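The rule above is a minimum, taken separately for IOPS and for throughput. The small sketch below makes that arithmetic explicit. The function names are illustrative, and nothing in it calls AWS.

One practical consequence follows from the rule. The 3,000 IOPS / 125 MiB/s baseline is always one of the three terms, so a copy does not exceed that baseline while it is initializing, no matter how much performance you provision on either volume.

```shell
# Print the smallest of three integers.
min3() {
  local m="$1"
  if [ "$2" -lt "$m" ]; then m="$2"; fi
  if [ "$3" -lt "$m" ]; then m="$3"; fi
  echo "$m"
}

# Estimate the performance of a volume copy during initialization:
# the lowest of the 3,000 IOPS / 125 MiB/s baseline, the source volume's
# provisioned performance, and the copy's provisioned performance.
#
# Usage: init_perf SRC_IOPS SRC_MIBPS COPY_IOPS COPY_MIBPS
# Prints: "<IOPS> <MiB/s>"
init_perf() {
  local baseline_iops=3000 baseline_mibps=125
  echo "$(min3 "$baseline_iops" "$1" "$3") $(min3 "$baseline_mibps" "$2" "$4")"
}

# A 16,000 IOPS / 1,000 MiB/s source copied to a volume provisioned the
# same way still runs at the baseline until initialization completes:
init_perf 16000 1000 16000 1000   # prints: 3000 125
```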

After initialization is completed, the copied volume becomes fully independent of the source volume and delivers its full provisioned performance.

Alternatively, I can use the AWS Command Line Interface (AWS CLI) to initiate the copy:

aws ec2 copy-volumes                          \
     --source-volume-id vol-1234567890abcdef0 \
     --size 500                               \
     --volume-type gp3

After the volume copy is created, I attach it to my EC2 instance, mount it, and check that the file I created at the start is present.

First, I attach the volume from my laptop, using the attach-volume command:

aws ec2 attach-volume \
         --volume-id 'vol-09b700e3a23a9b4ad' \
         --instance-id 'i-079e6504ad25b029e'   \
         --device '/dev/sdb'
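In a script, the attach-volume call can run before the copy has finished creating. One way to avoid that race is to put the AWS CLI's built-in volume-available waiter in front of the attach. This minimal sketch assumes that the copy moves to the standard available state once creation finishes, as described earlier in this post. The function name attach_when_available is hypothetical, and the IDs are the ones from this walkthrough.

```shell
# Hypothetical helper: wait until a volume reaches the "available" state,
# then attach it. "aws ec2 wait volume-available" polls until the volume is
# available or the waiter gives up with a non-zero exit status.
#
# Usage: attach_when_available VOLUME_ID INSTANCE_ID DEVICE
attach_when_available() {
  aws ec2 wait volume-available --volume-ids "$1" || return 1
  aws ec2 attach-volume \
    --volume-id "$1" \
    --instance-id "$2" \
    --device "$3"
}

# Example with the IDs used above:
# attach_when_available vol-09b700e3a23a9b4ad i-079e6504ad25b029e /dev/sdb
```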

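Then, I connect to the instance and type the following commands. The copy has the same XFS file system UUID as the root file system that is already mounted, as the lsblk output shows. Because XFS refuses to mount a second file system with a duplicate UUID, I mount the copy with the nouuid option.

$ sudo lsblk -f
NAME          FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
nvme0n1                                                                              
├─nvme0n1p1   xfs          /     49e26d9d-0a9d-4667-b93e-a23d1de8eacd    6.2G    22% /
└─nvme0n1p128 vfat   FAT16       3105-2F44                               8.6M    14% /boot/efi
nvme1n1                                                                              
├─nvme1n1p1   xfs          /     49e26d9d-0a9d-4667-b93e-a23d1de8eacd                
└─nvme1n1p128 vfat   FAT16       3105-2F44     

$ sudo mkdir -p /data
$ sudo mount -t xfs -o nouuid /dev/nvme1n1p1 /data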

$ df -h
Filesystem        Size  Used Avail Use% Mounted on
devtmpfs          4.0M     0  4.0M   0% /dev
tmpfs             924M     0  924M   0% /dev/shm
tmpfs             370M  476K  369M   1% /run
/dev/nvme0n1p1    8.0G  1.8G  6.2G  22% /
tmpfs             924M     0  924M   0% /tmp
/dev/nvme0n1p128   10M  1.4M  8.7M  14% /boot/efi
tmpfs             185M     0  185M   0% /run/user/1000
/dev/nvme1n1p1    8.0G  1.8G  6.2G  22% /data

$ cat /data/home/ec2-user/hello.txt 
Hello CopyVolumes

Things to know
Volume Clones creates copies within the same Availability Zone as your source volume. You can create copies from encrypted volumes only, and the size of your copy must be equal to or greater than the source volume.
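You can check those two constraints before you call copy-volumes. The sketch below is a hypothetical pre-flight check. It takes the source size, the encryption flag, and the requested size as arguments. You could obtain the first two with aws ec2 describe-volumes --volume-ids VOLUME_ID --query 'Volumes[0].[Size,Encrypted]' --output text. Availability Zone placement needs no check, because the copy is always created in the source volume's Availability Zone.

```shell
# Hypothetical pre-flight check for a volume copy request:
#   - the source volume must be encrypted
#   - the copy must be at least as large as the source
#
# Usage: check_copy_request SRC_SIZE_GIB SRC_ENCRYPTED TARGET_SIZE_GIB
# Returns 0 if both rules are satisfied, 1 otherwise (reason on stderr).
check_copy_request() {
  local src_size="$1" src_encrypted="$2" target_size="$3"
  case "$src_encrypted" in
    [Tt]rue) ;;
    *) echo "source volume is not encrypted" >&2; return 1 ;;
  esac
  if [ "$target_size" -lt "$src_size" ]; then
    echo "copy size ${target_size} GiB is smaller than source ${src_size} GiB" >&2
    return 1
  fi
  return 0
}

# Example: an encrypted 8 GiB source copied to a 500 GiB volume passes.
# check_copy_request 8 True 500 && echo "ok to copy"
```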

Volume Clones creates crash-consistent copies of your volumes, exactly like snapshots. For application consistency, you need to pause application I/O operations before creating the copy. For example, with PostgreSQL databases, you can use the pg_start_backup() and pg_stop_backup() functions (named pg_backup_start() and pg_backup_stop() since PostgreSQL 15) to put the database into backup mode before you create the copy and to take it out of backup mode afterward. At the operating system level on Linux with XFS, you can use the xfs_freeze command to temporarily suspend and resume access to the file system and ensure all cached updates are written to disk.
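You can wrap the xfs_freeze approach so that the file system is always thawed again, even when the command run in between exits with an error. This sketch rests on two assumptions:

- The frozen file system is a data volume rather than the root file system. Processes that write to a frozen file system block, and that can include the AWS CLI itself. Alternatively, freeze on the instance and run copy-volumes from another machine.
- As with snapshots, the point in time of the copy is fixed once copy-volumes returns, so you can lift the freeze right after the call.

The function name with_frozen_fs is hypothetical.

```shell
# Hypothetical wrapper: freeze an XFS file system, run a command (such as
# the copy-volumes call), then thaw the file system again.
# ASSUMPTION: the copy's point in time is fixed once the command returns.
#
# Usage: with_frozen_fs MOUNTPOINT COMMAND [ARGS...]   (run as root)
with_frozen_fs() {
  local mnt="$1" status=0
  shift
  # Flush cached writes and suspend new writes to the file system.
  xfs_freeze -f "$mnt" || return 1
  # Run the wrapped command, remembering its exit status.
  "$@" || status=$?
  # Thaw even if the wrapped command failed.
  xfs_freeze -u "$mnt"
  return "$status"
}

# Example (placeholder volume ID from this post):
# with_frozen_fs /data aws ec2 copy-volumes --source-volume-id vol-1234567890abcdef0 --size 500 --volume-type gp3
```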

Although Volume Clones creates point-in-time copies, it complements rather than replaces EBS snapshots for backup purposes. EBS snapshots remain the recommended solution for data backup and protection against AZ-level and volume failures. Snapshots provide incremental backups to Amazon S3 with 11 nines of durability, whereas a volume copy has the same durability as any other EBS volume (99.999% for io2, 99.8–99.9% for other volume types). Consider using Volume Clones specifically for test and development scenarios where you need instant access to volume copies.
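Those durability figures are easier to compare as failure rates. EBS expresses volume durability in terms of annual failure rate, so a durability of D percent corresponds to an annual failure rate of (100 - D) percent. The sketch below turns that into an expected number of volume losses per year for a fleet. It is back-of-the-envelope arithmetic, not a reliability model.

```shell
# Back-of-the-envelope estimate: expected volume losses per year for a fleet,
# treating durability D% as an annual failure rate of (100 - D)%.
#
# Usage: expected_annual_losses VOLUME_COUNT DURABILITY_PERCENT
expected_annual_losses() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n * (100 - d) / 100 }'
}

expected_annual_losses 1000 99.9     # about 1 volume per year
expected_annual_losses 1000 99.999   # about 0.01 volumes per year
```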

Copied volumes exist independently of their source volumes and continue to incur standard EBS volume charges until you delete them. To manage costs effectively, implement governance rules to identify and remove copied volumes that are no longer needed for your development or testing activities.
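One way to implement such a rule is to list candidate volumes with their creation times and flag the ones older than a retention window. The sketch below only does the filtering and deletes nothing. How you select candidates is your own policy: for example, unattached volumes, or a tag your team applies to copies. The seven-day window in the example is arbitrary, and the function name is hypothetical.

```shell
# Hypothetical governance helper: read "VOLUME_ID CREATE_EPOCH_SECONDS" lines
# on stdin and print the IDs of volumes older than MAX_AGE_DAYS at NOW_EPOCH.
#
# Usage: stale_volumes MAX_AGE_DAYS NOW_EPOCH < candidates.txt
stale_volumes() {
  awk -v max_days="$1" -v now="$2" '
    # Skip malformed lines; compare age in seconds against the cutoff.
    NF >= 2 && (now - $2) > max_days * 86400 { print $1 }
  '
}

# Example pipeline (requires the AWS CLI and GNU date; review before deleting):
# aws ec2 describe-volumes --filters Name=status,Values=available \
#     --query 'Volumes[].[VolumeId,CreateTime]' --output text |
#   while read -r id created; do echo "$id $(date -d "$created" +%s)"; done |
#   stale_volumes 7 "$(date +%s)"
```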

Pricing and availability
Volume Clones supports all EBS volume types and works with volumes in the same AWS account and Availability Zone. This new capability is available in all AWS commercial Regions, selected Local Zones, and the AWS GovCloud (US) Regions.

For pricing, you’re charged a one-time fee per GiB of data on the source volume at initiation and standard EBS pricing for the new volume.
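Here is a rough worked example of that pricing model. The total for a copy is the one-time fee times the GiB of data on the source volume, plus the ongoing storage charge for the new volume. Both rates below are placeholders, not published prices; check the Amazon EBS pricing page for your Region. The sketch also ignores any separately billed IOPS or throughput.

```shell
# Rough cost of a volume copy: one-time fee on the source data, plus storage
# for the new volume over a number of months. All rates are caller-supplied.
#
# Usage: clone_cost SOURCE_DATA_GIB FEE_PER_GIB COPY_SIZE_GIB RATE_PER_GIB_MONTH MONTHS
clone_cost() {
  awk -v used="$1" -v fee="$2" -v size="$3" -v rate="$4" -v months="$5" \
    'BEGIN { printf "%.2f\n", used * fee + size * rate * months }'
}

# Placeholder rates: 100 GiB of source data at a hypothetical 0.01 per GiB,
# plus a 500 GiB copy at a hypothetical 0.08 per GiB-month, kept one month.
clone_cost 100 0.01 500 0.08 1   # prints: 41.00
```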

I find Volume Clones particularly valuable for database workloads and continuous integration (CI) scenarios. For instance, you can quickly create a copy of your production database for testing new features or troubleshooting issues without impacting your production environment or waiting for data to hydrate from Amazon S3.

To get started with Amazon EBS Volume Clones, visit the Amazon EBS section on the console or check out the EBS documentation. I look forward to hearing how you use this capability to improve your development workflows.

— seb

AWS Weekly Roundup: Amazon Quick Suite, Amazon EC2, Amazon EKS, and more (October 13, 2025)

2025-10-13 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-quick-suite-amazon-ec2-amazon-eks-and-more-october-13-2025/

This week I was at the inaugural AWS AI in Practice meetup from the AWS User Group UK. AI-assisted software development and agents were the focus of the evening! Next week I’ll be in Italy for Codemotion (Milan) and an AWS User Group meetup (Rome). I am also excited to try the new Amazon Quick Suite that brings AI-powered research, business intelligence, and automation capabilities into a single workspace.

Last week’s launches
Here are the launches that got my attention this week:

Amazon Quick Suite – A new agentic teammate that quickly answers your questions at work and turns those insights into actions for you. Read more in Esra’s launch post.
Amazon EC2 – General-purpose M8a instances powered by the 5th Generation AMD EPYC (codename Turin) processors and compute-optimized C8i and C8i-flex instances powered by custom Intel Xeon 6 processors are now available.
Amazon EKS – EKS and EKS Distro now support Kubernetes version 1.34 with several improvements.
AWS IAM Identity Center – AWS Key Management Service keys can now be used to encrypt identity data stored in IAM Identity Center organization instances.
Amazon VPC Lattice – You can now configure the number of IPv4 addresses assigned to resource gateway elastic network interfaces (ENIs). The IPv4 addresses are used for network address translation and determine the maximum number of concurrent IPv4 connections to a resource.
Amazon Q Developer – Amazon Q Developer can help you get information about AWS product and service pricing, availability, and attributes, making it easier to select the right resources and estimate workload costs using natural language. More info in this blog post.
Amazon RDS for Db2 – You can now perform native database-level backups, offering greater flexibility in database management and migration.
AWS Service Quotas – Get notified of your quota usage with automatic quota management. Configure your preferred notification channels, such as email, SMS, or Slack. Notifications are also available in AWS Health, and you can subscribe to related AWS CloudTrail events for automation workflows.
Amazon Connect – You can now programmatically enrich case data with the new case APIs to link related cases, add custom related items, and search across them. You can now also customize service level calculations to your specific needs. New capabilities that have just been introduced include copy and bulk edit of agent scheduling configuration and agent schedule adherence notifications.
AWS Client VPN – Now supports macOS Tahoe.

Additional updates
Here are some additional projects, blog posts, and news items that I found interesting:

Serverless ICYMI Q3 2025 – A quarterly recap of serverless news, in case you missed it.
Best practices for migrating from Apache Airflow 2.x to Apache Airflow 3.x on Amazon MWAA – A guide to help you take advantage of the new release.
Building self-managed RAG applications with Amazon EKS and Amazon S3 Vectors – A reference architecture for building and deploying a self-managed RAG application using open source tools such as Ray, Hugging Face, and LangChain.
BBVA: Building a multi-region, multi-country global Data and ML Platform at scale – A six-part series of posts describing the journey to transform BBVA’s entire data analytics infrastructure, one of the largest and most complex cloud migrations in the banking sector.
Customizing text content moderation with Amazon Nova – How to fine-tune Amazon Nova for content moderation tasks tailored to your requirements, using domain-specific training data and organization-specific moderation guidelines.

Upcoming AWS events
Check your calendars so that you can sign up for these upcoming events:

AWS AI Agent Global Hackathon – This is your chance to dive deep into our powerful generative AI stack and create something truly awesome. From September 8 to October 20, you have the opportunity to create AI agents using the AWS suite of AI services, competing for over $45,000 in prizes and exclusive go-to-market opportunities.
AWS Gen AI Lofts – You can learn AWS AI products and services with exclusive sessions, meet industry-leading experts, and have valuable networking opportunities with investors and peers. Register in your nearest city: Paris (October 7–21), London (October 13–21), and Tel Aviv (November 11–19).
AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Budapest (October 16).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse upcoming in-person events, developer-focused events, and events for startups.

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Danilo

Announcing Amazon Quick Suite: your agentic teammate for answering questions and taking action

2025-10-09 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/reimagine-the-way-you-work-with-ai-agents-in-amazon-quick-suite/

Today, we’re announcing Amazon Quick Suite, a new agentic teammate that quickly answers your questions at work and turns those insights into actions for you. Instead of switching between multiple applications to gather data, find important signals and trends, and complete manual tasks, Quick Suite brings AI-powered research, business intelligence, and automation capabilities into a single workspace. You can now analyze data through natural language queries, find critical information across enterprise and external sources in minutes, and automate processes from simple tasks to complex multi-department workflows.


Business users often need to gather data across multiple applications—pulling customer details, checking performance metrics, reviewing internal product information, and performing competitive intelligence. This fragmented process often requires consultation with specialized teams to analyze advanced datasets, and in some cases, must be repeated regularly, reducing efficiency and leading to incomplete insights for decision-making.

Quick Suite helps you overcome these challenges by combining agentic teammates for research, business intelligence, and automation into a unified digital workspace for your day-to-day work.

Integrated capabilities that power productivity
Quick Suite includes the following integrated capabilities:

Research – Quick Research accelerates complex research by combining enterprise knowledge, premium third-party data, and data from the internet for more comprehensive insights.
Business intelligence – Quick Sight provides AI-powered business intelligence capabilities that transform data into actionable insights through natural language queries and interactive visualizations, helping everyone make faster decisions and achieve better business outcomes.
Automation – Quick Flows and Quick Automate help users and technical teams to automate any business process from simple, routine tasks to complex multi-department workflows, enabling faster execution and reducing manual work across the organization.

Let’s dive into some of these key capabilities.

Quick Index: Your unified knowledge foundation
Quick Index creates a secure, searchable repository that consolidates documents, files, and application data to power AI-driven insights and responses across your organization.

As a foundational component of Quick Suite, Quick Index operates in the background to bring together all your data—from databases and data warehouses to documents and email. This creates a single, intelligent knowledge base that makes AI responses more accurate and reduces time spent searching for information.

Quick Index automatically indexes and prepares any uploaded files or unstructured data you add to your Quick Suite, enabling efficient searching, sorting, and data access. For example, when you search for a specific project update, Quick Index instantly returns results from uploaded documents, meeting notes, project files, and reference materials—all from one unified search instead of checking different repositories and file systems.

To learn more, visit the Quick Index overview page.

Quick Research: From complex business challenges to expert-level insights
Quick Research is a powerful agent that conducts comprehensive research across your enterprise data and external sources to deliver contextualized, actionable insights in minutes or hours — work that previously could take longer.

Quick Research systematically breaks down complex questions into organized research plans. Starting with a simple prompt, it automatically creates detailed research frameworks that outline the approach and data sources needed for comprehensive analysis.

After Quick Research creates the plan, you can easily refine it through natural language conversations. When you are happy with the plan, it works in the background to gather information from multiple sources, using advanced reasoning to validate findings and provide thorough analysis with citations.

Quick Research integrates with your enterprise data connected to Quick Suite, the unified knowledge foundation that connects to your dashboards, documents, databases, and external sources, including Amazon S3, Snowflake, Google Drive, and Microsoft SharePoint. Quick Research grounds key insights to original sources and reveals clear reasoning paths, helping you verify accuracy, understand the logic behind recommendations, and present findings with confidence. You can trace findings back to their original sources and validate conclusions through source citations. This makes it ideal for complex topics requiring in-depth analysis.

To learn more, visit the Quick Research overview page.

Quick Sight: AI-powered business intelligence
Quick Sight provides AI-powered business intelligence capabilities that transform data into actionable insights through natural language queries and interactive visualizations.

You can create dashboards and executive summaries using conversational prompts, reducing dashboard development time while making advanced analytics accessible without specialized skills.

Quick Sight helps you ask questions about your data in natural language and receive instant visualizations, executive summaries, and insights. This generative AI integration provides you with answers from your dashboards and datasets without requiring technical expertise.

Using the scenarios capability, you can perform what-if analysis in natural language with step-by-step guidance, exploring complex business scenarios and finding answers faster than before.

Additionally, you can respond to insights with one-click actions by creating tickets, sending alerts, updating records, or triggering automated workflows directly from your dashboards without switching applications.

To learn more, visit the Quick Sight overview page.

Quick Flows: Automation for everyone
With Quick Flows, any user can automate repetitive tasks by describing their workflow using natural language without requiring any technical knowledge. Quick Flows fetches information from internal and external sources, takes action in business applications, generates content, and handles process-specific requirements.

Starting with straightforward business requirements, it creates a multi-step flow including input steps for gathering information, reasoning groups for AI-powered processing, and output steps for generating and presenting results.

After the flow is configured, you can share it with a single click to your coworkers and other teams. To execute the flow, users can open it from the library or invoke it from chat, provide the necessary inputs, and then chat with the agent to refine the outputs and further customize the results.

To learn more, visit the Quick Flows overview page.

Quick Automate: Enterprise-scale process automation
Quick Automate helps technical teams build and deploy sophisticated automation for complex, multistep processes that span departments, systems, and third-party integrations. Using AI-powered natural language processing, Quick Automate transforms complex business processes into multi-agent workflows that can be created merely by describing what you want to automate or uploading process documentation.

While Quick Flows handles straightforward workflows, Quick Automate is designed for comprehensive and complex business processes like customer onboarding, procurement automations, or compliance procedures that involve multiple approval steps, system integrations, and cross-departmental coordination. Quick Automate offers advanced orchestration capabilities with extensive monitoring, debugging, versioning, and deployment features.

After you describe the process, Quick Automate generates a comprehensive automation plan with detailed steps and actions. It includes a UI agent that follows natural language instructions to autonomously navigate websites, fill in forms, extract data, and produce structured outputs for downstream automation steps.

Additionally, you can define a custom agent, complete with instructions, knowledge, and tools, to complete process-specific tasks using the visual building experience – no code required.

Quick Automate includes enterprise-grade features such as user role management and human-in-the-loop capabilities that route specific tasks to users or groups for review and approval before continuing workflows. The service provides comprehensive observability with real-time monitoring, success rate tracking, and audit trails for compliance and governance.

To learn more, visit the Quick Automate overview page.

Additional foundational capabilities
Quick Suite includes other foundational capabilities that deliver seamless data organization and contextual AI interactions across your enterprise.

Spaces – Spaces provide a straightforward way for every business user to add their own context by uploading files or connecting to datasets and repositories specific to their work or to a particular function. For example, you might create a space for quarterly planning that includes budget spreadsheets, market research reports, and strategic planning documents. Or you could set up a product launch space that connects to your project management system and customer feedback databases. Spaces can scale from personal use to enterprise-wide deployment while maintaining access permissions and seamless integration with Quick Suite capabilities.

Chat agents – Quick Suite includes insights agents that you can use to interact with your data and workflows through natural language. These include a built-in agent that answers questions across all of your data and custom chat agents that you can configure with specific expertise and business context. Custom chat agents can be tailored for particular departments or use cases, such as a sales agent connected to your product catalog data and pricing information stored in a space, or a compliance agent configured with your regulatory requirements and actions to request approvals.

Additional things to know
If you’re an existing Amazon QuickSight customer – Amazon QuickSight customers will be upgraded to Quick Suite, a unified digital workspace that includes all your existing QuickSight business intelligence capabilities (now called “Quick Sight”) plus new agentic AI capabilities. This is an interface and capability change—your data connectivity, user access, content, security controls, user permissions, and privacy settings remain exactly the same. No data is moved, migrated, or changed.

Quick Suite offers per-user subscription-based pricing with consumption-based charges for the Quick Index and other optional features. You can find more detail on the Quick Suite pricing page.

Now available
Amazon Quick Suite gives you a set of agentic teammates that helps you get the answers you need using all your data and move instantly from answers to action so you can focus on high value activities that drive better business and customer outcomes.

Visit the getting started page to start using Amazon Quick Suite today.

Happy building
— Esra and Donnie

